Count unique values for each column per ID in R
I have a dataset that is quite big (140,000 observations × 125 attributes). Each observation is associated with an ID (which may or may not be unique). I want to count the unique values of each attribute (column) per ID.
I tried `aggregate(. ~ id, mydata, function(x) length(unique(x)))`, but it doesn't work. Given the size of the data frame, I suspect that even if it works it may take a long time. Does anyone know a better way to do it?
The dataset:

    id  attr1  attr2  attr3  attr125
    1   a      x      y      123
    1   b      z      y      345
    1   b      x      y      134
    2   a      z      y      abc
    2   c      y      y      def
    3   d      y      n      xyz
    4   b      z      y      789
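To make the example reproducible, here is a sketch that rebuilds the sample as a data frame called `mydata` (the name used in the question). It assumes every attribute is stored as character, and that the attr1 value is "a" in the first row for id 1 and in the first row for id 2; that value is inferred from the expected counts rather than stated outright.

```r
# Rebuild the sample shown above as a data frame named `mydata`.
# Assumption: attr1 is "a" in the first row of id 1 and of id 2.
mydata <- data.frame(
  id      = c(1, 1, 1, 2, 2, 3, 4),
  attr1   = c("a", "b", "b", "a", "c", "d", "b"),
  attr2   = c("x", "z", "x", "z", "y", "y", "z"),
  attr3   = c("y", "y", "y", "y", "y", "n", "y"),
  attr125 = c("123", "345", "134", "abc", "def", "xyz", "789"),
  stringsAsFactors = FALSE  # keep the attributes as plain character vectors
)

# Inspect the column types and the first values.
str(mydata)
```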
The result I want:

    id  attr1  attr2  attr3  attr125
    1   2      2      1      3
    2   2      2      1      2
    3   1      1      1      1
    4   1      1      1      1
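A minimal base-R sketch of the `aggregate()` approach from the question, with the parentheses balanced, run on the sample data (rebuilt inline, with the same assumed "a" values in attr1). On this sample it should reproduce the table of counts above.

```r
# Sample data from the question; attr1 = "a" in rows 1 and 4 is assumed.
mydata <- data.frame(
  id      = c(1, 1, 1, 2, 2, 3, 4),
  attr1   = c("a", "b", "b", "a", "c", "d", "b"),
  attr2   = c("x", "z", "x", "z", "y", "y", "z"),
  attr3   = c("y", "y", "y", "y", "y", "n", "y"),
  attr125 = c("123", "345", "134", "abc", "def", "xyz", "789"),
  stringsAsFactors = FALSE
)

# `.` on the left-hand side means "every column except id";
# FUN is applied to each of those columns within each id group.
res <- aggregate(. ~ id, data = mydata, FUN = function(x) length(unique(x)))
print(res)
```

Note that the formula method of `aggregate()` drops every row containing an `NA` by default (`na.action = na.omit`); pass `na.action = na.pass` if missing values should be kept and counted as a value.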
I hesitated to post this because it is similar to @mgriebe's answer, but it is a different way to use data.table. I find data.table useful for these operations (however, the aggregate call worked fine for me):
    # load the data.table package
    library(data.table)

    # first, copy the data.frame into a data.table
    dt <- data.table(mydata)

    # count the unique values of each column per id, using data.table's .SD operator
    dt[, lapply(.SD, function(x) length(unique(x))), by = id, .SDcols = 2:5]
    #    id attr1 attr2 attr3 attr125
    # 1:  1     2     2     1       3
    # 2:  2     2     2     1       2
    # 3:  3     1     1     1       1
    # 4:  4     1     1     1       1
Remember to adjust `.SDcols` to the column numbers where your attributes are stored.
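A variant of the data.table call above, sketched under the assumption that a reasonably recent data.table (one that provides `uniqueN()`) is installed. Because `.SD` already excludes the grouping column, `.SDcols` can be dropped when every other column should be counted; alternatively the attribute columns can be picked by name rather than by position, which avoids adjusting numbers by hand. For ordinary vectors `uniqueN(x)` should give the same count as `length(unique(x))`, including `NA` as a value by default.

```r
library(data.table)

# Sample data from the question; attr1 = "a" in rows 1 and 4 is assumed.
dt <- data.table(
  id      = c(1, 1, 1, 2, 2, 3, 4),
  attr1   = c("a", "b", "b", "a", "c", "d", "b"),
  attr2   = c("x", "z", "x", "z", "y", "y", "z"),
  attr3   = c("y", "y", "y", "y", "y", "n", "y"),
  attr125 = c("123", "345", "134", "abc", "def", "xyz", "789")
)

# With no .SDcols, .SD contains every column except the `by` column(s),
# so this counts distinct values in all attribute columns per id.
res <- dt[, lapply(.SD, uniqueN), by = id]
print(res)

# Selecting the attribute columns by name instead of by position.
attr_cols <- setdiff(names(dt), "id")
res_named <- dt[, lapply(.SD, uniqueN), by = id, .SDcols = attr_cols]
print(res_named)
```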