R: Replace multiple values in multiple columns of dataframes with NA
I am trying to achieve something similar to this question, but with multiple values that must be replaced by NA, and in a large dataset.
df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = rep(1:9), var2 = rep(3:5, each = 3))
which generates this data frame:
df
  name foo var1 var2
1    a   1    1    3
2    a   2    2    3
3    a   3    3    3
4    b   4    4    4
5    b   5    5    4
6    b   6    6    4
7    c   7    7    5
8    c   8    8    5
9    c   9    9    5
I would like to replace all occurrences of, say, 3 and 4 with NA, but only in the columns whose names start with "var".
I know that I can use a combination of [] operators to achieve the result I want:
df[, grep("^var[[:alnum:]]?", colnames(df))][
  df[, grep("^var[[:alnum:]]?", colnames(df))] == 3 |
  df[, grep("^var[[:alnum:]]?", colnames(df))] == 4
] <- NA

df
  name foo var1 var2
1    a   1    1   NA
2    a   2    2   NA
3    a   3   NA   NA
4    b   4   NA   NA
5    b   5    5   NA
6    b   6    6   NA
7    c   7    7    5
8    c   8    8    5
9    c   9    9    5
Now my questions are the following:
- Is there a way to do this efficiently, given that my actual dataset has 100,000 lines and 400 out of 500 variables start with "var"? It seems (subjectively) slow on my computer when I use the double-brackets technique.
- How would I approach the problem if, instead of 2 values (3 and 4) to be replaced by NA, I had a long list of, say, 100 different values? Is there a way to specify multiple values without having to write a clumsy series of conditions separated by the | operator?
You can do this using replace:
sel <- grepl("var", names(df))
df[sel] <- lapply(df[sel], function(x) replace(x, x %in% 3:4, NA))
df
#  name foo var1 var2
#1    a   1    1   NA
#2    a   2    2   NA
#3    a   3   NA   NA
#4    b   4   NA   NA
#5    b   5    5   NA
#6    b   6    6   NA
#7    c   7    7    5
#8    c   8    8    5
#9    c   9    9    5
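The same replace pattern also covers the second question. As a minimal sketch, assuming the values to blank out are kept in a vector (the name `bad_values` and its contents are made up for illustration) and that only columns whose names begin with "var" should be touched, `%in%` takes the place of the chain of `|` conditions and `startsWith()` anchors the match at the start of the name:

```r
# Example data from the question
df <- data.frame(name = rep(letters[1:3], each = 3), foo = 1:9,
                 var1 = 1:9, var2 = rep(3:5, each = 3))

# Hypothetical vector of values to turn into NA; it can be as long as needed
bad_values <- c(3, 4, 8)

# Select only the columns whose names begin with "var"
sel <- startsWith(names(df), "var")

# %in% tests every element against the whole vector at once,
# so no series of == comparisons joined by | is required
df[sel] <- lapply(df[sel], function(x) replace(x, x %in% bad_values, NA))
```

Unlike `grepl("var", names(df))`, which matches "var" anywhere in a column name, `startsWith()` would leave a column such as a hypothetical `covar` untouched.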
Some quick benchmarking using a million-row sample of data suggests that the replace approach is quicker than the other answers.
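To check a timing claim like this on your own machine, one rough option is to time both approaches with `system.time()` on simulated data. The sketch below is only an illustration and not the benchmark referred to above: the data size (100,000 rows, 20 "var" columns), the column names, and the value range are all assumptions, and timings will differ between machines.

```r
set.seed(1)
n <- 1e5

# Simulated data (assumed shape): an id column plus 20 integer columns var1..var20
big <- data.frame(id = seq_len(n),
                  matrix(sample(1:10, n * 20, replace = TRUE), nrow = n,
                         dimnames = list(NULL, paste0("var", 1:20))))
sel <- grepl("^var", names(big))

# Approach from the question: logical matrix indexing with | conditions
by_brackets <- big
t_brackets <- system.time(
  by_brackets[sel][by_brackets[sel] == 3 | by_brackets[sel] == 4] <- NA
)

# Approach from the answer: column-wise replace() with %in%
by_replace <- big
t_replace <- system.time(
  by_replace[sel] <- lapply(by_replace[sel],
                            function(x) replace(x, x %in% 3:4, NA))
)

# Sanity check: both approaches should produce the same data
stopifnot(isTRUE(all.equal(by_brackets, by_replace)))

# Show the timings side by side
print(rbind(brackets = t_brackets, replace = t_replace))
```

A single `system.time()` call is noisy, so repeating each measurement a few times gives a more trustworthy comparison.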