r - Extracting unique partial elements from a vector
I need a list of the unique subject IDs (the part before the _ and after the /) from the contents of the folder below.
 [1] "."                      "./4101_0"               "./4101_0/4101 baseline"
 [4] "./4101_1"               "./4101_2"               "./4101_2_2"
 [7] "./4101_3"               "./4101_4"               "./4101_5"
[10] "./4101_6"
Right now I'm doing this (using the packages stringr and foreach):
library(stringr)
library(foreach)

# create a list of the folder's contents
folder.list <- list.dirs()

# split the entries on "/"
subids <- str_split(folder.list, "/")

# for each entry in the list, retrieve the second element
subids <- unlist(foreach(i=1:length(subids)) %do% subids[[i]][2])

# split the entries on "_"
subids <- str_split(subids, "_")

# take the first element after splitting, unlist it, find the unique entries,
# remove the NA, and coerce to numeric
subids <- as.numeric(na.omit(unique(unlist(foreach(i=1:length(subids)) %do% subids[[i]][1]))))
This does the job, but it seems unnecessarily horrible. What's a cleaner way of getting from point A to point B?
stringr has a str_extract function, which can be used to extract substrings that match a regex pattern. With a positive lookbehind for / and a positive lookahead for _, you can achieve your aim.
Beginning with @Andrie's x (the vector of folder names shown in the question):
str_extract(x, perl('(?<=/)\\d+(?=_)'))
# [1] NA     "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101"
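The pattern above matches one or more digits (i.e. \\d+) that are preceded by a forward slash and followed by an underscore. Wrapping the pattern in perl() is required for the lookarounds. The first element is NA because "." contains no match, so the result still needs unique(), na.omit() and as.numeric() to get the final list of IDs.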