R - Extracting unique partial elements from a vector


I need a list of the unique subject IDs (the part after the / and before the _) from the folder contents below.

 [1] "."                      "./4101_0"               "./4101_0/4101 baseline"
 [4] "./4101_1"               "./4101_2"               "./4101_2_2"
 [7] "./4101_3"               "./4101_4"               "./4101_5"
[10] "./4101_6"

Right now I'm doing this (using the packages stringr and foreach):

library(stringr)
library(foreach)

# Create a list of the folder contents
folder.list <- list.dirs()
# Split the entries by "/"
subids <- str_split(folder.list, "/")
# For each entry in the list, retrieve the second element
subids <- unlist(foreach(i=1:length(subids)) %do% subids[[i]][2])
# Split the entries by "_"
subids <- str_split(subids, "_")
# Take the first element after splitting, unlist it, find the unique entries, remove NA, and coerce to numeric
subids <- as.numeric(na.omit(unique(unlist(foreach(i=1:length(subids)) %do% subids[[i]][1]))))

This does the job but seems unnecessarily horrible. What's a cleaner way of getting from point A to point B?

stringr has the str_extract function, which can be used to extract substrings that match a regex pattern. With a positive lookbehind for / and a positive lookahead for _, you can achieve your aim.

Beginning from @Andrie's x (the vector of directory paths):

str_extract(x, perl('(?<=/)\\d+(?=_)'))
# [1] NA     "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101"

The pattern above matches one or more numerals (i.e. \\d+) preceded by a forward slash and followed by an underscore. Wrapping the pattern in perl() is required for the lookarounds.
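
For comparison, the same lookaround pattern can be used without any packages. The following is a minimal base R sketch, not part of the original answer: it assumes x holds the directory paths from the question (typed in by hand here), and the names m, ids and subids are only illustrative. It also carries the result through to the unique numeric IDs the question asks for.

```r
# Example paths copied by hand from the question's directory listing (assumption:
# in practice this would be the result of list.dirs())
x <- c(".", "./4101_0", "./4101_0/4101 baseline", "./4101_1", "./4101_2",
       "./4101_2_2", "./4101_3", "./4101_4", "./4101_5", "./4101_6")

# regexpr() locates the first match in each element;
# perl = TRUE enables the lookbehind and lookahead
m <- regexpr("(?<=/)\\d+(?=_)", x, perl = TRUE)

# regmatches() returns only the matched substrings, so "." (no match) is dropped
ids <- regmatches(x, m)

# Keep the unique IDs and coerce them to numeric
subids <- as.numeric(unique(ids))
subids
# [1] 4101
```

Because regmatches() silently drops elements with no match, no na.omit() step is needed here. With str_extract() the non-matching element comes back as NA instead, so the equivalent final step there would wrap the result in unique() and na.omit() before as.numeric().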

