Dictionary style replace multiple items

map = setNames(c("0101", "0102", "0103"), c("AA", "AC", "AG"))
foo[] <- map[unlist(foo)]

This assumes that map covers every value that occurs in foo. It would feel less like a 'hack', and be more efficient in both space and time, if foo were a character matrix, in which case:

matrix(map[foo], nrow=nrow(foo), dimnames=dimnames(foo))

Both the matrix and data frame variants can run afoul of R's 2^31-1 limit on vector length when there are millions of SNPs and thousands of samples (R 3.0.0 and later support longer atomic vectors on 64-bit builds).
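As a minimal sketch of the lookup described above, assuming a small hypothetical foo with three character SNP columns (the column names and genotype values are invented for illustration):

```r
# Hypothetical sample data; the real foo would come from your own file.
foo <- data.frame(snp1 = c("AA", "AG", "AA"),
                  snp2 = c("AA", "AC", "AG"),
                  snp3 = c("AC", "AA", "AG"),
                  stringsAsFactors = FALSE)

# Named character vector acting as the dictionary: names are the keys.
map <- setNames(c("0101", "0102", "0103"), c("AA", "AC", "AG"))

# Data frame variant: assigning into foo[] keeps the shape and column names.
recoded_df <- foo
recoded_df[] <- map[unlist(foo)]

# Matrix variant: index the map directly with the character matrix.
foo_mat <- as.matrix(foo)
recoded_mat <- matrix(map[foo_mat], nrow = nrow(foo_mat),
                      dimnames = dimnames(foo_mat))
```

The sketch assumes character columns. If the columns were factors, unlist(foo) would index map by the factors' integer codes rather than by their labels, so convert them with as.character() first.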


Here is a quick solution

dict = list(AA = '0101', AC = '0102', AG = '0103')
foo2 = foo
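# dict[[i]] extracts the replacement string; dict[i] would insert a one-element list and turn the columns into list columns
for (i in seq_along(dict)) foo2 <- replace(foo2, foo2 == names(dict)[i], dict[[i]])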

One of the most readable ways to replace values in a string, or in a vector of strings, using a dictionary is str_replace_all() from the stringr package. Beware: this method is based on regular expressions, so each key is interpreted as a pattern rather than as a literal string. The pattern passed to str_replace_all() can be a dictionary, expressed as a named character vector: c("regex" = "desired value").

# 1. Make your dictionary
dictio_replace <- c("AA" = "0101",
                    "AC" = "0102",
                    "AG" = "0103") # short example of a dictionary

# 2. Replace every pattern with its dictionary value (works on a single string or a single vector of strings)
foo$snp1 <- stringr::str_replace_all(string = foo$snp1,
                                     pattern = dictio_replace)  # only 'pattern' is needed: 'replacement' is redundant because the named vector supplies the values

Repeat step 2 for foo$snp2 and foo$snp3. If you have more columns to transform, it is better to wrap the call in a function (for example with lapply()) so that every column of the data frame is replaced without repeating yourself.
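One way to avoid that repetition, sketched here with base lapply() and assuming every column of foo should be recoded (the sample foo is hypothetical). Because the keys are treated as regular expressions, this sketch anchors each one with ^ and $ so that only whole values match:

```r
library(stringr)

# Hypothetical sample data with character columns.
foo <- data.frame(snp1 = c("AA", "AG", "AA"),
                  snp2 = c("AA", "AC", "AG"),
                  stringsAsFactors = FALSE)

# Keys are regular expressions; ^...$ restricts each one to a whole-value match.
dictio_replace <- c("^AA$" = "0101",
                    "^AC$" = "0102",
                    "^AG$" = "0103")

# Apply the same replacement to every column; foo[] preserves the data frame.
foo[] <- lapply(foo, str_replace_all, pattern = dictio_replace)
```

Without the anchors, a key such as "AA" would also match inside longer strings, which may or may not be what you want.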


If you're open to using packages, plyr is a very popular one and has a handy mapvalues() function that does just what you're looking for. It takes a vector or factor, so apply it to each column of the data frame:

library(plyr)
foo[] <- lapply(foo, mapvalues, from=c("AA", "AC", "AG"), to=c("0101", "0102", "0103"))

Note that it works for data types of all kinds, not just strings.
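For instance, a small sketch with a numeric vector (the vector codes and its values are made up for illustration):

```r
library(plyr)

# Hypothetical numeric codes; values not listed in 'from' are left unchanged.
codes <- c(1, 2, 3, 2)
recoded <- mapvalues(codes, from = c(1, 2), to = c(10, 20))
```

Here 3 does not appear in from, so it passes through untouched while 1 and 2 are mapped to 10 and 20.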