Match substring of two vectors and create a new vector combining them

A possible solution:

a <- setNames(a, substr(a, 2, 3))
b <- setNames(b, substr(b, 1, 2))

df <- merge(stack(a), stack(b), by = 'ind')
paste0(substr(df$values.x, 1, 1), df$values.y)

which gives:

[1] "1234" "1238" "2234" "2238" "4325" "4326" "2342"

A second alternative:

a <- setNames(a, substr(a, 2, 3))
b <- setNames(b, substr(b, 1, 2))

l <- lapply(names(a), function(x) b[x == names(b)])
paste0(substr(rep(a, lengths(l)), 1, 1), unlist(l))

which gives the same result and is considerably faster (see the benchmark).


Probably a little complex but works:

unlist( sapply( a, function(x) {
  regex <- paste0( substr(x, 2, 3), '(\\d)')
  z <- sub(regex, paste0(x, "\\1"), b)
  z[!b %in% z] 
} ))

which give: [1] "1234" "1238" "2342" "4325" "4326" "2234" "2238"

The main idea is to create a regex for each entry in a, apply this regex to b and replace the values with the current a value and append only the last digit captured (the (\\d) part of the regex, then filter the resulting vector to get back only the modified values.

Out of curiosity, I did a small benchmark (adding sub_a and sub_b creation into Sotos and Heikki answers so everyone start on the same initial vectors a of 400 observations and b of 500 observations):

Unit: milliseconds
            expr      min       lq     mean   median       uq      max neval
      Jaap(a, b) 341.0224 342.6853 345.2182 344.3482 347.3161 350.2840     3
     Tensi(a, b) 415.9175 416.2672 421.9148 416.6168 424.9134 433.2100     3
    Heikki(a, b) 126.9859 139.6727 149.3252 152.3594 160.4948 168.6302     3
     Sotos(a, b) 151.1264 164.9869 172.0310 178.8474 182.4833 186.1191     3
 MattWBase(a, b) 286.9651 290.8923 293.3795 294.8195 296.5867 298.3538     3

Tags:

R

Vector