How can I use back references with `grep` in R?

The gsubfn package is more general than the grep and regexpr functions and can return the back-references; see its strapply function.
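As a minimal sketch of what that might look like (the pattern and the sample string are my own illustration, and gsubfn is assumed to be installed from CRAN): when the pattern contains capture groups, strapply hands the back-references, rather than the whole match, to the function you supply.

```r
library(gsubfn)  # assumed to be installed from CRAN

x <- "30 June 2011"

# Three capture groups: day, month name, year.
# Because the pattern contains groups, strapply passes the back-references
# (not the whole match) to the supplied function, here c().
parts <- strapply(x, "(\\d+) (\\w+) (\\d+)", c)

# strapply returns a list with one element per input string
parts[[1]]
```

Here `parts[[1]]` should hold the day, month and year as separate strings.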


The stringr package has a function exactly for this purpose:

library(stringr)
x <- c("May, 1, 2011", "30 June 2011", "June 2012")
str_extract(x, "May|^June")
# [1] "May"  NA     "June"

In early versions it was a fairly thin wrapper around regexpr (current versions are built on the stringi package), but stringr generally makes string handling easier by being more consistent than base R functions.
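If you want the capture groups themselves rather than the whole match, stringr's str_match is probably the closer fit. A small sketch, using a pattern I made up for illustration (a month name followed by a four-digit year):

```r
library(stringr)

x <- c("May, 1, 2011", "30 June 2011", "June 2012")

# str_match returns a character matrix: column 1 is the whole match,
# the remaining columns are the capture groups in order.
parts <- str_match(x, "([A-Za-z]+) (\\d{4})")

parts[, 2]  # first group: month name (NA where nothing matched)
parts[, 3]  # second group: year
```

The first string yields a row of NA, because "May" is followed by a comma rather than a space and a year.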


regexpr is similar to grep, but returns the position and length of the (first) match in each string:

> x <- c("May, 1, 2011", "30 June 2011", "June 2012")
> m <- regexpr("May|^June", x)
> m
[1]  1 -1  1
attr(,"match.length")
[1]  3 -1  4

This means that the first string had a match of length 3 starting at position 1, the second string had no match, and the third string had a match of length 4 starting at position 1.

To extract the matches, you could use something like:

> m[m < 0] = NA
> substr(x, m, m + attr(m, "match.length") - 1)
[1] "May"  NA     "June"
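Base R also has regmatches, which does the substr bookkeeping for you, and regexec, which records the capture groups as well. The sketch below uses only base functions; the second pattern is my own example, not part of the question.

```r
x <- c("May, 1, 2011", "30 June 2011", "June 2012")

# regmatches() with a regexpr() result keeps only the strings that matched,
# so the result can be shorter than x.
m <- regexpr("May|^June", x)
hits <- regmatches(x, m)

# regexec() also records the capture groups, so each list element holds
# the whole match followed by the back-references (empty if no match).
groups <- regmatches(x, regexec("([A-Za-z]+) ([0-9]{4})", x))
groups[[2]]
```

Note that regmatches drops non-matching strings instead of returning NA, so use the substr approach if you need the result to line up with x.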
