grepl in R to find matches to any of a list of character strings

Not sure what you tried, but this seems to work:

data$keep <- ifelse(grepl(paste(matches, collapse = "|"), data$animal), "Keep","Discard")

This is similar to the answer you linked to.

The trick is using the paste:

paste(matches, collapse = "|")
#[1] "cat|dog"

So it creates a single regular expression that matches either "cat" OR "dog". Because the pattern is built from the vector, this also works with a long list of patterns without typing each one by hand.
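As a minimal, self-contained sketch of the same idea: the small data frame below is made up for illustration (the data from the question is not reproduced here), and `pattern` and `hits` are just helper names.

```r
# Hypothetical sample data; not the data from the question.
data <- data.frame(animal = c("cat", "dog", "bird", "dogfish", "horse"),
                   stringsAsFactors = FALSE)
matches <- c("cat", "dog")

# Collapse the vector of search strings into one alternation pattern.
pattern <- paste(matches, collapse = "|")

# grepl() returns one logical value per element of data$animal.
hits <- grepl(pattern, data$animal)

# Label each row according to whether any of the patterns matched.
data$keep <- ifelse(hits, "Keep", "Discard")
print(data)
```

Note that grepl matches substrings, so "dogfish" is kept in this sketch. If only whole words should match, one option is to wrap the pattern in word boundaries, e.g. paste0("\\b(", pattern, ")\\b").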

Edit:

In case you are doing this in order to subset the data.frame later according to the "Keep" and "Discard" entries, you could do this more directly using:

data[grepl(paste(matches, collapse = "|"), data$animal),]

This way, the logical vector returned by grepl (TRUE or FALSE for each row) is used directly as the row index for the subset.
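A short sketch of the direct subsetting, again on made-up data (the `id` column and the names `keep_rows`, `kept` and `discarded` are only for illustration):

```r
# Hypothetical sample data; not the data from the question.
data <- data.frame(animal = c("cat", "dog", "bird", "dogfish", "horse"),
                   id = 1:5,
                   stringsAsFactors = FALSE)
matches <- c("cat", "dog")

# One logical value per row: TRUE where any pattern matched.
keep_rows <- grepl(paste(matches, collapse = "|"), data$animal)

kept      <- data[keep_rows, ]   # rows that matched
discarded <- data[!keep_rows, ]  # negate the vector to get the rest

print(kept)
print(discarded)
```

Storing the logical vector once makes it easy to get both the matching and the non-matching rows without running grepl twice.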


You can use the "or" operator (|) inside the regular expression passed to grepl.

ifelse(grepl("dog|cat", data$animal), "keep", "discard")
# [1] "keep"    "keep"    "discard" "keep"    "keep"    "keep"    "keep"    "discard"
# [9] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "discard" "keep"   
#[17] "discard" "keep"    "keep"    "discard" "keep"    "keep"    "discard" "keep"   
#[25] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"   
#[33] "keep"    "discard" "keep"    "discard" "keep"    "discard" "keep"    "keep"   
#[41] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"   
#[49] "keep"    "discard"

The regular expression dog|cat tells the regular expression engine to look for either "dog" or "cat"; grepl then returns TRUE for every element that contains at least one of them.


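Try to avoid ifelse where a simpler alternative exists. Here, for example, you can index a two-element vector with the result of grepl (adding 1 turns FALSE into 1 and TRUE into 2), which works nicely: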

c("Discard", "Keep")[grepl("(dog|cat)", data$animal) + 1]

With the data generated using set.seed(123), you will get

##  [1] "Keep"    "Keep"    "Discard" "Keep"    "Keep"    "Keep"    "Discard" "Keep"   
##  [9] "Discard" "Discard" "Keep"    "Discard" "Keep"    "Discard" "Keep"    "Keep"   
## [17] "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"   
## [25] "Keep"    "Keep"    "Discard" "Discard" "Keep"    "Keep"    "Keep"    "Keep"   
## [33] "Keep"    "Keep"    "Keep"    "Discard" "Keep"    "Keep"    "Keep"    "Keep"   
## [41] "Keep"    "Discard" "Discard" "Keep"    "Keep"    "Keep"    "Keep"    "Discard"
## [49] "Keep"    "Keep"   
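To see why the indexing works, here is a small step-by-step sketch on a made-up input vector (the names `animal`, `hit`, `idx` and `labels` are only for illustration):

```r
# Hypothetical input; not the data from the question.
animal <- c("cat", "bird", "dog")

# Logical vector: TRUE where "dog" or "cat" is found.
hit <- grepl("dog|cat", animal)

# Arithmetic coerces logicals to numbers: FALSE + 1 is 1, TRUE + 1 is 2.
idx <- hit + 1

# Position 1 of the lookup vector is "Discard", position 2 is "Keep".
labels <- c("Discard", "Keep")[idx]
print(labels)
```

The order of the lookup vector matters: the label for non-matches has to come first, because FALSE maps to index 1.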
