Extracting a certain substring (email address)

Those strings look like what R calls a "person". The as.person() function from the utils package parses them and splits out the email address. For example:

v1 <- c("Persons Name <[email protected]>","person 2 <[email protected]>")
unlist(as.person(v1)$email)
# [1] "[email protected]" "[email protected]"

For more information, see the ?person help page.
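
As a rough sketch of what else the parsed object holds, the snippet below pulls the name parts alongside the email. The names and example.com addresses are made up for illustration and are not from the question.

```r
# Hypothetical input; names and addresses are invented for illustration.
v2 <- c("Ada Lovelace <ada@example.com>", "Alan Turing <alan@example.com>")

# as.person() lives in utils, which is attached by default.
p <- as.person(v2)

unlist(p$email)   # email addresses as a character vector
unlist(p$given)   # given names
unlist(p$family)  # family names
```

The `$` accessor returns a list when the person object has more than one entry, which is why unlist() is used to flatten each field.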


Another option is str_extract() from stringr, using a lookbehind to match everything after the < up to, but not including, the >:

library(stringr)
str_extract(v1, "(?<=\\<)[^>]+")
# [1] "[email protected]" "[email protected]"
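
If you would rather not load stringr, the same lookbehind idea can be sketched in base R with regexpr() and regmatches(). The input vector here is a made-up example, not the data from the question.

```r
# Hypothetical input; addresses are invented for illustration.
v2 <- c("Ada Lovelace <ada@example.com>", "Alan Turing <alan@example.com>")

# regexpr() locates the first match in each element;
# perl = TRUE is needed because lookbehind is a PCRE feature.
m <- regexpr("(?<=<)[^>]+", v2, perl = TRUE)

# regmatches() extracts the matched substrings.
emails <- regmatches(v2, m)
emails
# [1] "ada@example.com"  "alan@example.com"
```

One difference to keep in mind: regmatches() drops elements that have no match, so the result can be shorter than the input, whereas str_extract() returns NA in those positions.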

You can also match the pattern "anything**, then <, then (anything), then >, then anything" and replace the whole match with the part captured by the parentheses. The capture group is referenced as \1, which is written "\\1" in an R string because the backslash itself has to be escaped.

sub('.*<(.*)>.*', '\\1', v1)
# [1] "[email protected]" "[email protected]" 

** "anything" actually means anything but line breaks
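
One caveat with the sub() approach: when an element contains no <...> part, the pattern does not match and sub() returns that element unchanged. A minimal sketch of guarding against this with grepl(), using a made-up input vector:

```r
# Hypothetical input; the second element has no address on purpose.
v2 <- c("Ada Lovelace <ada@example.com>", "no address here")
pattern <- ".*<(.*)>.*"

# Without a guard, the non-matching element passes through untouched.
sub(pattern, "\\1", v2)
# [1] "ada@example.com" "no address here"

# With a guard, non-matching elements become NA instead.
emails <- ifelse(grepl(pattern, v2), sub(pattern, "\\1", v2), NA_character_)
emails
# [1] "ada@example.com" NA
```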
