Remove text after the second space

Split each string on " ", then extract the first two pieces and paste them back together:

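x <- c("Agarista revoluta (Spreng.) Hook. f. ex Nied.", "Amaioua intermedia Mart.", 
       "Baccharis reticularia DC.")
# split on single spaces, keep the first two pieces, rejoin them with a space;
# USE.NAMES = FALSE stops sapply from naming the result with the long input strings
sapply(x, function(y) paste(unlist(strsplit(y, " "))[1:2], collapse = " "), USE.NAMES = FALSE)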
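A slightly more defensive sketch of the same split-and-paste idea: strsplit() is already vectorised, so it can be called once on the whole vector, and head(w, 2) is used instead of [1:2] so that a one-word input (the extra "Amaioua" element below is added only for illustration) comes back unchanged rather than as "Amaioua NA".

```r
x <- c("Agarista revoluta (Spreng.) Hook. f. ex Nied.", "Amaioua intermedia Mart.",
       "Baccharis reticularia DC.", "Amaioua")  # last element: one-word input, added for illustration

first_two <- vapply(
  strsplit(x, " ", fixed = TRUE),               # one call splits every string on a literal space
  function(w) paste(head(w, 2), collapse = " "), # head() never pads with NA
  character(1)                                   # vapply guarantees a character vector result
)
first_two
```

vapply() is used here rather than sapply() because it checks that every call returns exactly one string.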

Another option is a regular expression:

sub('^(\\w+\\s+\\w+).*', '\\1', x)
#[1] "Agarista revoluta"     "Amaioua intermedia"    "Baccharis reticularia"
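One thing to watch with \\w: it matches only letters, digits and underscores, so a hyphenated epithet is cut at the hyphen. A minimal comparison on a hyphenated name, with \\S (any non-whitespace character) swapped in for the second pattern:

```r
y <- "Capsella bursa-pastoris (L.) Medik."

with_w <- sub('^(\\w+\\s+\\w+).*', '\\1', y)  # \w stops at the hyphen
with_S <- sub('^(\\S+\\s+\\S+).*', '\\1', y)  # \S keeps the whole hyphenated word
with_w
with_S
```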

Alternatively, the stringr package has some convenient functions for this kind of operation. For example:

library(stringr)
word(x, 1, 2)
#[1] "Agarista revoluta"     "Amaioua intermedia"    "Baccharis reticularia"
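word() splits on a single space by default, so doubled or leading spaces produce empty "words". A minimal sketch, assuming the input may contain such irregular spacing (the two strings below are made up for illustration): normalising with str_squish() first, which trims the ends and collapses internal whitespace runs to one space, keeps word() counting real words.

```r
library(stringr)

y <- c("Amaioua  intermedia Mart.", " Baccharis reticularia DC.")  # irregular spacing, for illustration

# str_squish() trims both ends and collapses internal whitespace to single spaces
clean_two <- word(str_squish(y), 1, 2)
clean_two
```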

This uses no regular expressions or packages:

with(read.table(text = x, fill = TRUE), trimws(paste(V1, V2)))

giving:

[1] "Agarista revoluta"     "Amaioua intermedia"    "Baccharis reticularia"

If every input has at least two words, you can omit the trimws call.
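By default read.table() treats ' and " as quote characters and # as the start of a comment, so an input containing them can be misparsed. A minimal sketch, assuming such characters may appear (the apostrophe name below is a made-up placeholder): passing quote = "" and comment.char = "" makes them ordinary characters.

```r
# hypothetical input: the first element contains an apostrophe
y <- c("Genus o'brienii Auth.", "Amaioua intermedia Mart.")

# quote = "" and comment.char = "" disable quoting and comments,
# so ' " and # are read as part of the words
first_two <- with(read.table(text = y, fill = TRUE, quote = "", comment.char = ""),
                  trimws(paste(V1, V2)))
first_two
```

Note also that, per the read.table documentation, the number of columns is guessed from the first five lines, which can go wrong with fill = TRUE if a later line has more words; col.names can be supplied in that case.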


You may use

x <- c("Agarista revoluta (Spreng.) Hook. f. ex Nied.", "Amaioua intermedia Mart.", "Baccharis reticularia DC.")
sub("^(\\S*\\s+\\S+).*", "\\1", x)
## => [1] "Agarista revoluta"     "Amaioua intermedia"    "Baccharis reticularia"


Pattern details:

  • ^ - start of string
  • (\\S*\\s+\\S+) - Group 1 capturing 0+ non-whitespace chars, then 1+ whitespaces, and then 1+ non-whitespaces
  • .* - any 0+ chars, as many as possible (up to the end of string).
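Because the pattern above is used with sub(), a string that does not match (for example, a single word) is returned unchanged. If you would rather extract only the matching part, regexpr() plus regmatches() is one base R alternative; note that it drops non-matching elements. In this sketch the one-word "Amaioua" element is added only for illustration:

```r
x <- c("Agarista revoluta (Spreng.) Hook. f. ex Nied.", "Amaioua intermedia Mart.",
       "Baccharis reticularia DC.", "Amaioua")  # last element added for illustration

# sub(): one result per input; non-matching strings come back unchanged
replaced <- sub("^(\\S*\\s+\\S+).*", "\\1", x)

# regexpr() + regmatches(): only the matched text, non-matches are dropped
extracted <- regmatches(x, regexpr("^\\S+\\s+\\S+", x))

replaced
extracted
```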

If your strings might have leading whitespace that should not be counted as the separator between the first two words, use

sub("^\\s*(\\S+\\s+\\S+).*", "\\1", x)
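To see why the leading-whitespace variant matters, here is a small comparison on a made-up input that starts with two spaces. With the first pattern, \\S* matches the empty string, \\s+ consumes the leading spaces and only one word is kept; the second pattern skips the leading whitespace before capturing.

```r
z <- "  Amaioua intermedia Mart."  # hypothetical input with leading spaces

first_pattern  <- sub("^(\\S*\\s+\\S+).*", "\\1", z)      # keeps the spaces, loses the 2nd word
second_pattern <- sub("^\\s*(\\S+\\s+\\S+).*", "\\1", z)  # skips leading whitespace first
first_pattern
second_pattern
```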


Tags:

String

Regex

R