How to remove extra whitespace between words inside a character vector in R?

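Another option is the str_squish function from the stringr package. Besides collapsing repeated whitespace inside the string, it also trims whitespace from both ends: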

library(stringr)
string <- "Hi,  this is a   good  time to   start working   together."
str_squish(string)
#[1] "Hi, this is a good time to start working together."

The package textclean has many useful tools for processing text. replace_white would be useful here:

v <- "Hi,  this is a   good  time to   start working   together."

textclean::replace_white(v)
# [1] "Hi, this is a good time to start working together."

gsub is your friend:

test <- "Hi,  this is a   good  time to   start working   together."
gsub("\\s+"," ",test)
#[1] "Hi, this is a good time to start working together."

\\s+ matches one or more consecutive whitespace characters (space, tab, newline, etc.), and gsub replaces each such run with a single space " ".
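One thing to be aware of: the pattern also matches whitespace at the start and end of the string, so padding there is reduced to a single space rather than removed. Here is a minimal sketch of that behaviour using only base R; the input string x is made up for illustration.

```r
# Hypothetical input mixing spaces, tabs and a newline, padded at both ends
x <- "  Hi,\t\tthis   is\n a  test.  "

# Every run of whitespace, including the padding, becomes one space
collapsed <- gsub("\\s+", " ", x)

# trimws() (base R) then drops the remaining leading and trailing space
trimmed <- trimws(collapsed)

print(collapsed)
print(trimmed)
```

If the padding should go as well, wrapping the gsub call in trimws() as above is probably the simplest way.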


Since the title of the question is "remove the extra whitespace between words", here is an answer that leaves leading and trailing whitespace untouched (assuming the "words" are chunks of non-whitespace characters):

gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", text, perl=TRUE)
stringr::str_replace_all(text, "(\\S)\\s{2,}(?=\\S)", "\\1 ")
## Or, if the whitespace to keep is the last whitespace character in each match
gsub("(\\S)(\\s){2,}(?=\\S)", "\\1\\2", text, perl=TRUE)
stringr::str_replace_all(text, "(\\S)(\\s){2,}(?=\\S)", "\\1\\2")

See regex demo #1 and regex demo #2 and this R demo.

Regex details:

  • (\S) - Capturing group 1 (\1 refers to this group value from the replacement pattern): a non-whitespace char
  • \s{2,} - two or more whitespace chars (in Regex #2, \s is wrapped in parentheses to form capturing group 2 (\2); since the quantifier sits outside the group, \2 holds the last whitespace char matched)
  • (?=\S) - a positive lookahead that requires a non-whitespace char immediately to the right of the current location.
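To see the difference between the two patterns, here is a small sketch in base R. The inputs padded and mixed are invented for this example: the first has padding at both ends, the second has two spaces followed by a tab between the letters.

```r
# Hypothetical input with leading and trailing padding
padded <- "  Hi,  this is a   good  time.  "

# Only runs between two non-whitespace chars are collapsed;
# the padding at both ends is left as it was
inner_only <- gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", padded, perl = TRUE)

# Hypothetical input: "a", two spaces, a tab, "b"
mixed <- "a  \tb"

# Pattern #1 always puts a plain space in place of the run
keep_space <- gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", mixed, perl = TRUE)

# Pattern #2 keeps the last whitespace char of the run (here, the tab)
keep_last <- gsub("(\\S)(\\s){2,}(?=\\S)", "\\1\\2", mixed, perl = TRUE)

print(inner_only)
print(keep_space)
print(keep_last)
```

With these inputs, inner_only keeps its two leading and two trailing spaces, keep_space is "a b" and keep_last is "a\tb". Note that perl = TRUE is required in gsub because the lookahead (?=\S) is not supported by the default regex engine.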

Tags:

Regex

R