How to trim white spaces when trimws is not working?

The character with code 160 (U+00A0) is called a "non-breaking space." It is not an ASCII character, since ASCII ends at 127; it comes from Latin-1 and Unicode. One can read about it on Wikipedia:

https://en.wikipedia.org/wiki/Non-breaking_space

By default, the trimws() function does not include it among the characters it removes:

x <- intToUtf8(c(160,49,49,57,57,46,48,48))
x
#[1] " 1199.00"

trimws(x)
#[1] " 1199.00"
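
When a string refuses to trim, it can help to look at its code points first. A minimal sketch using base R's utf8ToInt() on the same x as above (the variable name codes is only illustrative):

```r
x <- intToUtf8(c(160, 49, 49, 57, 57, 46, 48, 48))

# Convert the string to its integer code points
codes <- utf8ToInt(x)

# The first code point is 160 (U+00A0), not 32 (an ordinary space)
codes[1]
sprintf("U+%04X", codes[1])
```

A leading 160 rather than 32 suggests that the "space" is a non-breaking space, which the default trimws() leaves alone.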

One way to get rid of it is to use the str_trim() function from the stringr package:

library(stringr)
y <- str_trim(x)
y
#[1] "1199.00"

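Another way is to apply iconv() first, so that the non-breaking space is transliterated to an ordinary space that trimws() can remove (note that the result of //TRANSLIT can vary by platform):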

y <- iconv(x, from = 'UTF-8', to = 'ASCII//TRANSLIT')
trimws(y)
#[1] "1199.00"
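
Converting to ASCII affects every non-ASCII character in the string, not only the non-breaking space. If that is a concern, a narrower variant (a sketch, not something from the approaches above) is to replace U+00A0 with an ordinary space and then trim as usual:

```r
x <- intToUtf8(c(160, 49, 49, 57, 57, 46, 48, 48))

# Replace every non-breaking space with an ordinary space,
# then let the default trimws() remove it
y <- trimws(gsub("\u00A0", " ", x, fixed = TRUE))
y
```

Note that this also turns non-breaking spaces in the middle of the string into ordinary spaces, which may or may not be what you want.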

UPDATE: Why trimws() does not remove the "invisible" character described above while stringr::str_trim() does.

Here is what we read from trimws() help:

For portability, ‘whitespace’ is taken as the character class [ \t\r\n] (space, horizontal tab, line feed, carriage return)

The help topic for stringr::str_trim() does not itself specify what counts as "white space", but str_trim() calls stri_trim_both() from the stringi package, and its help shows the default: stri_trim_both(str, pattern = "\\P{Wspace}"). In other words, it trims up to the first character that does not have the Unicode White_Space property. That is a wider set of characters than the one trimws() uses, and it includes the non-breaking space.
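
The narrower default of trimws() can be checked directly in base R. The sketch below tests a few single characters against the class quoted from the trimws() help; the vector names are only illustrative:

```r
# A few single-character strings to test
chars <- c(space = " ", tab = "\t", newline = "\n", nbsp = "\u00A0")

# Does each character belong to the default trimws() class?
in_default <- grepl("[ \t\r\n]", chars, perl = TRUE)
names(in_default) <- names(chars)
in_default
```

Only the nbsp entry comes back FALSE, which is why the default trimws() leaves it in place.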

UPDATE 2

As @H1 noted, R version 3.6.0 added an option to specify what to consider a whitespace character. From the help:

Internally, 'sub(re, "", *, perl = TRUE)', i.e., PCRE library regular expressions are used. For portability, the default 'whitespace' is the character class '[ \t\r\n]' (space, horizontal tab, carriage return, newline). Alternatively, '[\h\v]' is a good (PCRE) generalization to match all Unicode horizontal and vertical white space characters, see also <URL: https://www.pcre.org>.

So if you are using version 3.6.0 or later you can simply do:

trimws(x, whitespace = "[\\h\\v]")
#[1] "1199.00"
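
On R versions older than 3.6.0, where trimws() has no whitespace argument, the same PCRE class can be used with gsub() directly. This is a minimal sketch; trim_unicode_ws is a hypothetical helper name, not a base R function:

```r
# Hypothetical helper: strip leading and trailing horizontal/vertical
# whitespace (as PCRE defines \h and \v), including U+00A0
trim_unicode_ws <- function(s) {
  gsub("^[\\h\\v]+|[\\h\\v]+$", "", s, perl = TRUE)
}

x <- intToUtf8(c(160, 49, 49, 57, 57, 46, 48, 48))
trim_unicode_ws(x)

# Once trimmed, the value converts to a number without a warning
as.numeric(trim_unicode_ws(x))
```

Whitespace inside the string is left untouched, since the pattern is anchored to the start and the end.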

From R version 3.6.0, the whitespace argument of trimws() also lets you name the no-break space explicitly. Write the pattern as a single character class, because trimws() appends + and the ^ or $ anchor to it as written:

trimws(x, whitespace = "[\u00A0\\s]")
#[1] "1199.00"
