R: Capitalizing everything after a certain character

You were very close:

gsub("(_.*)","\\U\\1",x,perl=TRUE)

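seems to work. You just needed to use _.* (an underscore followed by zero or more other characters) rather than _* (zero or more underscores). The pattern _* only ever matches underscores or the empty string, so uppercasing what it captures leaves the string unchanged.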
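A minimal side-by-side comparison of the two patterns. The input vector x is hypothetical; it was chosen only to be consistent with the output shown in the gsubfn answer later in this thread.

```r
# Hypothetical input (the question's actual x is not shown here)
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f")

# Original attempt: "_*" captures only underscores or the empty string,
# so \\U has nothing alphabetic to uppercase and x comes back unchanged
before <- gsub("(_*)", "\\U\\1", x, perl = TRUE)

# Fix: "_.*" captures the underscore and everything after it
after <- gsub("(_.*)", "\\U\\1", x, perl = TRUE)

print(before)
print(after)   # "NYC_23DF"  "BOS_3_RB"  "mgh_3_3_F"
```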

To take this apart a bit more:

  • _.* gives a regular expression pattern that matches an underscore _ followed by any number (including 0) of additional characters; . denotes "any character" and * denotes "zero or more repeats of the previous element"
  • surrounding this regular expression with parentheses () makes it a capturing group, i.e. the matched text is stored so that the replacement string can refer back to it
  • \\1 in the replacement string says "insert the text captured by the first group", i.e. whatever matched _.*
  • \\U, in conjunction with perl=TRUE, says "convert the rest of the replacement to upper case". Uppercasing _ has no effect. If we wanted to capitalize everything after (for example) a lower-case g, we would need to exclude the g from the captured group and write it literally in the replacement: gsub("g(.*)","g\\U\\1",x,perl=TRUE)
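A small sketch of the "lower-case g" case from the last bullet, using a hypothetical vector y. Note that the match starts at the first g, and that putting the g inside the group would uppercase it too.

```r
y <- c("bigdog", "egg", "cat")   # hypothetical inputs

# g kept outside the group and re-inserted literally: only text after it changes
keep_g <- gsub("g(.*)", "g\\U\\1", y, perl = TRUE)

# g inside the group: the g itself is uppercased as well
with_g <- gsub("(g.*)", "\\U\\1", y, perl = TRUE)

print(keep_g)   # "bigDOG" "egG"    "cat"
print(with_g)   # "biGDOG" "eGG"    "cat"
```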

For more details, search for "replacement" and "capitalizing" in ?gsub (and see ?regexp for general information about regular expressions).
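If the "certain character" may itself be a regex metacharacter (such as "." or "|"), one option is to avoid building a pattern at all. The sketch below is a hypothetical helper, upper_after, not part of base R or of the answer above; it finds the first literal occurrence with regexpr(fixed = TRUE) and uppercases only what follows.

```r
# Hypothetical helper: uppercase everything after the first occurrence of
# `ch`, treating `ch` as a literal string rather than a regular expression
upper_after <- function(x, ch) {
  pos <- regexpr(ch, x, fixed = TRUE)   # -1 where `ch` does not occur
  hit <- pos > 0
  start <- pos[hit] + nchar(ch)         # first position after `ch`
  x[hit] <- paste0(substr(x[hit], 1, start - 1),
                   toupper(substring(x[hit], start)))
  x                                     # strings without `ch` are unchanged
}

res <- upper_after(c("a.bc", "xyz", "q.r.s"), ".")
print(res)   # "a.BC"  "xyz"   "q.R.S"
```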


gsubfn in the gsubfn package is like gsub, except that the replacement can be a function. Here we match _ and everything after it, and feed the match through toupper:

> library(gsubfn)
>
> gsubfn("_.*", toupper, x)
[1] "NYC_23DF"  "BOS_3_RB"  "mgh_3_3_F"

Note that this approach needs only a very simple regular expression: no capture group, no \\U, and no perl=TRUE.
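If adding a package dependency is undesirable, a similar "run the match through a function" effect can be sketched in base R with regexpr and the regmatches<- replacement function. This is a swapped-in base-R technique, not part of gsubfn; x is again a hypothetical input.

```r
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f")   # hypothetical input

# Locate the first "_" and everything after it in each string
m <- regexpr("_.*", x)

# Replace only the matched pieces with their upper-case versions;
# elements without a match would be left untouched
regmatches(x, m) <- toupper(regmatches(x, m))

print(x)   # "NYC_23DF"  "BOS_3_RB"  "mgh_3_3_F"
```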