# Format numbers with million (M) and billion (B) suffixes

If you begin with this numeric vector x,

x <- c(6e+06, 75000400, 743450000, 340000, 4300000)


you could do the following.

paste(format(round(x / 1e6, 1), trim = TRUE), "M")
# [1] "6.0 M"   "75.0 M"  "743.5 M" "0.3 M"   "4.3 M"


And if you're not concerned about trailing zeros, just remove the format() call.

paste(round(x / 1e6, 1), "M")
# [1] "6 M"     "75 M"    "743.5 M" "0.3 M"   "4.3 M"
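Neither call above produces a B suffix. If billions are also in play, one possible way to pick the unit per element is sketched below. The names format_mb and digits are made up for this example and are not part of base R.

```r
# Hypothetical helper (not from base R): choose "B" or "M" per element.
format_mb <- function(x, digits = 1) {
  ifelse(abs(x) >= 1e9,
         paste0(round(x / 1e9, digits), "B"),   # billions
         paste0(round(x / 1e6, digits), "M"))   # everything else in millions
}

format_mb(c(6e+06, 4300000, 2.5e9))
# [1] "6M"   "4.3M" "2.5B"
```

Values below one million still come out as fractions of a million here (for example "0.3M"), which matches the behaviour of the one-liners above.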


Alternatively, you could assign an S3 class with a print method and keep x as numeric underneath. Here I use paste0() to make the result a bit more legible.

print.million <- function(x, quote = FALSE, ...) {
    x <- paste0(round(x / 1e6, 1), "M")
    NextMethod(x, quote = quote, ...)
}
## assign the 'million' class to 'x'
class(x) <- "million"
x
# [1] 6M     75M    743.5M 0.3M   4.3M
## the default `[` drops the class attribute, so the underlying numbers print
x[]
# [1]   6000000  75000400 743450000    340000   4300000


You could do the same for billions and trillions as well. For information on how to put this into a data frame, see this answer, as you'll need both a format() and an as.data.frame() method.
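As a rough sketch of what a billions class with a format() method might look like (the as.data.frame() method is not shown), something along these lines should work. The class name billion and the vector y are invented for this illustration.

```r
# Sketch only: 'billion' is a hypothetical class name, not an existing one.
format.billion <- function(x, ...) {
  # unclass() so the arithmetic runs on the plain numeric values
  paste0(round(unclass(x) / 1e9, 1), "B")
}

# print() reuses format(); returning invisible(x) mirrors base print methods
print.billion <- function(x, quote = FALSE, ...) {
  print(format(x), quote = quote, ...)
  invisible(x)
}

y <- structure(c(2.5e9, 1.2e10), class = "billion")
format(y)
# [1] "2.5B" "12B"
```

Keeping the labelling logic in format() means print() and any later data frame support can share it.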

Recent versions of the scales package include functionality to print readable labels. If you're using ggplot2 or the tidyverse, scales is probably already installed. You might have to update the package, though.

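In this case, label_number_si() can be used. (It has been deprecated since scales 1.2.0, where label_number(scale_cut = cut_short_scale()) is the suggested replacement.)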

> library(scales)
> inp <- c(6000000, 75000400, 743450000, 340000, 4300000)
> label_number_si(accuracy=0.1)(inp)
[1] "6.0M"   "75.0M"  "743.4M" "340.0K" "4.3M"


Obviously you first need to get rid of the commas in the formatted numbers, and gsub("\\,", ...) is the way to go. The function below uses findInterval() to select the appropriate suffix for labeling and to determine the denominator for a more compact display. It can easily be extended in either direction if you want to go below 1.0 or above 1 trillion:

comprss <- function(tx) {
    # strip thousands separators, then convert to numeric
    num <- as.numeric(gsub("\\,", "", tx))
    # magnitude bucket: 1 = units, 2 = K, 3 = M, 4 = B, 5 = T
    div <- findInterval(num, c(0, 1e3, 1e6, 1e9, 1e12))  # modify this if negative numbers are possible
    paste(round(num / 10^(3 * (div - 1)), 2),
          c("", "K", "M", "B", "T")[div])
}
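If negative numbers are possible, one way to adapt the idea is to bucket on the absolute value and keep the sign in the numerator. This is only a sketch; comprss_signed is a hypothetical name and not part of the answer's function.

```r
# Hypothetical variant of comprss() that tolerates negative input.
comprss_signed <- function(tx) {
  # strip thousands separators, then convert to numeric
  num <- as.numeric(gsub(",", "", tx, fixed = TRUE))
  # bucket on the magnitude so -1500 and 1500 both land in the "K" bucket
  div <- findInterval(abs(num), c(0, 1e3, 1e6, 1e9, 1e12))
  paste(round(num / 10^(3 * (div - 1)), 2),
        c("", "K", "M", "B", "T")[div])
}

comprss_signed(c("-1,500", "2,000,000"))
# [1] "-1.5 K" "2 M"
```

Going below 1.0 would need extra break points and suffixes in the same two vectors, which this sketch does not attempt.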


You don't need to remove the as.numeric() or gsub() calls if the input is already numeric. They are admittedly superfluous in that case, but the function still succeeds. This is the result with Gregor's example:

> comprss (big_x)
[1] "123 "     "500 "     "999 "     "1.05 K"   "9 K"
[6] "49 K"     "105.4 K"  "998 K"    "1.5 M"    "20 M"
[11] "313.4 M"  "453.12 B"


And with the original input (which was probably a factor variable if it was entered with read.table() or read.csv(), or created with data.frame(), at least under the pre-4.0.0 default of stringsAsFactors = TRUE):

> comprss (dat$V2)
[1] "6 M"      "75 M"     "743.45 M" "340 K"    "4.3 M"


And of course these can be printed without the quotes, either with an explicit print() call using quote = FALSE or with cat().
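For instance, with a small character vector standing in for the function's result (the name labels is just for this illustration), the options look roughly like this:

```r
# 'labels' is an illustrative stand-in for the character vector returned above
labels <- c("6 M", "75 M", "743.45 M")

print(labels, quote = FALSE)   # explicit print() without quotes
noquote(labels)                # same display, via a 'noquote' object
cat(labels, sep = "\n")        # one label per line, no [1] index prefix
```

print() and noquote() keep the [1] index prefix, while cat() writes only the strings themselves.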