Digit sum function in R

This should be better:

digitsum <- function(x) sum(floor(x / 10^(0:(nchar(x) - 1))) %% 10)
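One caveat worth noting: `nchar(x)` counts the characters of the number's text representation, not its digits. For a double that R formats in scientific notation (for example, `as.character(1e5)` is `"1e+05"` under default options), the digit count can come out too small. The sketch below avoids the string conversion by peeling off digits arithmetically. The name `digitsum_loop` is hypothetical, and the function assumes a single non-negative whole number (the sign is dropped via `abs`).

```r
# A minimal sketch, not the answer's original method: sum the digits of
# one whole number without relying on nchar() or as.character().
# `digitsum_loop` is a hypothetical name introduced for illustration.
digitsum_loop <- function(x) {
  x <- abs(x)                  # assumption: ignore the sign
  total <- 0
  while (x > 0) {
    total <- total + x %% 10   # take the last decimal digit
    x <- x %/% 10              # drop the last decimal digit
  }
  total
}

digitsum_loop(123)
digitsum_loop(1e5)
```

The loop runs once per digit, so it is likely slower than the vectorised one-liner for typical inputs; it trades speed for independence from how the number is printed.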

I wondered which of the three suggested methods (plus a fourth one) is the fastest, so I did some benchmarking.

  1. digitsum1 <- function(x) sum(as.numeric(unlist(strsplit(as.character(x), split = ""))))

  2. digitsum2 <- function(x) sum(floor(x / 10^(0:(nchar(x) - 1))) %% 10)

  3. Using function digitsBase from package GLDEX:

    library(GLDEX, quietly = TRUE)
    digitsum3 <-  function(x) sum(digitsBase(x, base = 10))
    
  4. Based on a function by Greg Snow in the R-help mailing list:

    digitsum4 <- function(x) sum(x %/% 10^seq(0, length.out = nchar(x)) %% 10)

Benchmark code:

library(microbenchmark, quietly = TRUE)
# define check function
my_check <- function(values) {
  all(sapply(values[-1], function(x) identical(values[[1]], x)))
}
x <- 1001L:2000L
microbenchmark(
  sapply(x, digitsum1),
  sapply(x, digitsum2),
  sapply(x, digitsum3),
  sapply(x, digitsum4),
  times = 100L, check = my_check
)

Benchmark results:

#> Unit: milliseconds
#>                  expr   min    lq  mean median    uq   max neval
#>  sapply(x, digitsum1)  3.41  3.59  3.86   3.68  3.89  5.49   100
#>  sapply(x, digitsum2)  3.00  3.19  3.41   3.25  3.34  4.83   100
#>  sapply(x, digitsum3) 15.07 15.85 16.59  16.22 17.09 24.89   100
#>  sapply(x, digitsum4)  9.76 10.29 11.18  10.56 11.48 45.20   100

Variant 2 is slightly faster than variant 1 (median 3.25 ms vs. 3.68 ms), while variants 4 and 3 are much slower (medians of 10.56 ms and 16.22 ms). Although the code of variant 4 looks similar to variant 2, variant 4 is less efficient (but still better than variant 3).
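The agreement check that `my_check` performs inside `microbenchmark` can also be tried on its own in base R. The sketch below repeats the definitions of variants 1, 2 and 4 and compares their results with `identical`; variant 3 is left out only because it needs the GLDEX package. The names `results` and `all_equal` are introduced here for illustration.

```r
# Variants 1, 2 and 4 as defined above (variant 3 omitted: needs GLDEX).
digitsum1 <- function(x) sum(as.numeric(unlist(strsplit(as.character(x), split = ""))))
digitsum2 <- function(x) sum(floor(x / 10^(0:(nchar(x) - 1))) %% 10)
digitsum4 <- function(x) sum(x %/% 10^seq(0, length.out = nchar(x)) %% 10)

x <- 1001L:2000L

# Collect the per-number digit sums from each variant.
results <- list(
  sapply(x, digitsum1),
  sapply(x, digitsum2),
  sapply(x, digitsum4)
)

# Same logic as my_check: every result must be identical to the first one.
all_equal <- all(sapply(results[-1], function(r) identical(results[[1]], r)))
all_equal
```

Because `identical` also compares types, this check would flag a variant that returned integers where the others return doubles, even if the values matched.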

Full benchmark results (including graphs) are on GitHub.


I'm not sure why you would think there would be an inbuilt function to do that. It's not really a statistical operation; it's more of a number theory sort of procedure. (There are many examples that can be found with a search of the R-help archives. I use Markmail for that purpose, but there are other search engines like RSeek, GMane, and the Newcastle webpage.) Your function would take a series of numbers and return a single number that is the digit sum of all of them. If that is the goal, then it looks reasonably designed. I would have guessed that one would want the digit sum of each number:

sapply( c(1,2,123), 
        function(x) sum( as.numeric(unlist(strsplit(as.character(x), split=""))) ))
[1] 1 2 6
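The same per-number computation could also be written with `vapply`, which states the expected return type up front and fails loudly if a result is not a single number. This is a sketch only; the wrapper name `digit_sums` is hypothetical.

```r
# A sketch of the per-number digit sum using vapply instead of sapply.
# `digit_sums` is a hypothetical helper name, not from the original answer.
digit_sums <- function(v) {
  vapply(
    v,
    function(x) sum(as.numeric(strsplit(as.character(x), split = "")[[1]])),
    numeric(1)   # each element must yield exactly one numeric value
  )
}

digit_sums(c(1, 2, 123))
```

Like the `sapply` version, this relies on `as.character`, so it assumes inputs that print as plain digit strings (no sign, decimal point, or exponent).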

There is a "digitizing" function, digitsBase, in pkg:GLDEX, and you could replace your as.numeric(unlist(strsplit(as.character(x),""))) with that function:

digitsBase(x, 10)

Tags:

R