There is pmin and pmax each taking na.rm, why no psum?

Following @JoshUlrich's comment on the previous question,

psum <- function(...,na.rm=FALSE) { 
    rowSums(do.call(cbind,list(...)),na.rm=na.rm)
}

edit: from Sven Hohenstein:

psum2 <- function(...,na.rm=FALSE) { 
    dat <- do.call(cbind,list(...))
    res <- rowSums(dat, na.rm=na.rm) 
    # TRUE for rows in which every element is NA
    idx_na <- !rowSums(!is.na(dat))
    # return NA rather than 0 for those rows
    res[idx_na] <- NA
    res 
}

x = c(1,3,NA,5,NA)
y = c(2,NA,4,1,NA)
z = c(1,2,3,4,NA)

psum(x,y,na.rm=TRUE)
## [1] 3 3 4 6 0
psum2(x,y,na.rm=TRUE)
## [1] 3 3 4 6 NA

n = 1e7
x = sample(c(1:10,NA),n,replace=TRUE)
y = sample(c(1:10,NA),n,replace=TRUE)
z = sample(c(1:10,NA),n,replace=TRUE)

library(rbenchmark)
benchmark(psum(x,y,z,na.rm=TRUE),
          psum2(x,y,z,na.rm=TRUE),
          pmin(x,y,z,na.rm=TRUE), 
          pmax(x,y,z,na.rm=TRUE), replications=20)

##                          test replications elapsed relative 
## 4  pmax(x, y, z, na.rm = TRUE)           20  26.114    1.019 
## 3  pmin(x, y, z, na.rm = TRUE)           20  25.632    1.000 
## 2 psum2(x, y, z, na.rm = TRUE)           20 164.476    6.417
## 1  psum(x, y, z, na.rm = TRUE)           20  63.719    2.486

Sven's version (arguably the correct one, since it returns NA rather than 0 when every input is NA) is about 2.6 times slower than psum here (164 s vs. 64 s over 20 replications), although whether that matters depends on the application. Anyone want to hack up an inline/Rcpp version?
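Short of an Rcpp version, one pure-R option is to skip the cbind/rowSums matrix and accumulate the vectors directly. A minimal sketch, assuming all inputs are numeric vectors of equal length; the name psum3 is made up for illustration, and its speed has not been measured against the timings above.

```r
# Hypothetical psum3: same NA handling as psum2, but without building a matrix.
# Assumes every argument is a numeric vector of the same length.
psum3 <- function(..., na.rm = FALSE) {
  args <- list(...)
  if (!na.rm) {
    # plain elementwise sum; any NA propagates
    return(Reduce(`+`, args))
  }
  # TRUE where every input is NA at that position
  all_na <- Reduce(`&`, lapply(args, is.na))
  # treat NA as 0 while summing
  zeroed <- lapply(args, function(v) {
    v[is.na(v)] <- 0
    v
  })
  res <- Reduce(`+`, zeroed)
  # restore NA where nothing was observed
  res[all_na] <- NA
  res
}

x <- c(1, 3, NA, 5, NA)
y <- c(2, NA, 4, 1, NA)
psum3(x, y, na.rm = TRUE)  # same values as psum2: 3 3 4 6 NA
```

Whether this actually beats rowSums would need to be benchmarked on the data at hand.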

As for why this doesn't exist: don't know, but good luck getting R-core to make additions like this ... I can't offhand think of a sufficiently widespread *misc package into which this could go ...

Matthew's follow-up thread on r-devel, which seems to confirm this, is here:
r-devel: There is pmin and pmax each taking na.rm, how about psum?


A quick search on CRAN turns up at least three packages with a psum function: rccmisc, incadata and kit. kit is the fastest in the benchmark below (about 5.7 times faster than the base-R psum), which reuses Ben Bolker's example, with x, y, z, psum and psum2 as defined above.

benchmark(
  rccmisc::psum(x,y,z,na.rm=TRUE),
  incadata::psum(x,y,z,na.rm=TRUE),
  kit::psum(x,y,z,na.rm=TRUE), 
  psum(x,y,z,na.rm=TRUE),
  psum2(x,y,z,na.rm=TRUE),
  replications=20
)
#                                    test replications elapsed relative
# 2 incadata::psum(x, y, z, na.rm = TRUE)           20   20.05   14.220
# 3      kit::psum(x, y, z, na.rm = TRUE)           20    1.41    1.000
# 4           psum(x, y, z, na.rm = TRUE)           20    8.04    5.702
# 5          psum2(x, y, z, na.rm = TRUE)           20   20.44   14.496
# 1  rccmisc::psum(x, y, z, na.rm = TRUE)           20   23.24   16.482
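Speed aside, it is worth checking how a packaged psum treats positions where every input is NA, since psum and psum2 above disagree on exactly that case. A small sketch; all_na_row_result is a hypothetical helper, psum is repeated only so the block runs on its own, and the kit call is skipped if the package is not installed. No output is claimed for kit here.

```r
# Hypothetical helper: what does a psum-like function return
# at a position where all inputs are NA?
all_na_row_result <- function(f) {
  x <- c(1, NA)
  y <- c(2, NA)
  f(x, y, na.rm = TRUE)[2]
}

# Base-R psum from the earlier answer
psum <- function(..., na.rm = FALSE) {
  rowSums(do.call(cbind, list(...)), na.rm = na.rm)
}

all_na_row_result(psum)  # 0, as in the psum output shown earlier

# Run the same check against kit only if it is available
if (requireNamespace("kit", quietly = TRUE)) {
  print(all_na_row_result(kit::psum))
}
```

If the result differs from what the application expects, a thin wrapper that resets all-NA positions (as psum2 does) is one way to reconcile them.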
