Most efficient way to calculate function with large number of parameter combinations

@josliber's answer is very good, yet it makes it look as though R is slow and you have to switch to C++ for performance.

There are three tricks implemented in that answer:

  • precompute the vector of thresholds
  • precompute the absolute values of dX_i
  • sort these values so the sum can stop early

The first two tricks use an R idiom called "vectorization": perform your operations (e.g. gamma * a * sigma_hat * delta_j^(1/2) or abs()) on whole vectors instead of on a single element within a loop.

This is exactly what you already do with sum( dX_i^p * vec_boolean ); the * and sum() are vectorized, so that part is already very fast.
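
To make "vectorization" concrete, here is a minimal sketch (not part of the solution) contrasting an element-by-element loop with the equivalent single vectorized call:

abs_loop <- numeric(length(dX_i))
for (k in seq_along(dX_i)) abs_loop[k] <- abs(dX_i[k])  # one element at a time

abs_vec <- abs(dX_i)  # one call on the whole vector

identical(abs_loop, abs_vec)  # TRUE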

If we implement just these two tricks (we can't really do the third one the same way because it breaks vectorization), it gives:

abs_dX_i <- abs(dX_i)
thresh <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))
p <- parameters$p
result3 <- sapply(1:nrow(parameters), function(i) {
  in_sum <- (abs_dX_i < thresh[i])
  sum(abs_dX_i[in_sum]^p[i])
})
all.equal(result, result3) # TRUE

If we benchmark all three solutions:

microbenchmark::microbenchmark(
  OP = {
    result <- sapply(1:nrow(parameters), function(x) {
      tmp <- parameters[x,]
      p <- tmp$p
      a <- tmp$a
      gamma <- tmp$gamma
      sigma_hat <- tmp$sigma_hat
      delta_j <- tmp$delta_j

      B <- sum( (abs(dX_i)^p) * ( abs(dX_i) < gamma * a * sigma_hat * delta_j^(1/2) ))

      return(B)
    })
  },
  RCPP = {
    result2 <- proc(sort(abs(dX_i)), parameters$gamma * parameters$a *
                      parameters$sigma_hat * parameters$delta_j^(1/2), parameters$p)
  },
  R_VEC = {
    abs_dX_i <- abs(dX_i)
    thresh <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))
    p <- parameters$p
    result3 <- sapply(1:nrow(parameters), function(i) {
      in_sum <- (abs_dX_i < thresh[i])
      sum(abs_dX_i[in_sum]^p[i])
    })
  },
  times = 10
)

We get:

Unit: milliseconds
  expr      min       lq      mean   median       uq      max neval
    OP 224.8414 235.4075 289.90096 270.2767 347.1727 399.3262    10
  RCPP  14.8172  15.4691  18.83703  16.3979  20.3829  29.6624    10
 R_VEC  28.3136  29.5964  32.82456  31.4124  33.2542  45.8199    10

This gives a huge speedup from only slightly modifying your original R code. It is less than twice as slow as the Rcpp code and can easily be parallelized as you previously did with parSapply(); a sketch of that is shown below.
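
For completeness, here is a minimal sketch of how the vectorized version could be distributed with parSapply(); the cluster size of 4 is only an example, and abs_dX_i, thresh and p are assumed to exist as computed above:

library(parallel)
cl <- makeCluster(4)
clusterExport(cl, c("abs_dX_i", "thresh", "p"))
result3_par <- parSapply(cl, 1:nrow(parameters), function(i) {
  sum(abs_dX_i[abs_dX_i < thresh[i]]^p[i])  # same body as the sequential sapply()
})
stopCluster(cl)
all.equal(result3, result3_par)  # should be TRUE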


When I want to speed up hard-to-vectorize code, I often turn to Rcpp. At the end of the day you are trying to sum up abs(dX_i)^p, limited to values of abs(dX_i) smaller than the threshold gamma * a * sigma_hat * delta_j^(1/2). You want to do this for a bunch of (p, threshold) pairs. You could accomplish this with:

library(Rcpp)
cppFunction(
"NumericVector proc(NumericVector dX_i, NumericVector thresh, NumericVector p) {
  const int n = thresh.size();
  const int m = dX_i.size();
  NumericVector B(n);
  for (int i=0; i < n; ++i) {
    B[i] = 0;
    for (int j=0; j < m; ++j) {
      if (dX_i[j] < thresh[i]) {
        B[i] += pow(dX_i[j], p[i]);
      } else {
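        // dX_i is sorted ascending, so all remaining values also exceed the threshold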
        break;
      }
    }
  }
  return B;
}"
)
result2 <- proc(sort(abs(dX_i)), parameters$gamma * parameters$a * parameters$sigma_hat * parameters$delta_j^(1/2), parameters$p)
all.equal(result, result2)
# [1] TRUE

Note that my code sorts the absolute values of dX_i so it can stop the calculation as soon as it encounters the first value exceeding the threshold.

On my machine, I see a 20x speedup from 0.158 seconds for your code to 0.007 seconds for the Rcpp code (measured using system.time).


One observation is that you actually have a huge number of repeats of each p value within your parameter set. You can process each p value separately: sort abs(dX_i), compute the cumulative sum of its p-th powers once, and then read off each row's answer at the index that findInterval() returns for that row's threshold.

result4 <- rep(NA, nrow(parameters))
sa_dX_i <- sort(abs(dX_i))
thresh <- parameters$gamma * parameters$a * parameters$sigma_hat * parameters$delta_j^(1/2)
loc <- findInterval(thresh, sa_dX_i)
loc[loc == 0] <- NA  # Handle threshold smaller than everything in dX_i
for (pval in unique(parameters$p)) {
  this.p <- parameters$p == pval
  cs_dX_i_p <- cumsum(sa_dX_i^pval)
  result4[this.p] <- cs_dX_i_p[loc[this.p]]
}
result4[is.na(result4)] <- 0  # Handle threshold smaller than everything in dX_i
all.equal(result, result4)
# [1] TRUE
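
To see why the cumulative-sum lookup reproduces the thresholded sum, here is a tiny worked example with made-up numbers:

x <- c(0.1, 0.2, 0.4)              # stands in for sort(abs(dX_i))
cumsum(x^2)[findInterval(0.3, x)]  # 0.05 = 0.1^2 + 0.2^2, the sum of values below 0.3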

To see this in action, let's scale up the original dataset to what is described in the question (~600k rows of parameters and ~80k values in dX_i):

set.seed(144)
dX_i <- rnorm(80000, 0, 0.0002540362)
p_vec <- seq(0, 1, 0.025)  
gamma_vec <- seq(1, 2, 0.025)     
a_vec <- seq(2, 6, 0.3)
sigma_hat_vec <- c(0.03201636, 0.05771143, 0.07932116, 0.12262327, 0.15074560)
delta_j_vec <- c(0.0000005850109, 0.0000011700217, 0.0000017550326, 0.0000035100651, 0.0000052650977)
parameters <- expand.grid("p" = p_vec, "gamma" = gamma_vec, "a" = a_vec, "sigma_hat" = sigma_hat_vec, "delta_j" = delta_j_vec)
dim(parameters)
# [1] 588350      5
length(unique(parameters$p))
# [1] 41

The speedup is rather dramatic -- this code takes 0.27 seconds on my computer, while the Rcpp code posted in my other answer to this question takes 655 seconds (a 2400x speedup, using pure R!). Clearly this speedup only works if there are relatively few p values (each repeated many times) in the parameters data frame. If each p value is unique, this would likely be much slower than the other approaches proposed.