data.table alternative for dplyr case_when

1) If the conditions are mutually exclusive with a default if all conditions are false then this works:

library(data.table)
DT <- as.data.table(df) # df is from question

DT[, new := c("e", "c", "d")[1 +
                             1 * (a == "a" & b == "b") + 
                             2 * (a == "b" & b == "a")]
]

giving:

> DT
   a b new
1: a b   c
2: b a   d
3: a a   e

2) If the results of the conditions are numeric then it is even easier. For example suppose instead of c and d we want 10 and 17 with a default of 3. Then:

library(data.table)
DT <- as.data.table(df) # df is from question

DT[, new := 3 + 
            (10 - 3) * (a == "a" & b == "b") + 
            (17 - 3) * (a == "b" & b == "a")]

3) Note that adding a 1-liner is sufficient to implement this. It assumes that there is at least one TRUE leg for each row.

when <- function(...) names(match.call()[-1])[apply(cbind(...), 1, which.max)]

# test
DT[, new := when(c = a == 'a' & b == 'b', 
                 d = a == 'b' & b == 'a', 
                 e = TRUE)]

This is not really an answer, but a bit too long for a comment. If deemed inappropriate I'm happy to remove the post.

There exists an interesting post on RStudio Community that discusses options to use dplyr::case_when without the usual tidyverse dependencies.

To summarise, three alternatives seem to exist:

  1. Stefan Fleck isolated case_when from dplyr and build a new package lest that depends only on base.
  2. yonicd developed noplyr, which "provides basic dplyr and tidyr functionality without the tidyverse dependencies".
  3. Bob Rudis (hrbrmstr) is the creator of freebase, a "A 'usethis'-like Package for Base R Pseudo-equivalents of 'tidyverse' Code", which might also be worth checking out.

If it is only case_when that you're after, I imagine lest might be an attractive & minimal option in combination with data.table.


Update [29 October 2019]

Tyson Barrett recently made the package tidyfast available (currently as version 0.1.0) on GitHub, which provides function "dt_case_when for dplyr::case_when() syntax with the speed of data.table::fifelse()".

Update [25 February 2020]

There is also dtplyr, authored by Lionel Henry and maintained by Hadley Wickham, which "provides a data.table backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code.".


FYI, a more recent answer for those coming across this post 2019. data.table versions above 1.13.0 have the fcase function that can be used. Note that it is not a drop-in replacement for dplyr::case_when as the syntax is different, but will be a "native" data.table way of calculation.

# Lazy evaluation
x = 1:10
data.table::fcase(
    x < 5L, 1L,
    x >= 5L, 3L,
    x == 5L, stop("provided value is an unexpected one!")
)
# [1] 1 1 1 1 3 3 3 3 3 3

dplyr::case_when(
    x < 5L ~ 1L,
    x >= 5L ~ 3L,
    x == 5L ~ stop("provided value is an unexpected one!")
)
# Error in eval_tidy(pair$rhs, env = default_env) :
#  provided value is an unexpected one!

# Benchmark
x = sample(1:100, 3e7, replace = TRUE) # 114 MB
microbenchmark::microbenchmark(
dplyr::case_when(
  x < 10L ~ 0L,
  x < 20L ~ 10L,
  x < 30L ~ 20L,
  x < 40L ~ 30L,
  x < 50L ~ 40L,
  x < 60L ~ 50L,
  x > 60L ~ 60L
),
data.table::fcase(
  x < 10L, 0L,
  x < 20L, 10L,
  x < 30L, 20L,
  x < 40L, 30L,
  x < 50L, 40L,
  x < 60L, 50L,
  x > 60L, 60L
),
times = 5L,
unit = "s")
# Unit: seconds
#               expr   min    lq  mean   median    uq    max neval
# dplyr::case_when   11.57 11.71 12.22    11.82 12.00  14.02     5
# data.table::fcase   1.49  1.55  1.67     1.71  1.73   1.86     5

Source, data.table NEWS for 1.13.0, released (24 Jul 2020).