data.table operations by column name

This is similar to :

How to generate a linear combination of variables and update table using data.table in a loop call?

but you want to combine this with by= too, so set() isn't flexible enough. That's a deliberate design design and set() is unlikely to change in that regard.

I sometimes use the EVAL helper at the end of that answer.
https://stackoverflow.com/a/20808573/403310
Some wince at that approach but I just think of it like constructing a dynamic SQL statement, which is quite common practice. The EVAL approach gives ultimate flexibility without head scratching about eval() and quote(). To see the dynamic query that's been constructed (to check it) you can add a print inside your EVAL helper function.

However, in this simple example you can wrap the LHS of := with brackets to tell data.table to lookup the value (clearer than with=FALSE), and the RHS needs a get().

for (n in c("a","b")) {
  s <- paste0("s",n)
  a[, (s) := sum(get(n)), by="id"]
}

Edit 2020-02-15 about ..

data.table also supports the .. syntax to "look up a level", obviating the need for with=FALSE in most cases, e.g. dt[ , ..n1] and dt[ , ..n2] in the below


have a look at with in ? data.table:

dt <- data.table(id=1:5,a=21:25,b=11:15,key="id")
dt[, n3 := dt[ , n1, with = FALSE ] * dt[ , n2, with = FALSE ], with = FALSE ]

EDIT:

Or you just change the colnames forth and back:

dt <- data.table(id=1:5,a=21:25,b=11:15,key="id")
dt[ , dt.names["n3"] := 1L, with = FALSE ]

dt.names <- c( n1 = "a", n2 = "b", n3 = "c" )
setnames( dt, dt.names, names(dt.names) )

dt[ , n3 := n1 * n2, by = "id" ]
setnames( dt, names(dt.names), dt.names )

which works together with by.


You can also do this:

a <- data.table(id=c(1,1,2,2,3),a=21:25,b=11:15,key="id")

a[, c("sa", "sb") := lapply(.SD, sum), by = id]

Or slightly more generally:

cols.to.sum = c("a", "b")
a[, paste0("s", cols.to.sum) := lapply(.SD, sum), by = id, .SDcols = cols.to.sum]

Tags:

R

Data.Table