Select a sequence of columns: `:` works but not `seq`

The lesson I learned is to use `list` instead of `c` in j:

```
DT[ , list(ID, Capacity)]
#---------------------------
     ID Capacity
  1:  1      483
  2:  2      703
  3:  3      924
  4:  4      267
  5:  5      588
 ---
196: 46      761
197: 47      584
198: 48      402
199: 49      416
200: 50      130
```

It lets you skip quoting the column names, and it also moves you toward seeing the j argument as an expression that is evaluated within the environment of the data.table itself.
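A minimal sketch of the difference, assuming a small made-up `DT` (the original data is not shown, so these values are illustrative only). With `list()` each element becomes a column of the result; `.()` is data.table's shorthand for `list()` inside j; `c()` instead concatenates the columns into one plain vector:

```r
library(data.table)

# Illustrative stand-in for DT; the values are made up
DT <- data.table(ID = 1:5, Capacity = c(483L, 703L, 924L, 267L, 588L))

res_list <- DT[, list(ID, Capacity)]  # a data.table with columns ID and Capacity
res_dot  <- DT[, .(ID, Capacity)]     # same result, using the .() alias for list()
res_c    <- DT[, c(ID, Capacity)]     # one integer vector: ID values, then Capacity values

print(class(res_list))  # "data.table" "data.frame"
print(length(res_c))    # 10
```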

To get the columns by number, combine the `mget` function with the `names` function. R names (symbols) are language elements that the interpreter looks up as objects in an environment. Column names of data frames are stored as character values, not as R names, so you need a function that takes a character value and makes the interpreter treat it as a name; `mget` takes a character vector and returns a list of the objects with those names. In the j argument, `[.data.table` treats column names as language objects (symbols) evaluated against the table's columns, rather than as character values the way `[.data.frame` does:

```
DT[ , mget(names(DT)[c(1, 2)])]
     ID Capacity
  1:  1      483
  2:  2      703
  3:  3      924
  4:  4      267
  5:  5      588
 ---
196: 46      761
197: 47      584
198: 48      402
199: 49      416
200: 50      130
```
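The same idea on a small illustrative table (the `Site` column and all values are made up): `names(DT)[c(1, 2)]` is just a character vector, `mget()` turns those strings back into the column objects, and j returns that list as a data.table. `get()` does the same for a single name and returns a plain vector.

```r
library(data.table)

# Illustrative stand-in for DT; the Site column and values are made up
DT <- data.table(ID = 1:5,
                 Capacity = c(483L, 703L, 924L, 267L, 588L),
                 Site = letters[1:5])

cols <- names(DT)[c(1, 2)]          # character vector: "ID" "Capacity"
two_cols <- DT[, mget(cols)]        # list of columns, returned as a 2-column data.table
one_col <- DT[, get(names(DT)[2])]  # a single column, returned as a plain vector

print(two_cols)
print(one_col)
```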

In recent versions of data.table, numbers can be used in j to specify columns, including ranges such as `DT[, 1:2]`. (This syntax does not work in older versions of data.table.)
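A short sketch of numeric column selection in j, assuming a recent data.table and a made-up three-column `DT`. Note that a single number still returns a data.table, not a vector:

```r
library(data.table)

# Illustrative stand-in for DT; values are made up
DT <- data.table(ID = 1:5,
                 Capacity = c(483L, 703L, 924L, 267L, 588L),
                 Site = letters[1:5])

range_cols <- DT[, 1:2]     # columns 1 through 2
picked     <- DT[, c(1, 3)] # columns 1 and 3
single     <- DT[, 2]       # one column, still returned as a data.table

print(range_cols)
print(picked)
print(single)
```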

So why does `DT[ , 1:2]` work, but `DT[ , seq(1:2)]` does not? The answer is buried in the code for `data.table:::"[.data.table"`, which includes these lines:

```r
  if (!missing(j)) {
    jsub = replace_dot_alias(substitute(j))
    root = if (is.call(jsub)) 
      as.character(jsub[[1L]])[1L]
    else ""
    if (root == ":" || (root %chin% c("-", "!") && is.call(jsub[[2L]]) && 
        jsub[[2L]][[1L]] == "(" && is.call(jsub[[2L]][[2L]]) && 
        jsub[[2L]][[2L]][[1L]] == ":") || (!length(all.vars(jsub)) && 
            root %chin% c("", "c", "paste", "paste0", "-", "!") && 
            missing(by))) {
      with = FALSE
    }
    # ... (remainder of this block omitted)
  }
```

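We can see here that data.table automatically sets `with = FALSE` for you when the top-level function in j is `:`. It also does so when j is `-` or `!` applied to a parenthesized `:` call (such as `-(1:2)`), or when j contains no variables, `by` is missing, and j is either a bare constant or a call to `c`, `paste`, `paste0`, `-` or `!`. There is no such special case for `seq`, so if we want to use the `seq` syntax we have to specify `with = FALSE` ourselves: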
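To see why `1:2` and `seq(1:2)` are treated differently, here is a simplified re-creation of the check quoted above. `would_set_with_false` is a hypothetical helper, not part of data.table: it skips `replace_dot_alias`, assumes `by` is missing, and uses base `%in%` in place of data.table's `%chin%`. It inspects the unevaluated j expression in the same way.

```r
# Hypothetical helper: a simplified sketch of data.table's check, not its real code.
# Assumes `by` is missing and ignores the .() alias handling.
would_set_with_false <- function(j) {
  jsub <- substitute(j)  # capture j unevaluated
  root <- if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
  root == ":" ||
    # -(a:b) or !(a:b): negation of a parenthesized `:` call
    (root %in% c("-", "!") && is.call(jsub[[2L]]) &&
       identical(jsub[[2L]][[1L]], as.name("(")) && is.call(jsub[[2L]][[2L]]) &&
       identical(jsub[[2L]][[2L]][[1L]], as.name(":"))) ||
    # no variables at all, and the root is a constant or one of a few functions
    (!length(all.vars(jsub)) && root %in% c("", "c", "paste", "paste0", "-", "!"))
}

print(would_set_with_false(1:2))       # TRUE: root is `:`
print(would_set_with_false(seq(1:2)))  # FALSE: root is `seq`, not in the list
print(would_set_with_false(-(1:2)))    # TRUE: negated, parenthesized `:`
print(would_set_with_false(c(1, 3)))   # TRUE: `c` with no variables
print(would_set_with_false(list(ID)))  # FALSE: contains the variable ID
```

Because `seq` is not on the list of recognized roots, `seq(1:2)` falls through every branch even though it contains no variables.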

```r
DT[ , seq(1:2), with = FALSE]
```
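A few variations on the same idea, sketched on a made-up table. Once `with = FALSE` is given, any expression that evaluates to column numbers works in j, including ones that use ordinary variables (`n` here is a hypothetical variable). Subsetting `.SD` with a numeric `.SDcols` is another way to pick columns by number:

```r
library(data.table)

# Illustrative stand-in for DT; values are made up
DT <- data.table(ID = 1:5,
                 Capacity = c(483L, 703L, 924L, 267L, 588L),
                 Site = letters[1:5])

a <- DT[, seq(1:2), with = FALSE]        # the form used above
b <- DT[, seq(2, 3), with = FALSE]       # any expression yielding column numbers
n <- 2
first_n <- DT[, seq_len(n), with = FALSE] # variables are fine once with = FALSE
s <- DT[, .SD, .SDcols = seq(1, 2)]      # alternative: numeric .SDcols

print(a)
print(b)
print(first_n)
print(s)
```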
