Translating dplyr to data.table

Might I recommend the rowid function? It does the grouping step "under the hood", and you might find it looks cleaner:

unique(DT, by='mpg')[order(am, mpg), row_num := LETTERS[rowid(am)]]

If you love chaining, you could also get everything inside []:

DT[ , .SD[1L], by = mpg
   ][order(am, mpg), row_num := LETTERS[rowid(am)]]
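As a rough check that rowid(am) really stands in for an explicit grouping step, here is a minimal sketch comparing it with a grouped seq_len(.N). It assumes DT is simply mtcars converted with as.data.table(), which the snippets above do not show; the names a and b are only for illustration.

```r
library(data.table)

# Assumption: DT is mtcars as a data.table (not shown in the original snippets)
DT <- as.data.table(mtcars)

# Approach 1: rowid(am) numbers rows within each am value, so no `by` is needed
a <- unique(DT, by = "mpg")[order(am, mpg), row_num := LETTERS[rowid(am)]]

# Approach 2: sort first, then count rows within each am group explicitly
b <- unique(DT, by = "mpg")[order(am, mpg)][, row_num := LETTERS[seq_len(.N)], by = am]

# Approach 1 assigns in place without reordering rows, so sort both before comparing
setorder(a, am, mpg)
setorder(b, am, mpg)
identical(a$row_num, b$row_num)
```

Note that using order() in i together with := leaves the rows in their original positions; only the assigned values follow the sorted order.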

I'm experimenting with some tweaks to the translation so that dtplyr will automatically produce something more like what you want:

library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

dt <- lazy_dt(mtcars)

dt %>% 
  distinct(mpg, .keep_all = TRUE) %>% 
  group_by(am) %>% 
  arrange(mpg, .by_group = TRUE) %>% 
  mutate(row_num = LETTERS[row_number()]) %>% 
  ungroup() %>% 
  show_query()
#> unique(`_DT1`, by = "mpg")[order(am, mpg)][, `:=`(row_num = ..LETTERS[seq_len(.N)]), 
#>    keyby = .(am)]

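Or avoiding the grouping, as @MichaelChirico suggests (though note that row_number(am) translates to frank(), which ranks across the whole table instead of restarting within each am group the way rowid(am) does):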

dt %>% 
  distinct(mpg, .keep_all = TRUE) %>% 
  arrange(am, mpg) %>% 
  mutate(row_num = LETTERS[row_number(am)]) %>% 
  ungroup() %>% 
  show_query()
#> unique(`_DT1`, by = "mpg")[order(am, mpg)][, `:=`(row_num =  ..LETTERS[frank(am, 
#>    ties.method = "first", na.last = "keep")])]

(Using the .. in front of LETTERS is a data.table feature that makes it clear that you're referring to a variable outside of the data frame; it's probably not necessary here but I think it's better to be safe than sorry.)
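To see what the ungrouped translation computes compared with rowid(), here is a small sketch on a made-up, already sorted vector (the values in am below are hypothetical, not taken from mtcars):

```r
library(data.table)

# Hypothetical grouping values, already sorted as they would be after order(am, mpg)
am <- c(0, 0, 0, 1, 1)

# frank() with ties.method = "first" ranks across the whole vector
r_frank <- frank(am, ties.method = "first")
r_frank
#> [1] 1 2 3 4 5

# rowid() restarts its counter for each distinct value
r_rowid <- rowid(am)
r_rowid
#> [1] 1 2 3 1 2
```

So the two only agree within the first group; if the lettering should restart for each am value, rowid(am) or a grouped seq_len(.N) is the one to use.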


We can use seq_len(.N), grouped by am:

unique(DT, by = "mpg")[order(am, mpg)
  ][, row_num := LETTERS[seq_len(.N)], by = am][]