How does one change the levels of a factor column in a data.table?

I would rather go the traditional way and re-assign the factor levels:

> mydt$value # This is what we had originally
[1] A A B B B C
Levels: A B C
> levels(mydt$value) # just checking the levels
[1] "A" "B" "C"
# Meat of the re-assignment
> levels(mydt$value)[levels(mydt$value)=="A"] <- "X"
> levels(mydt$value)[levels(mydt$value)=="B"] <- "Y"
> levels(mydt$value)[levels(mydt$value)=="C"] <- "Z"
> levels(mydt$value)
[1] "X" "Y" "Z"
> mydt # This is what we wanted
   id value
1:  1     X
2:  2     X
3:  3     Y
4:  4     Y
5:  5     Y
6:  6     Z

As you probably noticed, the meat of the re-assignment is very intuitive: it checks for the exact level (use grepl if you need fuzzy matching, regular expressions or the like).
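As a minimal sketch of the grepl variant mentioned above: the factor `f` and its level names are made up for illustration and are not part of the original example. Every level matching the pattern is renamed, and levels that end up identical are merged into one.

```r
# Hypothetical example data: two levels share the prefix "apple".
f <- factor(c("apple_red", "apple_green", "banana", "apple_red"))

# grepl() returns a logical vector over the levels; each matching
# level is renamed to "apple". Duplicate level names are merged.
levels(f)[grepl("^apple", levels(f))] <- "apple"

levels(f)
as.character(f)
```

Because the match is a pattern rather than an exact comparison, it is worth printing `levels(f)` before and after to confirm that only the intended levels were touched.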

The line levels(mydt$value)[levels(mydt$value)=="A"] <- "X" explicitly looks up the level "A" among the levels of the variable under consideration and reassigns it to "X" (and so on for the others). The advantage: you explicitly KNOW which level got which label.

I find renaming levels with levels(mydt$value) <- c("X","Y","Z") very non-intuitive, since it assigns X to whichever level comes first in levels(mydt$value) (by default the sorted order, not the order of appearance in the data), so the order really matters.
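A small illustration of why the order matters with positional assignment. The factor `f` below is hypothetical; its data deliberately starts with "C" so that the order of appearance differs from the order of the levels.

```r
# Hypothetical data: "C" appears first in the data, "A" last.
f <- factor(c("C", "B", "A"))

# factor() sorts the unique values by default, so the levels
# are "A" "B" "C" regardless of the order in the data.
levels(f)

# Positional assignment: 1st level -> "X", 2nd -> "Y", 3rd -> "Z".
levels(f) <- c("X", "Y", "Z")

as.character(f)  # "Z" "Y" "X"
```

The first data value ends up as "Z", not "X", because the replacement follows the positions in `levels(f)` rather than the data.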

PS: If there are too many levels, use looping constructs.
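One way the looping idea could look, as a sketch: the lookup vector `new_names` is a hypothetical helper (names are the old levels, values the new ones), and the factor `f` mirrors the values of the example column.

```r
f <- factor(c("A", "A", "B", "B", "B", "C"))

# Hypothetical mapping: old level (name) -> new level (value).
new_names <- c(A = "X", B = "Y", C = "Z")

# Loop form: one explicit re-assignment per old level.
for (old in names(new_names)) {
  levels(f)[levels(f) == old] <- new_names[[old]]
}
levels(f)  # "X" "Y" "Z"

# Vectorised form of the same idea, applied to a fresh copy.
g <- factor(c("A", "A", "B", "B", "B", "C"))
hit <- levels(g) %in% names(new_names)
levels(g)[hit] <- unname(new_names[levels(g)[hit]])
```

Note that the loop renames sequentially, so a mapping whose new names overlap with old ones (say A to B and B to C) would chain; the vectorised form looks up all old levels at once and avoids that.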


You can still set them the traditional way:

levels(mydt$value) <- c(...)

This should be plenty fast unless mydt is very large, since that traditional syntax copies the entire object. You could also play the un-factoring and refactoring game... but no one likes that game anyway.

To change the levels by reference with no copy of mydt:

setattr(mydt$value,"levels",c(...))

but be sure to assign a valid levels vector (type character, of sufficient length), otherwise you'll end up with an invalid factor (levels<- does some checking as well as copying).
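A sketch of the by-reference route with a manual sanity check in front of it. The table below rebuilds the example data, and `new_levels` plus the `stopifnot()` guard are illustrative additions, not something setattr does for you.

```r
library(data.table)

# Rebuild the example table (values assumed from the printed output).
mydt <- data.table(id = 1:6,
                   value = factor(c("A", "A", "B", "B", "B", "C")))

new_levels <- c("X", "Y", "Z")

# Manual validity check, since setattr() skips the checks levels<- does:
# character type, one entry per existing level, no duplicates.
stopifnot(is.character(new_levels),
          length(new_levels) == nlevels(mydt$value),
          !anyDuplicated(new_levels))

# Modify the levels attribute in place, without copying mydt.
setattr(mydt$value, "levels", new_levels)

levels(mydt$value)
```

Unlike levels<-, this route assigns strictly by position and will not merge duplicate names, which is why the guard rejects duplicates up front.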

Tags:

R

Data.Table