How to factorize specific columns in a data.frame in R using apply

apply returns a vector, array, or list of values (see ?apply), not a data frame.

For your problem, you should use lapply instead:

data(iris)
iris[, 2:3] <- lapply(iris[, 2:3], as.factor)
str(iris)

'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : Factor w/ 23 levels "2","2.2","2.3",..: 15 10 12 11 16 19 14 14 9 11 ...
 $ Petal.Length: Factor w/ 43 levels "1","1.1","1.2",..: 5 5 4 6 5 8 5 6 5 6 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Note that this is one place where lapply will be much faster than a for loop. In general a loop and lapply have similar performance, but the [<-.data.frame operation is very slow. With lapply you avoid calling [<-.data.frame in every iteration and replace those calls with a single assignment, which is much faster.
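As a rough sketch of the two variants being compared (the names cols, a and b are only for illustration), the loop below assigns into the data frame once per column, while the lapply version assigns once for all columns. Both should give the same column classes; wrapping each variant in system.time() on a wide data frame is one way to see the difference in speed.

```r
data(iris)
cols <- 2:3  # columns to convert, as in the example above

# Variant 1: for loop, one data.frame assignment per column
a <- iris
for (j in cols) {
  a[, j] <- as.factor(a[, j])
}

# Variant 2: lapply, a single data.frame assignment for all columns
b <- iris
b[, cols] <- lapply(b[, cols], as.factor)

# Compare the resulting column classes
sapply(a, class)
sapply(b, class)
```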


That is because apply() works completely differently. It first carries out the function as.factor in a local environment, collects the results, and then tries to merge them into an array, not a data frame. In your case this array is a matrix. R meets different factors and has no other way to cbind them than to convert them to character first. That character matrix is then used to fill up your data frame.
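A small check on iris (used here only as stand-in data, with res as an illustrative name) makes this visible: the value returned by apply is a character matrix, not a set of factors.

```r
data(iris)

# apply() collects the per-column results into one matrix
res <- apply(iris[, 2:3], 2, as.factor)

is.matrix(res)  # is the result a matrix?
typeof(res)     # storage type of that matrix
dim(res)        # one row per observation, one column per input column
```

Assigning res back with iris[, 2:3] <- res would therefore fill the columns with character values rather than factors.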

You can use lapply for that (see Andrie's answer) or colwise from the plyr package.

library(plyr)
# Df is your data frame, ids the indices (or names) of the columns to convert
Df[, ids] <- colwise(as.factor)(Df[, ids])

Tags: R, Apply, Dataframe