How to select all factor variables in R

This (almost) seems like the perfect time to use the seldom-used function rapply:

rapply(insurance, classes = "factor", f = levels, how = "list")

Or

Filter(Negate(is.null), rapply(insurance, classes = "factor", f = levels, how = "list"))

to drop the NULL elements returned for the columns that aren't factors.
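A self-contained sketch of the rapply approach, assuming a small made-up data frame. With how = "list", rapply keeps one element per column and replaces every non-factor column with deflt (NULL by default), which is why the Filter() step is needed:

```r
# Hypothetical example data: one integer, one character and two factor columns
insurance <- data.frame(
  int   = 1:3,
  chr   = c("x", "y", "z"),
  fact1 = factor(c("a", "b", "a")),
  fact2 = factor(c("low", "high", "low")),
  stringsAsFactors = FALSE
)

# how = "list" keeps one element per column; non-factor columns become NULL
all_levels <- rapply(insurance, f = levels, classes = "factor", how = "list")
length(all_levels)  # 4

# Drop the NULL placeholders, keeping only the factor columns' levels
factor_levels <- Filter(Negate(is.null), all_levels)
factor_levels
# $fact1 is c("a", "b"); $fact2 is c("high", "low") (levels sort alphabetically)
```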

Or simply

lapply(Filter(is.factor, insurance), levels)
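Filter() subsets its input with [, so on a data frame Filter(is.factor, insurance) is itself a data frame holding only the factor columns, and lapply() then walks over those columns. A minimal sketch with made-up data:

```r
# Hypothetical example data
insurance <- data.frame(
  age    = c(25, 40, 33),
  sex    = factor(c("m", "f", "f")),
  region = factor(c("north", "south", "north"))
)

# Keep only the factor columns; the result is still a data.frame
factor_cols <- Filter(is.factor, insurance)
str(factor_cols)

# One character vector of levels per factor column
lapply(factor_cols, levels)
```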

With dplyr, select_if() can pick out the factor columns:

library(dplyr)
insurance %>% select_if(is.factor)
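Testing with is.factor() rather than comparing class(.) with 'factor' also covers ordered factors, whose class attribute has two entries, so the comparison does not yield a single TRUE. A base R sketch with two made-up vectors:

```r
x_plain   <- factor(c("a", "b"))
x_ordered <- factor(c("low", "high"), levels = c("low", "high"), ordered = TRUE)

class(x_plain)                # "factor"
class(x_ordered)              # "ordered" "factor"

# Comparing the class gives one result per class entry, not a single TRUE
class(x_ordered) == "factor"  # FALSE TRUE

# is.factor() and inherits() return a single TRUE for both kinds
is.factor(x_ordered)          # TRUE
inherits(x_ordered, "factor") # TRUE
```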

I would suggest using dplyr and purrr here. First select the factor columns, then use purrr::map() to get the levels of each column.

library(tidyverse)

insurance %>%
  select(where(is.factor)) %>%
  map(levels)

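Some example data (since R 4.0.0, data.frame() keeps character vectors as character unless stringsAsFactors = TRUE):

insurance <- data.frame(
  int   = 1:5,
  fact1 = letters[1:5],
  fact2 = factor(1:5),
  fact3 = LETTERS[3:7],
  stringsAsFactors = TRUE
)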
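If a data frame already holds character columns, they can also be converted to factors afterwards. A minimal base R sketch, assuming every character column should become a factor (the data are made up):

```r
df <- data.frame(
  id   = 1:3,
  name = c("b", "a", "c"),
  stringsAsFactors = FALSE
)

# Find the character columns and replace them with factor versions
chr_cols <- vapply(df, is.character, logical(1))
df[chr_cols] <- lapply(df[chr_cols], factor)

class(df$name)   # "factor"
levels(df$name)  # "a" "b" "c"
```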

I would use sapply like you did, but combined with is.factor to return a logical vector:

is.fact <- sapply(insurance, is.factor)
#   int fact1 fact2 fact3 
# FALSE  TRUE  TRUE  TRUE
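A variant of this step uses vapply(), which declares the expected result type, so you always get a logical vector, even for a data frame with no columns (where sapply() returns an empty list instead). A sketch with made-up data:

```r
df <- data.frame(
  int   = 1:3,
  fact1 = factor(c("a", "b", "c")),
  chr   = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per column, guaranteed to be logical
is.fact <- vapply(df, is.factor, logical(1))
is.fact

# Names of the factor columns
names(df)[is.fact]

# On a data frame with no columns this is still a (zero-length) logical vector
vapply(data.frame(), is.factor, logical(1))
```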

Then use [ to extract these columns:

factors.df <- insurance[, is.fact, drop = FALSE]
#   fact1 fact2 fact3
# 1     a     1     C
# 2     b     2     D
# 3     c     3     E
# 4     d     4     F
# 5     e     5     G
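When exactly one column is a factor, insurance[, is.fact] without drop = FALSE simplifies the result to a bare factor vector, and lapply() over that vector would iterate over its individual values rather than over columns. A sketch with made-up data:

```r
df <- data.frame(
  num  = c(1.5, 2.5),
  fact = factor(c("a", "b"))
)
is.fact <- vapply(df, is.factor, logical(1))

df[, is.fact]                # simplified to a factor vector
df[, is.fact, drop = FALSE]  # still a one-column data frame

# Levels per column, as intended
lapply(df[, is.fact, drop = FALSE], levels)
```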

Finally, to get the levels, use lapply:

lapply(factors.df, levels)
# $fact1
# [1] "a" "b" "c" "d" "e"
# 
# $fact2
# [1] "1" "2" "3" "4" "5"
# 
# $fact3
# [1] "C" "D" "E" "F" "G"
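If you only need how many levels each factor column has, the same pattern works with nlevels(), and sapply() simplifies the result to a named integer vector. A sketch using made-up factor columns:

```r
demo_factors <- data.frame(
  fact1 = factor(c("a", "b", "c")),
  fact2 = factor(c("x", "x", "y"))
)

# Number of levels per column
sapply(demo_factors, nlevels)
```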

str(insurance) is also useful as a compact summary: it shows each column's type and, for factors, the number of levels.
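A small sketch of what str() prints for a factor column, using made-up data:

```r
df <- data.frame(
  age = c(25, 40, 33),
  sex = factor(c("m", "f", "f"))
)

str(df)
# includes a line like:  $ sex: Factor w/ 2 levels "f","m": 2 1 1
```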
