R: Remove multiple empty columns of character variables

If you're talking about columns where all values are NA, use remove_empty("cols") from the janitor package.

If you have character vectors where every value is the empty string "", you can first convert those values to NA throughout your data.frame with na_if from the dplyr package:

dat <- data.frame(
  x = c("a", "b", "c"),
  y = c("", "", ""),
  z = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

dat
#>   x y  z
#> 1 a   NA
#> 2 b   NA
#> 3 c   NA

library(dplyr)
library(janitor)

dat %>%
  mutate_all(funs(na_if(., ""))) %>%
  remove_empty("cols")
#>   x
#> 1 a
#> 2 b
#> 3 c

If your empty columns are really empty character columns, something like the following should work. It will need to be modified if your "empty" character columns include, say, spaces.

Sample data:

mydf <- data.frame(
  A = c("a", "b"),
  B = c("y", ""),
  C = c("", ""),
  D = c("", ""),
  E = c("", "z")
)
mydf
#   A B C D E
# 1 a y      
# 2 b       z

Identifying and removing the "empty" columns.

mydf[!sapply(mydf, function(x) all(x == ""))]
#   A B E
# 1 a y  
# 2 b   z

Alternatively, as recommended by @Roland:

> mydf[, colSums(mydf != "") != 0]
  A B E
1 a y  
2 b   z

You can do either of the following:

emptycols <- sapply(df, function (k) all(is.na(k)))
df <- df[!emptycols]

or:

emptycols <- colSums(is.na(df)) == nrow(df)
df <- df[!emptycols]

If by empty you mean they are "", the second approach can be adapted like so:

emptycols <- colSums(df == "") == nrow(df)

I have a similar situation -- I'm working with a large public records database but when I whittle it down to just the date range and category that I need, there are a ton of columns that aren't in use. Some are blank and some are NA.

The selected answer: https://stackoverflow.com/a/17672737/233467 didn't work for me, but this did:

df[!sapply(df, function (x) all(is.na(x) | x == ""))]

R: Remove multiple empty columns of character variables

Tags:

R

Is Empty

Isnullorempty

Related

Recent Posts