Excluding columns from a dataframe based on column sums

I've just had the same problem. Here's a solution using Tidyverse, in case you have both numeric and non numeric columns :

library(tidyverse)    
set.seed(123)
dat <- data.frame(var1 = runif(10), var2 = rnorm(10), var3 = rlnorm(10), var4 = "notNumeric", var5 =0, var6 = FALSE )

dat %>% 
  select_if(negate(function(col) is.numeric(col) && sum(col) < 15))

What about a simple subset? First, we create a simple data frameL

R> dd = data.frame(x = runif(5), y = 20*runif(5), z=20*runif(5))

Then select the columns where the sum is greater than 15

R> dd1 = dd[,colSums(dd) > 15]
R> ncol(dd1)
[1] 2

In your data set, you only want to subset columns 6 onwards, so something like:

 ##Drop the first five columns
 dd[,colSums(dd[,6:ncol(dd)]) > 15]

or

 #Keep the first six columns
 cols_to_drop = c(rep(TRUE, 5), dd[,6:ncol(dd)]>15)
 dd[,cols_to_drop]

should work.


The key part to note is that in the square brackets, we want a vector of logicals, i.e. a vector of TRUE and FALSE. So if you wanted to subset using something a bit more complicated, then create a function that returns TRUE or FALSE and subset as usual.


You should be able to directly index your data.frame with a boolean and colSums(). For example:

set.seed(123)
dat <- data.frame(var1 = runif(10), var2 = rnorm(10), var3 = rlnorm(10))
colSums(dat)
#-----
     var1      var2      var3 
 5.782475  1.317914 12.91987
#Let's set the threshold at 5, so we should get var1 and var3
> dat[, colSums(dat) > 5]
#-----
        var1      var3
1  0.2875775 5.9709924
2  0.7883051 1.6451811
3  0.4089769 0.1399294
...

EDIT - to address non-numeric columns

set.seed(123)
dat <- data.frame(var1 = runif(10), var2 = rnorm(10), var3 = rlnorm(10), var4 = "notNumeric")

require(plyr)
dat[, -which(numcolwise(sum)(dat) < 5)]

Consolec:/documents and settings/charles/desktop/

> dat[, -which(numcolwise(sum)(dat) < 5)]
        var1      var3       var4
1  0.2875775 5.9709924 notNumeric
2  0.7883051 1.6451811 notNumeric
3  0.4089769 0.1399294 notNumeric
.....

Tags:

R