How to preserve base data frame rownames upon filtering in dplyr chain

For gene counts, you often want to keep a gene if at least x samples have more than y counts, rather than requiring the threshold in every sample.

This is not as pretty as filter_if, but I'm not sure how you would express the same rowSums condition using all_vars.

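    library(dplyr)
    library(tibble)

    x <- sample_threshold  # minimum number of samples that must pass
    y <- count_threshold   # a sample passes when its count is greater than y

    df %>%
        tibble::rownames_to_column('gene') %>%
        # keep genes where at least x samples have more than y counts
        dplyr::filter(rowSums(dplyr::select(., -gene) > y) >= x) %>%
        tibble::column_to_rownames('gene')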
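
To make the thresholds concrete, here is a minimal sketch on a made-up count table. The gene names, sample names, and the values chosen for x and y are all hypothetical, and the comparison uses `>= x` so that "at least x samples" is taken literally.

```r
library(dplyr)
library(tibble)

# Hypothetical toy counts: genes as row names, samples as columns
df <- data.frame(
  s1 = c(0, 12, 5),
  s2 = c(1, 15, 20),
  s3 = c(0, 9, 30),
  row.names = c("geneA", "geneB", "geneC")
)

x <- 2  # assumed sample threshold
y <- 8  # assumed count threshold

kept <- df %>%
  rownames_to_column("gene") %>%
  # count, per gene, how many samples exceed y and require at least x of them
  filter(rowSums(select(., -gene) > y) >= x) %>%
  column_to_rownames("gene")

rownames(kept)
# [1] "geneB" "geneC"
```

geneA never exceeds 8, geneB does so in three samples, and geneC in two, so only the last two survive and their row names are restored.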

Here is another base R method, using Reduce:

df[Reduce(`&`, lapply(df, `>=`, 8)),]
#       BoneMarrow Pulmonary
#ATP1B1         30      3380
#PRR11        2703        27

You can convert the row names to a column and convert them back after filtering:

library(dplyr)
library(tibble)  # for `rownames_to_column` and `column_to_rownames`

df %>%
    rownames_to_column('gene') %>%
    filter_if(is.numeric, all_vars(. >= 8)) %>%
    column_to_rownames('gene')

#        BoneMarrow Pulmonary
# ATP1B1         30      3380
# PRR11        2703        27
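
In recent dplyr releases the scoped verbs such as filter_if are superseded, and `if_all()` (available from dplyr 1.0.4, as far as I know) covers the same case. A sketch of the same pipeline under that assumption follows; the ATP1B1 and PRR11 values come from the output above, while the GENE_X and GENE_Y rows are invented for illustration.

```r
library(dplyr)
library(tibble)

# ATP1B1 and PRR11 values are from the thread; GENE_X and GENE_Y are hypothetical
df <- data.frame(
  BoneMarrow = c(30, 0, 2703, 4),
  Pulmonary  = c(3380, 16, 27, 0),
  row.names  = c("ATP1B1", "GENE_X", "PRR11", "GENE_Y")
)

res <- df %>%
  rownames_to_column("gene") %>%
  # keep rows where every numeric column is at least 8
  filter(if_all(where(is.numeric), ~ .x >= 8)) %>%
  column_to_rownames("gene")

rownames(res)
# [1] "ATP1B1" "PRR11"
```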

Alternatively, use a Boolean comparison in base R:

df[rowSums(df >= 8) == ncol(df), ]

#        BoneMarrow Pulmonary
# ATP1B1         30      3380
# PRR11        2703        27

EDIT1: Alternatively, df[!rowSums(df < 8), ] (as suggested by @user20650) gives the same result.
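
As a quick check that the base R variants agree, here is a small sketch on a table where one value sits exactly on the threshold. GENE_X and GENE_Y are hypothetical rows added for illustration. Note that a strict `> 8` would differ from `>= 8` only for a value of exactly 8.

```r
# ATP1B1 and PRR11 values are from the thread; GENE_X and GENE_Y are hypothetical
df <- data.frame(
  BoneMarrow = c(30, 8, 2703, 3),
  Pulmonary  = c(3380, 50, 27, 900),
  row.names  = c("ATP1B1", "GENE_X", "PRR11", "GENE_Y")
)

by_reduce   <- df[Reduce(`&`, lapply(df, `>=`, 8)), ]  # AND the per-column tests
by_rowsums  <- df[rowSums(df >= 8) == ncol(df), ]      # every column passes
by_negation <- df[!rowSums(df < 8), ]                  # no column fails

rownames(by_reduce)
# [1] "ATP1B1" "GENE_X" "PRR11"

# All three select the same genes, and base indexing keeps the row names
all(rownames(by_reduce) == rownames(by_rowsums),
    rownames(by_rowsums) == rownames(by_negation))
# [1] TRUE

# The strict comparison drops GENE_X, whose BoneMarrow count is exactly 8
strict <- df[rowSums(df > 8) == ncol(df), ]
rownames(strict)
# [1] "ATP1B1" "PRR11"
```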
