What is the **tidyverse** method for splitting a df by multiple columns?

dplyr 0.8.0 introduced the verb you are looking for: group_split().

From the documentation:

group_split() works like base::split() but

  • it uses the grouping structure from group_by() and therefore is subject to the data mask

  • it does not name the elements of the list based on the grouping as this typically loses information and is confusing.

group_keys() explains the grouping structure, by returning a data frame that has one row per group and one column per grouping variable.

For your example:

library(dplyr)
library(purrr)  # for map()

mtcars %>% 
  select(1:3) %>% 
  mutate(GRP_A = sample(LETTERS[1:2], n(), replace = TRUE),
         GRP_B = sample(1:2, n(), replace = TRUE)) %>% 
  group_split(GRP_A, GRP_B) %>%  # one tibble per GRP_A/GRP_B combination
  map(summary)
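
Since group_split() returns an unnamed list, it can be hard to tell which summary belongs to which group. A minimal sketch of one way to restore names, assuming the keys are simply pasted together (the `set.seed()` call and the names `grouped`, `keys`, `group_names` and `summaries` are illustrative, not part of the original example):

```r
library(dplyr)
library(purrr)

set.seed(42)  # assumption: fixed seed so the random groups are reproducible

grouped <- mtcars %>%
  select(1:3) %>%
  mutate(GRP_A = sample(LETTERS[1:2], n(), replace = TRUE),
         GRP_B = sample(1:2, n(), replace = TRUE)) %>%
  group_by(GRP_A, GRP_B)

# group_keys() gives one row per group, in the same order as group_split()
keys <- group_keys(grouped)
group_names <- paste0(keys$GRP_A, keys$GRP_B)

summaries <- grouped %>%
  group_split() %>%          # uses the existing grouping
  set_names(group_names) %>% # label each tibble with its key combination
  map(summary)               # map() keeps the names

names(summaries)
```

Pasting the keys together is the kind of naming the documentation warns about, because it can lose information (for example when different key combinations paste to the same string), so keeping `keys` alongside the list is the safer option.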

EDIT: this answer is now outdated. See @MartijnVanAttekum's solution above.


According to Hadley, the "tidy" solution seems to be a combination of "mutate + list-cols + purrr".


library(tidyverse) 
library(magrittr) 

# group, nest, create a new col leveraging purrr::map()
mt_summary <- 
    mtcars %>% 
    select(1:3) %>% 
    mutate(GRP_A = sample(LETTERS[1:2],  n(), replace = TRUE), 
           GRP_B = sample(c(1:2), n(), replace = TRUE)) %>% 
    group_by(GRP_A, GRP_B) %>% 
    nest() %>% 
    mutate(SUMMARY = map(data, .f = summary))

# check the structure
mt_summary
#> # A tibble: 4 × 4
#>   GRP_A GRP_B              data     SUMMARY
#>   <chr> <int>            <list>      <list>
#> 1     A     1 <tibble [11 × 3]> <S3: table>
#> 2     B     2  <tibble [9 × 3]> <S3: table>
#> 3     A     2  <tibble [7 × 3]> <S3: table>
#> 4     B     1  <tibble [5 × 3]> <S3: table>

# extract the summaries
extract2(mt_summary, "SUMMARY") %>% 
    set_names(paste0(extract2(mt_summary, "GRP_A"), 
                     extract2(mt_summary, "GRP_B")))
#> $A1
#>       mpg             cyl             disp      
#>  Min.   :10.40   Min.   :4.000   Min.   : 75.7  
#>  1st Qu.:15.25   1st Qu.:4.000   1st Qu.:120.9  
#>  Median :19.20   Median :6.000   Median :167.6  
#>  Mean   :20.43   Mean   :6.182   Mean   :229.0  
#>  3rd Qu.:25.85   3rd Qu.:8.000   3rd Qu.:309.5  
#>  Max.   :30.40   Max.   :8.000   Max.   :460.0  
#> 
#> $B2
#>       mpg             cyl             disp      
#>  Min.   :15.20   Min.   :4.000   Min.   : 78.7  
#>  1st Qu.:17.80   1st Qu.:4.000   1st Qu.:120.3  
#>  Median :19.20   Median :6.000   Median :167.6  
#>  Mean   :20.84   Mean   :6.222   Mean   :225.9  
#>  3rd Qu.:21.50   3rd Qu.:8.000   3rd Qu.:351.0  
#>  Max.   :32.40   Max.   :8.000   Max.   :400.0  
#> 
#> $A2
#>       mpg             cyl             disp      
#>  Min.   :15.20   Min.   :4.000   Min.   : 71.1  
#>  1st Qu.:18.90   1st Qu.:4.000   1st Qu.:114.5  
#>  Median :21.40   Median :6.000   Median :145.0  
#>  Mean   :21.79   Mean   :5.429   Mean   :176.0  
#>  3rd Qu.:22.10   3rd Qu.:6.000   3rd Qu.:241.5  
#>  Max.   :33.90   Max.   :8.000   Max.   :304.0  
#> 
#> $B1
#>       mpg             cyl           disp      
#>  Min.   :10.40   Min.   :4.0   Min.   :140.8  
#>  1st Qu.:13.30   1st Qu.:8.0   1st Qu.:275.8  
#>  Median :14.30   Median :8.0   Median :350.0  
#>  Mean   :15.62   Mean   :7.2   Mean   :319.7  
#>  3rd Qu.:17.30   3rd Qu.:8.0   3rd Qu.:360.0  
#>  Max.   :22.80   Max.   :8.0   Max.   :472.0
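
The extraction step can also be written without magrittr. The sketch below rebuilds the same nested tibble and names the list column with plain `$` access; the `set.seed()` call and the name `named_summaries` are assumptions added for illustration, so the group sizes will differ from the output shown above.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(42)  # assumption: fixed seed so the random groups are reproducible

mt_summary <- mtcars %>%
  select(1:3) %>%
  mutate(GRP_A = sample(LETTERS[1:2], n(), replace = TRUE),
         GRP_B = sample(1:2, n(), replace = TRUE)) %>%
  group_by(GRP_A, GRP_B) %>%
  nest() %>%
  mutate(SUMMARY = map(data, summary))

# name each summary by its group keys, using `$` instead of extract2()
named_summaries <- set_names(
  mt_summary$SUMMARY,
  paste0(mt_summary$GRP_A, mt_summary$GRP_B)
)

names(named_summaries)
```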

Tags: R, Tidyverse, Purrr