Collapsing rows by user with dplyr

I would convert your data to long form first, do the aggregation there, and then convert back to wide form if you need that for display.

So, using tidyr,

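library(dplyr)
library(tidyr)

df %>% gather(rating, count, -User) %>%   # wide to long: one row per User/rating
  group_by(User, rating) %>%
  summarise(count = max(count)) %>%       # collapse the rows within each pair
  spread(rating, count)                   # long back to wide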

The first step, gather, converts the data to long form (the column names use p instead of +):

> df <- read.table(header=TRUE, text='User  p1  p2  p3  p4  p5
   A   1   0   0   0   0
   A   0   1   0   0   0
   A   0   0   0   0   1
   B   0   0   1   0   0 
   B   0   0   0   1   0
')
> df %>% gather(rating, count, -User)
   User rating count
1     A     p1     1
2     A     p1     0
3     A     p1     0
4     B     p1     0
5     B     p1     0
6     A     p2     0
...

And the remaining steps perform the aggregation, then transform back to wide format.
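
For what it's worth, gather and spread still work but are superseded in tidyr 1.0.0 and later. Here is a minimal sketch of the same pipeline written with pivot_longer and pivot_wider, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the .groups argument). The name result is just for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as above
df <- read.table(header = TRUE, text = "User p1 p2 p3 p4 p5
A 1 0 0 0 0
A 0 1 0 0 0
A 0 0 0 0 1
B 0 0 1 0 0
B 0 0 0 1 0")

result <- df %>%
  # wide to long: one row per User/rating pair
  pivot_longer(-User, names_to = "rating", values_to = "count") %>%
  group_by(User, rating) %>%
  # keep the largest count seen for each pair
  summarise(count = max(count), .groups = "drop") %>%
  # long back to wide: one column per rating
  pivot_wider(names_from = rating, values_from = count)

print(result)
```

This should give one row per user, with a 1 in every rating column that had a 1 in any of that user's original rows.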

Looks like you can use summarise_all:

df %>% group_by(User) %>% summarise_all(sum)

Edit note: replaced summarise_each, which is now deprecated, with summarise_all
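
In dplyr 1.0.0 and later, the scoped verbs such as summarise_all are themselves superseded by across(). A minimal sketch of the equivalent call, assuming dplyr >= 1.0.0 (the name collapsed is only for illustration):

```r
library(dplyr)

# Same sample data as in the first answer
df <- read.table(header = TRUE, text = "User p1 p2 p3 p4 p5
A 1 0 0 0 0
A 0 1 0 0 0
A 0 0 0 0 1
B 0 0 1 0 0
B 0 0 0 1 0")

collapsed <- df %>%
  group_by(User) %>%
  # everything() skips the grouping column, so only p1..p5 are summed
  summarise(across(everything(), sum))

print(collapsed)
```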

Here's an alternative dplyr solution

df %>% group_by(User) %>% do(as.data.frame(as.list(colSums(.[-1]))))

Or a possible data.table implementation

library(data.table)
setDT(df)[, lapply(.SD, sum), User]

setDT(df)[, as.list(colSums(.SD)), User]

Or with base R, even simpler

aggregate(. ~ User, df, sum)
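
One thing to watch: the first answer collapses with max while the others use sum. On the sample data the two agree, but they would differ if a user had the same rating in more than one row. A small sketch to illustrate, where the extra row in dup is hypothetical and not from the question:

```r
df <- read.table(header = TRUE, text = "User p1 p2 p3 p4 p5
A 1 0 0 0 0
A 0 1 0 0 0
A 0 0 0 0 1
B 0 0 1 0 0
B 0 0 0 1 0")

# On the sample data, sum and max produce the same values
by_sum <- aggregate(. ~ User, df, sum)
by_max <- aggregate(. ~ User, df, max)

# Hypothetical extra row: user A has p1 set a second time
dup <- rbind(df, data.frame(User = "A", p1 = 1, p2 = 0, p3 = 0, p4 = 0, p5 = 0))

dup_sum <- aggregate(. ~ User, dup, sum)  # counts how often each rating occurs
dup_max <- aggregate(. ~ User, dup, max)  # flags whether each rating occurs at all

print(dup_sum)
print(dup_max)
```

So pick sum if you want counts, and max if you only want a 0/1 indicator per user.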
