Combine result from top_n with an "Other" category in dplyr

Instead of top_n, this seems like a good case for the convenience function tally. It uses summarise, sum and arrange under the hood.

Then use factor to create an "Other" category. Use the levels argument to set "Other" as the last level. "Other" will then be placed last in the table (and in any subsequent plot of the result).

If "Country" is a factor in your original data, wrap Country[1:3] in as.character so that the country labels, rather than the underlying integer codes, are combined with "Other".

group_by(df, Country) %>%
  tally(Count, sort = TRUE) %>%
  group_by(Country = factor(c(Country[1:3], rep("Other", n() - 3)),
                            levels = c(Country[1:3], "Other"))) %>%
  tally(n) 

#  Country     n
#   (fctr) (int)
#1     AUS     6
#2     JPN     5
#3     USA     5
#4   Other     7
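To see why the levels argument matters, here is a minimal base R sketch. The country vector is hypothetical (it is not the original df) and is assumed to be already sorted by descending count:

```r
# Hypothetical country factor, assumed to be sorted by descending count
country <- factor(c("AUS", "JPN", "USA", "BRA", "CAN"))

# as.character() keeps the labels; without it, c() would combine integer codes
top3 <- as.character(country[1:3])

# Keep the first three labels and collapse the rest into "Other"
lumped <- c(top3, rep("Other", length(country) - 3))

# Default levels are sorted alphabetically, so "Other" lands before "USA"
default_levels <- levels(factor(lumped))

# Explicit levels preserve the ranking and put "Other" last
ordered_levels <- levels(factor(lumped, levels = c(top3, "Other")))

print(default_levels)
print(ordered_levels)
```

Only the second call keeps "Other" at the end, which is the order a table or plot will then follow.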

We could do this in two steps: first create a sorted data.frame, and then rbind the top three rows with a summary of the last rows:

d <- df %>% group_by(Country) %>% summarise(Count = sum(Count)) %>% arrange(desc(Count))

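# slice(d, 1:3) takes exactly the first three rows; top_n(d, 3) would return extra rows on ties
rbind(slice(d, 1:3),
      slice(d, 4:n()) %>% summarise(Country = "other", Count = sum(Count))
      )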

Output:

  Country Count
   (fctr) (int)
1     AUS     6
2     JPN     5
3     USA     5
4   other     7

Here is an option using data.table. We convert the 'data.frame' to a 'data.table' (setDT(dat1)), get the sum of 'Count' grouped by 'Country', and order the result by descending 'Count'. We then rbind the first three rows with a list holding the label 'Others' and the sum of 'Count' over the remaining rows.

library(data.table)
setDT(dat1)[, list(Count=sum(Count)), Country][order(-Count),
  rbind(.SD[1:3], list(Country='Others', Count=sum(.SD[[2]][4:.N]))) ]
#   Country Count
#1:     AUS     6
#2:     USA     5
#3:     JPN     5
#4:  Others     7

Or using base R:

d1 <- aggregate(. ~ Country, dat1, FUN = sum)
i1 <- order(-d1$Count)
rbind(d1[i1, ][1:3, ], data.frame(Country = 'Others',
      Count = sum(d1$Count[i1][4:nrow(d1)])))

You can use fct_lump from the forcats package:

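library(dplyr)
library(forcats)

dat1 %>%
  # name the grouping column so the result keeps a "Country" header
  group_by(Country = fct_lump(Country, n = 3, w = Count)) %>%
  summarize(Count = sum(Count))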

This should do it. You can also change the "Other" label with the other_level argument of fct_lump.
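As a rough sketch of the other_level argument, assuming forcats is installed and using hypothetical data (the country and count vectors below are made up for illustration, not taken from dat1):

```r
library(forcats)

# Hypothetical data: one row per country with its count
country <- factor(c("AUS", "JPN", "USA", "BRA", "CAN"))
count <- c(6, 5, 5, 4, 3)

# Keep the three countries with the largest weighted counts,
# and label everything else "Others" instead of the default "Other"
lumped <- fct_lump(country, n = 3, w = count, other_level = "Others")

print(levels(lumped))
print(as.character(lumped))
```

The lumped level is moved to the end of the level order, so it should also sort last in a grouped summary.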

Tags:

R

Dplyr