Creating a contingency table using multiple columns in a data frame in R

One way using dplyr would be:

library(dplyr)
df %>% 
  #group by the variable cl
  group_by(cl) %>%
  #sum every column
  summarise(across(everything(), sum)) %>%
  #select the three needed columns
  select(ab, bc, de) %>%
  #transpose the df
  t

Output:

   [,1] [,2] [,3]
ab    1    3    2
bc    2    3    1
de    2    3    1
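The transposed matrix above loses the cl labels: its columns print as [,1], [,2], [,3]. A minimal sketch of one way to keep them, assuming dplyr >= 1.0.0 (for across()) and using a made-up data frame `toy` that is not the asker's data:

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0),
  bc = c(1, 0, 1, 1, 1, 0),
  de = c(1, 1, 0, 0, 1, 1),
  cl = c(1, 2, 3, 1, 2, 3)
)

# Sum the three indicator columns within each level of cl
sums <- toy %>%
  group_by(cl) %>%
  summarise(across(c(ab, bc, de), sum))

# Transpose the sums and reuse the cl values as column names
res <- t(as.matrix(sums[, c("ab", "bc", "de")]))
colnames(res) <- sums$cl
res
#    1 2 3
# ab 1 2 1
# bc 2 1 1
# de 1 2 1
```

The numbers differ from the output above because the toy data is different; only the labelling step is the point here.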

Your data is in a half-long, half-wide format, and you want it in a fully wide format. This is easiest if we first convert it to a fully long format:

library(reshape2)
df_long = melt(df, id.vars = "cl")
head(df_long)
#    cl variable value
# 1   1       ab     0
# 2   2       ab     1
# 3   3       ab     1
# 4   1       ab     1
# 5   2       ab     1
# 6   3       ab     0

Then we can turn it into a wide format, using sum as the aggregating function:

dcast(df_long, variable ~ cl, fun.aggregate = sum)
#   variable 1 2 3
# 1       ab 1 3 2
# 2       bc 2 3 1
# 3       de 2 3 1
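reshape2 has since been retired in favour of tidyr, so the same long-then-wide idea can probably be written with pivot_longer() and pivot_wider(). This is only a sketch: it assumes tidyr >= 1.1.0 (so that values_fn accepts a single function) and uses a made-up data frame `toy`, not the asker's data.

```r
library(tidyr)

# Hypothetical toy data, only for illustration
toy <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0),
  bc = c(1, 0, 1, 1, 1, 0),
  de = c(1, 1, 0, 0, 1, 1),
  cl = c(1, 2, 3, 1, 2, 3)
)

# Fully long: one row per (cl, variable) pair
long <- pivot_longer(toy, cols = c(ab, bc, de),
                     names_to = "variable", values_to = "value")

# Fully wide: one column per level of cl, duplicate cells aggregated with sum
wide <- pivot_wider(long, names_from = cl, values_from = value,
                    values_fn = sum)
wide
```

The result is a tibble with a `variable` column plus one column per value of cl, which plays the same role as the dcast() output.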

In base R:

# columns 1:3 are ab, bc, de; column 4 is cl
t(sapply(df[,1:3],function(x) tapply(x,df[,4],sum)))
#   1 2 3
#ab 1 3 2
#bc 2 3 1
#de 2 3 1
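Another base R possibility is rowsum(), which sums the columns of a data frame within groups and so avoids the sapply/tapply combination. A minimal sketch on a made-up data frame `toy` (not the asker's data):

```r
# Hypothetical toy data, only for illustration
toy <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0),
  bc = c(1, 0, 1, 1, 1, 0),
  de = c(1, 1, 0, 0, 1, 1),
  cl = c(1, 2, 3, 1, 2, 3)
)

# rowsum() returns one row per level of cl; t() puts the variables in rows
res <- t(rowsum(toy[c("ab", "bc", "de")], toy$cl))
res
#    1 2 3
# ab 1 2 1
# bc 2 1 1
# de 1 2 1
```

Selecting the columns by name rather than by position keeps the call working if the column order changes.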
