Reshape multiple categorical variables to binary response variables

Since they say variety is the spice of life, here's an approach in base R using table:

table(cbind(mydata[1], 
            actor = unlist(mydata[-1], use.names=FALSE)))
#           actor
# movie      Jack Leo Kate
#   Departed    1   1    0
#   Titanic     0   1    1
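The question's data is not shown here, so the sketch below builds a hypothetical mydata reconstructed from the outputs (one row per movie, with actor1/actor2 columns). It prints the intermediate stacked data frame that table() cross-tabulates. unlist() stacks the actor columns into a vector of length 4, and cbind() recycles the 2-row movie column to match.

```r
# Hypothetical input, reconstructed from the outputs shown above
mydata <- data.frame(movie  = c("Departed", "Titanic"),
                     actor1 = c("Jack", "Leo"),
                     actor2 = c("Leo", "Kate"),
                     stringsAsFactors = FALSE)

# Stack actor1 then actor2; the movie column is recycled to 4 rows
long <- cbind(mydata[1], actor = unlist(mydata[-1], use.names = FALSE))
long
#      movie actor
# 1 Departed  Jack
# 2  Titanic   Leo
# 3 Departed   Leo
# 4  Titanic  Kate

# Cross-tabulate movie against actor: 1 where the pair occurs, 0 otherwise
tab <- table(long)
tab
```

With character columns (the default since R 4.0.0), table() sorts the actors alphabetically, so the columns come out as Jack, Kate, Leo. The Jack, Leo, Kate order in the output above is consistent with factor columns, because unlist() on factors combines their level sets in order of appearance.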

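The result above is an object of class "table" (a two-way contingency table of counts), not a data.frame. To get a data.frame with the same wide layout, use as.data.frame.matrix; the movie names become row names.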

as.data.frame.matrix(table(
  cbind(mydata[1], actor = unlist(mydata[-1], use.names=FALSE))))
#          Jack Leo Kate
# Departed    1   1    0
# Titanic     0   1    1
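The choice of as.data.frame.matrix over plain as.data.frame matters: the as.data.frame() method for tables returns the long form, one row per cell with the counts in a Freq column. The sketch below, using the same hypothetical mydata as reconstructed from the outputs, shows both and then moves the row names into a movie column, which is often handier for later joins or dplyr steps.

```r
# Hypothetical input, reconstructed from the outputs shown above
mydata <- data.frame(movie  = c("Departed", "Titanic"),
                     actor1 = c("Jack", "Leo"),
                     actor2 = c("Leo", "Kate"),
                     stringsAsFactors = FALSE)
tab <- table(cbind(mydata[1], actor = unlist(mydata[-1], use.names = FALSE)))

# as.data.frame() on a table: long form, columns movie, actor, Freq
long_counts <- as.data.frame(tab)

# as.data.frame.matrix(): wide form, with movies stored as row names
wide <- as.data.frame.matrix(tab)

# Turn the row names into an ordinary movie column
wide <- cbind(movie = rownames(wide), wide)
rownames(wide) <- NULL
wide
```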

One way to reshape your data.frame is with the reshape2 package, using melt and dcast. For example:

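library(reshape2)
# Long format: one row per movie and actor column, with the actor name in "value"
long.mydata <- melt(mydata, id.vars = "movie")
# Wide format: one column per actor, 1 if present, 0 otherwise
wide.mydata <- dcast(long.mydata, movie ~ value, fun.aggregate = function(x) 1, fill = 0)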

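Pay attention to the fun.aggregate and fill arguments of dcast: fun.aggregate decides the value placed in each movie/actor cell that has data (here always 1), and fill sets the value for combinations that do not occur (here 0).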
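To see the difference those two arguments make, here is a small sketch on a hypothetical data set (not from the question) in which Leo is listed twice for Departed and Titanic has an empty third slot. function(x) 1 turns every occupied cell into 1, while fun.aggregate = length counts duplicates; fill only affects pairs that never occur. value.var is passed explicitly to skip dcast's "Using value as value column" message.

```r
library(reshape2)

# Hypothetical data: Leo fills two slots for Departed; Titanic's third slot is NA
dup <- data.frame(movie  = c("Departed", "Titanic"),
                  actor1 = c("Jack", "Leo"),
                  actor2 = c("Leo", "Kate"),
                  actor3 = c("Leo", NA),
                  stringsAsFactors = FALSE)

# Long format; na.rm = TRUE drops the empty slot
long.dup <- melt(dup, id.vars = "movie", na.rm = TRUE)

# Binary: any occupied cell becomes 1, missing pairs get fill = 0
binary <- dcast(long.dup, movie ~ value, fun.aggregate = function(x) 1,
                fill = 0, value.var = "value")

# Counts: occupied cells hold the number of rows; with no fill given,
# missing pairs get length() of an empty vector, i.e. 0
counts <- dcast(long.dup, movie ~ value, fun.aggregate = length,
                value.var = "value")

binary
counts
```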

How much spice is too much? Here is a solution via tidyr:

library(dplyr)
library(tidyr)

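mydata %>%
  gather(actor, name, starts_with("actor")) %>%  # long: one row per actor slot
  mutate(present = 1) %>%                        # flag every movie/actor pair
  select(-actor) %>%                             # drop the slot name (actor1, actor2)
  spread(name, present, fill = 0)                # wide: one column per actor, 0 if absent

#      movie Jack Kate Leo
# 1 Departed    1    0   1
# 2  Titanic    0    1   1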
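gather() and spread() still work, but since tidyr 1.0.0 they are superseded by pivot_longer() and pivot_wider(). A sketch of the same pipeline with the newer verbs, assuming tidyr 1.1.0 or later (for the scalar values_fill and the names_sort argument) and the same hypothetical mydata reconstructed from the outputs:

```r
library(dplyr)
library(tidyr)

# Hypothetical input, reconstructed from the outputs shown above
mydata <- data.frame(movie  = c("Departed", "Titanic"),
                     actor1 = c("Jack", "Leo"),
                     actor2 = c("Leo", "Kate"),
                     stringsAsFactors = FALSE)

wide <- mydata %>%
  # long: one row per movie and actor slot
  pivot_longer(starts_with("actor"), names_to = "actor", values_to = "name") %>%
  mutate(present = 1) %>%
  select(-actor) %>%
  # wide: one column per actor, 0 where the pair does not occur
  pivot_wider(names_from = name, values_from = present,
              values_fill = 0, names_sort = TRUE)
wide
```

names_sort = TRUE reproduces spread()'s alphabetical column order; without it, pivot_wider() orders the new columns by first appearance (Jack, Leo, Kate here).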
