ddply with lm() function

What Ramnath explained is exactly right, but I'll elaborate a bit.

ddply expects a data frame in and returns a data frame out. The lm() function takes a data frame as input but returns a linear model object. You can see that by looking at the docs for lm via ?lm:

Value

lm returns an object of class "lm" or for multiple responses of class c("mlm", "lm").
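To make the mismatch concrete, here is a minimal sketch that checks what lm() hands back. It uses the built-in mtcars dataset purely as a stand-in (it is not the data from the question), and the names fit and coef_df are just illustrative:

```r
# Illustration only: mtcars is a built-in dataset, not the asker's mydf.
fit <- lm(mpg ~ wt + hp, data = mtcars)

class(fit)           # the class documented in ?lm
is.data.frame(fit)   # an lm object is not a data frame
is.list(fit)         # underneath, it is a list of components

# The coefficients are a named numeric vector, so they are easy to
# coerce into a one-row data frame, which is what ddply can work with.
coef_df <- data.frame(t(coef(fit)))
is.data.frame(coef_df)
```

That one-row data frame is the kind of return value ddply can stack into a result.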

So you can't just shove the lm objects into a data frame. Your choices are to coerce the output of lm into a data frame, or to shove the lm objects into a list instead of a data frame.

So to illustrate both options:

Here's how to shove the lm objects into a list (very much like what Ramnath illustrated):

outlist <- dlply(mydf, "x3", function(df)  lm(y ~ x1 + x2, data=df))

On the flip side, if you want to extract only the coefficients you can create a function that runs the regression and then returns only the coefficients in the form of a data frame like this:

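myLm <- function( formula, df ){
  # fit the model on one piece of the data
  fit <- lm(formula, data=df)
  # coef() returns a named vector; transpose it into a one-row data frame
  lmOut <- data.frame(t(coef(fit)))
  # these names assume exactly an intercept plus two predictors
  names(lmOut) <- c("intercept","x1coef","x2coef")
  return(lmOut)
}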

outDf <- ddply(mydf, "x3", function(df)  myLm(y ~ x1 + x2, df))
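If you'd rather not hard-code the column names, one possible variation is to take them from the fitted model. The sketch below is self-contained: since the original data isn't shown here, it builds a made-up mydf with the same column names (x1, x2, x3, y), and coefRow is a hypothetical helper name, not something from plyr.

```r
library(plyr)

set.seed(1)
# Hypothetical stand-in for mydf: two groups of ten rows each
mydf <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20),
  x3 = rep(1:2, each = 10)
)
mydf$y <- 1 + 2 * mydf$x1 - mydf$x2 + rnorm(20)

# Fit the model on one piece and return the coefficients as a one-row
# data frame; check.names = FALSE keeps "(Intercept)" as the column name
coefRow <- function(df) {
  fit <- lm(y ~ x1 + x2, data = df)
  data.frame(t(coef(fit)), check.names = FALSE)
}

# One row per level of x3, with x3 added as the first column
outDf <- ddply(mydf, "x3", coefRow)
outDf
```

The trade-off is that the column names now follow the formula, so they change if you change the predictors.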

Here is what you need to do.

mods = dlply(mydf, .(x3), lm, formula = y ~ x1 + x2)

mods is a list of two objects containing the regression results. You can extract what you need from mods. For example, if you want to extract the coefficients, you could write

coefs = ldply(mods, coef)

This gives you

  x3 (Intercept)         x1 x2
1  1    11.71015 -0.3193146 NA
2  2    21.83969 -1.4677690 NA
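The same pattern should work for other per-model quantities, as long as the function you pass to ldply returns something that can become a row of a data frame. Here is a rough sketch pulling R-squared and the residual standard error out of each fit. It uses mtcars split by cyl as a stand-in for the question's data, and the names car_mods and fit_stats are made up for the example:

```r
library(plyr)

# Stand-in data: one regression per cylinder count in mtcars
car_mods <- dlply(mtcars, .(cyl), lm, formula = mpg ~ wt)

# summary() on an lm object exposes r.squared and sigma;
# wrap them in a one-row data frame so ldply can stack them
fit_stats <- ldply(car_mods, function(m) {
  s <- summary(m)
  data.frame(r.squared = s$r.squared, sigma = s$sigma)
})
fit_stats
```

As with the coefficients, ldply adds the grouping variable as the first column.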

EDIT. If you want ANOVA, then you can just do

ldply(mods, anova)

  x3 Df    Sum Sq   Mean Sq   F value     Pr(>F)
1  1  1  2.039237  2.039237 0.4450663 0.52345980
2  1  8 36.654982  4.581873        NA         NA
3  2  1 43.086916 43.086916 4.4273907 0.06849533
4  2  8 77.855187  9.731898        NA         NA

Use this

mods <- dlply(mydf, .(x3), lm, formula = y ~ x1 + x2)
coefs <- llply(mods, coef)

$`1`
(Intercept)          x1          x2 
 11.7101519  -0.3193146          NA 

$`2`
(Intercept)          x1          x2 
  21.839687   -1.467769          NA 



anovas <- llply(mods, anova)

$`1`
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x1         1  2.039  2.0392  0.4451 0.5235
Residuals  8 36.655  4.5819               

$`2`
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)  
x1         1 43.087  43.087  4.4274 0.0685 .
Residuals  8 77.855   9.732                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Tags: R, Dataframe, Plyr