How to calculate growth rate in a long-format data frame?

Using the base R function ave():

> df$Growth <- with(df, ave(Value, Category, 
                      FUN=function(x) c(NA, diff(x)/x[-length(x)]) ))
> df
   Category Year Value     Growth
1         A 2010     1         NA
2         A 2011     2 1.00000000
3         A 2012     3 0.50000000
4         A 2013     4 0.33333333
5         A 2014     5 0.25000000
6         A 2015     6 0.20000000
7         B 2010     7         NA
8         B 2011     8 0.14285714
9         B 2012     9 0.12500000
10        B 2013    10 0.11111111
11        B 2014    11 0.10000000
12        B 2015    12 0.09090909

@Ben Bolker's answer is easily adapted to ave:

transform(df, Growth=ave(Value, Category, 
                         FUN=function(x) c(NA,exp(diff(log(x)))-1)))
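Both ave() calls above rely on the rows already being in year order within each category, since diff() just compares neighbouring rows. Here is a minimal sketch of how one might guard against unsorted input. The reversed data frame and the names `shuffled` and `sorted` are only there for illustration; they are not from the original answers.

```r
# Same toy data as in this thread, deliberately put in reverse row order
df <- data.frame(Category = rep(c("A", "B"), each = 6),
                 Year = rep(2010:2015, 2),
                 Value = 1:12)
shuffled <- df[nrow(df):1, ]

# Sort by Category, then Year, so diff() compares consecutive years
sorted <- shuffled[order(shuffled$Category, shuffled$Year), ]
rownames(sorted) <- NULL

# Growth relative to the previous year; NA for the first year of each group
sorted$Growth <- with(sorted, ave(Value, Category,
                      FUN = function(x) c(NA, diff(x) / x[-length(x)])))
sorted
```

If the data are known to be sorted already, the order() step is unnecessary.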

You can simply use the dplyr package:

> library(dplyr)
> df %>% group_by(Category) %>% mutate(Growth = (Value - lag(Value))/lag(Value))

which will produce the following result:

# A tibble: 12 x 4
# Groups:   Category [2]
   Category  Year Value  Growth
   <fct>    <int> <int>   <dbl>
 1 A         2010     1 NA     
 2 A         2011     2  1     
 3 A         2012     3  0.5   
 4 A         2013     4  0.333 
 5 A         2014     5  0.25  
 6 A         2015     6  0.2   
 7 B         2010     7 NA     
 8 B         2011     8  0.143 
 9 B         2012     9  0.125 
10 B         2013    10  0.111 
11 B         2014    11  0.1   
12 B         2015    12  0.0909

For these sorts of questions ("how do I compute XXX by category YYY?"), there are always solutions based on by(), the data.table package, and plyr. I generally prefer plyr, which is often slower, but (to me) more transparent/elegant.

df <- data.frame(Category=c(rep("A",6),rep("B",6)),
  Year=rep(2010:2015,2),Value=1:12)


library(plyr)
ddply(df,"Category",transform,
         Growth=c(NA,exp(diff(log(Value)))-1))

The main difference between this answer and @krlmr's is that I am using a geometric-mean trick (taking differences of logs and then exponentiating) while @krlmr computes an explicit ratio.

Mathematically, diff(log(Value)) is taking the differences of the logs, i.e. log(x[t+1])-log(x[t]) for all t. When we exponentiate that we get the ratio x[t+1]/x[t] (because exp(log(x[t+1])-log(x[t])) = exp(log(x[t+1]))/exp(log(x[t])) = x[t+1]/x[t]). The OP wanted the fractional change rather than the multiplicative growth rate (i.e. x[t+1]==x[t] corresponds to a fractional change of zero rather than a multiplicative growth rate of 1.0), so we subtract 1.
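As a quick sanity check of the identity above, the two formulations can be compared on the Category B values from this thread. This is only an illustrative sketch; the variable names `ratio_version` and `log_version` are made up here.

```r
x <- c(7, 8, 9, 10, 11, 12)  # the Value column for Category B

# Explicit ratio: (x[t+1] - x[t]) / x[t]
ratio_version <- diff(x) / x[-length(x)]

# Log-difference form: exp(log(x[t+1]) - log(x[t])) - 1
log_version <- exp(diff(log(x))) - 1

# The two agree up to floating-point tolerance
all.equal(ratio_version, log_version)
```

One caveat: the log form needs strictly positive values, since log() of a negative number gives NaN, whereas the explicit ratio still works there.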

I am also using transform() for a little bit of extra "syntactic sugar", to avoid creating a new anonymous function.
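For comparison, the data.table route mentioned at the top of this answer might look roughly like the sketch below. It assumes the data.table package is installed and uses its shift() function to lag Value within each group; the object name `dt` is just for illustration.

```r
library(data.table)

dt <- data.table(Category = rep(c("A", "B"), each = 6),
                 Year = rep(2010:2015, 2),
                 Value = 1:12)

# shift() lags Value by one row within each Category;
# the first row of each group has no predecessor, so Growth is NA there
dt[, Growth := Value / shift(Value) - 1, by = Category]
dt
```

Like the other approaches, this assumes the rows are already in year order within each category.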

Tags: Math, R, Dataframe