dealing with NA in seasonal cycle analysis R

My first solution, simply manually calculating the seasonal cycle, converting to a dataframe to subtract the vector and then transforming back.

# seasonal cycle
scycle=tapply(c2,cycle(c2),mean,na.rm=T) 
# converting to df
df=tapply(c2, list(year=floor(time(c2)), month = cycle(c2)), c)
# subtract seasonal cycle
for (i in 1:nrow(df)){df[i,]=df[i,]-scycle}
# convert back to timeseries
anomco2=ts(c(t(df)),start=start(c2),freq=12)

Not very pretty, and not very efficient either.

The comment of missuse lead me to another Seasonal decompose of monthly data including NA in r I missed with a near duplicate question and this suggested the package zoo, which seems to work really well for additive series

library(zoo)
c2=co2
c2[c2>330&c2<350]=NA
d=decompose(na.StructTS(c2)) 
plot(co2)
lines(d$x,col="red")

shows that the series is very well reconstructed through the missing period.

black lines shows Co2 series with missing chunk and the red line is the reconstructed series

The output of deconstruct has the trend and seasonal cycle available. I wish I could transfer my bounty to user https://stackoverflow.com/users/516548/g-grothendieck for this helpful response. Thanks to user missuse too.

However, if the missing portion is at the end of the series, the software has to extrapolate the trend and has more difficulties. The original series (in black) maintains the trend, while the trend is smaller in the reconstructed series (red):

c2=co2
c2[c2>350]=NA
d=decompose(na.StructTS(c2)) 
plot(co2)
lines(d$x,col="red")

extrapolation of data using zoo

Lastly, if instead the missing portion is at the start of the series, the software is unable to extrapolate backwards in time and throws an error... I feel another SO question coming on...

c2=co2
c2[c2<330]=NA
d=decompose(na.StructTS(c2)) 

Error in StructTS(y) :  
the first value of the time series must not be missing

You could just use some algorithm that fills the missing data before. (e.g. from package imputeTS or zoo)

imputeTS for example has extra imputation algorithms for seasonal time series e.g.:

x <- na.seadec(co2)

Another good option for seasonal data:

x <- na.kalman(co2)

And now just go on without the missing data.

An important hint from Adrian Tompkins (see also comment below): This will work best, when the missing data is somewhere in the middle. For a lot of leading NAs the method is no good choice. In this case it fills the NAs, but it is not able to extrapolate the trend backwards:

c2<-co2
c2[c2<330]<-NA
c3<-na.kalman(c2)
c4<-na.seadec(c2)
plot(co2)
lines(c3,col="blue")
lines(c4,col="red")

enter image description here

Tags:

R

Time Series

Xts