ACF confidence intervals in R vs python: why are they different?

It has been shown that the sample autocorrelation coefficient r(k) is asymptotically Gaussian with variance Var(r(k)); the two libraries differ only in how they estimate that variance.

As you have found, in R the variance is simply taken to be Var(r(k)) = 1/N for all k, whereas in Python (statsmodels) it is computed with Bartlett's formula, Var(r(k)) = (1/N) * (1 + 2*(r(1)^2 + r(2)^2 + ... + r(k-1)^2)). That is why the Python confidence band first widens and then flattens out, as in the plot above.
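
To make the difference concrete, here is a minimal sketch (my own code, not taken from either library) that estimates r(k) for a toy AR(1) series and evaluates both variance formulas; 1.96 is the 95% normal quantile:

import numpy as np

rng = np.random.default_rng(0)
e = rng.normal(size=200)
x = np.empty_like(e)
x[0] = e[0]
for t in range(1, len(e)):          # AR(1) toy series so the r(k) are non-trivial
    x[t] = 0.6 * x[t - 1] + e[t]

N, nlags = len(x), 10

def sample_acf(y, nlags):
    yd = y - y.mean()
    denom = np.sum(yd ** 2)
    return np.array([np.sum(yd[k:] * yd[:len(yd) - k]) / denom
                     for k in range(nlags + 1)])

r = sample_acf(x, nlags)

# R: Var(r(k)) = 1/N at every lag, so the half-width is a single constant
half_r = 1.96 / np.sqrt(N)

# Python/statsmodels: Bartlett, Var(r(k)) = (1 + 2*(r(1)^2 + ... + r(k-1)^2)) / N
half_bartlett = np.array([1.96 * np.sqrt((1 + 2 * np.sum(r[1:k] ** 2)) / N)
                          for k in range(1, nlags + 1)])

print(half_r)           # constant band
print(half_bartlett)    # widens while the r(j)^2 keep accumulating, then flattens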

Source code for the ACF variances in Python (statsmodels):

varacf = np.ones(nlags + 1) / nobs             # start with 1/N for every lag
varacf[0] = 0                                  # r(0) = 1 exactly, no uncertainty
varacf[1] = 1. / nobs                          # lag 1: plain 1/N
varacf[2:] *= 1 + 2 * np.cumsum(acf[1:-1]**2)  # lags >= 2: Bartlett's formula

These two formulas rest on different null hypotheses. The former assumes an i.i.d. process, so r(k) = 0 for all k != 0, while the latter assumes an MA process of order k-1, whose ACF cuts off after lag k-1, so only r(1), ..., r(k-1) enter the variance at lag k.
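
A quick way to see the consequence of those assumptions is to simulate an MA(1) process, where r(1) != 0 but r(k) = 0 for k >= 2 (the series and the 0.4 coefficient below are just illustrative choices of mine):

import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(1)
e = rng.normal(size=1001)
x = e[1:] + 0.4 * e[:-1]                    # MA(1): r(1) != 0, r(k) = 0 for k >= 2
N = len(x)

r, confint = acf(x, nlags=20, alpha=0.05)   # statsmodels returns Bartlett-based confint
half_width = confint[:, 1] - r              # half-width of the band at each lag

print(half_width[1])        # lag 1: plain 1.96/sqrt(N), same as R
print(half_width[2])        # lag 2: wider, because it allows for the non-zero r(1)
print(1.96 / np.sqrt(N))    # the constant half-width R would draw at every lag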


Not really an answer to the theory part of this (which might be better on CrossValidated), but maybe useful ... ?

If you go to the documentation page for statsmodels.tsa.stattools.acf, it gives you an option to browse the source code. The relevant code there is:

varacf = np.ones(nlags + 1) / nobs
varacf[0] = 0
varacf[1] = 1. / nobs
varacf[2:] *= 1 + 2 * np.cumsum(acf[1:-1]**2)
interval = stats.norm.ppf(1 - alpha / 2.) * np.sqrt(varacf)   # z-quantile * sqrt(Var(r(k)))
confint = np.array(lzip(acf - interval, acf + interval))      # interval centred on each r(k)
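
(lzip here is just list(zip(...)) from statsmodels' compat helpers.) As an aside, recent statsmodels releases let you switch between the two behaviours directly: acf() accepts a bartlett_confint argument, and passing False should give the constant 1/N variance that R uses -- worth checking against the version you have installed. A rough sketch:

import numpy as np
from statsmodels.tsa.stattools import acf

x = np.random.default_rng(2).normal(size=500)

# default: Bartlett's formula, as in the source above
r, ci_bartlett = acf(x, nlags=20, alpha=0.05)

# constant 1/N variance, i.e. the R-style band
# (bartlett_confint is only available in recent statsmodels releases)
r, ci_flat = acf(x, nlags=20, alpha=0.05, bartlett_confint=False)

print(ci_flat[1:, 1] - r[1:])        # constant half-width ~ 1.96/sqrt(N)
print(ci_bartlett[1:, 1] - r[1:])    # increases slightly, then flattens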

In contrast, the R source code for plot.acf shows

clim0 <- if (with.ci) qnorm((1 + ci)/2)/sqrt(x$n.used) else c(0, 0)

where ci is the confidence level (default=0.95).
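
For comparison, the same constant band can be computed in Python; ci and n below simply stand in for R's ci and x$n.used:

import numpy as np
from scipy import stats

ci, n = 0.95, 200                                  # confidence level and sample size
clim0 = stats.norm.ppf((1 + ci) / 2) / np.sqrt(n)  # R's qnorm((1 + ci)/2)/sqrt(x$n.used)
print(clim0)                                       # roughly 1.96 / sqrt(n)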