Forecasting with statsmodels

It sounds like you are using an older version of statsmodels that does not support SARIMAX. You'll want to install the latest released version, 0.8.0; see http://statsmodels.sourceforge.net/devel/install.html.

I'm using Anaconda and installed it via pip:

pip install -U statsmodels

The results object returned by fitting a SARIMAX model has a number of useful methods, including forecast:

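# Stored on its own: the forecast's index runs past the end of data, so it would not align as a column
forecast = results.forecast(steps=100)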

This uses your fitted model to forecast 100 steps into the future.


ARIMA(1,0,0) is a one-period autoregressive model, so it follows this formula:

r_t = phi_0 + phi_1 * r_(t-1) + a_t

What that means is that the value in time period t (r_t) is equal to some constant (phi_0), plus a coefficient estimated by fitting the model (phi_1) multiplied by the value in the prior period (r_(t-1)), plus a white noise error term (a_t).

Your model only has a memory of one period, so the current prediction is entirely determined by the single value from the prior period. It's not a very complex model; it's not doing anything fancy with all the prior values. It's just taking yesterday's price, multiplying it by some value and adding a constant. You should expect a multi-step forecast to converge quickly to its equilibrium (the long-run mean) and then stay there forever.
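
To see why the forecast flattens out, here is a minimal sketch that iterates the formula above with the noise term set to its expected value of zero. The coefficient values (phi_0 = 9.0, phi_1 = 0.8) and the starting price are hypothetical, chosen only so that the long-run mean works out to about 45.

```python
def ar1_forecast(last_value, phi_0, phi_1, steps):
    """Iterate r_hat = phi_0 + phi_1 * r_hat, starting from the last observed value."""
    path = []
    value = last_value
    for _ in range(steps):
        # Future noise terms are unknown, so each is replaced by its mean, zero
        value = phi_0 + phi_1 * value
        path.append(value)
    return path


# Hypothetical coefficients and starting price, for illustration only
phi_0, phi_1 = 9.0, 0.8
last_price = 50.0

path = ar1_forecast(last_price, phi_0, phi_1, steps=100)
long_run_mean = phi_0 / (1 - phi_1)  # the equilibrium the forecast settles at

print(path[:5])
print(path[-1], long_run_mean)
```

Each step shrinks the gap to the long-run mean by a factor of phi_1, so the smaller phi_1 is in absolute value, the faster the forecast goes flat.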

The reason why the forecast in the top picture looks so good is that it is just showing you hundreds of one-period-ahead forecasts, each starting fresh from the actual value of the prior period. It's not showing a long-horizon prediction like you probably think it is.
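
The difference between the two kinds of forecast can be sketched in plain Python. This is an illustration under assumptions: the series is simulated, and the coefficients are treated as known rather than estimated, so none of the numbers come from the question's data.

```python
import random

random.seed(0)

# Hypothetical AR(1) coefficients with a long-run mean of about 45
phi_0, phi_1 = 4.5, 0.9
n, split = 200, 100

# Simulate a series to stand in for real prices
series = [45.0]
for _ in range(n - 1):
    series.append(phi_0 + phi_1 * series[-1] + random.gauss(0, 1))

# One-step-ahead: every forecast starts fresh from the ACTUAL prior value
one_step = [phi_0 + phi_1 * series[t - 1] for t in range(split, n)]

# Multi-step: after the first step, each forecast feeds on the previous FORECAST
dynamic = []
value = series[split - 1]
for _ in range(split, n):
    value = phi_0 + phi_1 * value
    dynamic.append(value)


def mean_abs_error(forecasts, actuals):
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(forecasts)


print("one-step MAE:", mean_abs_error(one_step, series[split:]))
print("dynamic MAE: ", mean_abs_error(dynamic, series[split:]))
print("last dynamic forecast:", dynamic[-1])
```

In statsmodels, the predict method on the results object takes a dynamic argument that switches between these two behaviours, which is worth checking when a plotted forecast looks suspiciously accurate.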

Look at the link you sent:

http://www.johnwittenauer.net/a-simple-time-series-analysis-of-the-sp-500-index/

and read the section where he discusses why this model doesn't give you what you want:

"So at first glance it seems like this model is doing pretty well. But although it appears like the forecasts are really close (the lines are almost indistinguishable after all), remember that we used the un-differenced series! The index only fluctuates a small percentage day-to-day relative to the total absolute value. What we really want is to predict the first difference, or the day-to-day moves. We can either re-run the model using the differenced series, or add an "I" term to the ARIMA model (resulting in a (1, 1, 0) model) which should accomplish the same thing. Let's try using the differenced series."

To do what you're trying to do, you'll need to do more research into these models and figure out how to format your data and which model is appropriate. The most important thing is knowing what information you believe is contained in the data you're feeding into the model. All your model is currently trying to do is say, "Today the price is $45. What will the price be tomorrow?" That's it. It doesn't have any information about momentum, volatility, etc. That's not much to go on.
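
As one concrete instance of reformatting the data, the quoted suggestion to model the first difference can be sketched as follows. The index levels are made-up numbers for illustration; the point is only that day-to-day moves are tiny relative to the level, which is why a forecast plotted against the un-differenced series looks better than it is.

```python
def first_difference(values):
    """Return the period-to-period changes: values[t] - values[t-1]."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


# Hypothetical index levels, not real S&P 500 data
levels = [2000.0, 2010.0, 1995.0, 2005.0, 2020.0]

moves = first_difference(levels)

# Size of each move relative to the prior day's level
relative = [abs(move) / prev for move, prev in zip(moves, levels)]

print(moves)
print(relative)
```

Fitting an AR(1) to the differenced series is the same idea as adding the "I" term the quote mentions, that is, using order (1, 1, 0) on the original series.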