In mathematics, a time series is a series of data points indexed in time order; most commonly, it is a sequence taken at successive, equally spaced points in time. Common examples of time series are daily closing values of the stock market and counts of sunspots. Time series analysis comprises methods for analysing time series data to extract meaningful statistics and other characteristics of the data. In contrast, time series forecasting uses a model to predict future values based on previously observed values. In this article, we are going to explore the following regression techniques used for time series forecasting:
- AR and MA
- ARIMA
- SARIMA
- VAR
Code implementation of Regression Techniques
The dataset we are using for all the techniques remains the same and can be found here. The dataset contains weather data collected for the city of Delhi for four years, from 2013 to 2017.
import pandas as pd
data = pd.read_csv('DailyDelhiClimateTrain.csv')
data.head()
Let's plot a line chart of humidity.
import plotly.express as px
fig = px.line(data, x='date', y='humidity', title='Humidity with slider')
fig.update_xaxes(rangeslider_visible=True)
fig.show()
1. Autoregressive and Moving Average (AR and MA):
In multiple regression models, we forecast the variable of interest using a linear combination of predictors. In an autoregressive model, we forecast the variable of interest using a linear combination of its own past values. The term autoregression indicates that it is a regression of the variable against itself.
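The model can be formulated as:
$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t$$
Where: $Y_t$ is the value of the time series at time $t$
$c$ is the intercept
$\phi_1, \dots, \phi_p$ are the slope coefficients on the lagged values
$Y_{t-1}, \dots, Y_{t-p}$ are the lagged values of the time series
$\varepsilon_t$ is the error term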
This method is suitable for univariate time series without trend and seasonal components.
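To make the autoregressive idea concrete, here is a minimal sketch that fits the simplest case, an AR(1) model, by ordinary least squares in plain Python. The helper names `fit_ar1` and `forecast_ar1` and the toy series are assumptions made for illustration; they are not part of statsmodels or of the Delhi dataset.

```python
def fit_ar1(series):
    """Estimate c and phi in Y_t = c + phi * Y_{t-1} + e_t by least squares."""
    x = series[:-1]   # lagged values Y_{t-1}
    y = series[1:]    # current values Y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(c, phi, last_value, steps):
    """Iterate the fitted recursion forward, feeding each forecast back in."""
    preds = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        preds.append(value)
    return preds


# Toy, noise-free series generated by Y_t = 2 + 0.5 * Y_{t-1} (hypothetical data)
toy = [10.0]
for _ in range(9):
    toy.append(2 + 0.5 * toy[-1])

c, phi = fit_ar1(toy)
print(round(c, 6), round(phi, 6))
print(forecast_ar1(c, phi, toy[-1], 3))
```

Because the toy series has no noise, the fit should recover the intercept and slope used to generate it. With real data the estimates are only approximate, and a library routine such as `AutoReg` handles higher orders.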
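# AR example
import matplotlib.pyplot as plt
from statsmodels.tsa.ar_model import AutoReg

# split: first 1000 rows for training, the rest for testing
train, test = data[0:1000], data[1000:]
# fit model
model = AutoReg(train.humidity, lags=350)
model_fit = model.fit()
# make out-of-sample predictions covering the test period
pred = model_fit.predict(start=len(train), end=len(train) + len(test) - 1, dynamic=False)
plt.plot(test.humidity)
plt.plot(pred, color='red')
plt.show()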
Rather than using past values of the forecast variable in a regression, a moving average (MA) model uses past forecast errors in a regression-like model. In other words, it models the next value in the sequence as a linear function of the residual errors from a mean process at earlier time steps. When autoregressive and moving average terms are used together, the result is an ARMA model.
This method is suitable for univariate time series without trend and seasonal components.
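By analogy with the autoregressive formula, a moving average model of order $q$ is usually written as follows. The symbols $\mu$ (the mean of the series) and $\theta_1, \dots, \theta_q$ (the moving average coefficients) are standard textbook notation and are an assumption here, since the article does not give this equation.

```latex
Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}
```

Here $\varepsilon_t$ is the error term at time $t$, so the forecast depends on the last $q$ errors rather than on the last $p$ observed values.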
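# MA example
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# fit model; order=(p, d, q), so a pure MA(q) model sets p=0 and d=0
# (q=2 is an illustrative choice; tune it for your data)
model = ARIMA(train.humidity, order=(0, 0, 2))
model_fit = model.fit()
# make prediction
pred = model_fit.predict(start=len(train), end=len(train) + len(test) - 1)
plt.plot(test.humidity)
plt.plot(pred, color='red')
plt.show()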
2. Autoregressive integrated moving average (ARIMA):
ARIMA explicitly caters to a suite of standard structures in time series data, and it provides a simple yet powerful method for forecasting. It combines autoregressive and moving average models with a differencing pre-processing step that makes the sequence stationary.
This method supports univariate time series with trend but without a seasonal component.
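The differencing step can be illustrated in a few lines of plain Python. This is a minimal sketch; the helper names `difference` and `undifference` and the toy values are assumptions for illustration, not part of any library.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undifference(first_values, diffs, lag=1):
    """Rebuild the original series from its first `lag` values and the differences."""
    restored = list(first_values)
    for d in diffs:
        restored.append(restored[-lag] + d)
    return restored


# Toy series with a linear trend (illustrative values, not the Delhi data)
trend = [3, 5, 7, 9, 11, 13]
diffs = difference(trend)
print(diffs)                           # [2, 2, 2, 2, 2]
print(undifference(trend[:1], diffs))  # [3, 5, 7, 9, 11, 13]
```

First-order differencing turns the linear trend into a constant series, which is the kind of stationary input the AR and MA parts expect. The `d` in `order=(p, d, q)` is the number of times this step is applied.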
The statsmodels library provides the capability to fit ARIMA models.
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# take the first 1000 humidity readings, then hold out the last 34% for testing
X = data.humidity[0:1000].values
size = int(len(X) * 0.66)
train, test = X[0:size], X[size:]
history = list(train)
predictions = []
# walk-forward validation: refit on everything seen so far, forecast one step ahead
for i in range(len(test)):
    model = ARIMA(history, order=(5, 1, 0))
    model_fit = model.fit()
    pred = model_fit.forecast()[0]
    predictions.append(pred)
    true = test[i]
    history.append(true)
    print('predicted=%f, expected=%f' % (pred, true))
plt.plot(test)
plt.plot(predictions, color='red')
plt.show()
3. Seasonal Autoregressive integrated moving average (SARIMA):
SARIMA is an extension of ARIMA that supports direct modeling of the seasonal component of the series. The problem with ARIMA is that it does not support seasonal data, i.e. data with repeating cycles. ARIMA expects data that is either not seasonal or has had the seasonal component removed.
SARIMA adds three hyperparameters to specify the autoregression, differencing and moving average for the seasonal component of the series, plus a fourth for the period of the seasonality.
This model is suitable for univariate time series with trend and seasonal components.
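To see why the seasonal part needs its own differencing, the sketch below compares ordinary and seasonal differencing on a toy series. The helper name `seasonal_difference` and the data are assumptions for illustration only.

```python
def seasonal_difference(series, period):
    """Subtract the value one full season earlier: series[t] - series[t - period]."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Toy series: a repeating pattern of period 4 that grows by 1 every cycle
pattern = [10, 20, 15, 5]
series = [pattern[t % 4] + t // 4 for t in range(12)]

first_diff = seasonal_difference(series, period=1)  # ordinary first difference
season_diff = seasonal_difference(series, period=4)

print(first_diff)   # still shows the repeating cycle
print(season_diff)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

The ordinary difference still carries the four-step cycle, while the seasonal difference removes it and leaves only the steady growth. In `seasonal_order=(P, D, Q, m)`, `D` counts seasonal differences and `m` is the period.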
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX

# same data and split as in the ARIMA example
X = data.humidity[0:1000].values
size = int(len(X) * 0.66)
train, test = X[0:size], X[size:]
history = list(train)
predictions = []
# walk-forward validation
for t in range(len(test)):
    # seasonal_order=(P, D, Q, m); the non-seasonal order keeps its default
    model = SARIMAX(history, seasonal_order=(3, 1, 0, 2))
    model_fit = model.fit(disp=False)
    pred = model_fit.forecast()[0]
    predictions.append(pred)
    true = test[t]
    history.append(true)
    print('predicted=%f, expected=%f' % (pred, true))
plt.plot(test)
plt.plot(predictions, color='red')
plt.show()
4. Vector Autoregression (VAR):
The vector autoregression (VAR) model is used when two or more time series influence each other, that is, when the relationship between the series is bidirectional. It models each variable as a linear function of its own past values (lags) and the past values of the other variables in the system. It is called autoregressive because each variable is regressed on these lagged values.
The main difference between the previous models and VAR is that those models are unidirectional: the predictors influence Y, but not vice versa. The VAR model is bidirectional: the variables influence each other.
This model is suitable for multivariate time series without trend and seasonal components.
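The bidirectional influence can be sketched with a hand-rolled VAR(1) forecast in plain Python. The function name `var1_forecast`, the intercepts `c` and the coefficient matrix `A` are hypothetical values chosen for illustration; a fitted model would estimate them from the data.

```python
def var1_forecast(c, A, last, steps):
    """Forecast a VAR(1) process y_t = c + A * y_{t-1} for `steps` steps."""
    preds = []
    y = list(last)
    k = len(y)
    for _ in range(steps):
        # every variable depends on the previous value of ALL variables
        y = [c[i] + sum(A[i][j] * y[j] for j in range(k)) for i in range(k)]
        preds.append(y)
    return preds


# Hypothetical coefficients for two series (say, humidity and mean temperature)
c = [1.0, 2.0]
A = [[0.5, 0.1],   # series 0 depends on its own lag and on series 1
     [0.2, 0.5]]   # series 1 depends on series 0 and on its own lag
for step in var1_forecast(c, A, last=[10.0, 20.0], steps=2):
    print(step)
```

The off-diagonal entries of `A` are what make the model bidirectional. Setting them to zero reduces the system to two independent AR(1) models.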
Load multiple variables:
x1 = data.humidity.values
x2 = data.meantemp.values
# pair the two series row by row: [[humidity, meantemp], ...]
list1 = [[h, t] for h, t in zip(x1, x2)]
Fit the model and forecast a few steps ahead:
from statsmodels.tsa.vector_ar.var_model import VAR

# fit model
model = VAR(list1)
model_fit = model.fit()
# make prediction from the last k_ar observations (k_ar is the fitted lag order)
forecast = model_fit.forecast(model_fit.endog[-model_fit.k_ar:], steps=5)
print(forecast)
[[95.76561271 10.57589906]
 [92.08148688 11.10511153]
 [88.87374484 11.59330815]
 [86.07847799 12.04540676]
 [83.64040052 12.46567364]]
In this article, we have seen the major techniques used to forecast time series, each with a practical use case. The most time-consuming part of the univariate techniques is tuning the lag values, since the choice of lags largely determines the quality of the forecast. The rest of the workflow is straightforward.
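One common way to shortlist candidate lag values is to look at the sample autocorrelation of the series at different lags. The sketch below computes it in plain Python; the function name `autocorrelation` and the toy wave are assumptions for illustration, and this is only one heuristic among several.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


# Toy series that repeats every 4 steps (hypothetical data)
wave = [0, 1, 0, -1] * 5

for lag in range(5):
    print(lag, autocorrelation(wave, lag))
```

For this wave the autocorrelation is strongly negative at lag 2 and strongly positive at lag 4, which points to a cycle of length 4. On real data, lags with large autocorrelation are natural starting points for the lag search.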