A Comprehensive Guide To Regression Techniques For Time Series Forecasting

In mathematics, a time series is a series of data points indexed in time order; most commonly, it is a sequence taken at successive, equally spaced points in time. Common examples of time series are the daily closing values of the stock market and counts of sunspots. Time series analysis comprises methods for analysing time series data to extract meaningful statistics and other characteristics of the data. Time series forecasting, in contrast, uses a model to predict future values based on previously observed values. In this article, we are going to explore the following regression techniques used for time series forecasting:

  1. AR and MA
  2. ARIMA
  3. SARIMA
  4. VAR

Code implementation of Regression Techniques

The dataset we are using for all the techniques remains the same and can be found here. The dataset contains weather data collected for the city of Delhi for four years, from 2013 to 2017.

 import pandas as pd
 data = pd.read_csv('DailyDelhiClimateTrain.csv')
 data.head() 

Let's plot a line chart of the humidity.

import plotly.express as px
fig = px.line(data, x=data.date, y='humidity', title='Humidity with slider')
fig.update_xaxes(rangeslider_visible=True)
fig.show()

1. Autoregressive and Moving Average (AR and MA):

In a multiple regression model, we forecast the variable of interest using a linear combination of predictors. In an autoregressive model, we forecast the variable of interest using a linear combination of its own past values. The term autoregression indicates that it is a regression of the variable against itself.

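The model can be formulated as:

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t$$

Where: $Y_t$ is the value of the time series at time t

$c$ is the intercept

$\phi_1, \dots, \phi_p$ are the slope coefficients

$Y_{t-1}, \dots, Y_{t-p}$ are the lagged values of the time series

$\varepsilon_t$ is the error term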

This method is suitable for univariate time series without trend or seasonal components.
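To make the idea of regressing a variable against itself concrete, here is a minimal sketch in plain Python. It simulates the simplest case, an AR(1) series with a single lag, and recovers the intercept and slope by ordinary least squares. The helper names `simulate_ar1` and `fit_ar1` and the coefficient values are illustrative assumptions, not part of any library.

```python
import random

def simulate_ar1(c, phi, n, seed=0):
    """Generate n values of Y_t = c + phi * Y_{t-1} + e_t with Gaussian noise."""
    rng = random.Random(seed)
    y = [c / (1 - phi)]  # start at the long-run mean of the process
    for _ in range(n - 1):
        y.append(c + phi * y[-1] + rng.gauss(0, 1))
    return y

def fit_ar1(y):
    """Least-squares estimate of (c, phi) from regressing Y_t on Y_{t-1}."""
    x, target = y[:-1], y[1:]
    mean_x = sum(x) / len(x)
    mean_t = sum(target) / len(target)
    cov = sum((a - mean_x) * (b - mean_t) for a, b in zip(x, target))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    c = mean_t - phi * mean_x
    return c, phi

# Hypothetical true parameters, chosen only for illustration
series = simulate_ar1(c=2.0, phi=0.6, n=5000)
c_hat, phi_hat = fit_ar1(series)
next_value = c_hat + phi_hat * series[-1]  # one-step-ahead forecast
print(round(c_hat, 2), round(phi_hat, 2))
```

The estimates should land close to the values used in the simulation. A model with more lags works the same way, with one regression coefficient per lagged value.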

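Code Implementation:

 # AR example
 import matplotlib.pyplot as plt
 from statsmodels.tsa.ar_model import AutoReg
 # hold out everything after the first 1000 rows for testing
 train,test = data[0:1000],data[1000:]
 # fit model on the training humidity values using 350 lags
 model = AutoReg(train.humidity, lags=350)
 model_fit = model.fit()
 # make prediction over the test period
 pred = model_fit.predict(len(train),len(test)+len(train)-1,dynamic=False)
 # actual values in the default colour, forecast in red
 plt.plot(test.humidity)
 plt.plot(pred,color='red')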

Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model. In other words, the moving average method models the next step in the sequence as a linear function of the residual errors from the mean process at earlier time steps. A moving average model on its own contains no autoregressive terms; combining the two gives the ARMA model.

This method is suitable for univariate time series without trend or seasonal components.
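As a rough illustration of "a linear function of past errors", the sketch below simulates an MA(1) process in plain Python and checks its sample autocorrelation. The helper names `simulate_ma1` and `autocorr`, and the values of `mu` and `theta`, are assumptions made for this example only.

```python
import random

def simulate_ma1(mu, theta, n, seed=0):
    """Generate n values of Y_t = mu + e_t + theta * e_{t-1}."""
    rng = random.Random(seed)
    errors = [rng.gauss(0, 1) for _ in range(n + 1)]
    return [mu + errors[t] + theta * errors[t - 1] for t in range(1, n + 1)]

def autocorr(y, lag):
    """Sample autocorrelation of y at the given lag."""
    mean = sum(y) / len(y)
    var = sum((v - mean) ** 2 for v in y)
    cov = sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, len(y)))
    return cov / var

series = simulate_ma1(mu=60.0, theta=0.5, n=5000)
rho1 = autocorr(series, 1)  # theory: theta / (1 + theta**2) = 0.4
rho2 = autocorr(series, 2)  # theory: 0, since only one past error enters
print(round(rho1, 2), round(rho2, 2))
```

Because each value depends only on the current and the previous error, the correlation should be clearly positive at lag 1 and close to zero from lag 2 onwards. That cut-off is the usual hint for choosing the order of a moving average model.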

Code Implementation: 

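 #MA model
 from statsmodels.tsa.arima.model import ARIMA
 # fit model
 # order is (p, d, q): p=0 and d=0 leave a pure MA(q) model.
 # q=10 is an illustrative choice; a large p, as in order=(300,0,0),
 # would fit an autoregressive model instead.
 model = ARIMA(train.humidity,order=(0,0,10))
 model_fit = model.fit()
 # make prediction over the test period
 pred = model_fit.predict(len(train),len(test)+len(train)-1)
 plt.plot(test.humidity)
 plt.plot(pred,color='red')

2. Autoregressive Integrated Moving Average (ARIMA):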

ARIMA explicitly caters to a suite of standard structures in time series data, and it provides a simple yet powerful method for forecasting. It combines autoregressive and moving average models with a differencing pre-processing step that makes the sequence stationary.

This method is suitable for univariate time series with trend but without a seasonal component.
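The differencing step can be sketched in a few lines of plain Python. The helpers `difference` and `undifference` are hypothetical names used only for illustration; in statsmodels the same step is requested through the middle value `d` of the `order=(p, d, q)` tuple.

```python
def difference(series):
    """Return the first differences series[t] - series[t - 1]."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

def undifference(first_value, diffs):
    """Rebuild the original series from its first value and its differences."""
    values = [first_value]
    for d in diffs:
        values.append(values[-1] + d)
    return values

trend = [10 + 3 * t for t in range(8)]    # series with a linear trend
diffs = difference(trend)                  # constant, so the trend is gone
restored = undifference(trend[0], diffs)   # a cumulative sum inverts the step
print(diffs)
print(restored == trend)
```

A series with a linear trend becomes constant after one round of differencing, which is why a single difference is often enough to make a trending series stationary. Forecasts made on the differenced scale are mapped back to the original scale by the cumulative sum.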

The statsmodels library provides the capability to fit ARIMA models.

Code Implementation:

 from statsmodels.tsa.arima.model import ARIMA
 # take the first 1000 humidity readings as a plain array,
 # then split them 66% / 34% into train and test
 X = data.humidity[0:1000].values
 size = int(len(X) * 0.66)
 train, test = X[0:size], X[size:len(X)]
 history = [x for x in train]
 predictions = list()
 # walk-forward validation: refit on everything seen so far,
 # forecast one step, then add the true value to the history
 for i in range(len(test)):
   model = ARIMA(history, order=(5,1,0))
   model_fit = model.fit()
   output = model_fit.forecast()
   pred = output[0]
   predictions.append(pred)
   true = test[i]
   history.append(true)
   print('predicted=%f, expected=%f' % (pred, true))
 plt.plot(test)
 plt.plot(predictions, color='red')

3. Seasonal Autoregressive Integrated Moving Average (SARIMA):

SARIMA is an extension of ARIMA that supports direct modelling of the seasonal component of the series. The problem with ARIMA is that it does not support seasonal data, i.e. data with repeating cycles. ARIMA expects data that is either not seasonal or has had the seasonal component removed.

SARIMA adds three hyperparameters to specify the autoregression (P), differencing (D) and moving average (Q) for the seasonal component of the series, plus one more parameter for the period of the seasonality (m).

This model is suitable for univariate time series with trend and seasonal components.
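The seasonal differencing that SARIMA applies can be illustrated with a small plain-Python sketch. The helper `seasonal_difference`, the four-step pattern and the trend slope are all made up for this example.

```python
def seasonal_difference(series, period):
    """Subtract the value one full season earlier: series[t] - series[t - period]."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

pattern = [5, 9, 7, 3]                                  # one cycle, period m = 4
series = [pattern[t % 4] + 2 * t for t in range(16)]    # repeating cycle plus trend
deseasonalised = seasonal_difference(series, period=4)
print(deseasonalised)
```

Subtracting the observation from one season earlier cancels the repeating cycle exactly, leaving only the change contributed by the trend over one period. This corresponds to setting the seasonal differencing order to 1 with the matching period, as in the `seasonal_order` argument of `SARIMAX`.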

Code Implementation:

 from statsmodels.tsa.statespace.sarimax import SARIMAX
 # same data and split as in the ARIMA example
 X = data.humidity[0:1000].values
 size = int(len(X) * 0.66)
 train, test = X[0:size], X[size:len(X)]
 history = [x for x in train]
 predictions = list()
 # walk-forward validation
 for t in range(len(test)):
   # seasonal_order is (P, D, Q, m); the non-seasonal order keeps its default
   model = SARIMAX(history,seasonal_order=(3, 1, 0, 2))
   model_fit = model.fit()
   output = model_fit.forecast()
   pred = output[0]
   predictions.append(pred)
   true = test[t]
   history.append(true)
   print('predicted=%f, expected=%f' % (pred, true))
 plt.plot(test)
 plt.plot(predictions, color='red')

4. Vector Autoregression (VAR):

The vector autoregression model is used when two or more time series influence each other, that is, when the relationship between the series is bi-directional. It models each variable as a linear function of its own past values and of the past values of the other variables in the system, in other words the time lags of the series. In that sense it extends the autoregressive model to several series at once.

The main difference between the previous models and VAR is that those models are unidirectional: the predictors influence Y but not vice versa. The VAR model is bidirectional, so the variables influence each other.

This model is suitable for multivariate time series without trend and seasonal components.
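The bidirectional influence can be sketched with a hand-written VAR(1) forecast in plain Python. The function `var1_forecast`, the intercepts and the coefficient matrix are hypothetical values chosen for illustration; they are not estimates from the Delhi dataset.

```python
def var1_forecast(last, intercept, coef, steps):
    """Iterate y_t = intercept + coef * y_{t-1} for the given number of steps."""
    forecasts = []
    current = list(last)
    for _ in range(steps):
        current = [
            intercept[i] + sum(coef[i][j] * current[j] for j in range(len(current)))
            for i in range(len(current))
        ]
        forecasts.append(current)
    return forecasts

# Row i says how variable i depends on the previous [humidity, meantemp].
# The off-diagonal entries carry the cross-series influence that a
# single-series AR model cannot express.
intercept = [20.0, 20.0]
coef = [[0.5, -1.0],
        [-0.25, 0.5]]
path = var1_forecast(last=[80.0, 15.0], intercept=intercept, coef=coef, steps=2)
print(path)
```

Each forecast step feeds both variables back into both equations, so a change in one series propagates to the other on the next step. A fitted VAR model does the same thing with coefficients estimated from the data.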

Code Implementation:

Load multiple variables:

 # build one [humidity, meantemp] row per day
 list1 = data[['humidity', 'meantemp']].values.tolist()

Fit and forecast a few steps ahead:

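 from statsmodels.tsa.vector_ar.var_model import VAR
 # fit model
 model = VAR(list1)
 model_fit = model.fit()
 # make prediction: forecast() starts from the most recent observations;
 # endog replaces the deprecated y attribute
 forecast = model_fit.forecast(model_fit.endog, steps=5)
 print(forecast)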

Output:

 [[95.76561271 10.57589906]
  [92.08148688 11.10511153]
  [88.87374484 11.59330815]
  [86.07847799 12.04540676]
  [83.64040052 12.46567364]] 

Conclusion

In this article, we have seen the major regression techniques used to forecast time series, each with a practical use case. The most time-consuming part of the univariate techniques is tuning the lag values, since the chosen lag largely determines the quality of the forecast. The rest of each technique is straightforward.

Copyright Analytics India Magazine Pvt Ltd