A Comprehensive Guide To Regression Techniques For Time Series Forecasting

This article covers several regression techniques used for time series forecasting.


In mathematics, a time series is a series of data points indexed in time order; most commonly, it is a sequence taken at successive, equally spaced points in time. Common examples of time series are the daily closing values of the stock market and counts of sunspots. Time series analysis comprises methods for analysing time series data to extract meaningful statistics and other characteristics of the data. Time series forecasting, in contrast, uses a model to predict future values based on previously observed values. In this article, we explore the following regression techniques used for time series forecasting:

  1. AR and MA
  2. ARIMA
  3. SARIMA
  4. VAR

Code implementation of Regression Techniques

All the techniques below use the same dataset, which can be found here. The dataset contains weather data collected for the city of Delhi over four years, from 2013 to 2017.

 import pandas as pd
 data = pd.read_csv('DailyDelhiClimateTrain.csv')
 data.head() 

Let's plot a line chart of humidity.

import plotly.express as px
fig = px.line(data, x=data.date, y='humidity', title='Humidity with slider')
fig.update_xaxes(rangeslider_visible=True)
fig.show() 
1. Autoregressive and Moving Average (AR and MA):

In a multiple regression model, we forecast the variable of interest using a linear combination of predictors. In an autoregressive model, we forecast the variable of interest using a linear combination of its own past values. The term autoregression indicates that the variable is regressed against itself.

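The model can be formulated as:

$$Y_t = C + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t$$

Where: Y_t is the value of the time series at time t

C is the intercept

\phi_1, ..., \phi_p are the slope coefficients

Y_{t-1}, ..., Y_{t-p} are the lagged values of the time series

\varepsilon_t is the error term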

This method is suitable for univariate time series without trend or seasonal components.
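
To make the idea of regressing a variable against itself concrete, here is a minimal sketch that fits an AR(1) model by ordinary least squares using only the standard library. The function name `fit_ar1`, the simulated series and the chosen coefficients are illustrative assumptions and are not part of the Delhi dataset or of statsmodels.

```python
import random

def fit_ar1(series):
    """Estimate c and phi in y_t = c + phi * y_{t-1} + e_t by least squares."""
    x = series[:-1]  # lagged values y_{t-1}
    y = series[1:]   # current values y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi

# Simulate an AR(1) series with known (assumed) parameters
random.seed(0)
true_c, true_phi = 2.0, 0.7
series = [true_c / (1 - true_phi)]  # start at the long-run mean
for _ in range(5000):
    series.append(true_c + true_phi * series[-1] + random.gauss(0, 1))

c_hat, phi_hat = fit_ar1(series)
print(f"c = {c_hat:.2f}, phi = {phi_hat:.2f}")
```

The estimates should land close to the values used in the simulation, which is the single-lag case of what AutoReg does with many lags.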

Code Implementation

 # AR example
 import matplotlib.pyplot as plt
 from statsmodels.tsa.ar_model import AutoReg
 # split the data: first 1000 rows for training, the rest for testing
 train, test = data[0:1000], data[1000:]
 # fit model
 model = AutoReg(train.humidity, lags=350)
 model_fit = model.fit()
 # make prediction over the test period
 pred = model_fit.predict(len(train), len(test) + len(train) - 1, dynamic=False)
 plt.plot(test.humidity)
 plt.plot(pred, color='red')

Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model. In other words, the moving average model expresses the next value in the sequence as a linear function of the residual errors from a mean process at earlier time steps. Combining the autoregressive and moving average models gives the ARMA model.

This method is suitable for univariate time series without trend or seasonal components.
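
As a rough illustration of what "a linear function of past errors" means, the sketch below simulates an MA(1) process, Y_t = mu + e_t + theta * e_{t-1}, and measures its autocorrelation. For an MA(1) process the lag-1 autocorrelation is theta / (1 + theta^2) and the autocorrelation at longer lags is zero. The helper `autocorr` and the values of `mu` and `theta` are assumptions chosen for the example.

```python
import random

def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

# Simulate an MA(1) process with assumed parameters
random.seed(1)
mu, theta = 50.0, 0.6
errors = [random.gauss(0, 1) for _ in range(20001)]
series = [mu + errors[t] + theta * errors[t - 1] for t in range(1, len(errors))]

print("lag 1:", round(autocorr(series, 1), 2))
print("lag 2:", round(autocorr(series, 2), 2))
print("theoretical lag 1:", round(theta / (1 + theta ** 2), 2))
```

The sharp drop to roughly zero after lag q is what distinguishes an MA(q) process from an autoregressive one, whose autocorrelation decays gradually.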

Code Implementation: 

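 # MA model
 import matplotlib.pyplot as plt
 from statsmodels.tsa.arima.model import ARIMA
 # fit model
 # order=(p, d, q): a pure MA model has p=0 and d=0, so only q is set
 # (q=1 is an illustrative choice; tune it for your data)
 model = ARIMA(train.humidity, order=(0, 0, 1))
 model_fit = model.fit()
 # make prediction over the test period
 pred = model_fit.predict(len(train), len(test) + len(train) - 1)
 plt.plot(test.humidity)
 plt.plot(pred, color='red')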
2. Autoregressive integrated moving average (ARIMA):

ARIMA captures a suite of standard structures in time series data and provides a simple yet powerful method for forecasting. It combines the autoregressive and moving average models with a differencing pre-processing step that makes the sequence stationary.

This method supports univariate time series with trend but without a seasonal component.

The statsmodels library provides the capability to fit ARIMA models.
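
The differencing step can be sketched in a few lines. The helper below, `difference`, is a hypothetical name used only for this illustration; it shows how first-order differencing removes a linear trend and how applying it twice removes a quadratic one, which is what the d term in an ARIMA order controls.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y_t - y_{t-1}."""
    for _ in range(d):
        series = [series[t] - series[t - 1] for t in range(1, len(series))]
    return series

linear = [3 + 2 * t for t in range(6)]  # 3, 5, 7, 9, 11, 13
quadratic = [t * t for t in range(6)]   # 0, 1, 4, 9, 16, 25

print(difference(linear, d=1))     # [2, 2, 2, 2, 2]
print(difference(quadratic, d=2))  # [2, 2, 2, 2]
```

Each round of differencing shortens the series by one observation, so a model with d=1 works with one fewer data point than the raw series.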

Code Implementation:

 import matplotlib.pyplot as plt
 from statsmodels.tsa.arima.model import ARIMA
 # keep the first 1000 humidity values, then split them 66/34 into train and test
 X = data.humidity[0:1000]
 size = int(len(X) * 0.66)
 # use plain arrays so that test[i] is a positional lookup
 train, test = X.values[:size], X.values[size:]
 history = [x for x in train]
 predictions = list()
 # walk-forward validation: refit on all data seen so far, forecast one step
 for i in range(len(test)):
   model = ARIMA(history, order=(5,1,0))
   model_fit = model.fit()
   output = model_fit.forecast()
   pred = output[0]
   predictions.append(pred)
   true = test[i]
   history.append(true)  # add the actual observation to the history
   print('predicted=%f, expected=%f' % (pred, true))
 plt.plot(test)
 plt.plot(predictions, color='red')
3. Seasonal Autoregressive integrated moving average (SARIMA): 

SARIMA is an extension of ARIMA that supports direct modeling of the seasonal component of a series. The problem with ARIMA is that it does not support seasonal data, i.e. data with repeating cycles; it expects data that is either not seasonal or has had the seasonal component removed.

SARIMA adds three hyperparameters that specify the autoregression, differencing and moving average for the seasonal component of the series, along with an additional parameter for the period of the seasonality.

This model is suitable for univariate time series with trend and seasonal components.
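
The seasonal differencing that SARIMA performs can be illustrated with a small sketch. Here `seasonal_difference` is a hypothetical helper, and the repeating pattern and the period of 4 are assumptions made for the example. Subtracting the value from one full season earlier removes the repeating cycle and leaves only the change per season.

```python
def seasonal_difference(series, period):
    """Subtract the value one full season earlier: y_t - y_{t-period}."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

# A repeating pattern of length 4 on top of a linear trend of +1 per step
pattern = [10, 20, 30, 20]
series = [pattern[t % 4] + t for t in range(12)]

deseasonalised = seasonal_difference(series, period=4)
print(deseasonalised)  # [4, 4, 4, 4, 4, 4, 4, 4]
```

The constant result reflects the trend accumulated over one season (4 steps of +1); the cycle itself has been cancelled out.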

Code Implementation:

 import matplotlib.pyplot as plt
 from statsmodels.tsa.statespace.sarimax import SARIMAX
 # X is the humidity series from the ARIMA example
 size = int(len(X) * 0.66)
 # use plain arrays so that test[t] is a positional lookup
 train, test = X.values[:size], X.values[size:]
 history = [x for x in train]
 predictions = list()
 # walk-forward validation
 for t in range(len(test)):
   # seasonal_order=(P, D, Q, m): seasonal AR, differencing and MA orders, and the seasonal period
   model = SARIMAX(history, seasonal_order=(3, 1, 0, 2))
   model_fit = model.fit()
   output = model_fit.forecast()
   pred = output[0]
   predictions.append(pred)
   true = test[t]
   history.append(true)
   print('predicted=%f, expected=%f' % (pred, true))
 plt.plot(test)
 plt.plot(predictions, color='red')
4. Vector Autoregression (VAR):

The vector autoregression (VAR) model is used when two or more time series influence each other, that is, when the relationship between the series is bidirectional. The model treats each variable as a function of the past values (time lags) of itself and of the other variables in the system. In that sense, it generalises the autoregressive model to multiple series.

The main difference between the previous models and VAR is that those models are unidirectional: the predictors influence Y but not vice versa. The VAR model, in contrast, is bidirectional: the variables influence each other.

This model is suitable for multivariate time series without trend and seasonal components.
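
To show how each variable depends on the past values of every variable, the sketch below iterates a two-variable VAR(1) forecast by hand: y_t = intercept + coefs * y_{t-1}. The function `var1_forecast`, the intercept, the coefficient matrix and the starting observation are all hypothetical values chosen for illustration; they are not estimates from the Delhi dataset.

```python
def var1_forecast(intercept, coefs, last_obs, steps):
    """Iterate y_t = intercept + coefs @ y_{t-1} forward for a number of steps."""
    forecasts = []
    current = list(last_obs)
    for _ in range(steps):
        # every variable's next value uses the previous values of all variables
        current = [
            intercept[i] + sum(coefs[i][j] * current[j] for j in range(len(current)))
            for i in range(len(current))
        ]
        forecasts.append(current)
    return forecasts

# Assumed parameters: the off-diagonal entries carry the cross-variable influence
intercept = [1.0, 0.5]
coefs = [[0.5, 0.1],
         [0.2, 0.3]]
last_obs = [10.0, 20.0]

forecast = var1_forecast(intercept, coefs, last_obs, steps=2)
for step in forecast:
    print([round(v, 2) for v in step])
```

Setting the off-diagonal coefficients to zero would reduce this to two independent AR(1) models, which is the unidirectional case described above.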

Code Implementation:

Load multiple variables:

 # stack humidity and mean temperature into one array with a row per day
 list1 = data[['humidity', 'meantemp']].values

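Fit the model and forecast a few steps ahead:

 import numpy as np
 from statsmodels.tsa.vector_ar.var_model import VAR
 # fit model
 endog = np.asarray(list1)
 model = VAR(endog)
 model_fit = model.fit()
 # make prediction: forecast() takes the last k_ar observations as starting values
 forecast = model_fit.forecast(endog[-model_fit.k_ar:], steps=5)
 print(forecast)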

Output:

 [[95.76561271 10.57589906]
  [92.08148688 11.10511153]
  [88.87374484 11.59330815]
  [86.07847799 12.04540676]
  [83.64040052 12.46567364]] 

Conclusion

This article covered the major regression techniques used to forecast time series, with a practical use case for each. The most time-consuming part of the univariate techniques is tuning the lag values, since the choice of lag largely determines the quality of the forecast. The rest of the workflow is straightforward.
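
One simple way to compare candidate lag values is to score each model's forecasts on held-out data and keep the lag with the lowest error. The sketch below uses root mean squared error; the `rmse` helper, the hold-out values and the forecasts for each lag are hypothetical numbers made up for illustration, not results from the models above.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical hold-out values and forecasts from two candidate lag settings
actual = [60.0, 62.0, 65.0, 63.0]
forecasts_by_lag = {
    5: [59.0, 63.0, 64.0, 65.0],
    10: [60.0, 61.0, 66.0, 63.0],
}

scores = {lag: rmse(actual, preds) for lag, preds in forecasts_by_lag.items()}
best_lag = min(scores, key=scores.get)
print(scores)
print("best lag:", best_lag)
```

In practice the forecasts would come from refitting the model once per candidate lag, for example inside the walk-forward loop used earlier.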


Vijaysinh Lendave
Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.
