Whether you want to predict the temperature or estimate electricity consumption for the next few months, many forecasting models work well only when the underlying time series is stationary. Most real-world time series are non-stationary: they show an upward or downward trend, seasonal effects, or both.
The main aim of this article is to discuss methods for checking stationarity in time series data. We will apply these methods to a time series dataset to check whether it is stationary.
In the above plot, we observe that the statistical properties of the data do not change over time. The series shows no trend or seasonal effect, and its mean and variance remain constant over time.
In this plot, the data show a trend or seasonal effect over time. The mean and variance change over time.
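The contrast between the two cases can be sketched with synthetic data. The snippet below is a minimal illustration and not part of the original notebook: it generates white noise (stationary) and the same noise with a linear trend added (non-stationary), then compares the mean of the first and second halves of each. The seed, series length, slope and the helper name `half_means` are all arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 400                          # arbitrary series length

# Stationary example: white noise with constant mean and variance
stationary = rng.normal(loc=0.0, scale=1.0, size=n)

# Non-stationary example: the same noise plus an upward linear trend
trend = stationary + 0.05 * np.arange(n)


def half_means(x):
    """Return the mean of the first and second half of a 1-D array."""
    mid = len(x) // 2
    return x[:mid].mean(), x[mid:].mean()


print("stationary halves:", half_means(stationary))
print("trending halves:  ", half_means(trend))
```

For the white-noise series the two half means should be close to each other, while for the trending series the second half sits clearly above the first.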
About the Dataset
The dataset is taken from Kaggle; you can find it here. It records electric production over time and has 2 columns (time and value) and 397 rows.
The code is implemented in Google Colab, and the .ipynb file was downloaded to the local device.
# Mounting drive
from google.colab import drive
drive.mount('/content/drive')
# Path to directory
import os
os.chdir('/content/drive/My Drive/AIM')
The first step is to mount the drive and set the path of the working directory.
# Importing Train Dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
train = pd.read_csv("Electric_Production.csv")
Procedure for Checking Stationarity
# Plot the Graph
plt.plot(train["Value"], color="lightblue")
Looking at the plot, we can observe an upward trend over time.
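A trend that is visible by eye can be made more explicit with a rolling mean. The sketch below is an illustration under stated assumptions, not the author's code: it uses a synthetic stand-in for `train["Value"]` (an upward trend plus noise, with made-up parameters) and a 12-point window, which is an arbitrary choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed
n = 397                          # same number of rows as the dataset

# Synthetic stand-in for train["Value"]: upward trend plus noise (not real data)
values = pd.Series(50 + 0.1 * np.arange(n) + rng.normal(0, 3, n))

# 12-point rolling mean and standard deviation (the window size is an assumption)
rolling_mean = values.rolling(window=12).mean()
rolling_std = values.rolling(window=12).std()

# The first 11 entries are NaN because the window is not yet full
print(rolling_mean.dropna().iloc[[0, -1]])
```

If the rolling mean drifts steadily upward or downward instead of hovering around one level, the mean is not constant over time, which points towards non-stationarity. The same two lines can be applied to `train["Value"]` and plotted on top of the raw series.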
# Plot the Histogram
plt.hist(train["Value"], color="lightblue")
The plot shows a slightly skewed distribution; the histogram of the values does not look normal.
Taken together, the above plots suggest that the time series is non-stationary.
- Summary Statistics
# Split the data into two parts (the second part starts at row 199 so no row is skipped)
train_1 = train[0:199]
train_2 = train[199:397]
We split the data into two parts so that we can compare the mean and variance of each part.
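# Mean of each part (select the numeric column so the time column is ignored)
train_1["Value"].mean(), train_2["Value"].mean()
# Variance of each part
train_1["Value"].var(), train_2["Value"].var()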
Looking at the above results, the mean and variance of the first part are very different from those of the second part. This is another indication that the time series is non-stationary.
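The split-and-compare check can be wrapped in a small helper so that both parts are summarised side by side. This is a minimal sketch: the function name `compare_halves` is hypothetical, the split is at the midpoint rather than at a fixed row, and the demo series is synthetic rather than the Kaggle data.

```python
import numpy as np
import pandas as pd


def compare_halves(series):
    """Split a series at its midpoint and report mean and variance per half."""
    mid = len(series) // 2
    first, second = series.iloc[:mid], series.iloc[mid:]  # no row is dropped
    return pd.DataFrame(
        {"mean": [first.mean(), second.mean()],
         "variance": [first.var(), second.var()]},
        index=["first_half", "second_half"],
    )


# Synthetic stand-in for train["Value"] (not the real data)
rng = np.random.default_rng(2)
demo = pd.Series(50 + 0.1 * np.arange(397) + rng.normal(0, 3, 397))
summary = compare_halves(demo)
print(summary)
```

On the real data the call would be `compare_halves(train["Value"])`. A large gap between the two rows is a hint of non-stationarity, not a formal test.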
- Augmented Dickey-Fuller Test
The Augmented Dickey-Fuller (ADF) test is a common statistical test for whether a given time series is stationary. It is framed in terms of a null and an alternate hypothesis.
- Null Hypothesis: Time series is non-stationary (it has a unit root). It has a time-dependent structure such as a trend.
- Alternate Hypothesis: Time series is stationary. In other words, its statistical properties don't depend on time.
- ADF or t statistic < critical value: Reject the null hypothesis. Time series is stationary.
- ADF or t statistic > critical value: Fail to reject the null hypothesis. Time series is non-stationary.
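The decision rule can be written as a small function. In the ADF test the null hypothesis is that the series has a unit root, that is, it is non-stationary, so a statistic below the critical value rejects the null. The sketch below assumes the critical values arrive as a dict keyed by "1%", "5%" and "10%", which is the shape `adfuller` from statsmodels returns at index 4 of its result; the function name `adf_decision` and the example numbers are hypothetical.

```python
def adf_decision(adf_stat, critical_values, level="5%"):
    """Apply the ADF decision rule at one significance level.

    critical_values is a dict such as {"1%": ..., "5%": ..., "10%": ...}.
    """
    if adf_stat < critical_values[level]:
        return "Reject H0: series looks stationary"
    return "Fail to reject H0: series looks non-stationary"


# Illustrative numbers only, not results from the dataset
example_critical = {"1%": -3.45, "5%": -2.87, "10%": -2.57}
print(adf_decision(-1.20, example_critical))
print(adf_decision(-4.10, example_critical))
```

Keeping the significance level as a parameter makes it easy to see whether the conclusion changes between the 1%, 5% and 10% levels.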
# ADF statistic to check stationarity
t = train["Value"].values
# adfuller returns (statistic, p-value, lags used, observations, critical values, icbest)
result = adfuller(t)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))
# Compare the statistic with the 5% critical value
if result[0] > result[4]["5%"]:
    print("Failed to Reject Ho - Time Series is Non-Stationary")
else:
    print("Reject Ho - Time Series is Stationary")
From the above result, we observe that the ADF statistic is greater than the critical values, so we fail to reject the null hypothesis: the time series is non-stationary.
In this article, we applied different techniques to check whether a time series is stationary. Most time series data show a trend, which matters a great deal for the performance of forecasting models. A natural next step is to make the time series stationary.
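As a pointer towards that next step, one common way to remove a linear trend is first-order differencing. The article does not apply it, so the sketch below is only an illustration on synthetic data, with an arbitrary seed, length and slope.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n = 400

# Synthetic non-stationary series: linear upward trend plus noise
series = 0.05 * np.arange(n) + rng.normal(0, 1, n)

# First-order differencing: y[t] - y[t-1]; this removes a linear trend
diffed = np.diff(series)

mid = len(diffed) // 2
print("original half means:   ", series[: n // 2].mean(), series[n // 2 :].mean())
print("differenced half means:", diffed[:mid].mean(), diffed[mid:].mean())
```

After differencing, the two half means should be nearly equal, whereas the original series has a clearly higher mean in its second half. Whether differencing is enough for the electric production data would have to be checked with the same methods used above.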
The complete code of the above implementation is available at the AIM’s GitHub repository. Please visit this link to find the notebook of this code.