What are autocorrelation and partial autocorrelation in time series data?

In time series analysis and forecasting, autocorrelation and partial autocorrelation are frequently used to analyze the data. Both are usually examined through plots that show how strongly an observation in a time series is related to observations at previous time steps. In this article, we will look closely at both, combining theory with a practical example. The major points to be discussed are listed below.

Table of contents

  1. What is correlation?
  2. Basics of Autocorrelation and Partial Autocorrelation 
  3. Autocorrelation Function (ACF)
  4. Partial Autocorrelation Function (PACF)
  5. Implementing ACF and PACF in Python

Let’s first discuss what correlation is.

What is correlation?

In statistics, correlation or dependence refers to any statistical association between two random variables or bivariate data, whether causal or not. In the broadest sense, correlation covers any statistical association, but in practice it usually refers to the degree to which two variables are linearly related.

Correlations are helpful because they can reveal a predictive relationship that can be used in the real world. For example, based on the relationship between electricity demand and weather, an electrical company might produce less power on a mild day. Extreme weather causes individuals to consume more power for heating and cooling, so there is a causal relationship in this case.

Pearson's correlation coefficient is frequently used to summarize the correlation between two variables. It is a value between -1 and 1 whose sign indicates whether the relationship is negative or positive and whose magnitude indicates how strong the linear relationship is. A value of zero means there is no linear association.
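As a small illustration of the coefficient, the sketch below computes Pearson's r directly from its definition. The temperature and demand figures are made up for the example and are not taken from any dataset used in this article.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd = x - x.mean()  # deviations from the mean
    yd = y - y.mean()
    return float((xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum()))

# Hypothetical data: demand rises exactly linearly with temperature.
temperature = [20.0, 22.0, 25.0, 30.0, 35.0]
demand = [100.0, 104.0, 110.0, 120.0, 130.0]  # 2 * temperature + 60

r_positive = pearson_r(temperature, demand)                # perfect positive relation
r_negative = pearson_r(temperature, [-d for d in demand])  # perfect negative relation
```

Because the made-up demand is an exact linear function of temperature, the coefficient sits at the extreme ends of its range. Real data would give values strictly between -1 and 1.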

Basics of Autocorrelation and Partial Autocorrelation

Autocorrelation is a mathematical representation of the degree of similarity between a time series and a lagged version of itself over successive time intervals. It is conceptually the same as the correlation between two different time series, except that the same series is used twice: once in its original form and once shifted by one or more time periods.

For example, if it is raining today, positive autocorrelation implies that rain tomorrow is more likely than it would be if today were clear. In investing, a stock's returns may show strong positive autocorrelation, which implies that if the stock is up today, it is more likely to be up tomorrow.
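The idea of correlating a series with a shifted copy of itself can be sketched in a few lines. The example below is only an illustration on simulated data: `lagged_correlation` is a hypothetical helper, and the "persistent" series is generated so that each value carries over 80% of the previous one.

```python
import numpy as np

def lagged_correlation(series, lag):
    """Pearson correlation between series[t] and series[t - lag]."""
    y = np.asarray(series, dtype=float)
    if lag <= 0 or lag >= len(y):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return float(np.corrcoef(y[lag:], y[:-lag])[0, 1])

rng = np.random.default_rng(0)
noise = rng.normal(size=500)  # independent values, no memory

# Simulated persistent series: today's value depends on yesterday's.
persistent = np.zeros(500)
for t in range(1, 500):
    persistent[t] = 0.8 * persistent[t - 1] + noise[t]

r1_persistent = lagged_correlation(persistent, 1)  # strongly positive
r1_noise = lagged_correlation(noise, 1)            # close to zero
```

The persistent series shows a strong lag-1 correlation, while the independent noise does not, which is the distinction the rain and stock examples describe.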

A partial autocorrelation, on the other hand, summarizes the relationship between an observation in a time series and an observation at an earlier time step after the effects of the intervening observations have been removed. The ordinary autocorrelation between two observations combines the direct correlation with indirect correlations that pass through the observations in between. The partial autocorrelation function removes these indirect correlations.

Now let’s briefly discuss the two functions, which are often referred to as ACF and PACF.

Autocorrelation Function (ACF)

Autocorrelation is the correlation between values of the same time series at different points in time. In other words, the series is correlated with itself, hence the name. The separation between the two values being compared is called the lag. Time series data is created when a characteristic is measured at regular intervals, such as daily, monthly, or yearly.

The number of intervals between two measurements is known as the lag. For example, there is a lag of one between the current observation and the one immediately before it. The lag grows to two if you go back another interval, and so on.

In mathematical terms, the observations y_t and y_{t-k} are separated by k time units, where k denotes the lag. Depending on the nature of the data, this lag can be measured in days, quarters, or years. When k=1, you’re evaluating observations that are next to each other.

There is a correlation associated with each lag. The autocorrelation function (ACF) evaluates the correlation between observations in a time series over a given range of lags. For a time series y, the ACF is given by Corr(y_t, y_{t-k}), k=1,2,…. We generally use graphs to display this function.
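To make the definition concrete, here is a minimal sketch of a sample ACF computed by hand with NumPy. The function name `sample_acf` is hypothetical, and the estimator shown (lagged products of deviations divided by the total sum of squared deviations) is one common convention. Library implementations may differ in details such as the divisor.

```python
import numpy as np

def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (lag 0 is always 1)."""
    y = np.asarray(series, dtype=float)
    d = y - y.mean()       # deviations from the overall mean
    denom = np.dot(d, d)   # total sum of squared deviations
    return np.array([np.dot(d[k:], d[:len(d) - k]) / denom
                     for k in range(max_lag + 1)])

# Tiny worked example: deviations are [-2, -1, 0, 1, 2], denominator is 10.
r = sample_acf([1, 2, 3, 4, 5], max_lag=2)
# lag 1: (-1*-2 + 0*-1 + 1*0 + 2*1) / 10 = 0.4
# lag 2: (0*-2 + 1*-1 + 2*0) / 10 = -0.1
```

The worked numbers in the comments can be checked by hand, which is a useful sanity check before trusting the values a library reports on a longer series.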

The ACF can be used to assess the randomness and stationarity of a time series. You can also check whether there are any seasonal patterns or trends. In an ACF plot, each bar represents the size and direction of the correlation at one lag. Bars that extend beyond the significance bounds (drawn as a red line in some plots and as a shaded band in others) are statistically significant.
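One common rule of thumb for the significance bounds, assuming the series is white noise, is plus or minus 1.96 divided by the square root of the number of observations. The sketch below applies that approximation to two simulated series. It is an assumption-laden illustration, and plotting libraries may compute their bands differently (for example, with bands that widen as the lag grows).

```python
import numpy as np

def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    y = np.asarray(series, dtype=float)
    d = y - y.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[k:], d[:len(d) - k]) / denom
                     for k in range(max_lag + 1)])

n = 400
rng = np.random.default_rng(1)
white_noise = rng.normal(size=n)                      # random, stationary
trend = np.arange(n, dtype=float) + rng.normal(size=n)  # non-stationary (upward trend)

bound = 1.96 / np.sqrt(n)  # approximate 95% bound under the white-noise assumption

noise_acf = sample_acf(white_noise, 40)[1:]  # drop lag 0
trend_acf = sample_acf(trend, 40)[1:]

# Share of lags 1..40 whose autocorrelation falls outside the bounds.
noise_share = float(np.mean(np.abs(noise_acf) > bound))
trend_share = float(np.mean(np.abs(trend_acf) > bound))
```

For the random series only a small share of lags should cross the bounds by chance, while the trending series stays far outside them at every lag. This slow decay is the typical ACF signature of a non-stationary series.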

Partial Autocorrelation Function (PACF)

Unlike the ACF, the partial autocorrelation function indicates only the association between two observations that the shorter lags between them do not explain. The partial autocorrelation for lag 3, for example, is just the correlation that lags 1 and 2 do not explain. In other words, the partial autocorrelation for each lag is the unique correlation between the two observations after the intermediate correlations have been removed.
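One way to see what "removing the intermediate correlations" means is the regression view: the partial autocorrelation at lag k can be estimated as the coefficient on y_{t-k} when y_t is regressed on y_{t-1}, ..., y_{t-k}. The sketch below uses that approach on simulated data. `pacf_ols` is a hypothetical helper and only one of several estimation methods, so its values may differ slightly from those of a library.

```python
import numpy as np

def pacf_ols(series, max_lag):
    """Partial autocorrelations for lags 0..max_lag via least-squares regression."""
    y = np.asarray(series, dtype=float)
    y = y - y.mean()
    n = len(y)
    values = [1.0]  # lag 0 is 1 by convention
    for k in range(1, max_lag + 1):
        # Column j holds y[t - j] for t = k .. n-1, with j = 1 .. k.
        X = np.column_stack([y[k - j:n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        values.append(float(coef[-1]))  # coefficient on the longest lag
    return np.array(values)

# Simulated series in which each value depends only on the previous one.
rng = np.random.default_rng(0)
n = 2000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + eps[t]

partial = pacf_ols(y, 3)
acf_lag2 = float(np.corrcoef(y[2:], y[:-2])[0, 1])  # ordinary lag-2 correlation
```

In this simulation the ordinary lag-2 correlation is clearly positive because the influence passes through lag 1, whereas the partial autocorrelation at lag 2 is close to zero once lag 1 is accounted for.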

As previously stated, the autocorrelation function helps in determining the properties of a time series. The partial autocorrelation function (PACF), on the other hand, is more useful during the specification phase of an autoregressive model. Partial autocorrelation plots can be used to specify regression models with time series data as well as Auto-Regressive Integrated Moving Average (ARIMA) models.
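As a rough sketch of how the PACF can guide the specification of an autoregressive model, the example below simulates a series that depends on its two previous values and lists the lags whose partial autocorrelation exceeds the approximate bound of 1.96 divided by the square root of n. The helper `pacf_ols`, the coefficients 0.5 and 0.3, and the bound are all assumptions made for illustration.

```python
import numpy as np

def pacf_ols(series, max_lag):
    """Partial autocorrelations for lags 0..max_lag via least-squares regression."""
    y = np.asarray(series, dtype=float)
    y = y - y.mean()
    n = len(y)
    values = [1.0]
    for k in range(1, max_lag + 1):
        X = np.column_stack([y[k - j:n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        values.append(float(coef[-1]))
    return np.array(values)

# Simulated autoregressive series of order 2 (hypothetical coefficients).
rng = np.random.default_rng(3)
n = 3000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + eps[t]

values = pacf_ols(y, 6)
bound = 1.96 / np.sqrt(n)  # approximate 95% bound
significant_lags = [k for k in range(1, 7) if abs(values[k]) > bound]
```

Lags 1 and 2 stand out clearly, which points toward an autoregressive model of order 2. Higher lags may occasionally cross the bound by chance, so the cut-off is a guide rather than a strict rule.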

Implementing ACF and PACF in Python

In this section, we’ll create the ACF and PACF plots and interpret them. For this, we’ll use functionality from statsmodels and pandas. The dataset holds monthly electricity consumption from 1985 to 2018.

Now, let’s import the dependencies required.

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

Below we’ll load and observe our series. 

data = pd.read_csv('/content/Electric_Production.csv',index_col='DATE', parse_dates=True)

data.plot(figsize=(8,4))

Plotting ACF

Now let’s plot ACF. As previously stated, autocorrelation depicts the association of a sequence with itself after a certain number of time units. When plotted, the X-axis represents the lag number, and the Y-axis represents the correlation of the sequence with a sequence at that lag. The Y-axis scales from -1 to 1.

Our dataset covers monthly consumption from 1985 to 2018. Autocorrelation can be used to answer questions such as “How strongly is this month’s consumption related to consumption in the previous month?” The previous month corresponds to a lag of 1.

The ACF can be calculated using the statsmodels function acf(), and then plotted using the plot_acf() function.

# calculate acf
acf_values = acf(data['Value'])

# keeping lag as 30
plot_acf(data['Value'], lags=30);

The plot shows how consumption is correlated with consumption 12 months earlier, 24 months earlier, and so on. Bars that fall inside the shaded area are not statistically significant.
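The 12-month pattern can be reproduced on synthetic data. The sketch below builds a hypothetical monthly series with a yearly cycle (it is not the electricity dataset) and measures its correlation with itself at a few lags. `lagged_correlation` is a helper defined only for this illustration.

```python
import numpy as np

def lagged_correlation(y, lag):
    """Pearson correlation between y[t] and y[t - lag]."""
    return float(np.corrcoef(y[lag:], y[:-lag])[0, 1])

rng = np.random.default_rng(2)
months = np.arange(240)  # 20 years of hypothetical monthly observations

# Yearly cycle plus a little random noise.
consumption = 100 + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(size=months.size)

r6 = lagged_correlation(consumption, 6)    # half a year apart: opposite phase
r12 = lagged_correlation(consumption, 12)  # one year apart: same phase
r24 = lagged_correlation(consumption, 24)  # two years apart: same phase
```

Lags that are multiples of 12 line up the same point in the yearly cycle and give strong positive correlation, while a lag of 6 lines up opposite points and gives strong negative correlation. Peaks at regular intervals in an ACF plot are the usual sign of seasonality.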

Plotting PACF

The PACF plot is a little harder to follow. It performs the same function as regular autocorrelation in that it displays the correlation of a sequence with itself after a certain number of time units have passed. However, there is a catch. All intermediate effects are removed, leaving only the direct effect visible.

You might, for example, be interested in the direct relationship between today’s consumption and that of a year ago. What happens in between is not of interest.

Consumption 12 months ago affects consumption 11 months ago, which in turn affects consumption 10 months ago, and the chain continues up to the most recent period. Partial autocorrelation estimates remove these indirect effects.

# calculate pacf
pacf_values = pacf(data['Value'])

# plot pacf, keeping lag as 30
plot_pacf(data['Value'], lags=30);

If you look at the PACF plot above at regular intervals, the value at lag 12 shows a correlation with the current observation (lag 0). At lag 24 the correlation is weaker, and it continues to weaken at longer lags.

Final words

In this article, we discussed correlation and how it extends to autocorrelation and partial autocorrelation. Before modelling a time series, one should analyze its autocorrelation and partial autocorrelation in order to gain proper insight into the series. These steps provide evidence that the observations depend on their own past values. By observing and interpreting plots like the ones above, one can decide which time series modelling technique to use next, such as an autoregressive or moving average model.

Vijaysinh Lendave
Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, and model building.
