In a time series, data points are recorded in time order, so tracking how any variable changes over time produces a time series. Because the observations are taken at adjacent time intervals, successive changes can be related to one another. This relationship can be measured as a correlation.
In statistical analysis, correlation is a mathematical measure of the relationship between two variables: it tells us how strongly, and in which direction, the two move together. With time-series data, many external factors can affect the series, but a lagged version of the series is derived only from the series itself. For example, the rise or fall of a share price today depends partly on its price on the previous day. The correlation coefficient ranges from -1 to +1, where +1 denotes a perfect positive relationship, -1 denotes a perfect negative relationship, and 0 denotes no linear relationship.
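As a small illustration of the two ends of that range, the sketch below computes the correlation coefficient with NumPy's `np.corrcoef`. The arrays are made-up values chosen for the example, not data from this article.

```python
import numpy as np

# Made-up values for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2.0 * x + 1.0      # rises exactly in step with x
y_down = -3.0 * x + 10.0  # falls exactly in step with x

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient.
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

print(round(r_up, 6), round(r_down, 6))
```

The first coefficient comes out at the positive end of the range and the second at the negative end, because both relationships are exactly linear.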
We use autocorrelation to measure the degree of similarity between a time series and a lagged copy of itself. Conceptually it is similar to correlation; the only difference is that correlation measures the relationship between two different variables, whereas autocorrelation measures the relationship between two versions of the same time series: the series and its own lagged (older) values.
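To make the idea of "a series against its own lagged copy" concrete, here is a minimal sketch using pandas. The series is a synthetic random walk generated for the example (an assumption, not the article's data), and the lag of 1 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in series (a random walk), not the article's data.
rng = np.random.default_rng(0)
series = pd.Series(np.cumsum(rng.normal(size=200)))

lag = 1
# Correlate the series with a copy of itself shifted back by `lag` steps.
manual = series.corr(series.shift(lag))
# pandas provides the same calculation directly.
builtin = series.autocorr(lag=lag)

print(f"lag-{lag} autocorrelation: manual={manual:.4f}, builtin={builtin:.4f}")
```

Both numbers should agree, since `Series.autocorr` performs the same shift-and-correlate step.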
Autocorrelation has many applications as a tool. For example, we can use it when predicting tomorrow’s stock prices or when forecasting the upcoming rainy season.
Autocorrelation measures the relationship between the present value of a time series and its past values, and, like correlation, it ranges from -1 to 1. It is easier to interpret when shown as a graph. Next, we discuss the autocorrelation graph and make one using Python and its libraries.
We can make an autocorrelation plot using the matplotlib and statsmodels libraries. Let’s see how we can do that in Google Colab.
Importing the libraries.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
We will discuss autocorrelation in two kinds of data sets: one is an increasing time series, and the other is a decreasing time series.
For the increasing time series, we have Google share price data, which gives the price of Google shares for the year 2005. We will see in the data how the price increases and how the time series is autocorrelated.
Reading the data
data = pd.read_csv("/content/drive/MyDrive/Yugesh/AC and PAC plots/google.txt", encoding="utf16", sep="\t", index_col='date')
data
Let’s draw a line plot of the data to see its increasing nature.
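The plotting code for this step is not shown, so the following is only a sketch of how such a line plot might be drawn. It uses a synthetic, upward-drifting series as a stand-in for the share price file; the column name `price` and index name `date` follow the data loaded earlier, while the values themselves are invented.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the share price data: business-day values for 2005
# that drift upward. These numbers are invented for illustration.
rng = np.random.default_rng(1)
dates = pd.date_range("2005-01-03", periods=250, freq="B", name="date")
demo = pd.DataFrame(
    {"price": 100 + np.cumsum(rng.normal(loc=0.3, scale=1.0, size=250))},
    index=dates,
)

# Line plot of the price column against the date index.
ax = demo["price"].plot(figsize=(10, 4), title="Price over time (synthetic)")
ax.set_xlabel("date")
ax.set_ylabel("price")
ax.grid(True)
plt.show()
```

With the real data, replacing `demo` with the `data` frame read above should give the corresponding plot.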
Here we can see that the share price decreased for some time and then increased steadily with time.
Let’s make an autocorrelation plot to see the degree of the relationship between the time series and its own lagged values.
Here we are using matplotlib to make the autocorrelation plot.
plt.acorr(data['price'], maxlags=10)
print("The autocorrelation plot for the data is:")
plt.grid(True)
plt.show()
Here we can see that the autocorrelation of the time series is mostly on the positive side, and at lag 0 it is perfectly positive, since a series is always perfectly correlated with itself. From this plot, we can infer that the present values of the time series are highly correlated with its older values.
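One detail that may help when reading such a plot: `plt.acorr` also returns the lags and the normalized values it draws, and by default it does not subtract the mean of the series before correlating. The sketch below, on a synthetic series invented for the example, centers the data first and then reads back the value at lag 0.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic upward-drifting series, invented for illustration.
rng = np.random.default_rng(2)
x = 100 + np.cumsum(rng.normal(loc=0.3, scale=1.0, size=250))

# Subtract the mean first: plt.acorr does not detrend by default.
centered = x - x.mean()

# acorr draws the plot and also returns the lags and normalized values.
lags, values, _, _ = plt.acorr(centered, maxlags=10)
plt.grid(True)
plt.show()

# With the default normalization, the value at lag 0 is 1.
lag0 = values[lags == 0][0]
print("autocorrelation at lag 0:", lag0)
```

Whether to center the series is a choice; without centering, a series of large positive values tends to show values close to 1 at every lag.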
The statsmodels library gives a more informative autocorrelation plot. Its tsaplots module provides this facility through the plot_acf function. Let’s try that as well.
Here in the plot, the blue area shows the confidence interval, and the candles (vertical lines) represent the autocorrelation at each lag of the time series. We can also see that the autocorrelation stays positive across consecutive lags.
Let’s also plot this for our alcohol sales dataset, which decreases with time.
Reading the dataset.
data1 = pd.read_csv("/content/drive/MyDrive/Yugesh/Trend, Season, Cycle/AlcoholSale.csv", index_col='DATE')
data1
Let’s draw a line graph to understand the nature of the time series.
Here we can see that the number of units of alcohol sold is decreasing over time.
Let’s check the autocorrelation of the time series using the autocorrelation plot.
We can also see that the relationship between the points and their past values is strongly positive.
Here we can say that, for a time series that moves steadily in one direction, the autocorrelation does not depend on whether the series is increasing or decreasing: both cases give a strong positive autocorrelation. Rather, every point in the time series helps to define the nature of the whole time series.
Partial autocorrelation:
Mathematically, partial autocorrelation in a time series is the degree of correlation between an observation at a given time and its own lagged value after the influence of the observations in between has been removed. This differs from autocorrelation, where the relationship between a point and a lagged value also includes the indirect effect carried through the intermediate points.
For example, in daily temperature prediction, suppose we predict tomorrow’s temperature from today’s temperature without considering any older data points. Today’s temperature is the immediately preceding (lag-1) data point, and tomorrow’s temperature is the present value. The relationship between tomorrow’s and today’s temperature, without considering any older data, is the partial autocorrelation.
We can visualize the partial autocorrelation using the statsmodels library in Python. Let’s see how to do this with the data we imported before.
Visualising the partial autocorrelation plot for the Google share price data.
Here we can see that the first two points (lags 0 and 1) are highly correlated; after these two points, we can see how the partial autocorrelation varies from lag to lag. The value at lag 0 is always 1, because a series is perfectly correlated with itself, and for a strongly trending series like this one the value at lag 1 is also high.
Let’s check the partial autocorrelation of the alcohol data.
Here we can compare the partial autocorrelation plots of the two datasets. For the Google share price data, the line graph showed large deviations over time, and correspondingly the partial autocorrelation values deviate considerably from the zero line. In the alcohol data, by contrast, the line graph showed that UNIT decreased steadily with time, so the candles in the partial autocorrelation plot lie mostly around the zero line.
In this article, we have seen how to make autocorrelation (AC) plots and partial autocorrelation (PAC) plots and how to draw inferences about a time series from them.
Here in the article all the information is gathered from