In data science, a time series is a sequence of data points collected in time order, usually at successive, equally spaced intervals and obtained through observation over time. Because the values depend on time, they may increase, decrease, or stay the same as time passes. Examples of time series include weather records covering two or more years, stock market data, and so on.
Because the data points in a time series are collected at adjacent time periods, there is a relationship between neighbouring observations, whether proportional or not. This dependence between observations is the feature that differentiates time series from ordinary data.
Visualisation is the best way to understand time-series data; converting data points into graphs gives us an overall view. So let’s try to understand time series data with a line graph. To make the line graph, we will use Python as our programming language and Google Colab as our notebook, and we will visualise the alcohol sales dataset.
Importing pandas for reading data and making line graphs of the dataset.
import pandas as pd
Reading the data and checking its first few rows.
data = pd.read_csv('/content/drive/MyDrive/Yugesh/Trend, Season, Cycle/AlcoholSale.csv', index_col='DATE')
data.index = pd.to_datetime(data.index)
data.head()
Here we can see that we have chosen the date column as our index. As discussed before, the date-time feature is what makes time series data different from ordinary data. Next, in the units column, we have the units sold on different dates. Let’s make a line graph for this data.
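A line graph of a date-indexed column can be sketched as follows. Since the CSV file above lives on a private drive, this sketch uses a small placeholder frame with made-up values; the `demo` name and its numbers are assumptions for illustration only, and with the real data you would plot `data` instead.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data (assumption): 24 monthly values in a 'units' column,
# indexed by date like the sales data in the article.
dates = pd.date_range("1992-01-01", periods=24, freq="MS")
demo = pd.DataFrame({"units": np.linspace(200, 100, 24)}, index=dates)
demo.index.name = "DATE"

# pandas draws a line graph by default when the index is a DatetimeIndex.
ax = demo["units"].plot(title="Units over time")
ax.set_xlabel("DATE")
ax.set_ylabel("units")
plt.show()
```

With the real dataset loaded as above, `data.plot()` would be the equivalent one-liner.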
In the line graph, we can see how sales change over consecutive years. From this graph, we can understand how sales move over the whole period: they decrease with time. For example, in 1992 the value of units is around 15000-16000, and in 2020 it is around 3000-4000.
In time-series analysis, we look for the reasons behind the changes that occur over time. There may be many of them. For example, an umbrella vendor sells more units in the rainy season and fewer outside it; this is the effect of the season on a time series. Another example is the rising temperature of the Earth, where the main reason is global warming. Finding the reasons behind the changes in a time series, so that we can anticipate its future values, is the main goal of this kind of analysis.
Mathematically, time-series data can be broken into three components:
- Trend
- Season
- Cycle
These three are among the most important components of any time series, and they play a large part in shaping it. So let’s discuss each of them.
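To make the idea of components concrete, here is a minimal sketch that builds a synthetic series out of a trend, a season, and a cycle. It assumes the components simply add together, which is only one possible way of combining them, and all the numbers (slope, amplitudes, 12-month and 60-month periods) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Ten years of monthly time steps (synthetic, for illustration only).
dates = pd.date_range("2000-01-01", periods=120, freq="MS")
t = np.arange(120)

trend = 0.5 * t                             # steady upward movement
season = 10 * np.sin(2 * np.pi * t / 12)    # pattern repeating every 12 months
cycle = 5 * np.sin(2 * np.pi * t / 60)      # slower swing repeating every 60 months

# Assumption: the three components are combined additively.
series = pd.Series(trend + season + cycle, index=dates, name="value")
print(series.head())
```

Plotting `series` next to each component is a quick way to see how the three pieces shape the final curve.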
What is Trend?
The trend is a pattern in the graph of a time series that shows its overall movement. This movement is observed over the data points of a long period. We say a time series has a trend if its graph shows an increasing or decreasing slope. A trend usually comes and goes; it does not necessarily persist for the whole data set. For example, in a time series of viewers of a viral YouTube video, there is a trend while the video is going viral; once the video gets older, the trend disappears.
The trend can be of three types:
- Increasing: when the general pattern is on an upward slope.
- Decreasing: when the general pattern is on a downward slope.
- Horizontal: when the general pattern is without slope.
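One rough way to tell these three types apart in code is to fit a straight line and look at the sign of its slope. The sketch below does that with `numpy.polyfit`; the `classify_trend` helper and its `tol` threshold are hypothetical names chosen for this example, and the three short series are synthetic.

```python
import numpy as np
import pandas as pd

def classify_trend(series, tol=0.01):
    """Label a series by the slope of a least-squares line (hypothetical helper)."""
    x = np.arange(len(series))
    slope = np.polyfit(x, series.to_numpy(dtype=float), 1)[0]
    if slope > tol:
        return "increasing"
    if slope < -tol:
        return "decreasing"
    return "horizontal"

# Three synthetic 30-day series, one per trend type.
dates = pd.date_range("1981-01-01", periods=30, freq="D")
rising = pd.Series(np.linspace(4, 7, 30), index=dates)
falling = pd.Series(np.linspace(20, 15, 30), index=dates)
flat = pd.Series(np.full(30, 11.0), index=dates)

print(classify_trend(rising), classify_trend(falling), classify_trend(flat))
```

The right value for `tol` depends on the units and sampling rate of the data, so treat it as something to tune rather than a fixed rule.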
Let’s see an example by visualising the temperature data set.
data = pd.read_excel("/content/drive/MyDrive/Yugesh/Trend, Season, Cycle/temprature.xlsx", index_col='Date')
data.index = pd.to_datetime(data.index)
data.head()
In the data set, we have the temperature from 1981 to 1991. Temperature changes throughout the year: it is lower in winter and higher in summer, so it is a good example for understanding the trend.
Let’s make a graph for the whole data set.
Here we can see that the overall pattern of the temperature neither increases nor decreases; there is no slope in the temperature values. So we can consider it a horizontal trend.
Let’s check the trend for the year 1982.
Here, over the whole year, we can see that the overall pattern has a downward slope for approximately the first half of the year and an upward slope for the second half. So we can say the trend is decreasing for the first half of the year and increasing for the second half.
Let’s check the month where the temperature is at its lowest; from the graph, we can say that the temperature is lowest in July.
Here in the graph, we can see that the temperature was around 4 degrees at the start of July and around 7 degrees at the end, so we can consider the trend to be increasing for this month.
Let’s check January 1981.
Here in the output, we can see that the temperature decreased from around 20 to 15 degrees over the month, so the trend for the whole month is decreasing.
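The yearly and monthly views discussed above can be obtained with partial-string indexing on a `DatetimeIndex`. This is a minimal sketch on synthetic data; the `temps` frame and its `Temp` column name are assumptions, since the column names of the real spreadsheet are not shown.

```python
import numpy as np
import pandas as pd

# Synthetic daily readings for 1981-1982 (placeholder values, assumed 'Temp' column).
dates = pd.date_range("1981-01-01", "1982-12-31", freq="D")
temps = pd.DataFrame({"Temp": np.linspace(10.0, 20.0, len(dates))}, index=dates)
temps.index.name = "Date"

# A year or a year-month string selects every row that falls inside that period.
year_1982 = temps.loc["1982"]
july_1982 = temps.loc["1982-07"]
january_1981 = temps.loc["1981-01"]

print(len(year_1982), len(july_1982), len(january_1981))
```

Calling `.plot()` on any of these slices gives the corresponding line graph for that period.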
So far, we have seen how trends appear in a time series, but sometimes we need to remove them. For example, with petrol sales we may need the figure for a whole year rather than the monthly movement; in that case, we can detrend our data. To learn more about detrending, you can refer to this article.
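As a rough illustration of detrending, the sketch below fits a straight line to a series and subtracts it. This is only one simple approach and is not necessarily the one used in the referenced article; the `detrend_linear` helper and the synthetic `sales` series are assumptions made for this example.

```python
import numpy as np
import pandas as pd

def detrend_linear(series):
    """Subtract a least-squares straight line from the series (hypothetical helper)."""
    x = np.arange(len(series))
    slope, intercept = np.polyfit(x, series.to_numpy(dtype=float), 1)
    return series - (slope * x + intercept)

# Synthetic monthly sales: an upward line plus a yearly wave.
dates = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
sales = pd.Series(100 + 2.0 * t + 5 * np.sin(2 * np.pi * t / 12), index=dates)

detrended = detrend_linear(sales)
print(detrended.head())
```

After the subtraction, the remaining values fluctuate around zero, which makes the repeating pattern easier to see.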
What is Season?
Sometimes a time series may contain a seasonal component. A season is a variation in the data points that repeats over a fixed time interval. For example, in our temperature data, we saw the temperature decrease for half of the year and increase for the other half.
We have seen previously in the article that changes in data points occur with time. But what if the change is similar across different time periods? Then the changes can be considered seasonality. Seasonality can be of different types according to the length of the period.
Types of seasons are:
- Time of day.
Identifying the seasonality helps us understand our time series better. For instance, in the temperature data set, we have seen the temperature rise and fall with time.
In forecast modelling, understanding the seasonal component of a time series can improve the model’s performance, and removing the seasonal component gives a clearer picture of the time series. Next, we make a graph of the monthly mean temperature and see how it differs from the earlier graphs.
import matplotlib.pyplot as plt
# Group the readings by month ('M'; newer pandas releases prefer 'ME') and average them
resample = data.resample('M')
temp_mean = resample.mean()
print(temp_mean.head(13))
temp_mean.plot()
plt.show()
Here we can see that after taking the mean, the graph gives a clear and easily understood picture of the data set. We can draw many inferences from it; for example, we can tell that the lowest temperature occurred in 1982. Let’s look more deeply into the mean temperature data.
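One way to look more closely at monthly means is to ask pandas directly for the month with the lowest value. The sketch below uses synthetic daily readings (an assumed `Temp` column with made-up values that are coldest mid-year) and groups by calendar month with `to_period("M")`, which gives the same monthly means as resampling.

```python
import numpy as np
import pandas as pd

# Synthetic daily readings for 1981-1982: warm around January, cold around July,
# with 1982 shifted slightly colder (all values are placeholders).
dates = pd.date_range("1981-01-01", "1982-12-31", freq="D")
day = np.asarray(dates.dayofyear)
colder_year = np.asarray(dates.year == 1982, dtype=float)
values = 11 + 4 * np.cos(2 * np.pi * (day - 15) / 365.25) - 0.5 * colder_year
temps = pd.DataFrame({"Temp": values}, index=dates)

# Mean per calendar month, then the month with the lowest mean.
monthly_mean = temps.groupby(temps.index.to_period("M")).mean()
coldest_month = monthly_mean["Temp"].idxmin()

print(coldest_month, round(monthly_mean["Temp"].min(), 2))
```

On the real data, the same two lines applied to `data` would report which month had the lowest mean temperature.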
Here we can see that the temperature in July 1982 was the lowest in the whole data set. These kinds of inferences can be extracted after some deseasoning. To read about removing seasonality, you can refer to this article.
In this article, we have seen what a trend is and its increasing, decreasing, and horizontal types. In addition, we have seen the basic idea of the season, how it affects forecast modelling, and how we can remove it to see a clearer picture.
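A simple way to sketch deseasoning is to subtract, from every value, the average for its calendar month. This is an assumption about how the seasonal part might be removed, not necessarily the method of the referenced article, and the `monthly` series below is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic monthly temperatures: a yearly wave plus a slow upward drift.
dates = pd.date_range("1981-01-01", periods=48, freq="MS")
t = np.arange(48)
monthly = pd.Series(11 + 4 * np.cos(2 * np.pi * t / 12) + 0.05 * t, index=dates)

# Average value of each calendar month (all Januaries, all Februaries, ...),
# repeated so it lines up with the original series.
seasonal_profile = monthly.groupby(monthly.index.month).transform("mean")

# What is left after removing the repeating yearly pattern.
deseasoned = monthly - seasonal_profile
print(deseasoned.head())
```

In this synthetic case the yearly wave disappears and only the year-to-year drift remains in `deseasoned`.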
What is Cycle?
In time-series data, some data points occur periodically at similar time intervals. This kind of repetition can be considered a cycle in the time series. For example, in a time series of the Earth’s days and nights, a full moon occurs roughly every 29.5 days, so we can say the cycle will be there till the end of the time series. The majority of prediction problems in forecasting arise from trends and seasons. A cycle should be stable over time, but the trend and season can change how the cycle appears in the time series.
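To illustrate how a repeating interval might be found in code, the sketch below estimates the strongest period of a series from its frequency spectrum using NumPy's FFT. This technique is not part of the article; the `dominant_period` helper is a hypothetical name, and the test signal with a 12-step period is synthetic.

```python
import numpy as np

def dominant_period(values):
    """Return the strongest repeating period, in samples (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()          # drop the constant level
    spectrum = np.abs(np.fft.rfft(centered))   # strength of each frequency
    spectrum[0] = 0.0                          # ignore the zero-frequency bin
    peak = int(np.argmax(spectrum))
    if peak == 0:                              # flat input: nothing repeats
        return None
    return len(values) / peak

# Synthetic signal that repeats every 12 steps.
t = np.arange(240)
signal = np.sin(2 * np.pi * t / 12)
print(dominant_period(signal))
```

On real data, a trend or a strong season can dominate the spectrum, so it usually helps to remove them before looking for a cycle this way.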
All the information in the article is gathered from:
- Google Colab for the code.
- Temperature and alcohol data.