Guide to Giotto-Time: A Time-Series Forecasting Python Library

Giotto-Time is an open-source Python library for time-series forecasting in machine learning, with concise code and built-in pipelines.


Published on April 21, 2021

by Rajkumar Lakshmanamoorthy

Giotto-Time is an open-source Python library for time-series forecasting in machine learning. It is built on top of scikit-learn, with a few modifications and wrappers that support end-to-end time-series analysis in a single workflow. Giotto-Time covers every task associated with time-series analysis. With this library, Giotto extends its collection of open-source tools for machine learning tasks.

Time-series forecasting is a centuries-old method of predicting the future from past data. It finds applications in a variety of domains, including e-commerce, finance, space science, weather forecasting and medical sciences. Unlike time-insensitive structured data, time-series data need extra care at every stage of problem solving. Preprocessing is one of the more difficult tasks and often requires domain expertise.

Giotto-Time was introduced to simplify time-series modeling. The library offers data preprocessing, data cleaning, data extraction, data analysis, forecast modeling and causality testing in very few lines of code. Data analysis in Giotto-Time is tied to data visualization, for which the library provides dedicated plots. The visualization module is built on top of Matplotlib.

We explore the Giotto-Time library below with some examples and hands-on code. Giotto-Time is available as a PyPI package, so we can simply pip install it.

!pip install giotto-time

Time-Series Forecasting with Giotto-Time

Import the necessary libraries and modules.

 import pandas as pd
 import numpy as np
 import matplotlib.pyplot as plt
 from sklearn.linear_model import LinearRegression
 from gtime.preprocessing import TimeSeriesPreparation
 from gtime.compose import FeatureCreation
 from gtime.feature_extraction import Shift, MovingAverage
 from gtime.feature_generation import PeriodicSeasonal, Constant, Calendar
 from gtime.model_selection import horizon_shift, FeatureSplitter
 from gtime.forecasting import GAR

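Define a function to generate some synthetic time-series data with Pandas and NumPy.

 def test_time_series():
     # pandas.util.testing is deprecated and absent from recent pandas releases,
     # so build the frame directly: 500 daily rows of standard-normal noise
     # in a single column named "A".
     index = pd.date_range("2000-01-01", periods=500, freq="D")
     values = np.random.randn(500)
     return pd.DataFrame({"A": values}, index=index)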

Generate synthetic time-series data.

 time_series = test_time_series()
 print(f'Time series shape: {time_series.shape}')
 print(f'Time series index type: {time_series.index.__class__}')

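Output:

 Time series shape: (500, 1)
 Time series index type: <class 'pandas.core.indexes.datetimes.DatetimeIndex'>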

The time-series data must have a PeriodIndex to proceed further. The Giotto-Time library offers a preprocessing module that converts the data from a DatetimeIndex to a PeriodIndex.

 time_series_preparation = TimeSeriesPreparation()
 period_index_time_series = time_series_preparation.transform(time_series)
 print(f'Time series index type after the preprocessing: \n{period_index_time_series.index.__class__}')

Output:

 Time series index type after the preprocessing: 
 <class 'pandas.core.indexes.period.PeriodIndex'>
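For intuition, the same kind of index conversion can be sketched in plain pandas. This is not TimeSeriesPreparation's internal code, only an illustration of how the two index types differ; the sample dates and values are made up.

```python
import pandas as pd

# Three daily timestamps: each one marks an instant in time.
stamps = pd.date_range("2021-01-01", periods=3, freq="D")
series = pd.Series([1.0, 2.0, 3.0], index=stamps)

# to_period() reinterprets each timestamp as the whole day it falls in.
as_periods = series.to_period("D")

print(type(series.index).__name__)      # DatetimeIndex
print(type(as_periods.index).__name__)  # PeriodIndex
```

A period represents a span (here, one day) rather than a point, which is why resampling and seasonal grouping are naturally expressed on a PeriodIndex.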

Let’s visualize the time-series data.

 period_index_time_series.plot(figsize=(10, 5))
 plt.show()

Extract features and generate new ones using the FeatureCreation API of Giotto-Time. Here, a moving average over a window of three time steps is computed and appended as a feature. In addition, temporal shifts of zero and one step generate two more features.

 # Feature generation pipeline
 dft = FeatureCreation(
     [('s0', Shift(0), ['time_series']), 
      ('s1', Shift(1), ['time_series']),
      ('ma3', MovingAverage(window_size=3), ['time_series']),
     ])

Fit the feature generation pipeline on the time-series data and transform it.

 X = dft.fit_transform(period_index_time_series)
 X.head(6)

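To see what these transformers produce, the sketch below reproduces the three features with plain pandas on a toy series. It assumes Shift(k) behaves like pandas' shift(k) and MovingAverage(window_size=3) like a trailing three-step rolling mean; the column names are hypothetical, not the ones Giotto-Time emits.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0], name="time_series")

features = pd.DataFrame({
    "shift_0": s.shift(0),                    # the series itself
    "shift_1": s.shift(1),                    # the previous observation
    "moving_average_3": s.rolling(3).mean(),  # mean of the current and two previous values
})
print(features)
```

The first rows contain NaN because the earlier observations they would need do not exist.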

Generate the ground truth (output variable) using the horizon_shift function.

 y = horizon_shift(period_index_time_series, horizon=3)
 y.head()

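As a rough illustration of what such a target matrix looks like, the sketch below builds a three-step horizon with plain pandas. It assumes each target column holds the series shifted backward by k steps; the column names y_1 to y_3 are hypothetical.

```python
import pandas as pd

s = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="time_series")
horizon = 3

# Column y_k holds the value observed k steps after each row.
y = pd.DataFrame({f"y_{k}": s.shift(-k) for k in range(1, horizon + 1)})
print(y)
```

The last three rows contain at least one NaN each, since their future values lie beyond the end of the series.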

Next, split the data into train and test sets using the FeatureSplitter class. Inspect a few rows from each part of the split.

 feature_splitter = FeatureSplitter()
 X_train, y_train, X_test, y_test = feature_splitter.transform(X, y)
 X_train.tail()

 X_test

 y_train.tail()

 y_test
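The split logic can be sketched in plain pandas. The sketch assumes that rows with missing features are dropped, rows with complete targets become the training set, and rows with at least one unknown future value become the test set; it is an illustration under those assumptions, not FeatureSplitter's actual code.

```python
import pandas as pd

s = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
X = pd.DataFrame({"lag_0": s, "lag_1": s.shift(1)})
y = pd.DataFrame({"y_1": s.shift(-1), "y_2": s.shift(-2)})

# Rows with incomplete features cannot be used at all.
usable = X.notna().all(axis=1)
X, y = X[usable], y[usable]

# Rows whose targets are all known form the training set;
# rows with at least one unknown future value form the test set.
known = y.notna().all(axis=1)
X_train, y_train = X[known], y[known]
X_test, y_test = X[~known], y[~known]

print(X_train.index.tolist(), X_test.index.tolist())
```

Under this scheme the test set is simply the tail of the series, where the future is not yet observed.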

Instantiate a simple linear regression model from the scikit-learn library. Build a Generalized Auto-Regressive (GAR) model on top of it to perform simple time-series forecasting, and train the model on the training set.

 lr = LinearRegression()
 model = GAR(lr)
 model = model.fit(X_train, y_train)

Once the model is trained, use it to predict future values.

 predictions = model.predict(X_test)
 predictions

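Conceptually, a multi-horizon forecaster of this kind can be built by fitting one regressor per target column. The NumPy sketch below does that with ordinary least squares on synthetic data. It assumes GAR trains an independent copy of the wrapped estimator for each horizon; the helper names fit_per_horizon and predict_per_horizon are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
# Two horizons, each an exact linear function of the features.
y_train = np.column_stack([
    X_train @ np.array([1.0, 2.0]) + 3.0,
    X_train @ np.array([-1.0, 0.5]) - 2.0,
])

def fit_per_horizon(X, Y):
    """Fit one least-squares model (with intercept) per target column."""
    design = np.column_stack([X, np.ones(len(X))])
    return [np.linalg.lstsq(design, Y[:, j], rcond=None)[0]
            for j in range(Y.shape[1])]

def predict_per_horizon(models, X):
    """Stack the per-horizon predictions into one column per horizon."""
    design = np.column_stack([X, np.ones(len(X))])
    return np.column_stack([design @ coef for coef in models])

models = fit_per_horizon(X_train, y_train)
predictions = predict_per_horizon(models, np.array([[1.0, 1.0]]))
print(predictions)
```

Because the synthetic targets are noise-free, the fitted models recover the generating coefficients up to floating-point error.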

Find the Colab Notebook here with the above code implementation.

Time-Series Plotting with Giotto-Time

Import the necessary libraries and modules.

 import pandas as pd
 import numpy as np
 import matplotlib.pyplot as plt
 %matplotlib inline  
 from gtime.preprocessing import TimeSeriesPreparation
 from gtime.plotting import seasonal_plot, seasonal_subplots, lag_plot, acf_plot

Load the Kansas Wheat Index data from Giotto-Time’s official Google Cloud Storage bucket.

 df_sp = pd.read_csv('https://storage.googleapis.com/l2f-open-models/giotto-time/examples/data/WheatTr.csv', sep='\t')
 df_column = df_sp.set_index('Effective date ')['S&P GSCI Kansas Wheat']

Transform the data into PeriodIndex format and fill the missing values.

 df_column.index = pd.to_datetime(df_column.index)
 time_series_preparation = TimeSeriesPreparation(output_name='Wheat price index')
 period_index_time_series = time_series_preparation.transform(df_column)
 df = period_index_time_series.resample('D').ffill()  # forward-fill days without a quote

Compute the logarithmic returns of the price index, that is, the log of each value divided by the previous day's value.

 returns = (np.log(df / df.shift(1))).dropna()
 returns.columns = ['Wheat price returns']
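Log returns are convenient because they add up over time. A small pure-Python check with made-up prices:

```python
import math

prices = [100.0, 110.0, 99.0]

# Log return between consecutive observations: ln(p_t / p_{t-1}).
log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]

# The sum of the step-by-step log returns equals the log return
# over the whole interval.
total = math.log(prices[-1] / prices[0])
print(sum(log_returns), total)
```

Simple percentage returns do not share this property: a 10% gain followed by a 10% loss does not return to the starting price.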

Plot the wheat price index and its returns to visualize the data.

 ax = df.plot(figsize=(10, 5))
 ax = returns.plot(ax=ax, secondary_y=True)


Seasonal plots are powerful tools in the Giotto-Time library that show how time-series data vary over seasonal cycles such as years, months or weeks. The following code generates a seasonal plot for the price index data.

 fig = plt.figure(figsize=(6,6))
 m1 = fig.add_subplot(111, title='Seasonal plot (year/monthly)')
 seasonal_plot(df, 'year', freq='M', agg='last', ax=m1)
 plt.show()

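Behind a year/month seasonal plot sits a simple table: one aggregated value per month, with one line per year. The pandas sketch below builds such a table on a hypothetical daily series, assuming agg='last' means the last observation of each month. It does not use Giotto-Time itself.

```python
import pandas as pd

# Hypothetical daily series covering two full years (values are day counters).
index = pd.period_range("2019-01-01", "2020-12-31", freq="D")
series = pd.Series(range(len(index)), index=index, dtype=float)

# Last observation of every (year, month) pair ...
last_per_month = series.groupby([index.year, index.month]).last()
# ... arranged with months as rows and one column per year.
table = last_per_month.unstack(level=0)
print(table)
```

Plotting each column of this table against the month number gives one line per year, which is the shape a seasonal plot overlays.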

Plot the monthly returns as a seasonal plot in polar form.

 fig = plt.figure(figsize=(6, 6))
 m2 = fig.add_subplot(111, projection='polar')
 seasonal_plot(returns, 'year', freq='M', agg='last', ax=m2, polar=True)
 m2.set_title('Monthly returns')
 plt.show()


Seasonal plots can also be drawn as box-and-whisker plots. A box plot gives a basic statistical summary of each group: the median, the lower and upper quartiles, the range covered by the whiskers, and any outliers beyond them.

 seasonal_subplots(returns, 'year', 'M', agg='last', box=True)
 plt.show()

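The numbers a box plot draws can be computed by hand. The sketch below uses Python's statistics module on made-up returns and assumes the common Matplotlib convention of whiskers at 1.5 times the interquartile range.

```python
import statistics

returns = [-0.04, -0.01, 0.0, 0.01, 0.02, 0.03, 0.09]

# Quartiles: the box spans q1 to q3, with a line at the median.
q1, median, q3 = statistics.quantiles(returns, n=4, method="inclusive")
iqr = q3 - q1

# Points farther than 1.5 * IQR from the box are drawn as outliers.
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r for r in returns if r < lower_fence or r > upper_fence]
print(q1, median, q3, outliers)
```

Note that neither the mean nor the mode appears in this summary; a box plot is built entirely from order statistics.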

Lag plots have a prominent place in time-series analysis. They compare the data with lagged copies of itself. Giotto-Time’s lag plots are simple to produce. Let’s visualize the price index data in a lag plot with three different lags: one day, roughly one month (30 days) and one year (365 days).

 lag_plot(df, lags=[1, 30, 365])
 plt.show()


Let’s visualize the lag plot for the returns data.

 lag_plot(returns, lags=[1, 30, 365])
 plt.show()


The price index remains strongly autocorrelated even at a lag of one month. For the returns, however, the points scatter randomly regardless of the lag.
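This contrast is what one would expect from a random-walk-like price: levels are highly persistent, while their increments are close to uncorrelated. A NumPy sketch on synthetic data (not the wheat series) illustrates it; lag_correlation is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(scale=0.01, size=2000)  # synthetic, independent log returns
prices = 100 * np.exp(np.cumsum(returns))    # price path implied by those returns

def lag_correlation(values, lag):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    return np.corrcoef(values[:-lag], values[lag:])[0, 1]

price_corr = lag_correlation(prices, 1)
return_corr = lag_correlation(returns, 1)
print(price_corr, return_corr)
```

A lag plot is the scatter plot whose correlation this helper measures: a tight diagonal band for the prices, a shapeless cloud for the returns.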

Find the Colab Notebook here with the above code implementation.

Wrapping up

We discussed the open-source time-series forecasting Python library, Giotto-Time. We went through hands-on practice with Python code on two tasks.

Time-series forecasting
Time-series data plotting

Giotto-Time’s full potential can be explored with real-world time-series problems consisting of data cleaning, data analysis, feature generation, forecasting and causality testing.