
Guide to Giotto-Time: A Time-Series Forecasting Python Library

Giotto-Time is an open-source Python library for time-series forecasting in machine learning, with simple code and built-in pipelines.


Giotto-Time is an open-source Python library for time-series forecasting in machine learning. It is built on top of scikit-learn, with a few modifications and wrappers, to support end-to-end time-series analysis in a single workflow. It covers every task associated with time-series analysis. With Giotto-Time, Giotto extends its collection of open-source tools for machine learning.

Time-series forecasting is a long-established approach to predicting the future from past data. It finds applications in a variety of domains, including e-commerce, finance, space science, weather forecasting and medical sciences. Unlike time-insensitive structured data, time-series data need extra care at every stage of problem solving. Preprocessing in particular is difficult and often requires domain expertise.

Giotto-Time aims to make time-series modeling simple. The library supports data preprocessing, data cleaning, feature extraction, data analysis, forecast modeling and causality testing in very few lines of code. Data analysis in Giotto-Time is tied to data visualization, for which the library provides dedicated plots. The visualization module is built on top of Matplotlib.

The rest of this article explores Giotto-Time with examples and hands-on code. Giotto-Time is available as a PyPI package, so it can be installed with pip.

!pip install giotto-time

Time-Series Forecasting with Giotto-Time

Import the necessary libraries and modules. 

 import pandas as pd
 import numpy as np
 import matplotlib.pyplot as plt
 from sklearn.linear_model import LinearRegression
 from gtime.preprocessing import TimeSeriesPreparation
 from gtime.compose import FeatureCreation
 from gtime.feature_extraction import Shift, MovingAverage
 from gtime.feature_generation import PeriodicSeasonal, Constant, Calendar
 from gtime.model_selection import horizon_shift, FeatureSplitter
 from gtime.forecasting import GAR 

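Define a function to generate synthetic time-series data with NumPy and Pandas: 500 daily observations of Gaussian noise in a single column.

 def test_time_series():
     # Built directly, since pandas.util.testing (makeTimeDataFrame)
     # is not available in recent pandas releases
     index = pd.date_range("2000-01-01", periods=500, freq="D")
     df = pd.DataFrame({"A": np.random.randn(500)}, index=index)
     return df 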

Generate synthetic time-series data.

 time_series = test_time_series()
 print(f'Time series shape: {time_series.shape}')
 print(f'Time series index type: {time_series.index.__class__}') 


The time-series data should be in PeriodIndex format to proceed further. The Giotto-Time library offers a time-series preprocessing module using which we can transform the data from DatetimeIndex to PeriodIndex.

 time_series_preparation = TimeSeriesPreparation()
 period_index_time_series = time_series_preparation.transform(time_series)
 print(f'Time series index type after the preprocessing: \n{period_index_time_series.index.__class__}') 
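For intuition, the index conversion can be reproduced with plain Pandas. The snippet below is a minimal sketch, not Giotto-Time's implementation: it assumes the preparation step amounts to converting a DatetimeIndex to a PeriodIndex with `to_period` and renaming the single column to `time_series`, the column name used in the feature pipeline later on. The data and the variable names `raw` and `prepared` are illustrative.

```python
import numpy as np
import pandas as pd

# Small daily series with a DatetimeIndex (illustrative data)
dates = pd.date_range("2000-01-01", periods=5, freq="D")
raw = pd.DataFrame({"A": np.arange(5.0)}, index=dates)

# Assumed equivalent of the preparation step: PeriodIndex plus a renamed column
prepared = raw.copy()
prepared.index = raw.index.to_period("D")
prepared.columns = ["time_series"]

print(type(raw.index).__name__, "->", type(prepared.index).__name__)
```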


Let’s visualize the time-series data.

 period_index_time_series.plot(figsize=(10, 5))
 plt.show() 

Extract and generate features using the FeatureCreation API of Giotto-Time. Here, a moving average over a window of three periods is computed and appended as a feature. In addition, temporal shifts of zero and one period generate two more features.

 # Feature generation pipeline
 dft = FeatureCreation(
     [('s0', Shift(0), ['time_series']), 
      ('s1', Shift(1), ['time_series']),
      ('ma3', MovingAverage(window_size=3), ['time_series']),
     ]) 

Fit the feature generation pipeline on the time-series data and transform it.

 X = dft.fit_transform(period_index_time_series)
 X.head(6) 
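To see what these three features contain, here is a rough Pandas equivalent on a toy series. It is only a sketch: it assumes `Shift(k)` corresponds to `Series.shift(k)` and `MovingAverage(window_size=3)` to a trailing rolling mean, and the column names `shift_0`, `shift_1` and `ma_3` are made up for illustration rather than taken from the library's output.

```python
import pandas as pd

series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name="time_series")

features = pd.DataFrame({
    "shift_0": series.shift(0),        # the current value
    "shift_1": series.shift(1),        # the previous value
    "ma_3": series.rolling(3).mean(),  # mean of the current and two previous values
})
print(features)
```

The first rows contain missing values because the lag and the three-period window are not yet available there.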


Generate the ground truth (output variable) using the horizon_shift function.

 y = horizon_shift(period_index_time_series, horizon=3)
 y.head() 
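Conceptually, a horizon of 3 means that each row is paired with the values one, two and three steps ahead. The sketch below builds such targets with plain Pandas. It assumes this interpretation of `horizon=3`, and the column names `y_1` to `y_3` are illustrative.

```python
import pandas as pd

series = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="time_series")
horizon = 3

# One target column per step ahead: y_k at time t holds the value at time t + k
targets = pd.DataFrame(
    {f"y_{k}": series.shift(-k) for k in range(1, horizon + 1)}
)
print(targets)
```

The last rows have missing targets because their future values are not yet known.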


Next, split the data into train and test sets using the FeatureSplitter class, and inspect a few rows from each part.

 feature_splitter = FeatureSplitter()
 X_train, y_train, X_test, y_test = feature_splitter.transform(X, y)
 X_train.tail() 

 X_test

 y_train.tail()

 y_test
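The idea behind such a split can be sketched with Pandas alone. This is an assumption about the splitting rule, not a copy of the library's code: rows with incomplete features are dropped, rows whose targets are all known become the training set, and the remaining rows, whose future values are still unknown, become the test set.

```python
import pandas as pd

series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
X = pd.DataFrame({"s0": series, "s1": series.shift(1)})
y = pd.DataFrame({f"y_{k}": series.shift(-k) for k in (1, 2, 3)})

# Drop rows whose features are incomplete (the first row has no lagged value)
complete = X.notna().all(axis=1)
X, y = X[complete], y[complete]

# Rows with every target known form the training set; the rest form the test set
known = y.notna().all(axis=1)
X_train, y_train = X[known], y[known]
X_test, y_test = X[~known], y[~known]

print(len(X_train), "training rows,", len(X_test), "test rows")
```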

Create a simple linear regression model from scikit-learn. Wrap it in a Generalized Auto-Regressive (GAR) model to perform simple time-series forecasting, and train the model on the training set.

 lr = LinearRegression()
 model = GAR(lr)
 model = model.fit(X_train, y_train) 

Once the model is trained, use it to forecast from the test features.

 predictions = model.predict(X_test)
 predictions 
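One way to picture a multi-horizon forecaster is as a separate regression per horizon column. The NumPy sketch below illustrates that idea on made-up data. It assumes the wrapper fits one independent linear model for each target column, which may differ from how GAR is implemented internally, and all names in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                               # 50 rows, 3 features
true_coefs = np.array([[1.0, 0.5], [0.0, -1.0], [2.0, 0.0]])
Y = X @ true_coefs + 0.5                                   # 2 horizons, noise-free

# Add an intercept column, then solve one least-squares problem per horizon
X1 = np.column_stack([np.ones(len(X)), X])
coefs = []
for h in range(Y.shape[1]):
    beta, *_ = np.linalg.lstsq(X1, Y[:, h], rcond=None)
    coefs.append(beta)

# Stack the per-horizon predictions into one column per horizon
preds = np.column_stack([X1 @ beta for beta in coefs])
print(preds.shape)
```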


Find the Colab Notebook here with above code implementation.

Time-Series Plotting with Giotto-Time

Import the necessary libraries and modules. 

 import pandas as pd
 import numpy as np
 import matplotlib.pyplot as plt
 %matplotlib inline  
 from gtime.preprocessing import TimeSeriesPreparation
 from gtime.plotting import seasonal_plot, seasonal_subplots, lag_plot, acf_plot 

Load the Kansas Wheat Index data from Giotto-Time's official Google Cloud Storage bucket.

 df_sp = pd.read_csv('https://storage.googleapis.com/l2f-open-models/giotto-time/examples/data/WheatTr.csv', sep='\t')
 df_column = df_sp.set_index('Effective date ')['S&P GSCI Kansas Wheat'] 

Transform the data into PeriodIndex format and fill the missing values.

 df_column.index = pd.to_datetime(df_column.index)
 time_series_preparation = TimeSeriesPreparation(output_name='Wheat price index')
 period_index_time_series = time_series_preparation.transform(df_column)
 # ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas
 df = period_index_time_series.resample('D').ffill() 
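The effect of resampling to daily frequency with forward filling is easiest to see on a tiny series with gaps. The example below is a self-contained sketch with made-up prices: each missing day takes the most recent known value.

```python
import pandas as pd

prices = pd.Series(
    [1.0, 3.0, 6.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-06"]),
)

# Insert the missing calendar days and carry the last known price forward
daily = prices.resample("D").ffill()
print(daily)
```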

Compute the log returns of the price index to build a returns dataset.

 returns = (np.log(df / df.shift(1))).dropna()
 returns.columns = ['Wheat price returns'] 
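A log return is the logarithm of the ratio between consecutive prices. A useful property is that log returns add up over time: their sum equals the log of the overall price ratio. The short example below checks this on three made-up prices.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])

# Log return at time t: log(price_t / price_{t-1}); the first value is undefined
log_returns = np.log(prices / prices.shift(1)).dropna()

# Summing log returns gives the log of the total price change
total = log_returns.sum()
print(log_returns.round(4).tolist(), round(total, 4))
```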

Plot the Wheat price and returns to visualize the data.

 ax = df.plot(figsize=(10, 5))
 ax = returns.plot(ax=ax, secondary_y=True) 


Seasonal plots are powerful tools in the Giotto-Time library that show how time-series data vary across seasons (yearly, monthly, weekly, and so on). The following code generates a seasonal plot for the price index data.

 fig = plt.figure(figsize=(6,6))
 m1 = fig.add_subplot(111, title='Seasonal plot (year/monthly)')
 seasonal_plot(df, 'year', freq='M', agg='last', ax=m1)
 plt.show() 
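The data behind a year-by-month seasonal plot can be thought of as a table with one row per month and one column per year. The sketch below builds such a table with Pandas. It assumes that the arguments `'year'`, `freq='M'` and `agg='last'` mean "one line per year, one point per month, keeping the last observation of each month", and it uses made-up data where the value is simply a day counter.

```python
import numpy as np
import pandas as pd

# Two years of illustrative daily data: the value is the day counter
dates = pd.date_range("2020-01-01", "2021-12-31", freq="D")
frame = pd.DataFrame({
    "value": np.arange(len(dates), dtype=float),
    "year": dates.year,
    "month": dates.month,
})

# One row per month, one column per year, last observation of each month
seasonal_table = frame.pivot_table(
    index="month", columns="year", values="value", aggfunc="last"
)
print(seasonal_table.head())
```

Plotting each column of this table against the month gives one line per year, which is the shape of a seasonal plot.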


Plot the monthly returns as a seasonal plot in polar form.

 fig = plt.figure(figsize=(6, 6))
 m2 = fig.add_subplot(111, projection='polar')
 seasonal_plot(returns, 'year', freq='M', agg='last', ax=m2, polar=True)
 m2.set_title('Monthly returns')
 plt.show() 


Seasonal plots can also be drawn as box-and-whisker plots. A box plot gives a basic statistical summary of each season: the median, the quartiles, and the minimum and maximum entries, with outliers shown separately.

 seasonal_subplots(returns, 'year', 'M', agg='last', box=True)
 plt.show() 
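The numbers a box plot draws can be computed directly. The sketch below does so with NumPy on a made-up sample. It assumes the common convention, Matplotlib's default, that whiskers extend to the most extreme points within 1.5 times the interquartile range of the box.

```python
import numpy as np

values = np.arange(1, 10, dtype=float)  # illustrative sample: 1, 2, ..., 9

# The box spans the first to the third quartile, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Whiskers reach the most extreme points within 1.5 * IQR of the box
lower_whisker = values[values >= q1 - 1.5 * iqr].min()
upper_whisker = values[values <= q3 + 1.5 * iqr].max()

print(q1, median, q3, lower_whisker, upper_whisker)
```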


Lag plots have a prominent place in time-series analysis. They compare the data with its own temporal lags. Giotto-Time's lag plots are simple to produce. Let's visualize the price index data in a lag plot with three different lags: one day, one month (30 days) and one year (365 days).

 lag_plot(df, lags=[1, 30, 365])
 plt.show() 


Let’s visualize the lag plot for the returns data. 

 lag_plot(returns, lags=[1, 30, 365])
 plt.show()


The price index remains strongly autocorrelated even at a lag of one month. The returns, in contrast, look random irrespective of the lag.
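This contrast between prices and returns can be reproduced numerically on synthetic data. The sketch below simulates a random-walk price (an assumption made for illustration, not the wheat data) and compares the autocorrelation of prices and of log returns using Pandas' `Series.autocorr`.

```python
import numpy as np
import pandas as pd

# Simulated price path: the exponential of a cumulative sum of small random steps
rng = np.random.default_rng(42)
log_steps = rng.normal(loc=0.0, scale=0.01, size=2000)
prices = pd.Series(100.0 * np.exp(np.cumsum(log_steps)))
returns = np.log(prices / prices.shift(1)).dropna()

# Correlation of each series with a lagged copy of itself
for lag in (1, 30):
    print(lag, round(prices.autocorr(lag), 3), round(returns.autocorr(lag), 3))
```

Prices stay close to their recent past, so their lag correlation is high, while returns computed from independent steps show correlations near zero.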

Find the Colab Notebook here with the above code implementation.

Wrapping up

We discussed Giotto-Time, an open-source Python library for time-series forecasting, and worked through hands-on Python code for two tasks:

  1. Time-series forecasting
  2. Time-series data plotting

Giotto-Time’s full potential can be explored with real-world time-series problems consisting of data cleaning, data analysis, feature generation, forecasting and causality testing.


Rajkumar Lakshmanamoorthy

A geek in Machine Learning with a Master's degree in Engineering and a passion for writing and exploring new things. Loves reading novels, cooking, practicing martial arts, and occasionally writing novels and poems.