Giotto-Time is an open-source Python library for time-series forecasting with machine learning. It is built on top of scikit-learn, with a few modifications and wrappers that support end-to-end time-series analysis in a single workflow. Giotto-Time covers every task associated with time-series analysis. With the Giotto-Time library, Giotto extends its list of powerful open-source tools for machine learning tasks.
Time-series forecasting is a centuries-old method of predicting the future from past data. It finds applications in a variety of domains, including e-commerce, finance, space science, weather forecasting and medical sciences. Unlike time-insensitive structured data, time-series data need extra care at every stage of problem solving. Preprocessing time-series data is one of the harder tasks and often requires domain expertise.
Giotto-Time was introduced to simplify time-series modeling. The library offers data preprocessing, data cleaning, data extraction, data analysis, forecast modeling and causality testing in very few lines of code. Data analysis in Giotto-Time is paired with data visualization, for which the library provides dedicated plots. The data visualization module is built on top of the Matplotlib library.
In what follows, we explore the Giotto-Time library through examples and hands-on code. Giotto-Time is available as a PyPI package, so we can simply pip install it.
!pip install giotto-time
Time-Series Forecasting with Giotto-Time
Import the necessary libraries and modules.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from gtime.preprocessing import TimeSeriesPreparation
from gtime.compose import FeatureCreation
from gtime.feature_extraction import Shift, MovingAverage
from gtime.feature_generation import PeriodicSeasonal, Constant, Calendar
from gtime.model_selection import horizon_shift, FeatureSplitter
from gtime.forecasting import GAR
Define a function to generate some synthetic time-series data using Pandas’ testing module.
def test_time_series():
    from pandas.util import testing as testing
    # 500 rows, 1 column
    testing.N, testing.K = 500, 1
    df = testing.makeTimeDataFrame(freq="D")
    return df
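The `pandas.util.testing` module used above is deprecated and may be missing from recent pandas releases. As a fallback, here is a minimal sketch that builds comparable data with NumPy and pandas only. The function name `make_synthetic_series`, the start date, the seed and the column name `time_series` are illustrative assumptions, not part of Giotto-Time or pandas.

```python
import numpy as np
import pandas as pd


def make_synthetic_series(n_periods=500, start="2000-01-01", seed=0):
    """Return a one-column DataFrame of Gaussian noise on a daily DatetimeIndex."""
    rng = np.random.default_rng(seed)
    index = pd.date_range(start=start, periods=n_periods, freq="D")
    # Hypothetical column name, chosen only for illustration
    return pd.DataFrame({"time_series": rng.standard_normal(n_periods)}, index=index)


synthetic = make_synthetic_series()
print(synthetic.shape)                 # (500, 1)
print(type(synthetic.index).__name__)  # DatetimeIndex
```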
Generate synthetic time-series data.
time_series = test_time_series()
print(f'Time series shape: {time_series.shape}')
print(f'Time series index type: {time_series.index.__class__}')
The time-series data should be in PeriodIndex format to proceed further. The Giotto-Time library offers a time-series preprocessing module with which we can transform the data from DatetimeIndex to PeriodIndex.
time_series_preparation = TimeSeriesPreparation()
period_index_time_series = time_series_preparation.transform(time_series)
print(f'Time series index type after the preprocessing: \n{period_index_time_series.index.__class__}')
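For intuition, the index conversion itself can be reproduced with plain pandas. This sketch only illustrates the DatetimeIndex-to-PeriodIndex step using `DataFrame.to_period`. It is not Giotto-Time's implementation, and `TimeSeriesPreparation` may do more than this; the frame and column names are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=5, freq="D")
frame = pd.DataFrame({"time_series": np.arange(5.0)}, index=dates)

# Convert timestamps to daily periods; the values are left untouched
period_frame = frame.to_period("D")

print(type(frame.index).__name__)         # DatetimeIndex
print(type(period_frame.index).__name__)  # PeriodIndex
```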
Let’s visualize the time-series data.
period_index_time_series.plot(figsize=(10, 5))
plt.show()
Extract features and generate new ones using the FeatureCreation API of Giotto-Time. Here, a moving average over a window of three periods is computed and appended as a feature. In addition, temporal shifts of zero and one period generate two more features.
# Feature generation pipeline
dft = FeatureCreation(
    [('s0', Shift(0), ['time_series']),
     ('s1', Shift(1), ['time_series']),
     ('ma3', MovingAverage(window_size=3), ['time_series']),
     ])
Fit the feature generation pipeline to the time-series data and transform it.
X = dft.fit_transform(period_index_time_series)
X.head(6)
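To see what these three features contain, they can be approximated with pandas alone. This is a sketch under the assumption that `Shift(k)` lags the series by k periods and that `MovingAverage(window_size=3)` is a trailing three-period mean. The column names are illustrative and may differ from those Giotto-Time generates.

```python
import pandas as pd

series = pd.Series(
    [1.0, 2.0, 4.0, 7.0, 11.0],
    index=pd.period_range("2000-01-01", periods=5, freq="D"),
    name="time_series",
)

features = pd.DataFrame({
    "shift_0": series.shift(0),               # the series itself
    "shift_1": series.shift(1),               # previous period's value
    "ma_3": series.rolling(window=3).mean(),  # mean of the current and two previous values
})
print(features)
```

The leading rows contain missing values because a lag or a window needs earlier observations that do not exist yet.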
Generate the ground truth (output variable) using the horizon_shift function.
y = horizon_shift(period_index_time_series, horizon=3)
y.head()
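The idea behind a multi-step target can be sketched in pandas. The assumption here is that a horizon of 3 yields one target column per step ahead, each holding the series shifted backwards by that many periods. The column names `y_1` to `y_3` are illustrative, not necessarily those produced by `horizon_shift`.

```python
import pandas as pd

series = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 50.0],
    index=pd.period_range("2000-01-01", periods=5, freq="D"),
)

horizon = 3
# Column y_k holds the value observed k periods after each row's timestamp
targets = pd.DataFrame({f"y_{k}": series.shift(-k) for k in range(1, horizon + 1)})
print(targets)
```

The last rows have missing targets because their future values are not yet known.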
Next, split the data into train and test sets using the FeatureSplitter class, and inspect a few rows from each split.
feature_splitter = FeatureSplitter()
X_train, y_train, X_test, y_test = feature_splitter.transform(X, y)
X_train.tail()
X_test
y_train.tail()
y_test
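One plausible way to read this split, sketched with pandas on toy data: rows with incomplete features are dropped, rows whose targets are fully known form the training set, and rows whose future is still unknown are held out for forecasting. This rule is an assumption made for illustration, not a statement of how `FeatureSplitter` is implemented, and all `*_demo` names are hypothetical.

```python
import numpy as np
import pandas as pd

index = pd.period_range("2000-01-01", periods=6, freq="D")
X_demo = pd.DataFrame({"shift_1": [np.nan, 1.0, 2.0, 3.0, 4.0, 5.0]}, index=index)
y_demo = pd.DataFrame({"y_1": [2.0, 3.0, 4.0, 5.0, 6.0, np.nan]}, index=index)

# Drop rows whose features are incomplete (the warm-up period of lags and windows)
complete = X_demo.notna().all(axis=1)
X_clean, y_clean = X_demo[complete], y_demo[complete]

# Rows with a fully known target form the training set; the rest are held out
known = y_clean.notna().all(axis=1)
X_train_demo, y_train_demo = X_clean[known], y_clean[known]
X_test_demo, y_test_demo = X_clean[~known], y_clean[~known]

print(len(X_train_demo), len(X_test_demo))  # 4 1
```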
Create a simple linear regression model from the scikit-learn library. Wrap it in a Generalized Auto-Regressive (GAR) model to perform simple time-series forecasting, and train the model on the training dataset.
lr = LinearRegression()
model = GAR(lr)
model = model.fit(X_train, y_train)
Once the model is trained, use it to forecast from the test features.
predictions = model.predict(X_test)
predictions
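Conceptually, forecasting several steps ahead with a plain regressor can be done by fitting one independent model per target column. The sketch below shows that idea with ordinary least squares in NumPy rather than with GAR or scikit-learn. Whether GAR works exactly this way internally is an assumption, and the data and names are synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((50, 2))

# Two horizons, each an exact linear function of the features (no noise)
Y_demo = np.column_stack([
    1.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1],  # one step ahead
    0.5 - 3.0 * X_demo[:, 0] + 4.0 * X_demo[:, 1],  # two steps ahead
])

# Add an intercept column, then fit an independent least-squares model per horizon
design = np.column_stack([np.ones(len(X_demo)), X_demo])
coefs = [
    np.linalg.lstsq(design, Y_demo[:, j], rcond=None)[0]
    for j in range(Y_demo.shape[1])
]

# Forecast every horizon for a new row: [intercept, feature 1, feature 2]
x_new = np.array([1.0, 0.2, -0.1])
forecast = np.array([x_new @ c for c in coefs])
print(forecast.round(3))
```

Because the toy targets contain no noise, the fitted coefficients recover the generating ones, which makes the per-horizon structure easy to check.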
Find the Colab Notebook here with the above code implementation.
Time-Series Plotting with Giotto-Time
Import the necessary libraries and modules.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from gtime.preprocessing import TimeSeriesPreparation
from gtime.plotting import seasonal_plot, seasonal_subplots, lag_plot, acf_plot
Load the Kansas Wheat Index data from Giotto-Time’s official Google Cloud Storage.
df_sp = pd.read_csv('https://storage.googleapis.com/l2f-open-models/giotto-time/examples/data/WheatTr.csv', sep='\t')
df_column = df_sp.set_index('Effective date ')['S&P GSCI Kansas Wheat']
Transform the data into PeriodIndex format, resample it to daily frequency and forward-fill the missing values.
df_column.index = pd.to_datetime(df_column.index)
time_series_preparation = TimeSeriesPreparation(output_name='Wheat price index')
period_index_time_series = time_series_preparation.transform(df_column)
df = period_index_time_series.resample('D').fillna(method='ffill')
Calculate the logarithmic returns of the price index to generate a returns series.
returns = (np.log(df / df.shift(1))).dropna()
returns.columns = ['Wheat price returns']
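A small worked example may make the log-return formula concrete. It applies the same expression, ln(P_t / P_{t-1}), to four made-up prices and compares the result with simple percentage returns. The prices are invented for illustration only.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 99.0])

# Log return: r_t = ln(P_t / P_{t-1}); the first row has no predecessor and is dropped
log_returns = np.log(prices / prices.shift(1)).dropna()
simple_returns = prices.pct_change().dropna()

print(log_returns.round(4).tolist())     # [0.0953, -0.1054, 0.0]
print(simple_returns.round(4).tolist())  # [0.1, -0.1, 0.0]
```

Log returns add up over time: their sum equals the log of the total price change, here ln(99 / 100).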
Plot the Wheat price and returns to visualize the data.
ax = df.plot(figsize=(10, 5))
ax = returns.plot(ax=ax, secondary_y=True)
Seasonal plots are powerful tools in the Giotto-Time library that give an overall picture of how time-series data vary over seasonal cycles such as years, months and weeks. The following code generates a seasonal plot for the price index data.
fig = plt.figure(figsize=(6,6))
m1 = fig.add_subplot(111, title='Seasonal plot (year/monthly)')
seasonal_plot(df, 'year', freq='M', agg='last', ax=m1)
# Render the figure
plt.show()
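The table behind a year-by-month seasonal plot can be approximated with a pandas pivot. This sketch assumes that `agg='last'` keeps the last observation of each month and that the 'year' cycle draws one line per year. It uses synthetic data and does not call Giotto-Time.

```python
import numpy as np
import pandas as pd

index = pd.period_range("2018-01-01", "2019-12-31", freq="D")
daily = pd.Series(np.arange(len(index), dtype=float), index=index)

# Keep the last observation of every (year, month) pair
years = np.asarray(index.year)
months = np.asarray(index.month)
monthly_last = daily.groupby([years, months]).last()
monthly_last.index.names = ["year", "month"]

# One column per year, one row per calendar month: each column is one line of the plot
seasonal_table = monthly_last.unstack("year")
print(seasonal_table.shape)  # (12, 2)
```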
Plot the monthly returns as a seasonal plot in polar form.
fig = plt.figure(figsize=(6, 6))
m2 = fig.add_subplot(111, projection='polar')
seasonal_plot(returns, 'year', freq='M', agg='last', ax=m2, polar=True)
m2.set_title('Monthly returns')
# Render the figure
plt.show()
Seasonal plots can also be rendered as box-and-whisker plots. A box plot gives a basic statistical summary of each season: the median, the quartiles and the range of the data.
seasonal_subplots(returns, 'year', 'M', agg='last', box=True)
plt.show()
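The statistics a box plot summarizes can be computed directly. The sketch below derives the quartiles, the median and the interquartile range with NumPy on made-up values. It is independent of Giotto-Time, whose exact whisker conventions are not assumed here.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])

# The box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

print(values.min(), q1, median, q3, values.max(), iqr)
```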
Lag plots have a prominent place in time-series analysis. They compare the data with its own temporal lags. Giotto-Time’s lag plots are simple to produce. Let’s visualize the price index data in a lag plot with three different lags: one day, one month and one year.
lag_plot(df, lags=[1, 30, 365])
plt.show()
Let’s visualize the lag plot for the returns data.
lag_plot(returns, lags=[1, 30, 365])
plt.show()
The price index remains strongly autocorrelated even at a lag of one month. For the returns, however, the points scatter randomly whatever the lag.
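The same contrast can be quantified with `Series.autocorr`. The sketch below uses a synthetic price path built from random log returns, not the wheat data, so it only illustrates the typical pattern: price levels correlate strongly with their own past, while returns do not. The names `demo_prices` and `demo_returns` are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
steps = rng.normal(loc=0.0, scale=0.01, size=2000)         # synthetic daily log returns
demo_prices = pd.Series(100.0 * np.exp(np.cumsum(steps)))  # price path built from them
demo_returns = np.log(demo_prices / demo_prices.shift(1)).dropna()

# Correlation of each series with its own lagged copy
for lag in (1, 30, 365):
    print(lag, round(demo_prices.autocorr(lag), 3), round(demo_returns.autocorr(lag), 3))
```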
Find the Colab Notebook here with the above code implementation.
Wrapping up
We discussed Giotto-Time, the open-source Python library for time-series forecasting, and worked through hands-on Python code for two tasks:
- Time-series forecasting
- Time-series data plotting
Giotto-Time’s full potential can be explored with real-world time-series problems consisting of data cleaning, data analysis, feature generation, forecasting and causality testing.