Guide To GluonTS and PytorchTS For Time-Series Forecasting (With Python Implementation)

GluonTS is a toolkit designed specifically for probabilistic time series modeling. It is part of the Gluon organization; Gluon is an open-source deep-learning interface that lets developers build neural networks without compromising performance and efficiency. AWS and Microsoft first introduced it on October 12th, 2017, and it provides many different neural network architectures and leverages deep learning models. It builds on mxnet, a lightweight, portable, flexible distributed/mobile deep learning framework with bindings for Python, R, Julia, Scala, Go, JavaScript, and more.

Gluon’s goal is to leverage Jupyter notebooks’ strengths to present graphics, equations, and code together in one place.

GluonTS

“GluonTS simplifies the development of and experimentation with time series models for common tasks such as forecasting or anomaly detection. It provides all necessary components and tools that scientists need for quickly building new models, for efficiently running and analyzing experiments and for evaluating model accuracy.”

— GluonTS arXiv paper

GluonTS is a toolkit designed specifically for probabilistic time series modeling. It provides utilities for loading and iterating over time-series datasets, state-of-the-art models for time series forecasting, and building blocks for defining your own models and quickly testing different solutions. With GluonTS you can:

  • Train and evaluate any of the built-in models on your custom dataset
  • Quickly create your own solution
  • Use custom abstractions and building blocks to create custom models
  • Compare against multiple baseline algorithms
  • Use the plotting and evaluation facilities
  • Work with both artificial and real datasets

Installation

pip install gluonts
# as gluonts relies on mxnet
# install MXNet using pip
pip install mxnet

Getting Started 

We have seen time series forecasting using TensorFlow and PyTorch, but those approaches involve a lot of code and require strong proficiency with the framework. GluonTS provides simple, to-the-point code for time series forecasting. Here is an example that uses GluonTS to predict Twitter volume with DeepAR.

You can run the following code in a cloud development environment at: https://github.com/mmaithani/data-science/blob/main/Gluonts_twitter_volume_forecasting.ipynb

# import GluonTS utilities and pandas
from gluonts.dataset import common
from gluonts.model import deepar
from gluonts.trainer import Trainer
import pandas as pd

# get the Twitter volume training dataset
url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0)
# wrap the series up to 2015-04-05 in a ListDataset with 5-minute frequency
data = common.ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]}],
    freq="5min")
# initialize the trainer and the DeepAR estimator
trainer = Trainer(epochs=10)
estimator = deepar.DeepAREstimator(
    freq="5min", prediction_length=12, trainer=trainer)
predictor = estimator.train(training_data=data)

prediction = next(predictor.predict(data))
print(prediction.mean)
prediction.plot(output_file='graph.png')
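
The returned prediction is a GluonTS Forecast object, so besides the mean you can also query quantiles for uncertainty estimates; a minimal sketch (the 0.9 level here is just an illustration):

# per-time-step quantiles of the probabilistic forecast
print(prediction.quantile(0.9))  # 90th percentile for each of the 12 steps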

Next, let's take one of the example datasets bundled with GluonTS and walk through a complete forecasting workflow.

# import modules
%matplotlib inline
import mxnet as mx
from mxnet import gluon
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import json

Import the built-in GluonTS datasets:

from gluonts.dataset.repository.datasets import get_dataset, dataset_recipes
from gluonts.dataset.util import to_pandas
print(f"Available datasets: {list(dataset_recipes.keys())}")

We are going to use the m4_hourly dataset. The datasets provided by GluonTS are objects with three main attributes, which we look at next.

dataset = get_dataset("m4_hourly", regenerate=True)

The three main attributes of a GluonTS dataset are dataset.train (the training time series), dataset.test (the test time series), and dataset.metadata (metadata such as the frequency and the recommended prediction length). Let's iterate over the training dataset and plot the first series using the following commands.

entry = next(iter(dataset.train))
train_series = to_pandas(entry)
train_series.plot()
plt.grid(which="both")
plt.legend(["train series"], loc="upper left")
plt.show()
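
As a quick check, the metadata fields used later in this walkthrough can be inspected directly:

print(f"Frequency of the series: {dataset.metadata.freq}")
print(f"Recommended prediction length: {dataset.metadata.prediction_length}")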

Similarly, you can plot the first series of the test dataset:

entry = next(iter(dataset.test))
test_series = to_pandas(entry)
test_series.plot()
plt.axvline(train_series.index[-1], color='r') # end of train dataset
plt.grid(which="both")
plt.legend(["test series", "end of train series"], loc="upper left")
plt.show()
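
In the GluonTS repository datasets, each test series is the corresponding training series extended by the forecasting window; a quick check of that window's length (it should match the recommended prediction length in dataset.metadata):

print(f"Length of forecasting window: {len(test_series) - len(train_series)}")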

Preprocessing

To illustrate the input format GluonTS expects, let's create a small custom dataset of random time series:

N = 10  # number of time series
T = 100  # number of timesteps
prediction_length = 24
freq = "1H"
custom_dataset = np.random.normal(size=(N, T))
start = pd.Timestamp("01-01-2019", freq=freq)  # can be different for each time series

Now split the dataset and convert it to the GluonTS format using the following commands:

from gluonts.dataset.common import ListDataset
# train dataset: cut "prediction_length", add "target" and "start" fields
train_ds = ListDataset([{'target': x, 'start': start} 
                        for x in custom_dataset[:, :-prediction_length]],
                       freq=freq)
# test dataset: using whole dataset, add "target" and "start" fields
test_ds = ListDataset([{'target': x, 'start': start} 
                       for x in custom_dataset],
                      freq=freq)
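
Each training entry now holds T - prediction_length = 76 points, while each test entry keeps all 100. A quick sanity check, assuming entries are iterated as dictionaries with a "target" array (the GluonTS data format):

train_entry = next(iter(train_ds))
print(len(train_entry["target"]))  # 76 = T - prediction_length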

Training

GluonTS comes with ready-made models and sensible default hyperparameters, such as SimpleFeedForwardEstimator, a feedforward neural network that accepts an input window of length context_length and predicts the distribution of the following values. Let's import the necessary training methods and set up the estimator using the following commands:

from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.trainer import Trainer
estimator = SimpleFeedForwardEstimator(
    num_hidden_dimensions=[10],
    prediction_length=dataset.metadata.prediction_length,
    context_length=100,
    freq=dataset.metadata.freq,
    trainer=Trainer(ctx="cpu", 
                    epochs=5, 
                    learning_rate=1e-3, 
                    num_batches_per_epoch=100
                   )
)
#start training
predictor = estimator.train(dataset.train)
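
Once training finishes, the predictor can be saved to disk and reloaded later. A brief sketch using GluonTS's serialization API; the directory path here is illustrative:

from pathlib import Path
from gluonts.model.predictor import Predictor
model_path = Path("/tmp/ff_model")  # illustrative location
model_path.mkdir(exist_ok=True)
predictor.serialize(model_path)  # write the trained model to disk
predictor = Predictor.deserialize(model_path)  # load it back later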

Evaluate

To evaluate a trained model, GluonTS provides the make_evaluation_predictions function, which automates the process of generating predictions and comparing them against the ground truth.

from gluonts.evaluation.backtest import make_evaluation_predictions
forecast_it, ts_it = make_evaluation_predictions(
    dataset=dataset.test,  # test dataset
    predictor=predictor,  # predictor
    num_samples=100,  # number of sample paths for evaluation
)

Convert the generators to lists to ease the computations, and examine the first element of each list:

forecasts = list(forecast_it)
tss = list(ts_it)
# first entry of the time series list
ts_entry = tss[0]

Convert the first five values of the time series from pandas to NumPy, and fetch the first entry of dataset.test:

np.array(ts_entry[:5]).reshape(-1,)
dataset_test_entry = next(iter(dataset.test))

Similarly, inspect the first five target values and the first forecast entry:

dataset_test_entry['target'][:5]
forecast_entry = forecasts[0]
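
The forecast object stores the sample paths drawn from the predicted distribution, and the standard GluonTS Forecast API exposes summary statistics computed from them:

print(f"Number of sample paths: {forecast_entry.num_samples}")
print(f"Dimension of samples: {forecast_entry.samples.shape}")
print(f"Mean of the future window:\n {forecast_entry.mean}")
print(f"0.5-quantile (median):\n {forecast_entry.quantile(0.5)}")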

Output

To visualize the forecast against the observed series, use the following commands:

def plot_prob_forecasts(ts_entry, forecast_entry):
    plot_length = 150 
    prediction_intervals = (50.0, 90.0)
    legend = ["observations", "median prediction"] + [f"{k}% prediction interval" for k in prediction_intervals][::-1]

    fig, ax = plt.subplots(1, 1, figsize=(10, 7))
    ts_entry[-plot_length:].plot(ax=ax)  # plot the time series
    forecast_entry.plot(prediction_intervals=prediction_intervals, color='g')
    plt.grid(which="both")
    plt.legend(legend, loc="upper left")
    plt.show()
plot_prob_forecasts(ts_entry, forecast_entry)

You can also evaluate the quality of the forecasts numerically using the Evaluator class, which computes aggregate performance metrics over all series as well as per-series metrics:

from gluonts.evaluation import Evaluator
evaluator = Evaluator(quantiles=[0.1, 0.5, 0.9])
agg_metrics, item_metrics = evaluator(iter(tss), iter(forecasts), num_series=len(dataset.test))
print(json.dumps(agg_metrics, indent=4))
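
agg_metrics aggregates the scores over all series, while item_metrics is a pandas DataFrame holding one row of metrics per series, which is handy for spotting poorly forecast series:

# per-series metrics; columns include MSE, sMAPE and other standard metrics
print(item_metrics.head())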

PyTorch-ts

You can achieve similar results using a third-party framework called PyTorchTS, built by Zalando Research specifically for PyTorch users. PyTorchTS is a probabilistic time series forecasting framework that uses GluonTS as its backend API for loading, transforming, and testing datasets, so very few changes are needed to switch: the models themselves are implemented in PyTorch. Installation and usage are straightforward, and you can find the source code here.

Installation

$ pip install pytorchts

Getting Started

We are going to use a dataset of the volume of tweets mentioning the AMZN ticker symbol. First, import the necessary packages using the commands below. The notebook is available at: https://github.com/mmaithani/data-science/blob/main/PyTorch_ts_time_series_forecasting(gluonts).ipynb

import pandas as pd
import torch
import matplotlib.pyplot as plt

from pts.dataset import ListDataset
from pts.model.deepar import DeepAREstimator
from pts import Trainer
from pts.dataset import to_pandas

Import the Amazon tweets dataset and plot the first 100 data points using pandas and matplotlib:

url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0, parse_dates=True)
df[:100].plot(linewidth=2)
plt.grid(which='both')
plt.show()

Now train the model with PyTorchTS. We use the data up to midnight on April 5th, 2015; the data points arrive every 5 minutes, so freq is set to "5min", and training runs for 15 epochs. We want a prediction for the next hour, so prediction_length is set to 12 (12 steps × 5 minutes = 60 minutes).

training_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]}],
    freq = "5min")
# parameter initialization
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
estimator = DeepAREstimator(freq="5min",
                            prediction_length=12,
                            input_size=43,
                            trainer=Trainer(epochs=15,
                                            device=device))
predictor = estimator.train(training_data=training_data)

The model is trained, so let's forecast the hour following midnight on April 15th, 2015:

test_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-15 00:00:00"]}],
    freq = "5min")
for test_entry, forecast in zip(test_data, predictor.predict(test_data)):
    to_pandas(test_entry)[-60:].plot(linewidth=2)
    forecast.plot(color='b', prediction_intervals=[50.0, 90.0])
plt.grid(which='both')
plt.show()

Conclusion

We have discussed time series forecasting using GluonTS, a forecasting library built explicitly for probabilistic time series problems, and the outputs were quite satisfactory. We then reproduced the same approach with PyTorchTS, a PyTorch-based probabilistic forecasting framework that uses GluonTS as its backend, and saw how few changes are needed to move between the two. Other third-party libraries have also been built on top of GluonTS that we do not cover in this article.

Working notebooks used in the above demonstration:

  • https://github.com/mmaithani/data-science/blob/main/Gluonts_twitter_volume_forecasting.ipynb
  • https://github.com/mmaithani/data-science/blob/main/PyTorch_ts_time_series_forecasting(gluonts).ipynb
