
Hands-On Guide To Darts – A Python Tool For Time Series Forecasting

In this article, we will learn about Darts and implement it on a time-series dataset.


Data collected over a period of time is called time-series data. The data points are usually recorded at regular intervals, and successive values are often correlated with one another. Many datasets, such as sales records or stock prices, contain date, month, or day columns that are important for making predictions. The problem is how to use such time-series data and convert it into a format a machine-learning model can work with. Darts, a Python package, makes this process much simpler.


Introduction to Darts

For many datasets, forecasting the time-series columns plays an important role in decision making. Unit8 (unit8.co) developed darts, a library that makes time-series forecasting easy. The idea behind it was to make darts as simple to use for time series as sklearn is for general machine learning, smoothing the overall process of using time series in machine learning.

The basic principles of darts are:

  1. There are two types of models in darts:

Regression models: these predict the output based on a set of input time series.

Forecasting models: these predict future values of a series based on its past values.

  2. Data is held in a class called TimeSeries, which is immutable, like Python strings.
  3. A TimeSeries can be either univariate (one dimension) or multivariate (several dimensions). Some models, such as the neural networks, can work with multivariate series, while simpler models work with just one dimension.
  4. Methods like fit() and predict() are unified across all models, from neural networks to ARIMA.
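To make the last three principles concrete, here is a toy sketch in plain Python of an immutable series type and two models that share the same fit()/predict() interface. It is not Darts' code: the names ToySeries, MeanModel and SeasonalNaiveModel are hypothetical, and the real TimeSeries carries a proper time index and supports multiple dimensions. Having fit() return self follows the sklearn convention.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToySeries:
    """Hypothetical immutable univariate series; values live in a tuple."""
    values: Tuple[float, ...]

    def append(self, other: "ToySeries") -> "ToySeries":
        # Returns a new object; self is never modified (like str concatenation).
        return ToySeries(self.values + other.values)


class MeanModel:
    """Forecasts the mean of the training values at every future step."""

    def fit(self, series: ToySeries) -> "MeanModel":
        self.mean_ = sum(series.values) / len(series.values)
        return self

    def predict(self, n: int) -> ToySeries:
        return ToySeries(tuple(self.mean_ for _ in range(n)))


class SeasonalNaiveModel:
    """Repeats the last full season of length K."""

    def __init__(self, K: int = 12) -> None:
        self.K = K

    def fit(self, series: ToySeries) -> "SeasonalNaiveModel":
        if len(series.values) < self.K:
            raise ValueError("need at least one full season of data")
        self.last_season_ = series.values[-self.K:]
        return self

    def predict(self, n: int) -> ToySeries:
        return ToySeries(tuple(self.last_season_[i % self.K] for i in range(n)))


train = ToySeries((1.0, 2.0, 3.0, 4.0))
for model in (MeanModel(), SeasonalNaiveModel(K=2)):
    # Identical calls for every model: fit on history, then predict n steps.
    model.fit(train)
    print(type(model).__name__, model.predict(3).values)
```

Because the calling code only relies on fit() and predict(), swapping one model for another requires no other changes, which is the design choice Darts makes across its models.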

Implementation of darts on time-series data

Darts is open-source and can be installed with the pip command. To install darts use:

pip install u8darts

Dataset

Next, choose a time-series dataset. I have selected the monthly beer production in Australia dataset. Let us now import the libraries needed and load the dataset.

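# Mount Google Drive (only needed when running in Google Colab)
from google.colab import drive
drive.mount('/content/gdrive/')

import pandas as pd
from darts import TimeSeries

# Load the monthly beer production data and show the first rows
beer_data = pd.read_csv('/content/gdrive/My Drive/beer.csv')
beer_data.head()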


The dataset contains two columns: the month (together with the year) and the beer production in that month.
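Before building a TimeSeries, it can help to confirm that the month column has no gaps, because Darts expects a regularly spaced time index (TimeSeries.from_dataframe has a fill_missing_dates option for series that do have holes). The sketch below assumes month labels in 'YYYY-MM' form, which you should verify with beer_data.head(); the helper names parse_month and missing_months are hypothetical, and the example labels are made up.

```python
from datetime import date
from typing import List


def parse_month(label: str) -> date:
    """Parse a 'YYYY-MM' label (assumed format) into the first day of that month."""
    year, month = label.split("-")[:2]
    return date(int(year), int(month), 1)


def missing_months(labels: List[str]) -> List[str]:
    """Return the 'YYYY-MM' labels absent between the first and last month."""
    months = sorted(parse_month(x) for x in labels)
    present = set(months)
    missing = []
    y, m = months[0].year, months[0].month
    while date(y, m, 1) <= months[-1]:
        if date(y, m, 1) not in present:
            missing.append(f"{y:04d}-{m:02d}")
        # Advance one calendar month, rolling over at December.
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return missing


print(missing_months(["1978-08", "1978-09", "1978-11", "1979-01"]))
```

On the real data this would be called as missing_months(beer_data['Month'].astype(str).tolist()); an empty list means the series is gap-free.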

Train-test split

Let us now use the TimeSeries class and split the data into training and test sets. The from_dataframe method builds a TimeSeries from the DataFrame; we pass it the name of the time column and the name of the value column. Then we split the data at a point in time. The dataset has around 477 rows, so I chose the 275th time period (1978-10) as the split point.

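# Build a TimeSeries: 'Month' is the time column, 'Monthly beer production' holds the values
get_data = TimeSeries.from_dataframe(beer_data, time_col='Month', value_cols='Monthly beer production')

# Months before October 1978 go to training; October 1978 onward goes to test
traindata, testdata = get_data.split_before(pd.Timestamp('1978-10'))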
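With a timestamp argument, Darts' split_before puts everything strictly before that timestamp in the first series and the timestamp itself in the second (it also accepts a float fraction such as 0.75 or an integer position). The plain-Python sketch below mimics that behaviour on a gap-free monthly list so the arithmetic is visible; toy_split_before and month_offset are hypothetical helpers, not Darts functions, and the values are made up.

```python
from typing import List, Tuple

Point = Tuple[str, float]  # ('YYYY-MM', value)


def month_offset(start: str, target: str) -> int:
    """Whole months from start to target, both given as 'YYYY-MM'."""
    sy, sm = (int(x) for x in start.split("-")[:2])
    ty, tm = (int(x) for x in target.split("-")[:2])
    return (ty - sy) * 12 + (tm - sm)


def toy_split_before(points: List[Point], split_label: str) -> Tuple[List[Point], List[Point]]:
    """Split a gap-free monthly series; the split month itself goes to the second part."""
    k = month_offset(points[0][0], split_label)
    if not 0 < k < len(points):
        raise ValueError("split point must fall strictly inside the series")
    return points[:k], points[k:]


# Hypothetical toy values, not the beer data.
series = [("1978-08", 10.0), ("1978-09", 11.0), ("1978-10", 12.0), ("1978-11", 13.0)]
train, test = toy_split_before(series, "1978-10")
print(train)  # [('1978-08', 10.0), ('1978-09', 11.0)]
print(test)   # [('1978-10', 12.0), ('1978-11', 13.0)]
```

The number of training points equals the month offset between the first month and the split month, so the split position depends on where the series starts.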

Modelling

Training the model is very simple with darts. Here we fit an exponential smoothing model. As in sklearn, the fit() method is used to fit the model to the data.

from darts.models import ExponentialSmoothing

# Exponential smoothing with the library's default settings
beer_model = ExponentialSmoothing()
beer_model.fit(traindata)
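To our understanding, Darts' ExponentialSmoothing wraps the Holt-Winters implementation in statsmodels, which by default also models a trend and a seasonal component and estimates its parameters from the data. The core idea is easiest to see in the simplest variant, simple exponential smoothing, sketched below: each new observation pulls the smoothed level towards itself by a factor alpha. Here alpha is fixed by hand purely for illustration, and the function name is hypothetical.

```python
from typing import List, Sequence


def simple_exp_smoothing(values: Sequence[float], alpha: float, n: int) -> List[float]:
    """Simple exponential smoothing: level = alpha * y + (1 - alpha) * level.

    The forecast for every future step is the final level (a flat forecast).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # initialise the level with the first observation
    for y in values[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return [level] * n


print(simple_exp_smoothing([10.0, 12.0, 14.0], alpha=0.5, n=2))  # [12.5, 12.5]
```

A flat forecast like this ignores any yearly pattern, which is why the Holt-Winters variant adds trend and seasonal terms for monthly data.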

This completes the training part. Let us now make predictions and plot the graph.

# Forecast as many months as the test set contains
prediction = beer_model.predict(len(testdata))
print("predicted", prediction[:5])
print("actual", testdata[:5])


import matplotlib.pyplot as plt

# Plot the full actual series and overlay the forecast for the test period
get_data.plot(label='actual')
prediction.plot(label='predict', lw=3)
plt.legend()
plt.show()


Here the monthly values from October 1978 onward are forecast by the exponential smoothing model and plotted over the actual series. In the plot, the forecast closely follows the actual values.
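A visual check can be backed by a number. Darts ships error metrics in darts.metrics, for example mape(testdata, prediction) for the mean absolute percentage error. The formula is sketched below in plain Python so the resulting percentage is easy to interpret; the function name mirrors the metric but is a standalone illustration, and the example values are made up.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical values: errors of 10% and 5% average to 7.5%.
print(mape([100.0, 200.0], [110.0, 190.0]))
```

Lower is better; a MAPE of a few percent means the forecast is, on average, within a few percent of the actual production.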

Darts also provides neural-network models and models for multivariate time series.

Conclusion

In this article, we saw how to use the darts library to forecast a time series with just a few lines of code. Compared with preparing time-series data by hand in pandas, it saves a lot of time. The library also offers backtesting, regression models, and even automatic model selection. It is a great way to handle time-series datasets.
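Backtesting, mentioned above, means repeatedly fitting on the data available up to some point and forecasting what comes next, then comparing against what actually happened. Darts exposes this through model methods such as historical_forecasts and backtest; the loop below only illustrates the idea in plain Python, with a seasonal-naive forecaster standing in for a real model. The function names and data are hypothetical.

```python
from typing import Callable, List, Sequence

Forecaster = Callable[[Sequence[float]], float]  # history -> next-step forecast


def seasonal_naive(period: int) -> Forecaster:
    """Forecast the value observed one season earlier."""
    def forecast(history: Sequence[float]) -> float:
        if len(history) < period:
            raise ValueError("history shorter than one season")
        return history[-period]
    return forecast


def backtest(values: Sequence[float], start: int, forecaster: Forecaster) -> List[float]:
    """Expanding-window, one-step-ahead backtest.

    For each t >= start, forecast values[t] from values[:t] only
    and record the absolute error.
    """
    errors = []
    for t in range(start, len(values)):
        prediction = forecaster(values[:t])
        errors.append(abs(values[t] - prediction))
    return errors


data = [1.0, 5.0, 2.0, 6.0, 3.0, 7.0]
print(backtest(data, start=2, forecaster=seasonal_naive(2)))
```

Averaging these errors gives a more honest estimate of forecast quality than a single train/test split, because every forecast is made only from data that would have been available at the time.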


Bhoomika Madhukar

I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.