Hands-On Guide To Darts – A Python Tool For Time Series Forecasting

Bhoomika Madhukar

Data collected over a period of time is called time-series data. The data points are usually recorded at regular, successive intervals, and past values are usually correlated with the value we want to predict. Many datasets, such as sales records or stock prices, contain date, month or day columns that are important for making predictions. The problem is how to use such time-series data and convert it into a format a machine learning model can understand. The Python library Darts makes this process much simpler.

In this article, we will learn about Darts and use it to forecast a time-series dataset.

Introduction to Darts

For many datasets, forecasting the time-series columns plays an important role in decision making. Unit8.co developed a library called Darts to make time-series forecasting easy. The idea behind it is to make Darts as simple to use for time series as sklearn is for general machine learning. Darts aims to smooth the overall process of using time series in machine learning.

The basic principles of Darts are:

  1. There are two types of models in Darts:

Regression models: these predict an output series based on a set of input time series.

Forecasting models: these predict future values of a series based on its past values.

  2. Time series are represented by a class called TimeSeries, which is immutable, like Python strings.
  3. A TimeSeries can be either univariate (one dimension) or multivariate (several dimensions). Some models, such as neural networks, support multivariate series, while simpler models work with just one dimension.
  4. Methods like fit() and predict() are unified across all models, from neural networks to ARIMA.
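To make the last two principles concrete, here is a plain-Python sketch that does not use Darts at all. MiniSeries, MiniForecaster, LastValue and MeanValue are hypothetical names invented for illustration: the frozen dataclass mimics an immutable series whose "modifications" return new objects, and the two toy models share one fit()/predict() interface, so the calling code does not change when the model does.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class MiniSeries:
    """Hypothetical immutable series: its values cannot be reassigned."""
    values: Tuple[float, ...]

    def append(self, value: float) -> "MiniSeries":
        # Return a new series instead of modifying this one, like str methods do.
        return replace(self, values=self.values + (value,))


class MiniForecaster(ABC):
    """Hypothetical common interface: every model exposes fit() and predict()."""

    @abstractmethod
    def fit(self, series: MiniSeries) -> "MiniForecaster": ...

    @abstractmethod
    def predict(self, n: int) -> MiniSeries: ...


class LastValue(MiniForecaster):
    # Repeats the last observed value n times.
    def fit(self, series):
        self.last = series.values[-1]
        return self

    def predict(self, n):
        return MiniSeries(tuple([self.last] * n))


class MeanValue(MiniForecaster):
    # Repeats the mean of the training values n times.
    def fit(self, series):
        self.mean = sum(series.values) / len(series.values)
        return self

    def predict(self, n):
        return MiniSeries(tuple([self.mean] * n))


train = MiniSeries((10.0, 12.0, 14.0))
for model in (LastValue(), MeanValue()):
    # The same two calls work for either model.
    print(type(model).__name__, model.fit(train).predict(2).values)
```

In Darts itself, the same pattern means you can swap one model class for another and keep the fit() and predict() calls unchanged.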

Implementation of darts on time-series data

Darts is open-source and can be installed with pip. To install Darts, use:

pip install u8darts

Dataset

Next, choose any time-series dataset you like. I have selected a dataset of monthly beer production in Australia. To download it, click here. Let us now import the required libraries and load the dataset.

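# Mount Google Drive (Colab only; skip this if the CSV is stored locally)
from google.colab import drive
drive.mount('/content/gdrive/')

import pandas as pd
from darts import TimeSeries

# Load the monthly beer production data and preview the first rows
beer_data = pd.read_csv('/content/gdrive/My Drive/beer.csv')
beer_data.head()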

The dataset contains two columns: the month (with its year) and the beer production for that month.

Train-test split

Let us now load the data into the TimeSeries class and split it into training and test sets. The from_dataframe method builds a TimeSeries from a DataFrame; we pass it the name of the time column and the name of the value column. We then split the series at a point in time rather than at random, because the test period must come after the training period. The dataset has around 477 rows, so I chose the 275th time period (1978-10) as the split point.

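# Build a TimeSeries from the 'Month' (time) and 'Monthly beer production' (value) columns
get_data = TimeSeries.from_dataframe(beer_data, 'Month', 'Monthly beer production')
# Months before October 1978 go to training; October 1978 onwards goes to testing
traindata, testdata = get_data.split_before(pd.Timestamp('1978-10'))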
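Conceptually, a time-based split keeps chronological order instead of shuffling. The plain-Python sketch below mimics what split_before does as we understand the Darts documentation: points strictly before the split point go to the first part, and the split point and everything after it go to the second. The helpers parse_month and split_before here are hypothetical, and the values are made up rather than taken from the beer dataset.

```python
from datetime import date


def parse_month(label: str) -> date:
    # "1978-10" -> date(1978, 10, 1); each month label is pinned to its 1st day.
    year, month = label.split("-")
    return date(int(year), int(month), 1)


def split_before(points, split_label):
    """Split (month_label, value) pairs chronologically.

    Points strictly before split_label go to train; the split month and
    everything after it go to test, so no future value leaks into training.
    """
    cutoff = parse_month(split_label)
    ordered = sorted(points, key=lambda p: parse_month(p[0]))
    train = [p for p in ordered if parse_month(p[0]) < cutoff]
    test = [p for p in ordered if parse_month(p[0]) >= cutoff]
    return train, test


# Made-up values for illustration only.
toy = [("1978-08", 150.0), ("1978-09", 140.0), ("1978-10", 160.0), ("1978-11", 170.0)]
train, test = split_before(toy, "1978-10")
print(len(train), len(test))  # 2 2
```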

Modelling

Training a model is very simple with Darts. Here, we fit an exponential smoothing model to the data. As in sklearn, the fit() method trains the model on the dataset.

from darts.models import ExponentialSmoothing

# Default settings; fit only on the training part of the series
beer_model = ExponentialSmoothing()
beer_model.fit(traindata)
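As far as we know, Darts' ExponentialSmoothing wraps the Holt-Winters implementation in statsmodels, which can also model trend and seasonality. To show only the core idea, here is a sketch of simple (level-only) exponential smoothing in plain Python; simple_exponential_smoothing and its alpha smoothing factor are illustrative names, not part of the Darts API.

```python
def simple_exponential_smoothing(values, alpha, horizon):
    """Forecast `horizon` steps with simple (level-only) exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}; the forecast stays flat
    at the final level. No trend or seasonal component is modelled here.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # initialise the level with the first observation
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


print(simple_exponential_smoothing([10.0, 20.0], alpha=0.5, horizon=3))  # [15.0, 15.0, 15.0]
```

Each new level is a weighted average of the latest observation and the previous level, so a larger alpha reacts faster to recent changes while a smaller one smooths out noise.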

This completes the training part. Let us now forecast as many months as there are in the test set and plot the result.

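# Forecast one value for every month in the test set
prediction = beer_model.predict(len(testdata))
# Compare the first five predicted and actual values
print("predicted", prediction[:5])
print("actual", testdata[:5])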

import matplotlib.pyplot as plt

# Plot the full actual series with the forecast overlaid
get_data.plot(label='actual')
prediction.plot(label='predict', lw=3)
plt.legend()
plt.show()

Here, the exponential smoothing model forecasts the monthly values from October 1978 onwards. Judging from the plot, the forecast follows the actual production closely.
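A plot gives a visual impression, but a metric makes accuracy measurable. Darts ships metric functions in darts.metrics, including mape; the plain-Python sketch below computes the mean absolute percentage error by hand to show what that number means. This mape function is our own illustrative version, not the Darts one, and it assumes no actual value is zero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is 0."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Made-up numbers: each forecast is 10% off its actual value.
print(mape([100.0, 200.0], [110.0, 180.0]))
```

A lower value means the forecast is, on average, closer to the observed production.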

Darts also supports neural network models and multivariate series.

Conclusion

In this article, we saw how to use the Darts library to forecast a time series with just a few lines of code. Compared with building the same pipeline by hand on top of pandas, the library saves a lot of time. It also offers backtesting, regression models and even automatic model selection, which makes it a great way to handle time-series datasets.
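Backtesting repeatedly re-fits a model on an expanding history and compares each forecast with what actually happened. Recent Darts versions expose this through methods such as historical_forecasts() on forecasting models, but names have changed between releases, so check the documentation for your version. The plain-Python sketch below uses hypothetical backtest and naive_last functions to show the idea with the simplest possible model.

```python
def backtest(values, start, horizon, forecast):
    """Expanding-window backtest.

    For every cut-off from `start` onwards, use values[:cutoff] as history,
    forecast `horizon` steps, and pair the last forecast step with the
    actual value at that position. `forecast(history, horizon)` must return
    a list of length `horizon`.
    """
    pairs = []
    for cutoff in range(start, len(values) - horizon + 1):
        history = values[:cutoff]
        predicted = forecast(history, horizon)[-1]
        actual = values[cutoff + horizon - 1]
        pairs.append((actual, predicted))
    return pairs


def naive_last(history, horizon):
    # Simplest possible model: repeat the last observed value.
    return [history[-1]] * horizon


series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(backtest(series, start=2, horizon=1, forecast=naive_last))
# [(3.0, 2.0), (4.0, 3.0), (5.0, 4.0)]
```

The (actual, predicted) pairs can then be fed into any error metric to compare candidate models on the same history, which is also the basic idea behind automatic model selection.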

Copyright Analytics India Magazine Pvt Ltd