Time series analysis and forecasting are machine learning tasks that often demand complex and time-consuming work on the data. With this in mind, researchers at Facebook (now Meta) created a library called Kats. It offers a high-level, easy-to-use interface for common time series tasks. In this article, we will discuss the functionality of the Kats library and apply it to some useful time series analysis tasks. The major points to be discussed in this article are listed below.
Table of contents
- What is Kats?
- Time series analysis with Kats
- Forecasting with Kats
- Detection with Kats
- Feature extraction with Kats
Let’s start with understanding what Kats is.
What is Kats?
Kats stands for Kits to Analyze Time Series and was developed by researchers at Facebook, now Meta. One of the most important things about Kats is that it is very easy to use. It is also a lightweight, general-purpose library for time series analysis, so we can use it without spending much time on preparing the series or on model-specific calculations.
This library provides us with models ranging from traditional to advanced. Using the set of algorithms from Kats, we can perform the following tasks in the time series analysis field:
- Forecasting: With this library, we can utilize 10+ forecasting models, including ensemble methods and self-supervised learning models.
- Detection: The library also provides tools to detect patterns such as seasonality, outliers, change points, and slow trend changes.
- Feature extraction and embedding: Feature extraction is common in time series analysis, and with the modules Kats provides we can extract 65 features with clear statistical definitions. These features can also be used with other machine learning models such as regression and classification models.
Looking at the above points, we can say that Kats can be a very useful part of a time series analysis project. In the next sections, we will look at how easy it is to use this library. Before performing any operation, we need to install the library in our environment. In the Google Colab environment, we can install it using the following line of code.
!pip install kats
The output of this command lists the packages that are installed along with the library.
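After installation, it can be handy to confirm from Python that the package is visible to the interpreter. The following is a minimal sketch using only the standard library; is_installed is a hypothetical helper written for this article, not part of Kats.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# "kats" is the import name used throughout this article.
print("Kats available:", is_installed("kats"))
```

If this prints False in Colab, re-running the install cell and restarting the runtime is usually the first thing to try.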
Time series analysis with Kats
Now we are ready to use this library. Let’s start our time series analysis using Kats.
Loading the data
Many libraries require the data to be arranged in a format specific to their modules. Kats is different in that we can start our analysis from a normal pandas DataFrame. In this article, we are going to use the air passengers dataset and analyze it using the modules and functionality of Kats. We can find a copy of the air_passengers.csv dataset here.
import pandas as pd
import numpy as np
df = pd.read_csv("/content/air_passengers.csv")
df.columns = ["time", "value"]
df.head()
In the dataset, we can see that we have dates in one column and the number of passengers in the other. After loading the data, we will use our first class from the Kats library, TimeSeriesData. It helps us create a time series object in the form the library requires. We can find it in kats.consts.
from kats.consts import TimeSeriesData
df = TimeSeriesData(df)
print(type(df))
Here we can see that the time series is now a TimeSeriesData object. A time series always needs one column for time and at least one column for values. If the roles of the columns could be confused, for example when we have multiple variables, we can pass them by name, as in the following line of code.
df_from_series = TimeSeriesData(time=df.time, value=df.value)
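Before handing a DataFrame to TimeSeriesData, it can help to check that it has the shape this article relies on. The sketch below uses only pandas. The three-row CSV excerpt, its header names ("ds" and "y"), and the looks_ready helper are assumptions made for illustration, and the checks are sanity checks chosen here, not requirements documented by Kats.

```python
import io

import pandas as pd

# Illustrative three-row excerpt; the header names "ds" and "y" are assumed,
# which is why the columns are renamed right after loading.
csv_text = """ds,y
1949-01-01,112
1949-02-01,118
1949-03-01,132
"""

frame = pd.read_csv(io.StringIO(csv_text))
frame.columns = ["time", "value"]               # names used in this article
frame["time"] = pd.to_datetime(frame["time"])   # real timestamps, not strings


def looks_ready(candidate: pd.DataFrame) -> bool:
    """Check for a datetime 'time' column that is sorted and free of duplicates."""
    if "time" not in candidate.columns:
        return False
    time = candidate["time"]
    return bool(
        pd.api.types.is_datetime64_any_dtype(time)
        and time.is_monotonic_increasing
        and time.is_unique
    )


print(looks_ready(frame))
```

A frame that passes these checks can then be passed to TimeSeriesData in the same way as the DataFrame loaded earlier.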
This module also handles different data types for the time column, for example standard datetime, pandas Timestamp, str, or int. On top of that, the object supports operations that are useful in time series analysis, such as:
- Math operations
- Utilities such as to_dataframe, to_array, is_empty, and is_univariate
Let’s cross-check it by slicing and plotting our data.
# Creating two slices
ts_1 = df[0:3]
ts_2 = df[3:7]
ts_1.extend(ts_2)
ts_1
Let’s plot our data.
df.plot(cols=['value'])
Here we have plotted our time series and it took only one line of code. Now after analyzing our dataset we are ready to make forecasts using any of the models.
Forecasting with Kats
In the above section, we discussed the TimeSeriesData class of Kats. In this section, we will discuss the modelling procedure we need to follow to forecast values.
Normally, in time series forecasting we use models such as the ARIMA family, VAR models, etc. The Kats library provides implementations of several such forecasting models.
The library has a dedicated module for each of its models, and they follow an sklearn-style API pattern. That means we perform modelling using the following steps:
- Instantiate the model
- Fit the model to the data
- Predict the forecast values
For example, we can perform time series modelling with the Prophet model. Let’s start by importing the classes for the Prophet model.
from kats.models.prophet import ProphetModel
from kats.models.prophet import ProphetParams
Creating the instance of parameters of the model:
params = ProphetParams(seasonality_mode='multiplicative')
Creating an instance of the Prophet model:
m = ProphetModel(df, params)
Fitting the model using the fit function:
m.fit()
Here in the output, we can see some of the information regarding our modelling that depends on our parameter setting.
Let’s forecast some values with the fitted model using the predict function.
forecast = m.predict(steps=30, freq="MS")
forecast.head()
In the above output, we can see our predictions together with an upper and a lower value for each of them. This gives us a range within which the future values are expected to lie, which is more informative than a single point estimate. Let’s visualize our prediction.
m.plot()
Here in the visualization, we can see the predicted values. The library did most of the work here, since we obtained the forecast with very little effort on our side.
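The instantiate, fit, and predict pattern used in this section can also be illustrated without Kats. The following is a minimal sketch in plain Python: SeasonalNaiveParams and SeasonalNaiveModel are hypothetical classes invented for this example, and the model simply repeats the last observed season. It mirrors the shape of the workflow (a params object, a model built from data and params, then fit and predict), not how Prophet or any Kats model works internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SeasonalNaiveParams:
    """Hypothetical parameter object, playing the role ProphetParams plays above."""
    season_length: int = 12


class SeasonalNaiveModel:
    """Hypothetical model that forecasts by repeating the last observed season."""

    def __init__(self, data: List[float], params: SeasonalNaiveParams) -> None:
        self.data = list(data)
        self.params = params
        self.last_season: Optional[List[float]] = None

    def fit(self) -> None:
        m = self.params.season_length
        if len(self.data) < m:
            raise ValueError("need at least one full season of data")
        self.last_season = self.data[-m:]

    def predict(self, steps: int) -> List[float]:
        if self.last_season is None:
            raise RuntimeError("call fit() before predict()")
        m = self.params.season_length
        return [self.last_season[i % m] for i in range(steps)]


history = [10.0, 20.0, 30.0, 11.0, 21.0, 31.0]
model = SeasonalNaiveModel(history, SeasonalNaiveParams(season_length=3))
model.fit()
forecast = model.predict(steps=5)
print(forecast)  # [11.0, 21.0, 31.0, 11.0, 21.0]
```

Keeping the parameters in their own object, as in this sketch, makes it easy to try several settings against the same data.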
In the above section, we have seen how easily we can perform time series modelling using the Kats library. Now we will discuss the functionality of detection methods.
Detection with Kats
In this section, we will discuss the detection functionality that Kats provides for time series data. We mainly use these methods to find patterns in a time series. In summary, we can utilize the following functionality from Kats for detection:
- Outlier detection: Using the OutlierDetector module we can find abnormal spikes in the time series.
- Changepoint detection: A changepoint is a point at which the behaviour of the time series, for example its mean level, shifts abruptly. The series behaves consistently before this point and consistently, but differently, after it. Using the following algorithms we can find such points.
- CUSUM Detection
- Bayesian Online Change Point Detection (BOCPD)
- Stat Sig Detection
- Trend change detection: Using the MKDetector module we can find changes in the trend of a time series. This module is based on the Mann-Kendall test.
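To make two of the ideas in this list more concrete, the following is a minimal pure-Python sketch of a CUSUM-style locator for a single shift in the mean and of the Mann-Kendall trend statistic. Both functions (cusum_changepoint and mann_kendall_z) are hypothetical helpers written for this illustration. They are simplified textbook versions, not the algorithms as implemented inside Kats, and the Mann-Kendall variance used here ignores ties.

```python
import math
from typing import Sequence


def cusum_changepoint(values: Sequence[float]) -> int:
    """Return the index of the last point before the most likely mean shift.

    The cumulative sum of deviations from the overall mean is most extreme
    at the point where the level of the series changes.
    """
    mean = sum(values) / len(values)
    running, best_index, best_abs = 0.0, 0, -1.0
    for i, v in enumerate(values):
        running += v - mean
        if abs(running) > best_abs:
            best_abs, best_index = abs(running), i
    return best_index


def mann_kendall_z(values: Sequence[float]) -> float:
    """Return the Mann-Kendall Z score (no correction for tied values)."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


shifted = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
print(cusum_changepoint(shifted))      # 3: the level changes after index 3
print(mann_kendall_z(list(range(10)))) # above 1.96, i.e. a significant upward trend at the 5% level
```

A Z score beyond roughly plus or minus 1.96 is the usual threshold for a significant monotonic trend at the 5% level.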
Above, we have discussed the different detection functionality available in the Kats library. In this article, we will look at how we can use OutlierDetector to get a general understanding of how detection works.
Using the below lines of code we can add outliers in our air passenger data set.
# df is a TimeSeriesData object at this point, so convert it back to a pandas DataFrame
outlier_df = df.to_dataframe()
outlier_df.loc[outlier_df.time == '1950-12-01', 'value'] *= 5
outlier_df.loc[outlier_df.time == '1959-12-01', 'value'] *= 4
outlier_df.plot()
Here in the output, we can see that we have added some outliers in our dataset. After converting the data into a TimeSeriesData object we will be ready to detect outliers.
outlier_ts = TimeSeriesData(outlier_df)
print(type(outlier_ts))
Calling and fitting OutlierDetector
from kats.detectors.outlier import OutlierDetector
ts_outlier = OutlierDetector(outlier_ts, 'additive')
ts_outlier.detector()
ts_outlier.outliers
Here we can see the detected outliers. To remove them from the data, we can use the remover function of the same module.
outliers_removed = ts_outlier.remover(interpolate=False)
outliers_removed.plot(cols=['y_0'])
In the output, the removed outliers appear as missing values in the data. We can instead fill them by linear interpolation by setting the parameter interpolate=True.
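The flag, remove, and interpolate steps can be pictured with a small pure-Python sketch. It assumes the residuals of the series (what is left after trend and seasonality are taken out) are already available as a list. The helpers iqr_outlier_indices and interpolate_linear are hypothetical, and the multiplier of 3.0 is a choice made for this example. This is an illustration of an interquartile-range rule followed by linear interpolation, not the code Kats runs.

```python
import statistics
from typing import List, Optional, Sequence


def iqr_outlier_indices(residuals: Sequence[float], multiplier: float = 3.0) -> List[int]:
    """Flag points lying more than multiplier * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(residuals, n=4)
    iqr = q3 - q1
    low, high = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [i for i, r in enumerate(residuals) if r < low or r > high]


def interpolate_linear(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None gaps on a straight line between known neighbours.

    Leading or trailing None values are left untouched.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for k in range(left + 1, right):
            filled[k] = filled[left] + step * (k - left)
    return filled


residuals = [1.0, 2.0, 1.5, 2.5, 40.0, 1.8, 2.2, 1.2]
flagged = iqr_outlier_indices(residuals)

# Equivalent of removing without interpolation: outliers become missing values.
removed = [None if i in flagged else r for i, r in enumerate(residuals)]
# Equivalent of removing with interpolation: gaps are filled linearly.
repaired = interpolate_linear(removed)
print(flagged, repaired)
```

Leaving the gaps as missing values keeps the data honest about what was removed, while interpolation is convenient for models that cannot handle missing points.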
Here we have seen how we can use detection functionality from the Kats library. Now in the next section, we will see how we can extract features from data using the Kats library.
Feature extraction with Kats
Features of a time series play an important role in time series analysis and in the accuracy of time series forecasting. Using the TsFeatures class of the Kats library we can calculate various features of any time series. Some examples of features are as follows:
- Level shift
- ACF and PACF
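To give a feel for what one of these features measures, the following is a minimal pure-Python sketch of the sample autocorrelation (ACF) at a given lag. The autocorrelation helper is hypothetical and written for this article. It uses the common textbook estimator and is not taken from the Kats source.

```python
from typing import Sequence


def autocorrelation(values: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(values) - 1")
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0  # a constant series carries no correlation information
    num = sum((values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag))
    return num / denom


# A toy series that repeats every 2 steps.
seasonal = [1.0, 5.0] * 6
for lag in (0, 1, 2):
    print(lag, round(autocorrelation(seasonal, lag), 3))
```

For this toy series the value at lag 2 is strongly positive and the value at lag 1 is strongly negative, which is how a repeating pattern of period 2 shows up in the ACF.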
Let’s see what other features we can extract.
Instantiating feature extractor instance
from kats.tsfeatures.tsfeatures import TsFeatures
tsFeatures = TsFeatures()
Displaying the list of features:
tsFeatures.transform(df)
Here in the output, we can see the features extracted by the TsFeatures module. Comparable time series features can also be extracted with tools available in the R language.
In this article, we have seen some of the basic steps that time series analysis requires. We used the Kats library for each of them and saw how easily they can be performed. I encourage readers to use this library in their time series analysis projects and find out what else it can do.