Merlion – Salesforce’s Latest Time Series Library. How To Use It With Python Code

Cloud-based software company Salesforce released Merlion this month: an open-source Python library for time series intelligence. It provides an end-to-end machine learning framework for time series analysis that covers loading and transforming data, building and training models, post-processing model outputs, and evaluating model performance. In this article, we will also learn how to implement anomaly detection on a time series using Merlion. The major points to be discussed are listed below.

Table of Contents

  1. What is Merlion?
  2. Key Features of Merlion
  3. Architectural Arrangement of Merlion
  4. Implementation of Anomaly Detection using Merlion

Let’s start by understanding the Merlion package.

What is Merlion?

Merlion is an open-source machine learning library for time series. It provides a uniform interface to many commonly used models and datasets for anomaly detection and forecasting on both univariate and multivariate time series, along with standard pre-processing and post-processing layers. It also includes several modules that improve usability, such as visualization, anomaly score calibration to improve interpretability, AutoML for hyperparameter tuning and model selection, and model ensembling.

Merlion also offers a unique evaluation framework that simulates the live deployment and re-training of a model in production. The library aims to give engineers and researchers a one-stop solution for rapidly developing models for their specific time series needs and benchmarking them across multiple time series datasets.

Key Features of Merlion

Merlion provides an end-to-end machine learning framework that covers data loading and transformation, model building and training, post-processing of model outputs, and performance evaluation. In addition, Merlion offers:

  1. A standardized and easily extensible framework for data loading, pre-processing, and benchmarking, designed to support a wide range of time series forecasting and anomaly detection tasks.
  2. A set of anomaly detection and forecasting models unified under a common interface. The models include traditional statistical approaches, tree ensembles, and deep learning methods, and advanced users can configure each model to their needs.
  3. The DefaultDetector and DefaultForecaster models, which are efficient, robust, and give new users a good starting point.
  4. AutoML for model selection and hyperparameter tuning.
  5. Practical, industry-inspired post-processing rules for anomaly detectors that make anomaly scores more interpretable while lowering the false positive rate.
  6. Easy-to-use ensembles that combine the outputs of multiple models to achieve more robust performance.
  7. Native visualization of model predictions.
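To make the post-processing idea in the list above more concrete, here is a minimal sketch of two such rules, assuming raw anomaly scores arrive as a plain Python list. The first standardizes scores so that a single threshold means the same thing across series; the second raises an alarm only when a standardized score crosses the threshold and no other alarm fired in the previous few steps. The function names, the z-score calibration, and the suppression rule are illustrative assumptions, not Merlion's own calibrator or threshold classes.

```python
from statistics import mean, pstdev


def calibrate(scores):
    """Standardize raw anomaly scores to z-scores (a simple stand-in for calibration)."""
    mu = mean(scores)
    sigma = pstdev(scores) or 1.0  # guard against constant scores
    return [(s - mu) / sigma for s in scores]


def threshold_with_suppression(z_scores, threshold=2.5, suppress=5):
    """Return 0/1 alarm labels: alarm when |z| >= threshold, but ignore
    crossings within `suppress` steps of the previous alarm."""
    labels = [0] * len(z_scores)
    last_alarm = None
    for i, z in enumerate(z_scores):
        too_soon = last_alarm is not None and i - last_alarm <= suppress
        if abs(z) >= threshold and not too_soon:
            labels[i] = 1
            last_alarm = i
    return labels


raw = [0.0] * 30
raw[10] = raw[11] = raw[25] = 10.0  # two adjacent spikes and one isolated spike
labels = threshold_with_suppression(calibrate(raw))
print([i for i, flag in enumerate(labels) if flag])  # [10, 25]
```

With these toy values, the spike at index 11 crosses the threshold but is suppressed because it follows the alarm at index 10 by a single step, which is exactly the kind of duplicate alarm such rules are meant to remove.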

Architectural Arrangement of Merlion

Merlion’s modular architecture is divided into five layers:

  • The data layer loads raw data, converts it into Merlion’s TimeSeries data structure, and performs any desired pre-processing.
  • The modelling layer supports a variety of forecasting and anomaly detection models, including AutoML for automated hyperparameter tuning.
  • The post-processing layer offers practical solutions for improving the interpretability of anomaly detection models and lowering their false positive rate.
  • The ensemble layer supports transparent model selection and model combination.
  • The evaluation layer implements important evaluation metrics and pipelines that simulate a model’s live deployment in production.
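The ensemble layer's job of combining models can be sketched as follows. This assumes each detector emits one raw score per time step and that averaging standardized scores is an acceptable combination rule; both the function names and the averaging rule are assumptions for illustration, not a description of how Merlion's ensembles combine models. Standardizing first keeps a detector with a large score scale from dominating the average.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Rescale one detector's scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores) or 1.0  # guard against constant scores
    return [(s - mu) / sigma for s in scores]


def ensemble_scores(*detector_scores):
    """Average the standardized scores of several detectors, point by point."""
    lengths = {len(s) for s in detector_scores}
    if len(lengths) != 1:
        raise ValueError("need at least one detector, all scoring the same time steps")
    standardized = [standardize(s) for s in detector_scores]
    return [mean(step) for step in zip(*standardized)]


# Two hypothetical detectors on very different scales that agree on step 4
small_scale = [0.1, 0.2, 0.1, 0.2, 3.0]
large_scale = [10.0, 12.0, 11.0, 10.0, 250.0]
combined = ensemble_scores(small_scale, large_scale)
print(max(range(len(combined)), key=combined.__getitem__))  # most anomalous step
```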

Merlion supports a wide range of forecasting and anomaly detection models, including statistical methods, tree-based models, and deep learning approaches. To expose all of these options transparently to end users, the engineering team unified every Merlion model under two common APIs, one for forecasting and one for anomaly detection. Every model is initialized with a config object that holds its implementation-specific hyperparameters, and every model supports a model.train(time_series) method. Now let’s move on to the implementation, where we use Merlion for anomaly detection.
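The config-then-train pattern described above can be illustrated with a toy model. In this sketch a config object holds only hyperparameters, the model is constructed from that config, and train() is the single entry point for fitting. The classes MovingAverageConfig and MovingAverageForecaster are hypothetical and not part of Merlion, and plain lists stand in for Merlion's TimeSeries.

```python
from dataclasses import dataclass


@dataclass
class MovingAverageConfig:
    """Hypothetical config object: holds hyperparameters only."""
    window: int = 3


class MovingAverageForecaster:
    """Hypothetical model built from a config and fitted with train()."""

    def __init__(self, config: MovingAverageConfig):
        self.config = config
        self.history = []

    def train(self, time_series):
        # "Training" this toy model just stores the last `window` values.
        if len(time_series) < self.config.window:
            raise ValueError("time series is shorter than the window")
        self.history = list(time_series[-self.config.window:])

    def forecast(self, steps):
        if not self.history:
            raise RuntimeError("call train() before forecast()")
        # Roll the window forward, feeding each prediction back in.
        window = list(self.history)
        predictions = []
        for _ in range(steps):
            nxt = sum(window) / len(window)
            predictions.append(nxt)
            window = window[1:] + [nxt]
        return predictions


model = MovingAverageForecaster(MovingAverageConfig(window=2))
model.train([1.0, 2.0, 3.0, 4.0])
print(model.forecast(2))  # [3.5, 3.75]
```

Because every model shares the same build-from-config and train() shape, code that drives training does not need to know which model it is handling; swapping models amounts to swapping configs.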

Implementation of Anomaly Detection using Merlion

Merlion includes a number of models optimized for detecting anomalies in univariate time series. They fall into two types: forecasting-based and statistical. Forecasters are easy to adapt for anomaly detection because they predict the value of a specified univariate in a general time series: the anomaly score is simply the difference between the predicted and the true value of the series, optionally normalized by the forecaster’s predicted standard error (if it produces one).
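The forecast-residual scoring just described can be sketched in a few lines. This is an illustration on toy numbers, assuming the forecasts and standard errors are already available as lists; the function name residual_anomaly_scores is hypothetical and not a Merlion API.

```python
def residual_anomaly_scores(actual, predicted, stderr=None):
    """Anomaly score = true value minus forecast, optionally divided by the
    forecaster's predicted standard error at each step."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    scores = [a - p for a, p in zip(actual, predicted)]
    if stderr is not None:
        if len(stderr) != len(scores) or any(e <= 0 for e in stderr):
            raise ValueError("stderr must be positive and match the series length")
        scores = [s / e for s, e in zip(scores, stderr)]
    return scores


actual = [10.0, 11.0, 30.0, 12.0]
predicted = [10.0, 10.0, 11.0, 12.0]  # toy forecasts
stderr = [1.0, 1.0, 2.0, 1.0]         # toy per-step uncertainty
print(residual_anomaly_scores(actual, predicted))          # [0.0, 1.0, 19.0, 0.0]
print(residual_anomaly_scores(actual, predicted, stderr))  # [0.0, 1.0, 9.5, 0.0]
```

Dividing by the standard error means the same raw deviation counts for less where the forecaster is less certain, which makes scores comparable between quiet and noisy stretches of the series.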

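To start using Merlion, we first need to install it, either with pip (the package is published on PyPI as salesforce-merlion) or by cloning the GitHub repository. The ts_datasets package used below is installed from the cloned repository rather than from PyPI. See the installation instructions in the Merlion repository for details.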

Merlion comes with a data loading package called ts_datasets, which implements Python classes that load numerous time series datasets into standardized pandas DataFrames. Its submodules ts_datasets.anomaly and ts_datasets.forecast load datasets for anomaly detection and for forecasting, respectively.

For anomaly detection, we use the NAB (Numenta Anomaly Benchmark) dataset. NAB is a benchmark for evaluating anomaly detection algorithms in streaming, real-time applications, and it consists of more than 50 labeled real-world and artificial time series data files. We also use Merlion’s standard data class, TimeSeries, from merlion.utils, which handles both univariate and multivariate time series. A TimeSeries wraps a collection of univariate time series (UnivariateTimeSeries objects) in a single class.

The code below uses ts_datasets and the TimeSeries class together: it loads one NAB series, splits it into train and test sets, and extracts the ground-truth anomaly labels for the test set.

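from merlion.utils import TimeSeries
from ts_datasets.anomaly import NAB

# Load the series at index 5 of NAB's realTweets subset, plus its metadata
time_series, metadata = NAB(subset='realTweets')[5]
# metadata.trainval marks the training portion; ~ selects the remainder as test data
train_data = TimeSeries.from_pd(time_series[metadata.trainval])
test_data = TimeSeries.from_pd(time_series[~metadata.trainval])
# Ground-truth anomaly labels for the test split
test_labels = TimeSeries.from_pd(metadata.anomaly[~metadata.trainval])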
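In the code above, metadata.trainval acts as a boolean mask that is True for the training portion of the series; indexing with it keeps those rows, and ~ flips it to select the test rows. The same logic in plain Python, with lists standing in for the pandas objects and a hypothetical helper named split_by_mask, looks like this:

```python
def split_by_mask(values, mask):
    """Mimic values[mask] and values[~mask] for a boolean mask."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    train = [v for v, keep in zip(values, mask) if keep]
    test = [v for v, keep in zip(values, mask) if not keep]
    return train, test


values = [5, 6, 7, 8, 9]
trainval = [True, True, True, False, False]  # training rows come first in a time-ordered split
anomaly = [0, 0, 1, 0, 1]
train, test = split_by_mask(values, trainval)
_, test_labels = split_by_mask(anomaly, trainval)
print(train, test, test_labels)  # [5, 6, 7] [8, 9] [0, 1]
```

Applying the same inverted mask to both the values and the anomaly column is what keeps test_data and test_labels aligned step for step.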

We can now initialize and train Merlion’s DefaultDetector, an anomaly detection model that balances performance and efficiency, and then get its predictions on the test split.

from merlion.models.defaults import DefaultDetectorConfig, DefaultDetector

# Initialize the detector with its default configuration
model = DefaultDetector(DefaultDetectorConfig())
# Train on the training split
model.train(train_data=train_data)
# Predict anomalies on the test split (scores after post-processing)
test_pred = model.get_anomaly_label(time_series=test_data)

Next, we visualize the predictions. Merlion comes with a plotting module that produces informative plots of a model’s predictions.

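from merlion.plot import plot_anoms
import matplotlib.pyplot as plt

# Plot the test series together with the model's anomaly scores
fig, ax = model.plot_anomaly(time_series=test_data)
# Overlay the ground-truth anomaly regions
plot_anoms(ax=ax, anomaly_labels=test_labels)
plt.show()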

Finally, we can assess the model quantitatively with the evaluate package. Merlion’s evaluation module implements utilities and metrics for measuring performance on time series tasks.

As the plot shows, the model fired three alarms, with three true positives and one false negative: every alarm was correct, so there are no false positives, but one true anomaly was missed, which lowers recall. The evaluation package also reports the mean time to detect, the average delay between the start of each true anomaly and the model’s first alarm for it, as shown below.

from merlion.evaluate.anomaly import TSADMetric
# Precision Score
p = TSADMetric.Precision.value(ground_truth=test_labels, predict=test_pred)
# Recall Score
r = TSADMetric.Recall.value(ground_truth=test_labels, predict=test_pred)
# F1 Score
f1 = TSADMetric.F1.value(ground_truth=test_labels, predict=test_pred)
# returns mean time taken to detect anomaly
mttd = TSADMetric.MeanTimeToDetect.value(ground_truth=test_labels, predict=test_pred)
print(f"Precision: {p:.4f}, Recall: {r:.4f}, F1: {f1:.4f}\n"
      f"Mean Time To Detect: {mttd}")
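To see roughly what these metrics measure, the sketch below computes simplified event-level versions on label lists, treating any nonzero prediction as an alarm. An alarm counts as a true positive if it lands inside a labeled anomaly window, a window counts as detected if at least one alarm falls inside it, and the detection delay is the number of steps from the window's start to its first alarm. This is a simplified assumption for intuition: Merlion's TSADMetric definitions may differ in detail (for example in how repeated alarms are counted), and Merlion reports the mean time to detect as a time duration rather than a step count. The function names are hypothetical.

```python
def anomaly_windows(labels):
    """Return (start, end) index pairs, end exclusive, for runs of nonzero labels."""
    windows, start = [], None
    for i, y in enumerate(labels):
        if y and start is None:
            start = i
        elif not y and start is not None:
            windows.append((start, i))
            start = None
    if start is not None:
        windows.append((start, len(labels)))
    return windows


def event_metrics(ground_truth, predict):
    """Simplified event-level precision, recall, F1, and mean steps to detect."""
    windows = anomaly_windows(ground_truth)
    alarms = [i for i, y in enumerate(predict) if y]
    # An alarm is a true positive if it lands inside any labeled window.
    tp_alarms = [a for a in alarms if any(s <= a < e for s, e in windows)]
    precision = len(tp_alarms) / len(alarms) if alarms else 0.0
    # A window is detected if an alarm falls inside it; record the delay to the first one.
    delays = []
    for s, e in windows:
        hits = [a for a in alarms if s <= a < e]
        if hits:
            delays.append(min(hits) - s)
    recall = len(delays) / len(windows) if windows else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    mean_steps = sum(delays) / len(delays) if delays else None
    return precision, recall, f1, mean_steps


gt = [0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0]
pred = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
p, r, f1, mttd = event_metrics(gt, pred)
print(f"Precision: {p:.4f}, Recall: {r:.4f}, F1: {f1:.4f}, Mean steps to detect: {mttd}")
```

In this toy example, the alarm at index 5 falls outside every labeled window (a false positive), and the one-step anomaly at index 12 is never flagged (a false negative), so both precision and recall come out at two thirds.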

Conclusion 

We have seen how easily we can implement an anomaly detection task with Merlion, which provides uniform, easily extensible interfaces and implementations for a wide range of models and datasets. Similarly, we can forecast a series just by switching to a forecasting model configuration and a forecasting dataset. You can check the implementation in the Colab notebook.

Copyright Analytics India Magazine Pvt Ltd
