# Merlion – Salesforce’s Latest Time Series Library. How To Use It With Python Code

Salesforce, the cloud-based software company, released Merlion, an open-source Python library for time series intelligence, this month. It is used for time series analysis and provides an end-to-end machine learning framework that covers loading and transforming data, building and training models, post-processing model outputs, and evaluating model performance. In this article, we also implement anomaly detection on a time series using Merlion. The major points to be discussed are listed below.

1. What is Merlion?
2. Key Features of Merlion
3. Architectural Arrangement of Merlion
4. Implementation of Anomaly Detection using Merlion

#### What is Merlion?

Merlion is an open-source machine learning library for time series. It offers a uniform interface to many commonly used models and datasets for anomaly detection and forecasting on both univariate and multivariate time series, along with standard pre- and post-processing layers. It includes several modules that improve ease of use, such as visualization, anomaly score calibration to improve interpretability, AutoML for hyperparameter tuning and model selection, and model ensembling.

Merlion also offers an evaluation framework that simulates the live deployment and re-training of a model in production. The library aims to give engineers and researchers a one-stop solution for rapidly developing and benchmarking models for their specific time series needs across multiple time series datasets.

#### Key Features of Merlion

Merlion provides an end-to-end machine learning framework that covers data loading and transformation, model development and training, model output post-processing, and model performance evaluation. Beyond this, Merlion offers:

1. A standardized and easily extensible framework for data loading, pre-processing, and benchmarking that supports a wide range of time series forecasting and anomaly detection tasks.
2. A set of models for anomaly detection and forecasting that share a common interface. The models include traditional statistical approaches, tree ensembles, and deep learning methods. Advanced users can configure each model to their preferences.
3. Default models, such as DefaultDetector and DefaultForecaster, that are efficient and robust and give new users a starting point.
4. AutoML for model selection and hyperparameter tuning.
5. Practical, industry-inspired post-processing rules for anomaly detectors that improve the interpretability of anomaly scores while lowering the false positive rate.
6. Easy-to-use ensembles that combine the outputs of multiple models for more robust performance.
7. Native visualization of model predictions.

#### Architectural Arrangement of Merlion

Merlion’s module architecture is divided into five layers:

• The data layer loads raw data, converts it to Merlion’s TimeSeries data structure, and performs any desired pre-processing.
• The modelling layer supports a variety of models for forecasting and anomaly detection, including AutoML for automated hyperparameter tuning.
• The post-processing layer offers practical solutions for improving the interpretability and lowering the false positive rate of anomaly detection models.
• The ensemble layer allows for transparent model combination and model selection.
• The evaluation layer includes relevant evaluation metrics and pipelines that simulate a model’s live deployment in production.
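
As a rough illustration of what the post-processing and ensemble layers do, the sketch below averages the anomaly scores of two models and then suppresses scores that fall below an alarm threshold. The function names, the threshold, and the scores are hypothetical values chosen for illustration; this is not Merlion’s implementation.

```python
from statistics import mean


def combine_scores(score_lists):
    """Average the per-timestamp scores of several models (a simple ensemble)."""
    lengths = {len(scores) for scores in score_lists}
    if len(lengths) != 1:
        raise ValueError("all models must score the same number of timestamps")
    return [mean(scores) for scores in zip(*score_lists)]


def apply_threshold(scores, alarm_threshold):
    """Zero out scores whose magnitude is below the alarm threshold."""
    return [s if abs(s) >= alarm_threshold else 0.0 for s in scores]


# Hypothetical anomaly scores from two models over three timestamps.
model_a_scores = [0.0, 4.0, 1.0]
model_b_scores = [2.0, 6.0, 1.0]

ensemble_scores = combine_scores([model_a_scores, model_b_scores])
alarms = apply_threshold(ensemble_scores, alarm_threshold=3.0)
print(ensemble_scores)
print(alarms)
```

Only the second timestamp survives the threshold here, which is the sense in which thresholding trades a few weak signals for fewer false alarms.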

Merlion supports a wide range of models for forecasting and anomaly detection, including statistical methods, tree-based models, and deep learning approaches. To expose all of these options transparently to the end user, the engineering team has unified all Merlion models under two common APIs, one for forecasting and the other for anomaly detection. Every model is initialized with a config object containing implementation-specific hyperparameters and supports a model.train(time_series) method. Now let’s move to the implementation, where we apply anomaly detection to a time series.
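
The config-plus-train pattern described above can be sketched in plain Python. ThresholdDetectorConfig and ThresholdDetector are hypothetical names invented for this illustration, and the mean-and-standard-deviation logic is only a stand-in, not something Merlion ships. Only the overall shape (a config object passed to the model, followed by a train call) mirrors the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThresholdDetectorConfig:
    """Hypothetical config object holding model-specific hyperparameters."""
    n_sigma: float = 3.0


class ThresholdDetector:
    """Hypothetical detector that flags points far from the training mean."""

    def __init__(self, config: ThresholdDetectorConfig):
        self.config = config
        self.mean = 0.0
        self.std = 1.0

    def train(self, train_data: List[float]) -> None:
        if not train_data:
            raise ValueError("train_data must not be empty")
        n = len(train_data)
        self.mean = sum(train_data) / n
        variance = sum((x - self.mean) ** 2 for x in train_data) / n
        # Fall back to 1.0 so a constant training series does not divide by zero.
        self.std = variance ** 0.5 or 1.0

    def get_anomaly_score(self, time_series: List[float]) -> List[float]:
        return [abs(x - self.mean) / self.std for x in time_series]

    def get_anomaly_label(self, time_series: List[float]) -> List[int]:
        scores = self.get_anomaly_score(time_series)
        return [1 if s > self.config.n_sigma else 0 for s in scores]


# Usage: build the model from its config, train it, then label new data.
model = ThresholdDetector(ThresholdDetectorConfig(n_sigma=2.0))
model.train([1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0])
labels = model.get_anomaly_label([2.0, 10.0, 2.0])
print(labels)
```

Because every model follows the same two steps, swapping one detector for another would mostly mean changing the config and the class name.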

#### Implementation of Anomaly Detection using Merlion

Merlion includes a number of models that are specialized for detecting anomalies in univariate time series. They fall into two groups: forecasting-based and statistical. Forecasters in Merlion are simple to adapt for anomaly detection because they predict the value of a specified univariate within a general time series. The anomaly score is simply the difference between the predicted and true time series values, optionally normalized by the predicted standard error of the underlying forecaster (if it produces one).
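
The forecast-based score described above can be written out in a few lines. This is a minimal sketch under stated assumptions: forecast_anomaly_scores is a hypothetical helper, and the sample values are made up; it only illustrates the residual and the optional normalization by the standard error.

```python
def forecast_anomaly_scores(actual, forecast, stderr=None):
    """Return residuals (actual - forecast), optionally divided by the
    forecaster's predicted standard error at each timestamp."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    residuals = [a - f for a, f in zip(actual, forecast)]
    if stderr is None:
        return residuals
    if len(stderr) != len(residuals):
        raise ValueError("stderr must have the same length as the series")
    # Leave the raw residual in place where no positive standard error exists.
    return [r / s if s > 0 else r for r, s in zip(residuals, stderr)]


# Made-up observations and forecasts; the last point deviates sharply.
actual = [10.0, 12.0, 30.0]
forecast = [10.0, 11.0, 12.0]

raw_scores = forecast_anomaly_scores(actual, forecast)
normalized_scores = forecast_anomaly_scores(actual, forecast, stderr=[2.0, 2.0, 2.0])
print(raw_scores)
print(normalized_scores)
```

Normalizing by the standard error puts the score in units of forecast uncertainty, so the same residual counts for less when the forecaster is less sure.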

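Before using Merlion, we need to install it, either with pip (the package is published as salesforce-merlion) or by cloning the repository. The ts_datasets data loaders used below are a separate package that is installed from the cloned repository. See the installation instructions in the Merlion repository for details.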

Merlion comes with a data loader package called ts_datasets. It implements Python classes that load numerous time series datasets as standardized pandas data frames. Its submodules ts_datasets.anomaly and ts_datasets.forecast load datasets for anomaly detection and forecasting, respectively.

For anomaly detection, we use the NAB (Numenta Anomaly Benchmark) dataset. NAB is a benchmark for evaluating anomaly detection algorithms in streaming, real-time applications. It consists of more than 50 labeled real-world and artificial time series data files. We also use Merlion’s standard data class, TimeSeries, from the utils subpackage. It can handle both univariate and multivariate time series data by wrapping a collection of univariate time series in a single class.

The code below uses both ts_datasets and the TimeSeries class to load a NAB series and split it into train and test sets.

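from merlion.utils import TimeSeries
from ts_datasets.anomaly import NAB

# Load one NAB series (subset and index chosen as in Merlion's README example)
time_series, metadata = NAB(subset="realKnownCause")[3]
# The loader returns pandas DataFrames, which we convert to Merlion TimeSeries
train_data = TimeSeries.from_pd(time_series[metadata.trainval])
test_data = TimeSeries.from_pd(time_series[~metadata.trainval])
test_labels = TimeSeries.from_pd(metadata.anomaly[~metadata.trainval])
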
We can now initialize and train Merlion’s DefaultDetector, an anomaly detection model that balances performance and efficiency, and then obtain its predictions on the test split.

from merlion.models.defaults import DefaultDetectorConfig, DefaultDetector
# Initialize, train, and test the detector
model = DefaultDetector(DefaultDetectorConfig())
model.train(train_data=train_data)
test_pred = model.get_anomaly_label(time_series=test_data)


Next, we visualize the predictions. Merlion’s plotting utilities draw the model’s output together with the ground-truth anomaly labels.

from merlion.plot import plot_anoms
import matplotlib.pyplot as plt
fig, ax = model.plot_anomaly(time_series=test_data)
plot_anoms(ax=ax, anomaly_labels=test_labels)
plt.show()


Finally, we can assess the model quantitatively with the evaluate package, which implements utilities and metrics for evaluating performance on time series tasks.

As the plot shows, the model fired three alarms, with three true positives and one false negative, and these counts determine its precision and recall. The evaluation package also reports the mean time the model took to detect each anomaly, as shown below.

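from merlion.evaluate.anomaly import TSADMetric
# Precision score
p = TSADMetric.Precision.value(ground_truth=test_labels, predict=test_pred)
# Recall score
r = TSADMetric.Recall.value(ground_truth=test_labels, predict=test_pred)
# F1 score
f1 = TSADMetric.F1.value(ground_truth=test_labels, predict=test_pred)
# Mean time taken to detect an anomaly
mttd = TSADMetric.MeanTimeToDetect.value(ground_truth=test_labels, predict=test_pred)
print(f"Precision: {p:.4f}, Recall: {r:.4f}, F1: {f1:.4f}\n"
      f"Mean Time To Detect: {mttd}")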
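
To make the metrics concrete, the counts quoted above can be turned into scores by hand. The sketch below uses the textbook definitions of precision, recall, and F1, and takes the article’s counts at face value: three alarms that are all true positives (so no false positives) and one missed anomaly. The helper precision_recall_f1 is a hypothetical name, and Merlion’s own metrics may apply time-series-specific adjustments, so its reported values are not guaranteed to match.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the text: 3 true positives, 0 false positives, 1 false negative.
p_manual, r_manual, f1_manual = precision_recall_f1(tp=3, fp=0, fn=1)
print(f"Precision: {p_manual:.4f}, Recall: {r_manual:.4f}, F1: {f1_manual:.4f}")
# prints "Precision: 1.0000, Recall: 0.7500, F1: 0.8571"
```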


#### Conclusion

We have seen how easily an anomaly detection task can be implemented with Merlion, which provides uniform, easily extensible interfaces and implementations for a wide range of models and datasets. Forecasting works in a similar way: we only need to swap in a forecasting model configuration and the dataset we want to forecast. You can check the implementation in the Colab notebook.

Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.
