Identifying anomalies in time series data can be daunting: anomalies are vaguely defined, labelled data is scarce, and temporal correlations are highly complex. Current machine learning methods for anomaly detection also face scalability and portability issues, and often raise false positives.
In a recent paper, MIT researchers proposed TadGAN, an unsupervised anomaly detection approach that reconstructs time series and flags anomalies in the data. Built on generative adversarial networks (GANs), Time Series GAN, or TadGAN, is trained with a cycle consistency loss to enable effective time-series reconstruction.
The researchers, who claim the framework outperforms baseline methods in most cases, plan to present it at the upcoming IEEE BigData conference. The research was done in collaboration with the satellite company SES, which is looking to leverage deep learning to analyse vast amounts of time-series data from communication satellites.
How does TadGAN work?
According to the researchers, time series data contains two types of anomalies: point anomalies and collective anomalies. To flag both, the researchers adapted the GAN architecture, more often used for image analysis, to generate time-series sequences and outperform state-of-the-art benchmarks. Using the generator and discriminator of this unsupervised architecture, the proposed model was able to flag anomalous data points.
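To make the two anomaly types concrete, here is a minimal sketch, not taken from the paper's code, of how a reconstruction-based detector can separate them: a point anomaly is a single step whose reconstruction error spikes, while a collective anomaly is a run of consecutive steps with elevated error. All names and thresholds are illustrative.

```python
# Illustrative sketch (not the paper's implementation): flag a point anomaly
# where a single reconstruction error spikes, and a collective anomaly where
# errors stay elevated over a run of consecutive time steps.

def flag_anomalies(signal, reconstruction, point_thresh=3.0, run_thresh=1.5, min_run=3):
    errors = [abs(x, ) if False else abs(x - r) for x, r in zip(signal, reconstruction)]
    point = [i for i, e in enumerate(errors) if e > point_thresh]
    collective, run = [], []
    for i, e in enumerate(errors):
        if e > run_thresh:
            run.append(i)
        else:
            if len(run) >= min_run:
                collective.append((run[0], run[-1]))
            run = []
    if len(run) >= min_run:                      # close a run ending at the last step
        collective.append((run[0], run[-1]))
    return point, collective

signal = [0.1, 0.0, 5.0, 0.1, 2.0, 2.1, 1.9, 2.2, 0.0]
recon  = [0.1, 0.0, 0.1, 0.1, 0.1, 0.0, 0.1, 0.1, 0.0]
print(flag_anomalies(signal, recon))  # → ([2], [(4, 7)]): one spike, one run
```

The single large error at index 2 is flagged as a point anomaly, while the sustained moderate errors at indices 4 to 7 surface only as a collective anomaly.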
Illustration of time series anomaly detection using unsupervised learning
The researchers implemented five of the most recent deep learning techniques and compared their performance with a baseline method from the 1970s, ARIMA. While some deep learning methods beat ARIMA on 50% of the datasets, two failed to outperform it at all, in part because such models can fit anomalous data too well and reconstruct the anomalies along with the signal. GANs, on the other hand, may fail to fully capture the data’s hidden distribution, causing false alarms.
To safeguard against these false alarms, the team added an autoencoder to their GAN architecture to create a more nuanced approach. By combining the autoencoder’s reconstruction with the GAN’s adversarial training, the researchers arrived at an anomaly detection approach that raises far fewer false positives.
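The intuition behind pairing an encoder with the generator is the cycle consistency the model is trained for: an encoder E maps a window to a latent code and the generator G maps it back, and training penalises the reconstruction error ||x − G(E(x))||. The sketch below illustrates that loss with toy hand-picked maps rather than trained networks; E and G here are assumptions for illustration only.

```python
import math

# Toy sketch of the cycle-consistency idea: encoder E compresses a window to a
# latent code, generator G maps it back, and the L2 loss ||x - G(E(x))||_2
# penalises windows the pair cannot reconstruct. E and G are hand-picked toy
# maps here, not trained networks.

def E(window):                  # "encoder": keep only the window mean (lossy code)
    return sum(window) / len(window)

def G(z, length):               # "generator": rebuild a flat window from the code
    return [z] * length

def cycle_loss(window):
    recon = G(E(window), len(window))
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(window, recon)))

smooth = [1.0, 1.0, 1.0, 1.0]   # reconstructed exactly: loss is 0
spiky  = [1.0, 1.0, 9.0, 1.0]   # a spike the lossy latent code cannot keep
print(cycle_loss(smooth), cycle_loss(spiky))
```

A window the model represents well reconstructs with near-zero loss, while an anomalous spike survives as a large residual, which is exactly the signal used to flag it.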
To evaluate the new method, the researchers ran anomaly detection tests on 11 datasets of real and synthetic data, pitting TadGAN against methods developed by tech giants such as Amazon and Microsoft. The datasets included two spacecraft telemetry datasets from NASA, four sub-datasets from Yahoo S5, and a Numenta Anomaly Benchmark dataset. The team noted that the proposed method outperformed ARIMA on eight of the 11 datasets.
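Benchmark comparisons of this kind typically score each method by precision, recall, and F1 over the anomalies it flags. The sketch below is an illustrative scoring function, not the paper's exact evaluation protocol, which scores detected anomaly segments rather than raw indices.

```python
# Illustrative scoring (not the paper's exact protocol): compare flagged
# anomaly indices against ground-truth labels using precision, recall, and F1.

def f1_score(predicted, actual):
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)            # correctly flagged anomalies
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)         # how many flags were real anomalies
    recall = tp / len(actual)               # how many real anomalies were caught
    return 2 * precision * recall / (precision + recall)

# Two of three flags are correct, and two of three true anomalies are caught.
print(f1_score(predicted=[10, 42, 77], actual=[10, 42, 99]))  # → 0.666...
```

F1 balances the false-alarm problem discussed above (precision) against missed anomalies (recall), which is why it is the usual headline number for these comparisons.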
Comparison of baseline models’ scores against ARIMA across all datasets.
Additionally, to make the approach widely usable, the team open-sourced the system, committed to issuing periodic updates, and developed an anomaly detection benchmarking system that lets developers compare the performance of different models.
Introducing cycle-consistent GAN architectures to time series data for the first time, the researchers systematically investigate how to use the ‘Critic’ and ‘Generator’ outputs to compute anomaly scores. The paper outlines the problem of time series anomaly detection and describes how the GAN model works.
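One natural way to use both outputs, sketched here as an assumption in the spirit of the paper rather than its exact formula, is to z-score the generator's reconstruction errors and the critic's outputs separately and mix them with a weight alpha into a single anomaly score per time step.

```python
import statistics

# Illustrative combination of the two signals (an assumed form, not the
# paper's exact formula): standardise the generator's reconstruction errors
# and the critic's outputs, then blend them with a weight alpha.

def zscores(values):
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def anomaly_scores(recon_errors, critic_scores, alpha=0.5):
    z_re = zscores(recon_errors)            # how unusual each reconstruction error is
    z_c = zscores(critic_scores)            # how unusual each critic output is
    return [alpha * r + (1 - alpha) * c for r, c in zip(z_re, z_c)]

recon_errors  = [0.1, 0.2, 0.1, 2.5, 0.2]  # generator reconstruction errors
critic_scores = [0.0, 0.1, 0.0, 3.0, 0.1]  # critic outputs treated as outlier-ness
scores = anomaly_scores(recon_errors, critic_scores)
print(max(range(len(scores)), key=scores.__getitem__))  # → 3, the step both signals flag
```

Blending the two signals means a step must look suspicious to both the generator and the critic to receive a high score, which is how the combination suppresses the single-model false alarms discussed earlier.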
The Highlights Of The Research
- A novel unsupervised method based on cycle-consistent GAN reconstruction is proposed for time series anomaly detection.
- The approach leverages the outputs of the GAN’s ‘Generator’ and ‘Critic’ to compute robust anomaly scores at every time step.
- The researchers conducted an extensive evaluation on 11 time-series datasets from NASA, Yahoo, and Numenta, showing that the proposed approach outperforms eight other baselines.
- The researchers made the TadGAN code freely available, to be extended with additional approaches and datasets. The benchmark currently includes nine anomaly detection pipelines, 13 datasets, and two evaluation mechanisms.
With TadGAN outperforming baseline models, the researchers hope to serve a wide variety of industries, including BFSI, healthcare, energy, cloud computing and the space sector. The method can help companies monitor the performance of their applications and track time-series signals in data centres to avoid service breaks. Further, according to MIT’s official news, the team is working on wrapping Time Series GAN in a user interface to give developers advanced time-series analysis.
Read the paper here.