Informer: LSTF(Long Sequence Time-Series Forecasting) Model


Time series forecasting was used in industry long before AI and machine learning, and it is among the harder problems to solve with traditional statistical methods. Since neural networks were introduced, many CNN-based time series forecasting models have been developed over the years, and predicting future values from historical time-series data points has become both more accurate and easier. Many everyday use cases, such as electricity consumption planning, require predictions far into the future, and long short-term memory (LSTM) networks are a common choice for such long-term forecasting.

But LSTM has several problems, which have led to further research in LSTF. Let’s look at one using a real-world dataset of electrical transformer station temperature readings:

  1. Short sequence forecasting predicts only a few future data points.
  2. Long sequence forecasting predicts a more extended period of time, which supports better policy planning and investment.
  3. The capacity of existing methods limits long sequence forecasting performance: once the predicted length exceeds 48 points, the MSE rises sharply and the inference speed drops.
[Figure: short sequence vs. long sequence forecasting]

Here the LSTM network predicts the hourly temperature of the station over horizons ranging from a short period (12 points, 0.5 days) to a long sequence (480 points, 20 days). As shown in Fig. (c) above, the performance gap becomes substantial once the predicted length exceeds 48 points: the MSE rises to unsatisfactory levels, the inference speed drops sharply, and the LSTM model fails.
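
The MSE referred to above is the mean of the squared differences between predicted and observed values. A minimal Python sketch follows; the numbers are made up for illustration and are not taken from the transformer station readings.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative values only, not taken from the dataset.
observed = [30.5, 31.0, 29.8, 28.9]
predicted = [30.0, 31.5, 29.8, 27.9]
print(mse(observed, predicted))
```

Because the errors are squared before averaging, a few badly missed points far into the horizon can dominate the score, which is one reason the metric climbs quickly as the predicted length grows.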

Long sequence time-series forecasting (LSTF) therefore remains a major open problem. Newer models such as the Transformer capture long-range dependencies in time series data better than RNN (recurrent neural network) models. However, Transformers need a lot of GPU computing power, so applying them to real-world LSTF problems is often unaffordable.


To solve this problem, a new approach called Informer was recently introduced in the research paper Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting, written by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.

The researchers set out to answer two questions:

  • Can Transformer models be made more memory and architecture efficient?
  • Can the Transformer still maintain a higher prediction capacity once its computation has been optimized?

The standard Transformer model has three limitations when applied to LSTF:

  1. Quadratic computation of self-attention.
  2. Memory bottleneck in stacking layers for long inputs.
  3. Speed plunge in predicting long outputs.
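
To get a feel for the first limitation, the sketch below compares the number of query-key dot products in canonical self-attention, which grows as L², with an L·ln(L) budget of the kind a sparse attention mechanism aims for. The function names are hypothetical and constant factors are ignored, so the figures only indicate the growth rate.

```python
import math

def canonical_attention_cost(seq_len):
    """Dot products when every query attends to every key: L * L."""
    return seq_len * seq_len

def log_linear_cost(seq_len):
    """Idealised L * ln(L) budget, ignoring constant factors."""
    return seq_len * math.log(seq_len)

for seq_len in (48, 480, 4800):
    full = canonical_attention_cost(seq_len)
    sparse = log_linear_cost(seq_len)
    print(f"L={seq_len}: L^2={full}, L*ln(L)={sparse:.0f}, ratio={full / sparse:.1f}")
```

The ratio between the two keeps widening as L grows, which is why the quadratic term becomes the bottleneck for long inputs.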

To address these issues, Informer introduces the following:

  • A proposed ProbSparse self-attention mechanism that replaces canonical self-attention and achieves O(L log L) time complexity and memory usage.
  • Enhanced prediction capacity on the LSTF problem, which shows the value of Transformer-like models for capturing individual long-range dependencies in time-series data.
  • A self-attention distilling operation that privileges dominating attention scores.
  • Total space complexity reduced to O((2 − ε)L log L).
  • A generative style decoder that produces a long sequence output with only one forward step.
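
As a rough illustration of the ProbSparse idea, the sketch below scores each query with the max-minus-mean sparsity measurement described in the paper, keeps only the top u = c·ln(L) queries for full attention, and gives the remaining queries the mean of the values. It is a simplified NumPy sketch, not the repository's implementation: the function name and the default c = 5 are assumptions, and for clarity it computes every query-key score, so it does not itself reach O(L log L) (the paper estimates the measurement from sampled keys instead).

```python
import numpy as np

def probsparse_attention(Q, K, V, c=5):
    """Simplified ProbSparse-style attention (illustrative, full score matrix)."""
    L_Q, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (L_Q, L_K) scaled dot products

    # Sparsity measurement per query: max score minus mean score.
    sparsity = scores.max(axis=1) - scores.mean(axis=1)

    # Keep the u most "active" queries, with u = c * ln(L_Q).
    u = min(L_Q, int(np.ceil(c * np.log(L_Q))))
    active = np.argsort(-sparsity)[:u]

    # "Lazy" queries fall back to the mean of the values.
    out = np.tile(V.mean(axis=0), (L_Q, 1))

    # Active queries get ordinary softmax attention.
    s = scores[active]
    s = s - s.max(axis=1, keepdims=True)               # numerical stability
    weights = np.exp(s)
    weights /= weights.sum(axis=1, keepdims=True)
    out[active] = weights @ V
    return out, active

rng = np.random.default_rng(0)
Q = rng.normal(size=(96, 8))
K = rng.normal(size=(96, 8))
V = rng.normal(size=(96, 8))
out, active = probsparse_attention(Q, K, V)
print(out.shape, len(active))
```

With 96 queries and c = 5, only a small fraction of the queries receive a full softmax; the rest share the same averaged output.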

Graph of Informer Model

Fig. 1: Architecture of the Informer model

The left part is the encoder, which receives massive long sequence inputs (the green series). As discussed, Informer replaces canonical self-attention with the proposed ProbSparse self-attention. The blue trapezoid is the self-attention distilling operation, which extracts the dominating attention and sharply reduces the network size.

The right part is the decoder. It receives long sequence inputs, pads the target elements with zeros, measures the weighted attention composition of the feature map, and instantly predicts the output elements (orange bars) in a generative style.
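
The decoder input can be sketched as a start token, taken from the end of the known sequence as the paper describes, concatenated with a zero placeholder for the points to predict. The function name and the lengths used below are hypothetical and chosen only for illustration.

```python
import numpy as np

def build_decoder_input(history, label_len, pred_len):
    """Concatenate the last label_len known steps with pred_len rows of zeros."""
    history = np.asarray(history, dtype=float)
    start_token = history[-label_len:]                    # known values used as context
    placeholder = np.zeros((pred_len, history.shape[1]))  # targets padded with 0
    return np.concatenate([start_token, placeholder], axis=0)

# 96 known time steps with 7 features each (synthetic values).
history = np.arange(96 * 7, dtype=float).reshape(96, 7)
dec_in = build_decoder_input(history, label_len=48, pred_len=24)
print(dec_in.shape)  # (72, 7)
```

Because every future position is already present in the input as a zero row, the model can fill them all in one forward pass instead of decoding step by step.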

Informer’s Encoder architecture

  • The upper stack is the main stack, which receives the whole input sequence.
  • The second stack takes a half slice of input.
  • Each horizontal stack in Fig. 1 stands for an individual encoder replica.
  • The red layers are the dot-product matrices of the self-attention mechanism, and they shrink in a cascade because self-attention distilling is applied on every layer.
  • The feature maps of the two stacks are concatenated as the encoder’s output.
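
The cascade decrease can be illustrated with the downsampling part of the distilling step: an ELU activation followed by max-pooling with stride 2, which roughly halves the sequence length at every layer. The NumPy sketch below is a simplification that leaves out the learned 1-D convolution applied before the activation; the function names, the kernel size of 3, and the padding of 1 are assumptions made for illustration.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU activation; the minimum keeps exp() from overflowing on large inputs."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def max_pool_1d(x, kernel=3, stride=2, pad=1):
    """Max-pool along the time axis of an array shaped (length, features)."""
    length, dim = x.shape
    padded = np.full((length + 2 * pad, dim), -np.inf)
    padded[pad:pad + length] = x
    out_len = (length + 2 * pad - kernel) // stride + 1
    windows = [padded[i * stride:i * stride + kernel].max(axis=0)
               for i in range(out_len)]
    return np.stack(windows)

def distill(x):
    """Downsampling part of distilling (learned convolution omitted)."""
    return max_pool_1d(elu(x))

x = np.random.default_rng(0).normal(size=(96, 4))
for layer in range(3):
    print(layer, x.shape)
    x = distill(x)
```

Starting from 96 time steps, the feature map entering each successive layer has 96, 48, and then 24 steps, which is the shrinking pattern the red layers in the figure depict.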

Getting Started

The techniques from the research paper were implemented by Jieqi Peng, and the repo was recently published on GitHub with starter code and dependencies. The code is in its initial release and is written in PyTorch. To reproduce the test results, use a local machine with a good GPU, as the training time can vary. The requirements are:


  • Python 3.6+
  • Matplotlib – 3.1
  • numpy – 1.17.3
  • pandas – 0.25.1
  • scikit_learn – 0.21.3
  • torch – 1.2.0

Clone and install the dependencies

!git clone
%cd Informer2020
!pip install -r requirements.txt


We are going to use the ETT (Electricity Transformer Temperature) dataset. In the paper, Informer was tested on three different datasets: ECL (Electricity Consuming Load), Weather, and ETT.

  1. Download the ETT dataset from here
  2. Copy all the CSV files to Informer2020/data/ETT/ folder.

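Let’s see the data:

!pip install pandas
import pandas as pd
df = pd.read_csv("data/ETT/ETTh1.csv")  # assumes the CSV files were copied as described above
df.head()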

Training & Testing

To train and test the model with ProbSparse self-attention on the ETTh1, ETTh2, and ETTm1 datasets respectively, use the following commands to reproduce the research paper results:

Note: On a local machine, training can take hours if your GPU isn’t powerful enough.

# ETTh1 dataset
! python -u --model informer --data ETTh1 --attn prob

# ETTh2 dataset
! python -u --model informer --data ETTh2 --attn prob

# ETTm1 dataset
! python -u --model informer --data ETTm1 --attn prob


Univariate long sequence time-series forecasting results for all methods on four datasets. The best results are shown in bold.

[Figure: univariate time series forecasting results]

Univariate Forecasting results

[Figure: multivariate time series forecasting results]


We have seen how Informer addresses the LSTF problem with the ProbSparse self-attention mechanism, which achieves O(L log L) time complexity and memory usage. To train on your own machine, download the notebook from here; its outputs and training methods will be updated over time.

Informer has also been tested on the other datasets mentioned above, ECL and Weather, and you can reproduce the results on them as well.

If you are interested in reading more about time series forecasting, check out PyTorch time series forecasting and TensorFlow time series forecasting. To contribute to the Informer open source project, you can visit the official repo.

Copyright Analytics India Magazine Pvt Ltd

Scroll To Top