
The Inventor of LSTM Unveils New Architecture for LLMs to Replace Transformers

One of the most important aspects of the xLSTM architecture is its flexible ratio of mLSTM and sLSTM blocks.



Sepp Hochreiter, the inventor of LSTM, has unveiled a new LLM architecture with a significant innovation: xLSTM, which stands for Extended Long Short-Term Memory. The new architecture addresses a major weakness of earlier LSTM designs, which are sequential in nature and cannot process all of the input at once.


Compared to Transformers, LSTMs suffer from an inability to revise storage decisions, limited storage capacity, and a lack of parallelizability due to memory mixing. Transformers, by contrast, parallelize operations across all tokens, which makes them significantly more efficient to train.

The architecture is described in the paper 'xLSTM: Extended Long Short-Term Memory'.

The main components of the new architecture are exponential gating and a matrix memory for the LSTM that eliminates memory mixing. These modifications allow the LSTM to revise its storage decisions more effectively when processing new data.
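
The mechanics can be sketched in a few lines of NumPy. The toy implementation below of the mLSTM recurrence uses random placeholder inputs and scalar gates (the output gate is vector-valued in the paper), and omits the paper's stabilizer state for the exponential gates; it is a minimal illustration, not the authors' implementation.

```python
import numpy as np

def mlstm_step(C, n, q, k, v, i_pre, f_pre, o_pre):
    """One recurrent mLSTM step. Simplified: gates are scalars here,
    and the paper's stabilizer state for the exponential gates is omitted."""
    i = np.exp(i_pre)                        # exponential input gate
    f = 1.0 / (1.0 + np.exp(-f_pre))         # sigmoid forget gate (exp is also allowed)
    o = 1.0 / (1.0 + np.exp(-o_pre))         # output gate
    C = f * C + i * np.outer(v, k)           # matrix memory: C_t = f*C_{t-1} + i*v k^T
    n = f * n + i * k                        # normalizer state
    h = o * (C @ q) / max(abs(n @ q), 1.0)   # retrieval, normalized as in the paper
    return C, n, h

d = 8
rng = np.random.default_rng(0)
C, n = np.zeros((d, d)), np.zeros(d)
for _ in range(4):                           # toy sequence of 4 steps
    q, k, v = rng.normal(size=(3, d))        # placeholder projections of the input
    C, n, h = mlstm_step(C, n, q, k / np.sqrt(d), v,
                         i_pre=rng.normal(), f_pre=rng.normal(), o_pre=rng.normal())
print(h.shape)  # (8,)
```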

The xLSTM architecture runs in linear time, O(N), and constant memory, O(1), with respect to sequence length, making it far more efficient on long sequences than Transformers, whose self-attention has quadratic time and memory complexity, O(N²).
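
A back-of-the-envelope comparison of inference-time state makes the difference concrete; the sizes below (hidden width d, context length N) are illustrative choices, not figures from the paper.

```python
# Illustrative sizes only: hidden width d, context length N.
d, N = 4096, 32_768

# Transformer decoding keeps a KV cache over all N previous tokens
# (per layer, ignoring heads and batch): memory grows linearly with N,
# and attending over it makes total sequence cost quadratic, O(N^2).
print("Transformer KV cache floats:", 2 * N * d)   # 268,435,456

# The mLSTM keeps a fixed d x d matrix state plus a d-dim normalizer:
# constant in N, so total sequence cost stays linear, O(N).
print("mLSTM recurrent state floats:", d * d + d)  # 16,781,312
```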

In evaluations on 15 billion tokens of text comparing a Transformer LLM, RWKV, and xLSTM, the xLSTM[1:0] variant (mLSTM blocks only, no sLSTM blocks) performed best. Moreover, the xLSTM architecture follows scaling laws similar to those of traditional Transformer LLMs.

One of the most important aspects of the xLSTM architecture is its flexible ratio of mLSTM and sLSTM blocks. The mLSTM, an LSTM variant with a matrix memory and a fully parallelizable update, can operate over all tokens at once, similar to Transformers. The sLSTM, on the other hand, is not parallelizable because of its memory mixing, but it enhances state tracking, at the cost of slower training and inference.
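
The block-ratio notation xLSTM[a:b] can be read as: out of every a + b blocks, a are mLSTM and b are sLSTM. The helper below is a hypothetical sketch of such an interleaving (the function name and layout policy are assumptions for illustration, not the authors' code):

```python
def build_xlstm_stack(num_blocks, ratio_m, ratio_s):
    """Hypothetical helper: lay out an xLSTM[a:b] stack, placing
    a mLSTM blocks then b sLSTM blocks in each period of a + b."""
    period = ratio_m + ratio_s
    return ["mLSTM" if (idx % period) < ratio_m else "sLSTM"
            for idx in range(num_blocks)]

print(build_xlstm_stack(8, 7, 1))  # xLSTM[7:1]: seven mLSTM blocks, one sLSTM
print(build_xlstm_stack(8, 1, 0))  # xLSTM[1:0]: all mLSTM, fully parallelizable
```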

The xLSTM architecture builds upon the traditional LSTM by introducing exponential gating and new memory structures: the sLSTM, which retains memory mixing, and the matrix-memory mLSTM. It performs favorably in language modeling compared to state-of-the-art methods such as Transformers and State Space Models.
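
For contrast with the mLSTM sketch above, here is a minimal sLSTM step, again a sketch under simplifying assumptions (shapes are illustrative, and a sigmoid forget gate is used where the paper also allows an exponential one with stabilization). It shows the memory mixing: the recurrent matrix R feeds the previous hidden state into every gate, forcing strictly sequential processing.

```python
import numpy as np

def slstm_step(c, n, h_prev, x, W, R, b):
    """One sLSTM step. R feeds h_{t-1} back into every gate pre-activation
    ("memory mixing"), so step t depends on step t-1 and cannot be
    parallelized across the sequence."""
    pre = W @ x + R @ h_prev + b                 # stacked pre-activations, shape (4d,)
    z_pre, i_pre, f_pre, o_pre = np.split(pre, 4)
    z = np.tanh(z_pre)                           # cell input
    i = np.exp(i_pre)                            # exponential input gate
    f = 1 / (1 + np.exp(-f_pre))                 # sigmoid forget gate (exp + stabilizer in the paper)
    o = 1 / (1 + np.exp(-o_pre))                 # output gate
    c = f * c + i * z                            # scalar-memory cell update
    n = f * n + i                                # normalizer state
    return c, n, o * (c / n)

d = 8
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * d, d)) * 0.1
R = rng.normal(size=(4 * d, d)) * 0.1
b = np.zeros(4 * d)
c, n, h = np.zeros(d), np.zeros(d), np.zeros(d)
for _ in range(4):                               # strictly sequential toy loop
    c, n, h = slstm_step(c, n, h, rng.normal(size=d), W, R, b)
print(h.shape)  # (8,)
```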

The scaling laws suggest that larger xLSTM models will be serious competitors to current Large Language Models built on Transformer technology. xLSTM also has the potential to impact other deep learning fields, including reinforcement learning, time-series prediction, and the modeling of physical systems.
