
Eagle 7B RNN Model Surpasses Transformers in Performance

For the first time, an RNN model has outperformed its transformer counterparts.


The open-source community has introduced Eagle 7B, a new RNN model built on the RWKV-v5 architecture. The model has been trained on 1.1 trillion tokens and supports over 100 languages. RWKV, short for ‘Receptance Weighted Key-Value,’ is a variation of the recurrent neural network (RNN) architecture used in artificial intelligence, particularly in natural language processing (NLP).

Eagle 7B promises lower inference cost and stands out as a leading 7B model in terms of environmental efficiency and language versatility.

The model, with its 7.52 billion parameters, shows excellent performance in multilingual benchmarks, setting a new standard in its category. It competes closely with larger models in English-language evaluations and is distinctive as an “Attention-Free Transformer,” though it requires additional tuning for specific uses. The model is available under the Apache 2.0 license and can be downloaded from Hugging Face for both personal and commercial use.
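For reference, loading the checkpoint from Hugging Face might look like the sketch below. The repo id and the trust_remote_code flag are assumptions about how the weights are typically published for RWKV models, not details stated in the release.

```python
# A minimal sketch of loading the model with the Hugging Face `transformers`
# library. The repo id "RWKV/v5-Eagle-7B-HF" and the trust_remote_code=True
# flag are assumptions, not details confirmed in this article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV/v5-Eagle-7B-HF"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Eagle 7B is an RNN language model that"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```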

In terms of multilingual performance, Eagle 7B is claimed to have achieved notable results on benchmarks covering 23 languages. Its English performance has also advanced significantly, outperforming its predecessor, RWKV-v4, and competing with top-tier models.

By working towards a more scalable architecture and more efficient use of data, Eagle 7B makes AI technology more inclusive, supporting a broader range of languages. The model challenges the prevailing dominance of transformers by demonstrating that RNNs like RWKV can achieve superior performance when trained on comparable data volumes.

In the RWKV model, the receptance mechanism gates how much information from previous elements in a sequence is accepted at each step, while the weighted key-value mechanism keeps the model efficient by retrieving stored information from earlier elements through a fixed-size recurrent state rather than attending over the full sequence.
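To make the idea concrete, here is a toy sketch of a weighted key-value recurrence with a receptance gate. It is a deliberate simplification (a per-channel scalar decay in the style of earlier RWKV versions); Eagle 7B’s RWKV-v5 layers use a richer, matrix-valued state, and every name and constant below is illustrative rather than taken from the actual implementation.

```python
# Toy, NumPy-only sketch of a receptance-gated weighted key-value recurrence.
# Simplified illustration only; not the Eagle 7B / RWKV-v5 implementation.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def wkv_recurrence(r, k, v, decay=0.9):
    """r, k, v: arrays of shape (seq_len, dim). Returns (seq_len, dim)."""
    seq_len, dim = k.shape
    num = np.zeros(dim)                # running weighted sum of past values
    den = np.zeros(dim)                # running sum of weights
    out = np.zeros((seq_len, dim))
    for t in range(seq_len):
        w = np.exp(k[t])               # weight contributed by the current key
        num = decay * num + w * v[t]   # older tokens decay, recent ones dominate
        den = decay * den + w
        wkv = num / (den + 1e-8)       # weighted average of stored values
        out[t] = sigmoid(r[t]) * wkv   # receptance gates what is accepted
    return out

# Usage: per-token cost and memory stay constant regardless of sequence length.
T, D = 8, 4
rng = np.random.default_rng(0)
y = wkv_recurrence(rng.normal(size=(T, D)),
                   rng.normal(size=(T, D)),
                   rng.normal(size=(T, D)))
print(y.shape)  # (8, 4)
```

The recurrent state (here just `num` and `den`) is what replaces the growing key-value cache of a transformer, which is where the lower inference cost comes from.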

However, questions remain about how well RWKV scales compared to transformers, although there is optimism about its potential. The team’s plans include further training, an in-depth paper on Eagle 7B, and the development of a model trained on 2T tokens.


K L Krithika

K L Krithika is a tech journalist at AIM. Apart from writing tech news, she enjoys reading sci-fi and pondering impossible technologies, trying not to confuse them with reality.