DeepMind’s language model RETRO proves bigger is not always better

Large language models are all the rage now, especially after the launch of GPT-3. Ever since, AI powerhouses have been building bigger and more sophisticated language models to push the frontiers of NLP. Now, DeepMind has proposed a smaller and less expensive language model, dubbed RETRO, that aims to address the shortcomings of large language models.

Does size matter?

The dominant approach to building better LLMs has been to train ever-larger models on huge datasets. DeepMind itself has built Gopher, a transformer language model with 280 billion parameters that roughly halved the accuracy gap between GPT-3 and human expert performance.

Generally, the larger the model, the more information it can absorb during training and the better it is at making predictions. Unfortunately, larger LLMs also require significantly more computing power to train, putting them out of reach of smaller organisations.

Researchers at DeepMind found that while scale improved performance on certain tasks (such as reading comprehension), areas such as mathematical and logical reasoning saw no substantial improvement. AI ethicist Timnit Gebru has pointed out that the colossal size of these models makes them even more impenetrable than the average neural network.

What’s RETRO?

RETRO runs a systematic search for information instead of memorising huge datasets in its parameters, which keeps training costs down.

DeepMind claimed RETRO (which stands for "Retrieval-Enhanced Transformer") could perform at the same level as neural networks 25 times its size, despite taking less time, energy, and computing power to train.

While LLMs like GPT-3 and the Microsoft-Nvidia Megatron-Turing model have 175 billion and 530 billion parameters respectively, RETRO functions on just 7 billion parameters.

The model achieves this level of performance using an external memory in the form of a giant database containing 2 trillion tokens of text, which it consults to generate new sentences. The database is sourced from news articles, Wikipedia pages, books, and text from GitHub, and spans ten languages, including Urdu, Russian, Chinese, and English.

The model takes after the human brain, which relies on dedicated memory mechanisms to learn new things. This lets the AI look up information in an expansive database much as humans use a search engine. The idea behind the technology isn't new, but this is the first time a look-up system has been paired with a language model at this scale.
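To make the look-up idea concrete, here is a minimal, hypothetical sketch of retrieval-augmented generation in Python. It is not DeepMind's implementation: RETRO embeds text chunks with a frozen BERT model, runs approximate nearest-neighbour search over its 2-trillion-token database, and feeds the neighbours into the transformer through cross-attention. The toy below stands in for that pipeline with bag-of-words vectors and simply prepends the retrieved text to the prompt.

```python
# Toy retrieval-augmented generation: a stand-in for RETRO's pipeline,
# not DeepMind's actual method. All names and data here are illustrative.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (RETRO uses BERT)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# The "external memory": text chunks stored outside the model's weights.
database = [
    "RETRO retrieves passages from a large external text database.",
    "Gopher is a 280-billion-parameter transformer language model.",
    "Retrieval lets a small model consult facts it never memorised.",
]
index = [(chunk, embed(chunk)) for chunk in database]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k database chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

prompt = "How does RETRO use its database?"
neighbours = retrieve(prompt)
# In RETRO the neighbours feed into cross-attention layers; here we just
# prepend them to the prompt a generator model would receive.
augmented_prompt = "\n".join(neighbours) + "\n" + prompt
print(augmented_prompt)
```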

Ethical alternative to LLMs 

According to DeepMind, RETRO might also help address the bias problem of LLMs. In contrast to the inner workings of most AI models, which are opaque, the external data RETRO refers to is readily available for inspection. In theory, then, it should be easier to find and fix what the AI has learned by scrutinising the database than by probing the neural network. Likewise, misinformation would be easier to tackle, because adding new information to RETRO's external database is far simpler than retraining the model.
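Under the same assumptions as the sketch above, the auditing argument can be shown in a few lines: because the model's evidence is plain text in a database, flagged content can be found, logged, and filtered by scanning the data directly, with no retraining.

```python
# Hypothetical continuation of the sketch above (reuses `retrieve`).
# Auditing retrieval is just data filtering: flagged passages can be
# dropped or replaced in the database without touching model weights.

blocklist = {"flat earth"}  # hypothetical flagged phrases

def audit(chunks: list[str]) -> list[str]:
    """Remove retrieved chunks that contain flagged phrases."""
    return [c for c in chunks
            if not any(bad in c.lower() for bad in blocklist)]

def retrieve_with_provenance(query: str) -> tuple[list[str], list[str]]:
    """Return vetted neighbours plus the raw ones, so a reviewer can
    see exactly which passages would have conditioned the output."""
    raw = retrieve(query)
    return audit(raw), raw
```
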
“There’s still a lot we don’t know about how to safely and productively manage models at current scales, and that’s probably going to get harder with scale in many ways, even as it gets easier in some,” said New York University professor Sam Bowman.
