Large language models are all the rage now, especially after the launch of GPT-3. Ever since, AI powerhouses have come up with bigger and more sophisticated language models to push the frontiers of NLP. Now, DeepMind has proposed a smaller and less expensive language model, dubbed RETRO, with an aim to address the shortcomings of large language models.
Does size matter?
The dominant approach to building better LLMs has been to train the models on huge datasets. DeepMind itself has built a transformer language model, Gopher, with 280 billion parameters—which managed to halve the accuracy gap from GPT-3 to human expert performance.
Generally, the larger the model is, the more information it can consume during training, and the better it is at making predictions. Unfortunately, larger LLMs also require significantly more computing power to train, making them inaccessible to smaller organisations.
Researchers at DeepMind found that while scale led to improved ability on certain tasks (such as reading comprehension), areas such as mathematical and logical reasoning saw no substantial improvement. AI ethicist Timnit Gebru has pointed out that the colossal size of these models makes them more impenetrable than the average neural network.
RETRO can run a systematic search for information instead of memorising huge datasets, which lowers training costs.
DeepMind claimed RETRO (which stands for “Retrieval Enhanced Transformer”) could perform at the same level as neural networks 25 times its size—despite taking less time, energy, and computing power to train.
The model achieves this level of performance using an external memory in the form of a giant database covering 2 trillion passages of text it reviews to generate new sentences. The dataset is sourced from news articles, Wikipedia pages, books, and texts from GitHub, and comprises ten languages including Urdu, Russian, Chinese, and English.
The model takes after the human brain, which relies on dedicated memory mechanisms to learn new things. This allows the AI to look up information in an expansive database in much the same way humans use search engines. The idea behind this technology isn’t new, but DeepMind claims this is the first time such a look-up system has been developed for a language model at this scale.
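To make the look-up idea concrete, here is a toy sketch of retrieval-enhanced generation. This is not DeepMind's actual implementation (RETRO uses frozen BERT embeddings and chunked cross-attention over its 2-trillion-passage database); the tiny database, the bag-of-words "encoder", and the string-prepending "generator" below are all simplifications introduced for illustration.

```python
# Toy sketch of retrieval-enhanced generation. NOT the real RETRO:
# the database, embed(), and generate() here are illustrative stand-ins.
from collections import Counter
import math

# Hypothetical miniature "external memory" of text passages.
DATABASE = [
    "The Eiffel Tower is located in Paris, France.",
    "Python is a widely used programming language.",
    "The Great Wall of China stretches thousands of kilometres.",
]

def embed(text):
    """Crude bag-of-words vector standing in for a learned encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Return the k database passages most similar to the query."""
    q = embed(query)
    ranked = sorted(DATABASE, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def generate(prompt):
    """A real model attends over retrieved text inside the network;
    here we simply prepend it to the prompt as conditioning context."""
    context = retrieve(prompt, k=1)[0]
    return f"[context: {context}] {prompt}"

print(generate("Where is the Eiffel Tower?"))
```

The key design point this illustrates: knowledge lives in the (inspectable, editable) database rather than in the model's weights, so updating a fact means editing a row of text, not retraining the network.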
Ethical alternative to LLMs
According to DeepMind, RETRO might be able to address the bias problem of LLMs. In contrast to the inner workings of most AI models, which are opaque and hard to interpret, the pieces of external data that RETRO refers to are readily available. Theoretically, then, it should be easier to discover and correct what the AI has learned by scrutinising the database than by probing the neural network. Likewise, it would be easier to tackle misinformation, because adding new information to RETRO's external database is far simpler than retraining a model.
“There’s still a lot we don’t know about how to safely and productively manage models at current scales, and that’s probably going to get harder with scale in many ways, even as it gets easier in some,” said New York University professor Sam Bowman.