Meta recently introduced a 175-billion-parameter Open Pretrained Transformer (OPT) model. Meta claims that this massive model, trained on publicly available datasets, is the first language technology system of this size to be released along with its pretrained weights and training code. In a rare move for a model of this scale, Meta open-sourced it.
The OPT model joins the ranks of several other advanced language models introduced in recent years. The NLP field has seen massive innovation of late, with participation from the world's leading tech companies. Why is competition so intense in this field, or, put differently, are other AI domains lagging behind NLP in terms of innovation?
Progress in NLP
The field of AI is broadly fragmented into domains that target different kinds of problems. Some systems solve problems involving navigation and movement through physical spaces, such as autonomous vehicles and robotics; others deal with computer vision applications, differentiating and categorising images and patterns; still others work on common-sense reasoning. Other forms of AI solve critical, specific problems: DeepMind's AlphaFold, for example, cracked the 50-year-old protein structure prediction challenge, an innovation that has accelerated the drug discovery process manifold.
That said, natural language processing is arguably the hottest field of AI. Even in humans, being multilingual and having language proficiency have long been considered major indicators of intelligence, suggestive of an ability to parse complex messages and decipher variations across context, slang, and dialects. It is hardly surprising that AI researchers consider teaching machines to understand and respond to natural language a great feat, and even a step toward achieving general intelligence.
Speaking of innovation in this field, the 175-billion-parameter GPT-3, released by OpenAI in 2020, is widely considered a breakthrough. A complex neural network, GPT-3 was trained on roughly 700 gigabytes of data scraped from across the web, including Wikipedia and digitised books. GPT-3 set a precedent for even larger, more advanced and, in some cases, more computationally efficient models.
Innovation that supports NLP
There have been several stages in the evolution of natural language processing: it started in the 80s with expert systems, moved on to the statistical revolution, and finally arrived at the neural revolution. The neural revolution was enabled by the combination of deep neural architectures, specialised hardware, and large amounts of data. That said, this revolution came to NLP much more slowly than to fields like computer vision, which benefitted greatly from the emergence of large-scale pretrained models, enabled in turn by large datasets like ImageNet. Pretrained ImageNet models helped achieve state-of-the-art results in tasks like object detection, human pose estimation, semantic segmentation, and video recognition. They enabled the application of computer vision to domains where training examples are few and annotation is expensive.
One of the most definitive inventions of recent times is the Transformer. Developed at Google Brain in 2017, the Transformer is a novel neural network architecture based on the self-attention mechanism. The model outperformed both recurrent and convolutional models. A Transformer also requires less computational power to train and is a better fit for modern machine learning hardware, speeding up training by an order of magnitude. It became the architecture of choice for NLP problems, replacing earlier models like LSTMs. The added training parallelism allowed training on much larger datasets than was once possible.
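To make the self-attention idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer paper. This is an illustrative toy, not production code; the matrix sizes and random inputs are assumptions for demonstration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) matrices of queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # pairwise token similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted mix of value vectors

# toy example: 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

Because every token attends to every other token in a single matrix multiplication, the whole sequence is processed in parallel, which is exactly what made the architecture such a good fit for modern accelerators.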
Thanks to Transformers and the subsequent invention of BERT, NLP achieved its ‘ImageNet moment’. BERT revolutionised NLP, and since then, a wide range of variations of these models have been proposed, such as RoBERTa, ALBERT, and XLNet. Beyond Transformers, several representation techniques like ELMo and ULMFiT have made headlines by demonstrating that pretrained language models can achieve state-of-the-art results on a range of NLP tasks.
“Transformer architecture has revolutionised NLP by enabling language generation and fine-tuning on a scale never previously seen in NLP. Furthermore, these models perform better when trained on large amounts of data; hence organisations are focusing on training larger and larger language models with little change in the model architecture. Big firms like Google and Meta, which can afford this type of training, are developing novel language models, and I expect more of the same from other large corporations,” said Shameed Sait, head of artificial intelligence at tmrw.
Echoing the same sentiment, Anoop Kunchukuttan, Microsoft researcher and the co-founder of AI4Bharat, said, “Interestingly, deep learning’s benefits were initially seen largely in the field of computer vision and speech. What happened was that NLP got some kind of a headstart in terms of the kind of models that were introduced subsequently. The attention-based mechanism, for example, led to great advancements in NLP. Also, the introduction of self-supervised learning influenced progress in the NLP field.”
Access to massive data
One of the major advantages that NLP has is the availability of massive datasets to train advanced models on. Hugging Face, a startup building the 'GitHub for machine learning', has been working on democratising AI, with a special focus on NLP. Last year, Hugging Face released Datasets, a community library for NLP developed over a year by more than 250 developers. The library contains over 650 unique datasets and aims to standardise the end-user interface, version control and documentation, while offering a lightweight frontend for internet-scale corpora.
Similarly, Facebook AI open-sourced the FLORES-101 database to improve multilingual translation models. It is a many-to-many evaluation dataset covering 101 different languages. By making this information publicly available, Facebook wants to accelerate progress in NLP by enabling developers to build more diverse and locally relevant tools.
The biggest benefit language modelling enjoys is that training data comes free with any text corpus: the model learns by predicting the next token, so no manual annotation is needed. The availability of a potentially unlimited amount of training data is particularly important because NLP does not deal only with the English language.
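The "training data is free" point can be illustrated with a few lines of Python: self-supervised language modelling turns any raw text into (context, next-token) training pairs automatically. The toy corpus and window size below are assumptions for illustration.

```python
def make_lm_examples(text, context_size=3):
    # Self-supervised labelling: every position in a raw corpus
    # yields a (context window, next token) training pair for free.
    tokens = text.split()
    examples = []
    for i in range(context_size, len(tokens)):
        examples.append((tokens[i - context_size:i], tokens[i]))
    return examples

corpus = "the quick brown fox jumps over the lazy dog"
pairs = make_lm_examples(corpus)
print(pairs[0])  # (['the', 'quick', 'brown'], 'fox')
print(len(pairs))  # 6
```

Real systems tokenise into subwords rather than whitespace-split words, but the principle is the same: the labels are the text itself, which is why any corpus in any language can serve as training data.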
Towards AGI? Just not there yet
When the GPT-3 model was released, many over-enthusiastic publications termed it the first step toward AGI. While a model of this magnitude and processing power is nothing short of a technological marvel, calling it a move towards AGI is a bit of a stretch.
New York University emeritus professor Gary Marcus, author of the recent book 'Rebooting AI', said in an earlier interview with Analytics India Magazine, "The specific track we are on is large language models, an extension of big data. My view about those is not optimistic. They are less astonishing in their ability not to be toxic, tell the truth, or be reliable. I don't think we want to build a general intelligence that is unreliable, misinforms people, and is potentially dangerous. For instance, you have GPT-3 recommending that people commit suicide.
There’s been enormous progress in machine translation, but not in machine comprehension. Moral reasoning is nowhere, and I don’t think AI is a healthy field right now.”
In a rare occurrence, Marcus's rival Yann LeCun seems to agree with him. At a separate conference, LeCun called language an epiphenomenon of human intelligence, adding that there is a lot to intelligence that has nothing to do with language. "That's where we should attack things first. … [Language] is number 300 in the list of 500 problems that we need to face," LeCun said.
So while language models and the NLP domain are certainly important to achieving AGI, they are simply not enough. For the time being, with a GPT-4 announcement impending and other language models waiting to be introduced, one can expect to see accelerated progress in the field for a long time to come.