OpenAI Releases GPT-3, The Largest Model So Far


OpenAI researchers released a paper describing the development of GPT-3, a state-of-the-art language model made up of 175 billion parameters. 

The previous OpenAI model, GPT-2, had 1.5 billion parameters and was the largest language model at the time of its release. It was soon eclipsed by NVIDIA's Megatron-LM, with 8 billion parameters, and then by Microsoft's Turing-NLG, with 17 billion. Now, OpenAI has turned the tables by releasing a model roughly 10x larger than Turing-NLG.

Current NLP systems still largely struggle to learn from a few examples. With GPT-3, the researchers show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.

Natural language processing tasks range from generating news articles to language translation and answering standardised test questions. 

The researchers trained eight models of different sizes, ranging from 125 million to 175 billion parameters, the largest being GPT-3.

How GPT-3 Pipped Other Models

For GPT-3, the OpenAI team used the same model and architecture as GPT-2, including its modified initialisation, pre-normalisation, and reversible tokenisation, with the exception that GPT-3 uses alternating dense and locally banded sparse attention patterns in the layers of the transformer.

The researchers state that larger models make increasingly efficient use of in-context information. The steeper "in-context learning curves" reported in the paper for the larger models reflect an improved ability to learn a task from contextual information alone.

For training, the researchers used a combination of model parallelism within each matrix multiply and model parallelism across the layers of the network.
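The first of these ideas, parallelism within a single matrix multiply, can be sketched in a few lines. This is a toy NumPy illustration, not OpenAI's implementation: the weight matrix is split column-wise across notional "devices", each computes a partial result, and the shards are concatenated, reproducing the full matmul.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # a batch of activations
W = rng.standard_normal((8, 16))  # the full weight matrix

n_devices = 4
shards = np.split(W, n_devices, axis=1)       # one column block per "device"
partials = [x @ w for w in shards]            # each device's local matmul
y_parallel = np.concatenate(partials, axis=1) # gather the output shards

# The sharded computation matches the unsharded one exactly.
assert np.allclose(y_parallel, x @ W)
```

In a real system each shard would live on a separate GPU and the concatenation would be a collective communication step; the arithmetic decomposition is the same.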

GPT-3 was trained on V100 GPUs, on part of a high-bandwidth cluster provided by Microsoft.

Evaluation of GPT-3 was done under three conditions: 

  1. few-shot learning
  2. one-shot learning
  3. zero-shot learning
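The three settings differ only in how many worked demonstrations are placed in the prompt; the model's weights are never updated. The sketch below is illustrative (the exact prompt formatting is an assumption, though the English-to-French framing follows the paper's examples):

```python
def build_prompt(task_description, examples, query, n_shots):
    """Assemble a zero-, one-, or few-shot prompt by varying the
    number of in-context demonstrations (no gradient updates)."""
    lines = [task_description]
    for source, target in examples[:n_shots]:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model completes this line
    return "\n".join(lines)

demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
zero_shot = build_prompt("Translate English to French:", demos, "mint", 0)
one_shot = build_prompt("Translate English to French:", demos, "mint", 1)
few_shot = build_prompt("Translate English to French:", demos, "mint", 2)
```

In the zero-shot case the model sees only the task description and the query; in the few-shot case it additionally conditions on the demonstrations.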

GPT-3 achieved promising results in the zero-shot and one-shot settings, and in the few-shot setting it occasionally surpassed state-of-the-art fine-tuned models.

The results show that GPT-3 performed strongly on translation, question answering, and cloze tasks, as well as on unscrambling words and performing 3-digit arithmetic. The researchers claim that GPT-3 can even generate news articles which human evaluators have difficulty distinguishing from articles written by humans.
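Tasks like word unscrambling are convenient because a model's answer can be checked mechanically. A minimal sketch of such a check (the function name and toy vocabulary are illustrative, not from the paper): the answer must use exactly the scrambled letters and be a real word.

```python
def is_valid_unscramble(scrambled, answer, vocabulary):
    """Accept an unscrambling if it is an anagram of the scrambled
    string and appears in the vocabulary (a toy stand-in for a
    dictionary)."""
    return sorted(scrambled) == sorted(answer) and answer in vocabulary

vocab = {"criteria", "particle"}
assert is_valid_unscramble("taicrier", "criteria", vocab)      # correct
assert not is_valid_unscramble("taicrier", "particle", vocab)  # wrong letters
```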

GPT-3 is an incredibly large model, and one cannot expect to train something like it without substantial computational resources. However, the researchers note that such models can be efficient once trained: generating 100 pages of content with the full GPT-3 model costs only a few cents in energy.

Where Can This Go Wrong

“GPT-3 has the potential to advance both the beneficial and harmful applications of language models.”

OpenAI researchers

In an unprecedented approach, the researchers go into detail about the potential harmful effects of GPT-3 in their paper. Because GPT-3 generates text of high enough quality that it can be difficult to distinguish from human-written text, the authors warn that the model could be misused. They admit that malicious uses of language models are difficult to anticipate, because models can be repurposed in environments, or for purposes, very different from what the researchers intended.

They list the following misuses:

  • Spam and phishing
  • Fraudulent academic essay writing
  • Abuse of legal and governmental processes
  • Social engineering pretexting

Since GPT-3 was trained on text scraped from much of the internet, the researchers had an opportunity to examine how racial, religious, and other biases play out in its output. For example, they found that words such as "violent", "terrorism" and "terrorist" co-occurred with "Islam" at a greater rate than with other religions.

Despite many limitations and weaknesses, the researchers conclude that very large language models may be an important ingredient in the development of adaptable, general language systems.

Read the full paper here.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
