
OpenAI Releases GPT-3, The Largest Model So Far

Microsoft CEO Satya Nadella and OpenAI CEO Sam Altman at the Microsoft campus in Redmond, Wash., on July 15, 2019. (Photography by Scott Eklund/Red Box Pictures)

OpenAI researchers have released a paper describing the development of GPT-3, a state-of-the-art language model with 175 billion parameters.

OpenAI’s previous model, GPT-2, had 1.5 billion parameters and was the largest model at the time of its release. It was soon eclipsed by NVIDIA’s Megatron, with 8 billion parameters, and then by Microsoft’s Turing NLG, with 17 billion. Now OpenAI has turned the tables with a model roughly 10x larger than Turing NLG.
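As a quick check on the scale comparison, the short Python sketch below divides the parameter counts quoted in this article. The dictionary keys and the helper name `scale_factor` are illustrative choices, not anything defined by the paper.

```python
# Parameter counts in billions, as quoted in this article.
PARAMS_BILLIONS = {
    "GPT-2": 1.5,
    "Megatron": 8,
    "Turing NLG": 17,
    "GPT-3": 175,
}


def scale_factor(larger, smaller):
    """Return how many times more parameters `larger` has than `smaller`."""
    return PARAMS_BILLIONS[larger] / PARAMS_BILLIONS[smaller]


for name in ("Turing NLG", "Megatron", "GPT-2"):
    ratio = scale_factor("GPT-3", name)
    print(f"GPT-3 is about {ratio:.1f}x the size of {name}")
```

The ratio against Turing NLG comes out slightly above 10, which is where the "10x" figure comes from.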

Current NLP systems still largely struggle to learn from a few examples. With GPT-3, the researchers show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.

Natural language processing tasks range from generating news articles to language translation and answering standardised test questions. 

The researchers trained eight models of different sizes, ranging from 125 million to 175 billion parameters, with the largest being GPT-3.

How GPT-3 Pipped Other Models

For GPT-3, the OpenAI team used the same model and architecture as GPT-2, including the modified initialisation, pre-normalisation, and reversible tokenisation. The one exception is that the layers of the transformer use alternating dense and locally banded sparse attention patterns.
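To give a rough picture of what alternating dense and locally banded attention patterns can look like, here is a minimal Python sketch that builds the two kinds of causal mask and alternates them by layer. The band width, the choice of which layers are dense, and all function names are assumptions made for illustration; the article does not give these details.

```python
def dense_causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def banded_causal_mask(n, band):
    """Causal mask limited to the `band` most recent positions, including i itself."""
    return [[0 <= i - j < band for j in range(n)] for i in range(n)]


def masks_for_layers(num_layers, n, band):
    """Alternate masks by layer: dense on even layers, banded on odd layers (assumed order)."""
    return [
        dense_causal_mask(n) if layer % 2 == 0 else banded_causal_mask(n, band)
        for layer in range(num_layers)
    ]


def render(mask):
    """Draw a mask as text, with 'x' for allowed attention and '.' for blocked."""
    return "\n".join("".join("x" if cell else "." for cell in row) for row in mask)


for layer, mask in enumerate(masks_for_layers(num_layers=2, n=5, band=2)):
    print(f"layer {layer}")
    print(render(mask))
```

In the dense mask every token sees all earlier tokens, while in the banded mask each token sees only a short window of recent ones, which reduces the number of attention entries that need to be computed.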

The researchers state that larger models make increasingly efficient use of in-context information. In the paper’s plots, the steeper “in-context learning curves” for large models show an improved ability to learn from contextual information.

For training, the researchers used a combination of model parallelism within each matrix multiply and model parallelism across the layers of the network.
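Model parallelism within a matrix multiply can be pictured as splitting a weight matrix into column shards, letting each device multiply by its own shard, and concatenating the partial outputs. The sketch below shows that idea in plain Python on lists of lists, along with a coarser, second form of model parallelism that assigns whole layers to different devices. It is a toy illustration under assumed names (`sharded_matmul`, `layer_placement`) and says nothing about how the GPT-3 training code is actually organised.

```python
def matmul(a, b):
    """Plain matrix multiply on lists of lists: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w, parts):
    """Split weight matrix `w` into `parts` column shards of equal width."""
    n = len(w[0])
    assert n % parts == 0, "column count must divide evenly across shards"
    width = n // parts
    return [[row[p * width:(p + 1) * width] for row in w] for p in range(parts)]


def sharded_matmul(x, w, parts):
    """Each hypothetical device multiplies x by its shard; the outputs are concatenated."""
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    return [sum((partial[i] for partial in partials), []) for i in range(len(x))]


def layer_placement(num_layers, num_stages):
    """Assign contiguous blocks of layers to stages (parallelism across layers)."""
    per_stage = -(-num_layers // num_stages)  # ceiling division
    return [layer // per_stage for layer in range(num_layers)]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
print(sharded_matmul(x, w, parts=2) == matmul(x, w))
print(layer_placement(num_layers=4, num_stages=2))
```

The sharded result matches the unsharded product because each output column depends on only one column of the weight matrix, so the shards can be computed independently.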

GPT-3 was trained on V100 GPUs on part of a high-bandwidth cluster provided by Microsoft.

GPT-3 is evaluated under three conditions:

  1. few-shot learning
  2. one-shot learning
  3. zero-shot learning
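The three settings differ only in how many worked examples are placed in the model’s input before the actual query; the model’s weights are not updated in any of them. The Python sketch below assembles such prompts for a translation task. The `=>` separator, the helper name `build_prompt`, and the example pairs are illustrative assumptions rather than an exact reproduction of the prompts used in the paper.

```python
def build_prompt(task_description, examples, query):
    """Assemble an in-context prompt: description, k demonstrations, then the query."""
    lines = [task_description]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is expected to continue from here
    return "\n".join(lines)


TASK = "Translate English to French:"
DEMOS = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush giraffe", "girafe peluche"),
]

zero_shot = build_prompt(TASK, [], "cheese")        # no demonstrations
one_shot = build_prompt(TASK, DEMOS[:1], "cheese")  # exactly one demonstration
few_shot = build_prompt(TASK, DEMOS, "cheese")      # several demonstrations

print(few_shot)
```

Moving from zero-shot to few-shot only lengthens the prompt, which is why the approach is described as task-agnostic: the same trained model is reused for every task.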

GPT-3 achieved promising results in the zero-shot and one-shot settings, and in the few-shot setting it occasionally surpassed state-of-the-art models.

The results show strong performance on translation, question-answering, and cloze tasks, as well as on unscrambling words and performing 3-digit arithmetic. The researchers claim that GPT-3 can even generate news articles that human evaluators have difficulty distinguishing from articles written by humans.

GPT-3 is an incredibly large model, and building something like it requires substantial computational resources. However, the researchers note that such models can be efficient once trained: even with the full GPT-3 model, generating 100 pages of content costs only a few cents in energy.

Where Can This Go Wrong

“GPT-3 has the potential to advance both the beneficial and harmful applications of language models.”

OpenAI researchers

Notably, the researchers go into detail about the potential harms of GPT-3 in their paper. Its high-quality text generation can make it difficult to distinguish synthetic text from human-written text, so the authors warn that language models can be misused. They admit that malicious uses can be difficult to anticipate, because language models can be repurposed in a very different environment, or for a different purpose, than the researchers intended.

They list the following misuses:

  • Spam and phishing
  • Fraudulent academic essay writing
  • Abuse of legal and governmental processes
  • Social engineering pretexting

Because GPT-3 was trained on a very large body of text scraped from the internet, the researchers had an opportunity to examine how sentiment around race, religion and other attributes plays out in the model’s output. For example, they found that words such as violent, terrorism and terrorist co-occurred with Islam at a greater rate than with other religions.

Despite many limitations and weaknesses, the researchers conclude that very large language models may be an important ingredient in the development of adaptable, general language systems.

Read the full paper here.

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.