Last updated January 28, 2022

OpenAI dumps its own GPT-3 for something called InstructGPT, and for good reason

Compared to GPT-3, InstructGPT produces fewer imitative falsehoods (according to TruthfulQA) and is less toxic (according to RealToxicityPrompts).



Published on January 28, 2022

by Meeta Ramnani

OpenAI has trained language models that are much better at following user intentions than GPT-3. The InstructGPT models are trained with humans in the loop and are deployed as the default language models on the OpenAI API. The team claims to have made them more truthful and less toxic by using techniques developed through alignment research.

The OpenAI API is powered by GPT-3 language models that can perform natural language tasks using carefully engineered text prompts. But these models sometimes generate outputs that are untruthful, toxic, or reflect harmful sentiments.
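To illustrate what "carefully engineered" can mean in practice, the sketch below contrasts a few-shot prompt of the kind often written for base GPT-3 with a plain instruction of the kind an instruction-following model is meant to handle. It only builds strings and calls no API; the helper names and the sentiment task are hypothetical, chosen purely for illustration.

```python
def few_shot_prompt(examples, query):
    """Build a few-shot prompt: worked examples followed by the new input.

    The model is expected to continue the pattern after the final "Sentiment:".
    """
    blocks = [f"Review: {text}\nSentiment: {label}\n" for text, label in examples]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n".join(blocks)


def instruction_prompt(query):
    """Build a direct instruction with no worked examples."""
    return (
        "Classify the sentiment of this review as Positive or Negative.\n\n"
        f"Review: {query}"
    )


# Hypothetical demonstration data.
examples = [("Loved it.", "Positive"), ("Waste of money.", "Negative")]

print(few_shot_prompt(examples, "Great battery life."))
print("---")
print(instruction_prompt("Great battery life."))
```

The first prompt relies on the model picking up a pattern from examples, while the second simply states the task, which is the style of request InstructGPT is trained to follow.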

To make the models safer, more helpful, and better aligned, OpenAI used reinforcement learning from human feedback (RLHF) to fine-tune GPT-3. The resulting InstructGPT models are much better at following instructions than GPT-3.
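The article does not spell out how RLHF works. As a rough sketch of one ingredient commonly described for this family of methods, a reward model can be trained on human comparisons by penalising cases where the output that labelers preferred scores lower than the one they rejected. The function name and the numeric scores below are illustrative assumptions, not OpenAI's code.

```python
import math


def pairwise_preference_loss(reward_preferred, reward_rejected):
    """Return -log(sigmoid(reward_preferred - reward_rejected)).

    The loss is small when the preferred output scores well above the
    rejected one, and large when the ordering is reversed.
    """
    diff = reward_preferred - reward_rejected
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); branch on the sign
    # of diff so that exp() never receives a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical reward scores for two completions of the same prompt.
agrees_with_labeler = pairwise_preference_loss(2.0, 0.0)
disagrees_with_labeler = pairwise_preference_loss(0.0, 2.0)
print(agrees_with_labeler < disagrees_with_labeler)
```

In a full pipeline the reward scores would come from a learned model, and the language model would then be fine-tuned with reinforcement learning to produce outputs that score highly.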

InstructGPT models have been in beta on the API for more than a year. This is the first time that OpenAI has applied its alignment research to its product.

Compared to GPT-3, InstructGPT produces fewer imitative falsehoods (according to TruthfulQA) and is less toxic (according to RealToxicityPrompts). The team also conducted human evaluations on its API prompt distribution and found that InstructGPT makes up facts (“hallucinates”) less often and generates more appropriate outputs.

According to OpenAI, InstructGPT “unlocks” capabilities that GPT-3 already had but that were difficult to elicit through prompt engineering alone. “This is because the training procedure has a limited ability to teach the model new capabilities relative to what is learned during pretraining, since it uses less than 2% of the compute and data relative to model pretraining,” according to the company's official blog.

The OpenAI team also warned that, despite significant progress, the InstructGPT models are far from fully aligned or fully safe. They still generate toxic or biased outputs, make up facts, and generate sexual and violent content without explicit prompting. “But the safety of a machine learning system depends not only on the behavior of the underlying models, but also on how these models are deployed. To support the safety of our API, we will continue to review potential applications before they go live, provide content filters for detecting unsafe completions, and monitor for misuse,” the team said.


Meeta Ramnani

Meeta’s interest lies in finding real, practical applications of technology. At AIM, she writes stories that question new inventions and the need to develop them. She believes that technology has changed the world very fast and will continue to do so, and that it is no longer ‘cool’ to be ‘old-school’. People who don’t keep up with technology, she believes, will surely be left behind.