With WebGPT, OpenAI has fine-tuned GPT-3 to answer open-ended questions accurately using a text-based web browser. The prototype has been taught to use the browser much as humans research online – it submits search queries with keywords, follows links, and scrolls through web pages. The system is trained to cite its sources, which makes it easier to give feedback and improve factual accuracy.
According to the researchers, language models like GPT-3 are extremely useful for many different tasks, but they tend to ‘hallucinate’ information on tasks that require obscure real-world knowledge. The model is given an open-ended question along with a summary of the browser state, and it has learnt to issue commands such as ‘search …’, ‘find in page: …’ and ‘quote: …’. In this way, the model collects passages from web pages and uses them to compose an answer.
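The command-driven browsing described above can be sketched as a simple loop: the model sees the question plus a summary of the browser state, emits a text command, and the environment carries it out while quoted passages are collected for citation. All names here (`model_next_command`, `environment`, the exact command strings) are hypothetical illustrations, not OpenAI's actual interface.

```python
# A minimal sketch of a WebGPT-style command loop. The model and
# environment objects are hypothetical stand-ins.

def browse_and_answer(question, model_next_command, environment, max_steps=20):
    """Repeatedly ask the model for a text command, execute it against the
    browsing environment, and collect quoted passages for citation."""
    collected_quotes = []
    browser_state = {"page": None, "cursor": 0}

    for _ in range(max_steps):
        command = model_next_command(question, browser_state, collected_quotes)

        if command.startswith("search "):
            query = command[len("search "):]
            browser_state["page"] = environment.search(query)  # search results page
        elif command.startswith("find in page: "):
            text = command[len("find in page: "):]
            browser_state["cursor"] = environment.find(browser_state["page"], text)
        elif command.startswith("quote: "):
            # save the passage so the final answer can cite it
            collected_quotes.append(command[len("quote: "):])
        elif command.startswith("answer: "):
            # compose the final answer from the collected, citable passages
            return command[len("answer: "):], collected_quotes

    return None, collected_quotes
```

The key design point is that the model only ever produces text commands; the environment does the actual fetching, which keeps the interface identical to the one human demonstrators used.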
OpenAI trained WebGPT with the same general methods it has used in the past. The model was first trained to copy human demonstrations, which gives it the ability to use the text-based browser to answer questions. The helpfulness and accuracy of the answers were then improved by training a reward model to predict human preferences.
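Training a reward model from human preferences typically means maximising the probability that the preferred answer receives the higher reward score. The toy example below illustrates that idea with a linear reward over hand-crafted features and a pairwise logistic loss; the features, model, and training loop are illustrative assumptions, not OpenAI's implementation.

```python
import math

# Toy reward-model training from pairwise human preferences.
# reward(w, x) is a linear stand-in for a learned reward model.

def reward(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def preference_loss(weights, preferred, rejected):
    # -log sigmoid(r_preferred - r_rejected): low when the model
    # assigns the preferred answer a clearly higher reward.
    margin = reward(weights, preferred) - reward(weights, rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def train(pairs, dim, lr=0.1, epochs=100):
    """pairs: list of (preferred_features, rejected_features) tuples."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            grad_scale = -1.0 / (1.0 + math.exp(margin))  # d(loss)/d(margin)
            for i in range(dim):
                # gradient descent step on the pairwise loss
                weights[i] -= lr * grad_scale * (preferred[i] - rejected[i])
    return weights
```

Once trained, such a reward model can score candidate answers, and the answering policy can be tuned to produce answers the reward model rates highly.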
The system has been trained to answer questions from ELI5, a dataset of open-ended questions scraped from the “Explain Like I’m Five” subreddit. The best-performing model produces answers that are preferred 56 percent of the time over answers written by human demonstrators.
The model was also evaluated on TruthfulQA, an adversarially constructed dataset of short-form questions designed to test whether models fall prey to common misconceptions. Here answers are scored on both truthfulness and informativeness. The new model outperformed GPT-3 on TruthfulQA and exhibited more favourable scaling properties.
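Aggregating the two scores above is straightforward: given per-answer human judgments, one reports the fraction of answers that are truthful, informative, and both. The binary label format below is an illustrative assumption rather than the benchmark's exact data schema.

```python
# Sketch of TruthfulQA-style aggregate scoring, assuming each answer
# carries binary (truthful, informative) judgments.

def score_answers(labels):
    """labels: list of (truthful, informative) boolean pairs, one per answer."""
    n = len(labels)
    return {
        "truthful": sum(t for t, _ in labels) / n,
        "informative": sum(i for _, i in labels) / n,
        # an answer only really succeeds when it is both at once:
        # "I don't know" is truthful but uninformative
        "truthful_and_informative": sum(t and i for t, i in labels) / n,
    }
```

Scoring both dimensions matters because a model can trivially be truthful by refusing to answer; the joint metric penalises that evasion.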
Although the model is more truthful than GPT-3 and generates false statements less frequently, it still poses risks. Answers with citations are perceived as authoritative, which can obscure the fact that the model still makes basic errors. The model can also reinforce users’ existing beliefs.
Human feedback and tools like web browsers offer a promising path towards building truthful, general-purpose AI systems.