
Improving The Factual Accuracy Of Language Models With WebGPT

WebGPT produces answers that are preferred 56% of the time to answers written by human demonstrators

OpenAI has fine-tuned GPT-3 to answer open-ended questions accurately using a text-based web browser; the resulting system is called WebGPT. The prototype has been taught to use the browser the way humans research online – it submits search queries, follows links, and scrolls web pages. The system is also trained to cite its sources, which makes it easier to give feedback and improves factual accuracy.

According to the researchers, language models like GPT-3 are extremely useful for many different tasks, but they have a tendency to ‘hallucinate’ information when performing tasks that require obscure real-world knowledge. The model is given an open-ended question along with a summary of the browser state. It has learnt to issue commands such as ‘search …’, ‘find in page: …’ and ‘quote: …’. Using these commands, the model collects passages from web pages and composes an answer from them.
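The command-driven browsing loop described above can be sketched in a few lines. Everything here is illustrative: the class and function names, the exact command strings, and the fake browser are assumptions, not OpenAI's implementation, which uses a richer action format.

```python
class FakeBrowser:
    """Stand-in for a text-based browser environment (illustrative only)."""

    def __init__(self, pages):
        self.pages = pages  # maps a search query to page text
        self.current = ""   # text currently shown to the model

    def state(self):
        return self.current

    def search(self, query):
        self.current = self.pages.get(query, "no results")


def run_episode(next_command, browser, max_steps=10):
    """Feed the browser state to the model and execute the commands it emits.

    `next_command` plays the role of the language model: it maps the current
    browser state to a command string. Quoted passages are collected so they
    can later be composed into a cited answer.
    """
    quotes = []
    for _ in range(max_steps):
        command = next_command(browser.state())
        if command.startswith("search "):
            browser.search(command[len("search "):])
        elif command.startswith("quote: "):
            quotes.append(command[len("quote: "):])
        elif command == "end":
            break
    return quotes
```

In this toy version, a scripted sequence of commands stands in for the model; the real system samples each command from the fine-tuned GPT-3 conditioned on the question and browser state.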



To train WebGPT, OpenAI used the same general methods it has applied to GPT-3 in the past. The model was first trained to copy human demonstrations, which gives it the ability to use the text-based browser to answer questions. The helpfulness and accuracy of its answers were then improved by training a reward model to predict human preferences.
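The reward model in this recipe is typically trained on pairwise human comparisons: given two answers, it should assign a higher score to the one the human preferred. A minimal sketch of the standard pairwise loss, minus log-sigmoid of the score difference, is below; the function name and plain-Python form are assumptions for illustration, not OpenAI's code.

```python
import math


def preference_loss(reward_preferred, reward_other):
    """Pairwise comparison loss for a reward model.

    Given scalar scores for the human-preferred answer and the other answer,
    returns -log(sigmoid(reward_preferred - reward_other)). The loss is
    log(2) when the scores tie and shrinks as the preferred answer is
    scored higher, so minimising it teaches the model to rank answers the
    way humans do.
    """
    diff = reward_preferred - reward_other
    return -math.log(1.0 / (1.0 + math.exp(-diff)))
```

In practice the scores come from a learned network evaluated on each answer, and the averaged loss over many human comparisons is minimised by gradient descent.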

The system has been trained to answer questions from ELI5, a dataset of open-ended questions scraped from the “Explain Like I’m Five” subreddit. The best-performing model produces answers that are preferred 56 per cent of the time to answers written by human demonstrators.

The model was also evaluated on TruthfulQA, an adversarially constructed dataset of short-form questions designed to test whether models fall prey to common misconceptions. Here, answers are scored on both truthfulness and informativeness. The new model outperformed GPT-3 on TruthfulQA and exhibited more favourable scaling properties.

Although the model is more truthful than GPT-3 and generates false statements less frequently, it still poses risks. Answers with citations are perceived as authoritative, which can obscure the fact that the model still makes basic errors. The model can also reinforce users’ existing beliefs.

Human feedback and tools like web browsers offer a promising path towards building truthful, general-purpose AI systems.


Meeta Ramnani

