OpenAI Might Just Have Solved The Issue Of Faulty Reward Functions In the Wild

Human labellers can supervise and evaluate the model’s output even if they have not read the books themselves.

In 2016, OpenAI published a blog post, ‘Faulty Reward Functions in the Wild’, discussing an AI model that found a ‘counterintuitive’ way to maximise its reward rather than accomplishing the intended goal. The company realised the need to design safe AI systems that do not misinterpret their specified objectives. 

Brian Christian’s book, The Alignment Problem, explores this challenge: ensuring that ML models act in accordance with human intentions. Now, OpenAI has introduced a tool as a step towards a scalable solution to the alignment problem. According to the OpenAI team, such a solution “needs to work on tasks where model outputs are difficult or time-consuming for humans to evaluate.” To demonstrate the approach, the team tested the model on summarising entire books. 

Introducing the tool

OpenAI’s tool combines recursive task decomposition with learning from human feedback. The model is first trained on smaller parts of the task, with human feedback then applied to the broader task. Human demonstrations and comparisons were collected and used to fine-tune GPT-3, and the summarisation model was trained using behavioural cloning and reward modelling. 
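The reward-modelling step learns a scoring function from human comparisons of paired outputs. The toy sketch below illustrates the general idea with a Bradley–Terry-style logistic loss; the `features`, `reward`, and `train` functions and the hand-picked features are hypothetical stand-ins and not OpenAI’s implementation, which scores summaries with a fine-tuned language model rather than hand-crafted features.

```python
import math

def features(summary: str) -> list[float]:
    """Toy features; a real reward model derives these from a language model."""
    return [len(summary) / 100.0, summary.count(".")]

def reward(w: list[float], x: list[float]) -> float:
    """Linear reward score for a summary's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train(pairs: list[tuple[str, str]], lr: float = 0.1, epochs: int = 200) -> list[float]:
    """Each pair is (preferred, rejected); fit w so reward(preferred) > reward(rejected)."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for good, bad in pairs:
            xg, xb = features(good), features(bad)
            # P(preferred beats rejected) under a Bradley-Terry model
            p = 1 / (1 + math.exp(-(reward(w, xg) - reward(w, xb))))
            g = 1 - p  # gradient of the log-likelihood w.r.t. the reward gap
            w = [wi + lr * g * (a - b) for wi, a, b in zip(w, xg, xb)]
    return w
```

After training on a few comparisons, the learned weights assign the preferred summary in each pair a higher reward than the rejected one, which is the training signal a reinforcement learning policy would then optimise against.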

How it works

The model begins inference by summarising small sections of the book, then recursively summarising those summaries into higher-level summaries, until the output is a summary of the entire book. “Our main result is a model that can be applied recursively to generate plausible summaries of entire books,” according to the research paper. Human labellers can supervise and evaluate the model’s output even if they have not read the books themselves. 

Source: OpenAI’s paper


Summarisation remains one of the challenges for large pretrained models. OpenAI’s previous blog posts discuss training a model with reinforcement learning from human feedback, a method that helped align the model’s summaries with human preferences.

Source: OpenAI

This is the structure of the algorithm used for shorter texts. To achieve the same results on an entire book, the team applied ‘recursive task decomposition’. 

The process involves a human ‘decomposing’ or breaking up their parent task into several subtasks. Each subtask is shorter and simpler than the parent task, and having the responses to the subtasks would help a human provide a training signal for the parent task. 

This makes evaluation easier for humans: the person does not need to have read the book beforehand, since they can refer to the shorter parts. It also makes the summary-writing process traceable back to actual events in the book, and lets the tool handle books of unbounded length. 
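The decomposition described above can be sketched as a short recursion. This is a minimal illustration, not OpenAI’s code: the `summarize` function is a hypothetical placeholder for a call to a fine-tuned model (here it simply truncates text so the sketch runs end-to-end), and `chunk` splits on fixed character counts where a real system would split on sections or chapters.

```python
def summarize(text: str, max_len: int = 200) -> str:
    """Placeholder model call; a real system would query a fine-tuned GPT-3."""
    return text[:max_len]

def chunk(text: str, size: int) -> list[str]:
    """Split text into fixed-size pieces (a real system splits on chapters/sections)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def recursive_summary(text: str, chunk_size: int = 1000, max_len: int = 200) -> str:
    # Base case: the text is short enough to summarise in one model call.
    if len(text) <= chunk_size:
        return summarize(text, max_len)
    # Decompose: summarise each chunk, then recursively summarise the
    # concatenated chunk summaries until a single book-level summary remains.
    partials = [summarize(c, max_len) for c in chunk(text, chunk_size)]
    return recursive_summary(" ".join(partials), chunk_size, max_len)
```

Because each level only ever summarises short passages, a human evaluator can check any single step of the recursion without reading the whole book, which is the scalable-oversight property the article describes.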

An illustration of the process of breaking up the text of Alice’s Adventures in Wonderland to produce a short summary. 



The summaries captured the important events from the book, abstractly synthesising the details, but the team also admitted that the tool often leaves out important information or fails to grasp the broader context. 

Still, the model significantly outperformed OpenAI’s behavioural cloning baseline. “A small number of summaries approach human-level quality,” the team noted, with some even matching the average quality of human-written summaries. 

Humans who had read the book gave the model’s summaries a rating of 6/7 (typical of a human-written summary) 5% of the time, and a rating of 5/7 15% of the time. The model achieved state-of-the-art results on the BookSum dataset for book-length summarisation, and a zero-shot question-answering model using its summaries reached state-of-the-art on the NarrativeQA dataset for book-length question answering.

The results suggest that combining recursive task decomposition with learning from human feedback is a practical approach to scalable oversight for difficult long-document NLP tasks, broadening the scope for future models. 

“Our current approach to this problem is to empower humans to evaluate machine learning model outputs using assistance from other models,” stated the blog. The team hopes to create similar and better tools in the future to empower large-scale empirical work on scaling alignment techniques. 


Avi Gopani
Avi Gopani is a technology journalist who analyses industry trends and developments from an interdisciplinary perspective at Analytics India Magazine. Her articles chronicle cultural, political and social stories curated with a focus on the evolving technologies of artificial intelligence and data analytics.
