OpenAI Benchmarks Reinforcement Learning To Avoid Model Overfitting

OpenAI Benchmark

OpenAI has released a reinforcement learning (RL) benchmark that uses procedural generation to tackle the overfitting problems that affect the approach. RL has been a central methodology in the field of artificial intelligence. However, over the years, researchers have noticed a few shortcomings with it. Developers often use a colossal amount of data to train machine learning models and increase their efficiency. But in many cases the models have overfitted to that data, hindering the adoption of ML technologies.


To avoid overfitting, diversity in environments is essential, which is why quite a few techniques have been used to keep it in check. For one, the Arcade Learning Environment (ALE), a framework for the development and testing of AI agents, has been the standard for evaluating AI technologies for years. It offers hundreds of Atari 2600 games in which ML models can be trained and assessed in simulated environments. While diversity between different games in the ALE is one of its greatest strengths, the low emphasis on generalisation presents a significant drawback.

Consequently, there have been numerous attempts to address such generalisation challenges. One such attempt is CoinRun, a training environment released by OpenAI late last year to provide the diversity needed to measure generalisation. While it helped in quantifying generalisation in reinforcement learning, it was limited to one environment.

Various groups have also made attempts, using procedural generation or complex environments such as Dota and StarCraft, to evaluate generalisation in reinforcement learning models. But these come with many complications: it is tough to iterate quickly in extremely complicated environments, and tedious to use more than one environment at the same time.

While all of the above solutions helped to mitigate overfitting, they were not feasible in various use cases. To address such problems, OpenAI set out to provide high diversity both within and across environments, along with experimental convenience.

Overfitting Solution Through Procedural Generation Benchmark

AI agents become more robust and deliver higher accuracy when they are trained in environments that offer ever-changing levels. Consequently, as part of its Procgen Benchmark, OpenAI released 16 unique, procedurally generated environments designed to evaluate generalisation and sample efficiency in reinforcement learning.

OpenAI said that this methodology is ideal for evaluating generalisation, since distinct training and test sets can be generated in every environment. It also stressed that the benchmark is well suited to evaluating sample efficiency, since the environments offer new and compelling challenges for ML models.

Their Approach Towards Achieving The Benchmark

While devising the benchmark, OpenAI focused on four design goals: high diversity, fast evaluation, tunable difficulty, and an emphasis on visual recognition and motor control. To achieve high diversity, the environment generation logic was given maximal freedom, so that the environments present meaningful challenges that stretch the abilities of models. Besides, environment difficulty has been calibrated so that ML models can make progress, resulting in a faster experimental pipeline.

To make difficulty tunable, the environments support two well-calibrated modes: easy and hard. While the former caters to developers with limited computational power, the latter is for users with high-performance computing systems.
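As a rough illustration of how the train/test split and the two difficulty modes fit together, the sketch below builds the keyword arguments one might pass to the open-source procgen package. The helper names (env_kwargs, make_env) and the default of 200 training levels are assumptions made for this example, not something stated in the article. The sketch relies on the package's documented num_levels, start_level and distribution_mode options, where num_levels=0 means levels are drawn from the full, unrestricted distribution.

```python
# Sketch only: helper names and defaults are hypothetical, chosen for illustration.

# The 16 environment names shipped with the procgen package.
PROCGEN_ENVS = [
    "bigfish", "bossfight", "caveflyer", "chaser", "climber", "coinrun",
    "dodgeball", "fruitbot", "heist", "jumper", "leaper", "maze",
    "miner", "ninja", "plunder", "starpilot",
]


def env_kwargs(split, num_train_levels=200, mode="easy"):
    """Return keyword arguments for a training or a test environment."""
    if mode not in ("easy", "hard"):
        raise ValueError("mode must be 'easy' or 'hard'")
    if split == "train":
        # A fixed, finite set of level seeds: 0 .. num_train_levels - 1.
        num_levels = num_train_levels
    elif split == "test":
        # num_levels=0 asks procgen to sample from the full level distribution.
        num_levels = 0
    else:
        raise ValueError("split must be 'train' or 'test'")
    return {"num_levels": num_levels, "start_level": 0, "distribution_mode": mode}


def make_env(name, split, **options):
    """Create the environment; requires the `gym` and `procgen` packages."""
    import gym  # imported lazily so env_kwargs works without gym installed

    if name not in PROCGEN_ENVS:
        raise ValueError(f"unknown environment: {name}")
    return gym.make(f"procgen:procgen-{name}-v0", **env_kwargs(split, **options))
```

With the packages installed, make_env("coinrun", "train") and make_env("coinrun", "test") would give an agent a limited set of levels to learn from and a much broader set to be judged on.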

OpenAI then ran a generalisation experiment, along the lines of its earlier CoinRun work, to understand where agents struggle to generalise. Specifically, the firm evaluated how the size of the training set impacts generalisation. It generated training sets ranging from 100 to 100,000 levels and trained agents on each for 200M timesteps using Proximal Policy Optimisation (PPO) to measure performance.
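The experiment described above can be pictured as a simple sweep over training set sizes. The following is a minimal sketch of such a plan. The intermediate sizes (steps of 10x between 100 and 100,000) and the function names are assumptions for illustration, since the article only gives the two endpoints and the 200M-timestep budget.

```python
# Sketch only: the spacing of the sweep is assumed, not taken from the article.

TOTAL_TIMESTEPS = 200_000_000  # training budget per run, as stated above


def training_set_sizes(smallest=100, largest=100_000, factor=10):
    """Return level counts from smallest to largest, multiplying by factor."""
    sizes = []
    n = smallest
    while n <= largest:
        sizes.append(n)
        n *= factor
    return sizes


def sweep_plan(env_name="coinrun"):
    """Describe one PPO training run per training set size."""
    return [
        {"env": env_name, "num_levels": n, "timesteps": TOTAL_TIMESTEPS, "algo": "ppo"}
        for n in training_set_sizes()
    ]
```

Each entry in the plan describes one agent trained from scratch, so the only thing that varies between runs is how many distinct levels the agent gets to see.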


The above evaluations were central in getting insights into the reasons behind overfitting. OpenAI found that agents overfit to small training sets in most of the environments. And with more levels to train on, training performance improved in many cases. This is contrary to the trend in supervised learning, where training performance commonly decreases as the training set grows.

If an agent learns to generalise even across the levels within its training set, a larger training set can improve training performance.
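One common way to put a number on this kind of overfitting is the generalisation gap: the difference between an agent's average return on the levels it trained on and its average return on unseen levels. The helper below is a minimal sketch of that calculation. The function name is hypothetical, and the article does not say exactly how OpenAI computed its figures.

```python
from statistics import mean


def generalisation_gap(train_returns, test_returns):
    """Mean episode return on training levels minus mean return on unseen levels.

    A large positive gap suggests the agent has overfitted to its training levels.
    """
    if not train_returns or not test_returns:
        raise ValueError("both lists of episode returns must be non-empty")
    return mean(train_returns) - mean(test_returns)
```

A gap close to zero would indicate that performance on the training levels carries over to new ones.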


These findings will allow developers to make the most of the plethora of available data and improve the accuracy of ML models. OpenAI's benchmark has shifted the landscape in reinforcement learning, as it addresses the overfitting problem by using large training sets. The benchmark shows how important diverse environment distributions are when training and evaluating reinforcement learning agents.

Although this advancement will allow developers to build robust and effective ML models, it will have to be applied in demanding use cases before its full potential can be evaluated.

Rohit Yadav
Rohit is a technology journalist and technophile who likes to communicate the latest trends around cutting-edge technologies in a way that is straightforward to assimilate. In a nutshell, he is deciphering technology.
