How This Japanese Video Game Is Being Used To Benchmark AI Models

Soon Everyone Will be a Gamer

Benchmarking AI agents inside simulated environments has become the new norm. Researchers are using a wide range of video games to train and evaluate AI models and improve their performance. The idea behind using games is to achieve generalisation: to move towards true AI, researchers test models in environments on which they have never been trained. Recently, researchers used the game Mega Man 2 to assess AI agents, as it offers eight different challenges.

Today, most organisations train and evaluate models in similar environments, which results in high accuracy. However, when the same agent is placed in an unfamiliar environment, it often fails badly to deliver the desired performance. Gaming environments are therefore actively used to benchmark the efficiency of AI agents.

Motivation Behind Choosing Mega Man 2 

Mega Man 2 is an action game published by Capcom and the sequel to Mega Man; it was released in Japan in 1988. In Mega Man 2, players battle the evil Dr Wily and his rogue robots: Metal Man, Air Man, Bubble Man, Quick Man, Crash Man, Flash Man, Heat Man, and Wood Man.

These eight different enemies make it a well-suited simulated environment for testing AI agents. The gameplay of Mega Man is the prime reason for adopting it as a testing environment. In the game, a player controls Mega Man, who is equipped with a weapon; on defeating an evil robot, Mega Man acquires that robot's specialised weapon. Since each robot has a different speciality, every victory yields a unique weapon, making it easier for the player to conquer the next enemy.

As part of their exploration, the researchers replaced the human player with an AI agent. To increase the difficulty, Mega Man did not use the new weapons he would normally receive on defeating the villains.

Methodology Used By Researchers

Researchers used EvoMan, a game-playing framework based on the boss fights of Mega Man 2. They aimed to overpower all eight enemies using only Mega Man's default arm cannon. To do this, they trained an AI agent to defeat only a subset of the bosses and then evaluated the resulting models against all eight. This allowed them to assess how an agent performs when it is deployed in a new environment. An agent can only deliver in a new environment if it has been able to generalise from what it learned in the environment it was trained in.

To evaluate generalisation, researchers trained the model on a set of four enemies and then validated it against all eight bosses. As each enemy differs from the others, learning general patterns for avoiding being shot and winning through the right manoeuvres is a challenging task. Consequently, several learning strategies were used to check which one delivers sufficient generalisation.
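The train-on-four, test-on-eight protocol can be illustrated with a short Python sketch. This is not the researchers' code: the `fight` callable is a hypothetical stand-in for running one boss fight and returning the final player and enemy energy, and the article does not say which four bosses were used for training, so that choice is left as a parameter.

```python
# The eight Mega Man 2 bosses named in the article.
BOSSES = [
    "Metal Man", "Air Man", "Bubble Man", "Quick Man",
    "Crash Man", "Flash Man", "Heat Man", "Wood Man",
]


def evaluate_generalisation(fight, train_bosses):
    """Run a trained agent against all eight bosses.

    `fight` is a hypothetical callable: fight(boss_name) -> (player_energy,
    enemy_energy). Results are grouped by whether the boss was part of the
    training set ("seen") or not ("unseen").
    """
    unknown = set(train_bosses) - set(BOSSES)
    if unknown:
        raise ValueError(f"unknown bosses: {sorted(unknown)}")

    report = {"seen": {}, "unseen": {}}
    for boss in BOSSES:
        group = "seen" if boss in train_bosses else "unseen"
        report[group][boss] = fight(boss)
    return report
```

Comparing the "seen" and "unseen" groups is what reveals whether the agent has generalised or merely memorised the bosses it trained on.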

Different agents were also trained on each of the four bosses to obtain specialised models. Variants of neuroevolution strategies were used: a single-layer perceptron and two-layer perceptrons with 10 and 50 neurons in the hidden layer. The weights of the neural networks were tuned by a Genetic Algorithm and by the LinkedOpt algorithm. In addition, the topology of the neural networks was evolved with the NEAT algorithm.
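To give a feel for what tuning perceptron weights with a genetic algorithm involves, here is a minimal Python sketch. It is a generic illustration under stated assumptions, not the setup used in the study: the selection scheme, crossover, mutation size, and population settings are all arbitrary choices, and `fitness` is a hypothetical function that in practice would score a weight vector by playing the game.

```python
import random


def perceptron_actions(weights, inputs, n_outputs):
    """Single-layer perceptron: one threshold unit per action.

    `weights` is a flat list holding, for each output, one weight per
    input followed by a bias. An action fires when its weighted sum is
    positive (the same decision as a sigmoid output above 0.5).
    """
    n_in = len(inputs)
    actions = []
    for j in range(n_outputs):
        start = j * (n_in + 1)
        total = weights[start + n_in]  # bias term
        for i, x in enumerate(inputs):
            total += weights[start + i] * x
        actions.append(total > 0)
    return actions


def evolve(fitness, n_weights, pop_size=20, generations=30, sigma=0.1, seed=0):
    """Toy genetic algorithm over flat weight vectors (higher fitness is better)."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation.
            children.append(
                [rng.choice(pair) + rng.gauss(0, sigma) for pair in zip(a, b)]
            )
        population = parents + children
    return max(population, key=fitness)
```

Because the surviving parents are carried over unchanged, the best fitness in the population never gets worse from one generation to the next when the fitness function is deterministic.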

The evaluation log shown in the original figure reports the harmonic mean of the models' gains. Each fight starts with 100 energy points, and points are gained by hitting the enemy. The final gain for a fight is calculated as: Gain = 100.01 + ep - ee, where ep is the energy of the player and ee is the energy of the enemy. The constant 100.01 is added to get a valid result.
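The scoring can be sketched in a few lines of Python. This is an illustrative sketch rather than the researchers' code; it assumes energies lie between 0 and 100, and the function names are made up for this example. Under that assumption the gain is always positive, which the harmonic mean requires.

```python
from statistics import harmonic_mean

GAIN_OFFSET = 100.01  # constant from the gain formula in the article


def gain(player_energy, enemy_energy):
    """Gain for one fight: 100.01 + ep - ee."""
    return GAIN_OFFSET + player_energy - enemy_energy


def overall_score(results):
    """Harmonic mean of per-fight gains.

    `results` is a list of (player_energy, enemy_energy) pairs,
    one pair per boss fight.
    """
    return harmonic_mean([gain(ep, ee) for ep, ee in results])
```

The harmonic mean is dominated by the smallest values, so an agent that crushes some bosses but loses badly to one still ends up with a low overall score. That is the property that makes it a reasonable measure of generalisation.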

The harmonic mean shows that the NEAT learning methodology produced the best results, followed by the two-layer neural network whose weights were tuned with the Genetic Algorithm.

What It Means

Various organisations are trying to achieve true AI by enabling models to generalise what they learn. Earlier in December, OpenAI benchmarked reinforcement learning to avoid model overfitting and shifted the landscape by tackling the lack of generalisation. DeepMind made a similar attempt with StarCraft II, where its AlphaStar agent ranked above 99.8% of ranked human players. Such results challenge Jerome Pesenti, head of AI at Facebook, who said in early December that AI is still pattern matching. With this development, however, we can envision commonsense in AI through generalisation.


Generalisation is where real AI lies, as delivering exceptional results only in the same environment will not take us towards AI in everything. Progress in generalisation is therefore of paramount importance for integrating AI into a wide range of use cases. This is yet another success in the AI landscape for improving agents' dexterity. However, the real-world performance of models is often weak, so it will be interesting to see how these advancements materialise in the real world.
