Why Benchmarking AI Models With Games Is Not A Very Good Idea

Today, video and board games play a crucial role in benchmarking AI. Although such methods have been used in chess since the early nineties, researchers have lately embraced video games for evaluating machine intelligence. In recent years, Mega Man II and StarCraft II, among other video games, have become preferred testbeds in the pursuit of true AI. However, this methodology has now come under scrutiny, as experts question how effective it is in approaching the AI everyone envisions.

Nevertheless, researchers say that using a variety of games for benchmarking is crucial to understanding progress in AI. Besides, almost all researchers who have used AI in games cite the importance of the diversity that games provide. However, Francois Chollet, an engineer at Google and the creator of the Keras library for neural networks, criticised in an interview with a media firm the benchmarking process that researchers are adopting for their AI agents.


Francois Chollet’s View On AI Benchmarking Methodology

Chollet, in his recent paper, “On the Measure of Intelligence”, argued that the AI community needs to refocus on the kind of intelligence it wants to accomplish with its work. He said researchers need to look past video-game benchmarks and focus on the ability that makes humans clever: adapting to new things.

In the interview, Chollet pointed out that an AI agent outperforming humans means little until the ML model can use that skill to deliver results in different environments. He further explained that this limitation is a prime reason why self-driving cars are not on the roads yet. “You end up with tree search and minimax, which doesn’t teach anything about human intelligence,” said Chollet. “Games are a proxy for general intelligence, failing in the road to accomplishing true AI,” he added.
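For readers unfamiliar with the technique Chollet mentions, the following is a minimal sketch of minimax in Python. It uses a toy game chosen purely for illustration: a single pile of stones, each player removes one to three stones per turn, and whoever takes the last stone wins. The game, the function names `minimax` and `best_move`, and the +1/-1 scoring convention are assumptions made for this example and do not come from Chollet's paper or the interview.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def minimax(stones: int, maximizing: bool) -> int:
    """Return +1 if the maximizing player wins with perfect play, else -1.

    `maximizing` says whose turn it is in the current position.
    """
    if stones == 0:
        # The pile is empty, so the player who just moved took the last
        # stone and won; the player to move now has lost.
        return -1 if maximizing else 1

    # Score every legal move by searching the resulting position.
    scores = [
        minimax(stones - take, not maximizing)
        for take in (1, 2, 3)
        if take <= stones
    ]
    # The maximizer picks the best outcome, the minimizer the worst.
    return max(scores) if maximizing else min(scores)


def best_move(stones: int) -> int:
    """Return how many stones the maximizing player should take."""
    legal = [take for take in (1, 2, 3) if take <= stones]
    # After our move it is the opponent's (minimizer's) turn.
    return max(legal, key=lambda take: minimax(stones - take, False))


if __name__ == "__main__":
    for pile in (5, 6, 7):
        print(pile, "stones -> take", best_move(pile))
```

The search plays this one game perfectly by exhaustively exploring its positions, yet nothing it computes carries over to any other task, which is the limitation Chollet describes.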

Chollet further said that leveraging a colossal amount of data will help in obtaining an AI that performs a specific task effectively, but it will not get anyone closer to general intelligence. Explaining the inefficiency of benchmarking with video games, he said that anyone could optimise algorithms to gain superior results within a particular game. Such results are not substantial in themselves and therefore misrepresent the kind of intelligence AI needs.

Alternative Approach For Benchmarking AI

Talking about one potential approach for benchmarking AI, Chollet was of the opinion that measuring how efficiently an agent learns in new environments is the way to go. He believes that, just like humans, AI agents should be trained with less data and use prior knowledge to learn new things quickly. However, as almost all AI systems have relied on pattern matching, Chollet might be overambitious to envision general intelligence just yet; one needs to develop a technology that doesn’t exist in order to replicate human-like common sense. Only then can one truly approach general intelligence.


Late last year, the head of Facebook AI, Jerome Pesenti, said in an interview that AI development would soon hit a wall: the cost of research has crossed seven figures but will not rise much further because no one can afford that. Pesenti also noted that AI is not about pattern matching; rather, it is about obtaining the common sense that humans have.

While benchmarking with video games might not be the ideal method, the dearth of an effective alternative means games may continue to be used for evaluating AI. Chollet’s criticism of the current approach is legitimate, but until researchers figure out a technique for integrating human-like intelligence into AI, benchmarking with games remains the best shot.

Rohit Yadav
Rohit is a technology journalist and technophile who likes to communicate the latest trends around cutting-edge technologies in a way that is straightforward to assimilate. In a nutshell, he is deciphering technology.
