DeepMind Trains AI Agents To Play Games Without Human Interaction Data

In its latest step towards general-purpose AI systems, DeepMind has proposed XLand, a virtual environment used to formulate new learning algorithms that control how agents train and the games on which they train. XLand was introduced in a paper titled “Open-Ended Learning Leads to Generally Capable Agents”, in which DeepMind researchers demonstrated a technique for training an agent capable of playing many different games without requiring human interaction data.

Challenges with traditional reinforcement learning

The repetitive process of trial and error has proven effective in teaching computer systems to play many games, including chess, shogi, Go, and StarCraft II. However, one of the main challenges with reinforcement learning-trained systems is a lack of training data. Systems trained by reinforcement learning are unable to adapt their learned behaviours to new tasks because they are not trained on a broad enough set of tasks.
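The trial-and-error loop behind these systems can be illustrated with a minimal tabular Q-learning sketch. This is a toy example, not DeepMind's setup: the agent, environment, and reward function below are all invented for illustration. Note that the learned values are tied to one fixed reward function, which is exactly why a new task means retraining from scratch.

```python
import random

def train_q_table(n_states, n_actions, reward_fn, episodes=2000,
                  alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning: learn one fixed task by trial and error.

    The resulting table is specific to reward_fn; a different task
    requires training all over again, the limitation described above.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:                      # last state is terminal
            # epsilon-greedy: mostly exploit, occasionally explore
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a: q[s][a])
            s2 = min(s + 1, n_states - 1) if a == 1 else s
            r = reward_fn(s, a)
            # standard Q-learning update
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

# Toy task: reward for moving "forward" (action 1).
q = train_q_table(n_states=5, n_actions=2,
                  reward_fn=lambda s, a: 1.0 if a == 1 else 0.0)
```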

For instance, AlphaZero performed well against some of the world’s best chess, shogi, and Go programs even though it knew only each game’s basic rules. The hitch was that, because AlphaZero trained on each game through repetition, it could not learn a different game or task without starting over from scratch. The same was true of other reinforcement learning systems.

Determined to create AI agents that overcome these limitations, DeepMind built XLand. In addition to having a far bigger range of possible games to work on, agents trained in it can deal with entirely new situations and challenge themselves with games and tasks they have never seen before.

The DeepMind AI agents are represented by 3D virtual avatars that reside in a multiplayer online environment meant to mimic the physical world. Agents perceive their surroundings through RGB images and learn how to interact with different games and genres.

XLand lets a user programmatically specify the game space, so training data can be generated algorithmically and automatically. Because other player characters form part of each agent’s environment, their behaviours significantly influence the AI agents in XLand. This complex, non-linear relationship between environment and behaviour yields excellent training data, since even minute alterations to the components of the environment can radically change the challenges virtual agents face.
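As a rough illustration of what programmatically specifying a game space might look like, the sketch below samples tasks from a small combinatorial space. All names here (the goals, objects, and parameters) are hypothetical placeholders, not XLand's actual API; the point is only that small changes to any sampled component produce a different challenge.

```python
import random

# Hypothetical task components -- illustrative, not DeepMind's actual setup.
GOALS = ["hold", "place_near", "see"]
OBJECTS = ["cube", "pyramid", "sphere"]
COLOURS = ["black", "purple", "yellow"]

def sample_task(rng):
    """Draw one task from the combinatorial game space.

    Varying any single component (goal, object, world layout, player
    count) yields a different challenge for the agent.
    """
    return {
        "goal": rng.choice(GOALS),
        "object": f"{rng.choice(COLOURS)} {rng.choice(OBJECTS)}",
        "world_seed": rng.randrange(10**6),   # controls terrain layout
        "num_players": rng.choice([1, 2, 3]),
    }

rng = random.Random(0)
tasks = [sample_task(rng) for _ in range(3)]
```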

As each generation of agents becomes more performant and robust, the task-generating functions improve to reflect that growth, and each new generation then adds its improved self to the multiplayer environment.

The team used a neural network architecture with an attention mechanism. To improve the agents’ overall capabilities, DeepMind applies population-based training (PBT) to adjust the parameters of dynamic task generation. The researchers also chain multiple training runs together so that each generation of agents can bootstrap off the previous one.
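Population-based training itself follows a simple exploit-and-explore loop, sketched below with placeholder hyperparameters (not DeepMind's actual configuration): poorly scoring population members copy a top performer's settings and then perturb them to keep exploring.

```python
import random

def pbt_step(population, rng, perturb=1.2):
    """One PBT exploit-and-explore step (minimal sketch).

    Bottom-quartile members copy hyperparameters from a top-quartile
    member, then perturb the learning rate to keep exploring. The
    "score" and hyperparameter names are placeholders.
    """
    population.sort(key=lambda m: m["score"], reverse=True)
    cutoff = len(population) // 4 or 1
    for loser in population[-cutoff:]:
        winner = rng.choice(population[:cutoff])
        # exploit: copy the winner's settings; explore: perturb them
        loser["lr"] = winner["lr"] * rng.choice([perturb, 1 / perturb])
        loser["task_difficulty"] = winner["task_difficulty"]
    return population

rng = random.Random(1)
pop = [{"lr": 10 ** rng.uniform(-4, -2),
        "task_difficulty": rng.random(),
        "score": rng.random()} for _ in range(8)]
pop = pbt_step(pop, rng)
```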


After training five generations of virtual agents over roughly 700,000 unique games and 200 billion training steps, DeepMind noted a significant rise in both learning and overall performance. This held for all procedurally generated evaluation tasks except a handful that even a human could not complete. The team added: “Our agents appear to exhibit more cooperative behaviour when playing with a copy of themselves. Given the nature of the environment, it is difficult to pinpoint intentionality — the behaviours we see often appear to be accidental, but still we see them occur consistently.”

According to DeepMind, in contrast to traditional, top-down approaches, the agents make frequent attempts at self-betterment through trial and error, exploring different states in search of satisfying outcomes. The systems displayed a wide range of patterns rather than highly optimised and specific patterns for specific tasks.

Ritika Sagar
Ritika Sagar is currently pursuing PDG in Journalism from St. Xavier's, Mumbai. She is a journalist in the making who spends her time playing video games and analyzing the developments in the tech world.
