This New AI Algorithm Can Master Games Without Being Told The Rules


Two years after DeepMind introduced AlphaZero, an AI program that could challenge humans at chess, its researchers have demonstrated MuZero. They describe it as a significant step towards general-purpose algorithms.

While its predecessor, AlphaZero, could learn games such as Go, chess, and shogi from scratch, MuZero can master these games (along with Atari) without being told the rules. It can plan winning strategies in unknown environments. This is particularly significant for Atari games, where the rules and dynamics are generally complicated and unpredictable.


MuZero’s Advantage Over Its Predecessors

MuZero was first introduced in 2019 in a preliminary paper at the NeurIPS 2019 conference. It combines AlphaZero's lookahead tree search with a learned model and sets a new state-of-the-art result on the Atari benchmark. MuZero demonstrates a leap ahead in the capabilities of reinforcement learning algorithms.


The natural step in the evolution of artificial intelligence is incorporating the ability to learn quickly and accurately generalise to new scenarios, just like the human mind. There have been many methods that scientists have adopted over the years to build this capability, two of which are lookahead search and model-based planning.

Lookahead search relies on knowing the game's rules or having an accurate simulator, so it depends heavily on given knowledge of the environment's dynamics. It works well for classic games such as checkers, poker or chess, as in AlphaZero. However, it does not carry over well to complex real-world problems, which cannot necessarily be decomposed into simple rules.
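To make the idea concrete, here is a minimal sketch of lookahead search when the rules are known in advance. The toy game (one pile of stones, take 1 to 3 per turn, last stone wins) and the names MOVES, legal_moves and search are assumptions chosen for illustration; this is not how AlphaZero is implemented, only the general pattern of searching ahead with given rules.

```python
from functools import lru_cache

# Toy game (an assumption for illustration): one pile of stones, each player
# removes 1, 2 or 3 per turn, and whoever takes the last stone wins.
MOVES = (1, 2, 3)


def legal_moves(stones):
    """The rules are given to the search: only moves that fit the pile."""
    return [m for m in MOVES if m <= stones]


@lru_cache(maxsize=None)
def search(stones):
    """Return (value, best_move) for the player to move: +1 win, -1 loss."""
    if stones == 0:
        # The previous player took the last stone, so the player to move lost.
        return -1, None
    best_value, best_move = -2, None
    for move in legal_moves(stones):
        # Look ahead with the known rules; the opponent's value is negated.
        child_value, _ = search(stones - move)
        value = -child_value
        if value > best_value:
            best_value, best_move = value, move
    return best_value, best_move


if __name__ == "__main__":
    print(search(7))  # value and move for the player facing 7 stones
```

The search never learns anything: it works only because legal_moves and the win condition were written down beforehand, which is exactly the dependence on given rules described above.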

On the other hand, model-based systems first learn an accurate model of the environment's dynamics and then use it to plan, which helps them do well even in complex real-world situations. Model-free systems, by contrast, do not use a learned model but instead estimate the best action to take next. Model-based systems have significant disadvantages as well: in visually rich domains such as Atari, modelling every aspect of the environment becomes very complicated.
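The "learn a model first, then plan with it" pattern can be sketched on a tiny example. Everything here is hypothetical: the six-position line world, the function names, and the use of a lookup table in place of a trained model. The planner is only allowed to query the learned table, never the real dynamics.

```python
import random
from collections import deque

# Hypothetical environment: positions 0..5 on a line, with the goal at 5.
GOAL, SIZE = 5, 6
ACTIONS = (-1, +1)


def true_step(state, action):
    """Real dynamics, hidden from the planner."""
    return min(max(state + action, 0), SIZE - 1)


def learn_model(num_samples=500, seed=0):
    """Record observed (state, action) -> next_state transitions."""
    rng = random.Random(seed)
    model = {}
    for _ in range(num_samples):
        state, action = rng.randrange(SIZE), rng.choice(ACTIONS)
        model[(state, action)] = true_step(state, action)
    return model


def plan(model, start):
    """Breadth-first search that consults only the learned model."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if state == GOAL:
            return path
        for action in ACTIONS:
            nxt = model.get((state, action))
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None  # the learned model does not contain a route to the goal


if __name__ == "__main__":
    print(plan(learn_model(), start=2))
```

A table works here because the world has only twelve state-action pairs. In a visually rich domain such as Atari, the equivalent model would have to predict entire screens, which is the difficulty the paragraph above points to.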

To overcome the limitations of lookahead search and model-based planning, MuZero uses a different approach. Instead of modelling the entire environment, MuZero models only the aspects that are critical to decision-making. These come down to three elements: how good the current position is (value), which action is best to take (policy), and how good the last action was (reward).
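The following is a rough sketch of how value, policy and reward predictions can drive planning without access to the rules. It is not DeepMind's implementation: MuZero uses trained neural networks and Monte Carlo tree search, while this sketch swaps in hand-written stand-in functions and a small exhaustive depth-limited search. The function names and the toy arithmetic inside them are assumptions for illustration.

```python
# Hand-written stand-ins for what would be trained neural networks.
ACTIONS = (0, 1)


def representation(observation):
    """Map a raw observation to a hidden state (here, a tuple of floats)."""
    return tuple(float(x) for x in observation)


def dynamics(hidden, action):
    """Predict (reward, next hidden state) without consulting any game rules."""
    shift = 1.0 if action == 1 else -1.0
    reward = 1.0 if action == 1 else 0.0
    return reward, tuple(h + shift for h in hidden)


def prediction(hidden):
    """Predict (policy, value) from a hidden state."""
    policy = {0: 0.5, 1: 0.5}  # a real search would use this to guide expansion
    value = sum(hidden)
    return policy, value


def plan(observation, depth=2, discount=0.9):
    """Pick the action whose imagined rollout has the best reward plus value."""
    root = representation(observation)

    def best_return(hidden, remaining):
        if remaining == 0:
            return prediction(hidden)[1]  # fall back on the value estimate
        outcomes = (dynamics(hidden, a) for a in ACTIONS)
        return max(r + discount * best_return(h, remaining - 1)
                   for r, h in outcomes)

    scores = {}
    for action in ACTIONS:
        reward, nxt = dynamics(root, action)
        scores[action] = reward + discount * best_return(nxt, depth - 1)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(plan([0.0]))
```

Note that plan only ever touches the hidden state: it never reconstructs the observation, which mirrors the idea of modelling just what matters for the decision rather than the whole environment.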

MuZero’s Performance

The DeepMind researchers chose Go, chess, shogi and Atari to test the capabilities of MuZero. While Go, chess and shogi were used for assessing its performance on challenging planning problems, Atari was used as a benchmark for checking its capabilities in a visually complex setting. It was observed that MuZero outperformed previous algorithms used for Atari and matched AlphaZero’s Go, chess, and shogi performance. 

Further study showed that MuZero's playing strength increased by 1000 Elo (a unit that measures a player's relative skill) as the time allowed per move was raised from one-tenth of a second to 50 seconds. The gap is comparable to the difference between an amateur and a professional human player.
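To give a feel for what a 1000-point gap means, the standard Elo expected-score formula can be evaluated directly. The formula is the conventional one used for rating systems; the function name below is an assumption, and the calculation is illustrative rather than taken from the MuZero paper.

```python
def expected_score(rating_gap):
    """Expected score of the higher-rated player under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    # Equal ratings give 0.5; a 1000-point gap gives roughly 0.997.
    print(round(expected_score(0), 3), round(expected_score(1000), 3))
```

Under this formula, a player rated 1000 points higher would be expected to score about 99.7% of the available points against the weaker player.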

It was observed that MuZero could generalise actions and situations and need not search for all possibilities in games like Atari to learn effectively.

Read the full paper here.


Wrapping Up

Facebook, too, announced an AI bot ReBeL that could play chess (a perfect information game) and poker (an imperfect information game) with equal ease, using reinforcement learning. The company called it a positive step towards creating general AI algorithms that could be applied to real-world issues related to negotiations, fraud detection, and cybersecurity.

With MuZero, researchers hope to tackle real-world challenges in areas such as robotics and industrial systems.

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at
