Two years after DeepMind introduced AlphaZero, an AI-based program that could challenge humans at games such as chess, the researchers have demonstrated MuZero, which they describe as a significant step towards formulating general-purpose algorithms.
While its predecessor, AlphaZero, could learn games such as Go, chess and shogi from scratch, MuZero can master these games, along with Atari games, without being told the rules, planning winning strategies in unknown environments. This is particularly significant for games like Atari, whose rules and dynamics are complicated and hard to predict.
MuZero’s Advantage Over Its Predecessors
MuZero was first introduced in a preliminary paper at the NeurIPS 2019 conference. It combines AlphaZero's lookahead tree search with a learned model of the environment, achieving a new state-of-the-art result on the Atari benchmark. MuZero thus demonstrates a leap ahead in the capabilities of reinforcement learning algorithms.
The natural step in the evolution of artificial intelligence is incorporating the ability to learn quickly and accurately generalise to new scenarios, just like the human mind. There have been many methods that scientists have adopted over the years to build this capability, two of which are lookahead search and model-based planning.
A lookahead search strategy relies on the game's rules or an accurate simulator, and therefore on prior knowledge of the environment's dynamics. It works well when preparing algorithms for classic games such as checkers, poker or chess, as in AlphaZero. However, such methods do not transfer well to messy real-world problems, which cannot necessarily be decomposed into simple rules.
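To make the "known rules" assumption concrete, here is a minimal sketch of exhaustive lookahead search on the toy game of Nim (take 1 to 3 stones; whoever takes the last stone wins). The game itself is an illustrative assumption, not from the article; the point is that the search calls an exact simulator of the rules at every node, which is precisely what is unavailable in most real-world settings.

```python
# Lookahead search with a perfect simulator: the rules of Nim
# (take 1-3 stones; taking the last stone wins) stand in for the
# exact game dynamics that tree-search methods assume are given.

def best_move(stones):
    """Search the full game tree; return (best_take, wins_with_best_play)."""
    best = (None, False)
    for take in (1, 2, 3):
        if take > stones:
            break
        if take == stones:              # taking the last stone wins outright
            return take, True
        _, opponent_wins = best_move(stones - take)
        if not opponent_wins:           # opponent loses from the resulting state
            return take, True
        best = (take, False)            # no winning reply found so far
    return best

move, winning = best_move(10)           # from 10 stones, taking 2 wins
```

The recursion queries the rules (`take > stones`, `take == stones`) at every node; replace Nim with a game whose dynamics are unknown and the whole approach breaks down, which is the gap MuZero's learned model addresses.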
Model-based systems, on the other hand, first learn an accurate model of the environment's dynamics and then use it to plan, helping them do well even in complex real-world situations. (Model-free systems, by contrast, do not use a learned model and instead directly estimate the best action to take next.) Model-based systems have significant disadvantages as well: in visually rich domains such as Atari, modelling every aspect of the environment becomes prohibitively complicated.
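The model-based recipe can be sketched on a toy problem. The 1-D chain environment and all the numbers below are illustrative assumptions, not DeepMind's code: the agent first learns a transition-and-reward model from experience, then plans (here, via value iteration) using only that learned model, never the true dynamics.

```python
# Model-based planning sketch: learn a model from experience,
# then plan with the learned model alone.
import random

N = 5                                    # states 0..4; reward for being at 4
def true_step(s, a):                     # hidden dynamics: a in {-1, +1}
    s2 = max(0, min(N - 1, s + a))
    return s2, 1.0 if s2 == N - 1 else 0.0

# 1) Learn the model from sampled experience (deterministic env,
#    so one observation per (state, action) pair suffices).
model = {}
while len(model) < 2 * N:
    s, a = random.randrange(N), random.choice((-1, 1))
    model[(s, a)] = true_step(s, a)

# 2) Plan using only the learned model: value iteration.
V = [0.0] * N
for _ in range(100):
    for s in range(N):
        V[s] = max(r + 0.9 * V[s2]
                   for (s2, r) in (model[(s, a)] for a in (-1, 1)))

# Greedy policy from the learned model: always move right.
policy = [max((-1, 1),
              key=lambda a: model[(s, a)][1] + 0.9 * V[model[(s, a)][0]])
          for s in range(N)]
```

Note that steps 2 and onward never call `true_step`; the article's criticism is that in visually rich domains, a `model` accurate enough for this to work is itself extremely hard to learn.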
To overcome the limitations of lookahead search and model-based planning, MuZero takes a different approach. Instead of modelling the entire environment, MuZero models only the aspects critical to decision-making. These are captured by three elements: how good the current position is (the value), which action is best to take (the policy), and how good the last action was (the reward).
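Schematically, this decomposition is realised by three learned functions: one that encodes an observation into a hidden state, one that predicts the next hidden state and the reward for an imagined action, and one that outputs the policy and value for a hidden state. The sketch below uses untrained random linear layers and made-up sizes purely to show the interfaces; it is not DeepMind's architecture or code.

```python
# Schematic sketch of MuZero-style components with random, untrained
# weights; sizes and the linear/tanh form are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
OBS, HIDDEN, ACTIONS = 8, 4, 3
W_repr = rng.normal(size=(HIDDEN, OBS))
W_dyn = rng.normal(size=(HIDDEN, HIDDEN + ACTIONS))
w_reward = rng.normal(size=HIDDEN + ACTIONS)
W_policy = rng.normal(size=(ACTIONS, HIDDEN))
w_value = rng.normal(size=HIDDEN)

def representation(obs):                 # observation -> hidden state
    return np.tanh(W_repr @ obs)

def dynamics(state, action):             # (state, action) -> (next state, reward)
    x = np.concatenate([state, np.eye(ACTIONS)[action]])
    return np.tanh(W_dyn @ x), float(w_reward @ x)

def prediction(state):                   # state -> (policy, value)
    logits = W_policy @ state
    policy = np.exp(logits) / np.exp(logits).sum()
    return policy, float(w_value @ state)

# One step of an imagined rollout: encode, predict, act, imagine onward.
s = representation(rng.normal(size=OBS))
policy, value = prediction(s)
s_next, reward = dynamics(s, int(policy.argmax()))
```

The key property on display is that `dynamics` rolls forward entirely in the compact hidden space: planning never needs to reconstruct pixels or full environment state, only the value-, policy- and reward-relevant quantities.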
The DeepMind researchers chose Go, chess, shogi and Atari to test the capabilities of MuZero. While Go, chess and shogi were used for assessing its performance on challenging planning problems, Atari was used as a benchmark for checking its capabilities in a visually complex setting. MuZero was observed to outperform previous algorithms on the Atari benchmark and to match AlphaZero's performance on Go, chess and shogi.
Further study also showed that MuZero's playing strength improved by about 1,000 Elo, a unit measuring a player's relative skill, as the time allowed per move was increased from one-tenth of a second to 50 seconds. The gap is comparable to the difference between an amateur and a professional human player.
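To put the 1,000-Elo figure in perspective, the standard Elo expected-score formula converts a rating gap into an expected fraction of points scored; the example ratings below are arbitrary, since only the gap matters.

```python
# Standard Elo expected-score formula: a 1,000-point gap implies the
# stronger player is expected to score roughly 99.7% of the points.
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (0..1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

gap_1000 = expected_score(2000, 1000)    # ~0.997
```

Equal ratings give an expected score of exactly 0.5, which is why Elo measures only relative, not absolute, skill.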
It was also observed that MuZero could generalise across actions and situations, and did not need to search all possibilities in games like Atari to learn effectively.
Read the full paper here.
Facebook, too, announced an AI bot, ReBeL, that could play chess (a perfect-information game) and poker (an imperfect-information game) with equal ease using reinforcement learning. The company called it a positive step towards creating general AI algorithms that could be applied to real-world issues such as negotiations, fraud detection, and cybersecurity.
With MuZero, the researchers hope to extend its application to tackling real-world challenges in robotics, industrial systems, and beyond.