This New AI Algorithm Can Master Games Without Being Told The Rules


Two years after DeepMind introduced AlphaZero, an AI program that could beat humans at the game of chess, the company's researchers have demonstrated MuZero, which they describe as a significant step towards building general-purpose algorithms.

While its predecessor AlphaZero could learn games such as Go, chess and shogi from scratch, MuZero can master these games, as well as Atari titles, without ever being told the rules, planning winning strategies in unknown environments. This is particularly significant for games like Atari, whose dynamics are typically complex and hard to specify in advance.


MuZero’s Advantage Over Its Predecessors

MuZero was first introduced in a preliminary paper at the NeurIPS 2019 conference. It combines AlphaZero's lookahead tree search with a learned model of the environment, and in doing so set a new state-of-the-art result on the Atari benchmark. MuZero demonstrates a leap ahead in the capabilities of reinforcement learning algorithms.

A natural step in the evolution of artificial intelligence is the ability to learn quickly and generalise accurately to new scenarios, just as the human mind does. Scientists have adopted many methods over the years to build this capability, two of which are lookahead search and model-based planning.

Lookahead search relies on knowing the game's rules or having an accurate simulator, and therefore depends heavily on given knowledge of the environment's dynamics. It works well for classic games such as checkers, poker or chess, and is the approach AlphaZero takes, as illustrated in the sketch below. However, it does not transfer well to complex real-world problems, which cannot necessarily be decomposed into simple rules.
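As a rough illustration of what "given rules" means, here is a minimal sketch of lookahead search on the game of Nim (players alternately remove one to three stones; whoever takes the last stone wins). The game, the function names and the exhaustive negamax search are illustrative choices, not anything from the MuZero paper; the point is that the search can only enumerate future positions because the rules are handed to it.

```python
# Minimal rule-based lookahead search, sketched on Nim.
# All names here are illustrative, not from DeepMind's work.

def legal_moves(stones):
    """With the rules given, every successor state can be enumerated."""
    return [take for take in (1, 2, 3) if take <= stones]

def negamax(stones):
    """Exhaustive lookahead: +1 if the player to move can force a win."""
    if stones == 0:
        return -1  # the previous player took the last stone and won
    return max(-negamax(stones - take) for take in legal_moves(stones))

def best_move(stones):
    """Pick the move whose resulting position is worst for the opponent."""
    return max(legal_moves(stones), key=lambda take: -negamax(stones - take))

print(best_move(10))  # -> 2 (leaves 8, a multiple of 4, which loses)
```

Without `legal_moves` — that is, without the rules — this style of search has nothing to expand, which is exactly the limitation the article describes.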

On the other hand, model-based systems first learn an accurate model of the environment's dynamics and then use it to plan, which helps them cope even with complex real-world situations. (Model-free systems, by contrast, forgo a learned model and directly estimate the best action to take next.) Model-based systems have significant disadvantages as well: in visually rich domains such as Atari, modelling every aspect of the environment becomes prohibitively complicated.
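To make the model-based idea concrete, here is a toy sketch in which a dynamics model is rolled forward to score candidate action sequences. The "learned" model is hand-coded here for brevity; in a real system `predict_next` would be fitted from observed transitions, and every name below is an illustrative assumption.

```python
# Toy model-based planning: roll a (stand-in) learned dynamics model
# forward and pick the first action of the best imagined trajectory.

def predict_next(state, action):
    """Stand-in for a learned dynamics model s' = f(s, a)."""
    return 0.9 * state + action          # assumed linear dynamics

def reward(state):
    """Task reward: stay close to the target value 5.0."""
    return -abs(state - 5.0)

def plan(state, horizon=3, actions=(-1.0, 0.0, 1.0)):
    """Exhaustively imagine rollouts inside the model; return the best first action."""
    def rollout_value(s, depth):
        if depth == 0:
            return reward(s)
        return max(rollout_value(predict_next(s, a), depth - 1) for a in actions)
    return max(actions, key=lambda a: rollout_value(predict_next(state, a), horizon - 1))

print(plan(0.0))  # -> 1.0: the model predicts that moving toward the target pays off
```

The catch the article points to is the model itself: for an Atari screen, an accurate `predict_next` would have to reproduce every pixel, which is exactly what makes this approach unwieldy in visually rich domains.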

(Image credit: DeepMind)

To overcome the limitations of lookahead search and model-based planning, MuZero takes a different approach. Instead of modelling the entire environment, it models only the aspects that are critical to decision-making. These are captured by three quantities: how good the current position is (the value), which action is best to take (the policy), and how good the last action was (the reward).
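A minimal sketch of this idea follows, using the representation/dynamics/prediction (h/g/f) structure described in the MuZero paper. The "networks" below are untrained random linear maps purely for shape illustration, and all dimensions and names are assumptions; a real implementation trains all three functions end to end.

```python
import numpy as np

# Sketch of MuZero's three learned functions (h, g, f).
# Random linear maps stand in for trained networks; illustrative only.

rng = np.random.default_rng(0)
OBS_DIM, HIDDEN_DIM, NUM_ACTIONS = 16, 8, 4
W_h = rng.normal(size=(HIDDEN_DIM, OBS_DIM))
W_g = rng.normal(size=(HIDDEN_DIM, HIDDEN_DIM + NUM_ACTIONS))
W_f = rng.normal(size=(NUM_ACTIONS + 1, HIDDEN_DIM))

def representation(observation):
    """h: encode raw observations into a hidden state (no rules needed)."""
    return np.tanh(W_h @ observation)

def dynamics(hidden, action):
    """g: predict the next hidden state and the immediate reward."""
    one_hot = np.eye(NUM_ACTIONS)[action]
    next_hidden = np.tanh(W_g @ np.concatenate([hidden, one_hot]))
    return next_hidden, float(next_hidden.sum())  # stand-in reward head

def prediction(hidden):
    """f: output a policy over actions and a value for the position."""
    out = W_f @ hidden
    logits, value = out[:NUM_ACTIONS], float(out[-1])
    policy = np.exp(logits) / np.exp(logits).sum()
    return policy, value

# Plan entirely inside the learned model: imagine one step ahead
# per action and combine predicted reward with predicted value.
s = representation(rng.normal(size=OBS_DIM))
for a in range(NUM_ACTIONS):
    s_next, r = dynamics(s, a)
    _, v = prediction(s_next)
    print(f"action {a}: predicted reward {r:+.2f}, value {v:+.2f}")
```

Note that the environment itself never appears in the planning loop: the search unrolls the learned hidden state, which is why no rules have to be supplied.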

MuZero’s Performance

The DeepMind researchers chose Go, chess, shogi and Atari to test MuZero's capabilities. Go, chess and shogi were used to assess its performance on challenging planning problems, while Atari served as a benchmark for visually complex settings. MuZero outperformed previous algorithms on Atari and matched AlphaZero's performance on Go, chess and shogi.

Further study also showed that MuZero's strength improved by about 1,000 Elo, a measure of a player's relative skill, as the time allowed per move was increased from one-tenth of a second to 50 seconds. That gap is comparable to the difference between a strong amateur and the best professional human player.

MuZero was also observed to generalise across actions and situations, and did not need to exhaustively search all possibilities in games like Atari to learn effectively.

Read the full paper, "Mastering Atari, Go, chess and shogi by planning with a learned model", published in Nature.


Wrapping Up

Facebook, too, has announced an AI bot, ReBeL, that can play both chess (a perfect-information game) and poker (an imperfect-information game) with equal ease using reinforcement learning. The company called it a positive step towards general AI algorithms that could be applied to real-world problems such as negotiations, fraud detection and cybersecurity.

With MuZero, researchers hope to extend its application to real-world challenges in areas such as robotics and industrial systems.

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
