This New AI Algorithm Can Master Games Without Being Told The Rules


Two years after DeepMind introduced AlphaZero, an AI program that could take on humans at games such as chess, the company's researchers have demonstrated MuZero, which they describe as a significant step towards general-purpose algorithms.

While its predecessor AlphaZero could learn games such as Go, chess and shogi from scratch, MuZero can master these games, along with Atari games, without ever being told the rules. It can plan winning strategies in unknown environments, which is particularly significant for Atari-style games, where the dynamics are complicated and hard to predict.

 

MuZero’s Advantage Over Its Predecessors

MuZero was first introduced in a preliminary paper at the NeurIPS 2019 conference. It combines AlphaZero's lookahead tree search with a learned model of the environment, achieving a new state-of-the-art result on the Atari benchmark and demonstrating a leap ahead in the capabilities of reinforcement learning algorithms.


The natural next step in the evolution of artificial intelligence is the ability to learn quickly and generalise accurately to new scenarios, just as the human mind does. Scientists have adopted many methods over the years to build this capability, two of which are lookahead search and model-based planning.




The lookahead search strategy relies on knowing the game's rules or having access to an accurate simulator, and therefore on detailed knowledge of the environment's dynamics. It works well for classic games such as checkers, poker or chess, as in AlphaZero. However, it does not transfer well to messy real-world problems, which typically cannot be decomposed into simple rules; a minimal sketch of this dependence follows below.
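To make that dependence on known rules concrete, here is a minimal, illustrative Python sketch of depth-limited lookahead (minimax) search over a perfect rules simulator. The toy TakeAwayGame class and the legal_moves/next_state/is_terminal/score interface are hypothetical stand-ins for a game's exact rules, not DeepMind's code; the point is simply that the search cannot run at all without such a simulator.

```python
class TakeAwayGame:
    """Toy rules simulator: remove 1-3 tokens; whoever takes the last token wins."""
    def legal_moves(self, state):
        tokens, _player = state
        return [m for m in (1, 2, 3) if m <= tokens]

    def next_state(self, state, move):
        tokens, player = state
        return (tokens - move, -player)

    def is_terminal(self, state):
        return state[0] == 0

    def score(self, state):
        tokens, player = state
        # If no tokens remain, the player who just moved (i.e. -player) has won.
        return 0 if tokens else -player


def minimax(game, state, depth, maximizing=True):
    """Best achievable score from `state`, searching `depth` plies ahead with known rules."""
    if depth == 0 or game.is_terminal(state):
        return game.score(state)  # the rules tell us exactly how good this position is
    scores = (
        minimax(game, game.next_state(state, m), depth - 1, not maximizing)
        for m in game.legal_moves(state)
    )
    return max(scores) if maximizing else min(scores)


def best_move(game, state, depth=10):
    """Pick the move whose resulting position has the highest lookahead value."""
    return max(
        game.legal_moves(state),
        key=lambda m: minimax(game, game.next_state(state, m), depth - 1, maximizing=False),
    )


game = TakeAwayGame()
print(best_move(game, (10, 1)))  # -> 2: taking two tokens leaves a multiple of four
```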

Model-based systems, on the other hand, first learn an accurate model of the environment's dynamics and then use that model to plan, which helps them cope even with complex real-world situations. (Model-free systems, by contrast, do not use a learned model at all and instead estimate the best next action directly.) Model-based systems have significant disadvantages as well: in visually rich domains such as Atari, modelling every aspect of the environment becomes prohibitively complicated.

(Image credit: DeepMind)

To overcome the limitations of lookahead search and model-based planning, MuZero takes a different approach. Instead of modelling the entire environment, it models only the aspects that are critical to decision-making. These are captured by three quantities: how good the current position is (the value), which action is best to take (the policy), and how good the last action was (the reward).
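As a rough illustration of that idea, the sketch below plans entirely inside a small learned-style model: an observation is encoded into a hidden state, a dynamics function imagines the effect of each action, and value and reward estimates score the imagined outcome. The linear "networks", the random placeholder weights and the one-step greedy planner are assumptions of this sketch (the policy head is omitted for brevity); the real MuZero trains deep neural networks end-to-end and plans with Monte Carlo tree search over the learned model.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, NUM_ACTIONS = 8, 4

# Placeholder "networks": in MuZero these are learned deep networks, not random matrices.
W_repr = rng.normal(size=(STATE_DIM, STATE_DIM))              # representation: observation -> hidden state
W_dyn = rng.normal(size=(NUM_ACTIONS, STATE_DIM, STATE_DIM))  # dynamics: (hidden, action) -> next hidden
w_val = rng.normal(size=STATE_DIM)                            # prediction head: hidden -> value
w_rew = rng.normal(size=STATE_DIM)                            # prediction head: hidden -> reward


def represent(observation):
    """Encode a raw observation into an abstract hidden state."""
    return np.tanh(W_repr @ observation)


def dynamics(hidden, action):
    """Imagine the next hidden state and the reward ('how good was the last action')."""
    next_hidden = np.tanh(W_dyn[action] @ hidden)
    return next_hidden, float(w_rew @ next_hidden)


def value(hidden):
    """Estimate 'how good is the current position'."""
    return float(w_val @ hidden)


def plan_one_step(observation):
    """Pick the action whose imagined outcome (reward plus value) looks best,
    planning entirely inside the learned model rather than the real environment."""
    hidden = represent(observation)
    best_action, best_score = None, -np.inf
    for a in range(NUM_ACTIONS):
        next_hidden, reward = dynamics(hidden, a)
        score = reward + value(next_hidden)
        if score > best_score:
            best_action, best_score = a, score
    return best_action


print(plan_one_step(rng.normal(size=STATE_DIM)))
```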

MuZero’s Performance

The DeepMind researchers chose Go, chess, shogi and Atari to test MuZero's capabilities. Go, chess and shogi were used to assess its performance on challenging planning problems, while Atari served as a benchmark for visually complex settings. MuZero outperformed previous algorithms on Atari and matched AlphaZero's performance on Go, chess and shogi.

Further study also showed that MuZero's playing strength improved by about 1,000 Elo (a measure of a player's relative skill) as the time allowed per move was increased from one-tenth of a second to 50 seconds. The gap is comparable to the difference between an amateur and a professional human player.
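To put a 1,000-point Elo gap in perspective, the standard Elo formula gives the expected score of the lower-rated player; the short snippet below evaluates it. The formula is the standard Elo expectation and the specific ratings chosen are purely illustrative, not figures from the paper.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo formula: expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# A 1,000-point rating gap leaves the weaker player with roughly a 0.3% expected score,
# which is why the jump reads like the gap between an amateur and a professional.
print(round(expected_score(2000, 3000), 4))  # -> 0.0032
```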

It was also observed that MuZero could generalise across actions and situations, and did not need to exhaustively search all possibilities in games like Atari to learn effectively.

Read the full paper here.

 

Wrapping Up

Facebook, too, has announced ReBeL, an AI bot that uses reinforcement learning to play both chess (a perfect-information game) and poker (an imperfect-information game) with equal ease. The company called it a positive step towards general AI algorithms that can be applied to real-world problems such as negotiations, fraud detection and cybersecurity.

With MuZero, the researchers hope to extend its application to tackling real-world challenges in areas such as robotics and industry.

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
