
After Go and Chess, AI Is Back to Defeat Mere Humans: This Time It's Stratego

DeepNash ranked third on Gravon, based on 50 ranked matches against top human players over two weeks in April 2022

DeepMind has been a pioneer in building AI models that can match human cognitive ability at games, which are a common testbed for assessing a model's capabilities. After AI mastered games like Go, chess and checkers, DeepMind has launched DeepNash, an AI model that can play Stratego at an expert level.

Mastering a game like Stratego is a significant achievement for AI research because it presents a challenging benchmark for learning strategic interactions at a massive scale. Stratego's complexity rests on two key aspects. First, the game has 10^535 possible states, many orders of magnitude more than Texas hold 'em poker (10^164 states) and Go (10^360 states). Second, at the start of the game, any given situation in Stratego requires reasoning over 10^66 possible deployments for each player.
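To get a feel for these gaps, the state counts can be compared by their exponents. The short Python sketch below reads the figures above as powers of ten (10^535 for Stratego, 10^360 for Go, 10^164 for Texas hold 'em); the dictionary and function names are illustrative only.

```python
# Game-tree sizes quoted above, stored as base-10 exponents.
# The names used here are illustrative, not taken from the DeepNash paper.
STATE_EXPONENTS = {
    "Stratego": 535,
    "Go": 360,
    "Texas hold 'em": 164,
}


def orders_of_magnitude_gap(larger: str, smaller: str) -> int:
    """Return how many powers of ten separate two games' state counts."""
    return STATE_EXPONENTS[larger] - STATE_EXPONENTS[smaller]


for game in ("Go", "Texas hold 'em"):
    gap = orders_of_magnitude_gap("Stratego", game)
    print(f"Stratego has about 10^{gap} times as many states as {game}")
```

In other words, the gap between Stratego and Go alone (a factor of 10^175) is itself larger than the entire Texas hold 'em state count.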

DeepNash learns to play Stratego through model-free self-play, without the need for human demonstrations. It outperforms previous state-of-the-art AI agents and achieves expert human-level performance in the most complex variant of the game, Stratego Classic.

The Nash Equilibrium

At its core, DeepNash is based on a model-free reinforcement learning algorithm called Regularised Nash Dynamics (R-NaD).

Source: arxiv.org

DeepNash combines R-NaD with a deep neural network architecture and converges to an approximate Nash equilibrium by directly modifying the underlying multi-agent learning dynamics. With this technique, DeepNash was able to beat the existing state-of-the-art AI methods in Stratego, and even achieved an all-time best ranking of #3 on the Gravon games platform against human expert players.

DeepNash's learning approach

DeepNash takes an end-to-end approach that also covers learning the deployment phase, in which each player places pieces on the board before play begins. In this phase the model uses deep reinforcement learning coupled with a game-theoretic approach. The goal of the model is to learn an approximate Nash equilibrium through self-play. In a two-player zero-sum game, such a policy guarantees that the agent will perform well even against a worst-case opponent.
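The worst-case guarantee is easiest to see in a tiny zero-sum game. The sketch below uses rock-paper-scissors as a stand-in for Stratego (an assumption made purely for illustration; the payoff table and function name are not from the DeepNash paper) and computes what a policy earns against an opponent who always picks the most damaging reply.

```python
# Rock-paper-scissors payoffs for the row player (win = 1, loss = -1).
# This toy game stands in for Stratego purely for illustration.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper    vs rock, paper, scissors
    [-1, 1, 0],   # scissors vs rock, paper, scissors
]


def worst_case_value(policy):
    """Expected payoff of `policy` against the opponent's best response."""
    n_opponent_actions = len(PAYOFF[0])
    payoffs = [
        sum(p * PAYOFF[a][b] for a, p in enumerate(policy))
        for b in range(n_opponent_actions)
    ]
    # A worst-case opponent picks the reply that minimises our payoff.
    return min(payoffs)


nash_policy = [1 / 3, 1 / 3, 1 / 3]   # the equilibrium of this toy game
biased_policy = [0.5, 0.25, 0.25]     # plays rock too often

print(worst_case_value(nash_policy))
print(worst_case_value(biased_policy))
```

The uniform equilibrium policy can never be pushed below the game's value of 0, whereas the rock-heavy policy loses 0.25 per game on average to an opponent who always plays paper.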

Stratego defeats all existing search techniques because its search space is intractable. To get around this, DeepNash takes an orthogonal route without search and proposes a new method, R-NaD. This method combines model-free reinforcement learning in self-play with a game-theoretic algorithmic idea.

This combined approach does not require modelling private states from public data. The challenge, however, lies in scaling up model-free reinforcement learning with R-NaD so that self-play becomes competitive against human experts in Stratego, a feat that had not been achieved before.

DeepNash learns a Nash equilibrium in Stratego through self-play and model-free reinforcement learning. The idea of combining model-free RL and self-play has been tried before, but it has been empirically challenging to stabilise such learning algorithms when scaling up to complex games.

Source: arxiv.org

The idea behind the R-NaD algorithm is that it is possible to define a learning update rule that induces a dynamical system for which a Lyapunov function exists. This function decreases during learning, which in turn guarantees convergence to a fixed Nash equilibrium.
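To make the idea concrete, here is a minimal tabular sketch of an R-NaD-style loop on a small matrix game. It is not DeepNash's implementation, which trains a deep network on Stratego itself. The sketch assumes the three-stage structure described in the R-NaD paper: transform the rewards with a penalty for drifting away from a reference policy, run the learning dynamics to the fixed point of that regularised game, then make the fixed point the new reference. The payoff matrix, function names, step sizes and iteration counts are all illustrative assumptions.

```python
import math

# Biased rock-paper-scissors payoffs for player 1; player 2 receives the negative.
# Its unique Nash equilibrium is (0.25, 0.5, 0.25) for both players.
A = [
    [0, -1, 2],
    [1, 0, -1],
    [-2, 1, 0],
]
N = 3


def payoffs_p1(y):
    """Expected payoff of each player-1 action against player 2's policy y."""
    return [sum(A[a][b] * y[b] for b in range(N)) for a in range(N)]


def payoffs_p2(x):
    """Expected payoff of each player-2 action against player 1's policy x."""
    return [-sum(A[a][b] * x[a] for a in range(N)) for b in range(N)]


def replicator_step(policy, payoffs, reference, eta, lr):
    """One multiplicative-weights discretisation of replicator dynamics."""
    # Reward transformation: penalise drifting away from the reference policy.
    # (The opponent-side term is constant across own actions in a one-shot
    # game, so it cancels in the normalisation and is omitted here.)
    q = [payoffs[i] - eta * math.log(policy[i] / reference[i]) for i in range(N)]
    weights = [policy[i] * math.exp(lr * q[i]) for i in range(N)]
    total = sum(weights)
    return [w / total for w in weights]


def exploitability(x, y):
    """Sum of both players' best-response gains; zero at a Nash equilibrium."""
    return max(payoffs_p1(y)) + max(payoffs_p2(x))


def r_nad_toy(eta=0.5, lr=0.1, outer_iters=20, inner_steps=500):
    x_ref = [1 / N] * N
    y_ref = [1 / N] * N
    for _ in range(outer_iters):
        # Dynamics stage: approach the fixed point of the regularised game.
        x, y = list(x_ref), list(y_ref)
        for _ in range(inner_steps):
            # Simultaneous update: both steps use the old x and y.
            new_x = replicator_step(x, payoffs_p1(y), x_ref, eta, lr)
            new_y = replicator_step(y, payoffs_p2(x), y_ref, eta, lr)
            x, y = new_x, new_y
        # Update stage: the fixed point becomes the next reference policy.
        x_ref, y_ref = x, y
    return x_ref, y_ref


x, y = r_nad_toy()
print([round(p, 3) for p in x], round(exploitability(x, y), 6))
```

Starting from the uniform policy, which an opponent can exploit in this game, the reference policy moves towards the equilibrium with each outer iteration. The regularisation term is what damps the cycling that plain replicator dynamics would show on a rock-paper-scissors-like game.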

Results

To test DeepNash's capabilities, it was evaluated against both human expert players and the latest state-of-the-art (SOTA) Stratego bots. The former test was performed on Gravon, a well-known online gaming platform for Stratego players. The latter was performed against known Stratego bots such as Celsius, Asmodeus and PeternLewis.

  • Evaluation against Gravon: DeepNash was evaluated on 50 ranked matches against top human players over the course of two weeks in April 2022. It won 42 of these matches, an 84 percent win rate. Based on the classic Stratego ranking in 2022, DeepNash's performance corresponds to a rating of 1799, which makes it the third-best player among all Gravon Stratego players. This result confirms that DeepNash has reached human expert level in Stratego, and did so through self-play alone, without any help from existing human data.
  • Evaluation against SOTA Stratego bots: DeepNash went up against several existing Stratego bots, including Probe, Master of the Flag, Demon of Ignorance and Celsius1.1, among others.

Source: arxiv.org

Despite training only through self-play, DeepNash won the overwhelming majority of its matches against every bot. In the few matches that DeepNash lost against Celsius1.1, however, the latter took a high-risk strategy of gaining a significant material advantage by capturing pieces with a high-ranking piece at the start of the game.

DeepNash is designed with the sole aim of learning a Nash equilibrium policy during training and, in doing so, it learns the qualitative behaviour expected of a top player. DeepNash generated a wide range of deployments, which made it difficult for human players to find patterns to exploit. It also demonstrated its ability to make non-trivial trade-offs between information and material, execute bluffs and take risks when needed.


Kartik Wali

A writer by passion, Kartik strives to get a deep understanding of AI, Data analytics and its implementation on all walks of life. As a Senior Technology Journalist, Kartik looks forward to writing about the latest technological trends that transform the way of life!