After Go and Chess, AI Is Back to Defeat Mere Humans: This Time It's Stratego

DeepNash reached third rank on the Gravon platform, based on 50 ranked matches against top human players over the course of two weeks in April 2022

DeepMind has been a pioneer in building AI models that mimic human cognitive ability in games, which serve as a common testbed for assessing a model's capabilities. After mastering games like Go, chess and shogi, DeepMind has launched DeepNash, an AI model that can play Stratego at an expert level.

Mastering a game like Stratego is a significant achievement for AI research because it presents a challenging benchmark for learning strategic interactions at a massive scale. Stratego's complexity rests on two key aspects. Firstly, there are 10^535 possible states in the game, exponentially more than Texas hold 'em poker (10^164 states) or Go (10^360 states). Secondly, at the start of the game, any given situation in Stratego requires reasoning over 10^66 possible deployments for each player.
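To put those quoted figures in perspective, a couple of lines of arithmetic (using only the exponents cited above) show just how far apart these games sit on the complexity scale:

```python
# log10 of the state-space sizes quoted above
log_states = {"Stratego": 535, "Go": 360, "Texas hold'em": 164}

# Stratego has 10^(535-360) = 10^175 times as many states as Go,
# and 10^371 times as many as Texas hold 'em
gap_vs_go = log_states["Stratego"] - log_states["Go"]
gap_vs_poker = log_states["Stratego"] - log_states["Texas hold'em"]

print(gap_vs_go)     # 175
print(gap_vs_poker)  # 371
```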

DeepNash learns to play Stratego in a model-free, self-play manner, without any human demonstrations. It outperforms previous state-of-the-art AI agents and achieves expert human-level performance in the game's most complex variant, Stratego Classic.

The Nash Equilibrium

DeepNash is built, at its core, on a model-free reinforcement learning algorithm called Regularised Nash Dynamics (R-NaD).


DeepNash combines R-NaD with a deep neural network architecture and converges to an approximate Nash equilibrium by directly modifying the underlying multi-agent learning dynamics. With this technique, DeepNash was able to beat the existing state-of-the-art AI methods at Stratego, even achieving an all-time best ranking of #3 on the Gravon games platform against expert human players.

DeepNash's learning approach

DeepNash learns the deployment phase with an end-to-end approach, coupling deep reinforcement learning with game theory. The model's goal is to approximate a Nash equilibrium through self-play; playing such an equilibrium guarantees that the agent will perform well even against a worst-case opponent.
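That worst-case guarantee can be illustrated on a toy zero-sum game. The sketch below is not DeepMind's code; rock-paper-scissors stands in for Stratego. It measures a policy's exploitability: the payoff a best-responding opponent secures against it, which is exactly zero when the policy is a Nash equilibrium:

```python
import numpy as np

# Antisymmetric payoff matrix for rock-paper-scissors:
# A[i, j] = payoff to a player choosing action i against action j
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def exploitability(pi):
    """Payoff a best-responding opponent achieves against mixed policy pi."""
    return float(np.max(A @ pi))

uniform = np.array([1/3, 1/3, 1/3])    # the Nash equilibrium of this game
skewed  = np.array([0.5, 0.25, 0.25])  # over-plays rock

print(exploitability(uniform))  # 0.0  -> unexploitable, even in the worst case
print(exploitability(skewed))   # 0.25 -> a paper-heavy opponent profits
```

The Nash policy concedes nothing to any opponent, while any deviation from it opens up a gap that a best responder can exploit, which is why DeepNash targets the equilibrium rather than any particular opponent model.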

Stratego's intractable search space defeats all existing search techniques, so DeepNash takes an orthogonal, search-free route and proposes a new method, R-NaD, which combines model-free reinforcement learning in self-play with a game-theoretic algorithmic idea.

This combined approach does not require modelling private states from public data. The challenge, however, is scaling up model-free reinforcement learning with R-NaD so that self-play becomes competitive against human Stratego experts, a feat no prior system had achieved.

DeepNash learns a Nash equilibrium in Stratego through self-play and model-free reinforcement learning. Combining model-free RL with self-play has been tried before, but such learning algorithms have proved empirically difficult to stabilise when scaled up to complex games.


The idea behind the R-NaD algorithm is to define a learning update rule whose induced dynamical system admits a Lyapunov function. This function decreases during learning, which in turn guarantees convergence to a fixed point that is a Nash equilibrium.
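As a rough illustration of that idea, consider this toy sketch: it is not DeepMind's implementation, substituting rock-paper-scissors for Stratego and a simple mirror-descent update for DeepNash's neural-network learner. The dynamics are regularised by a KL-style term pulling the policy toward a reference policy, and the reference is then repeatedly reset to the fixed point just found. Plain self-play dynamics cycle forever on this game; the regularised iteration instead settles at the uniform Nash equilibrium:

```python
import numpy as np

# Rock-paper-scissors payoffs: A[i, j] = payoff for action i vs action j
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def regularised_step(pi, ref, eta=0.1, tau=0.5):
    # Reward transformed by a term pulling pi toward the reference policy
    q = A @ pi - tau * (np.log(pi) - np.log(ref))
    new = pi * np.exp(eta * q)       # multiplicative-weights update
    return new / new.sum()           # renormalise onto the simplex

pi = np.array([0.8, 0.1, 0.1])       # start far from equilibrium
ref = pi.copy()
for _ in range(50):                  # outer loop: move the reference policy
    for _ in range(300):             # inner loop: run the regularised dynamics
        pi = regularised_step(pi, ref)
    ref = pi.copy()                  # reset the reference to the fixed point

print(np.round(pi, 3))               # approaches [0.333, 0.333, 0.333]
```

The regularisation acts as the damping that makes a decreasing Lyapunov function possible: each inner loop converges to the regularised game's fixed point, and the sequence of reference resets walks those fixed points toward the original game's Nash equilibrium.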


To test DeepNash's capabilities, it was evaluated against both expert human players and the latest SOTA Stratego bots. The former test was performed on Gravon, a well-known online gaming platform for Stratego players; the latter against known Stratego bots such as Celsius, Asmodeus and PeternLewis.

  • Evaluation against Gravon: DeepNash played 50 ranked matches against top human players over the course of two weeks in April 2022 and won 42 of them, an 84 per cent win rate. Based on the classic Stratego ranking in 2022, this performance corresponds to a rating of 1799, making DeepNash the third-best player among all Gravon Stratego players. The result confirms that DeepNash reached expert human level in Stratego purely via self-play, without any help from existing human data.
  • Evaluation against SOTA Stratego bots: DeepNash went up against several existing Stratego bots, including Probe, Master of the Flag, Demon of Ignorance and Celsius 1.1, among others.


Despite training only through self-play, DeepNash beat all of the bots in an overwhelming majority of games. In the few matches it lost to Celsius 1.1, the latter took the high-risk strategy of grabbing a significant material advantage by capturing pieces with a high-ranking piece at the start of the game.

DeepNash is designed with the sole aim of learning a Nash-equilibrium policy during training, yet it picked up the qualitative behaviour of a top player. It generated a wide range of deployments, making it difficult for human players to find patterns to exploit, and it demonstrated the capability to make non-trivial trade-offs between information and material, execute bluffs, and take risks when needed.

Kartik Wali
A writer by passion, Kartik strives to get a deep understanding of AI, Data analytics and its implementation on all walks of life. As a Senior Technology Journalist, Kartik looks forward to writing about the latest technological trends that transform the way of life!
