Top 6 Baselines For Reinforcement Learning Algorithms On Games

Games like chess, Go, and Atari have become testbeds for deep reinforcement learning algorithms. Companies like DeepMind and OpenAI have done a tremendous amount of research in this field and have set up gyms that can be used to train reinforcement learning agents.
Here we take a look at six works that have set new benchmarks for reinforcement learning models:

AlphaGo Zero

Task: Game of Go

Dataset: ELO Ratings
Go, invented in China, is a 2,500-year-old game in which players place stones to surround territory and block each other's moves; the player who controls more of the board wins. Two millennia later, DeepMind, now owned by Alphabet Inc., created a deep neural network-based program called AlphaGo that competed against legendary Go player Lee Sedol, the winner of 18 world titles, and beat him 4-1 in a five-game match in 2016.
The AlphaGo Zero program later achieved superhuman performance in the game of Go through tabula rasa reinforcement learning (no domain knowledge beyond the game rules) from games of self-play. With AlphaZero, the authors generalise this approach into a single algorithm that can achieve superhuman performance in many challenging domains.
Starting from random play, with no domain knowledge except the game rules, AlphaZero achieved a superhuman level of play in chess, shogi and Go within 24 hours and convincingly defeated a world-champion program in each case.




MuZero

Task: Go, Shogi, Chess and Atari

Chess is the most widely studied domain in the history of artificial intelligence, and tree-based planning methods have been very successful there because the rules provide a perfect simulator. In real-world problems, however, the dynamics governing the environment are often complex and unknown, which makes such methods hard to apply.
This work therefore introduced the MuZero algorithm, which combines a tree-based search with a learned model. MuZero achieved superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.
When evaluated on 57 different Atari games, MuZero achieved a new state of the art. On Go, chess and shogi, it matched the performance of the AlphaZero algorithm, which was supplied with the game rules.
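The idea of planning with a learned model can be sketched roughly as follows. This is a minimal illustration, not DeepMind's implementation: the three functions (representation, dynamics, prediction) follow the structure described in the MuZero paper, but the random toy weights, dimensions, and all names below are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HIDDEN_DIM, NUM_ACTIONS = 8, 4, 3  # hypothetical sizes

# Random weights stand in for trained neural networks (assumption).
W_repr = rng.standard_normal((HIDDEN_DIM, OBS_DIM))
W_dyn = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM + NUM_ACTIONS))
w_reward = rng.standard_normal(HIDDEN_DIM + NUM_ACTIONS)
W_policy = rng.standard_normal((NUM_ACTIONS, HIDDEN_DIM))
w_value = rng.standard_normal(HIDDEN_DIM)


def representation(observation):
    """Encode a raw observation into a hidden state."""
    return np.tanh(W_repr @ observation)


def dynamics(hidden, action):
    """Predict the reward and the next hidden state for an action."""
    one_hot = np.eye(NUM_ACTIONS)[action]
    x = np.concatenate([hidden, one_hot])
    return float(w_reward @ x), np.tanh(W_dyn @ x)


def prediction(hidden):
    """Predict a policy (softmax over actions) and a scalar value."""
    logits = W_policy @ hidden
    policy = np.exp(logits - logits.max())
    return policy / policy.sum(), float(w_value @ hidden)


def unroll(observation, actions):
    """Imagine a trajectory inside the learned model, never querying game rules."""
    hidden = representation(observation)
    steps = []
    for action in actions:
        policy, value = prediction(hidden)
        reward, hidden = dynamics(hidden, action)
        steps.append((policy, value, reward))
    return steps


trajectory = unroll(rng.standard_normal(OBS_DIM), actions=[0, 2, 1])
```

In a full agent, a tree search would call `dynamics` and `prediction` each time it expands a node, so the search runs entirely on hidden states rather than on a simulator of the game.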

Count-Based Exploration for Deep Reinforcement Learning

Task: Atari Games

Dataset: Atari 2600 Freeway

This work describes a simple generalisation of the classic count-based approach that can reach near state-of-the-art performance on various high-dimensional and/or continuous deep reinforcement learning benchmarks. This challenges the common belief that count-based methods cannot be applied in high-dimensional state spaces because most states will occur only once. The authors use hash functions in their method: states are mapped to hash codes, which allows their occurrences to be counted with a hash table. These counts are then used to compute a reward bonus according to the classic count-based exploration theory.
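The hashing-and-counting step can be sketched as below. This is an illustrative example under stated assumptions: it uses a SimHash-style random projection as the hash function and a bonus of the form beta / sqrt(count), and the class name, parameter names, and default values are hypothetical rather than taken from the authors' code.

```python
import numpy as np
from collections import defaultdict


class SimHashCounter:
    """Counts visits to hashed states and returns a count-based reward bonus."""

    def __init__(self, state_dim, code_bits=16, bonus_coeff=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Random Gaussian projection; the sign pattern of A @ s is the hash code.
        self.projection = rng.standard_normal((code_bits, state_dim))
        self.bonus_coeff = bonus_coeff
        self.counts = defaultdict(int)  # hash table: code -> visit count

    def hash_code(self, state):
        bits = (self.projection @ np.asarray(state, dtype=float)) >= 0
        return bits.tobytes()  # hashable key built from the sign bits

    def bonus(self, state):
        """Increment the count for this state's code and return beta / sqrt(n)."""
        code = self.hash_code(state)
        self.counts[code] += 1
        return self.bonus_coeff / np.sqrt(self.counts[code])


counter = SimHashCounter(state_dim=4, bonus_coeff=1.0)
state = np.array([0.5, -1.0, 2.0, 0.1])
first = counter.bonus(state)   # first visit: count is 1
second = counter.bonus(state)  # same state again: count is 2, so the bonus shrinks
```

The bonus would be added to the environment reward during training. Fewer code bits make more states share a counter, which trades resolution for generalisation across similar states.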

Implicit Quantile Networks for Distributional Reinforcement Learning

Task: Atari Games

Dataset: Atari 2600 Freeway
In this work, the authors build on the many improvements made in distributional reinforcement learning over the years and present a widely applicable, flexible, and state-of-the-art distributional variant of deep Q-networks (DQN).
To achieve state-of-the-art results, the researchers use quantile regression to approximate the full quantile function for the state-action return distribution. Re-parameterising a distribution over the sample space yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. The authors demonstrate improved performance on the 57 Atari 2600 games in the Arcade Learning Environment (ALE), and also use their algorithm to study the effects of risk-sensitive policies in Atari games.
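The quantile regression objective can be illustrated with a small sketch. It assumes the Huber-smoothed quantile loss commonly used in this line of work; the function name, the `kappa` default, and the sample numbers are assumptions for the example, not the authors' code.

```python
import numpy as np


def quantile_huber_loss(td_errors, taus, kappa=1.0):
    """Quantile regression loss with a Huber smoothing threshold kappa.

    td_errors: array of (target - predicted quantile value)
    taus:      quantile fractions in (0, 1), one per prediction
    """
    td_errors = np.asarray(td_errors, dtype=float)
    taus = np.asarray(taus, dtype=float)
    abs_err = np.abs(td_errors)
    # Huber loss: quadratic near zero, linear beyond kappa.
    huber = np.where(abs_err <= kappa,
                     0.5 * td_errors ** 2,
                     kappa * (abs_err - 0.5 * kappa))
    # Asymmetric weight: under-estimates cost tau, over-estimates cost (1 - tau).
    weight = np.abs(taus - (td_errors < 0).astype(float))
    return weight * huber / kappa


# For the 0.9 quantile, predicting too low is penalised more than too high.
losses = quantile_huber_loss(np.array([0.5, -0.5]), np.array([0.9, 0.9]))

# During training, fresh quantile fractions are sampled uniformly each update.
rng = np.random.default_rng(0)
sampled_taus = rng.uniform(0.0, 1.0, size=8)
```

Because the fractions are sampled rather than fixed, the network can be queried at any quantile, which is what makes it possible to define risk-sensitive policies by changing how those fractions are drawn.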

SATNet: Bridging Deep Learning and Logical Reasoning

Task: Sudoku

Dataset: Sudoku 9×9


Integrating logical reasoning within deep learning architectures has been a major goal of modern AI systems. In this paper, the authors propose a new direction towards this goal by introducing a differentiable maximum satisfiability (MAXSAT) solver that can be integrated into the loop of larger deep learning systems.
This work showed how to analytically differentiate through the solution and efficiently solve the associated backward pass. The authors demonstrate that, by integrating this solver into end-to-end learning systems, the systems can learn the logical structure of challenging problems in a minimally supervised fashion. In particular, they show that the rules of Sudoku can be learned solely from examples.
By combining their MAXSAT solver with a traditional convolutional architecture, they have also solved a “visual Sudoku” problem that maps images of Sudoku puzzles to their associated logical solutions.


SC2LE: StarCraft II Learning Environment

Task: StarCraft II

Dataset: CollectMineralShards

This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. StarCraft II is a multi-agent problem in which multiple players interact. It poses the challenge of imperfect information because the map is only partially observed. The large state space can be observed only through raw input feature planes, and playing well requires long-term strategies over thousands of steps.

This work provided an open-source Python-based interface for communicating with the game engine.

SC2LE offered a new and challenging environment for exploring deep reinforcement learning algorithms and architectures. 

Future Of Deep RL

Recent deep reinforcement learning strategies have been able to deal with high-dimensional continuous state spaces through complex heuristics.

Games such as Atari, chess and Sudoku are difficult for humans to master, so getting machines to perform well at tasks that are regarded as markers of human intellect is a phenomenal achievement. These models have improved greatly in the past couple of years, which suggests that reinforcement learning will finally get the attention it deserves.


Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
