Games like chess, Go and the Atari classics have become testbeds for deep reinforcement learning algorithms. Companies like DeepMind and OpenAI have done a tremendous amount of research in this field and have set up gyms that can be used to train reinforcement learning agents.
Here we take a look at top works that have set new benchmarks for reinforcement learning models:
AlphaGo Zero
Task: Game of Go
Dataset: Elo Ratings
Go, invented in China, is a 2,500-year-old board game in which players place stones to surround territory and capture each other’s pieces. Two and a half millennia later, DeepMind, now owned by Alphabet Inc., built AlphaGo, a deep-neural-network system that combines policy and value networks with tree search. It competed against legendary Go player Lee Sedol, the winner of 18 world titles, and beat him 4-1 in a landmark five-game match back in 2016.
The AlphaGo Zero program then achieved superhuman performance in the game of Go through tabula rasa (no domain knowledge) reinforcement learning from games of self-play. Building on AlphaGo Zero, the authors generalised this approach into a single AlphaZero algorithm that can achieve superhuman performance across many challenging domains.
Given no domain knowledge except the game rules, AlphaZero achieved superhuman play in chess, shogi and Go within 24 hours of training and convincingly defeated a world-champion program in each case.
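To make the self-play idea concrete, the paper's training objective combines a value error, a policy cross-entropy against the MCTS search probabilities, and L2 regularisation. Below is a minimal sketch of that loss in PyTorch; the tensor names and shapes are our own illustrative choices, not DeepMind's code.

```python
import torch
import torch.nn.functional as F

def alphazero_loss(value, policy_logits, z, search_pi, params, c=1e-4):
    """Sketch of the AlphaZero-style loss: (z - v)^2 - pi^T log p + c * ||theta||^2.

    value:         (B,)  network value prediction v
    policy_logits: (B,A) network move logits
    z:             (B,)  final game outcome from the player's perspective
    search_pi:     (B,A) MCTS visit-count distribution used as the policy target
    params:        iterable of network parameters for L2 regularisation
    """
    value_loss = F.mse_loss(value, z)
    policy_loss = -(search_pi * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
    l2 = c * sum((p ** 2).sum() for p in params)
    return value_loss + policy_loss + l2
```

During training, MCTS visit counts from self-play supply `search_pi` and the eventual game outcome supplies `z`, so the network continually distils its own search back into its policy and value heads.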
MuZero
Task: Go, Shogi, Chess and Atari
The game of chess is the most widely studied domain in the history of artificial intelligence. In real-world problems, however, the dynamics governing the environment are often complex and unknown, which makes tree-based planning methods that rely on a perfect simulator insufficient.
So, this work introduced the MuZero algorithm, which is a combination of a tree-based search and a learned model. MuZero achieved superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.
When evaluated on 57 different Atari games, MuZero achieved a new state of the art; on Go, chess and shogi, it matched the performance of the AlphaZero algorithm that was supplied with the game rules.
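To give a feel for the idea, here is a minimal sketch of MuZero's three learned functions: a representation function that encodes the observation into a latent state, a dynamics function that steps that state forward given an action, and a prediction function that outputs a policy and value. Plain linear layers stand in for the deep residual networks of the paper, so this illustrates the interface, not DeepMind's architecture.

```python
import torch
import torch.nn as nn

class MuZeroCore(nn.Module):
    """Sketch of MuZero's learned model (layer sizes are illustrative)."""

    def __init__(self, obs_dim=64, latent_dim=32, n_actions=4):
        super().__init__()
        self.represent = nn.Linear(obs_dim, latent_dim)                      # h: observation -> latent state
        self.dynamics = nn.Linear(latent_dim + n_actions, latent_dim + 1)    # g: (state, action) -> (next state, reward)
        self.predict = nn.Linear(latent_dim, n_actions + 1)                  # f: state -> (policy logits, value)
        self.n_actions = n_actions

    def rollout(self, obs, actions):
        """Unroll the learned model entirely in latent space; no game rules needed."""
        s = self.represent(obs)
        outputs = []
        for a in actions:  # each a: (batch,) long tensor of action indices
            a_onehot = torch.nn.functional.one_hot(a, self.n_actions).float()
            out = self.dynamics(torch.cat([s, a_onehot], dim=-1))
            s, reward = out[..., :-1], out[..., -1]
            pv = self.predict(s)
            outputs.append((reward, pv[..., :-1], pv[..., -1]))  # (reward, policy logits, value)
        return outputs
```

Tree search then runs over these latent states exactly as AlphaZero runs over real board positions.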
Count-Based Exploration for Deep Reinforcement Learning
Task: Atari Games
Dataset: Atari 2600 Freeway
This work describes a simple generalisation of the classic count-based approach that reaches near state-of-the-art performance on various high-dimensional and/or continuous deep reinforcement learning benchmarks. This challenges the common belief that count-based methods cannot be applied in high-dimensional state spaces, since most states will occur only once. The authors use hash functions to map states to hash codes, which allows their occurrences to be counted with a hash table. These counts are then used to compute a reward bonus according to classic count-based exploration theory.
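The paper's SimHash variant makes this concrete: a fixed random projection turns each state into a binary code, and a hash table counts how often each code is seen. The sketch below follows that recipe; the hyperparameters (number of bits, bonus coefficient beta) are illustrative.

```python
import numpy as np
from collections import defaultdict

class SimHashCounter:
    """Count-based exploration bonus via static hashing (sketch of the idea).

    States are projected with a fixed Gaussian matrix and the sign pattern is
    used as the hash code, so similar states tend to share a bucket (SimHash).
    """

    def __init__(self, state_dim, n_bits=32, beta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(size=(n_bits, state_dim))  # fixed random projection
        self.counts = defaultdict(int)
        self.beta = beta

    def bonus(self, state):
        code = tuple(np.sign(self.A @ state) > 0)  # hash code phi(s)
        self.counts[code] += 1
        # Classic count-based bonus: beta / sqrt(n(phi(s)))
        return self.beta / np.sqrt(self.counts[code])
```

The bonus is simply added to the environment reward at each step, so rarely visited regions of the state space pay out more and attract the agent.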
Implicit Quantile Networks for Distributional Reinforcement Learning
Task: Atari Games
Dataset: Atari 2600 Freeway
In this work, the authors build on the many improvements made in distributional reinforcement learning over the years and present an applicable, flexible and state-of-the-art distributional variant of Deep Q-Networks (DQN).
To achieve state-of-the-art results, the researchers used quantile regression to approximate the full quantile function of the state-action return distribution. Re-parameterising a distribution over the sample space yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. The authors demonstrated improved performance on the 57 Atari 2600 games in the Arcade Learning Environment (ALE), and also used their algorithm to study the effects of risk-sensitive policies in Atari games.
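The heart of IQN is the embedding of sampled quantile fractions tau, which is combined multiplicatively with the state features before the final quantile-value head. Here is a minimal PyTorch sketch of that embedding, with illustrative layer sizes; the surrounding DQN machinery is omitted.

```python
import torch
import torch.nn as nn

class QuantileEmbedding(nn.Module):
    """IQN-style embedding of sampled quantile fractions tau (sketch).

    phi_j(tau) = ReLU(sum_i cos(pi * i * tau) * w_ij + b_j), merged with the
    state features by an elementwise (Hadamard) product.
    """

    def __init__(self, feature_dim=64, n_cos=64):
        super().__init__()
        self.register_buffer("i_pi", torch.arange(n_cos).float() * torch.pi)
        self.fc = nn.Linear(n_cos, feature_dim)

    def forward(self, state_features, n_tau=8):
        batch = state_features.shape[0]
        tau = torch.rand(batch, n_tau, 1, device=state_features.device)  # tau ~ U(0, 1)
        phi = torch.relu(self.fc(torch.cos(tau * self.i_pi)))  # (batch, n_tau, feature_dim)
        # Hadamard product merges the state and quantile embeddings.
        return state_features.unsqueeze(1) * phi, tau
```

A linear head on top of the merged features outputs one return quantile per sampled tau, and the network is trained with a quantile-regression Huber loss.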
SATNet: Bridging Deep Learning and Logical Reasoning
Task: Sudoku
Dataset: Sudoku 9×9
Integrating logical reasoning within deep learning architectures has been a major goal of modern AI systems. In this paper, the authors propose a new direction towards this goal by introducing a differentiable MAXSAT solver that can be integrated into the loop of larger deep learning systems.
This work showed how to analytically differentiate through the solver’s solution and efficiently compute the associated backward pass. The authors demonstrate that by integrating this solver, end-to-end learning systems can learn the logical structure of challenging problems in a minimally supervised fashion. In particular, they show that the rules of Sudoku can be learned solely from examples.
By combining the MAXSAT solver with a traditional convolutional architecture, they also solved a “visual Sudoku” problem that maps images of Sudoku puzzles to their associated logical solutions.
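The visual-Sudoku pipeline can be sketched as a digit-classification convnet feeding a differentiable solver layer. Note that `DifferentiableMaxSATLayer` below is a hypothetical placeholder standing in for the paper's released SATNet layer, which actually solves a smoothed semidefinite relaxation of MAXSAT; only the interface and the end-to-end wiring are meant to be illustrative.

```python
import torch
import torch.nn as nn

class DifferentiableMaxSATLayer(nn.Module):
    """Hypothetical stand-in for the paper's differentiable MAXSAT solver.

    Real SATNet learns a clause matrix S and solves a smoothed relaxation in
    the forward pass; here we only mimic the interface: given probabilistic
    assignments for the input cells, emit assignments for every cell.
    """

    def __init__(self, n_vars, n_clauses=300):
        super().__init__()
        self.S = nn.Parameter(torch.randn(n_clauses, n_vars) * 0.1)  # learnable clause matrix

    def forward(self, probs, is_input):
        # Placeholder computation; a real solver layer runs coordinate descent
        # on the relaxation and backpropagates through its fixed point.
        return torch.sigmoid(probs @ self.S.t() @ self.S)

class VisualSudoku(nn.Module):
    """Sketch: convnet digit classifier feeding a differentiable solver layer."""

    def __init__(self):
        super().__init__()
        self.digit_net = nn.Sequential(  # reads each 28x28 cell image
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(16 * 28 * 28, 9),
        )
        self.solver = DifferentiableMaxSATLayer(n_vars=9 * 9 * 9)

    def forward(self, cell_images, is_input):
        logits = self.digit_net(cell_images.view(-1, 1, 28, 28))   # (81 * B, 9)
        probs = torch.softmax(logits, dim=-1).view(-1, 9 * 9 * 9)  # one-hot-style puzzle encoding
        return self.solver(probs, is_input)  # end-to-end differentiable
```

Because gradients flow through the solver into the convnet, the digit classifier can be trained without ever labelling the individual cell images.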
SC2LE
Task: StarCraft II
Dataset: CollectMineralShards
This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the game StarCraft II. It is a multi-agent problem with multiple players interacting, and the game poses the challenge of imperfect information due to a partially observed map. The large state space can be observed only from raw input feature planes, and the game demands long-term strategies over thousands of steps.
The work also provided PySC2, an open-source Python-based interface for communicating with the game engine.
SC2LE offered a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.
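For a flavour of the interface, here is a minimal scripted agent written against the pysc2 package's documented BaseAgent API (exact module paths and signatures vary across pysc2 releases, so treat this as a sketch):

```python
from pysc2.agents import base_agent
from pysc2.lib import actions

class NoOpAgent(base_agent.BaseAgent):
    """Smallest possible SC2LE agent: observes the feature planes, does nothing."""

    def step(self, obs):
        super().step(obs)
        # obs.observation holds the raw input feature planes described above.
        return actions.FunctionCall(actions.FUNCTIONS.no_op.id, [])

# Typical invocation on the CollectMineralShards mini-game (run from a shell):
#   python -m pysc2.bin.agent --map CollectMineralShards --agent mymodule.NoOpAgent
```

A learning agent replaces the no-op with an action chosen from the observation, using the same step interface.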
Future Of Deep RL
Recent deep reinforcement learning strategies have been able to deal with high-dimensional continuous state spaces through complex heuristics.
Games such as Go, chess, Atari and Sudoku are incredibly difficult for humans to master, and making machines perform well at tasks that are considered hallmarks of human intellect is a phenomenal achievement. These models have shown great improvement in the past couple of years, suggesting that reinforcement learning will finally get the attention it deserves.