DeepMind, now an Alphabet company, has been building game-playing AI since it was just a start-up. That sustained focus has produced historic breakthroughs in less than a decade: from the Atari agent, which broke new ground in off-policy deep reinforcement learning, to the original AlphaGo, which compressed several decades of expected progress in game playing into a few years and was followed by a steady stream of successors. In fact, game-playing AI was planned by one of the founders, Demis Hassabis, before the company was even created. This article takes you through the fascinating history of DeepMind’s breakthroughs in games.
Atari Games: playing in the arcade environment
In 2013, DeepMind tested its first algorithm on Atari 2600 games. The researchers chose the Arcade Learning Environment to test the algorithm’s competence across a variety of games that were designed to be challenging for human players. The initial algorithm learned to play seven games, reaching average human performance on three of them. In 2015, DeepMind refined the algorithm and tested it on a suite of 49 Atari games, where the machine beat human performance on 23 of them.
A persistent challenge was succeeding at four notoriously hard Atari games, Montezuma’s Revenge, Pitfall, Solaris and Skiing, which have proven particularly tough for AIs because an agent has to try many strategies and sequences of moves before it sees any payoff. Only in 2020, with Agent57, did an algorithm finally manage this.
AlphaGo: the first AI to defeat a professional human Go player
In 2015, DeepMind released AlphaGo, which the company claims is “the first computer program to defeat a professional human Go player, the first to defeat a Go world champion, and is arguably the strongest Go player in history.”
Go, an ancient Chinese board game and one of the oldest games still played, is considered more complex than chess because of its roughly 10^170 possible board configurations. AlphaGo combines an advanced tree search with deep neural networks. The neural network takes a description of the board as input and processes it through the millions of neuron-like connections in its many layers. DeepMind disrupted a field of ‘amateur’ computer players when AlphaGo became the first AI to defeat a professional Go player, Mr Fan Hui, the three-time reigning European Champion, by a score of 5-0. It has since beaten the world’s greatest player of the previous decade, Mr Lee Sedol, 4-1, with 200 million viewers worldwide sitting on the edge of their seats.
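As a rough sanity check on the scale quoted above, a 19 by 19 board has 361 points, and each point is empty, black or white. The short Python sketch below only computes the resulting loose upper bound of 3^361 arrangements. It does not check whether a position is legal under the rules of Go, so it overestimates the number of real configurations; the variable names are illustrative.

```python
import math

BOARD_SIZE = 19                      # standard Go board
POINTS = BOARD_SIZE * BOARD_SIZE     # 361 intersections
STATES_PER_POINT = 3                 # empty, black or white

# Loose upper bound: ignores the rules, so illegal positions are counted too.
upper_bound = STATES_PER_POINT ** POINTS

# Order of magnitude of the bound, expressed as a power of ten.
exponent = math.floor(POINTS * math.log10(STATES_PER_POINT))

print(f"{POINTS} points, upper bound is about 10^{exponent}")
```

The bound comes out around 10^172, a little above the figure usually quoted for legal positions, which is what you would expect once illegal arrangements are excluded.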
AlphaGo Zero: self-training Go computer player
In 2017, DeepMind released an updated version of AlphaGo, AlphaGo Zero. While AlphaGo was trained on thousands of games played by human players of various levels, AlphaGo Zero learnt purely by playing against itself. In just a few days, the program accumulated the equivalent of years of human Go knowledge and learned to play well enough to defeat AlphaGo. The newer version surpassed the performance of all previous versions and taught itself new, unconventional strategies and moves. AlphaGo Zero beat the earlier versions of AlphaGo that had defeated world champions Lee Sedol and Ke Jie.
AlphaZero: self-training Chess, Go and Shogi player
In late 2017, generalising the approach behind AlphaGo Zero, DeepMind introduced AlphaZero, an AI that can teach itself to master Chess, Shogi and Go from scratch. The system beat the reigning world-champion computer program in every one of the three games. It does so with a deep neural network that goes beyond handcrafted features and is given only the basic rules of the game. By playing against itself over and over, AlphaZero developed unique and creative strategies to win all three games.
AlphaStar: StarCraft II player
In 2019, DeepMind introduced AlphaStar, an AI program that plays the real-time strategy game StarCraft II. It is the first AI to reach the top league of the game, having challenged two of the world’s leading players and ranked above 99.8 per cent of active players on Battle.net. The program combines neural networks, self-play via reinforcement learning, multi-agent learning and imitation learning so that the AI can learn directly from the game’s data. AlphaStar played as all three of the game’s races, Protoss, Terran and Zerg, through a single neural network and achieved Grandmaster level with all three. During the introduction stage, AlphaStar’s experience was equivalent to 200 years of playing time.
MuZero: AlphaGo+Atari player
In 2019, MuZero, the latest addition to the AlphaGo family, was introduced, taking the technology one step further. The AI matches AlphaZero at Go, Chess and Shogi while also mastering an array of Atari games, all without being given the rules of any game.
Instead, the program learns a model of the environment and uses that model inside AlphaZero’s lookahead tree search. As a result, MuZero can plan winning strategies even in unknown domains, making it another DeepMind invention that pushes reinforcement learning algorithms towards AGI.
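To make the idea of planning with a learned model more concrete, here is a deliberately tiny Python sketch. It is not MuZero: MuZero learns a neural model and plans with a Monte Carlo tree search, whereas this sketch assumes a hypothetical five-position toy world, stores observed transitions in a lookup table and searches that table exhaustively. The names `real_step`, `model` and `lookahead` are made up for illustration.

```python
# Hypothetical toy world: positions 0..4 on a line, reward 1 for landing on 4.
def real_step(state, action):
    """The true rules. The planner below never calls this directly."""
    next_state = min(max(state + action, 0), 4)
    return next_state, (1.0 if next_state == 4 else 0.0)

ACTIONS = (-1, 1)

# 1. Build a model from experience: record what each (state, action) led to.
model = {}
for state in range(5):
    for action in ACTIONS:
        model[(state, action)] = real_step(state, action)

# 2. Plan by looking ahead inside the recorded model only.
def lookahead(state, depth):
    """Return (best total reward, best first action) within `depth` steps."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in ACTIONS:
        next_state, reward = model[(state, action)]
        future, _ = lookahead(next_state, depth - 1)
        if reward + future > best_value:
            best_value, best_action = reward + future, action
    return best_value, best_action

value, action = lookahead(state=2, depth=2)
print(value, action)  # 1.0 1
```

Starting from position 2 with a two-step horizon, the planner picks the move towards position 4 because that is the only path in its model that collects a reward.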
Agent57: player of 57 Atari games
In 2020, DeepMind released an updated version of its original Atari 2600 agent that can finally beat the four most challenging games of the suite. According to the paper, Agent57 is the first deep reinforcement learning agent to outperform humans on all 57 Atari 2600 games in the Arcade Learning Environment.
Agent57 is an amalgamation of all the improvements made to DeepMind’s Deep Q-Network since the first Atari experiments in 2013. It also includes a form of memory that lets it base decisions on what it has previously learned from games, and a reward system that encourages the AI to explore more strategies.
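One simple way to picture a reward system that encourages exploration is a novelty bonus that shrinks each time a state is revisited. The Python sketch below uses a plain visit counter as its memory. This is a stand-in chosen for brevity, not Agent57’s actual mechanism, and the class name `NoveltyBonus`, the weight `beta` and the state labels are all hypothetical.

```python
from collections import Counter
import math

class NoveltyBonus:
    """Count-based exploration bonus (a stand-in, not Agent57's mechanism)."""

    def __init__(self, beta=0.5):
        self.beta = beta              # hypothetical weight of the bonus
        self.visits = Counter()       # simple memory: visits per state

    def reward(self, state, extrinsic):
        """Return the game reward plus a bonus that shrinks with repeat visits."""
        self.visits[state] += 1
        bonus = self.beta / math.sqrt(self.visits[state])
        return extrinsic + bonus

bonus = NoveltyBonus(beta=0.5)
first = bonus.reward("room_1", extrinsic=0.0)    # first visit: full bonus
second = bonus.reward("room_1", extrinsic=0.0)   # repeat visit: smaller bonus
new = bonus.reward("room_2", extrinsic=0.0)      # unseen state: full bonus again
print(first, second, new)
```

In games like Montezuma’s Revenge, where the score rarely changes, a bonus of this kind gives the agent a reason to visit new places even when the game itself offers no reward.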
Player of Games: perfect and imperfect information game player
DeepMind’s most recent addition, introduced in 2021, is Player of Games (PoG), which performs well in both perfect and imperfect information games. The AI’s repertoire extends beyond Chess and Go to Poker and Scotland Yard. PoG uses a single algorithm with minimal domain-specific knowledge. This marks a significant step in the steadily widening range of games DeepMind’s AIs can play. For example, AlphaZero could only play perfect information games, but PoG can handle imperfect information games like Poker, which rely on game-theoretic reasoning to keep private information properly hidden.
PoG’s search is suited to fundamentally different game types, and DeepMind states that it is guaranteed to find an approximate Nash equilibrium by re-solving subgames so that its play stays consistent during online play. PoG uses growing-tree counterfactual regret minimisation (GT-CFR) to build subgames non-uniformly, expanding the tree toward the most relevant future states while iteratively refining values and policies. It also uses self-play to train value-and-policy networks from both game outcomes and recursive sub-searches applied to situations that came up in previous searches.
It is important to note that, for all of DeepMind’s success, these AI models are not truly versatile. They tend to be good at one thing and one thing only. The biggest challenge on the way to AGI is training an AI on more than one task, and while models like Agent57 can learn 57 tasks, they can only learn and play one game at a time. Despite using the same algorithm, the program has to be retrained for each game. Even so, DeepMind’s game players mark some of the first times an algorithm has reached the top levels in games, crafted unique strategies, or defeated the best players.
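The building block underneath counterfactual regret minimisation is regret matching: play each action in proportion to how much you regret not having played it so far. The Python sketch below applies plain regret matching in self-play to rock, paper, scissors. It is far simpler than GT-CFR (there is no game tree, no subgame re-solving and no neural network), and the function names and the biased starting regrets are assumptions made for illustration. It does show the key property that the average strategy drifts towards the Nash equilibrium, which for this game is to play each move a third of the time.

```python
# Payoff to the player choosing the row: rock, paper, scissors.
# The game is symmetric, so both players can use the same table.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
N = 3

def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N  # no positive regret yet: fall back to uniform

def action_values(opponent_strategy):
    """Expected payoff of each pure action against a mixed opponent."""
    return [
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(N))
        for a in range(N)
    ]

def self_play(iterations=10_000):
    # Player 0 starts biased towards rock so the dynamics are not trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        # Both strategies are fixed before either player updates.
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(N):
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy, not the last one, is what approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]

average = self_play()
print([round(p, 2) for p in average[0]])
```

The current strategy can keep cycling between moves, which is why the sketch reports the average over all iterations rather than the final one.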