Back in 2016, when Google’s DeepMind and Blizzard announced their partnership to push the limits of AI through StarCraft 2, the idea of an artificial intelligence beating the top players in the world seemed absurd. But in January this year, when DeepMind streamed matches between elite human players and AlphaStar, the AI showed it could play at the level of the very best, with its Protoss agent matching up well even against a Zerg pro.
Yes, AlphaStar holds some advantages by virtue of being an AI, the most obvious one being reaction speed. But its Actions Per Minute (APM) were actually lower than those of the pro player it faced. AlphaStar did not beat its human opponents through speed; it dominated them with strategies it had learnt from the data collected through countless matches it had been put through. The AI has that much experience simply because no human has a memory large enough to hold so many strategies.
However, AI like this has one crucial drawback — it forgets.
Let’s head back in time for a while. In 1992, researchers at IBM developed TD-Gammon, combining a learning-based system with a neural network to play the game of backgammon.
Instead of playing according to hard-coded rules, TD-Gammon’s developers used reinforcement learning so the system could figure out for itself how to play in a way that maximises its probability of winning, and they used self-play to make the system more proficient at the game.
Self-play means that the system improves by playing against itself. When an agent plays against itself, it eventually develops many strategies to beat its latest version. But while doing so, it may forget how to win against its previous selves. This forgetting can trap the training in a loop, cycling through the same strategies without making any real progress. That’s where StarCraft comes in.
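A toy illustration of this loop (my own sketch, not anything from DeepMind): in rock-paper-scissors, an agent that always best-responds to its latest self cycles forever, because each new strategy beats the previous one while forgetting how to beat the one before that.

```python
# Naive self-play in rock-paper-scissors: best-respond to your latest self.
# The agent cycles Rock -> Paper -> Scissors -> Rock ... without converging,
# "forgetting" how to beat older versions of itself at every step.

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def best_response(opponent_strategy):
    """Return the pure strategy that beats the opponent's current one."""
    return BEATS[opponent_strategy]

strategy = "rock"
history = [strategy]
for _ in range(6):
    strategy = best_response(strategy)  # train only against the latest self
    history.append(strategy)

print(history)
# -> ['rock', 'paper', 'scissors', 'rock', 'paper', 'scissors', 'rock']
```

The agent at step 3 is identical to the agent at step 0: six rounds of "improvement" produced no real progress, which is exactly the pathology league training is meant to fix.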
How does StarCraft help?
Countering the limitations of self-play: When DeepMind open-sourced its StarCraft research environment, it found that plain self-play had its limitations when it came to improving the AI. An agent maximises its probability of winning when it plays against itself, but in an environment where the agent interacts with other agents or a group of them, its flaws tend to get exposed, and this is how StarCraft helps drive improvement.
For example, a player who aims to get better at StarCraft in the real world might choose to partner up with friends. Here, the goal is not for everyone to win, but to keep improving by having flaws exposed. Using this strategy, the agent gets better when it is made to play against a mixture of agents: the League.
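A minimal sketch of the league idea, under my own simplifying assumptions (agents reduced to snapshot ids, a fixed 50/50 sampling rule): instead of playing only its newest self, the learner draws opponents from a growing pool of frozen past versions, so strategies that beat older selves keep being tested and cannot be quietly forgotten.

```python
import random

# Hypothetical league-style opponent sampling (illustrative only, not
# AlphaStar's actual matchmaking). Past agent snapshots are frozen and
# kept in a pool; the learner plays the newest snapshot some of the time
# and a random historical one the rest of the time.

class LeagueTrainer:
    def __init__(self):
        self.league = []  # frozen snapshots of past agents (here: just ids)

    def add_snapshot(self, agent_id):
        self.league.append(agent_id)

    def sample_opponent(self):
        # Play the newest snapshot half the time, and a uniformly random
        # historical one otherwise, so old strategies stay in rotation.
        if len(self.league) == 1 or random.random() < 0.5:
            return self.league[-1]
        return random.choice(self.league[:-1])

trainer = LeagueTrainer()
for step in range(5):
    trainer.add_snapshot(f"agent_v{step}")

opponent = trainer.sample_opponent()
print(opponent)  # one of agent_v0 ... agent_v4
```

The design choice worth noticing is that snapshots are never removed: the league only grows, which is what breaks the rock-paper-scissors-style cycling of naive self-play.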
Finding the winning strategy using imitation learning: Exploring a complex environment is another problem for AI, and StarCraft poses exactly such an environment.
There are up to 10^26 possible actions available to an agent at each step, and an agent has to take thousands of actions before learning whether it has won or lost. So, without prior knowledge, it is almost impossible to discover a valid strategy.
Learning from different human strategies and reusing them during self-play acts as a solution to this problem. To achieve this, DeepMind used what is called imitation learning.
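The general idea can be sketched as behavioural cloning: learn to predict the action a human took in each observed game state, then use that policy as a starting point for self-play. The states, actions and replay data below are all made up for illustration; AlphaStar's real pipeline uses neural networks over raw game observations, not counting tables.

```python
from collections import Counter, defaultdict

# Minimal behavioural-cloning sketch (hypothetical data, not AlphaStar's
# pipeline): tally which action humans took in each state, then imitate
# the most common choice.

replays = [  # (state, human_action) pairs from imagined game replays
    ("early_game", "build_worker"),
    ("early_game", "build_worker"),
    ("early_game", "scout"),
    ("under_attack", "train_army"),
    ("under_attack", "train_army"),
]

action_counts = defaultdict(Counter)
for state, action in replays:
    action_counts[state][action] += 1

def imitation_policy(state):
    """Pick the action humans most often took in this state."""
    return action_counts[state].most_common(1)[0][0]

print(imitation_policy("early_game"))    # -> build_worker
print(imitation_policy("under_attack"))  # -> train_army
```

Seeding with an imitation policy gives the agent sensible behaviour from the start, so self-play refines real strategies instead of searching blindly through an astronomically large action space.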
Role of reinforcement learning: Thanks to these advances in reinforcement learning, imitation learning and the League, AlphaStar has reached Grandmaster level in StarCraft with all three races, and it did so under the same constraints as a human player.
AlphaStar entered Battle.net anonymously, playing through a camera interface with the same information a human player has, and with restrictions on its actions to make it comparable with human players.
A little more than a month ago, AlphaStar had beaten 99.8% of the StarCraft 2 players active on Battle.net. That still leaves 0.2% of players who can outperform AlphaStar, but it is probably safe to assume that won’t be the case for long.
When asked about his opinion of the game, pro gamer Diego “Kelazhur” Schwimer said, “AlphaStar is an intriguing and unorthodox player – one with the reflexes and speed of the best pros but strategies and a style that are entirely its own. The way AlphaStar was trained, with agents competing against each other in a league, has resulted in gameplay that’s unimaginably unusual; it really makes you question how much of StarCraft’s diverse possibilities pro players have really explored.”
As DeepMind continues to venture into this field, questions arise about whether these advancements could be used to make lethal autonomous weapons, but DeepMind has vowed never to develop them.
One of the reasons to put AI through strategy games like StarCraft is that they expose it to problems resembling the real world. StarCraft has far more possible moves than chess and offers far less information about the opposing player. The technological advances made here can be applied to robotics, self-driving cars and virtual assistants that make decisions based on imperfectly observed information.
Sameer is an aspiring content writer. He occasionally writes poems, loves food and is head over heels for basketball.