Strategy games have long been a proving ground for deep learning algorithms. What started with DeepMind’s reinforcement learning experiments on ‘Atari’ games in 2013 has now blossomed into neural networks that can beat world champions at complex games like ‘Go’ and ‘Shogi’.
While games have long been the cornerstone of DeepMind’s strategy for training deep neural networks, Meta has beaten them at their own game. Today, Meta’s AI arm made history with CICERO, an algorithm that achieves human-level performance in the strategy game ‘Diplomacy’. Yann LeCun, VP and Chief AI Scientist at Meta AI, said,
“An agent that can play at the level of humans in a game as strategically complex as Diplomacy is a true breakthrough for cooperative AI.”
Diplomacy was seen as an insurmountable obstacle for non-human agents, until now. Let’s take a look at how Meta has leapfrogged the leader in strategic reasoning algorithms and cracked one of the grand challenges of AI.
Genesis of AI in games
DeepMind has long been the innovator in training algorithms to play games. With a laser focus on reinforcement learning, Alphabet’s AI division started by training algorithms to play 57 Atari 2600 games. With a success rate of close to 50%, DeepMind saw games as one of the best ways to test its machine learning chops.
They then decided to take on Go, one of the most mathematically complex games in the world. Owing to the game’s enormous number of possible configurations, DeepMind created a neural network known as AlphaGo, built on top of an advanced search tree, an approach that looks comparatively simple next to CICERO. While this set the precedent for neural networks beating world champions at their own game, DeepMind still had a long way to go. In 2017, they released a revamped version of AlphaGo called ‘AlphaGo Zero’. Zero learned the game by playing against itself, teaching itself unique strategies and approaches along the way.
AlphaGo Zero got yet another update, this time named ‘AlphaZero’. This algorithm could not only beat world champions at Go but also teach itself to play Chess and Shogi at a champion level. It was followed by ‘AlphaStar’, an AI agent that could play the real-time strategy game ‘StarCraft II’. Soon, AlphaStar found itself at the top of the leaderboard, ranking above 99.8% of the world’s players. DeepMind then revisited the Atari games that its early algorithms had struggled with and released ‘Agent 57’, a neural network that could outperform humans on all 57 Atari games.
DeepMind’s current crown jewel is the Player of Games algorithm, which can play not only games with complete information but also games with incomplete information, such as Poker. However, even as these agents demonstrate the power of neural networks for rule-based games, strategy games that hinge on human interaction remain far outside DeepMind’s purview. While these agents represent the cutting edge in strategic reasoning, most of them are only good for playing games and see no real-world deployment.
Meta solved the problem by borrowing a page out of DeepMind’s book: combining a strategic reasoning algorithm, like AlphaGo, with a natural language processing model, like GPT-3.
To understand why it was so difficult to create an AI that plays Diplomacy, we must first understand the rules of the game. Diplomacy started out as a board game in the late 1950s and was among the first games to be played over the Internet. Each game is typically played by seven players and can take anywhere from weeks to months to complete.
Each player controls a set of units and must move them to capture supply centres. However, players can also negotiate with one another to form alliances, or betray an alliance that has already been formed. Every player has to consider the diplomatic consequences of every move they make, and that is the crux of Diplomacy.
As we can deduce, the game requires an agent to understand the other players’ motivations and then make the move best suited to its diplomatic position. This demands an intrinsic understanding of where the player currently stands with respect to their allies and enemies. On top of this, a player has to predict whether their current allies will stay loyal or switch sides, depending on the state of affairs.
For an AI agent to play this game, it must not only understand the rules but also accurately gauge the possibility of betrayal by other human players. In addition, the agent has to use natural language to reach diplomatic agreements with other players, as the game cannot be won by an agent playing alone. Andrew Goff, a three-time Diplomacy World Champion, said,
“What impresses me the most about CICERO is its ability to communicate with empathy and build rapport whilst also tying it back to its strategic objectives. Its strategy informs its communication and its communication informs its strategy.”
In their blog detailing the workings of CICERO, Meta showed off the agent’s ability to converse with other human players in natural-sounding language. The algorithm was also able to keep track of its relationships with other players through a combination of dialogue history and the state of the board. It could also accurately identify a partner’s intent and predict the best move ahead while still maintaining the current relationship.
The advancements made by Meta to create CICERO can not only be used to play games, but can also be applied to create better conversational agents. While current AI agents can reply to a simple query, the technology behind CICERO can allow them to carry out a full-fledged conversation with humans while understanding context cues and the point of the conversation.
DeepMind vs. Meta
While DeepMind has long been moving towards AGI with a focus on reinforcement learning and decision trees, Meta’s approach offers a more holistic view of the problem. It is relatively easy for machine learning algorithms to become proficient at rule-based games; solving an imperfect-information problem while accounting for human emotions is a much more difficult undertaking.
Keeping in mind the impact that CICERO could have, Meta has open-sourced the model and its accompanying research. CICERO may well be the catalyst that human-facing AI needed to offer truly seamless communication with humans. While history is full of strategic reasoning algorithms excelling in their fields and conversational agents reaching mainstream adoption, this marks the first time the two kinds of model have been brought together so effectively.
In a way, DeepMind has been working its way towards generalised AI by building a single model that can solve many different problems. Meta’s approach of coupling a natural language processing model with a strategic reasoning model and making them work together is closer to how the human brain works. CICERO might provide a more comprehensive picture of what an AGI could look like, and we might be witnessing the beginning of true conversational AI.