Facebook’s AI research team has introduced ReBeL, a general AI bot that uses reinforcement learning to play both perfect-information games, such as chess, and imperfect-information games, such as poker. According to the company, it is a major step towards a general AI algorithm that performs well across a range of games.
The researchers believe the algorithm could have real-world applications, such as negotiation, fraud detection, and even cybersecurity.
The ReBeL Algorithm
DeepMind’s AlphaZero quickly caught the attention of the AI research community when it was released in 2017. AI programs that play games like chess, shogi, and Go were not new, but AlphaZero stood out because it combines reinforcement learning with search (RL+Search) and learns entirely on its own through self-play, rather than by imitating games played by world-class humans.
Other models have been designed for games such as poker. In 2019, for instance, Facebook introduced Pluribus, a bot that defeated human experts in six-player no-limit Texas Hold ’em, the most widely played poker format in the world.
However, no single general AI algorithm had yet been designed that could master both chess and poker.
Humans see chess and poker simply as two different games. A machine, however, treats them as fundamentally different kinds of problem. Chess is a perfect-information game: every player can see the full state of the board and every move the opponent makes. Poker is an imperfect-information game: players cannot see their opponents’ cards, so they must weigh all the possible hidden states when making decisions on the fly.
Because AlphaZero assumes the full game state is visible, it performs well at chess but breaks down in imperfect-information games. To address this, Facebook has introduced Recursive Belief-based Learning (ReBeL), which the company describes as a ‘major step toward creating ever more general AI algorithms’.
ReBeL extends the general RL+Search framework that AlphaZero also uses. Instead of reasoning about a single, fully known game state, it reasons about beliefs: probability distributions over what each player might privately hold, given everything that has been observed publicly. This lets it play games like poker, where it must assess the chances that the opponent holds particular cards, for example, a pair of aces.
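To make the idea of a belief concrete, here is a minimal sketch (not ReBeL’s code) of how a poker bot might track the probability that its opponent holds pocket aces. It assumes we hold the ace of hearts and king of diamonds before any board cards are dealt, starts from a uniform belief over the opponent’s possible two-card hands, and then applies Bayes’ rule after the opponent raises. The opponent policy `hypothetical_raise_prob` is invented purely for illustration; ReBeL derives such updates from the strategies it computes rather than from a fixed, hand-written policy.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]  # e.g. "Ah" = ace of hearts


def uniform_belief(known_cards):
    """Uniform belief over every two-card hand the opponent could hold,
    given the cards we can see (our own hole cards plus any board cards)."""
    unseen = [card for card in DECK if card not in known_cards]
    hands = list(combinations(unseen, 2))
    return {hand: 1.0 / len(hands) for hand in hands}


def bayes_update(belief, action_likelihood):
    """Bayes' rule: reweight each hand by P(observed action | hand), then renormalise."""
    posterior = {hand: p * action_likelihood(hand) for hand, p in belief.items()}
    total = sum(posterior.values())
    return {hand: p / total for hand, p in posterior.items()}


def prob_pocket_aces(belief):
    """Total belief mass on hands made of two aces."""
    return sum(p for (c1, c2), p in belief.items() if c1[0] == "A" and c2[0] == "A")


def hypothetical_raise_prob(hand):
    """HYPOTHETICAL opponent policy: raise 90% of the time with a pair, 20% otherwise."""
    c1, c2 = hand
    return 0.9 if c1[0] == c2[0] else 0.2


prior = uniform_belief(["Ah", "Kd"])
print(len(prior))               # 1225 possible opponent hands
print(prob_pocket_aces(prior))  # 3 / 1225, about 0.0024

posterior = bayes_update(prior, hypothetical_raise_prob)
print(prob_pocket_aces(posterior))  # about 0.0091 under the assumed policy
```

With 50 unseen cards there are 1,225 possible opponent hands, only 3 of which are pocket aces, so the prior is about 0.24%. Under the assumed raising policy, observing a raise lifts that to roughly 0.9%.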
ReBeL proved effective in large two-player zero-sum imperfect-information games such as poker. Its performance was evaluated on two such games: heads-up no-limit Texas Hold ’em, a two-player form of poker, and Liar’s Dice, a ‘bluff-and-deceive’ game in which each player rolls a set of dice hidden from the others.
In heads-up no-limit Texas Hold ’em, ReBeL beat a human expert by a statistically significant margin. It also performed well in Liar’s Dice, supporting the claim that it works as a general framework rather than a poker-specific one. Facebook has open-sourced its Liar’s Dice implementation so that the wider AI research community can build on these results.
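Liar’s Dice shows the same hidden-information reasoning in a simpler setting. The sketch below (an illustration, not Facebook’s open-sourced code) computes how likely a bid such as ‘at least four 3s on the table’ is to be true, given only your own dice. The function `bid_probability` is hypothetical, and it assumes fair six-sided dice with no wild faces; Liar’s Dice has many rule variants. A strong player, human or ReBeL, would also update these odds based on the opponent’s bids, which this sketch ignores.

```python
from math import comb


def bid_probability(bid_count, bid_face, my_dice, hidden_dice, faces=6):
    """Probability that at least `bid_count` dice on the whole table show `bid_face`.

    We can see our own dice (`my_dice`); the other `hidden_dice` dice are unknown.
    Assumes fair dice and no wild faces (many variants treat 1s as wild).
    """
    already_showing = sum(1 for d in my_dice if d == bid_face)
    needed = max(0, bid_count - already_showing)
    if needed > hidden_dice:
        return 0.0
    p = 1 / faces
    # Binomial tail: P(at least `needed` of the hidden dice show the face).
    return sum(
        comb(hidden_dice, k) * p**k * (1 - p) ** (hidden_dice - k)
        for k in range(needed, hidden_dice + 1)
    )


# We rolled 3, 3, 5, 1, 6; the opponent has 5 hidden dice.
# How likely is the bid "at least four 3s" to be true?
print(bid_probability(4, 3, [3, 3, 5, 1, 6], hidden_dice=5))  # about 0.196
```

The bid needs two more 3s from five unseen dice, so the answer is 1 − (5/6)^5 − 5·(1/6)·(5/6)^4 = 1526/7776, about 0.196. On these odds alone, calling ‘liar’ would usually be right, unless the opponent’s earlier bids suggest they hold 3s.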
According to Facebook, ReBeL is the first RL+Search algorithm that works well in imperfect-information games. It does, however, have some limitations.
Firstly, ReBeL can require prohibitive amounts of computation in certain games, such as Recon Chess. In Recon (Reconnaissance) Chess, players cannot see the positions of the opponent’s pieces and must rely on ‘sensing actions’ to uncover parts of the board. The game has great strategic depth but very little common knowledge, so the beliefs ReBeL must track become enormous.
Secondly, ReBeL depends on knowing the exact rules of the game. That works for Go and poker, where the rules and rewards are known in advance, but the same cannot be said of most real-world interactions. In addition, ReBeL’s guarantees, and so far its demonstrated success, are limited to two-player zero-sum games, which are relatively rare in real-world interactions.
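The two-player zero-sum restriction is not arbitrary. Equilibrium-finding methods such as counterfactual regret minimization (CFR), which ReBeL uses to solve subgames during search, rest on a simple property: in a two-player zero-sum game, if both players minimise regret against each other, their average strategies converge to a Nash equilibrium. The sketch below illustrates this with plain regret matching (the core update inside CFR) on rock-paper-scissors, a toy game chosen for illustration; it is not ReBeL’s implementation. In general-sum or multiplayer games, the same procedure carries no such guarantee.

```python
# Row player's payoff in rock-paper-scissors; the column player receives the
# negative, so the game is two-player zero-sum.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper    vs rock, paper, scissors
    [-1, 1, 0],   # scissors vs rock, paper, scissors
]
N = len(PAYOFF)


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [r / total for r in positive]
    return [1.0 / N] * N


def regret_matching_self_play(iterations=50000):
    """Both players run regret matching against each other; returns their average strategies."""
    regrets = [[1.0, 0.0, 0.0], [0.0] * N]  # player 0 starts biased towards rock
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        s0 = strategy_from_regrets(regrets[0])
        s1 = strategy_from_regrets(regrets[1])
        # Expected payoff of each pure action against the opponent's current mixed strategy.
        u0 = [sum(PAYOFF[a][b] * s1[b] for b in range(N)) for a in range(N)]
        u1 = [sum(-PAYOFF[a][b] * s0[a] for a in range(N)) for b in range(N)]
        v0 = sum(s0[a] * u0[a] for a in range(N))
        v1 = sum(s1[b] * u1[b] for b in range(N))
        for i in range(N):
            regrets[0][i] += u0[i] - v0
            regrets[1][i] += u1[i] - v1
            strategy_sums[0][i] += s0[i]
            strategy_sums[1][i] += s1[i]
    return [[x / iterations for x in sums] for sums in strategy_sums]


avg0, avg1 = regret_matching_self_play()
print([round(p, 2) for p in avg0])  # close to the equilibrium of one third each
```

Player 0 is deliberately started with a bias towards rock. The players’ current strategies need not settle down, but their averages approach one third for each action, the unique equilibrium of rock-paper-scissors.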
Despite these shortcomings, ReBeL achieved superior performance in heads-up no-limit Texas Hold ’em while using far less expert knowledge than previous poker AIs. Writing about its capabilities and future prospects, Facebook said in its blog post, “… we view this as a major step toward developing universal techniques for multiagent interactions, and thus as a step toward complex real-world applications like fraud detection and cybersecurity.”