Bluffing and tricking opponents with little to no interaction is what makes poker such an intense and lucrative game to play and to watch. Occasional bluffing can be effective, but a player who bluffs too often becomes predictable over time and can lose a lot of money. Balancing when to bluff against the probability of winning is therefore an art, and it is what makes the game so tricky.
The game draws not only on statistics but also on a good deal of gut feeling. So, can AI, a data-driven, pattern-matching machine, trump the best minds of the game?
For the past few years, researchers have been teaching AI to beat humans at almost every game that involves strategy. Poker had so far largely resisted those efforts, although there were notable achievements such as Libratus in 2017.
But the tables have now turned: an algorithm developed by Facebook AI and Carnegie Mellon University has beaten poker professionals at Texas hold’em.
Pluribus, a new AI bot, has defeated elite players in the most popular and widely played poker format in the world: six-player no-limit Texas Hold’em poker.
Pluribus builds on Libratus, the AI that decisively beat four leading human professionals in heads-up no-limit Texas hold’em, the two-player variant of the game, in 2017.
Pluribus runs on two CPUs. For comparison, AlphaGo used 1,920 CPUs and 280 GPUs for real-time search in its 2016 matches against top Go professional Lee Sedol. Pluribus also uses less than 128 GB of memory.
The amount of time Pluribus takes to search on a single subgame varies between one second and 33 seconds depending on the particular situation. On average, Pluribus plays twice as fast as typical human pros: 20 seconds per hand when playing against copies of itself in six-player poker.
The Bluff Of Pluribus
All AI breakthroughs in previous benchmark games, whether checkers, chess, Go, two-player poker, StarCraft 2, or Dota 2, have been limited to those with only two players or two teams facing off in a zero-sum competition.
In each of those cases, the AI succeeded because it tried to approximate a well-known kind of strategy: a Nash equilibrium, in which no player can do better by changing strategy alone.
It is not generally possible to efficiently compute a Nash equilibrium in a game with three or more players.
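For the two-player zero-sum case, the idea of an equilibrium can be made concrete with a small example. The sketch below uses rock-paper-scissors rather than poker, and the names `PAYOFF`, `best_response_value` and `exploitability` are illustrative assumptions, not anything taken from Pluribus. It measures how much a best-responding opponent could win against a fixed mixed strategy; at a Nash equilibrium of this game that amount is zero.

```python
# Payoff to the row player in rock-paper-scissors (illustrative toy game).
# Index 0 = rock, 1 = paper, 2 = scissors; PAYOFF[a][b] is row's payoff
# when row plays a and column plays b.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def best_response_value(opponent_strategy):
    """Best expected payoff achievable against a fixed mixed strategy."""
    return max(
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(3))
        for a in range(3)
    )


def exploitability(strategy):
    """How much a best-responding opponent wins on average.

    The game is symmetric with value 0, so a strategy is an
    equilibrium strategy exactly when this is 0.
    """
    return best_response_value(strategy)


if __name__ == "__main__":
    print(exploitability([1 / 3, 1 / 3, 1 / 3]))   # uniform play
    print(exploitability([0.5, 0.25, 0.25]))       # rock-heavy play
```

Uniform play cannot be exploited, while a rock-heavy strategy can be beaten by always playing paper. With three or more players this kind of guarantee no longer comes for free, which is the difficulty the article describes.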
Chess AI relied on alpha-beta pruning search, whereas Go was tackled with Monte Carlo tree search. Pluribus instead relies on Monte Carlo Counterfactual Regret Minimization. The picture above shows how this algorithm updates the traverser’s strategy by assessing the value of real and hypothetical moves. In Pluribus, this traversal is actually done in a depth-first manner for optimization purposes.
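The core update behind counterfactual regret minimization is regret matching: after each round, a player asks how much better each alternative move would have done and shifts probability toward moves with positive accumulated regret. The following is a minimal sketch of that idea on rock-paper-scissors, assuming a one-shot game and simple sampled self-play. It is not Pluribus's implementation, which traverses a full poker game tree, and every name in it (`regret_matching`, `train`, `payoff`) is hypothetical.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
WINS = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}


def payoff(mine, theirs):
    """+1 for a win, 0 for a tie, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if (mine, theirs) in WINS else -1


def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0:
        # No action has positive regret yet: fall back to uniform play.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def train(iterations, seed=0):
    """Self-play between two regret-matching players; returns average strategies."""
    rng = random.Random(seed)
    n = len(ACTIONS)
    regrets = [[0.0] * n for _ in range(2)]
    strategy_sums = [[0.0] * n for _ in range(2)]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        picks = [rng.choices(range(n), weights=s)[0] for s in strategies]
        for p in range(2):
            opp_action = ACTIONS[picks[1 - p]]
            actual = payoff(ACTIONS[picks[p]], opp_action)
            for a in range(n):
                # Regret: what the hypothetical move would have earned
                # minus what the real move earned.
                regrets[p][a] += payoff(ACTIONS[a], opp_action) - actual
                strategy_sums[p][a] += strategies[p][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, avg in enumerate(train(20000)):
        print(player, [round(x, 3) for x in avg])
```

In this toy game the average strategies drift toward playing each move about a third of the time. The "real and hypothetical moves" in the description above correspond to `actual` and the per-action payoffs inside the inner loop.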
A successful poker AI must reason about hidden information and carefully balance its strategy so that it remains unpredictable while still picking good actions. Luck also plays a large role in the game.
To reduce the role of luck, the researchers developed a version of the AIVAT variance reduction algorithm, which applies a baseline estimate of the value of each situation to reduce variance while still keeping the samples unbiased.
For example, if the bot is dealt a really strong hand, AIVAT will subtract a baseline value from its winnings to counter the good luck.
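The baseline idea can be illustrated with a generic control-variate simulation. This is only a sketch of the statistical principle, not the AIVAT algorithm itself: the payoff model, the `true_edge` parameter and the imperfect baseline are all assumptions made up for the example. Because the deal is random with an average value of zero, subtracting an estimate of its value leaves the average result unchanged while removing most of the noise.

```python
import random
import statistics


def simulate(num_hands, true_edge=0.05, seed=0):
    """Return (raw, adjusted) per-hand results under a toy payoff model."""
    rng = random.Random(seed)
    raw, adjusted = [], []
    for _ in range(num_hands):
        # Luck of the deal: symmetric around zero, so it averages out to 0.
        hand_strength = rng.uniform(-1.0, 1.0)
        other_noise = rng.gauss(0.0, 0.2)
        # Assumed model: winnings = skill edge + luck of the deal + noise.
        winnings = true_edge + 2.0 * hand_strength + other_noise
        # Baseline: an imperfect estimate of what the deal alone is worth.
        # Its expected value is 0, so subtracting it adds no bias.
        baseline = 1.8 * hand_strength
        raw.append(winnings)
        adjusted.append(winnings - baseline)
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate(10000)
    print("raw mean / variance:     ",
          statistics.mean(raw), statistics.pvariance(raw))
    print("adjusted mean / variance:",
          statistics.mean(adjusted), statistics.pvariance(adjusted))
```

Both series estimate the same underlying edge, but the adjusted one has far lower variance, so fewer hands are needed to tell skill from luck.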
The plot above shows how the bot's performance improved over the course of training, measured against the final training snapshot. Training ran on a 64-core server for 8 days and required less than 512 GB of RAM.
“The bot wasn’t just playing against some middle-of-the-road pros, it was playing some of the best players in the world,” said Darren Elias, a four-time World Poker Tour champion.
Another significant achievement is that the bot was trained using little computing power and memory.
Key Takeaways
- This is the first time an AI bot has beaten top human players in a complex game with more than two players or two teams.
- Pluribus succeeds because it can very efficiently handle the challenges of a game with both hidden information and more than two players. It uses self-play to teach itself how to win, with no examples or guidance on strategy.
- Training Pluribus cost the equivalent of less than $150 worth of cloud computing resources, whereas other recent AI milestone projects required the equivalent of millions of dollars’ worth of computing resources to train.
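As a back-of-the-envelope check, the $150 figure can be combined with the training setup reported earlier (a 64-core server running for 8 days). The per-core-hour rate below is only what those two numbers imply, assuming the server ran around the clock; it is not a quoted cloud price.

```python
CORES = 64          # from the reported training setup
DAYS = 8            # from the reported training setup
BUDGET_USD = 150    # upper bound on the reported training cost

core_hours = CORES * DAYS * 24       # assumes continuous use for all 8 days
max_rate = BUDGET_USD / core_hours   # implied upper bound, USD per core-hour

print(f"{core_hours} core-hours, under ${max_rate:.4f} per core-hour")
```

That works out to 12,288 core-hours at a little over one cent per core-hour.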
Future Direction
Multi-player interactions pose serious theoretical and practical challenges to AI techniques. The results nevertheless show that a carefully constructed AI algorithm can reach superhuman performance outside of two-player zero-sum games.
Potential real-world applications include taking action on harmful content, dealing with cybersecurity challenges, managing an online auction and navigating traffic.
Until now, developing an AI system capable of defeating elite players in full-scale poker with multiple opponents at the table had been widely recognized as the key remaining milestone. With Pluribus, AI has taken a step closer to the coveted arena of AGI, and it looks poised to make more breakthroughs.