
After Poker And Go, Reinforcement Learning Is Now Beating Mahjong Players


For the first time, an AI model has outperformed top human players at Mahjong. Microsoft Research Asia designed a Mahjong AI called Suphx and evaluated it on Tenhou, the most popular and competitive online Mahjong platform, which has more than 350,000 active users. In terms of stable rank, Suphx performed better than most top human players: it is rated above 99.99% of all officially ranked players on Tenhou.

Games have become one of the most popular testbeds for reinforcement learning algorithms. AI researchers have already used deep reinforcement learning to beat human players in games such as Go, Texas Hold’em and Atari video games.

Companies like OpenAI and DeepMind have done a lot of work in this area. Last year, OpenAI released a benchmark for measuring how well reinforcement learning agents generalize instead of overfitting to their training environments. Why do researchers favour this approach? Reinforcement learning lets a computer learn, through trial and error, to make a series of decisions that maximizes a reward signal for a task, without human intervention and without being explicitly programmed to achieve the task.

Behind the Model

Suphx – short for Super Phoenix – is an AI system for four-player Japanese Mahjong (Riichi Mahjong). The training of Suphx is based on distributed reinforcement learning. The model adopts deep convolutional neural networks (CNNs) as the model architecture for its policy.
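To feed a CNN, the game state first has to be turned into image-like arrays. One common way to do this for Mahjong, similar in spirit to the encoding the paper describes, is to treat the 34 tile types as a one-dimensional axis and stack several binary channels on top of it. The sketch below encodes only a player's own hand; the function name and the exact 4×34 layout are our assumptions, and Suphx's real feature set is much richer (discards, melds, scores and more).

```python
import numpy as np

# Assumed tile indexing: 1-9 man (0-8), 1-9 pin (9-17), 1-9 sou (18-26),
# 4 winds (27-30), 3 dragons (31-33). Each tile type has four copies.
NUM_TILE_TYPES = 34
COPIES_PER_TILE = 4


def encode_hand(tile_ids):
    """Encode a hand (list of tile-type indices 0..33) as a 4x34 binary plane.

    Row k, column t is 1 if the hand holds at least k+1 copies of tile type t,
    so a CNN can slide filters along the tile axis to spot runs and sets.
    """
    counts = np.bincount(np.asarray(tile_ids, dtype=np.int64),
                         minlength=NUM_TILE_TYPES)
    if counts.max() > COPIES_PER_TILE:
        raise ValueError("a tile type cannot appear more than 4 times")
    planes = np.zeros((COPIES_PER_TILE, NUM_TILE_TYPES), dtype=np.float32)
    for k in range(COPIES_PER_TILE):
        planes[k] = counts > k  # boolean row, stored as 0.0 / 1.0
    return planes


# Example: three 1-man tiles, one 2-man and one 3-man.
planes = encode_hand([0, 0, 0, 1, 2])
```

Because each copy of a tile switches on exactly one cell, the plane always sums to the number of tiles in the hand, which makes the encoding easy to sanity-check.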

Due to the complex rules of Mahjong, Suphx learns five models to handle different situations. These are the discard model, the Riichi model, the Chow model, the Pong model, and the Kong model. Besides these, Suphx employs another rule-based winning model to decide whether to declare a winning hand and win the round.
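The division of labour among these models can be pictured as a decision flow. The sketch below is hypothetical: the option names, the order of the checks and the callable interfaces are our assumptions, loosely modelled on the description above, not Suphx's actual code.

```python
from typing import Any, Callable, Dict

Model = Callable[[Any], Any]


def decide(state: Any, options: Dict[str, bool], is_own_turn: bool,
           models: Dict[str, Model], win_rule: Callable[[Any], bool]) -> str:
    """Route a decision to the appropriate model (hypothetical decision flow).

    `options` flags which actions the rules currently allow, e.g. {"win": True}.
    The "kong", "riichi", "pong" and "chow" models return True to declare;
    the "discard" model returns the tile type (0..33) to throw away.
    """
    # The rule-based winning model is consulted first whenever a win is legal.
    if options.get("win") and win_rule(state):
        return "win"
    if is_own_turn:
        # After drawing: optionally declare Kong or Riichi, otherwise discard.
        # (A real Riichi declaration is also followed by a discard; omitted here.)
        if options.get("kong") and models["kong"](state):
            return "kong"
        if options.get("riichi") and models["riichi"](state):
            return "riichi"
        return f"discard {models['discard'](state)}"
    # Reacting to an opponent's discard: consider claims, otherwise pass.
    for action in ("kong", "pong", "chow"):
        if options.get(action) and models[action](state):
            return action
    return "pass"
```

Keeping each model behind a narrow yes/no (or "which tile") interface is what allows them to be trained separately, as the three learning steps below describe.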

The learning process of Suphx consists of three main steps:

  • The five models of Suphx are trained by supervised learning, using (state, action) pairs of top players collected from the Tenhou platform.
  • The supervised models are then improved through self-play reinforcement learning (RL), using the models as the policy. The researchers adopt the popular policy gradient algorithm and introduce global reward prediction and oracle guiding to handle the unique challenges of Mahjong.
  • During online play, the researchers employ run-time policy adaptation, which leverages new observations from the current round to perform even better.
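The second step relies on policy gradient learning. As a rough illustration, here is plain REINFORCE with a running-average baseline on a toy problem: a softmax policy choosing among three candidate discards with hypothetical expected rewards. This is a generic textbook sketch, not Suphx's implementation; the paper's variant includes further refinements, and its policy is a deep CNN rather than a vector of logits.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def reinforce(reward_fn, num_actions, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy over `num_actions` actions with REINFORCE."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(num_actions)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choice(num_actions, p=probs)
        reward = reward_fn(action, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)  # running-average baseline
        # Gradient of log softmax w.r.t. the logits: one_hot(action) - probs.
        grad_log_prob = -probs
        grad_log_prob[action] += 1.0
        logits += lr * advantage * grad_log_prob
    return softmax(logits)


# Hypothetical toy environment: expected reward of each candidate discard.
MEAN_REWARD = [0.0, 1.0, 0.2]


def toy_reward(action, rng):
    return MEAN_REWARD[action] + rng.normal(0.0, 0.1)


probs = reinforce(toy_reward, num_actions=3)
```

The baseline does not change the expected gradient but reduces its variance, which matters even more in Mahjong, where rewards are noisy and arrive only at the end of a round.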

Why Mahjong

According to the researchers, Mahjong is much more complicated than games such as chess and Go that AI models have previously mastered. It is a multi-round, tile-based game with imperfect information and multiple players. In each round, four players compete to be the first to complete a winning hand.

The researchers chose this game for three main reasons. First, Mahjong has complicated scoring rules. Each game consists of multiple rounds, and both the final ranking and the reward of the game are determined by the accumulated scores of those rounds. Furthermore, there is a vast number of possible winning hands, which makes scoring far more complex than in previously studied games such as chess and Go.
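This multi-round scoring is what global reward prediction addresses: instead of rewarding a round only by its own score, Suphx uses a predictor of the final game reward, and a round's RL reward is the change in that prediction. In the paper the predictor is a learned recurrent network; in the sketch below it is any callable you pass in, and the linear stand-in used in the example is purely hypothetical.

```python
from typing import Callable, List, Sequence


def round_rewards(predict_final: Callable[[Sequence[float]], float],
                  round_scores: List[float]) -> List[float]:
    """Per-round reward = change in the predicted final game reward.

    `predict_final` maps the scores of the rounds played so far to an estimate
    of the final game reward; an empty prefix gives the pre-game estimate.
    """
    rewards = []
    prev = predict_final(round_scores[:0])
    for k in range(1, len(round_scores) + 1):
        cur = predict_final(round_scores[:k])
        rewards.append(cur - prev)
        prev = cur
    return rewards


# Hypothetical stand-in predictor: final reward proportional to total score.
def linear_predictor(scores: Sequence[float]) -> float:
    return sum(scores) / 1000.0


rewards = round_rewards(linear_predictor, [1000, -500, 2000])
```

The per-round rewards telescope, so they always sum to the difference between the final and the pre-game prediction; a better predictor only changes how that total is credited to individual rounds.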

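Second, the large amount of hidden information (each player's private tiles and the undrawn tiles in the wall are invisible to the others) makes Mahjong a much more difficult imperfect-information game than previously studied ones such as Texas Hold’em poker. Third, the playing rules of Mahjong are much more complicated because of the variety of actions involved, such as Riichi, Chow, Pong and Kong, some of which can interrupt the normal order of play.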
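Oracle guiding targets this hidden information. Broadly, an "oracle" agent is first trained with perfect features it would never see in a real game (for example, opponents' hands), and those features are then gradually dropped until the agent relies only on public information. The sketch below shows one way to implement the dropping with a Bernoulli mask whose keep-probability decays over training; the linear schedule and the function names are our assumptions.

```python
import numpy as np


def oracle_keep_prob(step: int, decay_steps: int) -> float:
    """Keep-probability for oracle features; decays linearly from 1 to 0 (assumed schedule)."""
    return max(0.0, 1.0 - step / decay_steps)


def mix_features(public: np.ndarray, oracle: np.ndarray, step: int,
                 decay_steps: int, rng: np.random.Generator) -> np.ndarray:
    """Concatenate public features with randomly masked oracle features."""
    keep = oracle_keep_prob(step, decay_steps)
    # Bernoulli(keep) mask: each oracle feature survives independently.
    # rng.random() draws from [0, 1), so keep=1 keeps all and keep=0 keeps none.
    mask = rng.random(oracle.shape) < keep
    return np.concatenate([public, oracle * mask])


rng = np.random.default_rng(0)
public = np.ones(3)          # features every player can observe
oracle = np.full(4, 2.0)     # hypothetical perfect-information features
start = mix_features(public, oracle, step=0, decay_steps=10, rng=rng)
end = mix_features(public, oracle, step=10, decay_steps=10, rng=rng)
```

At the start of training the agent sees everything; by the end the oracle features are always zero, so the final policy can be deployed in real games where that information is unavailable.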

Wrapping Up

The researchers note that building a strong Mahjong program poses significant challenges for current game AI research. They also claim that the techniques behind Suphx could help solve complex real-world problems such as financial market prediction and logistics optimization. They stated, “We believe our techniques designed in Suphx for Mahjong, including global reward prediction, oracle guiding and parametric Monte-Carlo policy adaptation, have a great potential to benefit for a wide range of real-world applications.”

Read the paper, “Suphx: Mastering Mahjong with Deep Reinforcement Learning”, here.


Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.