How Bayesian inference is used for matchmaking in online gaming

The online video gaming ecosystem is growing rapidly. Counter-Strike: Global Offensive (CS:GO) has 750,000 daily users, League of Legends has 3.5 million, and Warzone added 8 million players between January and February 2022. Skill-based matchmaking plays a major role in driving player engagement because it lets players compete against opponents of similar skill and rating.

In 1959, Hungarian-American physics professor Arpad Elo developed a statistical rating system, a method for calculating the relative skill levels of players in zero-sum games such as chess. FIDE adopted the system in 1970 and still uses Elo’s rating difference table. The Elo rating system is also used in competitive video games such as League of Legends, Counter-Strike: Global Offensive, Rocket League, and Brawlhalla. The biggest drawback of the Elo system is that two players can have identical results and still end up with different ratings, because each new rating is calculated as a change to the current rating.
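
The order dependence described above can be illustrated with a minimal sketch of the standard Elo update. The K-factor of 32, the 400-point logistic scale, and the helper names are assumptions chosen for illustration; the article does not specify them. The opponent's rating is also held fixed at 1500 to keep the example small.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of A against B on the usual 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating, opponent, score, k=32.0):
    """New rating after one game (score: 1 = win, 0.5 = draw, 0 = loss)."""
    return rating + k * (score - elo_expected(rating, opponent))


def replay(start, opponent, scores, k=32.0):
    """Apply a sequence of results against an opponent of fixed rating."""
    rating = start
    for score in scores:
        rating = elo_update(rating, opponent, score, k)
    return rating


# Same record (one win, one loss), different order.
win_then_loss = replay(1500.0, 1500.0, [1.0, 0.0])
loss_then_win = replay(1500.0, 1500.0, [0.0, 1.0])
print(round(win_then_loss, 2), round(loss_then_win, 2))
```

Both players finish with one win and one loss against identically rated opposition, yet the first ends slightly below 1500 and the second slightly above it, because each update starts from whatever the previous game left behind.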

A player’s rating is provisional in the Elo system as long as it is based on fewer than a fixed number of games. Mark Glickman’s Bayesian rating system, Glicko, addressed this issue by modelling each player’s skill as a Gaussian belief distribution characterised by a mean and a variance.
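
To make the mean-and-variance idea concrete, here is a hedged sketch of a single Glicko-1 rating-period update, written from Glickman's published formulas. The function names are illustrative, and the sketch deliberately leaves out the step that inflates a player's rating deviation (RD) during periods of inactivity.

```python
import math

Q = math.log(10) / 400.0  # scale constant used throughout Glicko-1


def g(rd):
    """Down-weights a result according to the opponent's rating deviation."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def glicko_update(rating, rd, results):
    """results: iterable of (opponent_rating, opponent_rd, score) tuples.

    Returns the player's new (rating, rd) after one rating period.
    """
    d2_inv = 0.0  # information gained from the games (1 / d^2)
    delta = 0.0   # weighted sum of (actual score - expected score)
    for opp_rating, opp_rd, score in results:
        g_j = g(opp_rd)
        e_j = 1.0 / (1.0 + 10 ** (-g_j * (rating - opp_rating) / 400.0))
        d2_inv += Q ** 2 * g_j ** 2 * e_j * (1.0 - e_j)
        delta += g_j * (score - e_j)
    precision = 1.0 / rd ** 2 + d2_inv
    return rating + (Q / precision) * delta, math.sqrt(1.0 / precision)


# A 1500-rated player with RD 200 wins one game and loses two.
games = [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)]
new_rating, new_rd = glicko_update(1500.0, 200.0, games)
print(round(new_rating), round(new_rd, 1))
```

With these inputs the rating falls to roughly 1464 while the RD shrinks from 200 to about 151: the system becomes more certain about the player even as it revises the estimate downward.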

Microsoft developed TrueSkill, a skill-based ranking system for online matchmaking on Xbox Live. TrueSkill is a Bayesian skill rating algorithm used worldwide and has several theoretical and practical advantages over the Elo system. It uses a factor graph and approximate message passing to infer the marginal belief distribution over skill. Where messages in the factor graph are non-Gaussian, they are approximated by moment matching using the Expectation Propagation algorithm. The system updates individual skill estimates by comparing the ranking predicted before the match with the ranking observed after it.
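
For the simplest case, two players and no draws, the moment-matched update reduces to a closed form built from a truncated Gaussian. The sketch below implements only that special case. The defaults (mean 25, standard deviation 25/3, performance noise beta of 25/6) are commonly quoted TrueSkill values and are assumptions here, as are the function names. It omits draws, teams, and the dynamics term that keeps uncertainty from collapsing over time.

```python
import math


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def trueskill_1v1(winner, loser, beta=25.0 / 6.0):
    """Update (mu, sigma) pairs after a decisive two-player game.

    Note: cdf(t) underflows to 0 for extreme upsets (t below about -8),
    which this sketch does not guard against.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)  # mean shift of the truncated Gaussian
    w = v * (v + t)      # variance reduction of the truncated Gaussian
    new_winner = (mu_w + sigma_w ** 2 / c * v,
                  sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sigma_l ** 2 / c * v,
                 sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


newcomer = (25.0, 25.0 / 3.0)
new_w, new_l = trueskill_1v1(newcomer, newcomer)
print(new_w, new_l)
```

The winner's mean rises, the loser's falls by the same amount, and both standard deviations shrink. An expected result (a strong favourite winning) produces a smaller shift than an upset, which is how the comparison of predicted and observed outcomes enters the update.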

Quality matchmaking: TrueSkill matches players with similar skills and sorts the games according to the match quality. TrueSkill has proven significantly better than Elo for free-for-all and player-vs-player (PvP) game modes. A further goal of the system is to reduce the number of matches required to identify a gamer’s skill.
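
One way to score match quality for two players is the draw-probability measure given in the TrueSkill paper, which is highest when the two skill means are close and both are known with little uncertainty. The sketch below assumes that formula and the commonly quoted default beta of 25/6; the function name is illustrative.

```python
import math


def match_quality(mu_a, sigma_a, mu_b, sigma_b, beta=25.0 / 6.0):
    """Two-player match quality in [0, 1]; higher means a closer game."""
    total_var = 2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2
    spread = math.sqrt(2.0 * beta ** 2 / total_var)
    closeness = math.exp(-((mu_a - mu_b) ** 2) / (2.0 * total_var))
    return spread * closeness


# Rank three hypothetical candidate opponents for a (25, 2) player.
candidates = {"close": (26.0, 2.0), "far": (35.0, 2.0), "unknown": (25.0, 25.0 / 3.0)}
ranked = sorted(candidates,
                key=lambda name: match_quality(25.0, 2.0, *candidates[name]),
                reverse=True)
print(ranked)
```

Sorting candidates by this score puts the similarly rated, well-measured opponent first. Both a large skill gap and a highly uncertain rating lower the score.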

Odds of winning: The quality of the rating system and the matchmaking model can be judged by each gamer’s winning ratio, and balance is key. If a gamer’s winning ratio is too high, that gamer is being matched with weaker opponents. This is unfair to those opponents and also affects matchmaking in upcoming games.

In the paper "TrueSkill: A Bayesian Skill Rating System", the researchers processed the Halo 2 dataset but rejected games that did not meet a certain match quality threshold. Next, the researchers computed the winning ratio of each player and measured the average deviation of the winning ratio from 50% as a function of the minimum number of games played by each player.
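
The evaluation described above can be sketched on toy data. The match-record format (a list of winner and loser pairs) and the function name are assumptions for illustration, not the paper's actual pipeline or the Halo 2 data.

```python
from collections import defaultdict


def average_win_ratio_deviation(matches, min_games):
    """Mean |win ratio - 0.5| over players with at least min_games games.

    matches: iterable of (winner_id, loser_id) pairs.
    Returns None when no player meets the threshold.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in matches:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    deviations = [abs(wins[player] / count - 0.5)
                  for player, count in games.items()
                  if count >= min_games]
    return sum(deviations) / len(deviations) if deviations else None


toy_matches = [("a", "b"), ("a", "b"), ("b", "a"), ("a", "c")]
print(average_win_ratio_deviation(toy_matches, min_games=3))
```

A value near zero means the matchmaker is handing most qualifying players roughly even odds; raising `min_games` restricts the average to players whose ratings have had time to settle.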

The researchers found that TrueSkill could provide fairer matchmaking (in the minimum number of matches) for PvP games with a winning probability between 35% and 65%. The TrueSkill algorithm offers automatic player rating and matchmaking on Xbox Live. The system processes thousands of games every day, making it one of the largest use cases of Bayesian inference to date.
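
As a rough illustration of the 35% to 65% band, the sketch below estimates a two-player winning probability from Gaussian skill beliefs and checks whether it falls inside the band. The probability formula (the chance that one player's noisy performance exceeds the other's), the default beta of 25/6, and the helper names are assumptions; the article does not say how the probability was computed.

```python
import math


def win_probability(mu_a, sigma_a, mu_b, sigma_b, beta=25.0 / 6.0):
    """Probability that player A outperforms player B in a single game."""
    c = math.sqrt(2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    return 0.5 * (1.0 + math.erf((mu_a - mu_b) / (c * math.sqrt(2.0))))


def is_balanced(probability, low=0.35, high=0.65):
    """True when the predicted odds fall inside the target band."""
    return low <= probability <= high


even = win_probability(25.0, 2.0, 25.0, 2.0)
lopsided = win_probability(40.0, 1.0, 10.0, 1.0)
print(is_balanced(even), is_balanced(lopsided))
```

A matchmaker could use a check like this to reject pairings whose predicted odds are too one-sided before a game starts.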

Level up: TrueSkill is not limited to quality matchmaking and balanced gameplay; it also tracks how your skill level rises. Levelling up happens over time and depends on the number of games you have played, your opposition, and the type of games you play. The TrueSkill ranking system lets you move up quickly early on, then takes smaller steps in its updates once you have played a run of games with consistent results. As a general rule, the more people per team, the longer it takes to go up or down one level. But the more teams per game, the faster you can go up or down.

Akashdeep Arul
Akashdeep Arul is a technology journalist who seeks to analyze the advancements and developments in technology that affect our everyday lives. His articles primarily focus upon the business, cultural, social and entertainment side of the technology sector.