The development of a game involves rounds of playtesting to study players’ behaviours and experiences before shaping the final product. However, human playtesting is arduous, repetitive and costly, and can significantly slow down the design and development process.
Automated playtesting minimises the need for human intervention. Existing Deep Reinforcement Learning (DRL) game-playing agents can predict both game difficulty and player engagement.
Now, researchers from Aalto University and Rovio Entertainment have introduced a novel method for automated playtesting by enhancing DRL with Monte Carlo Tree Search (MCTS). “Our focus is on combining DRL and MCTS to predict pass and churn rates as measures of game difficulty and engagement, and to model the relationship between these two game metrics,” as per the paper.
The researchers took the model presented in an earlier paper, ‘Predicting Game Difficulty and Churn Without Players’, as their baseline. That model combines DRL-driven AI gameplay with a simulation of how the player population evolves as it progresses through the game’s levels.

Method & results
“In this paper, we extend Roohi et al.’s original approach by complementing their DRL gameplay with MCTS and by improving the features extracted from the AI gameplay data. By utilising the same high-level population simulation, reward function, and DRL agents trained for each level, we allow for a direct comparison with their original work.”
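As a rough illustration of what such a population simulation does, consider the following toy sketch: the AI agent’s playthroughs give an estimated pass probability for each level, and a simulated cohort of players is then pushed through the levels, with some players giving up after failed attempts. The function, the churn rule and all numbers below are illustrative assumptions, not the authors’ implementation.

```python
import random

def simulate_population(pass_probs, churn_after_fail=0.05, cohort=10_000):
    """Toy population simulation (illustrative, not the paper's code):
    every simulated player attempts the levels in order; after each failed
    attempt they churn with probability `churn_after_fail`, otherwise retry."""
    pass_rates, churn_rates = [], []
    alive = cohort
    for p in pass_probs:
        passed = churned = 0
        for _ in range(alive):
            while True:
                if random.random() < p:                 # player clears the level
                    passed += 1
                    break
                if random.random() < churn_after_fail:  # player gives up
                    churned += 1
                    break
        pass_rates.append(passed / cohort)
        churn_rates.append(churned / cohort)
        alive = passed                                  # survivors move on
    return pass_rates, churn_rates

# Example: three levels of increasing difficulty
print(simulate_population([0.9, 0.6, 0.3]))
```

In the actual method, per-level pass probabilities estimated from AI gameplay would take the place of the hand-picked values here, and the churn rule would be whatever the population model really assumes.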
Researchers used MCTS for pass and churn rate prediction. First, however, they had to identify which variant of the algorithm allows an AI agent to clear even the game’s hardest levels. Four MCTS variants were compared (a rough sketch of how a DRL policy can bias the search follows the list):
- Vanilla MCTS
- DRL MCTS
- Myopic MCTS
- DRL-Myopic MCTS
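The exact variants are defined in the paper, but the usual way a DRL policy is folded into MCTS is at the selection step: the vanilla UCT exploration term is swapped for an AlphaZero-style PUCT term weighted by the policy’s action probabilities. The sketch below assumes a generic node structure and is only meant to illustrate that idea; it is not the authors’ implementation.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    value: float = 0.0                              # sum of returns backed up here
    children: dict = field(default_factory=dict)    # action -> Node

def select_child(node, c=1.4, drl_prior=None):
    """Pick the next action during tree descent.

    With drl_prior=None this is plain UCT selection; with a dict of action
    probabilities from a trained DRL agent it becomes a PUCT-style,
    DRL-biased selection."""
    total = sum(child.visits for child in node.children.values()) or 1
    best_action, best_score = None, -math.inf
    for action, child in node.children.items():
        exploit = child.value / child.visits if child.visits else 0.0
        if drl_prior is None:
            explore = c * math.sqrt(math.log(total) / (child.visits + 1))
        else:
            explore = c * drl_prior.get(action, 0.0) * math.sqrt(total) / (1 + child.visits)
        score = exploit + explore
        if score > best_score:
            best_action, best_score = action, score
    return best_action

# Example: a root with two actions, one favoured by a hypothetical DRL prior
root = Node(visits=10, children={"left": Node(3, 1.5), "right": Node(7, 3.0)})
print(select_child(root))                                         # vanilla UCT
print(select_child(root, drl_prior={"left": 0.8, "right": 0.2}))  # DRL-biased
```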
To select the best MCTS variant, the researchers ran each candidate 20 times on each level, with each run using 16 parallel instances of the game. The process took approximately 10 hours per level on a 16-core 3.4 GHz Intel Xeon CPU.
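A minimal sketch of that evaluation protocol, assuming a hypothetical run_mcts_episode entry point in place of the real game and search loop, could look like this:

```python
import random
from multiprocessing import Pool

def run_mcts_episode(args):
    """Stand-in for one MCTS playthrough of a level: the real game loop is
    not public, so this stub just returns a random pass/fail outcome."""
    level_id, seed = args
    random.seed(seed)
    return random.random() < 0.5       # placeholder pass probability

def evaluate_variant(level_id, runs=20, workers=16):
    """Evaluate one MCTS variant on one level: `runs` independent playthroughs
    spread across `workers` parallel game instances, mirroring the protocol
    described above (the code itself is an illustrative assumption)."""
    with Pool(workers) as pool:
        results = pool.map(run_mcts_episode, [(level_id, s) for s in range(runs)])
    return sum(results) / runs         # empirical pass rate for this level

if __name__ == "__main__":
    print(evaluate_variant(level_id=1))
```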
By comparing against the original DRL-based approach, the paper demonstrated that combining MCTS and DRL can improve prediction accuracy while requiring much less computation to solve hard levels than MCTS alone. “Although combinations of MCTS and DRL have been utilised in many state-of-the-art game-playing systems, our work is the first to measure its benefits in predicting ground-truth human player data,” as per the paper.
Games aren’t just for fun, however. By training a virtual agent to outperform human players, we can learn how to optimise diverse processes in a range of distinct and intriguing subfields. This is what Google DeepMind accomplished with their popular AlphaGo, which defeated the world’s best Go player at the time. From training game-testing agents to evaluating RL agents, recent research works have enhanced the gaming ecosystem.
Recent developments
Google AI came out with an ML-based approach to train game-testing agents and find bugs in games. Imitation learning (IL), inspired by the DAgger algorithm, trains ML policies from demonstrations of professionals playing the game.
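For readers unfamiliar with DAgger, the core loop is easy to sketch: roll out the current learner policy, have the expert label the states the learner actually visited, aggregate those labels into the training set, and refit. The toy world and the ‘memorise the labels’ policy below are illustrative assumptions, not the implementation used in the Google AI work.

```python
import random

def expert_action(state, goal=10):
    """Expert demonstrator: always step towards the goal (toy example)."""
    return 1 if state < goal else -1

def rollout(policy, steps=15):
    """Run the current learner policy and record the states it visits."""
    state, visited = 0, []
    for _ in range(steps):
        visited.append(state)
        state += policy.get(state, random.choice([-1, 1]))   # unknown state: guess
    return visited

def dagger(iterations=5):
    """Toy DAgger loop: the learner acts, the expert relabels the visited
    states, the labels are aggregated and the policy is refit on all of them."""
    dataset, policy = {}, {}
    for _ in range(iterations):
        for state in rollout(policy):
            dataset[state] = expert_action(state)    # expert labels learner's states
        policy = dict(dataset)                       # "refit" = memorise the labels
    return policy

print(dagger())
```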
A group of researchers from Microsoft Research, the University of Nottingham and UC Berkeley have created a novel methodology for testing the human-AI collaboration of RL agents in the Overcooked two-player gaming environment.
In recent years, researchers have effectively used deep reinforcement learning (deep RL) to train agents that perform well in a variety of contexts. However, deploying a reinforcement learning agent in a real-world setting necessitates a high level of robustness.