Tencent, the owner of China’s largest messaging app WeChat, is one of the biggest tech players in the country. The company has been conducting AI research in its video game business and has become the second-largest cloud platform in China. It applies AI in games in various ways, including real-time identification and playtime limits for children.
Last year, Fine Art, a Go-playing computer built by Tencent, defeated a human Go champion. The International Go Federation reported that Fine Art played 34 games against professionals given a two-stone handicap, and won 30.
Recently, researchers at Tencent AI Lab developed an AI system capable of defeating human champions in the smash-hit mobile game Arena of Valor. Arena of Valor, also known as Honor of Kings, is a multiplayer online battle arena (MOBA) game. Most recently, the researchers revealed the techniques used to master the game.
AI Techniques Behind the System
For this system, the researchers studied the deep reinforcement learning problem of complex action control in Multi-player Online Battle Arena (MOBA) 1v1 games. The researchers claimed that the system is loosely coupled and highly scalable, which enables efficient exploration at large scale. The algorithm combines several strategies: decoupling of control dependencies, an attention mechanism for target selection, a game-knowledge-based pruning method called action mask for efficient exploration, an LSTM for learning skill combos, and an improved version of the proximal policy optimization (PPO) objective called dual-clip PPO.
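To make the dual-clip idea more concrete, here is a minimal sketch of a per-sample objective. The article only names the technique, so the formula below is an assumption based on how dual-clip PPO is commonly described: the usual PPO clipped term, plus a lower bound of `c * advantage` when the advantage is negative. The function names and the values `eps=0.2` and `c=3.0` are illustrative, not taken from the article.

```python
def dual_clip_ppo_term(ratio, advantage, eps=0.2, c=3.0):
    """Per-sample surrogate objective (to be maximized).

    ratio     : new_policy_prob / old_policy_prob for the taken action
    advantage : estimated advantage of that action
    eps, c    : assumed hyperparameters; c must be greater than 1
    """
    # Standard PPO clipping of the probability ratio.
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    standard = min(ratio * advantage, clipped_ratio * advantage)
    if advantage < 0:
        # Second clip: bound the term from below so that a very large
        # ratio cannot produce an unbounded negative objective.
        return max(standard, c * advantage)
    return standard


def dual_clip_ppo_objective(ratios, advantages, eps=0.2, c=3.0):
    """Average the per-sample terms over a batch."""
    terms = [dual_clip_ppo_term(r, a, eps, c) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


# With ratio=10 and advantage=-1, plain PPO would give -10.0;
# the second clip bounds the term at c * advantage = -3.0.
bounded = dual_clip_ppo_term(10.0, -1.0)
```

The second clip only changes the result when the ratio is large and the advantage is negative, which is the case where large-scale off-policy training would otherwise see very high variance.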
The researchers designed a scalable, loosely coupled system architecture to take advantage of data parallelism. The architecture consists of four modules that provide high throughput and smooth data storage and transmission while avoiding a communication-cost bottleneck:
- Reinforcement Learning (RL) Learner: The RL Learner is a distributed training environment. To accelerate policy updates using large batch sizes, multiple RL Learners fetch data in parallel from the same number of Memory Pools.
- AI Server: The AI Server covers the interaction logic between the game environment and the AI model. It generates episodes via self-play with mirrored policies.
- Dispatch Module: The Dispatch Module collects data samples from the AI Servers; each sample consists of rewards, features, action probabilities, and so on.
- Memory Pool: The Memory Pool is a server whose internals are implemented as a memory-efficient circular queue for data storage.
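The circular-queue idea behind the Memory Pool can be sketched as a fixed-size ring buffer. This is only an illustration of the data structure; the class name, method names and sampling strategy are hypothetical and do not describe Tencent's actual implementation.

```python
import random


class MemoryPool:
    """Fixed-capacity ring buffer: new samples overwrite the oldest ones."""

    def __init__(self, capacity):
        self._slots = [None] * capacity  # storage is allocated once
        self._capacity = capacity
        self._next = 0   # index where the next sample will be written
        self._size = 0   # number of slots currently filled

    def push(self, sample):
        # Write in place, then advance the write index with wrap-around.
        self._slots[self._next] = sample
        self._next = (self._next + 1) % self._capacity
        self._size = min(self._size + 1, self._capacity)

    def sample_batch(self, batch_size, rng=random):
        # Uniform sampling without replacement from the filled slots
        # (an assumed strategy, chosen for simplicity).
        filled = self._slots[:self._size]
        return rng.sample(filled, min(batch_size, self._size))

    def __len__(self):
        return self._size


# A Dispatch Module would push samples; an RL Learner would fetch batches.
pool = MemoryPool(capacity=3)
for step in range(5):
    pool.push({"step": step})
batch = pool.sample_batch(2)
```

Because the buffer never grows, memory use stays constant no matter how many samples the AI Servers produce, and the oldest data is dropped automatically.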
How It Works
The researchers designed a deep reinforcement learning framework, together with a set of algorithms, that enables efficient exploration at massive scale in multi-agent competitive environments such as MOBA 1v1 games. For this system, they designed a neural network architecture that combines encoding of multi-modal inputs, decoupling of inter-correlations in controls, an exploration pruning mechanism, and attack attention to account for the ever-changing game situations in MOBA 1v1 games.
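One common way to implement exploration pruning with an action mask is to remove illegal actions before the policy's softmax, so they can never be sampled. The sketch below assumes that approach; the article does not say exactly where the mask is applied, and the function name and example actions are hypothetical.

```python
import math


def masked_softmax(logits, legal):
    """Softmax over actions, with illegal actions forced to probability 0.

    logits : raw policy scores, one per action
    legal  : booleans from game knowledge (True means the action is allowed)
    """
    # Illegal actions get a score of -inf, so exp() maps them to 0.
    masked = [x if ok else float("-inf") for x, ok in zip(logits, legal)]
    top = max(masked)
    if top == float("-inf"):
        raise ValueError("at least one action must be legal")
    # Subtract the maximum for numerical stability.
    exps = [math.exp(x - top) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Example: the second action (say, a skill still on cooldown) is pruned.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
```

Here the pruned action receives probability 0 and the remaining probability mass is renormalized over the legal actions, so exploration is spent only on moves the game rules permit.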
To evaluate the trained AI system’s capability in the real world, the researchers deployed the AI model in Honor of Kings to play against professional human players. The AI model beat the professional players on heroes of different types, averaging 5 kills per game while being killed only 1.33 times per game.
The researchers further evaluated whether the policies learned by the AI model could counter a diverse set of top human players. In this case, the model achieved a 99.81% win rate across 2,100 matches, losing only 4 games. As a next step, the researchers said that the framework and algorithm will be open-sourced, and that the game core of Honor of Kings will be made accessible to the community to facilitate further research on complex games.
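The win rate quoted above can be checked directly from the reported match counts. The variable names are illustrative; the two numbers are the ones given in the article.

```python
matches = 2100  # total matches reported
losses = 4      # matches the AI lost

win_rate = (matches - losses) / matches
print(f"{win_rate:.2%}")  # prints 99.81%
```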