Reinforcement learning (RL), one of the most widely used machine learning (ML) approaches this year, has been applied to complex decision-making problems. Most current research focuses on RL algorithms that improve the performance of an AI model in a controlled environment.
Ubisoft La Forge, Ubisoft's prototyping space, has been making a number of advances in AI. Its goal is to bridge the gap between theoretical academic work and the practical applications of AI in video games as well as in the real world. In one of our articles, we discussed how Ubisoft is mainstreaming machine learning into game development. Recently, researchers from the La Forge project at Ubisoft Montreal proposed a hybrid AI algorithm known as Hybrid SAC, which can handle both discrete and continuous actions in a video game.
Most reinforcement learning research papers focus on environments where the agent's actions are either discrete or continuous. However, when training an agent to play a video game, it is common to encounter situations where actions have both discrete and continuous components. For instance, an agent driving a car has to combine steering and acceleration (both continuous) with the use of the hand brake (a discrete binary action).
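To make the idea of a hybrid action concrete, here is a minimal sketch of how the driving example could be represented in code. The class name `DrivingAction`, the helper `make_action` and the [-1, 1] value ranges are assumptions made for illustration; they do not come from the Ubisoft work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivingAction:
    """One hybrid action: two continuous components and one discrete component."""
    steering: float      # continuous; assumed range [-1, 1]
    acceleration: float  # continuous; assumed range [-1, 1]
    hand_brake: bool     # discrete and binary: engaged or not


def clip(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clamp a continuous value into the assumed valid range."""
    return max(low, min(high, value))


def make_action(steering: float, acceleration: float, hand_brake: bool) -> DrivingAction:
    """Build a valid hybrid action from raw values (hypothetical helper)."""
    return DrivingAction(clip(steering), clip(acceleration), bool(hand_brake))
```

A single decision by the agent is then one object carrying all three components, for example `make_action(0.2, 1.0, False)`.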
This is where Hybrid SAC comes into play. With this model, the researchers set out to address challenges that commonly arise in video game development. The contribution deals with a different set of constraints than most academic work and is geared mainly towards industry practitioners.
The Algorithm Behind It
The approach in this research is based on Soft Actor-Critic (SAC), a model-free algorithm originally proposed for continuous control tasks. The actions encountered in video games, however, are often both continuous and discrete.
To deal with a mix of discrete and continuous action components, the researchers first converted part of SAC's continuous output into discrete actions. They then explored this approach further and extended it into Hybrid SAC, an extension of the SAC algorithm that can handle discrete, continuous and mixed discrete-continuous actions.
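As a rough illustration of what sampling a mixed action can look like, the sketch below draws the discrete component from a categorical distribution and each continuous component from a Gaussian squashed by tanh, the bounded form commonly used with SAC. This is an assumption-laden sketch rather than the researchers' implementation: the function names are hypothetical, the components are treated as independent, and the logits, means and log standard deviations are passed in directly instead of coming from a trained network.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities (numerically stable form)."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_hybrid_action(discrete_logits, means, log_stds, rng):
    """Sample one discrete index and a list of continuous values in [-1, 1].

    All arguments stand in for the outputs of a policy network (assumed).
    """
    # Discrete part: draw an index from a categorical distribution.
    probs = softmax(discrete_logits)
    discrete = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Continuous part: one Gaussian sample per dimension, squashed by tanh.
    continuous = [
        math.tanh(rng.gauss(mu, math.exp(log_std)))
        for mu, log_std in zip(means, log_stds)
    ]
    return discrete, continuous


# Example: a binary hand brake plus steering and acceleration.
rng = random.Random(0)
hand_brake, (steering, acceleration) = sample_hybrid_action(
    [0.0, 2.0], [0.0, 0.5], [-1.0, -1.0], rng
)
```

Because the two parts are sampled separately here, the same function also covers the purely discrete case (empty `means`) and the purely continuous case (a single logit).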
How It Works
The researchers trained a vehicle in a Ubisoft game by using the proposed Hybrid SAC model with two continuous actions (acceleration and steering) and one binary discrete action (hand brake). The objective of the car is to follow a given path as fast as possible, and in this case, the discrete hand brake action plays a key role in staying on the road at such a high speed.
Hybrid SAC exhibits performance competitive with the state of the art on parameterised-action benchmarks. The researchers also showed that this hybrid model can be successfully applied to train a car on a high-speed driving task in a commercial video game, demonstrating the practical usefulness of such an algorithm for the video game industry.
While working with mixed discrete-continuous actions, the researchers gained experience that they shared as advice for obtaining an appropriate representation for a given task:
- Identify which action components (both discrete and continuous) should be made dependent on each other. When in doubt, it is advised to start with a simpler parameterization based on independent components, and only investigate later the potential benefits of more complex parameterizations.
- When a continuous component depends on a discrete component, consider duplicating it (one copy for each discrete action) as long as the model size remains reasonable. The copies can then be treated as independent, making it easier for the model to specialise the value of the component to each discrete action.
- If possible, try to avoid dependencies among continuous dimensions, so as to keep a simple parameterization where each action dimension can be sampled independently.
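The second piece of advice, duplicating a continuous component for each discrete action, can be sketched as follows. This is only one possible reading of that advice: the function name `sample_duplicated`, the shared standard deviation and the example numbers are all hypothetical.

```python
import math
import random


def sample_duplicated(discrete_probs, means_per_action, std, rng):
    """Sample a discrete action plus the continuous copy that belongs to it.

    means_per_action[i] holds the means of the continuous component's copy
    for discrete action i (assumed to come from a policy network).
    """
    # Choose the discrete action first.
    discrete = rng.choices(
        range(len(discrete_probs)), weights=discrete_probs, k=1
    )[0]
    # Sample every copy independently of the discrete choice.
    copies = [
        [math.tanh(rng.gauss(mu, std)) for mu in row]
        for row in means_per_action
    ]
    # Only the copy matching the chosen discrete action is applied.
    return discrete, copies[discrete]


# Two discrete actions, each with its own copy of one continuous parameter.
rng = random.Random(0)
action, params = sample_duplicated([0.5, 0.5], [[-0.9], [0.9]], 0.1, rng)
```

Each copy can learn a value suited to its own discrete action, at the cost of a larger output layer, which is why the advice applies only while the model size remains reasonable.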