This AI Agent Uses Reinforcement Learning To Self-Drive In A Video Game

Reinforcement learning (RL), one of the most widely used machine learning (ML) techniques of the year, has been applied to solve complex decision-making problems. At present, most research focuses on RL algorithms that improve an AI model's performance in a controlled environment.

Ubisoft’s prototyping space, Ubisoft La Forge, has been making significant advances in AI. The goal of this prototyping space is to bridge the gap between theoretical academic work and the practical applications of AI, in video games as well as in the real world. In one of our articles, we discussed how Ubisoft is mainstreaming machine learning into game development. Recently, researchers from the La Forge project at Ubisoft Montreal proposed a hybrid AI algorithm known as Hybrid SAC, which can handle the mixed action spaces found in video games.

Most reinforcement learning research papers focus on environments where the agent’s actions are either discrete or continuous. However, when training an agent to play a video game, it is common to encounter actions that have both discrete and continuous components. For instance, the agent may need to control a system with both kinds of components, such as driving a car by combining steering and acceleration (both continuous) with the hand brake (a discrete binary action).
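The driving example above can be sketched as a data structure. This is an illustrative sketch (the class and field names are hypothetical, not from the paper): a single hybrid action bundles two continuous components with one discrete binary component.

```python
import random
from dataclasses import dataclass

@dataclass
class DrivingAction:
    """A hybrid action mixing continuous and discrete components."""
    steering: float      # continuous, in [-1, 1]
    acceleration: float  # continuous, in [0, 1]
    hand_brake: bool     # discrete, binary

def sample_random_action(rng: random.Random) -> DrivingAction:
    """Sample a uniformly random hybrid action (e.g., for exploration)."""
    return DrivingAction(
        steering=rng.uniform(-1.0, 1.0),
        acceleration=rng.uniform(0.0, 1.0),
        hand_brake=rng.random() < 0.5,
    )
```

A standard discrete-only or continuous-only policy cannot emit such an action directly, which is the gap Hybrid SAC is designed to fill.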

This is where Hybrid SAC comes into play. With this model, the researchers address challenges common in video game development, working under a set of constraints mainly geared towards industry practitioners.

The Algorithm Behind

The approach in this research is based on Soft Actor-Critic (SAC), a model-free algorithm originally proposed for continuous control tasks. However, the actions encountered in video games often mix continuous and discrete components.

In order to deal with a mix of discrete and continuous action components, the researchers converted part of SAC’s continuous output into discrete actions. They extended this approach into Hybrid SAC, an extension of the SAC algorithm that can handle discrete, continuous and mixed discrete-continuous actions.
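One simple way to realise the idea of converting part of a continuous output into a discrete action is to reserve some output dimensions as scores for the discrete choices. The sketch below is an illustration of that general idea under assumed conventions, not the paper's exact parameterisation:

```python
import numpy as np

def split_hybrid_action(raw: np.ndarray, n_continuous: int):
    """Interpret a raw continuous policy output as a hybrid action.

    The first `n_continuous` dimensions are used directly as continuous
    actions (squashed to [-1, 1]); the remaining dimensions are treated
    as scores for the discrete actions, with the discrete action chosen
    by argmax over those scores.
    """
    continuous = np.tanh(raw[:n_continuous])        # squashed continuous actions
    discrete = int(np.argmax(raw[n_continuous:]))   # index of the chosen discrete action
    return continuous, discrete
```

In this way a single continuous-output policy, such as SAC's, can drive both kinds of action components.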

How It Works

The researchers trained a vehicle in a Ubisoft game using the proposed Hybrid SAC model with two continuous actions (acceleration and steering) and one binary discrete action (hand brake). The car’s objective is to follow a given path as fast as possible, and the discrete hand brake action plays a key role in staying on the road at high speed.
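An objective of this kind is typically expressed as a shaped reward. The function below is a hypothetical illustration of such a path-following objective (an assumption on our part, not Ubisoft's actual reward function): it rewards progress along the path each step and penalises straying from it.

```python
def path_following_reward(progress_delta: float, dist_from_path: float,
                          max_dist: float = 5.0) -> float:
    """Hypothetical shaped reward for a path-following driving task.

    progress_delta:  distance advanced along the path this step
                     (encourages driving fast).
    dist_from_path:  current lateral distance from the path
                     (penalised, capped at max_dist).
    """
    off_path_penalty = min(dist_from_path / max_dist, 1.0)
    return progress_delta - off_path_penalty
```

Under a reward like this, using the hand brake to hold a corner at speed pays off whenever it avoids a larger off-path penalty.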

Wrapping Up

Hybrid SAC exhibits competitive performance with the state of the art on parameterised-actions benchmarks. The researchers showed that this hybrid model can successfully train a car on a high-speed driving task in a commercial video game, demonstrating the practical usefulness of such an algorithm for the video game industry.

While working with mixed discrete-continuous actions, the researchers gathered several lessons on obtaining an appropriate action representation for a given task. Their advice is summarised below:

  • Identify which action components (discrete and continuous) should be made dependent on each other. When in doubt, start with a simpler parameterisation based on independent components, and only later investigate the potential benefits of more complex parameterisations.
  • When a continuous component depends on a discrete component, consider duplicating it (one copy per discrete action) as long as the model size remains reasonable. This allows the components to be treated as independent, making it easier for the model to specialise the value of the continuous component to each discrete action.
  • If possible, avoid dependencies among continuous dimensions, so as to keep a simple parameterisation where each action dimension can be sampled independently.
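The duplication advice in the second point can be sketched concretely. In the hypothetical helper below, the policy outputs one set of continuous parameters per discrete action, and the set matching the chosen discrete action is selected:

```python
import numpy as np

def select_continuous_params(per_branch_params: np.ndarray,
                             discrete_action: int) -> np.ndarray:
    """Pick the continuous parameters tied to a chosen discrete action.

    `per_branch_params` has shape (n_discrete, n_continuous): one
    duplicated set of continuous outputs per discrete action. Selecting
    the row for `discrete_action` lets each branch specialise its
    continuous values independently of the others.
    """
    return per_branch_params[discrete_action]
```

For the driving example, a 2x2 array would hold separate (steering, acceleration) outputs for hand-brake-off and hand-brake-on, so each branch can learn different continuous behaviour.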
Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.