With significant advancements in the field of deep learning, machines are now trained to achieve human-like performance on numerous tasks. In many cases, these AI systems have even surpassed human capabilities with extraordinary accuracy and skill.
In recent times, many such models have been trained to play prominent games, such as Go (AlphaGo), StarCraft II, chess and Atari titles. In a recent study, researchers from the University of Zurich and Sony AI have proposed a deep reinforcement learning model that performs autonomous car racing in one of the most renowned racing games, Gran Turismo Sport.
According to the researchers, autonomous racing at high speed is a challenging task that requires precise and accurate actions by the vehicle to avoid any sort of mishap. Self-driving cars also face some fundamental robotic challenges, such as planning trajectories and controlling the car at its friction limits, especially when another vehicle approaches.
Therefore, to resolve these issues and perform well in GT Sport, they decided to leverage maximum-entropy deep reinforcement learning to train a sensorimotor policy that enables the driverless car to complete the race as fast as possible. The trained deep reinforcement learning agent maps observations directly to control commands.
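The core idea of maximum-entropy reinforcement learning can be sketched as a return that is augmented with an entropy bonus, so the policy stays exploratory while chasing reward. The sketch below is illustrative only; the temperature `alpha=0.2` is a common SAC default, not a value taken from the paper.

```python
def entropy_regularized_return(rewards, entropies, alpha=0.2, gamma=0.99):
    """Discounted return with an entropy bonus, as in maximum-entropy RL.

    rewards:   per-step rewards r_t
    entropies: per-step policy entropies H(pi(.|s_t))
    alpha:     entropy temperature (0.2 is a common SAC default, assumed here)
    gamma:     discount factor
    """
    return sum((gamma ** t) * (r + alpha * h)
               for t, (r, h) in enumerate(zip(rewards, entropies)))
```

Raising `alpha` rewards more stochastic (higher-entropy) behaviour, which in practice helps the agent keep exploring different racing lines instead of collapsing early onto a suboptimal one.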
According to the data, the built-in non-player characters in GT Sport are currently approximately 11 seconds behind the fastest human driver and slower than about 83% of all human players. Thus, the study aimed to create an AI agent that can surpass human players of the game.
Overview Of Creating The Autonomous Agent
According to the researchers, earlier work on autonomous racing has focused on trajectory planning and control, supervised learning and reinforcement learning approaches. However, none of these approaches has managed to outperform the fastest human drivers.
Therefore, the researchers decided to build a neural network controller that can navigate the autonomous car while minimising travel time on a given track in GTS racing. To apply the reinforcement learning methodology, they first defined a reward function that formulates the racing problem and a neural network policy that maps inputs to actions. Once that was done, the policy parameters were optimised by maximising the reward function using the Soft Actor-Critic (SAC) algorithm.
To address the minimum-time problem, the researchers needed a reward function that yields a valid policy for a given car and track. They designed a proxy reward based on the current course progress, evaluated at regular intervals; maximising course progress minimises the lap time over a long-enough time horizon. Further, to avoid a short-term bias that could discourage the RL agent from braking while racing, the researchers introduced a second reward term that penalises wall contact.
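A minimal sketch of such a progress-based proxy reward with a wall-contact penalty might look like the following. The function name and the flat penalty weight are illustrative assumptions; the paper's exact formulation (and its penalty shaping) is not reproduced here.

```python
def step_reward(progress_now, progress_prev, wall_contact, wall_penalty=5.0):
    """Proxy reward evaluated at fixed time intervals.

    progress_now / progress_prev: course progress (e.g. metres along the
        track) at the current and previous evaluation step.
    wall_contact: whether the car touched a wall during the step.
    wall_penalty: illustrative flat penalty weight (an assumption for
        this sketch, not the paper's actual coefficient).
    """
    reward = progress_now - progress_prev  # maximising progress minimises lap time
    if wall_contact:
        reward -= wall_penalty  # discourages short-sighted, wall-riding lines
    return reward
```

Without the penalty term, an agent can exploit walls to corner without braking; the penalty makes such short-term gains unprofitable over a lap.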
Further, the driving policy is represented by a deep neural network within the SAC architecture, which combines the policy network with two Q-function networks and a state-value network, each with two hidden layers of 256 Rectified Linear Unit (ReLU) nodes. This results in a total of 599,566 trainable parameters for the RL agent.
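As an illustration of the network shape described above, here is a NumPy forward pass for a state-value network with two hidden layers of 256 ReLU units. The observation dimension is a placeholder; the agent's actual observation vector, weight initialisation and the companion policy and Q-networks are not detailed here.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class ValueNetwork:
    """State-value network: two hidden layers of 256 ReLU nodes each."""

    def __init__(self, obs_dim, hidden=256, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (obs_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, hidden))
        self.b2 = np.zeros(hidden)
        self.W3 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b3 = np.zeros(1)

    def forward(self, obs):
        h1 = relu(obs @ self.W1 + self.b1)   # first hidden layer, 256 ReLU
        h2 = relu(h1 @ self.W2 + self.b2)    # second hidden layer, 256 ReLU
        return (h2 @ self.W3 + self.b3).item()  # scalar state value

    def num_parameters(self):
        return sum(w.size for w in
                   (self.W1, self.b1, self.W2, self.b2, self.W3, self.b3))
```

For a hypothetical 10-dimensional observation, this value network alone has 68,865 parameters; the 599,566 total reported above covers the policy and the two Q-function networks as well.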
Evaluation Of The RL Agent
To evaluate the approach, the researchers used three race settings of differing difficulty, with different cars and tracks, and compared the agent against the built-in non-player characters as well as 50,000 human players.
For this, they trained a separate agent for each of the three experimental environments, using data provided by Polyphony Digital Inc. on the best lap times and trajectories of every participant involved. To provide a level playing field against humans, the competitions were restricted to fixed settings for car, course, tyres and racing assists.
Reference settings used to compare the RL agent to human drivers
To keep the comparison fair, the RL agent's lap times were compared only to those achieved by the built-in non-player characters and the fastest human drivers. The RL agent outperformed the best human lap time in all the reference settings and comfortably surpassed the built-in characters.
Results & Conclusion
Firstly, for the first track, the RL agent undercut the fastest human driver by 0.15 and 0.04 seconds in settings A and B, respectively. The difference in margins can be attributed to the speed difference between the two cars used, the Audi TT Cup and the Mazda Demio. Secondly, while human players struggled with the faster-paced setting, the RL agent performed consistently as the demands increased.
In the third setting, C, the RL agent beat the best human time by 0.62 seconds. Across the learning curves for setting A, covering three types of neural network policies, the agent learned to achieve lap times faster than the best human player's.
Out-in-out driving behaviour and how the RL agent learns it
Following the evaluation, the RL agent was found to use the whole width of the track to optimise its trajectory, the classic out-in-out racing line, which allows it to drive at high speed without losing traction. Notably, the agent drove through curves better than the best available human player without any prior demonstration.
This was made possible by early curve detection: the agent assesses a curve's sharpness ahead of time and manages its speed accordingly. In the above image, one can see the autonomous agent decelerating roughly 100 metres before taking the turn. The agent further learned a control policy that improves cornering speed and reduces the lap time by 0.15 seconds.
Thus, with these results in hand, it can be established that the proposed RL agent learns to drive autonomously at very high speed across race settings of varying difficulty, including tight curves. In all the settings, the RL agent achieved better lap times than all human players, even on trajectories the human players chose themselves.
Such an approach works without any manual intervention, human demonstration data or prior path planning, and requires only limited computational power to train and evaluate the agent. It could therefore help create agents that adapt to new situations with fewer samples.
Read the whole paper here.