One of the most critical pieces of the self-driving puzzle is predicting the future movement of surrounding traffic actors, which allows an autonomous vehicle to plan its route safely and effectively. Uber has further improved its self-driving cars’ performance on this task by proposing a raster-based conditional GAN architecture, powered by a novel differentiable rasterizer module at the input of the conditional discriminator that maps generated trajectories into the raster space in a differentiable manner.
Bird’s-eye view (BEV) rasterization provides the contextual understanding, while the adversarial framework of GANs pushes predictions toward the real-world distribution. Although these techniques have helped self-driving cars make effective decisions, they were far from optimal. Uber therefore built upon previous methods – such as top-down scene rasterization and generative adversarial networks (GANs) – to further enhance its autonomous vehicles’ performance in traffic.
Uber proposed a novel GAN architecture conditioned on an input raster image, referred to as Scene-Compliant GAN (SC-GAN). The critical component of the architecture is the differentiable rasterizer, which allows generated trajectories to be projected directly into the raster space in a differentiable manner. This simplifies the discriminator’s task while allowing gradients to flow back to the generator, leading to more efficient adversarial training and more realistic output trajectories.
According to the researchers, most GAN-based models do not condition the discriminator on the scene context image, leading to suboptimal performance. Such a discriminator encodes the input trajectory with a Long Short-Term Memory (LSTM) encoder and classifies solely on the trajectory embedding. As a result, it cannot identify trajectories that are not scene-compliant.
The model consists of three main modules: generator network, discriminator network, and differentiable trajectory rasterizer.
Generator Network: It generates the trajectory prediction given the actor’s state input, a per-actor raster, and a noise vector. The generator first extracts scene context features using a CNN; the researchers used MobileNet for faster real-time inference. Past observed actor states are embedded with a shallow encoding layer and concatenated with the extracted scene context features and the latent noise vector, before being passed to a trajectory decoder module that generates the trajectory predictions.
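The generator described above can be sketched as follows. This is a minimal illustration, not the published architecture: the tiny convolutional stack stands in for MobileNet, and all layer sizes, the history length, and the prediction horizon are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TrajectoryGenerator(nn.Module):
    """Sketch: raster CNN + shallow state encoder + noise -> future trajectory.
    The small CNN stands in for MobileNet; sizes are illustrative assumptions."""
    def __init__(self, state_dim=3, hist_len=10, noise_dim=16, horizon=30):
        super().__init__()
        self.horizon = horizon
        # Scene-context feature extractor (paper uses MobileNet)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Shallow encoding layer for past observed actor states
        self.state_enc = nn.Sequential(nn.Linear(state_dim * hist_len, 32), nn.ReLU())
        # Trajectory decoder: outputs (x, y) for each future step
        self.decoder = nn.Sequential(
            nn.Linear(32 + 32 + noise_dim, 64), nn.ReLU(),
            nn.Linear(64, horizon * 2),
        )

    def forward(self, raster, states, noise):
        # Concatenate scene features, state embedding, and latent noise
        feats = torch.cat([self.cnn(raster),
                           self.state_enc(states.flatten(1)),
                           noise], dim=1)
        return self.decoder(feats).view(-1, self.horizon, 2)

gen = TrajectoryGenerator()
raster = torch.randn(4, 3, 64, 64)   # per-actor BEV raster
states = torch.randn(4, 10, 3)       # 10 past (x, y, heading) states
noise = torch.randn(4, 16)
traj = gen(raster, states, noise)
print(traj.shape)  # torch.Size([4, 30, 2])
```

Sampling different noise vectors for the same raster and state history yields different plausible futures, which is how the GAN captures multimodality.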
Discriminator Network: It classifies whether a given future trajectory comes from the ground truth (real) or from the generator (fake), conditioned on the past observed states and the scene context image. Such scene context was not used in previous GAN-based discriminator architectures. The researchers therefore proposed a scene-compliant architecture, comprising only fully convolutional layers, that is more sensitive to noncompliant trajectories. The proposed discriminator relies on a novel differentiable trajectory rasterizer.
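A fully convolutional, scene-conditioned critic along these lines could look like the sketch below. The channel counts and depth are assumptions; the key idea from the article is that the scene raster and the rasterized trajectory are stacked channel-wise so the network sees both in the same spatial frame.

```python
import torch
import torch.nn as nn

class SceneCompliantDiscriminator(nn.Module):
    """Sketch of a fully convolutional critic conditioned on the scene.
    Input: scene raster stacked with the trajectory rendered as occupancy
    grids (one channel per future step). Sizes are illustrative assumptions."""
    def __init__(self, raster_channels=3, horizon=30):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(raster_channels + horizon, 32, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            # No sigmoid: a Wasserstein critic outputs an unbounded score
            nn.Conv2d(64, 1, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, raster, traj_grids):
        # Stack scene context and rasterized trajectory in raster space
        return self.net(torch.cat([raster, traj_grids], dim=1))

disc = SceneCompliantDiscriminator()
raster = torch.randn(2, 3, 64, 64)
traj_grids = torch.rand(2, 30, 64, 64)  # from the differentiable rasterizer
score = disc(raster, traj_grids)
print(score.shape)  # torch.Size([2, 1])
```

Because every layer operates in raster space, a trajectory channel that overlaps an off-road region of the scene raster can be penalized by local convolutions directly, which is what makes the critic sensitive to scene compliance.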
Differentiable Rasterizer: The trajectory rasterization module of the scene-compliant discriminator rasterizes the future trajectory, either predicted or ground truth, into a sequence of 2D occupancy grids. Each grid encodes a single future trajectory point and has the same shape and resolution as the raster image.
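One common way to make rasterization differentiable is to render each trajectory point as a soft Gaussian bump rather than a hard cell assignment; the sketch below uses that approach, though the paper’s exact kernel is an assumption here. Gradients with respect to the (x, y) coordinates flow through the exponential, which is what lets the discriminator’s signal reach the generator.

```python
import torch

def rasterize_trajectory(traj, grid_size=64, sigma=1.0):
    """Render each trajectory point as a Gaussian bump on its own 2D grid,
    differentiably w.r.t. the (x, y) coordinates.
    Gaussian kernels are an illustrative choice, not necessarily the paper's.
    traj: (B, T, 2) coordinates in raster units -> (B, T, H, W) grids."""
    B, T, _ = traj.shape
    coords = torch.arange(grid_size, dtype=traj.dtype)
    yy, xx = torch.meshgrid(coords, coords, indexing="ij")
    # Offset of every grid cell from every trajectory point
    dx = xx.view(1, 1, grid_size, grid_size) - traj[..., 0].view(B, T, 1, 1)
    dy = yy.view(1, 1, grid_size, grid_size) - traj[..., 1].view(B, T, 1, 1)
    # Soft occupancy: peaks at 1 exactly at each trajectory point
    return torch.exp(-(dx ** 2 + dy ** 2) / (2 * sigma ** 2))

traj = torch.tensor([[[10.0, 20.0], [12.0, 22.0]]], requires_grad=True)
grids = rasterize_trajectory(traj)
print(grids.shape)            # torch.Size([1, 2, 64, 64])
grids.sum().backward()        # gradients flow back to the coordinates
print(traj.grad is not None)  # True
```

A hard rasterizer (rounding coordinates to grid indices) would have zero gradient almost everywhere; the soft version trades a little spatial blur for a usable training signal.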
Unlike other GAN-based prediction methods, which use vanilla cross-entropy as the GAN loss, the researchers used the Wasserstein GAN loss with gradient penalty, which helped the model outperform competing approaches.
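The WGAN-GP objective can be sketched as follows: the critic is trained to separate real from generated trajectories while its gradient norm on random interpolates is penalized toward 1. The toy critic, the penalty weight of 10, and the tensor shapes are illustrative assumptions, not values from the paper.

```python
import torch

def gradient_penalty(critic, real, fake, raster):
    """WGAN-GP: penalise the critic's gradient norm away from 1 on random
    interpolates between real and generated trajectories.
    `critic` is assumed to map (raster, trajectory) -> scalar score."""
    eps = torch.rand(real.size(0), 1, 1)  # per-sample mixing coefficient
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(raster, interp)
    grads = torch.autograd.grad(score.sum(), interp, create_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

# Toy linear critic standing in for the scene-compliant discriminator
class ToyCritic(torch.nn.Module):
    def __init__(self, horizon=30):
        super().__init__()
        self.lin = torch.nn.Linear(horizon * 2, 1)
    def forward(self, raster, traj):
        return self.lin(traj.flatten(1))

critic = ToyCritic()
real = torch.randn(4, 30, 2)
fake = torch.randn(4, 30, 2)
raster = torch.randn(4, 3, 64, 64)
gp = gradient_penalty(critic, real, fake, raster)
# Wasserstein critic loss plus the weighted penalty (lambda = 10 is the
# value from the original WGAN-GP paper, assumed here)
d_loss = critic(raster, fake).mean() - critic(raster, real).mean() + 10.0 * gp
```

Compared with cross-entropy, the Wasserstein loss gives non-saturating gradients even when the critic confidently separates real from fake, which tends to stabilize adversarial training.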
Tested against several baselines such as GAN-LSTM, S-LSTM, S-GAN, and S-Way, the proposed SC-GAN model outperformed the existing GAN architectures for motion prediction, reducing both average and final prediction errors. The researchers noted that SC-GAN successfully predicted cars’ movements even in challenging edge cases. For instance, when a vehicle was approaching an intersection in a straight-only lane, SC-GAN predicted that it would continue straight, even though the car’s tracked heading was slightly tilted to the left. Likewise, SC-GAN correctly anticipated that a car approaching an intersection in a turning lane would take a right turn.
Various qualitative and quantitative analyses delivered results that outperformed the current state of the art in GAN-based motion prediction of surrounding actors, producing more accurate and realistic trajectories. Better motion prediction can take self-driving cars to the next level, because it lets ML models make decisions that go beyond simple pattern matching on inputs. It can also make the system more resistant to adversarial attacks that try to trick AI agents with slight input perturbations.