The Brilliant Language Of Lanes

Tesla’s Full Self-Driving software takes huge steps towards re-creating 3D models of objects from in-vehicle cameras to improve Autopilot capabilities

On Tesla’s AI Day, the Autopilot team revealed improvements and major upgrades to its software. Tesla has released 35 Full Self-Driving (FSD) software updates to date. Ashok Elluswamy, the Autopilot Director, announced that around 160,000 customers globally are running the beta version of the Autopilot and self-driving software. That is an 80-fold leap from 2,000 customers last year.

The Autopilot team explained how the FSD system is trained and how it operates, covering the neural networks, training data, and planning, along with the training infrastructure, AI compiler, and inference stages.

Occupancy Network

The Occupancy Network is a multi-camera neural network that predicts the environment around the car from camera images. Prediction runs on the vehicle’s own system and does not rely on a server. The network can also predict the future movement and position of surrounding objects.

The Occupancy Network uses all eight cameras on the vehicle, capturing 12-bit images, to detect objects around the car and create a single, unified volumetric occupancy representation in 3D vector space. Because it works on video input, it can detect changes in the environment, such as crossing pedestrians, debris, or accelerating cars, in less than 10 milliseconds, and adjust the speed and position of the car according to the uncertainty.
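
To make the idea of a volumetric occupancy grid concrete, here is a minimal sketch in Python. It is not Tesla’s method: the real network predicts occupancy directly from camera features, whereas this example assumes a list of 3D points (in metres, in the vehicle frame) is already available and only illustrates the output data structure. The function names, the voxel size, and the hit threshold are all hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Tuple[float, float, float]],
             voxel_size: float = 0.5) -> Dict[Voxel, int]:
    """Count how many 3D points fall into each cubic voxel.

    voxel_size is a hypothetical edge length in metres.
    """
    grid: Dict[Voxel, int] = {}
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct cell.
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        grid[key] = grid.get(key, 0) + 1
    return grid


def occupied_voxels(grid: Dict[Voxel, int], min_hits: int = 2) -> Set[Voxel]:
    """Treat a voxel as occupied once it has at least min_hits points."""
    return {voxel for voxel, hits in grid.items() if hits >= min_hits}


if __name__ == "__main__":
    sample = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.2, 0.0, 0.0)]
    print(occupied_voxels(voxelize(sample)))
```

A sparse dictionary is used here instead of a dense 3D array because most of the space around a car is empty. A production system would more likely use a dense tensor that a neural network can output directly.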

Additionally, the team is developing Neural Radiance Fields (NeRF) networks that take the output vectors of the Occupancy Network as inputs. Using images from the cameras on the vehicles, NeRF can reconstruct dense 3D meshes through volumetric rendering.

The network is trained on a large auto-labelled dataset, without any human intervention. The team built three in-house supercomputers comprising 14,000 GPUs for training and auto-labelling. The training videos are held in a 30-petabyte storage cache, with half a million videos flowing in and out of the system daily.

Language of Lanes

Tesla’s previous lane detection method used 2D pixelwise instance segmentation, which could only detect the ego lane (the lane the car is travelling in) and the adjacent lanes. This worked well only on well-designed, structured roads such as highways. On city roads, intersections and lanes are far more complex.

Tesla introduced the ‘FSD Lanes Neural Network’, which comprises three components: the Vision Component, the Map Component, and the Language Component.

The ‘Vision Component’ consists of convolutional layers, attention layers, and other neural network layers that produce a visual representation from the videos of the eight cameras on the vehicle. This visual representation is then enhanced by the ‘Map Component’, called the ‘Lane Guidance Module’, which draws on a road-level navigation map.

The Lane Guidance Module consists of neural network layers that provide information about the intersection, the number of lanes, and various other features of the road that the cameras on the vehicle might not be able to identify easily in real time. Together, the first two components produce a 3D Dense World Tensor.

This Dense World Tensor is treated as an input image and combined with the third component, the ‘Language of Lanes’, a language Tesla developed for encoding lanes and lane topology. It applies techniques from language modelling, with the words and tokens being lane positions in 3D space.
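
As a rough illustration of what it means for lane positions to be ‘words and tokens’, the sketch below quantises 2D lane points onto a coarse grid and emits them as a token sequence. Everything in it is an assumption made for illustration: the grid size, the cell size, the START and END markers, and the function names are hypothetical, and Tesla’s actual vocabulary and decoder are not described in this article.

```python
from typing import List, Tuple, Union

GRID_SIZE = 32   # hypothetical: number of cells along each axis
CELL_M = 2.0     # hypothetical: metres covered by one cell
START, END = "START", "END"  # hypothetical sequence markers

Token = Union[int, str]


def position_token(x: float, y: float) -> int:
    """Quantise a position (metres) into a single integer token."""
    col = min(max(int(x // CELL_M), 0), GRID_SIZE - 1)
    row = min(max(int(y // CELL_M), 0), GRID_SIZE - 1)
    return row * GRID_SIZE + col


def decode_token(token: int) -> Tuple[float, float]:
    """Return the centre (metres) of the cell a token refers to."""
    row, col = divmod(token, GRID_SIZE)
    return ((col + 0.5) * CELL_M, (row + 0.5) * CELL_M)


def encode_lane(points: List[Tuple[float, float]]) -> List[Token]:
    """Turn an ordered lane polyline into a token 'sentence'."""
    tokens: List[Token] = [START]
    for x, y in points:
        tok = position_token(x, y)
        if tokens[-1] != tok:  # skip repeats that land in the same cell
            tokens.append(tok)
    tokens.append(END)
    return tokens


if __name__ == "__main__":
    lane = [(1.0, 1.0), (1.5, 3.0), (1.9, 5.0)]
    print(encode_lane(lane))
```

Once lanes are expressed as sequences like this, a sequence model can predict them one token at a time, in the same way a language model predicts the next word.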

Training Data

Labelling the half a million training videos that pass through the supercomputers every day is a mammoth task. The team built an auto-labelling machine for the Lanes Network which, using video footage from the vehicle’s cameras, reconstructs 3D vector spaces by combining the occupancy network with the newly developed language of lanes. Creating one vector mesh from a single trip takes the system only about 30 minutes.

Then, using ‘Multi-Trip Reconstruction’, footage from different cars is combined and matched. This creates a map in even less time and requires human intervention only at the end, to finalise the labels of the output.
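
The article does not describe how trips are matched, so the following is only a toy sketch of the general idea of combining observations from several trips. It assumes the lane points from every trip have already been aligned to a common map frame, which is the hard part in practice. It then keeps only grid cells confirmed by more than one trip and averages the points within them. The function name and both parameters are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def merge_trips(trips: List[List[Point]],
                cell_m: float = 1.0,
                min_trips: int = 2) -> List[Point]:
    """Merge lane points from several trips (assumed pre-aligned).

    A cell is kept only if at least min_trips different trips
    observed a point inside it; its output is the mean position.
    """
    cells: Dict[Cell, Dict[int, List[Point]]] = defaultdict(dict)
    for trip_id, points in enumerate(trips):
        for x, y in points:
            cell = (math.floor(x / cell_m), math.floor(y / cell_m))
            cells[cell].setdefault(trip_id, []).append((x, y))

    merged: List[Point] = []
    for cell in sorted(cells):
        by_trip = cells[cell]
        if len(by_trip) < min_trips:
            continue  # seen by too few trips; treat as unconfirmed
        pts = [p for trip_pts in by_trip.values() for p in trip_pts]
        mean_x = sum(p[0] for p in pts) / len(pts)
        mean_y = sum(p[1] for p in pts) / len(pts)
        merged.append((mean_x, mean_y))
    return merged


if __name__ == "__main__":
    trip_a = [(0.2, 0.2), (5.5, 5.5)]
    trip_b = [(0.4, 0.6)]
    print(merge_trips([trip_a, trip_b]))
```

Requiring agreement between trips is one simple way to suppress one-off errors, such as a lane line hidden by a parked truck on a single drive.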

The automated labelling system struggled with some cases, such as parked vehicles, trucks, vehicles on curvy roads, and parking lots. To fix these labels and improve the whole data engine, the team corrected 13,900 video labels manually.

Thanks to its accelerated video library built on PyTorch, the team reported a 30% increase in training speed. Using the data generated by the occupancy network, the language of lanes, and NeRF-generated 3D reconstruction models, the team created a simulation. In this 3D-generated world, the team introduced new challenges, environments, and objects to train the system on changing situations such as different road designs, biomes, and weather conditions.

Elon Musk said that the FSD beta would be available worldwide by the end of this year. “But, for a lot of countries, we need regulatory approval. So, we are somewhat gated by the regulatory approval in other countries,” explained Musk. “From a technical standpoint, it will be ready to go to a worldwide beta by the end of this year.”

Mohit Pandey
Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.
