Andrej Karpathy, Senior Director of AI at Tesla, unveiled a supercomputer at the 2021 Conference on Computer Vision and Pattern Recognition (CVPR). It is the world’s fifth most powerful supercomputer in terms of floating-point operations per second (FLOPS).
Inside its cars, Tesla needs powerful computers to run its self-driving software; outside them, it needs supercomputers to train that software.
The newly launched supercomputer will be used to train Tesla’s Autopilot and its Full Self Driving (FSD) AI. “For us, computer vision is the bread and butter of what we do and what enables the Autopilot and for that to work really well, you need a massive data set, we get that from the fleet, and you also need to train massive neural nets and experiment a lot. So we’ve invested a lot into the compute,” Karpathy said.
Tesla’s self-driving software is powered by neural networks fed enormous amounts of data, about 1.5 petabytes, from its cars. Tesla uses the neural networks to label 4D data it receives via videos from the cameras installed in its vehicles. The data is then used to train the software to enable autonomous navigation using radar and cameras.
In May, Tesla decided to drop the radar sensor altogether and move completely to a camera-based system in its Model 3 and Model Y cars. The north star of Tesla Vision is to build autonomous cars that navigate better than the average human.
What’s in store?
For some time now, Tesla has been working on a supercomputer. “Tesla is developing a NN training computer called Dojo to process truly vast amounts of video data. It’s a beast! Please consider joining our AI or computer/chip teams if this sounds interesting,” Musk tweeted last year.
Tesla wants to build the world’s fastest supercomputer. At present, Japan’s Fugaku supercomputer is the fastest at 415 petaFLOPS. From the looks of it, Tesla’s newly launched supercomputer seems to be a precursor to Dojo.
Dojo has been designed to ingest video data and perform massive levels of unsupervised training on visual data, with an expected capacity of one exaFLOP (one quintillion, or 10^18, floating-point operations per second), i.e. 1,000 petaFLOPS. It will serve as the central system for Tesla to train its self-driving AI.
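To put that target in perspective, a quick sanity check on the units, using the Fugaku figure quoted above (the comparison itself is mine, derived only from the numbers in this article):

```python
# Dojo's stated target: one exaFLOP = 10^18 FLOPS = 1,000 petaFLOPS.
dojo_target_pflops = 1e18 / 1e15
fugaku_pflops = 415  # fastest machine at the time, per the article

assert dojo_target_pflops == 1000
print(f"Dojo target ~= {dojo_target_pflops / fugaku_pflops:.1f}x Fugaku")
```

That is roughly 2.4 times the peak throughput of today’s fastest machine.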
The newly released supercomputer boasts:
- 720 nodes of 8x A100 80GB (5760 GPUs total)
- 1.8 EFLOPS (720 nodes × 8 GPUs/node × 312 TFLOPS FP16 per A100)
- 10 PB of NVMe storage at 1.6 TBps
- 640 Tbps of total switching capacity
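The headline 1.8 EFLOPS figure follows directly from the node count and the A100’s peak FP16 Tensor Core throughput; a quick sketch of the arithmetic behind the spec list:

```python
nodes = 720
gpus_per_node = 8
a100_fp16_tflops = 312  # peak FP16 Tensor Core throughput per A100

total_gpus = nodes * gpus_per_node
total_eflops = total_gpus * a100_fp16_tflops / 1e6  # TFLOPS -> EFLOPS

print(total_gpus)               # 5760 GPUs
print(round(total_eflops, 2))   # ~1.8 EFLOPS at FP16
```

Note that this is peak FP16 throughput with Tensor Cores, not the FP64 figure used for conventional supercomputer rankings, which is why the machine’s placement on such lists is an estimate.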
At Tesla’s fourth-quarter earnings call in January, Elon Musk said Tesla would allow third parties to use Dojo’s capabilities to train neural networks. “We are not trying to keep it to ourselves. I think there could be a whole line of business in and of itself. And then, of course, for training vast amounts of video data and getting the reliability from 100 percent to 200 percent better than the average human to 2,000 percent better than the average human. So that will be very helpful in that regard,” he said.