How NVIDIA Built A Supercomputer In 3 Weeks

“Today that mega-system, called Selene, has its own robot attendant and is driving AI forward in automotive, healthcare and natural-language processing.”

Supercomputers can take years to build. Assembling one requires many service personnel working round the clock for months to deliver and commission the system. But, beating the odds, NVIDIA claims to have built its supercomputer within three weeks. Not only did NVIDIA assemble a mammoth computer in a short time, it also broke records in the recently conducted MLPerf benchmark tests. NVIDIA’s supercomputer went toe to toe with industry giants like Google and set new benchmarks for computation on state-of-the-art algorithms.

But how was this made possible? What practices did the team adopt to come out on top?


How Was A Supercomputer Assembled At Such Short Notice

NVIDIA’s experience in networking, storage, power and thermals came in handy while building Selene. Michael Houston, the chief architect behind the system, says that he and his team build machines with their expected uses and long lifetimes in mind.

According to NVIDIA, the team assembled ever-larger clusters of V100-based NVIDIA DGX-2 systems, called DGX PODs.

“We tore everything out twice. It was the fastest way forward, but it still had a lot of downtime and cost,” said Houston. The team then redesigned the overall network to simplify assembly, connecting modules of 20 nodes through simple “thin switches” that can be tested easily. Pre-designed cables were bundled together with Velcro at the factory, and racks were labelled so that they could be traced easily.
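The modular layout described above can be pictured with a small sketch. This is only an illustration, not NVIDIA’s tooling: the function name `group_into_modules` and the 100-node cluster size are hypothetical, and the only figure taken from the article is the module size of 20 nodes.

```python
# Illustrative sketch only: groups node IDs into fixed-size modules,
# mirroring the "modules of 20 nodes" idea from the article.
# The function name and the example cluster size are hypothetical.

MODULE_SIZE = 20  # nodes per module, as stated in the article


def group_into_modules(node_count, module_size=MODULE_SIZE):
    """Return a list of modules, each a list of consecutive node IDs."""
    nodes = list(range(node_count))
    return [nodes[i:i + module_size] for i in range(0, node_count, module_size)]


if __name__ == "__main__":
    # 100 is an arbitrary example size, not Selene's actual node count.
    for index, module in enumerate(group_into_modules(100)):
        print(f"module {index}: nodes {module[0]}-{module[-1]}")
```

Because each module is the same shape, it can be cabled and tested on its own before being joined to the rest of the system, which is the benefit the redesign was aiming for.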

In the best of times, stated NVIDIA, it can take dozens of engineers a few months to assemble, test and commission a supercomputer-class system. But when the world came to a standstill due to the pandemic, a small team from NVIDIA assembled a system that is now the world’s seventh-fastest computer. This mega-system, called Selene, now has its own robot attendant and is driving AI forward in automotive, healthcare and NLP.

“Selene broke records for AI training performance in the latest MLPerf benchmarks.”

NVIDIA had Selene up and running within a few weeks, in time for the showdown of computational giants and for customers like Argonne. Selene clocked 27 petaflops; for comparison, India’s fastest supercomputer, Pratyush, delivers a meagre 3.7 petaflops.
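To put the two figures side by side, the ratio can be worked out directly. The snippet below is a minimal sketch that uses only the two petaflops values quoted in the article; the variable and function names are made up for illustration.

```python
# Compares the two performance figures quoted in the article.
# Names are illustrative; the numbers come from the text above.

SELENE_PFLOPS = 27.0    # Selene, as quoted in the article
PRATYUSH_PFLOPS = 3.7   # Pratyush, as quoted in the article


def performance_ratio(faster, slower):
    """Return how many times faster the first system is than the second."""
    return faster / slower


ratio = performance_ratio(SELENE_PFLOPS, PRATYUSH_PFLOPS)

if __name__ == "__main__":
    print(f"Selene is about {ratio:.1f}x Pratyush")  # prints "Selene is about 7.3x Pratyush"
```

On these numbers, Selene delivers a little over seven times the performance of Pratyush.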

Overview Of NVIDIA DGX Systems

Today’s enterprise needs an end-to-end strategy for AI innovation to accelerate time-to-insight and reveal new business frontiers. NVIDIA’s DGX systems are custom-built to offer services for end-to-end AI development. Their stack of services includes:

  • NVIDIA DGX Station is the world’s fastest workstation for data science teams. 
  • NVIDIA DGX-1/2/X is an AI system purpose-built for enterprise AI in the data centre. It integrates eight NVIDIA V100 Tensor Core GPUs, using NVLink technology, delivering petaFLOPs of AI performance.
  • NVIDIA DGX POD is a reference architecture for AI scaling, combining compute, networking, storage, power, cooling, and more.

Years of accumulated knowledge and experience allowed NVIDIA to build the NVIDIA DGX SuperPOD. SuperPOD started as a combination of 64 DGX-2 nodes and culminated in a 96-node architecture. DGX SuperPODs are powering systems for top players like Lockheed Martin in aerospace and Microsoft in cloud-computing services. Today, this mega-system is even being used by the Argonne National Laboratory to research ways to stop the coronavirus. So far, NVIDIA has been powering supercomputers across the world. Now, with Selene, NVIDIA offers tougher competition to its peers. With such a diverse stack of products, NVIDIA is set to be a key player in the quicker deployment of data centres in the coming days.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
