“Today that mega-system, called Selene, has its own robot attendant and is driving AI forward in automotive, healthcare and natural-language processing.”
Supercomputers typically take years to build, with many service personnel working round the clock for months to deliver a commissioned system. Beating those odds, NVIDIA claims to have built its supercomputer within three weeks. Not only did NVIDIA assemble this mammoth of a computer in such a short time, it also broke records in the recently conducted MLPerf benchmark tests. NVIDIA’s supercomputer went toe to toe with industry giants like Google and set new computational benchmarks on state-of-the-art algorithms.
But how was this made possible? What practices did the team adopt to come out on top?
How Was A Supercomputer Assembled At Such Short Notice
NVIDIA’s experience in networking, storage, power and thermals came in handy while building Selene. Michael Houston, the chief architect behind the system, says he and his team build machines with their future uses and long lifetimes in mind. The team assembled ever-larger clusters of V100-based NVIDIA DGX-2 systems, called DGX PODs.
“We tore everything out twice. It was the fastest way forward, but it still had a lot of downtime and cost,” said Houston. The team then redesigned the overall network to simplify assembly, connecting modules of 20 nodes through simple “thin switches” that could be tested independently. Pre-designed cables were bundled together with Velcro at the factory, and racks were labelled so they could be traced easily.
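The modular strategy described above — validate each 20-node module in isolation before joining it to the cluster — can be sketched roughly as follows. This is an illustrative model only; the module size comes from the article, while the node structure and health check are hypothetical stand-ins for the real bring-up process.

```python
# Illustrative sketch of modular cluster assembly: groups of 20 nodes
# are validated independently (behind a "thin switch") before being
# joined into the full system. The health check here is a placeholder.

MODULE_SIZE = 20  # nodes per module, per the article

def validate_module(nodes):
    """Stand-in for per-module testing: every node must report healthy."""
    return all(node["healthy"] for node in nodes)

def assemble_cluster(total_nodes):
    nodes = [{"id": i, "healthy": True} for i in range(total_nodes)]
    # Split the machine into independently testable 20-node modules.
    modules = [nodes[i:i + MODULE_SIZE]
               for i in range(0, total_nodes, MODULE_SIZE)]
    # Only modules that pass validation join the cluster.
    return [m for m in modules if validate_module(m)]

cluster = assemble_cluster(280)  # hypothetical 280-node build
print(len(cluster), "modules validated and connected")
```

The payoff of this design is that a faulty module can be tested, fixed, or swapped without tearing out the rest of the system — the problem Houston describes having hit twice.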
In the best of times, NVIDIA notes, it can take dozens of engineers months to assemble, test and commission a supercomputer-class system. But even as the world came to a standstill due to the pandemic, a small team from NVIDIA assembled a system that is now the world’s seventh-fastest computer — the mega-system Selene, with its robot attendant, driving AI forward in automotive, healthcare and NLP.
“Selene broke records for AI training performance in the latest MLPerf benchmarks.”
NVIDIA had Selene up and running within a few weeks, ready for the ultimate showdown of computational giants as well as for customers like Argonne. NVIDIA’s computer clocked 27 petaflops; for comparison, India’s fastest supercomputer, Pratyush, delivers a meagre 3.7 petaflops.
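For a rough sense of that gap, the ratio between the two figures quoted above works out like this:

```python
# Performance figures as quoted in the article (petaflops).
selene_pflops = 27.0    # NVIDIA Selene
pratyush_pflops = 3.7   # Pratyush, India's fastest supercomputer

ratio = selene_pflops / pratyush_pflops
print(f"Selene is roughly {ratio:.1f}x faster")  # roughly 7.3x
```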
Overview Of NVIDIA DGX Systems
Today’s enterprise needs an end-to-end strategy for AI innovation to accelerate time-to-insight and reveal new business frontiers. NVIDIA’s DGX systems are custom-built to offer services for end-to-end AI development. Their stack of services includes:
- NVIDIA DGX Station is the world’s fastest workstation for data science teams.
- NVIDIA DGX-1/2/X is an AI system purpose-built for enterprise AI in the data centre. It integrates eight NVIDIA V100 Tensor Core GPUs interconnected with NVLink technology, delivering petaflops of AI performance.
- NVIDIA DGX POD is a reference architecture for AI scaling, combining compute, networking, storage, power, cooling, and more.
Years of accumulated knowledge and experience allowed NVIDIA to build the NVIDIA DGX SuperPOD, which combines clusters of 64 DGX-2 nodes, culminating in a 96-node architecture. DGX SuperPODs power systems for top players like Lockheed Martin in aerospace and Microsoft in cloud-computing services. Today, this mega-system is even being used by the Argonne National Laboratory to research ways to stop the coronavirus. NVIDIA has long powered supercomputers across the world; now, with Selene, it poses tougher competition to its peers. With such a diverse product stack, NVIDIA is set to be a key player in the quicker deployment of data centres in the coming days.
Know more here.