Behind Meta’s claim of building the world’s fastest AI supercomputer

Meta has introduced the AI Research SuperCluster (RSC), which it calls one of the fastest AI supercomputers currently running anywhere in the world.


According to Meta, RSC will help its researchers build better AI models that work across hundreds of languages and analyse text, images and video together.

Mark Zuckerberg, while introducing RSC, said, “Meta has developed what we believe is the world’s fastest AI supercomputer. We’re calling it RSC for AI Research SuperCluster. The experiences we’re building for the metaverse require enormous compute power (quintillions of operations/second!) and RSC will enable new AI models that can learn from trillions of examples, understand hundreds of languages, and more.”

https://twitter.com/MetaAI/status/1485658757245947914

RSC to play a key role in Metaverse

Realising the full benefits of self-supervised learning and transformer-based models requires training increasingly large, complex and adaptable models. Speech recognition has to work effectively even in challenging scenarios with a lot of background noise, and NLP has to understand more languages and dialects.

Meta said that RSC can train models that use multimodal signals to determine more quickly whether an action, sound or image is harmful or benign. It added that in its next phase RSC will become larger and more capable as the groundwork for the metaverse is laid. Meta’s researchers have already started using RSC to train large models in NLP and computer vision.

Research infrastructure from NVIDIA

Meta collaborated with NVIDIA to build the AI Research SuperCluster. RSC uses 760 NVIDIA DGX A100 systems as its compute nodes, for a total of 6,080 NVIDIA A100 GPUs linked on an NVIDIA Quantum 200Gb/s InfiniBand network, delivering 1,895 petaflops of TF32 performance. Penguin Computing is the NVIDIA Partner Network delivery partner for RSC.
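
The figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes eight A100 GPUs per DGX A100 system and a peak of 312 teraflops of TF32 per A100 with structured sparsity; neither number appears in the article, and the function and constant names are purely illustrative.

```python
# Back-of-the-envelope check of the RSC phase-one figures.
# Assumptions (not stated in the article): 8 GPUs per DGX A100 system and
# a peak of 312 TFLOPS of TF32 per A100 with structured sparsity.
DGX_SYSTEMS = 760
GPUS_PER_DGX = 8
TF32_TFLOPS_PER_GPU = 312


def total_gpus(systems: int, gpus_per_system: int) -> int:
    """Number of GPUs across all compute nodes."""
    return systems * gpus_per_system


def peak_petaflops(gpus: int, tflops_per_gpu: float) -> float:
    """Aggregate theoretical peak in petaflops (1 petaflop = 1,000 teraflops)."""
    return gpus * tflops_per_gpu / 1000


gpus = total_gpus(DGX_SYSTEMS, GPUS_PER_DGX)
print(gpus)                                       # 6080
print(peak_petaflops(gpus, TF32_TFLOPS_PER_GPU))  # 1896.96
```

Under these assumptions the result lands within about 0.1 per cent of the 1,895 petaflops quoted, which suggests that the quoted figure is an aggregate theoretical peak rather than a measured benchmark.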

Penguin also provided managed services and AI-optimised infrastructure for Meta, including 46 petabytes of cache storage in its Altus systems. Pure Storage FlashBlade and FlashArray//C provide the scalable all-flash storage that RSC relies on.


This is the second time Meta has chosen NVIDIA technology as the base for its research infrastructure. In 2017, Meta built its first generation of AI research infrastructure with 22,000 NVIDIA V100 Tensor Core GPUs. That system could handle 35,000 AI training jobs a day.

Meta’s early benchmarks show that RSC trains large NLP models three times faster and runs computer vision jobs twenty times faster than the previous system. Later this year, in the second phase, RSC will expand to 16,000 GPUs. Meta expects the expanded system to deliver five exaflops of mixed precision AI performance.
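
The five-exaflop projection is consistent with a rough peak-throughput estimate. The sketch below assumes a peak of 312 teraflops of FP16/BF16 tensor-core performance per A100 GPU (without sparsity), a figure that does not appear in the article; the names are illustrative only.

```python
# Rough estimate of RSC's phase-two scale.
# Assumption (not stated in the article): each A100 peaks at 312 TFLOPS
# of FP16/BF16 mixed-precision tensor-core performance without sparsity.
PHASE_ONE_GPUS = 6_080
PHASE_TWO_GPUS = 16_000
MIXED_PRECISION_TFLOPS_PER_GPU = 312


def peak_exaflops(gpus: int, tflops_per_gpu: float) -> float:
    """Aggregate theoretical peak in exaflops (1 exaflop = 1,000,000 teraflops)."""
    return gpus * tflops_per_gpu / 1_000_000


# How much larger the GPU fleet becomes in phase two.
growth = PHASE_TWO_GPUS / PHASE_ONE_GPUS
print(round(growth, 2))  # 2.63

print(peak_exaflops(PHASE_TWO_GPUS, MIXED_PRECISION_TFLOPS_PER_GPU))  # 4.992
```

On these assumptions, the planned expansion is roughly a 2.6-fold increase in GPU count, and the aggregate peak comes out at just under five exaflops.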

Privacy and security

Meta says that RSC has been built with privacy and security as primary design goals.

  • RSC is isolated from the larger internet. It has no direct inbound or outbound connections, and traffic flows only from Meta’s production data centres.
  • The entire data path from the storage systems to the GPUs is end-to-end encrypted. Meta claims it has the tools and processes needed to verify that these requirements are met at all times.
  • Before data is imported to RSC, it goes through a privacy review process to confirm it has been correctly anonymised. It is then encrypted before being used to train AI models, and the decryption keys are deleted regularly so that older data is no longer accessible.


Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at sreejani.bhattacharyya@analyticsindiamag.com
