
Google Releases Cloud TPU v4 Pods Benchmarks For Large Model Training


In the recently concluded MLPerf v1.1 Training round, Google submitted two large language model benchmarks to the Open division, one with 480 billion parameters and a second with 200 billion parameters. These submissions use publicly available infrastructure, including Cloud TPU v4 Pod slices and the Lingvo open-source modelling framework.

Traditionally, training models at these scales would require building a supercomputer at a cost of tens or even hundreds of millions of dollars, something only a few companies can afford. According to Google, customers can achieve the same results using exaflop-scale Cloud TPU v4 Pods without incurring the costs of installing and maintaining an on-premises system.

Google’s Open division submissions consist of a 480-billion-parameter dense Transformer-based encoder-only benchmark using TensorFlow and a 200-billion-parameter JAX benchmark. These models are architecturally similar to MLPerf’s BERT model but have larger dimensions and more layers.


These submissions demonstrate large-model scalability and high performance on TPUs across two distinct frameworks. Notably, because of their stacked Transformer architecture, these benchmarks are fairly comparable to other large language models in their compute characteristics.

The two submissions were benchmarked on 2048-chip and 1024-chip TPU v4 Pod slices, respectively. Google achieved an end-to-end training time of ~55 hours for the 480B-parameter model and ~40 hours for the 200B-parameter model. Each run achieved a computational efficiency of 63%, calculated as the floating-point operations of the model, including those added by compiler rematerialization, divided by the peak FLOPs of the system used.
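The efficiency figure can be read as a simple ratio. The following is a minimal Python sketch of that ratio, not Google's actual measurement code. It assumes the per-chip peak is the 275 TFLOPS quoted in this article and that the reported run times (~55 and ~40 hours) are exact, which they are not; the function names are hypothetical. It inverts the definition to estimate how many floating-point operations each run implies.

```python
# Back-of-the-envelope reading of the efficiency definition in the article.
# Assumption: the 275 TFLOPS per-chip peak is the denominator Google used.

PEAK_FLOPS_PER_CHIP = 275e12  # peak FLOP/s per TPU v4 chip, from the article


def system_peak_flops(num_chips, seconds, peak_per_chip=PEAK_FLOPS_PER_CHIP):
    """Most floating-point operations the slice could execute in `seconds`."""
    return num_chips * peak_per_chip * seconds


def efficiency(executed_flops, num_chips, seconds):
    """Executed FLOPs (model plus rematerialization) over the system peak."""
    return executed_flops / system_peak_flops(num_chips, seconds)


def implied_executed_flops(eff, num_chips, hours):
    """Invert the definition: FLOPs implied by a reported efficiency."""
    return eff * system_peak_flops(num_chips, hours * 3600)


# Approximate figures from the article: (chips, hours) per submission.
runs = {"480B": (2048, 55), "200B": (1024, 40)}

for name, (chips, hours) in runs.items():
    flops = implied_executed_flops(0.63, chips, hours)
    print(f"{name}: ~{flops:.2e} FLOPs executed at 63% efficiency")
```

Under these assumptions both runs come out on the order of 10^22 floating-point operations. These are estimates derived from the article's rounded numbers, not figures reported by Google.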


Achieving these results required a combination of several cutting-edge technologies. First, each TPU v4 chip provides more than 2x the compute power of a TPU v3 chip, up to 275 peak TFLOPS. Second, 4,096 TPU v4 chips are networked together into a Cloud TPU v4 Pod by an ultra-fast interconnect that, according to Google, provides 10x the bandwidth per chip at scale compared to typical GPU-based large-scale training systems.
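The "exaflop-scale" description of a full Pod follows from the two numbers in this paragraph. The short Python sketch below does the arithmetic; the constant and function names are hypothetical, and the result is a theoretical peak, not a measured throughput.

```python
# Aggregate peak of a full Cloud TPU v4 Pod, from the article's figures.
PEAK_TFLOPS_PER_CHIP = 275  # "up to 275 peak TFLOPS" per TPU v4 chip
CHIPS_PER_POD = 4096        # chips networked into one Pod


def pod_peak_exaflops(chips=CHIPS_PER_POD, tflops_per_chip=PEAK_TFLOPS_PER_CHIP):
    """Theoretical peak in exaFLOPS (1 exaFLOPS = 1e6 TFLOPS)."""
    return chips * tflops_per_chip / 1e6


def slice_fraction(slice_chips, pod_chips=CHIPS_PER_POD):
    """Share of a full Pod occupied by a slice of `slice_chips` chips."""
    return slice_chips / pod_chips


print(f"Full Pod peak: {pod_peak_exaflops():.2f} exaFLOPS")
print(f"2048-chip slice: {slice_fraction(2048):.0%} of a Pod")
print(f"1024-chip slice: {slice_fraction(1024):.0%} of a Pod")
```

By this arithmetic a full Pod peaks at a little over one exaFLOPS, and the two benchmark slices used one half and one quarter of a Pod, respectively.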

Large models are very communication-intensive: local computation often depends on the results of remote computation, which must be communicated across the network. Google says TPU v4’s ultra-fast interconnect has an outsized impact on the computational efficiency of large models by eliminating latency and congestion in the network. Google’s submissions represent a class of models that has become increasingly important in ML research and production but is not currently represented in MLPerf’s Closed division benchmark suite.


Victor Dey

Victor is an aspiring Data Scientist and holds a Master of Science in Data Science & Big Data Analytics. He is a Researcher, a Data Science Influencer and a former University Football Player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.