Google Releases New Language Model That Kicks GPT-3’s Butt

Google AI’s GLaM model achieves competitive results on zero-shot and one-shot learning

Google has introduced the Generalist Language Model (GLaM), a trillion-weight model that uses sparsity. Sparsity not only makes the model more efficient to train and serve, it also lets GLaM achieve competitive results on multiple few-shot learning tasks. In terms of performance, GLaM demonstrates improved learning efficiency across 29 public NLP benchmarks in seven categories, including language completion, open-domain question answering, and inference tasks.

Over the past few years, leading AI institutes and tech companies have released a series of language models, each bigger and more advanced than the last. GPT-3’s launch was a watershed moment in this space: the world had never seen a model with 175B parameters. GPT-3 and similar models can handle few-shot learning across a wide array of tasks, including reading comprehension and question answering, with very few or no training examples.

That said, this innovation and superior performance come at a cost: such models are computationally intensive and have adverse effects on the environment. Researchers are now working to develop models that can be trained and served more efficiently.

To build GLaM, Google’s team first assembled a high-quality dataset of 1.6 trillion tokens containing language usage representative of a wide range of use cases.

Credit: Google AI

GLaM is a mixture-of-experts (MoE) model, meaning it contains different submodels, or experts, each specialised for different inputs. The experts in each layer are controlled by a gating network that activates experts based on the input data: for each token, the gating network selects the two most appropriate experts out of 64 to process it. The full version of GLaM has 1.2 trillion total parameters across 64 experts per MoE layer, with 32 MoE layers in total, but activates a subnetwork of only 97 billion parameters (8% of 1.2 trillion) per token prediction during inference. Compared with the Megatron-Turing model, GLaM performs on par on the seven respective tasks within a 5 percent margin, while using 5x less computation during inference.
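To make the routing idea concrete, here is a minimal NumPy sketch of top-2 expert gating, the mechanism described above. This is not Google’s implementation; the dimensions are toy values and all names (gate_w, expert_w, moe_layer) are illustrative assumptions.

```python
# A minimal sketch (not Google's code) of top-2 expert routing, the
# mechanism GLaM's MoE layers use to run only 2 of 64 experts per token.
import numpy as np

num_experts = 64   # experts per MoE layer in the full GLaM
top_k = 2          # GLaM routes each token to its 2 best experts
d_model = 8        # toy hidden size; the real model is far larger

rng = np.random.default_rng(0)

# Toy parameters: one gating matrix, plus one feed-forward weight per expert.
gate_w = rng.normal(size=(d_model, num_experts))
expert_w = rng.normal(size=(num_experts, d_model, d_model))

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token through its top-2 experts, weighted by gate scores."""
    logits = token @ gate_w                   # score every expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over experts
    top = np.argsort(probs)[-top_k:]          # indices of the 2 best experts
    weights = probs[top] / probs[top].sum()   # renormalize their gate scores
    # Only the selected experts execute; all other expert parameters stay idle.
    return sum(w * (token @ expert_w[i]) for i, w in zip(top, weights))

out = moe_layer(rng.normal(size=d_model))
print(out.shape)  # (8,)
```

Because only 2 of the 64 experts execute per token, most expert parameters sit idle on any given prediction, which is how GLaM can hold 1.2 trillion parameters yet activate roughly 8% of them at inference.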

Meeta Ramnani

Meeta’s interest lies in finding real, practical applications of technology. At AIM, she writes stories that question new inventions and the need to develop them. She believes that technology has changed and will continue to change the world very fast, and that it is no longer ‘cool’ to be ‘old-school’. People who don’t keep up with technology will surely be left behind.