Microsoft Research has collaborated with OpenAI to release a paper titled ‘Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer’ that describes a technique called µTransfer. The method makes the expensive process of tuning hyperparameters for wide neural networks far more cost-effective by reducing the amount of trial and error involved.
The Tensor Programs series was first introduced by Microsoft Research in 2020. The new study builds on µ-Parametrisation (µP), a parametrisation that enables maximal feature learning in the infinite-width limit. µTransfer could eventually speed up work on massive neural networks such as GPT-3 and even larger successors.
Tuning the hyperparameters of wide neural networks drains resources because every new model size typically requires a fresh round of guesswork over which hyperparameters to use. The paper shows that there exists a specific parametrisation, µP, under which the optimal hyperparameters remain stable across model sizes.
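To make the idea concrete, the snippet below sketches, in simplified form, the kind of width-dependent scaling µP prescribes for a square hidden layer trained with Adam: the initialisation variance shrinks with fan-in and the per-layer learning rate shrinks with width, so a learning rate tuned on a narrow model stays near-optimal as the model widens. The widths and base learning rate here are illustrative assumptions, not values from the paper.

```python
import math

def mup_hidden_layer_scales(width: int, base_width: int, base_lr: float):
    """Return (init_std, adam_lr) for a square hidden layer of a given
    width, relative to a base model whose learning rate was tuned."""
    init_std = 1.0 / math.sqrt(width)       # init variance ~ 1 / fan_in
    adam_lr = base_lr * base_width / width  # hidden-layer Adam LR ~ 1 / width
    return init_std, adam_lr

# The same base learning rate, tuned once at width 256, is rescaled
# automatically as the model widens.
for width in (256, 1024, 4096):
    std, lr = mup_hidden_layer_scales(width, base_width=256, base_lr=1e-2)
    print(f"width={width:5d}  init_std={std:.4f}  adam_lr={lr:.2e}")
```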
The team partnered with OpenAI to assess how effective µTransfer would be for GPT-3. The technique was first used to tune a small proxy model with 40 million parameters; the resulting optimal hyperparameter combination was then copied directly to the 6.7-billion-parameter version of GPT-3. The study found that the total compute spent on tuning amounted to a mere 7 percent of the compute used to pretrain the model.
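Microsoft has released an open-source PyTorch package, `mup` (github.com/microsoft/mup), that implements this workflow. The sketch below shows roughly how a model would be set up with it; the toy MLP architecture, widths, and learning rate are assumptions for illustration, not the GPT-3 configuration from the study.

```python
import torch
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam

class MLP(nn.Module):
    def __init__(self, width: int, d_in: int = 32, d_out: int = 10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, width)
        self.fc2 = nn.Linear(width, width)
        # MuReadout replaces the final nn.Linear so the output layer
        # is scaled correctly under µP.
        self.readout = MuReadout(width, d_out)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        h = torch.relu(self.fc2(h))
        return self.readout(h)

# The base and delta models tell mup which dimensions are "width";
# the target model is the one actually trained.
base_model = MLP(width=64)
delta_model = MLP(width=128)
model = MLP(width=4096)
set_base_shapes(model, base_model, delta=delta_model)

# A hyperparameter such as this learning rate would be tuned on a
# narrow proxy model and reused here unchanged; MuAdam applies the
# width-dependent per-layer scaling that makes the transfer valid.
optimizer = MuAdam(model.parameters(), lr=1e-2)
```

In the paper's experiment, the proxy model plays the role of the narrow model here, and the tuned hyperparameters carry over to the full-width model without any further search.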
“µP provides an impressive step toward removing some of the black magic from scaling up neural networks. It also provides a theoretically backed explanation of some tricks used by past works, like the T5 model. I believe both practitioners and researchers alike will find this work valuable,” said Colin Raffel, co-creator of T5 and assistant professor of computer science at the University of North Carolina.