Microsoft releases µTransfer, a new technique for hypertuning large neural networks

The total compute used to tune GPT-3 turned out to be a mere 7 per cent of the compute used to pretrain the model.

Microsoft Research has collaborated with OpenAI on a paper titled ‘Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer’, which describes a technique called µTransfer. The paper shows that the method makes the expensive process of tuning wide neural networks more cost-effective by reducing the amount of trial and error needed.

µTransfer builds on µ-Parametrisation (µP), which Microsoft Research introduced in 2020 as part of its Tensor Programs series and which enables maximal feature learning in the infinite-width limit. Applying µTransfer could speed up work on massive neural networks such as GPT-3 and, eventually, on even larger ones.

Tuning hyperparameters for wide neural networks drains resources because practitioners have to guess which values to use, and each guess must be checked by training the network again. The paper shows that there is a specific parameterisation under which the optimal hyperparameters stay the same across model sizes.
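To give a feel for what a width-aware parameterisation means in practice, here is a minimal sketch of one ingredient. It assumes an Adam-style optimiser and assumes that the learning rate of hidden-layer weights is scaled inversely with width relative to a tuned base model. The function name and all numbers are hypothetical, and the paper specifies further rules (for initialisation and for input and output layers) that this sketch leaves out.

```python
def scaled_hidden_lr(base_lr: float, base_width: int, width: int) -> float:
    """Rescale a hidden-layer learning rate tuned at base_width for a model of another width.

    Assumption for illustration only: the rate shrinks in proportion to
    base_width / width, so wider models take smaller hidden-layer steps.
    """
    if base_width <= 0 or width <= 0:
        raise ValueError("widths must be positive")
    return base_lr * base_width / width


# Hypothetical values: a learning rate tuned on a narrow proxy model.
base_lr = 1e-3
base_width = 256

for width in (256, 1024, 4096):
    print(width, scaled_hidden_lr(base_lr, base_width, width))
```

The point of the sketch is that only `base_lr` is tuned. The per-width values follow from the scaling rule rather than from a fresh search at every model size.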

The team partnered with OpenAI to assess how effective µTransfer would be for GPT-3. The technique was first used to tune a small proxy model with 40 million parameters. The best hyperparameter combination found was then copied to the 6.7-billion-parameter version of GPT-3. The study found that the total compute used for tuning was only 7 per cent of the compute used to pretrain that model.
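The tune-small, copy-to-large workflow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure from the paper: the search space, the hyperparameter names and `toy_proxy_loss` are hypothetical stand-ins for actually training the proxy model and measuring its validation loss.

```python
from itertools import product


def tune_on_proxy(search_space, proxy_loss):
    """Grid-search the proxy model and return the lowest-loss combination."""
    names = list(search_space)
    best, best_loss = None, float("inf")
    for values in product(*(search_space[name] for name in names)):
        candidate = dict(zip(names, values))
        loss = proxy_loss(candidate)
        if loss < best_loss:
            best, best_loss = candidate, loss
    return best


def toy_proxy_loss(hp):
    # Hypothetical stand-in for "train the small proxy and report its loss".
    return (hp["log2_lr"] + 10) ** 2 + (hp["init_std"] - 0.02) ** 2


# Hypothetical search space; every trial here is cheap because the proxy is small.
search_space = {
    "log2_lr": [-12, -11, -10, -9, -8],
    "init_std": [0.01, 0.02, 0.04],
}

best = tune_on_proxy(search_space, toy_proxy_loss)

# Zero-shot transfer: reuse the proxy's best combination for the large model
# without running any search on the large model itself.
target_hparams = dict(best)
print(target_hparams)

# Budget in relative units, using the 7 per cent figure reported in the article.
pretrain_compute = 1.0
tuning_compute = 0.07 * pretrain_compute
print(tuning_compute + pretrain_compute)
```

All of the search cost is paid on the proxy, which is why the tuning budget can stay a small fraction of the pretraining budget.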

“µP provides an impressive step toward removing some of the black magic from scaling up neural networks. It also provides a theoretically backed explanation of some tricks used by past works, like the T5 model. I believe both practitioners and researchers alike will find this work valuable,” said Colin Raffel, co-creator of T5 and assistant professor of computer science at the University of North Carolina.

Poulomi Chatterjee

Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.