First Trillion Parameter Model on HuggingFace – Mixture of Experts (MoE)

Google AI’s Switch Transformers Model is now openly accessible on HuggingFace.

Google AI’s Switch Transformers, a Mixture of Experts (MoE) model released a few months ago, is now openly available on HuggingFace. The model scales up to 1.6 trillion parameters.

Click here to check out the model on HuggingFace.
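For a quick hands-on start, here is a minimal sketch of loading one of the smaller released checkpoints with the transformers library. The checkpoint name google/switch-base-8 and the library version are assumptions rather than details from this article, and the 1.6-trillion-parameter variant is far too large to load this way on a single machine.

```python
# Minimal sketch (assumes a recent transformers release and the
# google/switch-base-8 checkpoint name on the HuggingFace Hub).
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-8")

# Switch Transformers is T5-style, so the pre-trained model does sentinel-token span infilling.
text = "The capital of France is <extra_id_0>."
input_ids = tokenizer(text, return_tensors="pt").input_ids

outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```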

MoE models are considered the next step for Natural Language Processing (NLP) architectures because they scale very efficiently. The architecture is similar to the classic T5 model, but the dense Feed Forward layer is replaced by a Sparse Feed Forward layer: a router module directs each token to one of several MLPs (experts), so only a small fraction of the parameters is active for any given token.
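To make the routing idea concrete, below is a toy PyTorch sketch of a top-1 ("switch") routing layer: a linear router scores every token, and each token is processed only by its highest-scoring expert MLP. This is an illustrative simplification written for this article, not HuggingFace's implementation; real Switch layers also add expert capacity limits and a load-balancing auxiliary loss.

```python
import torch
import torch.nn as nn


class ToySwitchFFN(nn.Module):
    """Toy top-1 MoE feed-forward layer: each token is routed to exactly one expert."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # scores each token against each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to a stream of tokens
        tokens = x.reshape(-1, x.size(-1))
        probs = self.router(tokens).softmax(dim=-1)   # (num_tokens, num_experts)
        gate, expert_idx = probs.max(dim=-1)          # top-1 routing decision per token
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                    # tokens routed to expert i
            if mask.any():
                # scale by the gate value so the routing decision stays differentiable
                out[mask] = gate[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)


layer = ToySwitchFFN(d_model=64, d_ff=256, num_experts=4)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```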

Younes Belkada from HuggingFace said, “The architecture can build on research for the next generation of NLP models including GPT-4.”



The released weights are pre-trained and require fine-tuning before they can be used in downstream projects. HuggingFace has also released a demo on how to fine-tune an MoE model for text summarisation, which you can check out here.
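As a rough outline of what such fine-tuning looks like, the sketch below follows the standard transformers Seq2SeqTrainer recipe; the dataset choice (xsum), column names, and hyperparameters are illustrative assumptions, not details from the linked demo.

```python
# Hedged sketch of fine-tuning a small Switch Transformers checkpoint on summarisation.
# Dataset (xsum), column names, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    SwitchTransformersForConditionalGeneration,
)

checkpoint = "google/switch-base-8"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = SwitchTransformersForConditionalGeneration.from_pretrained(checkpoint)

dataset = load_dataset("xsum")  # articles in "document", reference summaries in "summary"

def preprocess(batch):
    inputs = tokenizer(
        ["summarize: " + doc for doc in batch["document"]],
        max_length=512, truncation=True,
    )
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="switch-base-8-xsum",
        learning_rate=3e-4,
        per_device_train_batch_size=8,
        num_train_epochs=1,
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```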

What are Switch Transformers? Switch Transformers are effective, highly scalable natural language learners. By simplifying the MoE routing scheme, the models excel at natural language tasks across training regimes, can be scaled from billions up to trillions of parameters, and pre-train substantially faster than comparable T5 baselines.

You can also check out the documentation for Switch Transformers on the HuggingFace website.

Click here to read the paper by Google AI on Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.
