First Trillion Parameter Model on HuggingFace – Mixture of Experts (MoE)

Google AI’s Switch Transformers Model is now openly accessible on HuggingFace.

Google AI’s Switch Transformers, a Mixture of Experts (MoE) model released a few months ago, is now openly available on HuggingFace. The model scales up to 1.6 trillion parameters.

Click here to check out the model on HuggingFace.

MoE models are seen as a next step for Natural Language Processing (NLP) architectures because they scale efficiently. The architecture is similar to the classic T5 model, except that the Feed Forward layer is replaced by a Sparse Feed Forward layer: a router module sends each token to one of several MLPs (the experts), so different tokens are processed by different MLPs.
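To make the routing idea concrete, here is a minimal pure-Python sketch of a sparse feed-forward layer with top-1 routing. It is an illustration under simplifying assumptions, not the HuggingFace or Google implementation: the names `route_top1` and `sparse_ffn` are hypothetical, the experts are toy functions standing in for MLPs, and expert capacity limits and the load-balancing loss are left out.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top1(token, router_weights):
    """Pick one expert for a token; return (expert_index, gate_probability).

    router_weights holds one row per expert (hypothetical layout).
    """
    # One logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

def sparse_ffn(tokens, router_weights, experts):
    """Send each token to its single chosen expert and scale by the gate."""
    outputs = []
    for token in tokens:
        idx, gate = route_top1(token, router_weights)
        expert_out = experts[idx](token)  # only one expert runs per token
        outputs.append([gate * v for v in expert_out])
    return outputs

# Toy setup: two experts, two-dimensional tokens (all values are made up).
router_weights = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    lambda t: [2.0 * v for v in t],  # stand-in for expert MLP 0
    lambda t: [-v for v in t],       # stand-in for expert MLP 1
]
tokens = [[3.0, 0.0], [0.0, 3.0]]
outputs = sparse_ffn(tokens, router_weights, experts)
```

In this toy setup the first token is routed to expert 0 and the second to expert 1, so each token pays for only one expert's computation regardless of how many experts exist.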

Younes Belkada from HuggingFace said, “The architecture can build on research for the next generation of NLP models including GPT-4.”


The released weights are pre-trained checkpoints and need fine-tuning before they can be used in projects. HuggingFace has also released a demo showing how to fine-tune an MoE model on text summarisation.

What are Switch Transformers?

Switch Transformers are highly scalable, effective natural language learners. By simplifying the MoE routing algorithm, the models perform well on natural language tasks across training regimes, scale from billions to trillions of parameters, and pre-train substantially faster than T5 baselines.
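The scaling claim rests on sparsity: adding experts multiplies the number of stored parameters, while each token still passes through a single expert. The back-of-the-envelope sketch below counts weights for one sparse feed-forward layer. The helper names and the layer sizes (`d_model=768`, `d_ff=3072`) are illustrative assumptions rather than the configuration of any released checkpoint, and biases are ignored.

```python
def ffn_params(d_model, d_ff):
    """Weights in one dense feed-forward block: two matrices, biases ignored."""
    return 2 * d_model * d_ff

def sparse_layer_params(d_model, d_ff, num_experts):
    """Return (total, active_per_token) weight counts for a top-1 MoE layer."""
    router = d_model * num_experts  # one router row per expert
    total = num_experts * ffn_params(d_model, d_ff) + router
    active = ffn_params(d_model, d_ff) + router  # router plus one expert
    return total, active

# Hypothetical sizes, chosen only for illustration.
for n in (8, 128):
    total, active = sparse_layer_params(d_model=768, d_ff=3072, num_experts=n)
    print(f"experts={n:>3}  total={total:,}  active per token={active:,}")
```

Going from 8 to 128 experts grows the stored weights roughly sixteenfold, while the weights touched per token change only by the small router term.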

You can also check out the SwitchTransformers documentation on the HuggingFace website.

Click here to read the paper by Google AI on Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.


Mohit Pandey
Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.
