Microsoft Open Sources This “Mixture of Experts” Models Library

Tutel is an implementation of the mixture-of-experts technique for large-scale DNN model training.

Tutel is a library from Microsoft that enables building mixture of experts (MoE) models – a subset of large-scale AI models. Tutel is open source and has been included in fairseq, one of Facebook’s PyTorch toolkits, to enable developers across AI disciplines. 

Microsoft’s Ownership of MoE

An MoE model is composed of small clusters of “neurons”, called experts, that are activated only under specific conditions. Lower “layers” of the MoE model extract features, which the experts then evaluate. For instance, an MoE can power a translation system in which each expert cluster learns to handle a distinct part of speech or grammatical rule. Because only a few experts run for any given input, the computational cost of an MoE grows sublinearly with the number of parameters, which makes scaling easier.
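To make the routing idea concrete, here is a minimal sketch in plain Python of a gate that scores the experts for one input and evaluates only the top k of them. It is an illustration of the general technique, not Tutel's implementation: the function names, the toy experts, and the gate weights are all assumptions made for this example.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_weights, experts, k=1):
    """Route input vector x to the k highest-scoring experts and mix their outputs."""
    # One gate score per expert: dot product of x with that expert's gate vector.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Keep only the k most probable experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        for j, yj in enumerate(y):
            output[j] += (probs[i] / norm) * yj
    return output, chosen

# Hypothetical toy experts: each scales its input and records that it ran.
calls = []

def make_expert(idx, scale):
    def expert(x):
        calls.append(idx)
        return [scale * xi for xi in x]
    return expert

experts = [make_expert(i, s) for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # assumed, not learned

output, chosen = moe_forward([0.5, 2.0], gate_weights, experts, k=1)
print("experts evaluated:", calls, "output:", output)
```

With these toy values the gate selects a single expert, so three of the four experts do no work for this input. In a real model the gate weights are learned along with the experts.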


MoEs have several advantages over other model architectures. They can specialise in response to different inputs, allowing the model to exhibit a broader range of behaviours. Indeed, MoE is one of the few approaches shown to scale to over a trillion parameters, paving the way for models that power computer vision, speech recognition, natural language processing, and machine translation systems. Parameters are the components of a machine learning model that are learned from historical training data. The correlation between the number of parameters and model sophistication has generally held up well, particularly in the language domain.
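A little arithmetic shows why parameter counts can grow so large without a matching rise in compute. The sketch below assumes a common layer shape that the article does not specify: each expert is a two-layer feed-forward network, the gate is a single linear map, and each token is routed to k experts. The function names and dimensions are hypothetical.

```python
def ffn_params(model_dim, hidden_dim):
    # Two linear layers with biases: model_dim -> hidden_dim -> model_dim.
    return (model_dim * hidden_dim + hidden_dim) + (hidden_dim * model_dim + model_dim)

def moe_layer_params(model_dim, hidden_dim, num_experts, k):
    """Return (total parameters stored, parameters actually used per token)."""
    gate = model_dim * num_experts           # linear gate without bias (assumed)
    per_expert = ffn_params(model_dim, hidden_dim)
    total = gate + num_experts * per_expert  # grows with the number of experts
    active = gate + k * per_expert           # only k experts run for each token
    return total, active

for num_experts in (1, 8, 64):
    total, active = moe_layer_params(1024, 4096, num_experts, k=1)
    print(f"{num_experts:>2} experts: total={total:,} active per token={active:,}")
```

Under these assumptions the stored parameters scale almost linearly with the number of experts, while the parameters touched per token stay nearly constant.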

Tutel Features

Tutel is primarily concerned with optimising MoE-specific computing. The library is optimised, in particular, for Microsoft’s new Azure NDm A100 v4 series instances, which offer a sliding scale of NVIDIA A100 GPUs. In addition, Tutel features a “simple” interface designed to facilitate integration with other MoE systems, according to Microsoft. Alternatively, developers can leverage the Tutel interface to include standalone MoE layers directly into their DNN models.

Tutel’s comprehensive and adaptable MoE algorithmic support enables developers working in various AI disciplines to run MoE models more quickly and efficiently. Its high compatibility and extensive feature set are intended to deliver strong performance on Azure NDm A100 v4 clusters. Tutel is free, open source, and integrated into fairseq.

Optimisations to Tutel’s MoE

Tutel complements existing high-level MoE solutions such as fairseq and FastMoE. It focuses on optimising MoE-specific computation and all-to-all communication, and on providing diverse and adaptable algorithmic MoE support. Tutel’s interface is straightforward, making it simple to combine with other MoE systems. Alternatively, developers can use the Tutel interface to embed standalone MoE layers directly into their own DNN models, gaining immediate access to highly optimised, state-of-the-art MoE capabilities.
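The all-to-all communication mentioned above arises when experts live on different devices: every device must send each of its tokens to the device hosting the chosen expert, then receive the results back. The following single-process simulation sketches that pattern. It does not use Tutel or any distributed library; the one-expert-per-rank layout, the `expert_of` routing rule, and the "multiply by 10" expert are assumptions chosen to keep the example small.

```python
def all_to_all(send_buffers):
    """Simulate an all-to-all exchange: recv[dst][src] = send_buffers[src][dst]."""
    world = len(send_buffers)
    return [[send_buffers[src][dst] for src in range(world)] for dst in range(world)]

def dispatch(tokens_per_rank, expert_of, world):
    """On each rank, bucket local tokens by the rank that hosts their expert."""
    send = []
    for tokens in tokens_per_rank:
        buckets = [[] for _ in range(world)]
        for tok in tokens:
            buckets[expert_of(tok)].append(tok)
        send.append(buckets)
    return send

world = 2                                  # two simulated ranks, one expert each
tokens_per_rank = [[0, 1, 2, 3], [4, 5, 6, 7]]

def expert_of(tok):
    # Hypothetical routing rule standing in for a learned gate.
    return tok % world

send = dispatch(tokens_per_rank, expert_of, world)
received = all_to_all(send)                # tokens now sit with their expert
processed = [[[tok * 10 for tok in chunk] for chunk in rank] for rank in received]
returned = all_to_all(processed)           # second exchange returns the results

print("received per rank:", received)
print("returned per rank:", returned)
```

Each MoE layer therefore pays for two such exchanges per forward pass, which is why a library focused on MoE performance treats this communication step as a target for optimisation.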

Computations for the MoE

For lack of efficient implementations, MoE-based DNN models typically build the MoE computation from a naive combination of off-the-shelf DNN operators provided by deep learning frameworks such as PyTorch and TensorFlow. Because of redundant computation, this approach incurs large performance overheads. Tutel designs and implements several highly efficient GPU kernels that provide operators for MoE-specific computation. In addition, Tutel will actively integrate emerging MoE algorithms from the open-source community.
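One way to picture the redundant computation is to compare two ways of handing tokens to experts. The sketch below is an assumption about what the naive approach looks like, namely a dense one-hot mask applied for every expert, set against an index-based dispatch that touches each token once. It runs on the CPU in plain Python and is not a representation of Tutel's GPU kernels; all names are hypothetical.

```python
def dense_dispatch(tokens, assignment, num_experts):
    """Naive approach: multiply every token by a one-hot mask for every expert."""
    mults = 0
    expert_inputs = []
    for e in range(num_experts):
        rows = []
        for tok, a in zip(tokens, assignment):
            mask = 1.0 if a == e else 0.0
            rows.append([mask * v for v in tok])  # mostly multiplications by zero
            mults += len(tok)
        expert_inputs.append(rows)
    return expert_inputs, mults

def sparse_dispatch(tokens, assignment, num_experts):
    """Index-based approach: copy each token once into its expert's bucket."""
    copies = 0
    expert_inputs = [[] for _ in range(num_experts)]
    for tok, a in zip(tokens, assignment):
        expert_inputs[a].append(list(tok))
        copies += len(tok)
    return expert_inputs, copies

tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
assignment = [0, 2, 1, 0]   # expert chosen for each token (assumed gate output)
num_experts = 4

dense, dense_ops = dense_dispatch(tokens, assignment, num_experts)
sparse, sparse_ops = sparse_dispatch(tokens, assignment, num_experts)
print("dense element operations:", dense_ops)
print("sparse element operations:", sparse_ops)
```

In this toy setting the dense version does work proportional to tokens times experts, while the sparse version does work proportional to tokens alone, and both deliver the same non-zero rows to each expert.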


Microsoft is particularly interested in MoE because it makes efficient use of hardware. Computing power is spent only on the experts with the specialised knowledge required to address a given input. The rest of the model stays idle until it is needed, which increases efficiency. Microsoft demonstrates its commitment by launching Tutel, an open-source library for building mixture-of-experts models. According to Microsoft, the Tutel library helps developers speed up MoE models and maximise hardware utilisation.

Tutel offers comprehensive MoE training support that draws on techniques from various disciplines. According to Microsoft’s researchers, it delivers a considerable performance advantage over the fairseq implementation. It has also been incorporated into the DeepSpeed framework, which benefits Azure services.

Dr. Nivash Jeevanandam
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.
