Switch Transformer

NVIDIA and Microsoft Introduce New Language Model MT-NLG With 530 Billion Parameters, Leaving GPT-3 Behind

MT-NLG has roughly 3x the parameters of the largest existing models, including GPT-3, Turing-NLG and Megatron-LM.

Baidu Launches World’s Largest Dialogue Generation Model With 11 Billion Parameters

PLATO-XL was trained on a high-performance cluster of 256 NVIDIA Tesla V100 32 GB GPUs.

A Deep Dive into Google's Switch Transformer Architecture

Switch Transformer models were pretrained using 32 TPUs on the Colossal Clean Crawled Corpus, a 750 GB dataset composed of text snippets from Wikipedia, Reddit and other sources.
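The defining idea of the Switch Transformer is its routing layer: each token is dispatched to exactly one expert network (top-1 routing), and the expert's output is scaled by the router's probability for that choice. The sketch below illustrates this mechanism in plain numpy; the dimensions, the linear "experts", and the weight initialisations are toy assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical, for illustration only)
n_tokens, d_model, n_experts = 8, 16, 4

x = rng.standard_normal((n_tokens, d_model))        # token representations
W_router = rng.standard_normal((d_model, n_experts))
# Each "expert" here is a single linear map; real experts are FFN blocks
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Switch routing: pick the single highest-probability expert per token
probs = softmax(x @ W_router)                       # (n_tokens, n_experts)
choice = probs.argmax(axis=-1)                      # top-1 expert index
gate = probs[np.arange(n_tokens), choice]           # router probability

# Dispatch tokens to their chosen expert; scale output by the gate value
y = np.zeros_like(x)
for e in range(n_experts):
    mask = choice == e
    if mask.any():
        y[mask] = gate[mask, None] * (x[mask] @ experts[e])
```

Because only one expert runs per token, the compute per token stays roughly constant no matter how many experts (and hence parameters) the model has, which is how the architecture scales to a trillion parameters.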

Google Trains A Trillion Parameter Model, Largest Of Its Kind

Google has developed and benchmarked Switch Transformers, a technique to train language models, with over…