Microsoft, NVIDIA test waters for a large-scale generative language model with promising results

We believe that our results and findings can help shape and facilitate future research in foundational, large-scale pretraining.
A Hands-On Guide to SwinIR: A Transformer for Image Restoration

Image restoration techniques such as image super-resolution (SR), image denoising, and JPEG compression artefact reduction aim to reconstruct a high-quality, clean image from a low-quality, degraded one.
The Controversy Behind Microsoft-NVIDIA’s Megatron-Turing Scale

The key is that the 1-trillion-parameter model was never ‘trained to convergence.’
NVIDIA, Microsoft Introduce New Language Model MT-NLG With 530 Billion Parameters, Leaving GPT-3 Behind

MT-NLG has roughly three times as many parameters as the largest existing models of its kind, such as GPT-3, Turing NLG and Megatron-LM.
Baidu Launches World’s Largest Dialogue Generation Model With 11 Billion Parameters

PLATO-XL is trained on a high-performance cluster of 256 NVIDIA Tesla V100 32 GB GPUs.
A Deep Dive into Switch Transformer Architecture

Switch Transformer models were pretrained utilising 32 TPUs on the Colossal Clean Crawled Corpus, a 750 GB dataset composed of text snippets from Wikipedia, Reddit and other sources.
Google Trains A Trillion Parameter Model, Largest Of Its Kind

Google has developed and benchmarked Switch Transformers, a technique for training language models with over a trillion parameters. The research team said the 1.6 trillion parameter model is the largest of its kind and trains faster than T5-XXL, the Google model that previously held the title. According to the researchers, the Mixture […]
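The core idea behind the Switch Transformer is top-1 ("switch") routing in a Mixture-of-Experts layer: each token is sent to exactly one expert feed-forward network, so parameter count grows with the number of experts while per-token compute stays roughly constant. The sketch below illustrates that routing step only; it is not Google's implementation, and the function name, shapes and toy experts are illustrative assumptions.

```python
# Minimal sketch of top-1 ("switch") expert routing.
# Not Google's code; names, shapes and the toy experts are illustrative.
import numpy as np

def switch_route(tokens, router_weights, experts):
    """Route each token to exactly one expert (top-1 routing).

    tokens:         (num_tokens, d_model) token representations
    router_weights: (d_model, num_experts) learned router projection
    experts:        list of callables, each mapping (n, d_model) -> (n, d_model)
    """
    logits = tokens @ router_weights                        # (num_tokens, num_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)              # softmax over experts
    chosen = probs.argmax(axis=-1)                          # top-1 expert per token

    out = np.zeros_like(tokens)
    for e, expert in enumerate(experts):
        mask = chosen == e
        if mask.any():
            # scale each expert's output by its router probability
            out[mask] = expert(tokens[mask]) * probs[mask, e:e + 1]
    return out

# Toy usage: 8 tokens, model width 16, 4 "experts" (random linear maps here).
rng = np.random.default_rng(0)
toks = rng.normal(size=(8, 16))
router = rng.normal(size=(16, 4))
experts = [lambda x, W=rng.normal(size=(16, 16)): x @ W for _ in range(4)]
print(switch_route(toks, router, experts).shape)  # (8, 16)
```

Because each token activates only one expert, the loop touches each token once, which is what lets the parameter count scale past a trillion without a matching increase in per-token FLOPs.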