
Wu Dao 2.0: China’s Answer To GPT-3. Only Better

The Chinese government-backed Beijing Academy of Artificial Intelligence (BAAI) has introduced Wu Dao 2.0, the largest language model to date, with 1.75 trillion parameters. It surpasses OpenAI’s GPT-3 and Google’s Switch Transformer in size; HuggingFace’s DistilBERT and Google’s GShard are other popular language models. Wu Dao means ‘enlightenment’ in English.

“Wu Dao 2.0 aims to enable ‘machines’ to think like ‘humans’ and achieve cognitive abilities beyond the Turing test,” said Tang Jie, the lead researcher behind Wu Dao 2.0. The Turing test is a method of checking whether a computer can exhibit behaviour indistinguishable from a human’s.

Smartphone maker Xiaomi, short-video giant Kuaishou, on-demand service provider Meituan, and more than 100 scientists from multiple organisations have collaborated with BAAI on this project.

Wu Dao 2.0

Wu Dao 2.0 is a pre-trained AI model that uses 1.75 trillion parameters to simulate conversational speech, write poems, understand pictures and even generate recipes. The next-generation Wu Dao model can also predict the 3D structures of proteins, similar to DeepMind’s AlphaFold, and power virtual idols. Recently, China’s first virtual student, Hua Zhibing, was built on Wu Dao 2.0.

The language model Wu Dao 2.0 was trained with FastMoE, a fast Mixture-of-Experts (MoE) training system similar to Google’s Mixture of Experts. Unlike Google’s MoE, FastMoE is an open-source system built on PyTorch (Facebook’s open-source framework) that runs on common accelerators. It provides a hierarchical interface for flexible model design and easy adaptation to applications such as Transformer-XL and Megatron-LM. The source code of FastMoE is available on GitHub.

“[FastMoE] is simple to use, high-performance, flexible, and supports large-scale parallel training,” wrote BAAI in its official WeChat blog. 
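
To give a sense of how mixture-of-experts training works, below is a minimal, illustrative PyTorch sketch of a top-1 gated MoE layer: a small gate network routes each token to one of several feed-forward “experts”, so only a fraction of the total parameters is active per token, which is what lets MoE models scale to trillions of parameters. The class and parameter names here are hypothetical and do not reflect FastMoE’s actual API.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Hypothetical toy layer: top-1 gated mixture-of-experts (not FastMoE's API)."""

    def __init__(self, d_model=64, d_hidden=256, num_experts=4):
        super().__init__()
        # One small feed-forward network per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The gate scores every token against every expert.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x):
        # x: (num_tokens, d_model). Route each token to its single best expert.
        gate_probs = F.softmax(self.gate(x), dim=-1)   # (num_tokens, num_experts)
        weight, expert_idx = gate_probs.max(dim=-1)    # top-1 routing decision
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Only the chosen expert runs for these tokens, scaled by its gate weight.
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])

FastMoE’s contribution is making this kind of routing efficient across many accelerators in parallel; the toy layer above only captures the gating logic on a single device.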

Result-wise, Wu Dao 2.0 has surpassed state-of-the-art (SOTA) levels on nine benchmark tasks, including:

  • ImageNet (zero-shot classification), exceeding OpenAI’s CLIP (see the sketch below this list)
  • LAMA (knowledge probing), surpassing AutoPrompt
  • LAMBADA (cloze task), surpassing Microsoft’s Turing NLG
  • SuperGLUE (few-shot), surpassing OpenAI’s GPT-3
  • UC Merced Land-Use (zero-shot classification), exceeding OpenAI’s CLIP
  • MS COCO (text-to-image generation), surpassing OpenAI’s DALL-E
  • MS COCO (English image-text retrieval), surpassing Google’s ALIGN and OpenAI’s CLIP
  • MS COCO (multilingual image-text retrieval), surpassing UC2 and M3P, the current best multilingual multimodal models
  • Multi30K (multilingual image-text retrieval), surpassing UC2 and M3P

Showcasing benchmark tasks where Wu Dao 2.0 surpasses other SOTA models (Source: BAAI)
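
Wu Dao 2.0’s weights are not publicly available, so as an illustration of what the zero-shot ImageNet comparison above measures, here is a short sketch of CLIP-style zero-shot classification using OpenAI’s released CLIP model through the Hugging Face Transformers library. The image path and the three candidate labels are stand-ins, not the real ImageNet evaluation set.

from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Publicly released CLIP weights stand in for Wu Dao 2.0, which is not downloadable.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Toy label set; a real ImageNet run would score all 1,000 class prompts.
labels = ["a photo of a dog", "a photo of a cat", "a photo of an airplane"]
image = Image.open("example.jpg")  # hypothetical local image file

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher image-text similarity logits mean a better match; softmax turns them into class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))

“Zero-shot” here means the model was never fine-tuned on the benchmark’s labels; it classifies purely by matching images against natural-language descriptions of the classes.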

Towards multimodal models

Currently, AI systems are moving towards GPT-like multimodal and multitasking models in pursuit of artificial general intelligence (AGI). Experts believe there will be a rise in multimodal models in the coming months. Meanwhile, some researchers are rooting for embodied AI instead, rejecting traditional disembodied models such as neural networks altogether.

Unlike GPT-3, Wu Dao 2.0 covers both Chinese and English, with skills acquired by studying 4.9 terabytes of text and images, including 1.2 terabytes each of Chinese and English text.

Google has also been working towards developing multimodal models similar to Wu Dao. At Google I/O 2021, the search giant unveiled LaMDA (trained on 2.6 billion parameters) and MUM (Multitask Unified Model), trained across 75 different languages and 1,000 times more powerful than BERT. At the time, Google CEO Sundar Pichai said that LaMDA, currently trained only on text, will soon shift to a multimodal model that integrates text, image, audio and video.

The training data of Wu Dao 2.0 includes:

  • 1.2 terabytes of English text from the Pile dataset
  • 1.2 terabytes of Chinese text from the Wu Dao Corpora
  • 2.5 terabytes of Chinese image data

Blake Yan, an AI researcher from Beijing, told the South China Morning Post that these advanced models, trained on massive datasets, are good at transfer learning, just like humans. “Large-scale pre-trained models are one of today’s best shortcuts to AGI,” said Yan.

“No one knows which is the right step,” OpenAI said in its GPT-3 demo blog post. “Even if larger ‘pre-trained models’ are the logical trend today, we may be missing the forest for the trees, and we may end up reaching a less determined ceiling ahead. The only clear aspect is that if the world has to suffer from ‘environmental damage,’ ‘harmful biases,’ or ‘high economic costs,’ not even reaching AGI would be worth it.”

Amit Raja Naik

Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
