Baidu Launches World’s Largest Dialogue Generation Model With 11 Billion Parameters

Earlier this week, the Chinese internet giant Baidu released PLATO-XL, a pre-trained dialogue generation model with up to 11 billion parameters. It adopts a unified transformer architecture with high computation and parameter efficiency. 

PLATO-XL carries out multi-party aware pre-training to better distinguish the characteristic information of each speaker in social media conversations. As a result, it achieves superior performance compared to other approaches in both English and Chinese. This multi-party aware pre-training has also effectively reduced the inconsistency problem in multi-turn conversations. 

Soon, the company plans to release the source code on GitHub. “We will release our source code together with the English model at GitHub, hoping to facilitate frontier research in dialogue generation,” said Baidu researchers. 


Language Models vs Dialogue Generation Models 

The effectiveness of the pre-training paradigm has been widely recognised in natural language processing (NLP), with models such as Switch Transformer, GPT-3, BERT, XLNet, RoBERTa and LaMDA, where large-scale transformer models are trained on massive plain-text corpora. However, most of these language models follow the trend of enlarging the model size, the dataset, or the amount of compute used for training. 

OpenAI's GPT-3, for instance, with 175 billion parameters, shows strong zero-shot learning capabilities without task-specific fine-tuning on downstream tasks. Dialogue generation models take a different route: compared to general language models, they are usually pre-trained on human-like conversations collected from social media platforms such as Reddit and Twitter. 

Popular dialogue generation models include Microsoft's DialoGPT, Google's Meena, Facebook's Blender and Baidu's PLATO-2. These models, too, have quickly been scaled up to billions of parameters and draw on ever larger volumes of social media conversations for pre-training. However, in dialogue generation there is still no clear conclusion about the correlation between model scale and conversation quality, alongside other open problems such as unfair biases, misleading information and the inability to learn continuously. 

Hopefully, the recently launched PLATO-XL will continue to improve the conversation quality on fairness and factuality. 

(Source: Baidu Research)

For example, DialoGPT comes in three sizes – 117 million, 345 million and 762 million parameters – of which the 345-million-parameter version is said to have obtained the best performance in the team's evaluations. Similarly, Blender's 2.7-billion-parameter model achieved better performance than its 9.4-billion-parameter counterpart.

Baidu researchers believe that conversation quality can still benefit from an enlarged model scale given an appropriate pre-training design, and PLATO-XL was built to test that hypothesis. Besides open-domain conversation, the model is also explored on two common conversational tasks – knowledge-grounded dialogue and task-oriented conversation. 

Plus, the researchers have also explored the ability of PLATO-XL as the foundation model of conversational AI. Interestingly, their experiments indicated that PLATO-XL could outperform other dialogue generation models across multiple conversational tasks. 

(Source: Baidu Research)

Inside PLATO-XL 

PLATO-XL adopts a unified transformer architecture that models dialogue understanding and response generation simultaneously, which is more parameter-efficient. A flexible self-attention mask enables bidirectional encoding of the dialogue history and unidirectional decoding of the response. The unified transformer architecture also proves efficient in training dialogue generation models. 
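The self-attention masking just described can be sketched as follows. The helper below is a hypothetical illustration of a prefix-LM style mask in NumPy, not Baidu's implementation: dialogue-history tokens attend to each other bidirectionally, while response tokens attend to the full history plus only earlier response tokens.

```python
import numpy as np

def unified_attention_mask(context_len: int, response_len: int) -> np.ndarray:
    """Boolean mask where mask[i, j] = True means token i may attend to token j.

    Context (dialogue history) tokens attend bidirectionally within the
    context; response tokens attend to the whole context plus a causal
    prefix of the response.
    """
    total = context_len + response_len
    mask = np.zeros((total, total), dtype=bool)
    # Every token can see the entire dialogue context.
    mask[:, :context_len] = True
    # Response tokens additionally see themselves and earlier response tokens.
    for i in range(context_len, total):
        mask[i, context_len:i + 1] = True
    return mask

m = unified_attention_mask(context_len=3, response_len=2)
```

With a 3-token context and 2-token response, the first three rows see only the context (bidirectionally), while the last two rows see the context plus a left-to-right prefix of the response.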

Given the variable lengths of conversation samples, much computation is wasted on padding during training. The unified transformer greatly improves training efficiency by effectively sorting the input samples, so that sequences of similar length end up in the same batch and little padding is needed. 

An overview of the PLATO-XL network (Source: Baidu Research)

PLATO-XL, with 11 billion parameters, includes two dialogue models – one Chinese and one English – pre-trained on over 100 billion tokens of data. The models are implemented on PaddlePaddle, a deep learning platform developed by Baidu. To train a model of this scale, PLATO-XL adopts gradient checkpointing and sharded data parallelism, provided by FleetX, PaddlePaddle's distributed training library, and is trained on a high-performance cluster of 256 NVIDIA Tesla V100 32G GPUs. 
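A back-of-the-envelope calculation (my own rough estimate, not a figure from the paper) shows why such techniques are needed: even ignoring activations, an 11-billion-parameter model trained in fp32 with an Adam-style optimizer does not come close to fitting on a single 32 GB V100.

```python
def training_memory_gb(n_params: float, bytes_per_param: int = 4,
                       optimizer_copies: int = 2) -> float:
    """Rough lower bound on training memory in GB: one copy of the
    weights, one of the gradients, and Adam-style optimizer states
    (two moments), all in fp32; activations are ignored entirely."""
    copies = 1 + 1 + optimizer_copies  # weights + gradients + two moments
    return n_params * bytes_per_param * copies / 1e9

mem = training_memory_gb(11e9)  # 176.0 GB for 11B parameters
```

At roughly 176 GB of state alone, the model must be sharded across many GPUs, and gradient checkpointing trades recomputation for the activation memory this estimate leaves out.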

Wrapping up 

With this latest development, Baidu's PLATO-2 has been upgraded to PLATO-XL which, with over ten billion parameters, is billed as the world's largest Chinese and English dialogue generation model. The researchers report that it achieves superior performance in open-domain conversation over other dialogue models such as Blender, DialoGPT, EVA and PLATO-2, and raises expectations of what hundred-billion or even trillion-parameter dialogue models could do. Moreover, PLATO-XL shows significantly better performance than current mainstream commercial chatbots. 

(Source: Baidu Research)

Furthermore, Baidu’s PLATO-XL opens new horizons in open-domain conversation, which is considered one of the most challenging tasks in NLP. Touted as the largest pre-trained model for English and Chinese dialogue, PLATO-XL reaches a new level of conversational consistency and factuality – one step closer to human-like learning and conversational abilities. 

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
