Facebook Shared The Recipes For Building An Open-Domain Chatbot

Recently, researchers at Facebook AI open-sourced a new chatbot known as Blender. According to the researchers, this chatbot interacts in a more human-like way than previous chatbots.

Along with the announcement, the researchers shared the recipe for building and deploying the chatbot. They stated that it is the first chatbot able to blend a diverse set of conversational skills, including empathy, knowledge and personality, in a single system.

Building an open-domain chatbot is one of the more complex and challenging problems in machine learning. To build a high-performance chatbot, the researchers scaled neural models in both the number of parameters and the size of the data they are trained on. The researchers stated, “Good conversation requires a number of skills that an expert conversationalist blends in a seamless way, providing engaging talking points, listening to their partners, as well as displaying knowledge, empathy and personality appropriately while maintaining a consistent persona.”

The Recipe

According to the researchers at Facebook AI, the recipe for the new chatbot combines large-scale neural models, with up to 9.4 billion parameters (3.6x more than the largest existing system), with equally important techniques for blending skills and for detailed generation.

The main steps of building this chatbot are scale, blending skills and generation strategies.

Scale  

To create a high-performance chatbot, the first step is large-scale training. For this, the researchers pre-trained large Transformer neural networks with up to 9.4 billion parameters on large amounts of conversational data. They used previously available public domain conversations, amounting to 1.5 billion training examples of extracted conversation.

Blending Skills

For blending skills, the researchers selected specific tasks that make the model focus on personality and engagingness, knowledge, and empathy. They used a recently introduced task set-up called Blended Skill Talk (BST) for training and evaluating these desirable skills. BST targets these aspects by providing training data and initial conversational context. Tuning on BST not only emphasised the desirable traits but also showed that it can minimise undesirable traits learnt from large corpora, such as toxicity.

According to the researchers, BST consists of the following skills:

  • Engaging use of personality 
  • Engaging use of knowledge
  • Display of empathy
  • Ability to blend all three seamlessly

Generation Strategies

To keep an agent from repeating itself during a conversation, researchers usually apply a number of generation strategies such as beam search, next-token sampling, and n-gram blocking.
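As an illustration of n-gram blocking, the following minimal sketch forbids any next token that would complete an n-gram already present in the generated text. The function names, the toy score dictionary and the default n = 3 are assumptions made for this example; they are not taken from the Blender code.

```python
from typing import Dict, Sequence, Set


def blocked_next_tokens(prefix: Sequence[str], n: int = 3) -> Set[str]:
    """Return the tokens that would repeat an n-gram already in `prefix`."""
    if len(prefix) < n:
        return set()
    # The last n-1 tokens are the context that the next token would extend.
    context = tuple(prefix[len(prefix) - (n - 1):])
    blocked = set()
    for i in range(len(prefix) - n + 1):
        ngram = tuple(prefix[i:i + n])
        if ngram[:-1] == context:
            blocked.add(ngram[-1])
    return blocked


def pick_token(scores: Dict[str, float], prefix: Sequence[str], n: int = 3) -> str:
    """Greedily pick the best-scoring token that is not blocked."""
    blocked = blocked_next_tokens(prefix, n)
    allowed = {tok: s for tok, s in scores.items() if tok not in blocked}
    if not allowed:  # everything is blocked: fall back to the raw scores
        allowed = scores
    return max(allowed, key=allowed.get)


# Hypothetical decoding step: "tea" scores highest but would repeat "i like tea".
prefix = ["i", "like", "tea", "and", "i", "like"]
print(pick_token({"tea": 0.9, "coffee": 0.1}, prefix))
```

In a beam search, the same filter would be applied to each hypothesis before its candidate extensions are ranked.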

In this work, the researchers considered three types of architectures: retrieval, generative, and retrieve-and-refine (RetNRef) models. The retrieval systems use the poly-encoder architecture, while the generator uses Byte-Level BPE tokenisation trained on the pre-training data.
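To give a rough idea of what byte-level BPE means, the sketch below splits text into its UTF-8 bytes and performs a single merge of the most frequent adjacent pair. It is a toy illustration under stated assumptions: the helper names are invented for this example, and a real tokeniser repeats the merge step many times over a large corpus.

```python
from collections import Counter
from typing import List, Optional, Tuple


def to_byte_tokens(text: str) -> List[bytes]:
    """Start from single UTF-8 bytes, so any string can be represented."""
    return [bytes([b]) for b in text.encode("utf-8")]


def most_frequent_pair(tokens: List[bytes]) -> Optional[Tuple[bytes, bytes]]:
    """Find the adjacent token pair that occurs most often."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens: List[bytes], pair: Tuple[bytes, bytes]) -> List[bytes]:
    """Replace every occurrence of `pair` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = to_byte_tokens("abab")
pair = most_frequent_pair(tokens)
print(merge_pair(tokens, pair))
```

Because the base vocabulary is the 256 possible byte values, no input character is ever out of vocabulary; merges only make frequent sequences shorter.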

Given a dialogue history as input, a retrieval system selects the next dialogue utterance by scoring a large set of candidate responses and outputting the highest-scoring one. The generative models instead use a standard Seq2Seq Transformer architecture to generate responses rather than retrieve them from a fixed set. Lastly, for retrieve-and-refine, the researchers considered two variants of the retrieval step: dialogue retrieval and knowledge retrieval.
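The retrieval step described above can be sketched as scoring every candidate against the dialogue history and returning the best one. In this sketch, word-count vectors and a dot product stand in for the learned poly-encoder, which is an assumption made purely for illustration; the function names and the sample sentences are also hypothetical.

```python
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def score(context: Counter, candidate: Counter) -> int:
    """Dot product between the two count vectors."""
    return sum(context[word] * count for word, count in candidate.items())


def retrieve(history: List[str], candidates: List[str]) -> str:
    """Return the candidate response with the highest score."""
    context = embed(" ".join(history))
    return max(candidates, key=lambda c: score(context, embed(c)))


history = ["do you have any pets"]
candidates = ["i have two pets at home", "the weather is nice today"]
print(retrieve(history, candidates))
```

A generative model has no such fixed candidate list, which is the distinction the paragraph above draws between the two architectures.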

Dataset Used

For pre-training, the researchers used the pushshift.io Reddit dataset, which is a variant of Reddit discussions. According to the researchers, this dataset is a good candidate for helping train a dialogue model in the open-domain case.

For fine-tuning, the researchers used three datasets: ConvAI2, Empathetic Dialogues (ED) and Wizard of Wikipedia (WoW). ConvAI2 contains 140k training utterances from paired crowd workers having a conversation in which they get to know each other. The Empathetic Dialogues dataset consists of 50k utterances of crowd worker conversations grounded in an emotional situation. The Wizard of Wikipedia task involves discussing a given topic in depth, where the goal is both to engage the partner and to display expert knowledge.

Wrapping Up

To evaluate the chatbot, the researchers benchmarked its performance against Google’s Meena chatbot through pairwise human evaluations. They used the ACUTE-Eval method, in which evaluators are shown a series of dialogues between humans paired with each respective chatbot.
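A pairwise human evaluation of this kind is typically summarised as a win rate: the share of comparisons in which evaluators preferred one system over the other. The sketch below shows that arithmetic only. The function name, the labels `model_a` and `model_b`, and the four votes are made up for illustration and are not results from the paper.

```python
from collections import Counter
from typing import Dict, Iterable


def win_rates(votes: Iterable[str]) -> Dict[str, float]:
    """Convert a list of per-comparison winners into win rates."""
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {model: n / total for model, n in counts.items()}


# Hypothetical votes: each entry names the system one evaluator preferred.
votes = ["model_a", "model_a", "model_b", "model_a"]
print(win_rates(votes))
```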

The researchers released pre-trained and fine-tuned generative models with 90M, 2.7B and 9.4B parameters, and provided a script for interacting with the bot with safety filtering built in. According to the researchers, this approach takes a step forward, with improved performance in terms of engagingness and humanness. However, the model still has various issues, such as non-trivial repetition, lapses in knowledge and factual correctness, contradiction and forgetfulness, among others, which need to be mitigated in future studies.

Read the paper here.

Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.