Recently, Facebook AI Research (FAIR) built and deployed a role-playing fantasy game world to boost the performance of conversational AI models such as virtual assistants. The researchers presented a fully-realised system for improving an open-domain dialogue task by utilising a deployed game for lifelong learning.
Human beings learn language over the course of their lives through interactions with other people. Yet research on sophisticated natural language processing (NLP) models is typically done on fixed datasets, with no way for the model to interact with humans through language at training time.
Research in natural language processing usually focuses on crowdsourced static datasets and the supervised learning paradigm of model training. These datasets are collected by paying crowd-workers to perform interaction and annotation tasks.
However, relevant studies have shown that crowdsourced data often lacks naturalness and relevance to real-world use cases. One reason is that research budgets for paying crowd-workers limit how much data can be collected.
In addition, crowd-workers are motivated by pay rather than by an interest in the tasks themselves, so the resulting data distribution may not match the desired one. The static dataset paradigm also has a further drawback: it does not allow a model to learn from its own experience of using language.
Behind the System
The researchers built and deployed a role-playing game in which human players converse with learning agents situated in an open-domain fantasy world. They studied the ability of an open-domain dialogue model to learn iteratively from conversations with intrinsically motivated humans.
They stated, “In order to engage humans at scale, we build and deploy a (free to play) game with a purpose whereby human players role-play characters and converse with other characters (that are our learning models) situated within the game world.”
To maximise engagement, the researchers chose a fantasy game world. The system iterates between collecting data of human-model interactions, retraining updated models on the newly collected data and redeploying them. Simultaneously, it provides a natural metric to evaluate and compare models online using the continuation rate of players.
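As a rough illustration of how a continuation rate could be used to compare deployed models, here is a minimal Python sketch. It assumes the rate is the fraction of completed mini-games after which the player chose to keep playing. The function name, the log format and the example entries are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def continuation_rates(sessions):
    """Return the continuation rate per model.

    `sessions` is an iterable of (model_name, continued) pairs, where
    `continued` is True if the player chose to keep playing after the
    mini-game. This log format is an assumption made for illustration.
    """
    totals = defaultdict(int)
    continued_counts = defaultdict(int)
    for model_name, did_continue in sessions:
        totals[model_name] += 1
        if did_continue:
            continued_counts[model_name] += 1
    return {name: continued_counts[name] / totals[name] for name in totals}


# Made-up log entries, only to show the shape of the input.
log = [
    ("round_1", True),
    ("round_1", False),
    ("round_2", True),
    ("round_2", True),
]
rates = continuation_rates(log)
```

Under this reading, a retrained model counts as an improvement when its rate is higher than that of the model it replaces.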
The game is a dialogue role-playing game designed for both training and evaluating open-domain dialogue agents. The core game pairs two agents in a given setting: one is a human player and the other is a dialogue agent with an underlying machine learning model.
Both players are assigned characters with names and backstories (personas), along with their current location and its description. The goal of each player is to role-play their character's dialogue in the given situation. The dialogues in the game are in English.
Each dialogue, or mini-game, consists of 6 turns per agent, i.e. 12 turns in total. At the end of a mini-game, the human player chooses what to do next, such as moving to a new location or ending the game. The variety of mini-games offers different role-playing possibilities, which makes the dialogue data more diverse.
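The turn structure of a mini-game can be sketched in Python as follows. This is an illustrative sketch only: the class, its field names and the reply callables are hypothetical, and it assumes the human speaks first and the two agents strictly alternate, which the article does not specify.

```python
from dataclasses import dataclass, field

TURNS_PER_AGENT = 6  # from the article: 6 turns per agent, 12 in total


@dataclass
class MiniGame:
    setting: str
    human_character: str
    model_character: str
    transcript: list = field(default_factory=list)

    def play(self, human_reply, model_reply):
        """Alternate turns until each agent has spoken TURNS_PER_AGENT times.

        `human_reply` and `model_reply` are hypothetical callables that
        take the transcript so far and return the next utterance.
        """
        for _ in range(TURNS_PER_AGENT):
            self.transcript.append(
                (self.human_character, human_reply(self.transcript))
            )
            self.transcript.append(
                (self.model_character, model_reply(self.transcript))
            )
        return self.transcript


# Placeholder characters and replies, invented for this example.
game = MiniGame(
    setting="a castle courtyard",
    human_character="knight",
    model_character="innkeeper",
)
transcript = game.play(
    lambda history: "human line",
    lambda history: "model line",
)
```

After `play` returns, the game would ask the human player whether to move to a new location or end the game, and that choice is what a continuation rate would be computed from.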
Benefits of this System
According to the researchers, there are many benefits of using the system, such as:
- The system is more cost-effective than traditional ways of collecting data and training NLP models.
- The collected data is more effective at improving the continuation rates due to being more on-distribution than the crowdsourced data.
- As the model improves, the continue rates also increase. This, in turn, increases the amount of data collected.
- It provides lifelong dialogue learning in deployed systems with intrinsically motivated humans rather than crowd-workers.
As an outcome, the researchers successfully collected, retrained and redeployed models that improve both the offline automatic metrics and human continue rates. They claimed that the system is able to collect data at a rate that is 1/5th of the price per utterance of crowdsourcing, where the cost of the method is the cost of advertisements that make players aware of the game.
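The cost comparison comes down to simple arithmetic: if advertising is the only cost, the price per utterance is the advertising spend divided by the number of utterances collected. The Python sketch below shows that calculation. All figures in it are invented placeholders chosen to produce a one-fifth ratio; they are not the numbers reported by the researchers.

```python
def cost_per_utterance(ad_spend, utterances_collected):
    """Price per utterance when advertising is the only cost."""
    if utterances_collected <= 0:
        raise ValueError("utterances_collected must be positive")
    return ad_spend / utterances_collected


# Placeholder values, NOT figures from the paper.
game_price = cost_per_utterance(ad_spend=100.0, utterances_collected=1000)
crowdsourced_price = 0.50  # hypothetical price paid per crowdsourced utterance

# Ratio of the game's price to the crowdsourced price.
ratio = game_price / crowdsourced_price
```

With these placeholder values, `ratio` works out to one fifth, which is the kind of relationship the researchers describe.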
They showed that by training models on the conversations they have with humans in the game, the models progressively improve, as measured by automatic metrics and online engagement scores. Also, this learning is claimed to be more efficient than crowdsourced data when applied to conversations with real users and is far cheaper to collect.