Facebook AI has shared new research and two new datasets, TOPv2 and MTOP, to help develop sophisticated and effective conversational AI systems. According to a Deloitte report, the conversational AI market is projected to grow from AUD 6 billion in 2019 to AUD 22.6 billion by 2024, a CAGR of 30.2 percent over that period.
Conversational AI systems still have notable limitations. For example, improvements to existing systems have largely benefited speakers of widely used languages such as English, and it remains difficult to scale these systems to support new use cases.
Semantic parsing – the task of converting a natural language utterance into a machine-understandable representation – is a critical component of virtual assistants. Existing natural language understanding (NLU) models depend on huge amounts of annotated training data for this task. The problem is that such large datasets are not available for less widely spoken languages.
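To make the task concrete, here is an illustrative example of the kind of output a task-oriented semantic parser produces, shown in Python for readability. The utterance and labels are made up for illustration; the bracketed IN:/SL: notation follows the linearised-tree convention used by TOP-style datasets.

```python
# Illustrative example of task-oriented semantic parsing.
# The utterance, intent and slot labels below are made up for illustration;
# the [IN: ... [SL: ...]] notation mirrors TOP-style linearised parse trees.
utterance = "Set an alarm for 7 am tomorrow"

# A seq2seq parser would be trained to emit a linearised parse such as:
target_parse = "[IN:CREATE_ALARM [SL:DATE_TIME for 7 am tomorrow ] ]"

# The same parse viewed as a structured frame a virtual assistant can act on:
parsed_frame = {
    "intent": "IN:CREATE_ALARM",
    "slots": [{"label": "SL:DATE_TIME", "text": "for 7 am tomorrow"}],
}
print(target_parse)
```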
The method proposed by the researchers overcomes these limitations.

Research
Researchers including Xilun Chen, Asish Ghoshal, Yashar Mehdad, Luke Zettlemoyer and Sonal Gupta have published a paper on adapting task-oriented semantic parsers to low-resource domains. The team has proposed a novel method that surpasses a supervised neural model with a 10-fold reduction in data. Simply put, the new state-of-the-art conversational AI system needs ten times less training data to handle unfamiliar and complex tasks.
The study provides details on a multilingual NLU model that outperforms single-language models. In addition, the approach works well to support more diverse use cases in multiple languages.
Researchers have improved the NLU models to support a wider range of domains without relying primarily on manually annotated training data. With as few as 25 training examples per intent or slot label, the technique can generate task-oriented semantic parsers for new domains.
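The "25 training examples per intent or slot label" setting corresponds to subsampling the target-domain training data so that each label is seen only a handful of times. Below is a minimal sketch of how such a per-label subsample might be drawn; the function name, data format and keep-rule are illustrative and not taken from the released code.

```python
import random
from collections import Counter

def subsample_per_label(examples, labels_fn, k=25, seed=0):
    """Illustrative per-label subsampling for a low-resource training set.

    `examples` is a list of (utterance, parse) pairs and `labels_fn` returns
    the set of intent and slot labels appearing in a parse. Examples are kept
    only while they contribute a label that has been seen fewer than k times.
    """
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)

    counts = Counter()
    kept = []
    for utterance, parse in shuffled:
        labels = labels_fn(parse)
        # Keep the example only if some label is still under the budget of k.
        if any(counts[label] < k for label in labels):
            kept.append((utterance, parse))
            counts.update(labels)
    return kept
```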
“Several training strategies exist for domain adaptation. For instance, one can employ joint training that trains a single model with all the available data on both source and target domains. Another approach, which we found superior, is the pre-training plus fine-tuning strategy, where a model is first trained on the source domains and then fine-tuned on the low-resource target domains,” the paper states.
“On the other hand, as pre-trained language representations such as RoBERTa or BART are adopted, the latter strategy becomes a 3-stage training process: train RoBERTa/BART; fine-tune on the source domains; fine-tune again on the target domains.”
The three stages are:
- The first stage, which is out of scope for this paper, is the pre-training stage, where self-supervised language representations are learned.
- The model is then fine-tuned on the source domains. This stage is called source training to avoid ambiguity with the final stage.
- In the final stage, denoted as fine-tuning, the source-trained model is fine-tuned again on the low-resource target domains (a rough code sketch of the three stages follows).
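Assuming the parser is a seq2seq model built on a pre-trained BART checkpoint (as the paper describes below), the three stages could be sketched roughly as follows with the Hugging Face transformers library. The training loop, dataset placeholders and hyperparameters are illustrative, not the authors' implementation.

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

# Stage 1 (out of scope for the paper): start from a self-supervised,
# pre-trained encoder-decoder model.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def train(model, pairs, epochs=1, lr=1e-5):
    """Fine-tune the seq2seq parser on (utterance, linearised parse) pairs."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for utterance, parse in pairs:
            batch = tokenizer(utterance, return_tensors="pt")
            labels = tokenizer(parse, return_tensors="pt").input_ids
            loss = model(**batch, labels=labels).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Stage 2: "source training" on the high-resource source domains.
# source_pairs = [...]  # placeholder for source-domain (utterance, parse) data
# train(model, source_pairs, epochs=3)

# Stage 3: "fine-tuning" on the low-resource target domain,
# e.g. the ~25-samples-per-label subset described above.
# target_pairs = [...]
# train(model, target_pairs, epochs=10)
```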
The approach distinguishes itself from previous methods on two fronts:
- First, the encoder-only pre-trained representations used in existing work are not ideal for the seq2seq models employed in task-oriented semantic parsing, so the researchers instead propose to use BART, a pre-trained model with an encoder-decoder architecture.
- More importantly, the researchers adopt optimisation-based meta-learning to improve the model’s generalisation to new target domains with very few training samples (sketched below).
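The passage above does not spell out the exact meta-learning algorithm, so the following is only a generic, first-order (Reptile-style) sketch of optimisation-based meta-learning over the source domains; it reuses the hypothetical `train` helper from the earlier stage-wise sketch and is not the authors' published procedure.

```python
import copy
import random
import torch

def meta_train(model, domain_datasets, meta_steps=1000, inner_epochs=1,
               inner_lr=1e-5, meta_lr=0.1):
    """Illustrative first-order (Reptile-style) meta-learning over source domains.

    Each meta-step clones the model, adapts the clone to one sampled source
    domain using the ordinary `train` loop from the earlier sketch, then moves
    the original weights a fraction of the way towards the adapted weights.
    The aim is an initialisation that fine-tunes well from very few examples.
    """
    for _ in range(meta_steps):
        domain_pairs = random.choice(domain_datasets)

        # Inner loop: adapt a copy of the model to the sampled source domain.
        adapted = copy.deepcopy(model)
        train(adapted, domain_pairs, epochs=inner_epochs, lr=inner_lr)

        # Outer (meta) update: interpolate towards the adapted weights.
        with torch.no_grad():
            for p, p_adapted in zip(model.parameters(), adapted.parameters()):
                p.add_(meta_lr * (p_adapted - p))
    return model

# After meta-training on the source domains, the model is fine-tuned on the
# low-resource target domain exactly as in the final stage above.
```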
“We collect the TOPv2 dataset, a large-scale multi-domain task-oriented semantic parsing dataset with eight domains and more than 180k annotated samples to evaluate our models, which we release to the research community,” said the researchers.
Facebook has also released the MTOP dataset – a multilingual task-oriented parsing dataset with about 100K total utterances spanning six languages, 11 domains, and 117 intent categories. More information on the dataset can be found here.