Pre-training is a machine learning technique in which a model learns patterns on one task so that the learned parameters can be reused on similar tasks, much like how humans build on prior knowledge when processing new information.
Pre-trained language models have made Natural Language Processing significantly cheaper, faster, and easier. Pre-trained models (as opposed to models trained from scratch) achieve better performance with less training data. Language model pre-training uses self-supervision, which does not require labelled training data. Fine-tuning, on the other hand, adapts the model to a downstream task to improve performance.
Now, researchers from Facebook have proposed an additional stage between pre-training and fine-tuning, called pre-finetuning: a large-scale multitask learning stage spanning around 50 datasets and over 4.8 million labelled examples. The research showed that pre-finetuning improves the performance of pre-trained discriminative and generation models, as well as sample efficiency during fine-tuning.
“We show that multitask supervised tuning, if done at a sufficiently large scale with many different tasks, can be an effective second stage of task-agnostic pre-training, removing the need to pre-select the best intermediate tasks,” the authors of the study said.

What Is MUPPET?
Multitask training is a sub-field of machine learning in which a shared model learns multiple tasks simultaneously. The technique is generally used on top of traditional pre-training and brings greater data efficiency, faster learning from auxiliary information, and reduced overfitting. Models such as multi-task deep neural networks (MT-DNN) have improved several language benchmarks.
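As a rough illustration of the shared-model idea (not Facebook's implementation; the encoder, task names and label counts below are placeholders), a multitask setup typically routes every example through one shared encoder and a small task-specific head:

```python
import torch
import torch.nn as nn

class SharedMultitaskModel(nn.Module):
    """Toy multitask setup: one shared encoder, one lightweight head per task."""

    def __init__(self, task_num_labels, hidden_size=768):
        super().__init__()
        # Stand-in for a pre-trained transformer body shared by all tasks.
        self.encoder = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh())
        # One classification head per task (e.g. MNLI has 3 labels, BoolQ has 2).
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden_size, n) for task, n in task_num_labels.items()}
        )

    def forward(self, features, task):
        # Every task flows through the same encoder; only the output head differs.
        return self.heads[task](self.encoder(features))

model = SharedMultitaskModel({"mnli": 3, "boolq": 2})
logits = model(torch.randn(4, 768), task="mnli")  # shape: (4, 3)
```

Because the encoder is shared, gradients from every task update the same representation, which is where the data-efficiency and regularisation benefits of multitask learning come from.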
Pre-finetuning is an intermediate stage sandwiched between pre-training and fine-tuning, and involves large-scale multitask learning over roughly 50 tasks such as classification, summarisation and question answering. Standard multitasking schemes often fail to learn ‘high-quality representations’. However, the newly introduced technique, MUPPET (Massive Multi-task Representations with Pre-Finetuning), markedly improves training stability and overall performance using loss scaling and task-heterogeneous batches.
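A minimal sketch of those two ingredients, under the assumption that loss scaling means dividing each task's loss by the log of its label-space size and that heterogeneous batches mix examples from several tasks in every update (the task names, label counts and dataset format below are illustrative, not the paper's exact configuration):

```python
import math
import random
import torch
import torch.nn.functional as F

# Hypothetical per-task label counts; the real setup covers ~50 tasks.
TASK_NUM_LABELS = {"mnli": 3, "boolq": 2, "race": 4}

def scaled_loss(logits, labels, task):
    """Scale each task's cross-entropy so tasks with larger label spaces
    don't dominate the shared update (one plausible scaling scheme)."""
    return F.cross_entropy(logits, labels) / math.log(TASK_NUM_LABELS[task])

def heterogeneous_batch(task_datasets, batch_size=8):
    """Draw examples from several tasks into a single batch instead of
    training on one task at a time."""
    tasks = random.choices(list(task_datasets), k=batch_size)
    return [(task, random.choice(task_datasets[task])) for task in tasks]

# Usage with placeholder data.
logits = torch.randn(4, TASK_NUM_LABELS["mnli"])
labels = torch.randint(0, TASK_NUM_LABELS["mnli"], (4,))
print(scaled_loss(logits, labels, "mnli"))
batch = heterogeneous_batch({"mnli": ["ex1", "ex2"], "boolq": ["ex3"], "race": ["ex4"]})
```

Mixing tasks within each batch keeps any single task from dominating a gradient step, which is the intuition behind the improved training stability claimed above.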
For pre-finetuning, two popular pre-trained models, RoBERTa and BART, were chosen as the initial models. A different prediction scheme was used for each task. The pre-finetuning procedure was applied to both models, and each model configuration was trained on 64 GPUs.
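To give a sense of what a per-task prediction scheme looks like in practice, here is a hedged sketch using the Hugging Face transformers API; the checkpoint name, task types and label counts are assumptions, and in the actual pre-finetuning setup the transformer body is shared across tasks while only the heads differ:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoModelForQuestionAnswering,
)

def build_task_model(task_type, checkpoint="roberta-base", num_labels=2):
    # BART variants would swap in a seq2seq checkpoint such as "facebook/bart-base".
    if task_type == "classification":
        # e.g. MNLI, BoolQ: a classification head over the pooled representation.
        return AutoModelForSequenceClassification.from_pretrained(
            checkpoint, num_labels=num_labels
        )
    if task_type == "span_extraction":
        # e.g. SQuAD: start/end span prediction heads.
        return AutoModelForQuestionAnswering.from_pretrained(checkpoint)
    raise ValueError(f"Unknown task type: {task_type}")

mnli_model = build_task_model("classification", num_labels=3)
squad_model = build_task_model("span_extraction")
```

Building separate models per task, as above, is for illustration only; the point is simply that the head architecture changes with the task type while the underlying encoder can stay the same.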
MUPPET’s Performance
After pre-finetuning, RoBERTa and BART were tested on widely used benchmarks such as RTE, BoolQ, RACE, SQuAD, and MNLI. It was observed that pre-finetuning hurts performance when only a few tasks (up to around 15) are used. Beyond this point, however, pre-finetuning on a larger number of language tasks leads to performance improvements, and the MUPPET models outperformed the plain pre-trained models.
Wrapping Up
The MUPPET model demonstrated:
- Pre-trained models, when further refined with pre-finetuning, significantly improve performance on downstream tasks
- MUPPET, which uses loss scaling and task-heterogeneous batches, is effective for learning at scale
- Beyond a threshold (in this case, around 15 tasks), representation quality improves linearly with the number of tasks.
- Pre-finetuned models require less data for fine-tuning than vanilla pre-trained models.
- It outperforms previous models on tasks such as Recognising Textual Entailment (RTE) and HellaSWAG, and improves on pre-trained representations for Multi-Genre Natural Language Inference (MNLI), CommonsenseQA, and the Stanford Question Answering Dataset (SQuAD).
Read the full paper here.