How Pre-Finetuning Boosts Performance Of Language Models

Pre-training is a machine learning technique in which a model learns patterns from one task and applies the learned parameters to similar tasks, much like how humans transfer prior knowledge to new situations.

Pre-trained language models have made Natural Language Processing significantly cheaper, faster, and easier. Starting from a pre-trained model (instead of training from scratch) achieves better performance with less task-specific training data. Language model pre-training uses self-supervision, which doesn't require any labelled data. Fine-tuning, on the other hand, makes final adjustments to the model to enhance performance on a specific task.
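To make "self-supervision" concrete: objectives such as masked language modelling manufacture labels from the raw text itself, so no human annotation is needed. A minimal sketch (the function name and the fixed masking interval are invented for illustration; real pre-training masks roughly 15% of tokens at random):

```python
def make_mlm_examples(tokens, mask_token="[MASK]", every=4):
    """Build a (masked input, targets) pair from raw, unlabelled text.

    The 'labels' are simply the original tokens that were masked out,
    so the supervision signal comes for free from the text itself.
    """
    inputs, targets = list(tokens), []
    for i in range(0, len(tokens), every):  # deterministic for illustration
        targets.append((i, tokens[i]))
        inputs[i] = mask_token
    return inputs, targets

sentence = "the model learns to fill in missing words".split()
inputs, targets = make_mlm_examples(sentence)
# inputs  -> ['[MASK]', 'model', 'learns', 'to', '[MASK]', 'in', 'missing', 'words']
# targets -> [(0, 'the'), (4, 'fill')]
```

A model trained to recover the masked tokens learns general-purpose representations of language without any annotated dataset.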

Now, researchers from Facebook have proposed an additional stage between pre-training and fine-tuning, called pre-finetuning — a large-scale, multitask learning stage spanning around 50 datasets and 4.8 million labelled examples. The research showed that pre-finetuning improves the performance of both pre-trained discriminative and generative models, as well as sample efficiency during fine-tuning.

“We show that multitask supervised tuning, if done at a sufficiently large scale with many different tasks, can be an effective second stage of task-agnostic pre-training, removing the need to pre-select the best intermediate tasks,” said the authors of the study.



Multitask training is a sub-field of machine learning in which a shared model learns multiple tasks simultaneously. The technique is generally applied on top of traditional pre-training. This approach offers greater data efficiency, faster learning through auxiliary information, and reduced overfitting. Models such as the Multi-Task Deep Neural Network (MT-DNN) have improved several language benchmarks.
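The parameter-sharing idea behind multitask learning can be sketched in a few lines: one shared encoder produces a representation, and each task adds only a small task-specific head on top. (This toy NumPy model illustrates the general pattern, not the paper's architecture; all names, tasks, and dimensions are invented.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared encoder parameters, reused by every task.
d_in, d_hidden = 8, 16
W_shared = rng.normal(size=(d_in, d_hidden))

# One lightweight head per task; only these parameters differ between tasks.
task_heads = {
    "sentiment": rng.normal(size=(d_hidden, 2)),  # 2 classes
    "topic": rng.normal(size=(d_hidden, 5)),      # 5 classes
}

def forward(x, task):
    h = np.tanh(x @ W_shared)   # shared representation, trained by all tasks
    return h @ task_heads[task]  # task-specific logits

x = rng.normal(size=(3, d_in))   # a batch of 3 examples
assert forward(x, "sentiment").shape == (3, 2)
assert forward(x, "topic").shape == (3, 5)
```

Because gradients from every task flow into `W_shared`, each task effectively acts as auxiliary data for the others — the source of the data-efficiency and regularisation benefits mentioned above.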

Pre-finetuning is an intermediate stage bookended by pre-training and fine-tuning, and involves large-scale multitask learning on around 50 tasks such as classification, summarisation, and question-answering. Standard multitasking schemes often fail to learn ‘high-quality representations’. However, the newly introduced technique, Massive Multi-task Representations with Pre-finetuning (MUPPET), markedly improves training stability and overall performance using loss scaling and task-heterogeneous batches.
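The two stabilising ingredients can be sketched as follows: loss scaling divides each task's loss by the log of its label-space size so that tasks with many classes do not dominate the shared gradient, and task-heterogeneous batches mix examples from many tasks into every update. (A simplified sketch; the function names and the round-robin sampling are illustrative, not the paper's exact implementation.)

```python
import math
from itertools import cycle

def scale_loss(raw_loss, n_classes):
    """Loss scaling: divide by log(n_classes) so per-task losses stay
    on a comparable scale regardless of how many labels a task has."""
    return raw_loss / math.log(n_classes)

def heterogeneous_batch(task_examples, batch_size):
    """Task-heterogeneous batch: interleave tasks round-robin so each
    gradient step sees a mixture of tasks rather than a single one."""
    tasks = cycle(sorted(task_examples))
    counters = {t: 0 for t in task_examples}
    batch = []
    while len(batch) < batch_size:
        t = next(tasks)
        examples = task_examples[t]
        batch.append((t, examples[counters[t] % len(examples)]))
        counters[t] += 1
    return batch

# A 100-class task and a 2-class task end up with comparable scaled losses
# (both roughly 1.0 here), instead of differing by a large factor.
scaled = (scale_loss(4.6, 100), scale_loss(0.7, 2))

batch = heterogeneous_batch({"qa": ["q1", "q2"], "nli": ["n1"]}, 4)
# batch -> [('nli', 'n1'), ('qa', 'q1'), ('nli', 'n1'), ('qa', 'q2')]
```

Without such measures, a single large-label task can swamp the shared encoder's gradients, which is one reason naive multitask schemes destabilise.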

RoBERTa and BART, two popular pre-trained models, were chosen as the initial models, with a different prediction head used for each task. The pre-finetuning procedure was performed on both models, and each model configuration was trained on 64 GPUs.


MUPPET’s Performance

After pre-finetuning, RoBERTa and BART were evaluated on widely used benchmarks such as RTE, BoolQ, RACE, SQuAD, and MNLI. It was observed that pre-finetuning hurts performance when only a few tasks (up to around 15) are used. Beyond this point, however, pre-finetuning on a larger number of tasks leads to performance improvements, and the MUPPET models outperformed the vanilla pre-trained models.


Wrapping Up

The MUPPET model demonstrated:

  • Pre-trained models, when further refined with pre-finetuning, significantly improve performance on downstream tasks.
  • MUPPET, which uses loss scaling and task-heterogeneous batches, is effective for learning at scale.
  • Beyond a threshold (in this case, around 15 tasks), representation quality improves linearly with the number of tasks.
  • Pre-finetuned models require less data for fine-tuning than vanilla pre-trained models.
  • It outperforms previous models on benchmarks such as Recognising Textual Entailment (RTE) and HellaSWAG, and improves on pre-trained representations for Multi-Genre Natural Language Inference (MNLI), CommonsenseQA, and the Stanford Question Answering Dataset (SQuAD).

Read the full paper here.

Shraddha Goled