Recently, researchers from Hugging Face showed that task-specific prompting provides several benefits when fine-tuning pre-trained language models. In a paper titled "How Many Data Points is a Prompt Worth?", the researchers report that prompting improves the data efficiency of pre-trained models and is often worth hundreds of data points on average across classification tasks.
Fine-tuning via an explicit classifier head is one of the standard paradigms for adapting pretrained models to classification. Popular alternatives exist alongside this approach, such as adapting the pretrained language model directly as a predictor, either through autoregressive text generation or through completion of a cloze task. The cloze-task method has previously been used for fine-tuning the popular T5 transformer.
When fine-tuning pre-trained language models for classification, researchers have mostly used one of two techniques: a generic model head or a task-specific prompt for prediction. In the head-based transfer-learning setting, a generic head layer takes the pretrained representations and predicts an output class. In the prompt-based setting, a task-specific pattern string is designed to coax the model into producing a textual output that corresponds to a given class.
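The contrast between the two settings can be sketched in a few lines. The snippet below is a minimal toy illustration, not the paper's implementation: `toy_encoder`, the pattern string, and the `VERBALIZER` mapping are all hypothetical stand-ins, and real systems would use a pretrained transformer such as RoBERTa with its actual masked-language-model head.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, VOCAB = 8, 20  # toy sizes; RoBERTa-large uses 1024 / ~50k

def toy_encoder(text):
    """Stand-in for a pretrained encoder: one hidden vector per token."""
    tokens = text.split()
    return rng.standard_normal((len(tokens), HIDDEN)), tokens

# --- Head-based: a new, randomly initialised linear layer over the first token ---
W_head = rng.standard_normal((HIDDEN, 2))          # 2 classes, trained from scratch
def head_predict(text):
    hidden, _ = toy_encoder(text)
    logits = hidden[0] @ W_head                    # position 0 plays the role of [CLS]
    return int(np.argmax(logits))

# --- Prompt-based: reuse the pretrained MLM head via a pattern + verbalizer ---
W_mlm = rng.standard_normal((HIDDEN, VOCAB))       # stands in for the pretrained LM head
VERBALIZER = {"yes": 3, "no": 7}                   # class word -> toy vocab id
def prompt_predict(premise, hypothesis):
    pattern = f"{premise} ? [MASK] , {hypothesis}" # task-specific pattern string
    hidden, tokens = toy_encoder(pattern)
    mask_vec = hidden[tokens.index("[MASK]")]
    vocab_logits = mask_vec @ W_mlm
    # Score only the verbalizer tokens at the masked position
    scores = {c: vocab_logits[i] for c, i in VERBALIZER.items()}
    return max(scores, key=scores.get)
```

The key difference: the head-based classifier introduces new, randomly initialised weights, while the prompt-based classifier reuses the pretrained language-modelling head, so the task description itself carries information even before any fine-tuning.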
According to the researchers, both approaches can be used for fine-tuning with supervised training data. However, prompts further allow the user to customise patterns to help the model. In this research, the authors mainly discuss the latter technique and how it benefits pre-trained models in low-data regimes.
"The intuition of prompts is that they introduce a task description in natural language, even with few training points," said the researchers. Classification by direct language generation allows the user to pick custom prompts for each task. The approach can also be used in zero-shot classification, priming, and fine-tuning to provide extra task information to the classifier, especially in the low-data regime.
Since prompting has been used in both zero-shot and fine-tuning-based methods, the researchers introduced a metric, the average data advantage, to quantify the impact of a prompt in practice. "Our experiments find that the impact of task-targeted prompting can nicely be quantified in terms of direct training data and that it varies over the nature of different tasks," the researchers said.
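One way to read such a metric: for each performance level both methods reach, compare how much training data the head-based model needs against how much the prompt-based model needs, and average the gap. The sketch below is an illustrative reconstruction under that reading, with made-up toy learning curves, not the paper's exact computation.

```python
import numpy as np

def data_needed(sizes, scores, level):
    """Smallest training-set size at which `scores` reaches `level` (inf if never)."""
    hits = [n for n, s in zip(sizes, scores) if s >= level]
    return min(hits) if hits else float("inf")

def average_advantage(sizes, head_scores, prompt_scores, n_levels=50):
    """Average, over performance levels both methods attain, of
    (data the head needs) - (data the prompt needs)."""
    lo = max(min(head_scores), min(prompt_scores))
    hi = min(max(head_scores), max(prompt_scores))
    levels = np.linspace(lo, hi, n_levels)
    gaps = [data_needed(sizes, head_scores, y) - data_needed(sizes, prompt_scores, y)
            for y in levels]
    return float(np.mean(gaps))

# Toy learning curves: the prompt model reaches each accuracy with fewer points
sizes  = [10, 30, 100, 300, 1000]
head   = [0.52, 0.58, 0.66, 0.74, 0.80]
prompt = [0.60, 0.66, 0.74, 0.80, 0.82]
print(average_advantage(sizes, head, prompt))  # positive: the prompt "is worth" data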
The Tech Behind
The researchers ran all the experiments with RoBERTa-large (355M parameters). The evaluation was performed on the SuperGLUE and MNLI datasets, which comprise various tasks, all in English, including entailment, multiple-choice question answering (MultiRC), commonsense reasoning, and word-sense disambiguation (WiC), among others.
They compared the models across varying amounts of available data, starting with ten data points and growing exponentially to the full dataset. Every experiment was run four times to reduce variance, for a total of 1,892 training runs across all tasks. At every point, they reported the best performance achieved at that amount of data or lower.
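The "best at that amount of data or lower" reporting rule amounts to a running maximum over training-set sizes. A minimal sketch, assuming results arrive as (size, score) pairs from the repeated runs:

```python
def best_at_or_below(results):
    """results: iterable of (n_points, score) pairs, possibly with repeated sizes.
    Returns, per training-set size, the best score seen at that size or any smaller one."""
    curve = {}
    best = float("-inf")
    for n, score in sorted(results):   # ascending data size
        best = max(best, score)        # running maximum so far
        curve[n] = best
    return curve

runs = [(10, 0.55), (10, 0.52), (100, 0.61), (100, 0.64), (1000, 0.60)]
print(best_at_or_below(runs))  # {10: 0.55, 100: 0.64, 1000: 0.64}
```

This makes the reported curves monotonically non-decreasing, so a lucky run at a small data budget is never hidden by a worse run at a larger one.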
The research showed that prompts provide a method for injecting task-specific guidance that is beneficial in low-data regimes. It also demonstrated that prompting offers a substantial advantage in data efficiency for almost all tasks, adding the equivalent of hundreds of data points on average.
Further analysis showed that prompting is largely robust to the choice of pattern and can even learn without an informative verbalizer. On large datasets, prompting remains similarly helpful in terms of data points, although the advantage in final performance is smaller.
Read the paper here.