Active Hackathon

How Unsupervised Data Augmentation Can Improve NLP And Vision Tasks

Data augmentation is often limited to supervised learning, in which labels are required to transfer from original examples to augmented ones. Augmentation provides a steady but limited performance boost. This is because of the fact that they have been applied to a set of labelled examples which is usually of a small size. 

To overcome this limitation, Unsupervised Data Augmentation (UDA) has been developed to apply effective data augmentations to unlabeled data, which is often in larger quantities.


Sign up for your weekly dose of what's up in emerging technology.

In a recent paper titled, Unsupervised Data Augmentation (UDA) for Consistency Training, the authors demonstrate that one can also perform data augmentation on unlabeled data to significantly improve semi-supervised learning (SSL).

Consistency With Data Augmentation

The above picture is an overview of Unsupervised Data Augmentation (UDA). The process from left to right proceeds as follows:

  • First, a standard supervised loss is computed when labelled data is available. 
  • A consistency loss is computed between an example and its augmented version with the help of unlabeled data

The objective of data augmentation is to create a novel training data which resembles reality, by applying a transformation to an example without changing its label. 

Unsupervised Data Augmentation (UDA) makes use of both labelled data and unlabeled data and computes the loss function using standard methods for supervised learning to train the model.

To produce the predictions, the above model is applied to the unlabeled example and augmented. This model is applied to both the unlabeled example and its augmented counterpart to produce two model predictions. A consistency loss is then calculated. Here consistency loss should be read as the distance between two prediction distributions. 

The final loss computed by UDA is done by jointly optimising both the supervised loss from the labelled data and the unsupervised consistency loss from the unlabeled data.

Setting Benchmarks In NLP And Vision Tasks

As can be seen in the picture below, an example of text-based and image-based augmentation. Here, the sentence is translated further into more insightful interpretations. Whereas, the image augmented with different orientations.

The following augmentation strategies were used in achieving the above results:

  • For image classification, AutoAugment uses reinforcement learning to search for an “optimal” combination of image augmentation operations directly based on the validation performances. The trade-off between diversity and validity is rightly balanced by AutoAugment since it is optimised according to the validation set performances in the supervised setting.
  • Back translation to generate diverse paraphrases while preserving the semantics of the original sentences
  • TF-IDF based word replacing for Text Classification

To sharpen the predicted distribution produced on unlabeled examples the following three intuitive techniques were employed in the experiments:

  1. Confidence-based masking
  2. Entropy minimisation
  3. Softmax temperature controlling

Experiments are conducted on six language datasets including IMDb, Yelp-2, Yelp-5, Amazon-2, Amazon-5 and DBPedia

The Transformer model used in BERT as baseline model due to its recent success on many tasks. UDA combines well with representation learning, like BERT, and is very effective in a low-data regime where state-of-the-art performance is achieved.

The results in the paper show that UDA is surprisingly effective in the low-data regime. One such experiment done with only 20 labelled examples for sentiment analysis on IMDb yielded in an error rate of 4.20, which outperforms the existing state-of-the-art models.

Key Takeaways

  • Semi-Supervised Learning(SSL) can match and even outperform purely supervised learning that uses orders of magnitude more labelled data, 
  • Various other types of noise have been tested for consistency training (e.g., Gaussian noise, adversarial noise, and others).Data augmentation outperforms all of them.
  • SSL combines well with transfer learning, e.g., when fine-tuning from BERT and
  • The code is open sourced.

More Great AIM Stories

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.

Our Upcoming Events

Conference, Virtual
Genpact Analytics Career Day
3rd Sep

Conference, in-person (Bangalore)
Cypher 2022
21-23rd Sep

Conference, in-person (Bangalore)
Machine Learning Developers Summit (MLDS) 2023
19-20th Jan

Conference, in-person (Bangalore)
Data Engineering Summit (DES) 2023
21st Apr, 2023

3 Ways to Join our Community

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Telegram Channel

Discover special offers, top stories, upcoming events, and more.

Subscribe to our newsletter

Get the latest updates from AIM

The curious case of Google Cloud revenue

Porat had earlier said that Google Cloud was putting in money to make more money, but even with the bucket-loads of money that it was making, profitability was still elusive.

Global Parliaments can do much more with Artificial Intelligence

The world is using AI to enhance the performance of its policymakers. India, too, has launched its own machine learning system NeVA, which at the moment is not fully implemented across the nation. How can we learn and adopt from the advancement in the Parliaments around the world? 

Why IISc wins?

IISc was selected as the world’s top research university, trumping some of the top Ivy League colleges in the QS World University Rankings 2022