What Is Deep Active Learning: Challenges and Applications

According to NVIDIA, labelling the data from a 100-car fleet driving eight hours a day would require more than 1 million human labellers. It takes autonomous vehicles nearly 11 billion miles of driving to perform just 20% better than a human. The real-world problems that machine learning models encounter come with uncertainties and deficiencies.

So, keeping the model updated, in other words making it smarter even as unfamiliar data keeps arriving, is a challenge. This is where active learning (AL) comes into the picture. It involves selecting which unlabeled data items to label so as to best improve an existing classifier. Active learning is an active area of research within the machine learning space, developed to help models make more accurate decisions.

Active learning aims to select the most useful samples from the unlabeled dataset and pass them on to annotators for labelling. However, active learning algorithms have struggled with high-dimensional data. Therefore, attention is now shifting towards filling the gaps in active learning with the strengths of deep learning.
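To make the select-then-label cycle concrete, here is a minimal sketch of a pool-based active learning loop. Everything in it is an assumption made for illustration and does not come from the survey: the toy one-dimensional nearest-centroid "model", the `oracle` callback standing in for human annotators, and the `rounds` and `batch_size` parameters.

```python
def fit_centroids(labeled):
    """Toy stand-in for a model: the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def uncertainty(centroids, x):
    """Higher when x is almost equally close to its two nearest centroids.

    Assumes at least two classes are present in the labeled set.
    """
    distances = sorted(abs(x - c) for c in centroids.values())
    return -(distances[1] - distances[0])


def active_learning_loop(pool, labeled, oracle, rounds=3, batch_size=2):
    """Repeatedly train, pick the most uncertain samples, and ask for labels."""
    pool, labeled = list(pool), list(labeled)
    for _ in range(rounds):
        if not pool:
            break
        model = fit_centroids(labeled)
        # Rank the unlabeled pool, most uncertain first.
        pool.sort(key=lambda x: uncertainty(model, x), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        # The oracle plays the role of the human annotator (hypothetical).
        labeled.extend((x, oracle(x)) for x in batch)
    return fit_centroids(labeled), labeled, pool


# Usage: two labeled seeds, six unlabeled points, true boundary at 5.0.
seeds = [(0.0, 0), (10.0, 1)]
unlabeled = [0.5, 1.0, 4.8, 5.2, 9.0, 9.5]
_, labeled_after, _ = active_learning_loop(
    unlabeled, seeds, oracle=lambda x: int(x > 5.0), rounds=1
)
print(sorted(x for x, _ in labeled_after[2:]))  # the points nearest the boundary
```

In this toy run, the first round spends the labelling budget on the two points closest to the decision boundary rather than on points the model already classifies easily.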

Researchers at Northwest University surveyed the state of deep active learning (DAL). They discussed the factors hindering research in the field and the significance the domain holds for AI.

Why Care About Deep Active Learning

According to the researchers at Northwest University, there are three main reasons to leverage the synergies of active and deep learning: 

Not enough data

Active learning relies on a small amount of labeled sample data to learn and update the model, while deep learning is only as effective as the data it is trained on. More data, better results. Think: GPT-3. The labeled training samples provided by traditional active learning methods are insufficient to support the training of traditional deep learning methods. In addition, wrote the researchers, the one-by-one sample query method commonly used in AL is not applicable in the deep learning regime.

Model uncertainty

A query strategy based on uncertainty is a key component of AL research. However, the survey states that the softmax response in deep learning is unreliable as a measure of confidence, so a query strategy that relies on it can perform even worse than random sampling.
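For readers unfamiliar with what an uncertainty-based query looks like, the sketch below computes three common uncertainty scores from a softmax output and picks a batch of the highest-scoring samples. It illustrates the general technique rather than any method from the survey, and the function names and example logits are made up for this example. The survey's caveat still applies: these scores inherit whatever overconfidence the softmax output has.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def least_confidence(probs):
    """1 minus the top class probability; higher means more uncertain."""
    return 1.0 - max(probs)


def margin_uncertainty(probs):
    """1 minus the gap between the top two classes; higher means more uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def entropy(probs):
    """Shannon entropy in nats; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_batch(logits_per_sample, k, score=entropy):
    """Indices of the k samples with the highest uncertainty score."""
    scores = [score(softmax(logits)) for logits in logits_per_sample]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical logits for three unlabeled samples over three classes.
pool_logits = [
    [5.0, 0.0, 0.0],  # confident prediction
    [1.0, 1.0, 1.0],  # uniform, maximally uncertain
    [2.0, 1.9, 0.0],  # two classes nearly tied
]
print(select_batch(pool_logits, k=2))  # [1, 2]
```

Querying a batch of `k` samples at a time, rather than one sample per round, also reflects the survey's point that one-by-one querying does not suit deep models.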

Pipeline inconsistency

The processing pipelines of AL and DL are inconsistent. Active learning algorithms focus primarily on training classifiers, so their query strategies are largely based on fixed feature representations. In deep learning, by contrast, feature learning and classifier training are jointly optimised. The researchers believe that merely fine-tuning DL models within the AL framework, or treating the two as separate problems, may cause divergent issues.

Applications Of Deep Active Learning

Research in DAL is primarily focused on problems in image processing, though DAL is slowly finding its way into NLP as well.

When it comes to computer vision tasks, DAL deals with how to efficiently manage query samples of high-dimensional data and cut down labelling costs. One approach assigns pseudo-labels to the samples the model predicts with high confidence and adds them to the set of highly uncertain samples queried using the uncertainty-based AL method. The expanded training set is then used to train the DAL image classifier.
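The pseudo-labelling idea described above can be sketched roughly as follows. This is an illustrative simplification under stated assumptions, not the survey's exact procedure: the `split_pool` helper, the entropy-based ranking, and the `confidence_threshold` of 0.95 are all hypothetical choices.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; higher means the model is less sure."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def split_pool(pool_probs, query_k, confidence_threshold=0.95):
    """Split an unlabeled pool into samples to query and samples to pseudo-label.

    pool_probs maps a sample id to the model's class probabilities.
    Returns (to_query, pseudo_labeled):
      to_query       -- ids of the query_k most uncertain samples (sent to humans)
      pseudo_labeled -- {id: predicted class} for remaining high-confidence samples
    """
    ranked = sorted(pool_probs, key=lambda s: entropy(pool_probs[s]), reverse=True)
    to_query = ranked[:query_k]
    pseudo_labeled = {}
    for sample_id in ranked[query_k:]:
        probs = pool_probs[sample_id]
        best_class = max(range(len(probs)), key=probs.__getitem__)
        # Only trust the model's own label when it is very confident.
        if probs[best_class] >= confidence_threshold:
            pseudo_labeled[sample_id] = best_class
    return to_query, pseudo_labeled


# Hypothetical model outputs for four unlabeled images, two classes.
pool = {
    "img_a": [0.98, 0.02],
    "img_b": [0.55, 0.45],
    "img_c": [0.70, 0.30],
    "img_d": [0.01, 0.99],
}
to_query, pseudo = split_pool(pool, query_k=1)
print(to_query)  # ['img_b']
print(pseudo)    # {'img_a': 0, 'img_d': 1}
```

Samples such as `img_c`, which are neither uncertain enough to query nor confident enough to pseudo-label, simply stay in the pool for a later round. The human labels and the pseudo-labels together form the expanded training set.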

The survey stated that object detection and semantic segmentation could also benefit greatly from DAL. For instance, autonomous driving and medical image processing projects are limited by high sample labelling costs. The lower labelling cost of DAL fits quite well here.

The researchers listed gene expression, robotics, wearable devices, data analysis, social networking and ECG signal analysis as a few of the other areas where deep active learning has started to flourish.

Going forward, the researchers believe that task independence is an important research direction, as it would make DAL models more directly and widely extensible to other tasks. However, they also admit that research here remains insufficient, as the corresponding DAL methods tend to focus only on uncertainty-based selection.

Check the complete survey here.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
