How Reinforcement Learning Can Help In Data Valuation

It is well established that machine learning models perform better with well-curated, large-scale data. However, collecting and curating that data is one of the biggest challenges right now. There are billion-dollar companies that set up shop with the sole purpose of annotating data. The whole data collection process is so tedious that it has become profitable for a few. But all of this happens before the data arrives at an ML pipeline, and noise can creep into high-quality datasets too.

This noise can fall under the following categories:

  1. Human labelling errors
  2. Inputs from different locations or times
  3. Noisy capturing hardware

But how does one evaluate the value of a single datum (a single piece of information)? Training a model on the entire dataset and using its prediction performance as the value of the dataset as a whole is straightforward. However, evaluating the value of a specific datum can be tricky on large-scale datasets.

In this regard, data valuation has grown into a widely researched space. Metrics like Shapley values are used to decide which parts of the data should be given more priority.
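To make the idea of a Shapley-style data value concrete, here is a minimal sketch that estimates each training point's value as its average marginal contribution to validation accuracy over random orderings of the data. The toy 1-D dataset, the 1-nearest-neighbour utility and the names `knn_accuracy` and `monte_carlo_shapley` are assumptions made for illustration; they are not taken from the paper or from any library.

```python
import random

def knn_accuracy(train, valid):
    """Utility: validation accuracy of a 1-nearest-neighbour classifier."""
    if not train:
        return 0.0  # assumption: an empty training set has zero utility
    correct = 0
    for x, y in valid:
        _, pred = min(train, key=lambda p: abs(p[0] - x))
        correct += pred == y
    return correct / len(valid)

def monte_carlo_shapley(train, valid, n_permutations=200, seed=0):
    """Average each point's marginal contribution over random orderings."""
    rng = random.Random(seed)
    values = [0.0] * len(train)
    for _ in range(n_permutations):
        order = list(range(len(train)))
        rng.shuffle(order)
        subset, prev = [], knn_accuracy([], valid)
        for i in order:
            subset.append(train[i])
            score = knn_accuracy(subset, valid)
            values[i] += (score - prev) / n_permutations
            prev = score
    return values

# Toy data: (feature, label). The last training point is deliberately mislabelled.
train = [(0.0, 0), (0.2, 0), (0.9, 1), (1.1, 1), (0.12, 1)]
valid = [(0.05, 0), (0.15, 0), (0.95, 1), (1.05, 1)]

values = monte_carlo_shapley(train, valid)
for point, value in zip(train, values):
    print(point, round(value, 3))
```

In this toy set-up the mislabelled point should end up with the lowest value, because adding it to the training set tends to hurt validation accuracy. The catch is cost: every permutation retrains the model once per training point, which is what makes this family of methods hard to scale.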

Now, to make data valuation robust and to learn data values adaptively and jointly with the target task predictor model, researchers from Google Cloud AI and UCLA proposed a meta-learning framework called Data Valuation using Reinforcement Learning (DVRL).

Overview Of DVRL

DVRL is built around a data value estimator that learns how likely each datum is to be used in training the predictor model. The estimator is trained with a reinforcement signal: a reward computed on a small validation set that reflects performance on the target task.

Instead of treating all data samples equally, this work suggests giving certain samples lower priority to obtain a higher-performing model.

Step by step procedure of data valuation using DVRL:

  1. A batch of training samples is fed to the data value estimator, which outputs selection probabilities that define a multinomial distribution
  2. Based on this multinomial distribution, the sampler returns a selection vector
  3. The target task predictor model is trained on the samples picked by the selection vector using gradient-descent optimisation
  4. The data values are nothing but the selection probabilities, which rank the samples according to their importance
  5. The loss is evaluated on a small validation set and compared to the moving average of the previous losses to determine the reward
  6. Going by this reward, the reinforcement signal updates the data value estimator
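The loop above can be sketched in a few dozen lines of plain Python. This is a toy illustration under several assumptions, not the authors' implementation: the data value estimator is a small logistic model over hand-picked features (`estimator_features` is hypothetical), each sample is selected independently with its own probability, the predictor is a 1-D logistic regression retrained from scratch on every batch, and the estimator is updated with a basic REINFORCE-style rule.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def estimator_features(x, y):
    # Hypothetical features: the estimator sees both the input and its label.
    sign = 1.0 if y == 1 else -1.0
    return [1.0, x, sign, x * sign]

def train_predictor(samples, steps=50, lr=0.5):
    """Fit a 1-D logistic regression on the selected samples by gradient descent."""
    w, b = 0.0, 0.0
    if not samples:
        return w, b
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in samples:
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(samples)
        b -= lr * gb / len(samples)
    return w, b

def validation_loss(model, valid):
    """Mean cross-entropy of the predictor on the validation set."""
    w, b = model
    total = 0.0
    for x, y in valid:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
    return total / len(valid)

def dvrl_sketch(train, valid, iterations=200, batch_size=16, lr=0.1, window=20, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0, 0.0]  # parameters of the data value estimator
    recent_losses = []
    for _ in range(iterations):
        batch = rng.sample(train, min(batch_size, len(train)))
        feats = [estimator_features(x, y) for x, y in batch]
        probs = [sigmoid(dot(theta, f)) for f in feats]            # step 1
        select = [1 if rng.random() < p else 0 for p in probs]     # step 2
        chosen = [d for d, s in zip(batch, select) if s]
        loss = validation_loss(train_predictor(chosen), valid)     # steps 3 and 5
        baseline = sum(recent_losses) / len(recent_losses) if recent_losses else loss
        reward = baseline - loss  # positive when loss beats the moving average
        for f, p, s in zip(feats, probs, select):                  # step 6
            for k in range(len(theta)):
                theta[k] += lr * reward * (s - p) * f[k]
        recent_losses = (recent_losses + [loss])[-window:]
    # Step 4: the learned selection probabilities serve as the data values.
    return [sigmoid(dot(theta, estimator_features(x, y))) for x, y in train]

# Toy data: label is 1 when x > 0; every fifth training label is flipped.
data_rng = random.Random(1)

def make_point(flip):
    x = data_rng.uniform(-1, 1)
    y = 1 if x > 0 else 0
    return (x, 1 - y) if flip else (x, y)

train = [make_point(flip=(i % 5 == 0)) for i in range(60)]
valid = [make_point(flip=False) for _ in range(20)]
values = dvrl_sketch(train, valid)
```

The moving-average baseline is what turns a raw validation loss into a usable reward: a selection that does better than recent history is reinforced, and one that does worse is discouraged. On such a small problem the learned values can be noisy, so the output is best read as a demonstration of the mechanics rather than of the method's accuracy.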

In short, DVRL integrates data valuation with the training of the target task predictor model and determines a reward by quantifying the performance, which in turn is used as a reinforcement signal to learn the likelihood of each datum being used in training of the predictor model.

Experiments show that DVRL consistently outperforms benchmarks such as Data Shapley, leave-one-out (LOO) and Influence Function. The researchers state that the noisy label discovery trend for DVRL can be very close to optimal, particularly on the Adult, CIFAR-10 and Flower datasets.
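One plausible way to read "noisy label discovery" is: sort the samples from lowest to highest data value, inspect the lowest-valued fraction, and count how many of the known-noisy samples were caught. The sketch below assumes that reading; the function name `discovered_fraction` and the example numbers are made up for illustration and are not results from the paper.

```python
def discovered_fraction(values, is_noisy, inspect_fraction):
    """Share of noisy samples found when inspecting the lowest-valued fraction of the data."""
    n_inspect = int(round(inspect_fraction * len(values)))
    order = sorted(range(len(values)), key=lambda i: values[i])
    found = sum(is_noisy[i] for i in order[:n_inspect])
    total = sum(is_noisy)
    return found / total if total else 0.0

# Made-up data values for ten samples, two of which are known to be mislabelled.
values = [0.9, 0.1, 0.8, 0.2, 0.7, 0.6, 0.3, 0.95, 0.85, 0.75]
is_noisy = [False, True, False, True, False, False, False, False, False, False]

for fraction in (0.1, 0.2, 0.5):
    print(fraction, discovered_fraction(values, is_noisy, fraction))
```

An optimal valuation would rank every noisy sample below every clean one, so the curve would reach 1.0 as soon as the inspected fraction equals the noise rate. A random ranking would find noisy samples only in proportion to the fraction inspected.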

Key Takeaways

This work not only provides a framework for data valuation but also posits the significance of data valuation in suggesting better practices for data collection. For organisations that sell data, data valuation frameworks can help determine the correct value-based pricing of data subsets. The authors wrote that it could enable new possibilities for constructing very large-scale training datasets in a much cheaper way.

According to the authors, the main contributions of this work can be summarised as follows:

  • A novel meta-learning framework for data valuation that is jointly optimised with the target task predictor model
  • A demonstration that DVRL significantly outperforms competing methods on many image, tabular and language datasets
  • DVRL, unlike previous methods, is scalable to large datasets and complex models, and its computational complexity is not directly dependent on the size of the training set

Link to paper.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.