
How Reinforcement Learning Can Help In Data Valuation

It is well established that machine learning models perform better with well-curated, large-scale data. However, collecting and curating that data is one of the biggest challenges right now. There are billion-dollar companies that set up shop with the sole purpose of annotating data. The whole data collection process is so tedious that it has become profitable for a few. But all of this concerns what happens before data arrives at an ML pipeline. Noise can creep into high-quality datasets too.

That noise typically comes from one of the following sources:


  1. Human labelling errors
  2. Inputs collected at different locations or times
  3. Noisy capturing hardware

But how does one evaluate the value of a single datum? Training a model on the entire dataset and using its prediction performance gives a straightforward value for the dataset as a whole. However, isolating the value of one specific sample is tricky on large-scale datasets.

As a result, data valuation has grown into a widely researched area. Metrics like Shapley values are used to decide which parts of the data should be given more priority.
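To make the Shapley idea concrete, here is a minimal sketch in Python that computes exact data Shapley values for a five-point toy dataset. The 1-nearest-neighbour utility, the choice of 0.0 as the utility of an empty training set, the toy data, and the function names `utility` and `shapley_values` are all assumptions made for illustration, not something taken from the paper. Exact enumeration over every ordering is only feasible for a handful of points.

```python
from itertools import permutations


def utility(train, valid):
    """Validation accuracy of a 1-nearest-neighbour classifier.

    Assumption of this sketch: an empty training set has utility 0.0.
    """
    if not train:
        return 0.0
    correct = 0
    for x, y in valid:
        nearest = min(train, key=lambda p: abs(p[0] - x))
        correct += nearest[1] == y
    return correct / len(valid)


def shapley_values(train, valid):
    """Average each point's marginal contribution over all orderings."""
    n = len(train)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        subset = []
        previous = utility(subset, valid)
        for i in order:
            subset.append(train[i])
            current = utility(subset, valid)
            totals[i] += current - previous
            previous = current
    return [t / len(orderings) for t in totals]


# Toy 1-D data as (feature, label); the last training point is mislabelled.
train = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1), (0.5, 1)]
valid = [(0.2, 0), (0.8, 0), (4.2, 1), (4.8, 1)]

values = shapley_values(train, valid)
for point, value in zip(train, values):
    print(point, round(value, 4))
```

In this toy setup the deliberately mislabelled point receives the lowest value, and the values add up to the utility of the full training set. The cost grows factorially with the number of points, which is why practical methods rely on approximations.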

To make data valuation robust and to learn data values jointly with the target task predictor model, researchers from Google Cloud AI and UCLA proposed a meta-learning framework called Data Valuation using Reinforcement Learning (DVRL).

Overview Of DVRL

DVRL is a data value estimator that learns how likely each datum is to be used in training the predictor model. The estimator is trained with a reinforcement signal: a reward obtained on a small validation set, which reflects performance on the target task.

Instead of treating all data samples equally, this work suggests giving some samples a lower priority in order to obtain a higher-performing model.

Step-by-step procedure of data valuation using DVRL:

  1. A batch of training samples is fed to the data value estimator, which outputs selection probabilities that parameterise a multinomial distribution
  2. Based on this multinomial distribution, the sampler returns a selection vector
  3. The target task predictor model is trained on the samples picked by the selection vector, using gradient-descent optimisation
  4. The data values are simply the selection probabilities, which rank the samples according to their importance
  5. The loss is evaluated on a small validation set and compared to the moving average of the previous losses to determine the reward
  6. Based on this reward, the reinforcement signal updates the data value estimator
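The steps above can be sketched in a few dozen lines of plain Python. This is a toy illustration under several assumptions, not the authors' implementation: the estimator is a two-parameter logistic model over a hand-picked feature (`features` is hypothetical), the sampler draws each sample independently with its selection probability, the predictor is a 1-D logistic regression retrained from scratch in every iteration, and the REINFORCE-style update, hyperparameters and synthetic data are my own choices.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(x, y):
    # Hypothetical estimator input: a bias term and the feature signed by the label.
    return (1.0, x * (2 * y - 1))


def data_values(phi, data):
    # Step 1: the estimator maps each (x, y) pair to a selection probability.
    return [sigmoid(sum(w * f for w, f in zip(phi, features(x, y)))) for x, y in data]


def train_predictor(samples, steps=30, lr=0.5):
    # Step 3: logistic regression fitted by gradient descent on the selected samples.
    w, b = 0.0, 0.0
    for _ in range(steps):
        if not samples:
            break
        gw = gb = 0.0
        for x, y in samples:
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(samples)
        b -= lr * gb / len(samples)
    return w, b


def validation_loss(model, valid):
    # Step 5 (first half): mean cross-entropy on the small validation set.
    w, b = model
    total = 0.0
    for x, y in valid:
        p = min(max(sigmoid(w * x + b), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(valid)


def dvrl(train, valid, iterations=300, lr=1.0, window=20, seed=0):
    rng = random.Random(seed)
    phi = [0.0, 0.0]          # estimator parameters
    history = []              # past validation losses for the moving average
    for _ in range(iterations):
        probs = data_values(phi, train)
        # Step 2: selection vector (independent draws, an assumption of this sketch).
        select = [1 if rng.random() < p else 0 for p in probs]
        chosen = [d for d, s in zip(train, select) if s]
        loss = validation_loss(train_predictor(chosen), valid)
        recent = history[-window:]
        baseline = sum(recent) / len(recent) if recent else loss
        history.append(loss)
        advantage = loss - baseline   # step 5 (second half): compare to moving average
        # Step 6: REINFORCE-style update that lowers the expected validation loss.
        for (x, y), s, p in zip(train, select, probs):
            for k, f in enumerate(features(x, y)):
                phi[k] -= lr * advantage * (s - p) * f / len(train)
    return data_values(phi, train)    # step 4: data values are the probabilities


def make_data(n, flip_prob, rng):
    # Synthetic 1-D data: the true label is the sign of x; some labels are flipped.
    data, flipped = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.uniform(1.0, 3.0) * (1 if y == 1 else -1)
        noisy = rng.random() < flip_prob
        data.append((x, 1 - y if noisy else y))
        flipped.append(noisy)
    return data, flipped


rng = random.Random(1)
train, flipped = make_data(40, 0.3, rng)
valid, _ = make_data(20, 0.0, rng)
values = dvrl(train, valid)
for value, noisy in sorted(zip(values, flipped))[:5]:
    print(f"value={value:.3f} label_flipped={noisy}")
```

In expectation, the update lowers the selection probability of samples whose inclusion raises the validation loss, so label-flipped points should tend to drift towards the bottom of the ranking. A single short run is noisy, though, and results will vary with the seed and hyperparameters.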

In short, DVRL integrates data valuation with the training of the target task predictor model and determines a reward by quantifying the performance, which in turn is used as a reinforcement signal to learn the likelihood of each datum being used in training of the predictor model.

In the reported experiments, DVRL outperforms benchmarks such as Data Shapley, leave-one-out (LOO) and Influence Function. The researchers state that the noisy-label discovery trend for DVRL can be very close to optimal, particularly for the Adult, CIFAR-10 and Flower datasets.
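A noisy-label discovery trend of this kind can be measured by inspecting samples from the lowest data value upwards and counting how many corrupted labels have been found so far. The sketch below assumes that procedure; the function name `discovery_curve`, the inspection fractions and the ten example values are hypothetical and are not results from the paper.

```python
def discovery_curve(values, is_noisy, fractions=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Fraction of noisy labels found after inspecting the lowest-valued samples."""
    total_noisy = sum(is_noisy)
    if total_noisy == 0:
        raise ValueError("no noisy samples to discover")
    # Inspect samples in order of increasing data value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    curve = []
    for frac in fractions:
        k = round(frac * len(values))
        found = sum(is_noisy[i] for i in order[:k])
        curve.append(found / total_noisy)
    return curve


# Hypothetical data values and ground-truth noise flags for ten samples.
values = [0.9, 0.1, 0.8, 0.2, 0.7, 0.95, 0.3, 0.85, 0.6, 0.75]
is_noisy = [0, 1, 0, 1, 0, 0, 0, 0, 1, 0]

curve = discovery_curve(values, is_noisy)
print(curve)
```

With these made-up numbers, all three noisy samples are recovered after inspecting the lowest-valued 40% of the data. An optimal valuation would find them within the first 30%, and a random ordering would find roughly 40% of them at that point.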

Key Takeaways

This work not only provides a framework for data valuation but also posits the significance of data valuation in suggesting better practices for data collection. For organisations that sell data, data valuation frameworks can help determine the correct value-based pricing of data subsets. The authors wrote that it could enable new possibilities for constructing very large-scale training datasets in a much cheaper way.

According to the authors, the main contributions of this work can be summarised as follows:

  • A novel meta-learning framework for data valuation that is optimised jointly with the target task predictor model
  • A demonstration that DVRL significantly outperforms competing methods on many image, tabular and language datasets
  • DVRL, unlike previous methods, is scalable to large datasets and complex models, and its computational complexity is not directly dependent on the size of the training set

Link to paper.


Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
