How Reinforcement Learning Can Help In Data Valuation

It is well established that machine learning models perform better with large, well-curated datasets. However, collecting and curating data remains one of the field's biggest challenges. Billion-dollar companies like Scale.ai have been built around the sole purpose of annotating data; the process is so tedious that it has become a profitable business in its own right. But all of this happens before data even arrives at an ML pipeline. Noise can creep into high-quality datasets too.

Such noise can fall under the following categories:

  1. Human labelling errors
  2. Inputs collected at different locations or times
  3. Noisy capture hardware

But how does one evaluate the value of a single datum? A straightforward approach is to train a model on the entire dataset, train it again with one datum removed, and treat the difference in prediction performance as that datum's value. However, retraining once per datum quickly becomes intractable on large-scale datasets.
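
To make this concrete, here is a minimal leave-one-out sketch. The dataset, model and evaluation metric are illustrative placeholders, not anything used in the paper:

```python
# Minimal leave-one-out (LOO) data valuation sketch.
# Dataset, model, and metric are illustrative placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=200, random_state=0)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

def fit_score(X_tr, y_tr):
    """Train on (X_tr, y_tr) and return validation accuracy."""
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return accuracy_score(y_val, model.predict(X_val))

base = fit_score(X_train, y_train)

# Value of datum i = performance drop when i is removed.
# One full retraining per datum: O(n) trainings, which is why
# plain leave-one-out does not scale to large datasets.
loo_values = np.array([
    base - fit_score(np.delete(X_train, i, axis=0), np.delete(y_train, i))
    for i in range(len(X_train))
])
```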


Against this backdrop, data valuation has grown into a widely researched area. Metrics such as Shapley values are used to decide which parts of the data should be given more priority.
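
As a rough illustration of how data Shapley values are estimated in practice, here is a hedged Monte Carlo sketch that averages each datum's marginal contribution over random orderings of the training set. It reuses the fit_score helper from the previous sketch; the permutation count and minimum subset size are arbitrary choices, not values from the paper:

```python
# Hedged Monte Carlo sketch of data Shapley valuation.
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_shapley(X_tr, y_tr, n_perms=10, min_size=5):
    """Approximate data Shapley values by averaging marginal
    contributions over random permutations of the training set."""
    n = len(X_tr)
    values = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        # Crude truncation: the first fitted subset's full score
        # is credited to its last datum.
        prev = 0.0
        for k in range(min_size, n + 1):
            if len(np.unique(y_tr[perm[:k]])) < 2:
                continue  # need both classes to fit the classifier
            curr = fit_score(X_tr[perm[:k]], y_tr[perm[:k]])
            # Marginal contribution of the k-th datum in this ordering.
            values[perm[k - 1]] += curr - prev
            prev = curr
    return values / n_perms
```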

Now, to make this whole process of data valuation robust, and to facilitate adaptive learning of data values jointly with the target-task predictor model, researchers from Google Cloud AI and UCLA proposed a meta-learning framework: Data Valuation using Reinforcement Learning (DVRL).


Overview Of DVRL

DVRL is a data value estimator that learns how likely each datum is to be used in training the predictor model. The estimator is trained with a reinforcement signal: a reward computed on a small validation set that reflects performance on the target task.

Instead of treating all data samples equally, this work suggests assigning lower priority to certain samples in order to obtain a higher-performing model.
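
The data value estimator is a learned function mapping each sample to a selection probability. Below is a hedged PyTorch sketch of such an estimator; the MLP architecture, hidden width, and the choice of concatenating input features with a one-hot label are illustrative assumptions, not the authors' exact design:

```python
# Hedged sketch of a DVRL-style data value estimator.
import torch
import torch.nn as nn

class DataValueEstimator(nn.Module):
    def __init__(self, x_dim, y_dim, hidden=100):
        super().__init__()
        # Small MLP over concatenated features and one-hot label
        # (an assumed architecture for illustration).
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y_onehot):
        # Selection probability in (0, 1) for each sample in the batch.
        logits = self.net(torch.cat([x, y_onehot], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)
```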

Step-by-step procedure of data valuation using DVRL (a code sketch of this loop follows the list):

  1. A batch of training samples is fed to DVRL, which outputs selection probabilities that define a multinomial distribution.
  2. Based on this multinomial distribution, the sampler returns a binary selection vector.
  3. The target-task predictor model is trained only on the samples chosen by the selection vector, using gradient-descent optimisation.
  4. The data values are simply the selection probabilities, which rank the samples according to their importance.
  5. The loss is evaluated on a small validation set and compared with the moving average of previous losses to determine the reward.
  6. Guided by this reward, the reinforcement signal updates the data value estimator.
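
Putting these steps together, here is a hedged sketch of a single DVRL training iteration using REINFORCE with a moving-average baseline. Here, estimator is the DataValueEstimator sketched earlier, and train_predictor and val_loss are assumed helpers that fit the target model on the selected samples and return the current validation loss; per-sample Bernoulli sampling is one simple realisation of the selection step:

```python
import torch

# Assumed: `estimator` is the DataValueEstimator sketched above.
est_opt = torch.optim.Adam(estimator.parameters(), lr=1e-3)
baseline, ema = None, 0.9  # moving average of past validation losses

def dvrl_step(x_batch, y_onehot, train_predictor, val_loss):
    global baseline
    # Steps 1-2: selection probabilities -> sampled binary selection vector.
    probs = estimator(x_batch, y_onehot)
    sel = torch.bernoulli(probs)  # sampling detaches from the graph

    # Step 3: train the predictor only on the selected samples.
    train_predictor(x_batch[sel.bool()], y_onehot[sel.bool()])

    # Step 5: reward = improvement of validation loss over its moving average.
    loss_val = val_loss()
    reward = 0.0 if baseline is None else baseline - loss_val
    baseline = (loss_val if baseline is None
                else ema * baseline + (1 - ema) * loss_val)

    # Step 6: REINFORCE update on the log-probability of the sampled
    # selection vector, weighted by the reward.
    log_prob = (sel * torch.log(probs + 1e-8)
                + (1 - sel) * torch.log(1 - probs + 1e-8)).sum()
    est_opt.zero_grad()
    (-reward * log_prob).backward()
    est_opt.step()

    # Step 4: the selection probabilities themselves are the data values.
    return probs.detach()
```

In the paper, the predictor and the estimator are updated jointly over many such iterations; this sketch isolates a single update for clarity.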

In short, DVRL integrates data valuation with the training of the target-task predictor model, quantifies performance on a small validation set to determine a reward, and uses that reward as a reinforcement signal to learn the likelihood of each datum being used in training the predictor model.

Experiments show that DVRL consistently outperforms benchmark methods such as Data Shapley, leave-one-out (LOO) and influence functions. The researchers report that DVRL's noisy-label discovery is very close to optimal, particularly on the Adult, CIFAR-10 and Flower datasets.

Key Takeaways

This work not only provides a framework for data valuation but also highlights its significance in suggesting better practices for data collection. For organisations that sell data, data valuation frameworks can help determine the correct value-based pricing of data subsets. The authors write that it could enable new possibilities for constructing very large-scale training datasets much more cheaply.

According to the authors, the main contributions of this work can be summarised as follows:

  • A novel meta-learning framework for data valuation that is optimised jointly with the target-task predictor model
  • A demonstration that DVRL significantly outperforms competing methods on many image, tabular and language datasets
  • Unlike previous methods, DVRL is scalable to large datasets and complex models, and its computational complexity is not directly dependent on the size of the training set

Link to paper.


