How Reinforcement Learning Can Help In Data Valuation

It is well established that machine learning models perform better with well-curated, large-scale data. However, collecting and curating that data is one of the biggest challenges right now. Billion-dollar companies like Scale.ai have set up shop with the sole purpose of annotating data; the whole process is so tedious that it has become profitable for a few. But all of this happens before data even arrives at an ML pipeline, and noise can creep into high-quality datasets too.

This noise can come from several sources:

  1. Human labelling errors
  2. Inputs collected from different locations or times
  3. Noisy capturing hardware

But how does one evaluate the value of a single datum? Valuing a whole dataset is straightforward: train a model on it and use its prediction performance as the value. Attributing value to an individual sample, however, is tricky, because the naive approach requires retraining the model once per sample, which quickly becomes infeasible on large-scale datasets.
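
The leave-one-out (LOO) version of this naive approach fits in a few lines. Below is a minimal sketch; the synthetic data and logistic-regression model are illustrative choices, not anything prescribed by the paper:

```python
# Minimal sketch: leave-one-out (LOO) value of a single training point.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

def val_accuracy(X_tr, y_tr):
    # Train on the given subset, evaluate on the held-out validation set.
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return model.score(X_val, y_val)

full_score = val_accuracy(X_train, y_train)

# Value of datum i = drop in validation performance when it is removed.
i = 0
loo_score = val_accuracy(np.delete(X_train, i, axis=0), np.delete(y_train, i))
print(f"LOO value of sample {i}: {full_score - loo_score:+.4f}")
```

Repeating this for every sample means retraining the model n times, which is exactly why per-datum valuation breaks down at scale.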

In response, data valuation has grown into a widely researched area. Metrics such as Shapley values are used to decide which parts of the data should be given more priority.
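
Exact Shapley values require evaluating every subset of the data, so in practice they are estimated by Monte Carlo sampling over random permutations, in the spirit of Data Shapley (Ghorbani & Zou, 2019). The sketch below makes illustrative assumptions (a small synthetic dataset, a simple model, a fixed sampling budget):

```python
# Minimal sketch: Monte Carlo estimate of a Data Shapley value.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

def score(idx):
    # Validation accuracy of a model trained on subset `idx`
    # (0 when the subset does not contain both classes).
    if len(set(y_tr[idx])) < 2:
        return 0.0
    model = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
    return model.score(X_val, y_val)

def shapley_value(i, n_perms=50, seed=0):
    # Average marginal contribution of sample `i` over random permutations.
    rng = np.random.default_rng(seed)
    n, total = len(y_tr), 0.0
    for _ in range(n_perms):
        perm = list(rng.permutation(n))
        before = perm[:perm.index(i)]
        total += score(before + [i]) - score(before)
    return total / n_perms

print(f"Monte Carlo Shapley value of sample 0: {shapley_value(0):+.4f}")
```

Even with sampling, each permutation still involves many retrainings, which motivates a learned, amortised alternative.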

To make this whole process of data valuation robust, and to learn data values adaptively and jointly with the target task predictor model, researchers from Google Cloud AI and UCLA proposed a meta-learning framework: Data Valuation using Reinforcement Learning (DVRL).

Overview Of DVRL

DVRL trains a data value estimator that learns how likely each datum is to be used in training the predictor model. The estimator is trained with a reinforcement signal: a reward computed on a small validation set, which reflects how well the predictor performs on the target task.
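
Concretely, the estimator can be thought of as a small network that maps a sample (its features together with its label) to a selection probability. The sketch below is a minimal PyTorch version; the architecture and names are illustrative assumptions, not the exact network from the paper:

```python
# A minimal sketch of a data value estimator: an MLP that maps a sample
# (features concatenated with a one-hot label) to a probability in [0, 1].
import torch
import torch.nn as nn

class DataValueEstimator(nn.Module):
    def __init__(self, x_dim, y_dim, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # selection probability
        )

    def forward(self, x, y_onehot):
        # One selection probability per sample in the batch.
        return self.net(torch.cat([x, y_onehot], dim=1)).squeeze(-1)
```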

Instead of treating all data samples equally, this work suggests assigning lower priority to certain samples in order to obtain a higher-performing model.

Step-by-step procedure of data valuation using DVRL (a condensed sketch follows the list):

  1. A batch of training samples is fed to DVRL, which outputs selection probabilities that define a multinomial distribution
  2. Based on this distribution, a sampler returns a binary selection vector
  3. The target task predictor model is trained on the selected samples using gradient-descent optimisation
  4. The data values are simply these selection probabilities, which rank the samples according to their importance
  5. The loss is evaluated on a small validation set and compared to a moving average of previous losses to determine the reward
  6. Based on this reward, the reinforcement signal updates the data value estimator
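
Here is a condensed sketch of one such outer iteration, assuming a PyTorch classifier as predictor and the DataValueEstimator sketched above. The per-sample Bernoulli sampling of the selection vector and the single inner gradient step are simplifying assumptions:

```python
# One DVRL outer step: sample a selection vector, update the predictor on
# the selected samples, compute a baselined validation reward, and apply
# a REINFORCE update to the estimator.
import torch
import torch.nn.functional as F

def dvrl_step(estimator, predictor, opt_est, opt_pred,
              x, y, y_onehot, x_val, y_val, baseline, ema=0.9):
    # Steps 1-2: selection probabilities -> sampled binary selection vector.
    probs = estimator(x, y_onehot)            # shape: (batch,)
    sel = torch.bernoulli(probs.detach())     # non-differentiable sample

    # Step 3: one gradient step on the predictor, selected samples only.
    per_sample = F.cross_entropy(predictor(x), y, reduction="none")
    pred_loss = (sel * per_sample).sum() / sel.sum().clamp(min=1)
    opt_pred.zero_grad(); pred_loss.backward(); opt_pred.step()

    # Step 5: reward = improvement of validation loss over its moving average.
    with torch.no_grad():
        val_loss = F.cross_entropy(predictor(x_val), y_val).item()
    reward = baseline - val_loss

    # Step 6: REINFORCE update of the estimator via the log-probability of
    # the sampled selection vector, weighted by the reward.
    log_prob = (sel * torch.log(probs + 1e-8)
                + (1 - sel) * torch.log(1 - probs + 1e-8)).sum()
    opt_est.zero_grad(); (-reward * log_prob).backward(); opt_est.step()

    # Step 4 is implicit: `probs` are the data values used to rank samples.
    return ema * baseline + (1 - ema) * val_loss  # updated baseline
```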

In short, DVRL integrates data valuation with the training of the target task predictor model: it quantifies the predictor's validation performance as a reward, and uses that reward as a reinforcement signal to learn the likelihood of each datum being used in training.

Experiments show that DVRL consistently outperforms benchmark methods such as Data Shapley, leave-one-out (LOO) and influence functions. The researchers report that DVRL's noisy-label discovery is close to optimal, particularly on the Adult, CIFAR-10 and Flower datasets.

Key Takeaways

This work not only provides a framework for data valuation but also highlights its significance in suggesting better practices for data collection. For organisations that sell data, data valuation frameworks can help determine value-based pricing of data subsets. The authors write that it could open up new possibilities for constructing very large-scale training datasets far more cheaply.

According to the authors, the main contributions of this work can be summarised as follows:

  • A novel meta-learning framework for data valuation that is optimised jointly with the target task predictor model
  • Demonstrations that DVRL significantly outperforms competing methods on many image, tabular and language datasets
  • Unlike previous methods, DVRL is scalable to large datasets and complex models, and its computational complexity is not directly dependent on the size of the training set

Link to paper.

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.