It is well established that machine learning models perform better with large-scale, well-curated data. However, collecting and curating that data is one of the biggest challenges in the field right now. There are billion-dollar companies, such as Scale.ai, that were set up with the sole purpose of annotating data. The whole data collection process is so tedious that it has become a profitable business for a few. But all of this concerns what happens before data arrives at an ML pipeline. Noise can creep into high-quality datasets too.
Such noisy information can fall under the following categories:
- Human labelling errors
- Inputs collected at different locations or times
- Noisy capturing hardware
But how does one evaluate the value of a single datum (a single piece of information)? Valuing a whole dataset is straightforward: train a model on it and use the model's prediction performance as the value. Evaluating the value of one specific sample, however, is tricky on large-scale datasets.
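To see why per-sample valuation gets expensive, consider leave-one-out (LOO), a simple baseline: retrain without one sample and record how much the validation score changes. The sketch below is only an illustration under stated assumptions. The nearest-centroid classifier and the function names (`fit_centroids`, `accuracy`, `leave_one_out_values`) are hypothetical choices made here for brevity and are not taken from the DVRL work.

```python
import numpy as np

def fit_centroids(x, y):
    """Nearest-centroid classifier: one mean vector per class (toy model, assumed here)."""
    classes = np.unique(y)
    return classes, np.stack([x[y == c].mean(axis=0) for c in classes])

def accuracy(model, x, y):
    """Fraction of samples whose closest centroid carries the correct label."""
    classes, centroids = model
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(classes[dists.argmin(axis=1)] == y))

def leave_one_out_values(x_train, y_train, x_val, y_val):
    """Value of sample i = validation accuracy with all data minus accuracy without i."""
    full = accuracy(fit_centroids(x_train, y_train), x_val, y_val)
    values = np.zeros(len(x_train))
    for i in range(len(x_train)):
        keep = np.arange(len(x_train)) != i          # drop exactly one sample
        without_i = accuracy(fit_centroids(x_train[keep], y_train[keep]), x_val, y_val)
        values[i] = full - without_i                 # negative: removing i helps
    return values
```

A sample with a negative value is one whose removal improves validation accuracy, which is the typical signature of a mislabelled point. The loop retrains the model once per training sample, so the cost grows linearly with dataset size even for this trivial model.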
In response, data valuation has grown into a widely researched area. Metrics such as Shapley values are used to decide which parts of the data should be given more priority.
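As a rough illustration of how a Shapley-style data value can be estimated, the sketch below averages each sample's marginal contribution to validation accuracy over random orderings of the training set. It is a minimal Monte Carlo approximation under assumptions made here: the nearest-centroid `utility` function, the convention that an empty training set has utility 0.0, and the name `monte_carlo_shapley` are all hypothetical and not from the article or the paper.

```python
import numpy as np

def utility(x, y, x_val, y_val):
    """Validation accuracy of a nearest-centroid classifier; 0.0 for an empty set (assumed)."""
    if len(y) == 0:
        return 0.0
    classes = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(x_val[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(classes[dists.argmin(axis=1)] == y_val))

def monte_carlo_shapley(x, y, x_val, y_val, n_perms=200, seed=0):
    """Average marginal contribution of each sample over random permutations."""
    rng = np.random.default_rng(seed)
    n = len(y)
    values = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev = utility(x[:0], y[:0], x_val, y_val)   # utility of the empty set
        for k, i in enumerate(perm):
            subset = perm[: k + 1]                   # samples seen so far, including i
            cur = utility(x[subset], y[subset], x_val, y_val)
            values[i] += cur - prev                  # marginal gain from adding i
            prev = cur
    return values / n_perms
```

Within each permutation the marginal gains telescope, so the values sum to the utility of the full training set minus that of the empty set. Every permutation needs one model fit per sample, which is why this family of methods becomes costly on large datasets and complex models.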
To make data valuation robust and to learn data values adaptively, jointly with the target task predictor model, researchers from Google Cloud AI and UCLA proposed a meta-learning framework called Data Valuation using Reinforcement Learning (DVRL).
Overview Of DVRL
At the core of DVRL is a data value estimator that learns how likely each datum is to be used in training the predictor model. The estimator is trained with a reinforcement signal: a reward obtained on a small validation set, which reflects performance on the target task.
Instead of treating all data samples equally, this work suggests giving some samples lower priority in order to obtain a higher-performing model.
Step-by-step procedure of data valuation using DVRL:
- A batch of training samples is fed to the data value estimator, which outputs the selection probabilities of a multinomial distribution
- Based on this multinomial distribution, the sampler returns the selection vector
- The target task predictor model is trained on the samples picked by the selection vector, using gradient-descent optimisation
- The data values are nothing but the selection probabilities, which rank the samples according to their importance
- The predictor's loss is evaluated on a small validation set and compared to the moving average of the previous losses to determine the reward
- This reward is the reinforcement signal used to update the data value estimator
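The steps above can be sketched as a small training loop. This is a simplified illustration, not the authors' implementation, and it makes several assumptions that are not in the article: the predictor is a logistic regression retrained from scratch on every batch, the data value estimator is a single linear layer with a sigmoid over hand-picked features, each selection bit is drawn independently from its probability, and the update is a plain REINFORCE step. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_predictor(x, y, sel, steps=100, lr=0.5):
    """Logistic regression trained by gradient descent on the selected samples only."""
    w = np.zeros(x.shape[1])
    n = max(sel.sum(), 1.0)                      # avoid dividing by zero if nothing is selected
    for _ in range(steps):
        w -= lr * x.T @ (sel * (sigmoid(x @ w) - y)) / n
    return w

def val_loss(w, x, y):
    """Mean cross-entropy on the validation set."""
    p = np.clip(sigmoid(x @ w), 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def dvrl_sketch(x_tr, y_tr, x_val, y_val, iters=200, batch=32,
                lr=0.1, window=20, seed=0):
    rng = np.random.default_rng(seed)
    # Estimator inputs (assumed): features, label, their product, and a bias term.
    feats = np.column_stack([x_tr, y_tr, x_tr * y_tr[:, None], np.ones(len(x_tr))])
    theta = np.zeros(feats.shape[1])             # parameters of the data value estimator
    losses = []
    batch = min(batch, len(x_tr))
    for _ in range(iters):
        idx = rng.choice(len(x_tr), size=batch, replace=False)
        probs = sigmoid(feats[idx] @ theta)                # selection probabilities
        sel = (rng.random(batch) < probs).astype(float)    # sampled selection vector
        w = train_predictor(x_tr[idx], y_tr[idx], sel)     # train predictor on selection
        loss = val_loss(w, x_val, y_val)                   # evaluate on validation set
        baseline = np.mean(losses[-window:]) if losses else loss
        reward = baseline - loss                 # better than the moving average: positive
        # REINFORCE: gradient of log-probability of the sampled selection vector
        theta += lr * reward * (feats[idx].T @ (sel - probs))
        losses.append(loss)
    return sigmoid(feats @ theta)                # data values = selection probabilities
```

The returned array holds one value per training sample, so sorting it gives a ranking from least to most useful. In this sketch the cost per iteration depends on the batch size rather than on the size of the full training set.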
In short, DVRL integrates data valuation with the training of the target task predictor model and determines a reward by quantifying the performance, which in turn is used as a reinforcement signal to learn the likelihood of each datum being used in training of the predictor model.
According to the authors' experiments, DVRL consistently outperforms baselines such as Data Shapley, leave-one-out (LOO) and influence functions. The researchers state that DVRL's noisy-label discovery trend can be very close to optimal, particularly on the Adult, CIFAR-10 and Flower datasets.
This work not only provides a framework for data valuation but also posits the significance of data valuation in suggesting better practices for data collection. For organisations that sell data, data valuation frameworks can help determine the correct value-based pricing of data subsets. The authors wrote that it could enable new possibilities for constructing very large-scale training datasets in a much cheaper way.
According to the authors, the main contributions of this work can be summarised as follows:
- A novel meta-learning framework was proposed for data valuation that is optimised with the target task predictor model
- Demonstration of how DVRL significantly outperforms competing methods on many image, tabular and language datasets
- DVRL, unlike previous methods, is scalable to large datasets and complex models, and its computational complexity is not directly dependent on the size of the training set