Can ML Models Ever Be Devoid Of Labelling Biases?

A machine learning model can only be as biased as the humans who frame it.

Datasets have to be prepared, or collected from some source that is a by-product of human interactions. The collected data is then cleaned and labelled with classes. No matter how unbiased these sub-groups were planned to be, an unwritten, underlying bias remains. Removing prejudices from a model might not be an impossible task, but can any application that serves humans be immune to the human itself? And even if the human element is accounted for, how much of it is too much?



The typical life cycle of a deployed machine learning model involves a training phase, where a data scientist develops a model with good predictive performance on historical data. This model is put into production with the hope that it will continue to have similar predictive performance over the course of its deployment.

But problems can arise with the model or with the data fed into it once it is deployed, such as:


  • an incorrect model gets pushed
  • incoming data is corrupted
  • incoming data changes and no longer resembles datasets used during training
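The last of these problems, incoming data drifting away from the training data, can be checked for in a very rough way. The sketch below is only an illustration: the function name `mean_shift_score`, the sample values and the threshold are all hypothetical, and a production system would likely use a more robust statistical test.

```python
import statistics

def mean_shift_score(train_values, incoming_values):
    """Shift of the incoming mean, measured in training standard deviations."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)  # assumes the training values are not all equal
    return abs(statistics.mean(incoming_values) - mu) / sigma

# Hypothetical values of one numeric feature.
train = [10.0, 12.0, 11.0, 9.0, 13.0, 10.0, 11.0, 12.0]
similar_batch = [11.0, 10.0, 12.0, 11.0]
shifted_batch = [19.0, 21.0, 20.0, 22.0]

THRESHOLD = 2.0  # hypothetical cut-off, chosen only for this example

for name, batch in [("similar", similar_batch), ("shifted", shifted_batch)]:
    score = mean_shift_score(train, batch)
    status = "drift suspected" if score > THRESHOLD else "ok"
    print(f"{name}: score={score:.2f} ({status})")
```

A check like this only looks at one feature and one statistic, so it can miss changes in the shape of the distribution or in the relationship between features.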

The notions of machine learning fairness can be boiled down to the following criteria:

  • Demographic parity
  • Equal opportunity
  • Equalized odds
  • Disparate impact
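To make these four notions concrete, here is a minimal sketch that computes one common formulation of each for a binary classifier and two groups. The function names, the toy labels and the choice of "gap" or "ratio" for each metric are assumptions made for illustration; definitions vary slightly across the literature.

```python
def rate(values):
    """Fraction of ones in a list of 0/1 values (assumes a non-empty list)."""
    return sum(values) / len(values)

def group_rates(y_true, y_pred, groups, group):
    """Positive prediction rate, true positive rate and false positive rate for one group."""
    rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    return {
        "positive_rate": rate([p for _, p in rows]),
        "tpr": rate([p for t, p in rows if t == 1]),
        "fpr": rate([p for t, p in rows if t == 0]),
    }

def fairness_report(y_true, y_pred, groups, protected=1, reference=0):
    a = group_rates(y_true, y_pred, groups, protected)
    b = group_rates(y_true, y_pred, groups, reference)
    tpr_gap = abs(a["tpr"] - b["tpr"])
    fpr_gap = abs(a["fpr"] - b["fpr"])
    return {
        # Demographic parity: both groups receive positive predictions at the same rate.
        "demographic_parity_gap": abs(a["positive_rate"] - b["positive_rate"]),
        # Equal opportunity: both groups have the same true positive rate.
        "equal_opportunity_gap": tpr_gap,
        # Equalized odds: both the true and false positive rates match.
        "equalized_odds_gap": max(tpr_gap, fpr_gap),
        # Disparate impact: ratio of positive prediction rates (1.0 means parity).
        "disparate_impact_ratio": a["positive_rate"] / b["positive_rate"],
    }

# Toy data: group 1 is treated as the protected group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

report = fairness_report(y_true, y_pred, groups)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

A gap of zero, or a ratio of one, means the corresponding criterion is satisfied exactly on this sample.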

Machine learning engineers work around bias or offsets in a model by drawing insights from the output, gauging the losses, combing through tonnes of data and repeating until agreeable results have been obtained.

This traditional process takes time but works decently. An alternative is the Lagrangian approach, a mathematical method for finding the local maxima and minima of a function subject to equality constraints. This, too, comes with its own set of complexities.
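For reference, the general textbook form of the Lagrangian approach is written out below. The symbols are generic placeholders rather than the notation of any particular paper: \ell(\theta) stands for the training loss, c_k(\theta) for the k-th constraint (for instance, a fairness violation that should be zero), and \lambda_k for its multiplier.

```latex
% Constrained problem: minimise the loss subject to K equality constraints
\min_{\theta} \; \ell(\theta)
\quad \text{subject to} \quad c_k(\theta) = 0, \qquad k = 1, \dots, K

% Lagrangian: the constraints are folded into the objective via multipliers
\mathcal{L}(\theta, \lambda) = \ell(\theta) + \sum_{k=1}^{K} \lambda_k \, c_k(\theta)

% Stationarity conditions at a constrained optimum
\nabla_{\theta} \mathcal{L}(\theta, \lambda) = 0,
\qquad c_k(\theta) = 0 \quad \text{for all } k
```

The complexity mentioned above comes from having to solve for the model parameters and the multipliers at the same time.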

Now, researchers at Google have tackled this problem of bias in labeling by providing a mathematical formulation of how biases arise in labeling and how they can be mitigated. They propose a new framework to model how bias can arise in a dataset, on the assumption that there exists an unbiased ground truth.

The bias is corrected by re-weighting the training examples. These approximate weights are then adjusted to produce fair classifiers.

This framework can also be applied to settings where the features, too, are subject to bias.

Source: RaniHorev

The bias correction takes place as follows:

  • It is assumed that the biased dataset (y_bias) is the result of a manipulation of a (theoretical) unbiased dataset (y_true).
  • The values of λk, which represent the connection between y_true and y_bias, are learned from constraint violations.
  • The learned λk values are used to calculate the weight wk of each training sample.
  • Biased samples get low weights and unbiased samples get high weights.
  • These weights are used to train an unbiased classifier.
  • The constraints are evaluated from the loss function and the classifier is retrained with the new weights.
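The steps above can be sketched as a small loop. This is a toy illustration under stated assumptions, not the researchers' implementation: there is a single protected group and a single multiplier `lam`, the fairness constraint is demographic parity, the weight formula and update rule are illustrative choices, and the "classifier" is deliberately trivial (it predicts the weighted positive rate of each group, which is what a weighted maximum-likelihood fit using only the group indicator would return). All function names and data are hypothetical.

```python
import math

def sample_weights(labels, groups, lam):
    """Per-sample weights from the multiplier; labels themselves are never changed."""
    weights = []
    for y, g in zip(labels, groups):
        w_tilde = math.exp(lam * g)  # equals 1 outside the protected group (g == 0)
        weights.append(w_tilde / (1 + w_tilde) if y == 1 else 1 / (1 + w_tilde))
    return weights

def fit_group_rates(labels, groups, weights):
    """Toy 'classifier': weighted fraction of positive labels in each group."""
    rates = {}
    for g in set(groups):
        num = sum(w * y for y, gg, w in zip(labels, groups, weights) if gg == g)
        den = sum(w for gg, w in zip(groups, weights) if gg == g)
        rates[g] = num / den
    return rates

def parity_violation(groups, rates):
    """Mean prediction over everyone minus mean prediction over the protected group."""
    preds = [rates[g] for g in groups]
    protected = [p for p, g in zip(preds, groups) if g == 1]
    return sum(preds) / len(preds) - sum(protected) / len(protected)

def reweight(labels, groups, eta=1.0, steps=200):
    lam = 0.0
    for _ in range(steps):
        weights = sample_weights(labels, groups, lam)    # weights from the multiplier
        rates = fit_group_rates(labels, groups, weights) # retrain with those weights
        lam += eta * parity_violation(groups, rates)     # update from the violation
    weights = sample_weights(labels, groups, lam)
    return lam, weights, fit_group_rates(labels, groups, weights)

# Hypothetical data: the protected group (1) has far fewer positive labels.
labels = [1, 1] + [0] * 8 + [1] * 6 + [0] * 4
groups = [1] * 10 + [0] * 10

lam, weights, rates = reweight(labels, groups)
print(f"lambda={lam:.3f}, rates={rates}")
```

In this toy setting the loop raises the weight of positive examples in the protected group and lowers the weight of its negative examples until both groups receive the same predicted positive rate, while every label stays exactly as it was.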

The above figure illustrates the approach to training an unbiased classifier, along with the researchers' assumption of an unknown true label and how it is adjusted to produce the observed labels.

This new method doesn’t modify any labels; instead, the bias is corrected by changing how sample points are distributed across the dataset.

The researchers behind this new framework, Heinrich Jiang and Ofir Nachum, tested their model on several datasets: Bank Marketing; Communities and Crime, where each datapoint represents a community and the task is to predict the crime rate in that community; and other datasets for predicting the creditworthiness of customers for the issuance of credit cards.

And when trained on the MNIST dataset, this model outperformed the unconstrained and Lagrangian methods. The framework is platform independent and can be used to reinforce machine learning fairness.

Know more about the work here.



Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
