
Top 6 Common Biases In ML Models


Machine learning is not just about machines. At least not yet. There is still a human element in the loop, and it looks like this will continue for some time. In other words, artificial general intelligence (AGI) is a distant dream. Since humans are involved in the learning processes of ML models, their underlying biases surface in the form of inaccurate results.

Having an unbiased model is almost impossible because humans generate the data, and a model is only as good as the data it is fed. So, it is the job of the data engineer to keep an eye on the ways in which bias can enter the system. According to Google's developer documentation, the following biases are commonly encountered while training a machine learning model:

Automation Bias

Automation bias occurs when a human decision-maker favours recommendations made by an automated decision-making system over information produced without automation, even when the automated system is known to make errors.

Confirmation Bias

Confirmation bias is the tendency to search for or interpret information in a way that confirms one's preexisting beliefs (hypotheses). Machine learning developers may inadvertently collect or label data in a way that supports those beliefs. These biases seep into the results and can sometimes blow up on a large scale.

A related form is experimenter's bias, where a data scientist keeps training a model until it confirms a previously held hypothesis.

Group Attribution Bias

This bias occurs when what is true of an individual is assumed to be true of the entire group they belong to. Its effects can worsen if convenience sampling is used for data collection. Attributions made in this way rarely reflect reality.

Out-Group Homogeneity Bias

Consider two families: one has a pair of twins, the other does not. When members of the non-twin family are asked to tell the twins apart, they might falter, whereas the twins' parents will identify each child with ease and might even give a nuanced description of their differences. To the non-twin family, the twins are effectively identical. This readiness to assume that members of groups outside our own are all alike leads to out-group homogeneity bias. Its counterpart, in-group bias, works the other way around: we favour, and see more nuance in, our own group.

Selection Bias

Selection bias results from errors in the way sampling is done. For example, suppose we need to build an ML model that predicts audience sentiment about films. If data is collected by handing audience members a survey form, the following forms of bias can appear:

  • Coverage bias: The population represented in the dataset does not match the population the model is making predictions about. In the movie example above, by sampling only from people who chose to see the film, the model's predictions may not generalise to people who never expressed that level of interest in it.
  • Sampling bias: The sample is not random or diverse. If only the reviews of front-row viewers in a theatre are collected instead of those of a random group, we will hardly grasp the sentiments of the audience as a whole.
  • Non-response bias: Also known as participation bias, this one originates with the respondents rather than the data collectors. It occurs when certain sections of the audience choose not to review the movie at all. If neutral viewers keep away from reviewing and only those with strong opinions, usually the fans, pile up in the reviews, the results will lean in favour of the film (see the sketch after this list).
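
To make the non-response case concrete, here is a minimal simulation in Python. The population mix and the response probabilities are illustrative assumptions, not real data: every viewer holds a rating from 1 to 5, and fans are assumed to be far more likely to fill in the survey than neutral viewers.

    import random

    random.seed(0)

    # Hypothetical ground truth: each viewer's rating runs from 1 (hated it)
    # to 5 (loved it); this assumed mix gives a true mean of about 3.1.
    population = [random.choice([1, 2, 3, 3, 3, 4, 4, 5]) for _ in range(10_000)]

    # Assumed response model: fans (4-5) are very likely to review,
    # neutral viewers (3) mostly stay away. These probabilities are made up.
    response_prob = {1: 0.3, 2: 0.2, 3: 0.05, 4: 0.7, 5: 0.9}

    responses = [r for r in population if random.random() < response_prob[r]]

    print(f"True mean rating:     {sum(population) / len(population):.2f}")
    print(f"Surveyed mean rating: {sum(responses) / len(responses):.2f}")

On a typical run, the surveyed mean comes out well above the true mean: exactly the "results lean in favour of the film" effect described above, produced purely by who chose to respond.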

Reporting Bias

Suppose an NLP model is trained on a dataset containing news from the last few decades. Though calling news biased is an understatement, there is a peculiar kind of bias that emerges from the way actions are documented: writers report what is notable, not what is typical. For example, if the word 'laughed' is more prevalent than 'breathed' in the corpus, then a machine learning model that takes word frequency into account will conclude that laughing is more common than breathing!
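
The word-frequency point can be shown with a minimal sketch over a made-up four-sentence corpus (the sentences are illustrative, not drawn from any real news dataset):

    from collections import Counter

    # Made-up corpus: stories mention the remarkable ("laughed") far more
    # often than the unremarkable ("breathed"), inverting real-world frequency.
    corpus = [
        "she laughed at the joke",
        "the crowd laughed loudly",
        "he laughed and walked away",
        "she breathed a sigh of relief",
    ]

    counts = Counter(word for sentence in corpus for word in sentence.split())
    print(counts["laughed"], counts["breathed"])  # prints: 3 1

A frequency-based model trained on such text learns how often events are written about, not how often they actually occur.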

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.