Garbage In, Garbage Out: The Problem Of Data Labeling

In 2018, Amazon built a machine learning (ML) algorithm for screening job candidates. However, it was soon scrapped because of substantial gender bias. The fault lay not in the algorithm but in the data it was trained on.

The ML algorithm was trained on Amazon’s previous hiring data. Because the tech giant’s past hires were predominantly men, the algorithm learned to favour male candidates: systematic bias was rationalised and reinforced through ML.

Researchers from San Diego, Berkeley, and Webster Pacific investigated how such algorithms are trained. Their review of human-labelled data exposed various faults in the data-labelling process; the researchers analysed prevailing labelling practices and offered recommendations for improving human labelling.

In supervised learning, a model learns to map inputs to outputs from previously labelled examples, so models trained on labelled data are only as good as the quality of that data. Most ML research, education, and practice focuses on what happens once a good standard of data is achieved; it is just as imperative to ask whether the data is reliable in the first place.

Scope of the study

The researchers examined 141 papers from the fields of medicine, biology, and the humanities. Of the papers sampled, 27 percent used machine-labelled data, 41 percent used existing human-labelled data, 27 percent used novel human-labelled datasets, and 5 percent failed to provide any information.

Findings of the study 

Only half of the studies that used human labelling reported providing the labellers with supporting documents and videos to aid data labelling. Additionally, the metrics used to measure whether annotators agreed or disagreed on particular labels varied widely between studies.
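One common family of such metrics is chance-corrected agreement. As an illustration (not the study's own method), here is a minimal sketch of Cohen's kappa for two annotators, computed from scratch on hypothetical labels:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators' label lists."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items both annotators labelled identically.
    po = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement under independence, from each annotator's label frequencies.
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical annotations from two labellers on the same six items.
annot_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annot_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(annot_1, annot_2), 3))  # 0.333
```

A kappa near 1 indicates strong agreement beyond chance; values near 0 suggest the labels are little better than random, which is exactly the kind of signal the surveyed papers often failed to report.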

The study empirically investigates a wide range of issues around the production, labelling, and use of training data. Curating high-quality datasets for ML models demands skill, expertise, and care, especially when humans label items individually.

The results can be misleading when datasets assumed to be gold standard are not. Supervised ML models are typically evaluated on held-out data from the original dataset, so a flaw in the labelling contaminates both the training and the evaluation split: a model can score well while faithfully reproducing the flaw. The consequences are grave when such algorithms drive subjective decisions such as hiring, justice, and loan processing.
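A small simulation makes this concrete. In this hypothetical setup, a systematic labelling flaw flips the labels for certain inputs; because the held-out split comes from the same flawed dataset, evaluation reports perfect accuracy even though the model is wrong about the true concept:

```python
from collections import Counter, defaultdict

# True concept: label is 1 when x >= 5. A hypothetical systematic labelling
# flaw flips every label where x is 4 or 5 -- and the held-out split
# inherits the same flaw because it comes from the same dataset.
flawed_label = lambda x: (1 - int(x >= 5)) if x in (4, 5) else int(x >= 5)
train = [(x, flawed_label(x)) for x in range(10) for _ in range(15)]
test = [(x, flawed_label(x)) for x in range(10) for _ in range(5)]

# A trivial "model": memorise the majority training label for each x.
votes = defaultdict(Counter)
for x, y in train:
    votes[x][y] += 1
predict = lambda x: votes[x].most_common(1)[0][0]

acc_vs_flawed = sum(predict(x) == y for x, y in test) / len(test)
acc_vs_true = sum(predict(x) == int(x >= 5) for x, y in test) / len(test)
print(acc_vs_flawed, acc_vs_true)  # 1.0 0.8
```

The standard evaluation (against the flawed held-out labels) reports 100 percent accuracy, while accuracy against the true concept is only 80 percent: the benchmark cannot see a flaw it shares with the training data.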

Best practices

Much of the labelled data in the social sciences is produced through structured content analysis, a methodology that converts qualitative, unstructured data into categorical or quantitative data. The method requires human annotators and labellers.
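In practice, coders work from a codebook that maps raw text to categories. A minimal sketch, using an entirely hypothetical codebook of keyword rules, with anything unmatched routed back to a human coder:

```python
# A hypothetical codebook mapping free-text survey answers to categories.
CODEBOOK = {
    "salary": "compensation",
    "pay": "compensation",
    "manager": "management",
    "hours": "work-life balance",
}

def code_response(text):
    """Assign the first matching category, else flag for a human coder."""
    lowered = text.lower()
    for keyword, category in CODEBOOK.items():
        if keyword in lowered:
            return category
    return "needs human review"

answers = ["The pay was too low", "My manager never listened", "Great team!"]
print([code_response(a) for a in answers])
# ['compensation', 'management', 'needs human review']
```

Real structured content analysis relies on trained human judgment rather than keyword matching; the sketch only illustrates the shape of the task, turning unstructured text into categorical data against a shared codebook.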

Human labellers should be well paid and given aids and training to help them navigate the monotonous work. Structured content analysis is a domain-specific task: it requires labellers with domain knowledge, as well as domain-independent expertise in managing a team of labellers.

The rise of crowdsourcing culture, with platforms like Amazon Mechanical Turk, has led to a decrease in the accuracy of data labelling. However, accuracy can be raised by using machine-labelled data as a check, or by training human labellers before they perform the task.
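A common quality-control step on crowdsourcing platforms is to collect several votes per item and aggregate them. A minimal sketch of majority voting over hypothetical annotations:

```python
from collections import Counter

def majority_label(votes):
    """Return the most common label among several labellers' votes."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical crowdsourced annotations: one list of votes per item.
items = [
    ["spam", "spam", "ham"],
    ["ham", "ham", "ham"],
    ["spam", "ham", "spam"],
]
print([majority_label(v) for v in items])  # ['spam', 'ham', 'spam']
```

Majority voting dilutes the noise of any single careless labeller, though it cannot correct a bias shared by most of the crowd, which is why screening and training labellers up front still matters.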


Even with the best practices, data labelling faces challenges. To begin with, labellers who lack domain-specific expertise might misinterpret or miss critical details when examining datasets. The reliability and reproducibility of the labels are also a concern, and it is difficult to get even a medium-sized team to build consensus on how to reduce complex objects to quantifiable data. Finally, one can hypothesise an inverse relationship between a field’s overall adherence to methodological best practices and the rate at which researchers report those practices: as such practices become routine and mundane, they go unreported, leaving an implicit bias in publications.


Meenal Sharma
I am a journalism undergrad who loves playing basketball and writing about finance and technology. I believe in the power of words.
