Things To Remember While Using Demographic Data In ML Models

Published on February 18, 2021
by Kashyap Raibagi

Machine learning algorithms can substantially improve the accuracy of human decision-making. Such algorithms take multiple parameters into account to conduct analysis and come to a decision.

Cognitive biases influence human decisions. And when fallible humans build machine learning models, biases creep into algorithms. The decisions made by biased machines can have far-reaching consequences. The repercussions are more severe when models use demographic data like age, gender, race, or zip codes since they can impact communities as a whole.

In this article, we try to analyse the use of demographic data when building ML models to produce fair AI.

Pitfalls

Machine learning is used to develop decision-making systems across sectors. Models could help with diagnosis in healthcare, perform market segmentation in retail, or predict recidivism to help reduce crime.

In some cases, demographic data is essential, especially when building diagnosis or prediction models in healthcare. For many illnesses, age, gender or socio-demographic factors like income or neighbourhood become crucial decision-making parameters.

For instance, age is a risk factor for many diseases, including cancer and cardiovascular conditions. Gender can play an important role in obesity, which shows sexual dimorphism, and in coronary artery disease. Economically weaker neighbourhoods are at a higher risk of infectious diseases like dengue or tuberculosis.

However, introducing demographic characteristics has led to discrimination against people, predominantly minority or socio-economically weaker communities. For instance, a recidivism model used in the US consistently rated Black people as higher risk than white people, even when the former's crimes were significantly less severe.

In another instance, Amazon’s recruitment model did not rate candidates in a gender-neutral way because it was trained on resumes that came mostly from men. As a result, the system penalised resumes containing the word ‘women’.

The discrimination stems from introducing demographic details into models, which then reflect the inherent bias in human beings. From an ethical perspective, using demographic information to make decisions, like assigning recidivism scores based on race or allocating a bank loan, is prejudicial.

Biases can also creep in through demographic details without the machine learning developer’s knowledge, as in Amazon’s recruitment model. In such cases, developers should take extra caution while deploying the algorithm in the real world. Third-party audits should be compulsory for any algorithm that makes decisions about human beings.

Handle With Care

On the flip side, some machine learning models have shown the need to use inclusive demographic representation to mitigate bias.

For instance, Timnit Gebru, the AI ethicist who was recently fired from Google, co-authored a paper in 2018 that found significant disparities in facial recognition systems developed by Big Tech companies. The study revealed that all the classifiers tested performed best for lighter-skinned men and worst for darker-skinned women. Flawed facial recognition models have also led to a Black US citizen being wrongly arrested due to misidentification.

Whether algorithms like facial recognition should be deployed in the first place is beyond this article’s scope. Still, the study showed that algorithm development needed more inclusivity and more analysis of features specific to demography; racial traits in this case.
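One way to surface this kind of disparity is to report a model's accuracy separately for each demographic group instead of a single overall number. The snippet below is a minimal sketch of that idea, not the method used in the study; the function name `accuracy_by_group`, the group labels and the toy predictions are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} computed separately for each subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical toy data: true labels, model predictions, and the
# demographic group each example belongs to.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1]
groups = ["lighter_male"] * 4 + ["darker_female"] * 4

per_group = accuracy_by_group(y_true, y_pred, groups)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Gap between the best-served and worst-served group.
gap = max(per_group.values()) - min(per_group.values())

print("overall accuracy:", overall)
for group, acc in per_group.items():
    print(group, acc)
print("accuracy gap:", gap)
```

In this toy data the overall accuracy looks acceptable while one group is served far worse than the other, which is exactly what a single aggregate metric hides.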

Inclusive demographic data could help mitigate bias, but the decision as to when and how to use it for bias mitigation is critical. Partnership on AI addressed such concerns in a report in 2020.

The first concern is how demographic data should be defined. While the US and the EU have made efforts to categorise demographic data as ‘protected class data’ or ‘sensitive personal data’, many countries, including India, have weak data protection laws. In such cases, collecting demographic data might do more harm than good.

Further, decision-makers should be careful that their approach to mitigating bias is not itself biased. For instance, self-selection bias (collecting data from only those who want to give it to you) can compound the problem.
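A simple first check for self-selection is to compare each group's share among the people who volunteered data with its share in the population the model is meant to serve. The sketch below assumes the population shares are known from some external source; the function name `representation_gap`, the group labels 'A' and 'B' and all the numbers are hypothetical.

```python
from collections import Counter


def representation_gap(sample_groups, population_share):
    """Return {group: sample share minus population share}.

    A positive value means the group is over-represented among
    respondents; a negative value means it is under-represented.
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {
        group: counts.get(group, 0) / n - share
        for group, share in population_share.items()
    }


# Hypothetical reference population: two equally sized groups.
population_share = {"A": 0.5, "B": 0.5}

# Hypothetical volunteers: group A responded far more often.
respondents = ["A"] * 8 + ["B"] * 2

gaps = representation_gap(respondents, population_share)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A large gap does not fix anything on its own, but it flags that the collected data may not represent the groups the mitigation effort was meant to protect.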

Lastly, once the data is collected, it is essential to ensure that it is used only for the original objective.

Wrapping Up

Some models present the absolute need for demographic details, especially in healthcare. In such cases, extra caution should be applied to mitigate biases. Further, we need stricter policies and regulations to enable the fair use of demographic data to build ML models or mitigate biases in existing models.


Kashyap Raibagi

Kashyap currently works as a Tech Journalist at Analytics India Magazine (AIM). Reach out at kashyap.raibagi@analyticsindiamag.com
