It is a study that is bound to ruffle feathers in the debate over race and gender bias. Princeton University-based researchers have uncovered how “machine-learning algorithms trained with ordinary human language available have acquired stereotypical biases from text data that is reflected in our day-to-day culture”. The study, published in the journal Science under the title Semantics derived automatically from language corpora contain human-like biases, details how AI and machine learning can end up perpetuating cultural stereotypes.
Is AI perpetuating prejudice? GloVe algorithm comes to the rescue
The research was carried out by Arvind Narayanan, assistant professor of computer science at Princeton University and the Center for Information Technology Policy (CITP), together with fellow computer scientists Aylin Caliskan of Princeton and Joanna Bryson of Bath University. The researchers used the popular GloVe algorithm, an unsupervised learning algorithm trained on 840 billion words, to uncover biases in human-generated data.
According to Princeton, the researchers' experiment works like a machine-learning version of the Implicit Association Test, run on GloVe, a model for distributed word representation created at Stanford University. The Implicit Association Test, first developed at the University of Washington, is the go-to academic test in social and psychology studies. It measures the response times of human subjects who are asked to pair word concepts displayed on a computer screen. In the study, the GloVe word representations were examined for a pool of targeted words such as ‘nurse’, ‘librarian’, ‘teacher’, ‘scientist’, ‘engineer’, ‘man’, ‘male’ and ‘female’, among others.
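To make the word-pairing idea concrete, here is a minimal sketch of how an association score between a target word and two groups of attribute words could be computed from word vectors. It is an illustration only, not the researchers' code: the two-dimensional vectors in `toy_vectors` are made up for the example (real GloVe vectors have hundreds of dimensions), and the helper names `cosine` and `association` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word_vec, attrs_a, attrs_b):
    """Mean similarity to group A minus mean similarity to group B.

    A positive score means the word sits closer to group A,
    a negative score means it sits closer to group B.
    """
    mean_a = sum(cosine(word_vec, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(word_vec, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


# Hypothetical toy vectors, invented purely for illustration.
toy_vectors = {
    "man": [1.0, 0.1],
    "male": [0.9, 0.2],
    "woman": [0.1, 1.0],
    "female": [0.2, 0.9],
    "engineer": [0.8, 0.3],
    "nurse": [0.3, 0.8],
}

male_terms = [toy_vectors["man"], toy_vectors["male"]]
female_terms = [toy_vectors["woman"], toy_vectors["female"]]

engineer_score = association(toy_vectors["engineer"], male_terms, female_terms)
nurse_score = association(toy_vectors["nurse"], male_terms, female_terms)

print("engineer:", round(engineer_score, 3))
print("nurse:", round(nurse_score, 3))
```

With these invented vectors, ‘engineer’ scores on the male side and ‘nurse’ on the female side. With vectors learned from real text, the sign and size of such a score is what would reveal an association picked up from the corpus.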
When racism and sexism creep into AI
Among the most alarming findings from the study was a) gender bias in occupations. For example, words such as “female” and “woman” were linked with the humanities and arts professions, while words such as “male” and “man” were associated with technical jobs closer to math and engineering. Another finding that turned heads was the race bias that runs deep in our society. According to the study, b) machine-learning programs quite often linked African American names with unpleasantness, as opposed to European American names.
Bias isn’t new in man-made algorithms
Sexism and racism are not new to AI, which has long been decried for letting discrimination seep into decision making and for its so-called white-guy problem. From workplaces to legal systems, machine learning algorithms can perpetuate inequality because discrimination is built into them. In 2015, search giant Google was rapped when its photo app mistakenly identified two black people as gorillas. Of course, Google apologized.
Even hiring algorithms are skewed. Last year, ResumeterPro revealed that up to 72% of resumes were rejected by applicant tracking systems before a human being even got the chance to review them.
The widely cited paper Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, authored by Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama and Adam Kalai from Microsoft Research and Boston University, raised widespread concern with its finding that word embeddings trained on Google News articles exhibit female/male gender stereotypes to a large extent.
Most notable of all is Google’s research on gender equality in Hollywood, which threw up troubling insights: a) actresses were most watched in horror films; b) women were rarely seen in Academy Award-winning films; and c) on-screen role models made a difference. For example, archery swelled in popularity after characters such as Merida in Brave and Katniss in The Hunger Games.
Where AI is going wrong and its implications
It has long been said that a machine learning algorithm is like a newborn baby that is fed and taught words and grammar through petabytes of data. As it takes in all that textual data, the model observes relationships between words based on various factors, including how often they are used together. Researchers believe biased datasets reflect everyday reality and, more disturbingly, that algorithms can amplify it. Narayanan reiterates that fairness and bias in machine learning can have a tremendous impact on our society.
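As a rough illustration of how a model can pick up relationships simply from how often words appear near each other, the sketch below counts word pairs within a small window. It is not how GloVe itself is implemented; the function name `cooccurrence_counts`, the window size and the three-sentence `toy_corpus` are all assumptions made for this example.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count how often two words appear within `window` words of each other."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            # Look ahead up to `window` positions to the right of the word.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                pair = tuple(sorted((word, tokens[j])))
                counts[pair] += 1
    return counts


# Hypothetical toy corpus, invented purely for illustration.
toy_corpus = [
    "the nurse said she was tired",
    "the engineer said he was late",
    "the nurse said she was late",
]

counts = cooccurrence_counts(toy_corpus, window=3)

print("nurse/she:", counts[("nurse", "she")])
print("nurse/he:", counts[("he", "nurse")])
print("engineer/he:", counts[("engineer", "he")])
```

In this tiny corpus, ‘nurse’ only ever appears near ‘she’ and ‘engineer’ only near ‘he’. A model trained on such counts has no way of knowing whether that pattern is a fact about the world or a habit of the writers.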
Narayanan cites an example: when you build an intelligent system that learns enough about the properties of language, in the process it also acquires historical cultural associations, some of which may be objectionable. For example, if machine-learning technologies used for résumé screening were to imbibe cultural stereotypes, the result could be biased outcomes that affect people's lives.
Weeding out bias in machine learning
There is growing concern about rooting gender and other prejudices out of algorithms. Recently, in their paper, researchers from Boston University and Microsoft came up with a methodology for de-biasing word embeddings trained on a corpus of Google News data: first identify the gender subspace, then apply either the Neutralize and Equalize steps or the Soften step. According to the paper, Neutralize ensures that gender-neutral words are zero in the gender subspace, while Equalize equalizes sets of words outside the subspace and thereby enforces the property that any neutral word is equidistant to all words in each equality set.
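The Neutralize and Equalize steps described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's implementation: the gender subspace is reduced to a single direction `g` (taken here as the normalized difference between two toy vectors), all vectors are made-up three-dimensional examples scaled to unit length, and the helper names are hypothetical. The Soften step is not shown.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def scale(u, c):
    return [a * c for a in u]


def normalize(u):
    return scale(u, 1.0 / math.sqrt(dot(u, u)))


def project(u, g):
    """Component of u along the unit-length direction g."""
    return scale(g, dot(u, g))


def neutralize(w, g):
    """Remove the gender component from a neutral word, then rescale."""
    return normalize(sub(w, project(w, g)))


def equalize(pair, g):
    """Make two unit vectors differ only along g, by equal and opposite amounts.

    Assumes the two words have different components along g.
    """
    a, b = pair
    mu = scale(add(a, b), 0.5)
    nu = sub(mu, project(mu, g))  # shared part outside the gender direction
    radius = math.sqrt(max(0.0, 1.0 - dot(nu, nu)))
    result = []
    for w in (a, b):
        offset = sub(project(w, g), project(mu, g))
        result.append(add(nu, scale(normalize(offset), radius)))
    return result


# Hypothetical toy vectors, invented purely for illustration.
he = normalize([1.0, 0.2, 0.3])
she = normalize([-1.0, 0.3, 0.2])
nurse = normalize([-0.4, 0.8, 0.5])

g = normalize(sub(he, she))  # assumed one-dimensional gender direction

nurse_n = neutralize(nurse, g)
he_e, she_e = equalize((he, she), g)

print("nurse along g after neutralize:", round(dot(nurse_n, g), 6))
print("nurse/he similarity:", round(dot(nurse_n, he_e), 6))
print("nurse/she similarity:", round(dot(nurse_n, she_e), 6))
```

After both steps, the neutralized word has no component along the gender direction and is equally similar to both members of the equalized pair, which is the equidistance property the paper describes.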
Meanwhile, Google, in conjunction with the Geena Davis Institute on Gender in Media, started collecting data on movies and developed the Geena Davis Inclusion Quotient (GD-IQ), a tool that not only identifies a character’s gender but also gleans how long each character spoke and was on screen. Essentially, GD-IQ uses machine learning to recognize patterns that moviegoers would usually overlook.
As machine learning becomes the new normal in our daily lives, society and data have become two sides of the same coin, and the relationship between AI and gender diversity is taking root. While we are a long way off from removing human bias in word embeddings, the first stirrings are already being felt as legacy giants Microsoft and Google take the first steps towards making AI more gender neutral and weeding out other prejudices.
Richa Bhatia is a seasoned journalist with six years of experience in reportage and news coverage and has had stints at Times of India and The Indian Express. She is an avid reader, mum to a feisty two-year-old and loves writing about the next-gen technology that is shaping our world.