Google’s New ML Fairness Gym Has A Clear Mission — Track Down Bias & Promote Fairness In AI

Human societies are extremely complex. Cultural, racial and geographical differences around the globe, along with the lack of curated data, make ‘fairness’ in technology a huge challenge. Now, in an attempt to track the long-term societal impacts of artificial intelligence, Google researchers have released ML-fairness-gym, a simulation toolkit built on top of OpenAI’s Gym framework.

Testing Fairness Using OpenAI Gym


OpenAI’s Gym is a toolkit for developing and comparing reinforcement learning algorithms and is compatible with any numerical computation library, such as TensorFlow or Theano.

The Gym library is a collection of test problems, called environments, that can be used to develop and benchmark reinforcement learning algorithms. Google researchers have used this interface to build their own fairness gym.
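
To make the interface concrete, here is a minimal sketch of the standard Gym interaction loop that ML-fairness-gym environments also follow. It assumes the classic (pre-0.26) gym API, and “CartPole-v1” is only a stand-in environment name.

```python
import gym

# Create an environment and run a short random-action episode loop.
env = gym.make("CartPole-v1")
observation = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # replace with an agent's policy
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
env.close()
```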

To explain how bias creeps into models, the researchers use the example of lending money based on credit scores in their blog post. According to their analysis, the strategies used to classify whether an individual qualifies for a loan were at times unfair.

In their paper, titled Fairness is not static, they discuss in detail how the simulation experiments were carried out. They divided the agents in the environment into three types (a simplified sketch follows the list below):

  • A static agent that implements a naïve, one-shot classification strategy.
  • A robust agent that implements a similar one-shot policy, but uses a robust classification algorithm.
  • A continuous agent that gathers an initial set of unmanipulated applicants, then continuously retrains a non-robust classifier on the subsequent manipulated scores and labels that it observes.
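
For illustration, below is a hypothetical, simplified sketch of the static and continuous strategies (the robust agent differs mainly in the classifier it uses). The class and method names are invented for this example and are not taken from Google's ml-fairness-gym code.

```python
class StaticAgent:
    """Fits a classifier once on the initial applicants and never updates it."""

    def __init__(self, classifier, initial_features, initial_labels):
        self.classifier = classifier.fit(initial_features, initial_labels)

    def act(self, features):
        return self.classifier.predict(features)


class ContinuousAgent(StaticAgent):
    """Starts like the static agent, then keeps retraining on the (possibly
    manipulated) scores and labels it observes after deployment."""

    def __init__(self, classifier, initial_features, initial_labels):
        super().__init__(classifier, initial_features, initial_labels)
        self.features = list(initial_features)
        self.labels = list(initial_labels)

    def update(self, new_features, new_labels):
        self.features.extend(new_features)
        self.labels.extend(new_labels)
        self.classifier = self.classifier.fit(self.features, self.labels)
```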

The researchers believe the continuous agent is a reasonable model of deployed machine learning systems.

Using the gym, the Google team found that, in the lending experiment, the equal opportunity agent (EO agent) overlends to the disadvantaged group (which initially has a lower average credit score) by sometimes applying a lower threshold to that group than the max reward agent would.

This causes the credit scores of the disadvantaged group to decrease more than those of the other group, resulting in a wider credit score gap between the groups than in the simulations with the max reward agent.

Depending on whether the indicator of welfare is the credit score or the total loans received, the EO agent can be argued to be either better or more detrimental for the disadvantaged group than the max reward agent.
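
As a rough, toy illustration (not the team's actual experiment), an equal-opportunity style policy chooses per-group thresholds that hit the same true positive rate, which typically means a lower approval threshold for the group with lower average scores. All data and function names below are invented.

```python
import numpy as np

def equal_opportunity_thresholds(scores, will_repay, groups, target_tpr=0.8):
    """For each group, return the highest score threshold that still approves
    at least `target_tpr` of that group's applicants who would actually repay."""
    thresholds = {}
    for g in np.unique(groups):
        repay_scores = np.sort(scores[(groups == g) & (will_repay == 1)])
        cutoff_index = int(np.floor((1 - target_tpr) * len(repay_scores)))
        thresholds[g] = repay_scores[cutoff_index]
    return thresholds

# Toy data: group 0 has a lower average credit score than group 1.
rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=1000)
scores = rng.normal(loc=np.where(groups == 0, 600.0, 650.0), scale=50.0)
will_repay = (rng.uniform(size=1000) < (scores - 500.0) / 300.0).astype(int)

# The lower-scoring group typically ends up with the lower threshold.
print(equal_opportunity_thresholds(scores, will_repay, groups))
```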

They also found that the equal opportunity constraint (enforcing an equalised true positive rate, or TPR, between groups at each step) does not equalise TPR in aggregate over the whole simulation.

This indicates that the equality of opportunity metric is difficult to interpret when the underlying population is evolving, and suggests that more careful analysis is necessary to ensure that an ML system is having the desired effects.
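
The toy numbers below (hypothetical, not taken from the paper) show how this can happen: the per-step TPRs are identical across groups, but each group contributes its qualified applicants in different proportions at different steps, so the aggregate TPRs diverge.

```python
import numpy as np

# positives[t, g] = qualified applicants from group g at step t
positives = np.array([[100, 10],
                      [10, 100]])
per_step_tpr = np.array([0.9, 0.5])  # the same for both groups at each step
true_positives = positives * per_step_tpr[:, None]

aggregate_tpr = true_positives.sum(axis=0) / positives.sum(axis=0)
print(aggregate_tpr)  # roughly [0.86, 0.54], despite equal per-step TPRs
```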

Many existing tools for evaluating fairness concerns don’t work well on large-scale datasets and models.

Here is a list of tools that promote ML fairness:

Google’s Fairness Indicators

Fairness Indicators is built on top of TensorFlow Model Analysis, a component of TensorFlow Extended (TFX) that can be used to investigate and visualise model performance. Fairness Indicators can also be accessed in TensorBoard for evaluating real-time metrics.
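
As a rough sketch based on the public Fairness Indicators guide (exact details can vary across versions), the metric is added to a TensorFlow Model Analysis evaluation and sliced by a sensitive feature; “group” and “label” below are placeholder column names.

```python
import tensorflow_model_analysis as tfma

# Evaluate fairness metrics at several decision thresholds, both overall
# and sliced by the sensitive feature "group".
eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key="label")],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name="FairnessIndicators",
                config='{"thresholds": [0.25, 0.5, 0.75]}',
            )
        ])
    ],
    slicing_specs=[
        tfma.SlicingSpec(),                        # overall metrics
        tfma.SlicingSpec(feature_keys=["group"]),  # metrics per group
    ],
)
```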

Microsoft’s Fairlearn

The Fairlearn project seeks to enable anyone involved in the development of artificial intelligence systems to assess their system’s fairness and mitigate the observed unfairness. The Fairlearn repository contains a Python package and Jupyter notebooks with examples of usage.
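
Here is a small, hedged example of Fairlearn's metrics API on made-up data: MetricFrame computes a metric for the whole dataset and separately for each group defined by a sensitive feature.

```python
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

# Toy predictions and a toy sensitive feature.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
group  = ["a", "a", "a", "a", "b", "b", "b", "b"]

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(mf.overall)   # metrics on the whole dataset
print(mf.by_group)  # the same metrics broken down by group
```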

IBM’s AI Fairness 360 

The AI Fairness 360 Python package includes a comprehensive set of metrics for datasets and models to test for biases and is designed to translate algorithmic research from the lab into the actual practice of domains as wide-ranging as finance, human capital management, healthcare, and education. 
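
A hedged sketch of the AI Fairness 360 dataset and metric classes follows; the column names and group encodings are invented for illustration.

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Toy tabular data with one feature, one binary protected attribute and a label.
df = pd.DataFrame({
    "feature": [0.2, 0.5, 0.7, 0.1, 0.9, 0.4],
    "group":   [0,   0,   0,   1,   1,   1],
    "label":   [0,   1,   1,   0,   1,   0],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["label"],
    protected_attribute_names=["group"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"group": 0}],
    unprivileged_groups=[{"group": 1}],
)
print(metric.statistical_parity_difference())
print(metric.disparate_impact())
```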

These tools make it possible to investigate the performance of models and their underlying biases, and even to visualise the results; for example, Google’s Fairness Indicators integrates with the What-If Tool to load specific data points and support counterfactual analysis.
