
Guide to AI Fairness 360: An Open Source Toolkit for Detection And Mitigation of Bias in ML Models


Over time, AI and ML have become integral parts of our day-to-day lives. People today are exposed to this technology in one way or another, often without knowing it. Common examples include home assistants (Alexa, Siri, etc.), recommendation systems (Amazon, YouTube, etc.) and chatbots. According to some reports, the AI market will grow rapidly in the near future, reaching a value of $190 billion. This push is driven mainly by cost reduction (source) and the increased effectiveness of existing models (source).

61% of business executives with an innovation strategy say they are using AI to identify opportunities in data that would otherwise be missed (source). This adoption has led to AI making critical decisions in domains such as finance, law and administration. But some people remain sceptical about its fairness, understandability, security and accountability. This gave rise to the idea of Trusted AI, which describes the trustworthiness of an AI system in four terms:


  • Fairness: Many case studies have shown that AI models can contain unwanted bias, which produces prejudiced patterns and is not easy to remove. Fairness is about making the model reflect the general truth and not discriminate.
  • Explainability: This refers to explaining the predictions made by the AI. For example, the people building the model need an explanation in terms of model/system performance, while end users need an explanation of the AI system's results that is meaningful to them.
  • Robustness: The AI system must hold up against malicious actions. The best-known illustration is the adversarial example of a tortoise: changing a few pixels in an image of a tortoise can lead the AI system to classify it as a rifle.
  • Assurance: The AI system should perform well and be secure to use. Clearly communicating what the service is supposed to do and what its limitations are helps gain the user's trust.

To address fairness, the first of these four points, researchers at IBM have developed a toolkit called AI Fairness 360 (AIF360). It is the first system to bring together bias metrics, bias mitigation algorithms, bias metric explanations and industrial usability under one toolkit. The goal is to provide a comprehensive set of fairness metrics and mitigation algorithms that help industry build fairer AI systems. AIF360 is an open-source library containing algorithms for every step of the AI lifecycle. The package is available in both Python and R. The toolkit provides:

  • An architecture for dataset representation and algorithms for bias detection, mitigation and explanation.
  • An explanation of all the metrics.
  • An interactive web-user interface.

AIF360 offers both ease of use and extensibility. The generic bias mitigation pipeline consists of loading the data into a dataset object, transforming it into a fairer dataset by applying a fair pre-processing algorithm, learning a classifier from the transformed dataset and obtaining predictions from that classifier. The metrics can be computed on the original, transformed and predicted datasets.

  • The toolkit provides a set of state-of-the-art bias mitigation algorithms; see the AIF360 documentation to learn more about them.
  • These are the metrics used by this toolkit:
  1. Statistical Parity Difference
  2. Equal Opportunity Difference
  3. Average Odds Difference
  4. Disparate Impact
  5. Theil Index
  6. Euclidean Distance
  7. Mahalanobis Distance
  8. Manhattan Distance

Let’s get started with the implementation part.


Install the AIF360 library through pip.

!pip install aif360


Detecting and mitigating age bias on credit decisions

This tutorial explains how AIF360 works. The bias mitigation algorithm used is Reweighing and the fairness metric is the mean difference. The dataset is the German Credit Dataset. Lending in the finance sector is a good example of a domain where biased decisions can be illegal. In this example, we predict whether an applicant should be given credit based on various features from a typical credit application.

The protected attribute (the attribute of interest) is “Age”, with “1” (aged 25 or older) and “0” (younger than 25) as the values for the privileged and unprivileged groups, respectively. First, we check for bias in the initial training data, then mitigate the bias and recheck. The full code implementation is available here.

Import all the required packages.

# Load all necessary packages
import sys
import numpy as np
from aif360.datasets import GermanDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing
from IPython.display import Markdown, display

Load the dataset, set the protected attribute to age and split the dataset into training and test sets in a 70:30 ratio. For this demo, we will use only the training data. For the whole workflow, see the full demo mentioned later in this article.

# Load the dataset.
# This dataset also contains a protected attribute for "sex", which we do not
# consider in this evaluation.
dataset_orig = GermanDataset(
    protected_attribute_names=['age'],           # use only age as the protected attribute
    privileged_classes=[lambda x: x >= 25],      # age >= 25 is considered privileged
    features_to_drop=['personal_status', 'sex']  # ignore sex-related attributes
)
# Divide the dataset into train (70%) and test (30%) sets.
dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)
# Group definitions used by the metric and the mitigation algorithm.
privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]
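
As a rough illustration of what the privileged_classes rule above expresses, the sketch below applies the same age >= 25 test to a handful of made-up ages. The ages and the helper name is_privileged are hypothetical, and this is only the idea in plain Python, not AIF360's internal code.

```python
# Hypothetical ages, for illustration only (not taken from the German Credit data).
ages = [19, 24, 25, 31, 67]


def is_privileged(age):
    """Same rule as the privileged_classes lambda: age >= 25 is privileged."""
    return age >= 25


# 1.0 marks the privileged group, 0.0 the unprivileged group.
age_encoded = [1.0 if is_privileged(a) else 0.0 for a in ages]
print(age_encoded)
```

Note that the boundary value 25 falls in the privileged group, which matches the group definitions {'age': 1} and {'age': 0} used in the rest of the tutorial.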

Now that we have identified the protected attribute ‘age’ and defined the privileged and unprivileged values, we can use aif360 to detect bias in the dataset. One simple test is to compare the percentage of favourable results for the privileged and unprivileged groups, subtracting the former percentage from the latter. A negative value indicates a less favourable outcome for the unprivileged group. The code for this is shown below.

# One simple fairness test is to compare the percentage of favourable results
# for the privileged and unprivileged groups, subtracting the former
# percentage from the latter. This is implemented in the mean_difference
# method of the BinaryLabelDatasetMetric class.
# The code below performs this check and displays the output.
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train,
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())
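
To make the metric concrete, here is a minimal plain-Python sketch of the same comparison on a toy dataset. The data, the 1/0 label encoding and the helper name favourable_rate are assumptions made for illustration; it mirrors the definition of the mean difference (and of disparate impact, the ratio of the same two rates), not AIF360's implementation.

```python
def favourable_rate(labels, groups, group_value, favourable=1):
    """Share of favourable labels among rows whose group equals group_value."""
    selected = [y for y, g in zip(labels, groups) if g == group_value]
    return sum(1 for y in selected if y == favourable) / len(selected)


# Toy data (hypothetical): age 1 = privileged, 0 = unprivileged;
# label 1 = favourable (credit granted), 0 = unfavourable.
age = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
label = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

rate_priv = favourable_rate(label, age, group_value=1)    # 4 of 5 favourable
rate_unpriv = favourable_rate(label, age, group_value=0)  # 2 of 5 favourable

# Mean difference: unprivileged rate minus privileged rate.
mean_difference = rate_unpriv - rate_priv
# Disparate impact: unprivileged rate divided by privileged rate.
disparate_impact = rate_unpriv / rate_priv

print(f"mean difference  = {mean_difference:+.2f}")
print(f"disparate impact = {disparate_impact:.2f}")
```

On this toy data the mean difference is negative and the disparate impact is below 1, both of which say the unprivileged group receives favourable outcomes less often.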

The value of -0.1297 shows that the privileged group received almost 13 percentage points more favourable outcomes in the training dataset (the exact figure varies from run to run because the split is shuffled). Since a model trained on this data would inherit the bias, we mitigate it in the training dataset. We use the Reweighing algorithm, a pre-processing technique, which means the mitigation happens before the model is built. It assigns weights to the training examples so that favourable outcomes are more evenly distributed across the privileged and unprivileged groups of the protected attribute.

# As in scikit-learn, create a Reweighing object,
# then fit it to the training dataset and transform that dataset.
RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_transf_train = RW.fit_transform(dataset_orig_train)

Now we recalculate the fairness metric, mean_difference, on dataset_transf_train. We will find that the mean_difference is now zero.

metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train,
                                               unprivileged_groups=unprivileged_groups,
                                               privileged_groups=privileged_groups)
display(Markdown("#### Transformed training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())
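
To see why the mean difference drops to zero, here is a minimal sketch of the reweighing idea in plain Python. It assumes the usual weight for each (group, label) cell, namely the count expected if group and label were independent divided by the observed count. The toy data and helper names are hypothetical, and this is an illustration of the idea, not AIF360's own implementation.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight per row: (N_group * N_label) / (N * N_group_and_label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    cell_counts = Counter(zip(groups, labels))
    return [group_counts[g] * label_counts[y] / (n * cell_counts[(g, y)])
            for g, y in zip(groups, labels)]


def weighted_favourable_rate(groups, labels, weights, group_value, favourable=1):
    """Weighted share of favourable labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group_value)
    fav = sum(w for g, y, w in zip(groups, labels, weights)
              if g == group_value and y == favourable)
    return fav / total


# Toy data (hypothetical): age 1 = privileged, 0 = unprivileged; label 1 = favourable.
age = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
label = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

uniform = [1.0] * len(label)
weights = reweighing_weights(age, label)

diff_before = (weighted_favourable_rate(age, label, uniform, 0)
               - weighted_favourable_rate(age, label, uniform, 1))
diff_after = (weighted_favourable_rate(age, label, weights, 0)
              - weighted_favourable_rate(age, label, weights, 1))

print(f"mean difference before reweighing = {diff_before:+.4f}")
print(f"mean difference after reweighing  = {diff_after:+.4f}")
```

With these weights, every group ends up with the same weighted favourable rate as the dataset overall, so the weighted difference between the groups vanishes (up to floating-point rounding). The records themselves are unchanged; only their weights differ.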

Now that we have mitigated the bias in the training data, we can build the required model and train it on dataset_transf_train. You can check the full demo here. Its steps are:

  1. Install the required libraries, download the dataset and move the files as suggested.
  2. Import the libraries.
  3. Initialize the dataset, the protected attribute and the privileged and unprivileged groups.
  4. Split the dataset into a train set, a validation set and a test set.
  5. Apply logistic regression to the original dataset, then validate and test it.
  6. Calculate all the metrics for the original dataset.
  7. Plot all the metrics for the original dataset.
  8. Detect the bias in the original dataset.
  9. Mitigate the bias and create a transformed dataset.
  10. Apply logistic regression to the transformed dataset.
  11. Calculate all the metrics for the transformed dataset.
  12. Plot all the metrics for the transformed dataset.

Metric                           Predictions from original testing data   Predictions from transformed testing data
Classification threshold used    0.8514                                   0.8514
Balanced accuracy                0.6703                                   0.6557
Statistical parity difference    -0.2230                                  -0.1453
Disparate impact                 0.5502                                   0.6964
Average odds difference          -0.1668                                  -0.0845
Equal opportunity difference     -0.1882                                  -0.1757
Theil index                      0.3618                                   0.3733


  1. For disparate impact: the absolute value of (1 - disparate impact) must be close to zero for the classifier predictions to be fair. For the classifier trained on the original dataset, at the best classification threshold, disparate impact is 0.5502, so (1 - disparate impact) is about 0.45. For the classifier trained on the transformed dataset, at the best classification threshold, disparate impact is 0.6964, so (1 - disparate impact) is about 0.30.
The left-side graph is on testing original dataset and right-side is on testing transformed dataset
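
As a quick check of this criterion, the gap |1 - disparate impact| can be computed directly from the disparate impact values reported in the results table above. The variable names are chosen here for illustration.

```python
# Disparate impact values taken from the results table in this article.
di_original = 0.5502
di_transformed = 0.6964

# Fairness gap: distance of disparate impact from its ideal value of 1.
gap_original = abs(1 - di_original)
gap_transformed = abs(1 - di_transformed)

print(f"|1 - DI| original    = {gap_original:.4f}")
print(f"|1 - DI| transformed = {gap_transformed:.4f}")
```

The gap shrinks from roughly 0.45 to roughly 0.30, so the classifier trained on the transformed data is closer to parity on this metric, though still some way from zero.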
  2. For average odds difference:

 average odds difference = 0.5 * ((FPR_unpriv - FPR_priv) + (TPR_unpriv - TPR_priv))

Hence, the value of the average odds difference must be close to zero for the classifier predictions to be fair. The results show that the magnitude of the average odds difference for the classifier trained on the original dataset (-0.1668) is about twice that of the classifier trained on the transformed dataset (-0.0845).

The left-side graph is on testing original dataset and right-side is on testing transformed dataset
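
The formula can be sketched in a few lines of Python, starting from per-group confusion-matrix counts. The counts below are invented for illustration and the helper names are hypothetical; the equal opportunity difference from the results table is included because it is simply the true positive rate term on its own.

```python
def rates(tp, fp, tn, fn):
    """Return (true positive rate, false positive rate) from confusion counts."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr


def average_odds_difference(unpriv, priv):
    """0.5 * ((FPR_unpriv - FPR_priv) + (TPR_unpriv - TPR_priv))."""
    tpr_u, fpr_u = rates(**unpriv)
    tpr_p, fpr_p = rates(**priv)
    return 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))


def equal_opportunity_difference(unpriv, priv):
    """TPR_unpriv - TPR_priv."""
    return rates(**unpriv)[0] - rates(**priv)[0]


# Hypothetical confusion-matrix counts for each group.
unpriv_counts = {"tp": 30, "fn": 20, "fp": 10, "tn": 40}  # TPR 0.6, FPR 0.2
priv_counts = {"tp": 40, "fn": 10, "fp": 15, "tn": 35}    # TPR 0.8, FPR 0.3

aod = average_odds_difference(unpriv_counts, priv_counts)
eod = equal_opportunity_difference(unpriv_counts, priv_counts)

print(f"average odds difference      = {aod:+.2f}")
print(f"equal opportunity difference = {eod:+.2f}")
```

Both values are negative here because the unprivileged group has lower true positive and false positive rates, meaning it is predicted the favourable outcome less often than the privileged group.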


In this article, we have discussed the AIF360 toolkit, which helps to detect and mitigate bias in the given data. It helps evaluate the fairness of a model, making its predictions more trustworthy. More algorithms and metrics are being developed and added to this toolkit to make AI systems more reliable. As IBM said, “Moving forward, ‘build for performance’ will not suffice as an AI design paradigm. We must learn how to build, evaluate and monitor for trust.”

Aishwarya Verma
A data science enthusiast and a post-graduate in Big Data Analytics. Creative and organized with an analytical bent of mind.
