Last updated November 16, 2020
In AI Mysteries

How To Use TensorFlow For ML Fairness

Published on February 25, 2020
by Ram Sagar

Talk about machine learning fairness is finally moving into practice, as companies like Google race to establish an ecosystem of fairer ML practices through their tools. Be it the What-If Tool, Fairness Gym or Fairness Indicators, Google’s AI team has been handing the developer community a new tool every month. Now, they have released a library that helps developers train TensorFlow models subject to constraints on their metrics.

Metrics in a machine learning context can mean many things. Here are a few examples:

Precision on members of certain groups
True positive rates on residents of certain countries
Recall rates of cancer diagnoses depending on age and gender

To limit the harm caused by optimising for the wrong combination of metrics, the team at TensorFlow introduced TensorFlow Constrained Optimisation (TFCO).

Google Wants You To Try TFCO

TFCO was developed mainly, its developers wrote, to make it easy to create and optimise constrained problems.

It is easy because these problems are written in terms of linear combinations of rates. Here “rate” can be the false positive rate, which is the number of negatively-labelled examples on which the model makes a positive prediction, divided by the number of negatively-labelled examples.
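As a rough illustration of what "linear combinations of rates" means, the sketch below computes the false positive rate and false negative rate on a small made-up dataset with plain NumPy, then rebuilds the overall error rate as a weighted sum of the two. The labels, scores and variable names are assumptions chosen for the example, and the code does not use TFCO itself.

```python
import numpy as np

# Toy data (made up): binary labels (0 or 1) and real-valued model scores.
# A score above zero counts as a positive prediction.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
scores = np.array([-1.2, 0.4, -0.3, -2.0, 0.9, -0.5, 1.1, 0.2, -0.1, 0.7])

negatives = labels == 0
positives = labels == 1

# False positive rate: positive predictions among negatively-labelled examples.
fpr = np.mean(scores[negatives] > 0)
# False negative rate: negative predictions among positively-labelled examples.
fnr = np.mean(scores[positives] <= 0)

# The overall error rate is a linear combination of the two rates,
# weighted by the share of negative and positive examples.
error_from_rates = np.mean(negatives) * fpr + np.mean(positives) * fnr
# The same quantity computed directly, for comparison.
error_direct = np.mean((scores > 0) != positives)

print(fpr, fnr, error_from_rates, error_direct)
```

The two error values agree, which is why a constraint or objective on such metrics can be written in terms of a handful of basic rates.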

TFCO is a library for optimising inequality-constrained problems in TensorFlow. In the most general case, both the objective function (whether accuracy to maximise or a loss to minimise) and the constraints are represented as Tensors, giving users the maximum amount of flexibility in specifying their optimisation problems.

Constructing these Tensors can be cumbersome, so users can rely on the helper functions provided by the TensorFlow team to construct constrained optimisation problems more easily.

The above picture illustrates binary classification over a dataset with two protected groups, shown in blue and orange. The densities are represented as ovals. The positive and negative signs denote the labels.

The decision boundary, drawn as a black dashed line, separates positive predictions (the region above the line) from negative predictions (the region below the line) and is chosen to maximise accuracy.

In TFCO, the objective to minimise and constraints to impose are represented as algebraic expressions (using normal Python operators) of simple basic rates.
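To give a feel for how ordinary Python operators can build such expressions, here is a toy sketch. The `RateExpression` class, its methods and the rate names are hypothetical and written only for this example; they are not TFCO's API, which works on Tensors and handles the optimisation as well.

```python
class RateExpression:
    """Toy stand-in: a constant plus a weighted sum of named basic rates."""

    def __init__(self, terms=None, constant=0.0):
        self.terms = dict(terms or {})  # rate name -> coefficient
        self.constant = float(constant)

    @staticmethod
    def _coerce(value):
        # Allow plain numbers on the right-hand side of an operator.
        if isinstance(value, RateExpression):
            return value
        return RateExpression(constant=value)

    def __add__(self, other):
        other = self._coerce(other)
        terms = dict(self.terms)
        for name, coeff in other.terms.items():
            terms[name] = terms.get(name, 0.0) + coeff
        return RateExpression(terms, self.constant + other.constant)

    def __mul__(self, scalar):
        scaled = {name: coeff * scalar for name, coeff in self.terms.items()}
        return RateExpression(scaled, self.constant * scalar)

    __rmul__ = __mul__

    def __sub__(self, other):
        return self + self._coerce(other) * -1.0

    def __le__(self, other):
        # "lhs <= rhs" is stored as the expression "lhs - rhs", read as "<= 0".
        return self - other

    def evaluate(self, measured_rates):
        return self.constant + sum(
            coeff * measured_rates[name] for name, coeff in self.terms.items())


fpr_group = RateExpression({"fpr_group": 1.0})
fpr_overall = RateExpression({"fpr_overall": 1.0})

# Constraint: the group's false positive rate may exceed the overall one by at most 0.02.
constraint = fpr_group <= fpr_overall + 0.02

# With these made-up measurements the constraint is violated by about 0.03.
violation = constraint.evaluate({"fpr_group": 0.15, "fpr_overall": 0.10})
print(constraint.terms, constraint.constant, violation)
```

A positive value of `violation` means the constraint is not satisfied; a constrained optimiser would push it back towards zero or below during training.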

Here’s a snippet on how TensorFlow can be used to predict toxicity in wiki comments:

import pandas as pd
import tensorflow_constrained_optimization as tfco
from tensorflow.keras.preprocessing import text

# Download the data.
toxicity_data_url = ("https://raw.githubusercontent.com/conversationai/"
                     "unintended-ml-bias-analysis/master/data/")
data_train = pd.read_csv(toxicity_data_url + "wiki_train.csv")
data_test = pd.read_csv(toxicity_data_url + "wiki_test.csv")
data_vali = pd.read_csv(toxicity_data_url + "wiki_dev.csv")

# Tokenize the comments. `hparams` is the hyperparameter dictionary
# defined earlier in the notebook (not shown in this excerpt).
tokenizer = text.Tokenizer(num_words=hparams["max_num_words"])
tokenizer.fit_on_texts(data_train["comment"])

def prep_text(texts, tokenizer, max_sequence_length):
    ...  # Body elided in the original snippet.

# Consider a subset of the identity terms provided with the dataset and group
# them into four broad topic groups: sexuality, gender identity, religion and race.
terms = {
    'sexuality': ['straight', ...],
    # The 'gender identity' entry is not shown in this excerpt.
    'religion': ['christian', 'muslim', ...],
    'race': ['african', 'african american', 'black', 'white', ...]}

Before training the model, we write functions to evaluate the overall error rate, the overall false negative rate and the overall false positive rate for the given labels and predictions:

import numpy as np

def error_rate(labels, predictions):
    # Returns error rate for given labels and predictions.
    # Recall that the labels are binary (0 or 1).
    signed_labels = (labels * 2) - 1
    return np.mean(signed_labels * predictions <= 0.0)

def false_negative_rate(labels, predictions):
    # Returns false negative rate for given labels and predictions.
    if np.sum(labels > 0) == 0:  # Any positives?
        return 0.0
    else:
        return np.mean(predictions[labels > 0] <= 0)
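The excerpt names a false positive rate function but does not show it. A minimal sketch that mirrors `false_negative_rate` might look like the following, assuming the same conventions: binary labels (0 or 1) and real-valued predictions where a value above zero is a positive prediction. The sample arrays are made up.

```python
import numpy as np

def false_positive_rate(labels, predictions):
    # Returns false positive rate for given labels and predictions.
    if np.sum(labels <= 0) == 0:  # Any negatives?
        return 0.0
    else:
        # Share of negatively-labelled examples that receive a positive prediction.
        return np.mean(predictions[labels <= 0] > 0)

# Made-up example: two negatives, one of which is predicted positive.
sample_labels = np.array([0, 0, 1, 1])
sample_predictions = np.array([0.5, -0.5, 1.0, -1.0])
print(false_positive_rate(sample_labels, sample_predictions))
```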

On plotting the findings, the unconstrained model accuracy report looks like this:

This is followed up by training for constraints on false-positive rates and robust optimisation.
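Before imposing such constraints, it helps to see what they measure. The sketch below computes the false positive rate separately for each topic group and compares it with a bound. The helper name `group_false_positive_rates`, the membership masks, the data and the bound of 0.5 are all assumptions made up for illustration; the notebook's own group handling and thresholds may differ.

```python
import numpy as np

def group_false_positive_rates(labels, predictions, group_masks):
    # Returns {group name: false positive rate on that group's negative examples}.
    rates = {}
    for name, mask in group_masks.items():
        negatives = mask & (labels <= 0)
        if negatives.any():
            rates[name] = float(np.mean(predictions[negatives] > 0))
        else:
            rates[name] = 0.0  # No negatives in this group.
    return rates

# Made-up labels, scores and group membership for eight comments.
labels = np.array([0, 0, 0, 1, 0, 0, 1, 0])
predictions = np.array([0.3, -0.4, -1.0, 0.8, 0.6, 0.2, -0.2, -0.9])
group_masks = {
    "religion": np.array([True, True, True, True, False, False, False, False]),
    "race": np.array([False, False, False, False, True, True, True, True]),
}

rates = group_false_positive_rates(labels, predictions, group_masks)
max_fpr = 0.5  # Hypothetical upper bound on each group's false positive rate.
violated = [name for name, rate in rates.items() if rate > max_fpr]
print(rates, violated)
```

A constrained training run would aim to bring every group's rate under the bound while keeping the overall error rate as low as possible.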

This notebook by developers at Google shows how to train a fair classifier to predict whether a comment posted on a Wiki Talk page contains toxic content. The notebook discusses two criteria for fairness and shows how to enforce them by constructing a rate-based optimisation problem.

Check the full code here.

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.
