Council Post: Global guidelines for building trustworthy AI

To preserve data privacy, model-agnostic approaches based on the general data anonymisation principles are preferred.

Building public trust in AI is more critical than ever. Public and private entities both have a major role in shaping guidelines that ensure fairness in AI systems. In this article, we discuss international guidelines for building trustworthy AI and how to incorporate them.

International guidelines

There have been several recent efforts to draw up international guidelines for trustworthy AI systems. The European Commission’s Ethics Guidelines for Trustworthy AI is a good case in point. According to these guidelines, the requirements of trustworthy AI include:

1. Human agency and oversight, including fundamental rights, human agency and human oversight.

2. Technical robustness and safety, including resilience to attack and security, fallback plan and general safety, accuracy, reliability, and reproducibility.

3. Privacy and data governance, including respect for privacy, quality and integrity of data, and access to data.

4. Transparency, including traceability, explainability and communication.

5. Diversity, non-discrimination and fairness, including the avoidance of unfair bias, accessibility and universal design, and stakeholder participation.

6. Societal and environmental wellbeing, including sustainability and environmental friendliness, social impact, society and democracy.

7. Accountability, including auditability, minimisation and reporting of negative impact, trade-offs, and redress.

From the point of view of the data science industry, not every client needs to be concerned with all of these requirements. In general, requirements (3), (4) and (5) are common to any data science workflow, so we limit our discussion to these three. We also cover the tools and techniques used to meet them.

Privacy and data governance

Some privacy-preserving machine learning practices integrate data encryption into the learning algorithm itself. However, in an industrial setting, we need more freedom in choosing the right models. Hence, to preserve data privacy, model-agnostic approaches based on general data anonymisation principles are preferred.

Data anonymisation

Data anonymisation is a process in which private and confidential information in data is protected through processing techniques. The objective is to prevent a data point from being traced back to an individual’s identity or misused.

Common data anonymisation techniques include data masking, pseudonymisation, generalisation, data swapping, data perturbation and synthetic data.
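Three of these techniques can be illustrated with a minimal sketch. The record fields, the secret key and the helper names below are all hypothetical, and a real deployment would need proper key management and a re-identification risk assessment; the code only shows the general idea of pseudonymisation, generalisation and masking.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored separately from the data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a truncated keyed hash (pseudonymisation)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def generalise_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a coarser band (generalisation)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def mask_email(email: str) -> str:
    """Hide all but the first character of the local part (data masking)."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


# Made-up record used purely for illustration.
record = {"name": "Jane Doe", "age": 37, "email": "jane.doe@example.com"}
anonymised = {
    "name": pseudonymise(record["name"]),
    "age": generalise_age(record["age"]),
    "email": mask_email(record["email"]),
}
print(anonymised)
```

Because these transformations are applied to the data before any model sees it, they stay independent of the choice of learning algorithm.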

Data governance 

In general, data governance refers to the principles, policies and standards that cover enterprise data management end to end. Given the exponential rate at which data accumulates, it is important to govern it at every stage, from gathering through to processing by an AI system. In between, we need data transfer regulations, storage regulations and principles for data cleaning and refinement. From an organisation’s perspective, data governance addresses questions such as what data it holds, where the data resides and how it will be processed. From a client’s perspective, it addresses questions such as what data is transferred, who gets access and the purposes for which the data is used.


Transparency

Transparency requirements include traceability, explainability and communication. Traceability refers to documenting all the data science activities that are part of the workflow. This includes the data used, its cleaning, data labelling, feature transformation/selection, dimensionality reduction, the algorithms used and their configuration, evaluation metrics and deployment details. The idea is to provide a completely transparent environment for the stakeholders of an AI system.

Explainability refers to the practice of explaining a model decision to users so that the user’s right to know the modelling logic is preserved. This includes popular XAI techniques such as feature attributions, what-if analysis and counterfactual explanations.

Communication refers to the practice of conveying these activities in terms the user can understand. That is, even a user with a non-technical background should be able to comprehend a summary of the activities. This includes a layman’s explanation of the algorithm used, obtaining the user’s informed consent for their data to be processed by an automated system and informing them about the data being collected. Communication should also create awareness of the capabilities and limitations of an AI system.

Diversity, non-discrimination and fairness

This requirement includes the avoidance of unfair bias; accessibility and universal design; and stakeholder participation. Ethical AI plays a crucial role in satisfying this requirement, and we discuss techniques for bias assessment and model de-biasing in detail.

Bias assessment

Bias assessment is concerned with the ethical aspects of a machine learning model. The idea is to check whether a deployed model shows bias in its predictions against any gender, race, nationality, etc. Data bias refers to bias that already exists in the data. Bias assessment can be done at various levels: on the data directly, on the model predictions, on individuals or on groups.

The popular quantification metrics for bias assessment include:

a) Disparate impact ratio (DIR)

DIR is the ratio of the probability that a machine learning model predicts the favourable outcome for the unprivileged class to the same probability for the privileged class. For example, the model may predict whether a person will be sanctioned a loan. Suppose the classes are defined by gender, with female as the unprivileged class and male as the privileged class. Ideally, the DIR value is one. In industry practice, however, the model is usually considered fair if the value falls in the range [0.8, 1.2]. A value below 0.8 indicates bias towards the privileged class, while a value above 1.2 indicates bias towards the unprivileged class.
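As a minimal sketch of the calculation, the snippet below computes DIR from a list of binary predictions and a parallel list of group labels. The loan predictions are made up for illustration, and the function names are not taken from any particular library.

```python
def favourable_rate(predictions, groups, group):
    """Share of favourable predictions (1) among members of `group`."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def disparate_impact_ratio(predictions, groups, unprivileged, privileged):
    """Favourable rate of the unprivileged class divided by that of the privileged class."""
    return (favourable_rate(predictions, groups, unprivileged)
            / favourable_rate(predictions, groups, privileged))


def within_fair_range(dir_value, low=0.8, high=1.2):
    """Check the [0.8, 1.2] range quoted in the text."""
    return low <= dir_value <= high


# Made-up loan predictions: 1 = sanctioned, 0 = rejected.
predictions = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
groups = ["female"] * 5 + ["male"] * 5

dir_value = disparate_impact_ratio(predictions, groups, "female", "male")
print(round(dir_value, 2), within_fair_range(dir_value))  # prints: 0.5 False
```

Here 2 of 5 women and 4 of 5 men receive the favourable prediction, so the ratio is 0.4 / 0.8 = 0.5, which falls below the lower bound.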

b) Equalised odds ratio (EO)

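A machine learning model is said to satisfy equalised odds if it satisfies the following condition,

P(R=1 | Y=y, D=unprivileged) = P(R=1 | Y=y, D=privileged), for y in {0, 1}

where R is the model prediction, Y is the true outcome and D is the protected attribute that separates the privileged and unprivileged classes.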

That is, given the true outcome, the probability that the model predicts the favourable outcome should be the same whether the data point belongs to the privileged or the unprivileged class. Unlike DIR, EO takes the true labels into account, so it examines how well the model has learned for each class. In simple words, EO checks whether both classes have equal true positive rates and equal false positive rates. For example, let Y indicate whether an applicant is qualified for admission to a degree programme. EO then checks whether the university accepts qualified applicants at the same rate, and unqualified applicants at the same rate, for both classes, say, Americans and Asians.

c) Average odds difference (AOD)

This quantity is defined as the average of the difference in false positive rates and the difference in true positive rates between the unprivileged and privileged classes. The ideal value is zero, which corresponds to both classes receiving equal benefit. Any other value means one class benefits more than its counterpart: taking the differences as unprivileged minus privileged, a negative value favours the privileged class and a positive value favours the unprivileged class.
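Both EO and AOD rest on the per-class true positive and false positive rates, so they can be computed together. The sketch below assumes binary labels and predictions and uses the unprivileged-minus-privileged sign convention; the data and the group names "u" and "p" are invented for illustration.

```python
def group_rates(y_true, y_pred, groups, group):
    """Return (true positive rate, false positive rate) for one group."""
    rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    positives = [p for t, p in rows if t == 1]
    negatives = [p for t, p in rows if t == 0]
    return sum(positives) / len(positives), sum(negatives) / len(negatives)


def fairness_gaps(y_true, y_pred, groups, unprivileged, privileged):
    """TPR/FPR differences (unprivileged minus privileged) and their average."""
    tpr_u, fpr_u = group_rates(y_true, y_pred, groups, unprivileged)
    tpr_p, fpr_p = group_rates(y_true, y_pred, groups, privileged)
    tpr_diff = tpr_u - tpr_p
    fpr_diff = fpr_u - fpr_p
    return {
        "tpr_diff": tpr_diff,  # zero when true positive rates match
        "fpr_diff": fpr_diff,  # zero when false positive rates match
        "aod": 0.5 * (fpr_diff + tpr_diff),
    }


# Made-up data: first eight rows are group "u", last eight are group "p".
groups = ["u"] * 8 + ["p"] * 8
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2
y_pred = [1, 1, 0, 0, 0, 0, 0, 1,   # group u: TPR 0.5, FPR 0.25
          1, 1, 1, 0, 1, 0, 0, 1]   # group p: TPR 0.75, FPR 0.5

gaps = fairness_gaps(y_true, y_pred, groups, "u", "p")
print(gaps)  # {'tpr_diff': -0.25, 'fpr_diff': -0.25, 'aod': -0.25}
```

Equalised odds holds only when both differences are zero. In this example neither is, and the negative AOD indicates that the privileged group receives the favourable prediction more often.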

Apart from these quantities, many other metrics are in use, including statistical parity difference and the Theil index.

Bias mitigation methods

In this section, we discuss bias mitigation methods. These are techniques that remove the bias present in a learned model or in the data itself. Commonly used model de-biasing techniques can be classified as follows.

a) Pre-processing techniques

These methods modify the dataset itself to make it fairer. Examples are reweighing and disparate impact remover. In other words, bias mitigation is carried out at the data preparation stage rather than during the model learning phase.
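To make reweighing concrete, here is a minimal sketch of the usual formulation, in which each training example receives the weight P(group) × P(label) / P(group, label) so that group and label look statistically independent in the weighted data. The groups and labels are made up, and the function name is hypothetical rather than a library API.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Per-example weights that balance favourable rates across groups."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    pair_counts = Counter(zip(groups, labels))
    # Expected count under independence divided by the observed count.
    return [
        (group_counts[g] * label_counts[y]) / (n * pair_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_favourable_rate(groups, labels, weights, group):
    """Weighted share of favourable labels (1) within one group."""
    rows = [(y, w) for g, y, w in zip(groups, labels, weights) if g == group]
    return sum(w for y, w in rows if y == 1) / sum(w for _, w in rows)


# Made-up training labels: 1 = favourable outcome.
groups = ["female"] * 4 + ["male"] * 6
labels = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0]

weights = reweighing_weights(groups, labels)
print(weighted_favourable_rate(groups, labels, weights, "female"))
print(weighted_favourable_rate(groups, labels, weights, "male"))
```

Before weighting, the favourable rates are 0.25 for women and about 0.67 for men; after weighting, both come out at 0.5. The weights would then be passed to any learner that accepts per-sample weights.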

b) In-processing techniques

These techniques create or modify the model so that it gives fair predictions. That is, the learning algorithm is aware of the bias and corrects it during the learning phase itself. Examples are adversarial de-biasing, prejudice remover regularizer and exponentiated gradient.

c) Post-processing techniques

These techniques adjust the model output in a systematic manner so that the final decision is fair. Examples are reject option classification and threshold optimiser.
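A simplified sketch of the idea behind reject option classification follows. It assumes the model outputs a probability of the favourable outcome, and that predictions close to the decision threshold are the uncertain ones: inside that band the unprivileged class receives the favourable decision and the privileged class the unfavourable one. The `threshold` and `margin` values are arbitrary illustrations; in practice the band is usually tuned against a fairness metric.

```python
def reject_option_classify(scores, groups, unprivileged, threshold=0.5, margin=0.1):
    """Flip uncertain decisions in favour of the unprivileged class."""
    decisions = []
    for score, group in zip(scores, groups):
        if abs(score - threshold) <= margin:
            # Uncertain region: decide by group membership.
            decisions.append(1 if group == unprivileged else 0)
        else:
            # Confident region: keep the model's own decision.
            decisions.append(1 if score > threshold else 0)
    return decisions


# Made-up model scores (probability of the favourable outcome).
scores = [0.45, 0.55, 0.30, 0.90, 0.58, 0.42]
groups = ["female", "male", "female", "female", "male", "male"]

print(reject_option_classify(scores, groups, unprivileged="female"))
# prints: [1, 0, 0, 1, 0, 0]
```

The model itself is left untouched, which is what makes this a post-processing step.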

Putting everything together

Our idea is to treat the components of the trust requirements as checkpoints and to introduce corresponding evaluation mechanisms as checklists. Evaluators can carry out their validation through this well-defined process, and an empirical scoring mechanism can be formulated on that basis. We can also add custom requirements to the list of checkpoints, for example, detecting model and dataset drift or initiating model re-training.

A demonstration of the Human-AI trust platform is given in Table 1. The demo has only four requirements, with the Human-AI trust score weightage chosen at random. The trust score can be expressed on whatever scale is convenient, such as 1 to 10 or 1 to 100. We can include any number of requirements with custom checkpoints and checklists. A complete list of requirements, related documentation and checklists can be derived from the table below.

| Requirement | Checkpoints | Checklists | Human-AI trust score weightage |
|---|---|---|---|
| Privacy & data governance | Data anonymisation | Pseudonymisation, Generalisation | 5% |
| Privacy & data governance | Data governance | Transfer regulation, Storage regulation | 5% |
| Transparency | Traceability | Data gathering, Data labelling | 20% |
| Transparency | Explainability | Feature attribution, Counterfactuals | 30% |
| Diversity, non-discrimination & fairness | Bias assessment | DIR, EO, AOD | 15% |
| Diversity, non-discrimination & fairness | Bias mitigation | Pre-processing, In-processing, Post-processing | 15% |
| Production analysis | Data drifts, Target drifts | Two-sample KS test, Chi-squared test | 10% |

Table 1: A demo of the Human-AI trust platform, showing the demo requirements, their checkpoints and checklists, along with the randomly assigned weightage for each.
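One possible way to turn the table into a single number is a weighted average, sketched below. The weights are the demo values from Table 1; the per-checkpoint scores are hypothetical evaluator inputs between 0 and 1 (for instance, the fraction of checklist items passed), and the aggregation rule itself is an assumption rather than a prescribed formula.

```python
# Demo weights from Table 1, in percent.
WEIGHTS = {
    "Data anonymisation": 5,
    "Data governance": 5,
    "Traceability": 20,
    "Explainability": 30,
    "Bias assessment": 15,
    "Bias mitigation": 15,
    "Production analysis": 10,
}


def trust_score(checkpoint_scores, weights=WEIGHTS, scale=100):
    """Weighted average of checkpoint scores (each in [0, 1]), rescaled to `scale`."""
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * checkpoint_scores[name] for name in weights)
    return scale * weighted / total_weight


# Hypothetical evaluator scores for each checkpoint.
example_scores = {
    "Data anonymisation": 1.0,
    "Data governance": 0.5,
    "Traceability": 1.0,
    "Explainability": 0.5,
    "Bias assessment": 1.0,
    "Bias mitigation": 0.0,
    "Production analysis": 0.5,
}

print(trust_score(example_scores))            # prints: 62.5
print(trust_score(example_scores, scale=10))  # prints: 6.25
```

Dividing by the total weight keeps the score on the chosen scale even when custom checkpoints are added and the weights no longer sum to 100.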

To conclude, we have outlined a platform that turns the abstract notion of trust into concrete, weighted checkpoints so that the trustworthiness of AI systems can be evaluated empirically. We briefly discussed the key stages of a typical data science workflow in which the platform’s ideas can be applied and the ways to quantify the trust involved in those stages. Both parties can mutually agree on the checkpoints of the data science workflow, and a scoring mechanism is formulated that assigns a weight to the score at each stage. The proposed platform provides greater transparency across the entire life cycle of an AI system and can be easily adopted by customising the requirements.

This article is written by a member of the AIM Leaders Council. AIM Leaders Council is an invitation-only forum of senior executives in the Data Science and Analytics industry.

Shashank Shekhar
Shashank is a Data Sciences leader with diverse experience across verticals including Telecom, CPG, Retail, Hitech and E-commerce domains. He is currently heading the Artificial Intelligence Labs at Subex. In the past, he has worked at VMware, Amazon, Flipkart and Target and has been involved in solving various complex business problems using Machine Learning and Deep Learning.
