Council Post: The Often Ignored Reality of Model Governance (Part 1)



Published on January 9, 2023

by Mathangi Sri


The glitz and glamour of data science remain unbeatable, and both the challenge and the temptation of applying these models to real-world problems are significant. The complexity can be deceptive and lead to suboptimal outcomes, and the speed of building something complex often takes priority over the solution’s usefulness. A poorly built solution costs not only time and money but also trust in the usefulness of machine learning solutions. In this article, we will consider the importance of having a model governance system.

Why the specific focus on governance? Data science problems have characteristics, described below, that demand special attention and hence stricter governance mechanisms.

The Problem Landscape

Real-world data science problems are complex. In most cases, we are still struggling to determine what the problem is.

To quote from Sidney Sheldon’s “The Doomsday Conspiracy”:

“Talk about looking for a needle in a haystack. I don’t even know where the haystack is.”

This is different from other engineering problems, where work on the solution starts only after the problem is well defined.

Complexity Bias of Data Scientists

“Life is really simple, but we insist on making it complicated.” —Confucius

Humans in general tend to take the difficult route to solve simple problems. When presented with two solutions, one seemingly simple and one difficult, there is a bias towards choosing the complex one. Data scientists, given the availability of huge volumes of data and a plethora of algorithms, can gravitate towards solutions with a higher order of complexity. For instance, they could frame any problem with non-linearity assumptions or add thousands of features. While working at that level of complexity, they may overlook simpler solutions that could have solved the problem better. Consider an ML model that tries to recommend three out of ten product categories when the three most frequently purchased ones carry 90% of the purchased volume. In this situation, a frequency-based approach would perhaps sort out the recommendation problem better than a full-fledged ML model.
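As a rough illustration of the frequency-based approach mentioned above, here is a minimal Python sketch. The purchase log, the category names and the function name top_k_by_frequency are all invented for this example; they are assumptions, not part of any real system.

```python
from collections import Counter


def top_k_by_frequency(purchases, k=3):
    """Return the k most frequently purchased categories and the share
    of total purchase volume they account for."""
    counts = Counter(purchases)
    top = counts.most_common(k)
    total = sum(counts.values())
    share = sum(n for _, n in top) / total if total else 0.0
    return [category for category, _ in top], share


# Hypothetical purchase log: one entry per purchase, labelled by category.
# Ten categories in total; the numbers are made up to mirror the 90% case.
purchases = (
    ["phones"] * 50 + ["laptops"] * 25 + ["headphones"] * 15
    + ["cables", "cases", "chargers", "tablets", "watches", "cameras", "speakers"]
    + ["cables", "cases", "chargers"]
)

recommended, share = top_k_by_frequency(purchases, k=3)
print(recommended, round(share, 2))
```

If the top three categories already cover most of the volume, as in this made-up log, a rule like this is a sensible baseline that any ML recommender would have to beat.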

The Environment of Uncertainties

Data science is fraught with uncertainties, which we could broadly classify into mathematical uncertainty and user uncertainty. Mathematical uncertainty concerns whether the underlying customer problem can be solved at all, and user uncertainty concerns whether the AI’s action would change user behaviour.

Mathematical uncertainty: Not all problems have a feasible mathematical solution. For example, we can forecast the sales of a product. However, if we lack sufficient historical information about the product, its past sales, or the sales of similar products, the forecast may be no better than a random guess.
User uncertainty: Consider an e-commerce company that wants to offer 5% off on electronics sales. Whether the offer is big enough to incentivise a user to purchase is not known in advance. Until experiments are run, we cannot tell whether the proposed interventions will be effective. Mathematical uncertainty can be teased out with offline “back-testing” on data. User uncertainty needs a lot of trial and error and experimentation before the behaviour can be understood and modelled.
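The offline back-testing mentioned above can be sketched as a walk-forward evaluation: at each point in time, forecast the next value using only the history seen so far, then compare the error against a naive baseline. This is a minimal sketch under stated assumptions; the weekly sales figures are invented, and the moving-average forecaster merely stands in for whatever model is actually under review.

```python
def rolling_backtest(series, forecast_fn, min_history=4):
    """Walk forward through the series. At each step, forecast the next
    value from the history seen so far and record the absolute error.
    Returns the mean absolute error (MAE)."""
    if len(series) <= min_history:
        raise ValueError("series is too short to back-test")
    errors = []
    for t in range(min_history, len(series)):
        prediction = forecast_fn(series[:t])
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)


def naive_forecast(history):
    """Baseline: the next value equals the last observed value."""
    return history[-1]


def moving_average_forecast(history, window=4):
    """Stand-in candidate: the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


# Hypothetical weekly sales figures, invented for illustration only.
weekly_sales = [100, 120, 90, 110, 105, 125, 95, 115, 100, 120, 90, 110]

naive_mae = rolling_backtest(weekly_sales, naive_forecast)
candidate_mae = rolling_backtest(weekly_sales, moving_average_forecast)
print(f"naive MAE: {naive_mae:.2f}, candidate MAE: {candidate_mae:.2f}")
```

If the candidate cannot beat the naive baseline on historical data, the forecast is close to the "random guess" situation described above. User uncertainty cannot be resolved this way; it still needs live experiments.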

The “Black box” nature of the problems

Most ML models fall short on explainability. The nonlinearity of most of these models makes it difficult to understand what is going on behind the scenes. Even when a model contains only a few features, the interrelations between the variables make it hard to provide a valid explanation of the underlying phenomenon.

Model Governance

One way to solve the problems listed above is to use heuristic rules or, where predictive models are used, to restrict them to linear models. However, that would compromise the business impact that “well-built” ML models could otherwise deliver. A good governance process ensures that data scientists can try out state-of-the-art models without compromising the robustness of the solution, while also making the whole model-building process predictable. Not only should we build models that predict well, but we should also build “predictable models”. The governance process ensures that the uplift or impact delivered by the models is worth their complexity.
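One way to picture the "is the complexity worth the uplift" question is a simple review gate that compares a complex candidate against a simple baseline. The sketch below is illustrative only: the class ModelResult, the function complexity_is_justified, the 0.02 threshold and the metric values are all hypothetical, and a real council would set its own criteria.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    metric: float  # higher is better, e.g. accuracy on a held-out set


def complexity_is_justified(baseline, candidate, min_uplift=0.02):
    """Return (approved, uplift). The candidate passes only if it beats
    the simple baseline by at least `min_uplift` (an assumed threshold)."""
    uplift = candidate.metric - baseline.metric
    return uplift >= min_uplift, uplift


# Hypothetical results, invented for illustration.
baseline = ModelResult("frequency rule", 0.80)
candidate = ModelResult("complex ML model", 0.81)

approved, uplift = complexity_is_justified(baseline, candidate)
print(f"uplift={uplift:.3f}, approved={approved}")
```

In this made-up case the uplift is smaller than the assumed threshold, so the review would send the team back to the simpler solution or ask for further evidence.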

How do you build a governance process?

The foremost thing to recognise is the need for a deeper review of predictive models. I would recommend an expert council that reviews models before they are picked for deployment. The council should equip the teams with guidelines and checklists for building predictive models, and should periodically update them based on developments and improvements in the industry. There should also be an organisational mandate that makes a governance review mandatory before any model can move into production.

Role of the council

Governance is notorious for slowing down pace and innovation. Hence, the expert council needs to make sure the process does not become bureaucratic and stifle innovation or the freedom to try out new things. The “kick” in a data science job lies in experimentation and innovation, and it is important that the council follows this first principle in spirit. For that reason, council members should hold leadership roles in data science and have in-depth technical expertise.

The council's responsibilities include:
Laying out guidelines for conceptualising, evaluating and building models.
Recommending tools and libraries that could help in building robust models.
Providing document templates for model review.
Completing reviews with the shortest possible turnaround.
Providing new ideas or thoughts to solve the problem.
Questioning complexity without being cynical about it.

What does governance look like?

Model governance is a huge area to cover. Here we will focus on the basic construct of governance and the broader guidelines.

Given that most ML models are non-linear, there are two strategies for governance, which we will look at in the next article: one where we evaluate the black box as a whole, and one where we peep inside the black box and evaluate its contents in depth.

This article is written by a member of the AIM Leaders Council. AIM Leaders Council is an invitation-only forum of senior executives in the Data Science and Analytics industry.


Mathangi Sri

Mathangi Sri currently works as the Chief Data Officer at Yubi. She has a proven track record of 18+ years in building world-class data science solutions and products. She holds 20 patent grants in the area of intuitive customer experience and user profiles. She recently published a book, “Practical Natural Language Processing with Python”, and another with BPB Publications, “Capitalizing Data Science: A Guide to Unlocking the Power of Data for Your Business and Products”.
