The glitz and glamour of data science remain unbeatable, and the temptation to apply its models to real-world problems is strong. But complexity can be deceptive and lead to suboptimal outcomes: the speed of building something complex often takes priority over the usefulness of the solution. A poorly built solution costs not only time and money but also trust in the usefulness of machine learning. In this article, we consider the importance of having a model governance system.
Why the specific focus on governance? Data science problems have characteristics that demand special attention, and hence stricter governance mechanisms.
The Problem Landscape
Real-world data science problems are complex. In most cases, we are still struggling to determine what the problem is.
To quote Sidney Sheldon's The Doomsday Conspiracy:
“Talk about looking for a needle in a haystack. I don’t even know where the haystack is.”
This is different from other engineering problems, where work on the solution starts only after the problem is well defined.
Complexity Bias of Data Scientists
“Life is really simple, but we insist on making it complicated.” —Confucius
Humans in general tend to take the difficult route to solve simple problems. When presented with two solutions, one seemingly simple and one difficult, they are biased towards the complex one. Data scientists, given the availability of huge volumes of data and a plethora of algorithms, can gravitate towards solutions of a higher order of complexity. For instance, they could frame any problem with non-linearity assumptions or add thousands of features. While working at that level of complexity, they may overlook simpler solutions that would have solved the problem better. Consider an ML model that has to recommend three out of ten product categories, where the three most frequently purchased categories carry 90% of the purchased volume. In this situation, a frequency-based approach will probably solve the recommendation problem just as well as a full-fledged ML model.
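To make the frequency-based alternative concrete, here is a minimal sketch in Python. The purchase log, the category names and the function `top_k_categories` are hypothetical and only illustrate the idea; they are not taken from any real system.

```python
from collections import Counter


def top_k_categories(purchases, k=3):
    """Return the k most frequently purchased categories and their volume share."""
    if not purchases:
        raise ValueError("purchases must not be empty")
    counts = Counter(purchases)
    # most_common sorts by count, highest first
    top = [category for category, _ in counts.most_common(k)]
    share = sum(counts[c] for c in top) / len(purchases)
    return top, share


# Hypothetical purchase log: ten categories, volume concentrated in three.
purchases = (
    ["grocery"] * 50
    + ["fashion"] * 25
    + ["electronics"] * 15
    + ["books", "toys", "sports", "beauty", "home", "garden", "auto"]
    + ["books", "toys", "sports"]
)

top, share = top_k_categories(purchases, k=3)
print(top, share)
```

If the share covered by the top three categories is already close to 90%, this baseline sets the bar that any more complex recommender has to clear.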
The Environment of Uncertainties
Data science is fraught with uncertainties. We could broadly classify them into mathematical uncertainty and user uncertainty. Mathematical uncertainty deals with whether the underlying customer problem can be solved at all, and user uncertainty deals with whether the AI's action would change user behaviour.
- Mathematical uncertainty: Not all problems have a feasible mathematical solution. For example, we can forecast sales of a product. However, suppose we do not have enough historical data on the product's sales or on the sales of similar products. In that case, the forecast may be no better than a random guess.
- User uncertainty: Consider an e-commerce company that wants to offer 5% off on electronics sales. Whether the offer is big enough to incentivise a user to purchase is not known in advance. Until experiments are run, we cannot tell whether the intended interventions will be effective. Mathematical uncertainty can be teased out with offline "back-testing" on data. User uncertainty needs a lot of trial and error and experiments before the behaviour can be understood and modelled.
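Offline back-testing can be sketched in a few lines. The example below is an illustration under stated assumptions: the weekly sales figures are made up, and `backtest`, `naive_last_value` and `moving_average` are hypothetical helper names. It walks forward over the last few observations, forecasts each one from its history only, and compares a candidate forecaster against a naive "repeat the last value" baseline.

```python
def mean_absolute_error(actuals, predictions):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)


def backtest(series, forecaster, holdout):
    """Forecast each of the last `holdout` points using only the data before it."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    actuals, predictions = [], []
    for t in range(len(series) - holdout, len(series)):
        history = series[:t]  # no peeking at the value being forecast
        predictions.append(forecaster(history))
        actuals.append(series[t])
    return mean_absolute_error(actuals, predictions)


def naive_last_value(history):
    """Baseline: next value equals the last observed value."""
    return history[-1]


def moving_average(history, window=3):
    """Candidate: next value equals the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


# Hypothetical weekly sales figures for one product.
weekly_sales = [100, 104, 98, 110, 107, 103, 111, 109, 105, 112]

naive_mae = backtest(weekly_sales, naive_last_value, holdout=4)
candidate_mae = backtest(weekly_sales, moving_average, holdout=4)
print(f"naive MAE={naive_mae:.2f}, candidate MAE={candidate_mae:.2f}")
```

If the candidate cannot beat the naive baseline on held-out history, that is a sign the data may not support a useful forecast. A back-test of this kind says nothing about user uncertainty, which still needs live experiments.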
The “Black box” nature of the problems
Most ML models fall short on explainability. The non-linearity of most of these models makes it difficult to understand what is going on behind the scenes. Even if a model contains fewer features, the interrelations between the variables make it hard to provide a valid explanation of the underlying phenomenon.
One way to solve the problems listed above is to use heuristic rules, or to restrict predictive models to linear models. However, that would compromise the business impact that "well-built" ML models could otherwise deliver. A good governance process lets data scientists try out state-of-the-art models without compromising the robustness of the solution, while making the whole model-building process predictable. Not only should we build models that predict well, we should also build "predictable models". The governance process ensures that the complexity is worth the uplift or the impact delivered by the models.
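One way to picture the "complexity is worth the uplift" check is as a simple gate that compares a complex candidate with a simple baseline on the same evaluation metric. The sketch below is only an illustration: the function name `complexity_is_justified`, the scores and the 5% threshold are assumptions, and a real review would weigh much more than a single number.

```python
def complexity_is_justified(baseline_score, candidate_score, min_relative_uplift=0.05):
    """Check whether a complex model beats a simple baseline by a required margin.

    Scores are assumed to be "higher is better" and measured on the same data.
    The 5% default threshold is an arbitrary placeholder, not a recommendation.
    """
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    uplift = (candidate_score - baseline_score) / baseline_score
    return uplift >= min_relative_uplift, uplift


# Hypothetical scores: a heuristic baseline versus two complex candidates.
approved_small, uplift_small = complexity_is_justified(0.80, 0.81)
approved_large, uplift_large = complexity_is_justified(0.80, 0.88)
print(approved_small, round(uplift_small, 4))
print(approved_large, round(uplift_large, 4))
```

The point of such a gate is not to block complex models but to make the trade-off explicit before a model is reviewed for deployment.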
How do you build a governance process?
The foremost thing to recognise is the need for a deeper review of predictive models. I would recommend an expert council that reviews models before they are picked for deployment. The council should equip teams with guidelines and checklists for building predictive models, and should periodically update them based on developments and improvements in the industry. There should also be an organisational mandate that requires every model to pass governance review before it can move into production.
Role of the council
Any governance process is notorious for slowing down pace and innovation. Hence, the expert council needs to make sure the process does not become bureaucratic and does not hinder innovation or the freedom to try out new things. The "kick" in a data science job lies in experimentation and innovation, and it is important that the council follows this first principle in spirit. For that reason, council members should hold leadership roles in data science and have in-depth technical expertise. The council's responsibilities include:
- Laying out guidelines for conceptualising, evaluating and building models.
- Tools/libraries that could help in building robust models.
- Document templates for model review.
- Completing reviews with the quickest possible turnaround.
- Providing new ideas or thoughts to solve the problem.
- Questioning complexity but not being cynical about it.
What does governance look like
Model governance is a huge area to cover. In this article we focus on the basic construct of governance and the broader guidelines.
Given that most ML models are non-linear, we will have two strategies for governance, which we will look at in the next article: one where we evaluate the black box as a whole, and one where we peek inside the black box and evaluate its contents in depth.
This article is written by a member of the AIM Leaders Council. AIM Leaders Council is an invitation-only forum of senior executives in the Data Science and Analytics industry.