Computer scientist Judea Pearl and science writer Dana Mackenzie wrote The Book of Why on causal reasoning. Pearl famously said, “data are profoundly dumb”. While data can be leveraged to make accurate predictions, even the most sophisticated machine learning techniques often fail to explain how they reached their conclusions.
Pearl started working in artificial intelligence in the 1970s. He argued that causation cannot be reduced to correlation: you can never get causal information out of data without first putting causal hypotheses in.
Two decades before The Book of Why was published, Pearl developed do-calculus, which facilitates the identification of causal effects in non-parametric models. His research paper proposed a method to determine whether the available assumptions are sufficient to identify causal effects from non-experimental data.
Inspired by Pearl’s work on causal inference, Microsoft introduced the DoWhy software library in 2018. It makes explicit the often neglected yet important assumptions underlying causal inference analyses and offers a programmatic interface to popular causal inference methods. In the four years since, the framework has drawn contributions from dozens of data scientists. Last May, Microsoft moved DoWhy to an independent open-source governance model in a new PyWhy GitHub organisation. Microsoft is collaborating with AWS on this effort.
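As one illustration of what "identification" means here, the best-known special case of Pearl's framework is backdoor adjustment. The symbols below (treatment X, outcome Y, observed covariates Z) are notation assumed for this sketch and do not come from the article:

```latex
P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z)
```

The equality holds when Z blocks every backdoor path between X and Y and contains no descendant of X. The left-hand side describes an intervention, while the right-hand side uses only quantities that can be estimated from non-experimental data.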
Microsoft, causal inference, and DoWhy
Microsoft uses causal inference to make several important decisions, like estimating the impact of recommendation systems. The tech giant is working on fundamental advances that combine traditional machine learning with causal inference methods. Causal inference focuses on the effect of an action, whereas conventional machine learning is mainly concerned with predicting the final outcome.
There are a number of critical research challenges in the evaluation of causal machine learning models and in formalising and integrating domain expertise into machine learning pipelines. The standard procedure usually involves doing every step from scratch: finding the right identification strategy, devising an estimator, and conducting robustness checks. Understanding the underlying assumptions and validating them is cumbersome.
To deal with these challenges, Microsoft has released several open-source tools and libraries such as DoWhy. The library uses the Bayesian graphical model framework to represent assumptions formally, so users can specify what they know about the data-generating process. It estimates causal effects from historical data alone, which makes it particularly useful when you can’t run experiments due to time or cost constraints.
DoWhy focuses on four steps of an end-to-end causal inference analysis:
Modeling: Causal reasoning starts with creating a clear model of the causal assumptions being made.
Identification: In this step, strategies for identifying causal effects are created.
Estimation: Once the causal effect is identified, you can choose from a range of statistical and machine learning-based estimation methods to answer the causal question.
Refutation: In this step, the underlying assumptions are tested.
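The four steps can be sketched on a toy problem. The example below is a minimal illustration in plain Python and deliberately does not use DoWhy's API: the synthetic data, the variable names (z, t, y), and the helper functions are all hypothetical, and the identification step is simplified to "adjust for the common causes of treatment and outcome", which is valid for this small graph but not in general.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate(n=20000, seed=0):
    """Synthetic 'historical' data in which z influences both t and y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        # Units with z = 1 are far more likely to receive the treatment.
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0
        # By construction, the true causal effect of t on y is 2.0.
        y = 2.0 * t + 3.0 * z + rng.gauss(0, 1)
        rows.append({"z": z, "t": t, "y": y})
    return rows


# Step 1, modeling: write the causal assumptions down as edges (cause, effect).
GRAPH = {("z", "t"), ("z", "y"), ("t", "y")}


# Step 2, identification (simplified): adjust for shared parents of t and y.
def common_causes(graph, treatment, outcome):
    parents_t = {a for a, b in graph if b == treatment}
    parents_y = {a for a, b in graph if b == outcome}
    return sorted(parents_t & parents_y)


# Step 3, estimation: compare treated and untreated units within each stratum
# of the adjustment set, then weight each stratum by its share of the data.
def adjusted_effect(rows, treatment, outcome, adjust_for):
    strata = {}
    for r in rows:
        key = tuple(r[v] for v in adjust_for)
        strata.setdefault(key, []).append(r)
    effect = 0.0
    for group in strata.values():
        treated = [r[outcome] for r in group if r[treatment] == 1]
        control = [r[outcome] for r in group if r[treatment] == 0]
        effect += (mean(treated) - mean(control)) * len(group) / len(rows)
    return effect


# Step 4, refutation: swap in a random placebo treatment. A sound analysis
# should report an effect near zero for it.
def placebo_effect(rows, outcome, adjust_for, seed=1):
    rng = random.Random(seed)
    fake = [dict(r, t_placebo=rng.randint(0, 1)) for r in rows]
    return adjusted_effect(fake, "t_placebo", outcome, adjust_for)


rows = simulate()
adjustment_set = common_causes(GRAPH, "t", "y")
naive = (mean([r["y"] for r in rows if r["t"] == 1])
         - mean([r["y"] for r in rows if r["t"] == 0]))
adjusted = adjusted_effect(rows, "t", "y", adjustment_set)
placebo = placebo_effect(rows, "y", adjustment_set)

print("adjustment set:", adjustment_set)
print("naive difference in means:", round(naive, 2))
print("adjusted estimate:", round(adjusted, 2))
print("placebo estimate:", round(placebo, 2))
```

The naive comparison is inflated because z drives both the treatment and the outcome, while the adjusted estimate should land near the effect built into the simulation. The placebo check does not prove the assumptions correct; it only fails to refute them.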
PyWhy aims to build and host interoperable libraries, tools, and other resources for a wide range of causal tasks and applications. These are connected through a common API for foundational causal operations, with a focus on the end-to-end analysis process.
Similar libraries and frameworks from Microsoft
DoWhy is not the only library Microsoft has introduced for causal inference. Microsoft’s ALICE team introduced a Python package called EconML, which applies machine learning techniques to estimate individualised causal responses from observational or experimental data. Incorporating individual machine learning steps into interpretable causal models improves the reliability of what-if predictions and makes causal analysis faster and easier.
Project Azua is another case in point. It helps in developing machine learning solutions for efficient decision making that show human expert-level performance across domains. The framework divides decisions into two types: best next question and best next action.
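To make "individualised causal responses" concrete, here is a minimal sketch of the underlying idea: the effect of a treatment can differ between groups of users. It uses plain Python rather than EconML's API, and the segments, effect sizes, and function names are hypothetical. The treatment is assigned at random, so a per-segment difference in means is enough here; EconML's estimators address the harder case of many features and non-random assignment.

```python
import random

rng = random.Random(42)

# Simulated experiment: the treatment helps "returning" users more than
# "new" users (true effects are 3.0 and 1.0 by construction).
rows = []
for _ in range(20000):
    segment = rng.choice(["new", "returning"])
    treated = rng.random() < 0.5
    true_effect = 1.0 if segment == "new" else 3.0
    outcome = 5.0 + (true_effect if treated else 0.0) + rng.gauss(0, 1)
    rows.append((segment, treated, outcome))


def mean(values):
    return sum(values) / len(values)


def segment_effect(rows, segment):
    """Difference in mean outcome between treated and untreated units."""
    treated = [y for s, t, y in rows if s == segment and t]
    control = [y for s, t, y in rows if s == segment and not t]
    return mean(treated) - mean(control)


effects = {s: segment_effect(rows, s) for s in ("new", "returning")}
for name, value in effects.items():
    print(name, round(value, 2))
```

A single average effect over all users would hide the gap between the two segments, which is the kind of what-if detail individualised estimates are meant to surface.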
Microsoft continues to push the boundaries of causal learning through several new initiatives, approaches, statistical advances, and deep learning methods for end-to-end causal discovery and inference. Microsoft also recognises the importance of causal learning for fairness, explainability, and interoperability of machine learning models.