# Guide to pgmpy: Probabilistic Graphical Models with Python Code

Probabilistic Graphical Models (PGMs) are a powerful way of representing joint probability distributions over a set of random variables, and they allow inference to be performed in a computationally efficient way. A PGM exploits the conditional independencies between the random variables to build a graph structure that encodes their relationships, and the joint probability distribution can then be recovered by combining the parameters attached to the graph.

What are the types of Graph Models?

Mainly, there are two types of graph models:
Bayesian Graph Models: These models are Directed Acyclic Graphs (DAGs) in which each random variable carries a conditional probability distribution given its parents. These models can represent causation between the random variables.
Markov Graph Models: These models are undirected graphs and represent non-causal relationships between the random variables.

pgmpy is a Python framework for working with these types of graph models. Several graph models and inference algorithms are implemented in pgmpy, and it also lets users create their own inference algorithms without digging into the library's source code. Let's get started with the implementation part.


Installation

Install pgmpy via PyPI:

`!pip install pgmpy`

pgmpy Demo – Create Bayesian Network

In this demo, we are going to create a Bayesian network. In a Bayesian network, each node is parameterized by a conditional probability distribution given its parents: each node is represented as P(node | Pa(node)), where Pa(node) is the set of parent nodes in the network.

An example of a student model is shown below; we are going to implement it using the pgmpy Python library.
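In the student model (following the edge list used in the code below, where D and I stand for difficulty and intelligence and G for grade, as noted in the code comments; L and S are the remaining two leaf variables), the chain rule for Bayesian networks factorises the joint distribution into one conditional per node given its parents:

```latex
P(D, I, G, L, S) = P(D)\,P(I)\,P(G \mid D, I)\,P(L \mid G)\,P(S \mid I)
```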

1. Import the required methods from pgmpy.

```
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
```

2. Initialize the model by passing the edge list as shown below.

```
# Defining the model structure. We can define the network by just passing a list of edges.
model = BayesianModel([('D', 'G'), ('I', 'G'), ('G', 'L'), ('I', 'S')])
```

Define all the conditional probability tables (CPDs), as shown in the diagram above. These CPDs are created with a pgmpy class called TabularCPD.

```
# Defining individual CPDs.
cpd_d = TabularCPD(variable='D', variable_card=2, values=[[0.6], [0.4]])
cpd_i = TabularCPD(variable='I', variable_card=2, values=[[0.7], [0.3]])

# The representation of a CPD in pgmpy is a bit different from the CPD shown in
# the picture above: in pgmpy the columns are the evidence and the rows are the
# states of the variable. This CPD represents P(grade | diff, intel).
cpd_g = TabularCPD(variable='G', variable_card=3,
                   values=[[0.3, 0.05, 0.9,  0.5],
                           [0.4, 0.25, 0.08, 0.3],
                           [0.3, 0.7,  0.02, 0.2]],
                   evidence=['I', 'D'],
                   evidence_card=[2, 2])
cpd_l = TabularCPD(variable='L', variable_card=2,
                   values=[[0.1, 0.4, 0.99],
                           [0.9, 0.6, 0.01]],
                   evidence=['G'],
                   evidence_card=[3])
cpd_s = TabularCPD(variable='S', variable_card=2,
                   values=[[0.95, 0.2],
                           [0.05, 0.8]],
                   evidence=['I'],
                   evidence_card=[2])
```

Add the CPDs (defined above) to the initialized model:

```
# Associating the CPDs with the network
model.add_cpds(cpd_d, cpd_i, cpd_g, cpd_l, cpd_s)
```

Verify the network using the check_model() method. It returns True if the network structure is valid and every CPD sums to 1.

```
# check_model checks the network structure and CPDs and verifies that the CPDs
# are correctly defined and sum to 1.
model.check_model()
```
3. In the above step, we haven't provided state names, so pgmpy automatically initializes all the states as 0, 1, 2, and so on. It also provides a way of explicitly setting the state names, as shown below.
```
cpd_g_sn = TabularCPD(variable='G', variable_card=3,
                      values=[[0.3, 0.05, 0.9,  0.5],
                              [0.4, 0.25, 0.08, 0.3],
                              [0.3, 0.7,  0.02, 0.2]],
                      evidence=['I', 'D'],
                      evidence_card=[2, 2],
                      state_names={'G': ['A', 'B', 'C'],
                                   'I': ['Dumb', 'Intelligent'],
                                   'D': ['Easy', 'Hard']})
```
4. Print the CPDs: for the auto-numbered states, simply use the print command; for the explicitly named states, use the get_cpds() method.
5. Next is to find independencies in the given Bayesian network. There are two types of independencies defined by a Bayesian network.

Local independencies: a variable is independent of its non-descendants given its parents. This can be written as (X ⊥ NonDesc(X) | Pa(X)), where NonDesc(X) is the set of variables that are not descendants of X and Pa(X) is the set of parents of X.

```
# Getting the local independencies of a variable.
model.local_independencies('G')
```

Or,

```
# Getting all the local independencies in the network.
model.local_independencies(['D', 'I', 'S', 'G', 'L'])
```

Global independencies: many structures are possible for global independencies. For two nodes, there are only two ways to connect them (X → Y or Y → X), and in either case a change in one node can affect the other. Similar cases can be analysed for three nodes (causal chains, common causes, and common effects), which is the basis of d-separation.

6. Inference from Bayesian models. In this step, we will predict values from the Bayesian model discussed above using Variable Elimination, a basic exact-inference method. For example, we can compute the distribution of G by marginalizing over all the other variables. The Python code for this is given below.

```
from pgmpy.inference import VariableElimination
infer = VariableElimination(model)
g_dist = infer.query(['G'])
print(g_dist)
```

For computing a conditional distribution such as P(G | D=Easy, I=Intelligent), we need to pass an extra evidence argument (this uses the state names defined earlier):

`print(infer.query(['G'], evidence={'D': 'Easy', 'I': 'Intelligent'}))`

7. In this step, we will predict values for new data points. The difference between step 6 and this step is that we are now interested in the most probable state of a variable rather than its full probability distribution. In pgmpy this is known as a MAP query. Here's an example:

`infer.map_query(['G'])`

Or,

`infer.map_query(['G'], evidence={'D': 'Easy', 'I': 'Intelligent'})`

You can check the full demo here.

pgmpy Demo – Extensibility

As discussed above, pgmpy provides a way to create your own inference algorithm. In this demo, we are going to do exactly that. pgmpy exposes base classes such as:

• BaseInference for inference algorithms
• BaseFactor for model parameters
• BaseEstimator for parameter and model learning

To add a new feature, create a class that inherits from the appropriate base class; all other pgmpy functionality can then be used with this new class.

Following are the steps:

1. Import all the required methods and packages.

```
# A simple exact inference algorithm
import itertools
from pgmpy.inference.base import Inference
from pgmpy.factors import factor_product
```
2. Define your own inference class by inheriting the base class from pgmpy. For this particular algorithm, we multiply all the factors/CPDs of the network, reduce by the evidence, and marginalize over the remaining variables to answer the desired query.

```
class SimpleInference(Inference):
    # By inheriting Inference we can use self.model, self.factors and
    # self.cardinality in our class.
    def query(self, var, evidence):
        # self.factors is a dict of the form {node: [factors_involving_node]}
        factors_list = set(itertools.chain(*self.factors.values()))
        product = factor_product(*factors_list)
        reduced_prod = product.reduce(evidence, inplace=False)
        reduced_prod.normalize()
        var_to_marg = (set(self.model.nodes()) - set(var)
                       - set([state[0] for state in evidence]))
        marg_prod = reduced_prod.marginalize(var_to_marg, inplace=False)
        return marg_prod
```
3. Now, as in the model above, initialize the Bayesian model, define the conditional probability tables for all variables, and add them to the initialized model.
```
# Defining a model
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD

model = BayesianModel([('A', 'J'), ('R', 'J'), ('J', 'Q'), ('J', 'L'), ('G', 'L')])
cpd_a = TabularCPD('A', 2, values=[[0.2], [0.8]])
cpd_r = TabularCPD('R', 2, values=[[0.4], [0.6]])
cpd_j = TabularCPD('J', 2, values=[[0.9, 0.6, 0.7, 0.1],
                                   [0.1, 0.4, 0.3, 0.9]],
                   evidence=['A', 'R'], evidence_card=[2, 2])
cpd_q = TabularCPD('Q', 2, values=[[0.9, 0.2], [0.1, 0.8]],
                   evidence=['J'], evidence_card=[2])
cpd_l = TabularCPD('L', 2, values=[[0.9, 0.45, 0.8, 0.1],
                                   [0.1, 0.55, 0.2, 0.9]],
                   evidence=['J', 'G'], evidence_card=[2, 2])
cpd_g = TabularCPD('G', 2, values=[[0.6], [0.4]])
model.add_cpds(cpd_a, cpd_r, cpd_j, cpd_q, cpd_l, cpd_g)
```
4. Now, run inference with your customized algorithm; the result can be compared against the VariableElimination method.

```
# Doing inference with our SimpleInference
infer = SimpleInference(model)
a = infer.query(var=['A'], evidence=[('J', 0), ('R', 1)])
print(a)
```

You can check the full demo here.

Conclusion

In this article, we discussed the pgmpy Python library, which provides a simple API for working with graphical models (Bayesian models, Markov models, etc.). It is highly modular and quite extensible.

Official code, docs and tutorials are available on the pgmpy website and its GitHub repository.

