# All you need to know about Markov Chain Monte Carlo

Markov Chain Monte Carlo (MCMC) refers to a class of methods for sampling from a probability distribution to construct the most likely distribution. Logistic distribution cannot be directly calculated, so instead generates thousands of values preferred as samples for the parameters of the function to create an approximation of the distribution.

Markov Chain Monte Carlo (MCMC) refers to a class of methods for sampling from a probability distribution to construct the most likely distribution. Logistic distribution cannot be directly calculated, so instead generates thousands of values preferred as samples for the parameters of the function to create an approximation of the distribution. In this article, we will deep dive into methods used by the MCMC algorithm to create such a distribution. Following are the topics to be covered.

1. What is MCMC?
2. How does MCMC work?
3. Applications of MCMC
4. Advantages and limitations of MCMC

Let’s start with understanding the concept of the Monte Carlo Markov Chain.

## What is MCMC?

The Markov chain Monte Carlo methods are designed so that the importance weights associated with each sample remain constant. This means the method aims to generate samples proportional to the posterior density so that it can arrive at the best estimates for the expectation value.

By creating a chain of correlated parameter values over n iterations, MCMC results in a posterior density that is the number of iterations spent in each specific region. Due to the ingenious MetropolisHastings algorithm which ensures stationarity of arbitrary Markov chains and adjusts it using a simple accept-reject mechanism to ensure the stationarity of probability density for the resulting process.

So it offers an indirect solution because one constructs an ergodic Markov chain with probability density as a stationary probability measure much more easily than it is to simulate directly from probability density. Let’s understand how the MCMC algorithm samples data and generates chains to achieve the required expected Bayesian distribution.

Are you looking for a complete repository of Python libraries used in data science, check out here.

## How does MCMC work?

The central idea of this algorithm is to generate new samples such that the distribution of the final samples is stationary, that it converges to something as well as retaining the consistency and equaling the probability density. Satisfying the first condition invokes detailed balance. This is the idea that probability is conserved when moving from one position to another, which is the function of reversibility.

Formally, this is just a case of factoring of probability; the probability of moving from one sample to another is equal to the probability of moving the opposite way from the present sample to the previous. By rearranging all the constraints, the final equality results from the fact that the distribution generates samples from the posterior.

Now the algorithm needs a procedure for actually calculating the probability of moving to a new position once the final equality has been determined. So, to achieve this computation it will break it into two parts. First, it will propose a new position based on a proposal distribution used for importance sampling.

Next, it will either accept or reject the new position depending on the transition probability. Combining these terms will compute the probability of moving to a new position. Similarly, for multiple data points, the algorithm will form a chain which is the Markov process.

In short, the algorithm will need a total of four-step to generate a chain which is listed.

• Propose a new position by generating a sample from the proposal distribution.
• Compute the transition probability for the newly generated samples.
• Assigning 0 and 1 either for acceptance or rejection respectively.
• To generate a Markov chain of states where the next proposed position only depends on the current position and forgets the previous position.

The problem with generating a chain of samples in practice is that the chain has a finite length and the starting point is 0. If the chain were infinitely long, all of the possible positions in parameter space would be visited, meaning the starting position is of lesser significance.

Nevertheless, in practice the sampling is terminated after only n iterations, starting from a location 0 with a very low probability means a large percentage of n samples will occupy this low-probability region, potentially biasing the results. Although it has no knowledge of where the position is concerning our posterior om it generally prefers to remove the initial chain that has started sampling from the area with higher probability.

But the new positions drawn from proposal distribution tend to depend on the current position. This means that the position we propose for future iteration will be correlated with the position at the present iteration, which itself will be correlated with the position at the past iteration and this will lead to a low acceptance fraction which will generate the chain containing these perfectly correlated samples, increasing the overall correlation. This problem is dealt with by introducing an autocovariance function that will calculate the correlation and will sort it out based on higher to lower.

Finally, we could say that the central motivating concern of MCMC methods is whether they can generate a chain of samples with an auto-correlation time small enough to outperform Importance Sampling.

### When and Where to use it?

A Markov chain Monte Carlo (MCMC) model is used to determine the posterior distribution and to draw samples from it. It is used to fit the model and to draw samples from the joint posterior distribution of the model parameters. It is used when we do not have any primer knowledge about the problem. So, we need to start the problem by assuming certain points and then by calculating the Bayesian probabilities of those guesses we can come to a solution.

## Applications of MCMC

Here are real-life applications of the MCMC algorithm.

Cryptography

Markov chain Monte Carlo is applied in the field of cryptography by Stanford university which was used to detect the prisoner’s coded notes. So the objective of the problem was to

• Compute the probability of the guess.
• Change the guess to another guess by making a random transposition of the values
• Compute the probability of a new guess; if the newer probability is greater than the previous one, accept.
• If not, flip the probabilities like tossing a coin; if it comes up heads, accept a newer guess.
• If the coin toss comes up tails, stay at f. The algorithm continues, trying to improve the current

After 100 iterations, the message is a mess. After two thousand iterations, the decrypted message makes sense. Ultimately the coded message was decrypted after a few more iterations.

Quantifying Gerrymandering in North Carolina

This is an application of the MCMC algorithm in the field of politics. In this, the team wants to evaluate whether a given political district faithfully represents the geo-political landscape. To sample the space of congressional redistricting plans, probability distributions are constructed that are concentrated on those that follow non-partisan rules from the proposed legislation. The non-partisan design criteria ensure that

• The state population is evenly divided between the thirteen congressional districts,
• The districts are connected and compact,
• Splitting counties is minimized
• African-American voters are sufficiently concentrated in two districts to affect the winner

Approximately 24,000 random redistricting plans are produced by a standard MCMC algorithm, which assumes people vote for parties rather than individuals. These election outcomes are used to quantify how representative a particular district is by observing its place in the ensemble. They are also used to quantify gerrymandering by identifying districts with an unusual partisan concentration of voters.

## Advantages and limitations of MCMC

There are advantages and disadvantages of every algorithm and here are some listed.

• Applied even if we are unable to draw samples directly.
• When there is no knowledge of the regions of high probability, the method works for complicated distributions in high dimensional-spaces
• Implementation is relatively easy
• Fairly reliable

• It requires more samples than simple importance sampling which makes it slower.
• Even empirically, evaluating the accuracy and convergence can be highly difficult.

## Final Verdict

The Markov Chain Monte Carlo is the numerical technique used for Bayesian Inference because it could calculate the posterior probabilities based on perfectly sampled data. With a hands-on implementation of this concept, we could understand the whole of the Monte Carlo Markov Chain.

## More Great AIM Stories

### Guide To Video Classification Using PytorchVideo Sourabh has worked as a full-time data scientist for an ISP organisation, experienced in analysing patterns and their implementation in product development. He has a keen interest in developing solutions for real-time problems with the help of data both in this universe and metaverse.

## Our Upcoming Events

Masterclass, Virtual
How to achieve real-time AI inference on your CPU
7th Jul

Masterclass, Virtual
How to power applications for the data-driven economy
20th Jul

Conference, in-person (Bangalore)
Cypher 2022
21-23rd Sep

Conference, Virtual
Deep Learning DevCon 2022
29th Oct