Last updated September 10, 2018

‘Expectation Maximisation’ Explained With A Coin Toss

Published on September 10, 2018
by Abhijeet Katte

Parameter estimation is the process of using training data to estimate or calculate parameters of selected distributions. Many techniques like least square methods, probability plotting and other advanced methods like maximum likelihood estimation and expectation maximisation are encompassed in this general definition. There are many techniques that can estimate model parameters from latent unobserved variables. But the expectation maximisation (EM) algorithm is one of the most popular and useful parameter estimation technique.

Setting Up The Problem

When working on a very new problem you come across some data that does not have any labels. Here, one has to guess (assume) the distribution and then come up with a hypothesis to find the parameters of that distribution. Let us consider this example:

There are two coins that need to be flipped and the sequence of flipping results need to be captured. It looks something like this:

If it is a fair coin, then the probability of heads coming up as a parameter θ is 0.5. If the probability of heads is higher than the value of θ, it will be over this number. And it will be lower if the probability of heads is very low. A simple way to understand this is:

Number of Heads / Number of Heads + Number of Tails

Doing Better A Job At Estimation

Imagine that the above simple calculation is done for two coins A and B. Let us call the parameter estimates to be θA and θB. We call these the maximum likelihood values and they are responsible for the distribution that came up during the experimentation. In a way, you can imagine these parameters to be the values that maximise the likelihood of getting a particular distribution for A and B. This is the idea behind the likelihood function which serves the purpose of telling us that given some data and parameters (and a distribution) how well the parameters explain the data. It is the ultimate test to know if we reached a correct set of parameters. In the case of the coin flip, we take the binomial formula as the likelihood function. It can be represented as:

Since all we care about is the maximisation of θ, we can ignore other constants in that process. Hence we aim to maximise the expression θH(1−θ)T . We assume that k is constant because the prior as constant. There is no limitation on a likelihood function like a probability function where the values have to lie between 0 and 1.

Now the likelihood can be given as: L(θ / H,T) = θH(1−θ)T

If we plot the θ for values between 0 and 1 we get a similar maximum likelihood value similar to what we get programmatically.

Expectation Maximisation

Imagine now that the labels for the data are gone. It means that the data is all mixed. The data is now mixed and now we can’t separate the data from the two. Here is where Expectation Maximisation comes into play. The steps involved in EM algorithms are:

Guess the random initial estimates of θA and θB between 0 and 1.
Use the likelihood function use the parameter values to see how probable the data is.
Use this likelihood to generate a weighting for indicating the probability of each sequence produced using θA and θB. This is called the Expectation Step.
Add up the total number of weighted counts for heads and tails across all sequences (call these counts H′ and T′) for both parameter estimates. Produce new estimates for θA and θB using the maximum likelihood formula H′ / H′+T′ (the Maximisation step).
Repeat steps 2-4 until each parameter estimate has converged, or a set number of iterations has been reached. The total weight for each sequence should be normalised to 1.

EM may also tend to switch the parameter values between two coins.

Why EM works

With EM the parameters magically improve with each iteration and eventually converge. It is so because each data point is associated with a weight during each iteration of the EM algorithm. It shows us the how the parameters explain the data and if the newest parameters fit the data well. Sum of weights is always maintained to 1. Expectation set tells us how different data points contribute to the newly calculated maximum likelihood estimate which was calculated during the last maximisation step.

In the subsequent iterations, we need to find the likelihood that the data was generated by θA. If it greater than before, even greater weight is given for the particular data sequence. Since the sum of the weights can’t be more than 1 the weights converge towards 1 eventually.

Access all our open Survey & Awards Nomination forms in one place >>

Abhijeet Katte

As a thorough data geek, most of Abhijeet's day is spent in building and writing about intelligent systems. He also has deep interests in philosophy, economics and literature.

Watch More

‘Expectation Maximisation’ Explained With A Coin Toss

Setting Up The Problem

Doing Better A Job At Estimation

Expectation Maximisation

Why EM works

Abhijeet Katte

Download our Mobile App

CORPORATE TRAINING PROGRAMS ON GENERATIVE AI

Generative AI Skilling for Enterprises

Our customized corporate training program on Generative AI provides a unique opportunity to empower, retain, and advance your talent.

3 Ways to Join our Community

Telegram group

Discord Server

Subscribe to our Daily newsletter

Get our daily awesome stories & videos in your inbox

Recent Stories

World's Biggest Media & Analyst firm specializing in AI

Advertise with us

AIM publishes every day, and we believe in quality over quantity, honesty over spin. We offer a wide variety of branding and targeting options to make it easy for you to propagate your brand.

Branded Content

AIM Brand Solutions, a marketing division within AIM, specializes in creating diverse content such as documentaries, public artworks, podcasts, videos, articles, and more to effectively tell compelling stories.

Corporate Upskilling

ADaSci Corporate training program on Generative AI provides a unique opportunity to empower, retain and advance your talent

Hackathons

With MachineHack you can not only find qualified developers with hiring challenges but can also engage the developer community and your internal workforce by hosting hackathons.

Talent Assessment

Conduct Customized Online Assessments on our Powerful Cloud-based Platform, Secured with Best-in-class Proctoring

Research & Advisory

AIM Research produces a series of annual reports on AI & Data Science covering every aspect of the industry. Request Customised Reports & AIM Surveys for a study on topics of your interest.

Conferences & Events

Immerse yourself in AI and business conferences tailored to your role, designed to elevate your performance and empower you to accomplish your organization’s vital objectives.