The Never Ending Fascination Of The Gaussian Distribution

Probability distributions are central to machine learning and data analysis. Data scientists, along with researchers in many other fields, work with them on a day-to-day basis. Put simply, a probability distribution is a function that tells us how likely each possible value of a random variable is.

For example, suppose you walk down the lane where you live and record the height of every building you pass. In effect, you are drawing random samples and building up an empirical probability distribution, which can be very useful going forward. It tells us which heights are more likely, how much the heights vary, and much more. Probability distributions can be either discrete or continuous.

Simply put, a discrete probability distribution describes a random variable that takes a countable set of distinct values, while a continuous probability distribution describes one that can take any value in a continuous range. Physicists, mathematicians and engineers favour one continuous distribution in particular, widely known as the Gaussian distribution, which surfaces in day-to-day life and in nature alike. Its other name is the normal distribution. The name reflects how commonly the distribution occurs; it does not mean that every other distribution is abnormal.


The Gaussian Distribution

The normal distribution is also known to many as the bell curve. The Gaussian distribution is a two-parameter family of curves, represented by the probability density function:

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))

Here μ is the mean and σ² is the variance. The parameter μ determines the location of the distribution, while the standard deviation σ determines the width of the bell curve. The normal distribution with mean 0 and standard deviation 1 is called the standard normal distribution. A random variable with the standard normal distribution is called a standard normal random variable and is denoted by Z.
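The roles of the two parameters can be seen in a few lines of Python. This is only a minimal sketch using the standard library; the function names normal_pdf and standardize are illustrative choices, not part of any particular package.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def standardize(x, mu, sigma):
    """Map a value x from N(mu, sigma^2) onto the standard normal scale (a z-score)."""
    return (x - mu) / sigma

# Peak of the standard normal density, 1 / sqrt(2 * pi), about 0.3989
peak = normal_pdf(0.0)

# Shifting mu moves the curve; doubling sigma widens it and halves the peak height
wide_peak = normal_pdf(5.0, mu=5.0, sigma=2.0)

print(peak, wide_peak, standardize(7.0, mu=5.0, sigma=2.0))
```

Subtracting μ and dividing by σ, as standardize does, is what turns any normal random variable into the standard normal variable Z.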


The Central Limit Theorem

Technically speaking, the Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger. This is an astonishing and rather counterintuitive result. It holds regardless of the shape of the population distribution, provided the population has a finite variance. As a common rule of thumb, the approximation is already good for sample sizes of 30 or more. Hence, when we draw many samples from the population and take their means, the distribution of those means looks more and more like a normal distribution as the sample size grows. This unexpected appearance of the Gaussian (normal) distribution is what makes it so special, and it underlies many phenomena.
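A small simulation makes the theorem easier to believe. The sketch below assumes an exponential population (strongly skewed, with mean 1 and standard deviation 1), a sample size of 30 and a fixed random seed; all of these are illustrative choices rather than anything prescribed by the theorem.

```python
import random
import statistics

def sample_means(draw, sample_size, num_samples, rng):
    """Return the means of num_samples samples, each made of sample_size draws."""
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

def draw_exponential(rng):
    """One draw from a skewed, clearly non-normal population (mean 1, sd 1)."""
    return rng.expovariate(1.0)

rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(draw_exponential, sample_size=30, num_samples=20000, rng=rng)

center = statistics.fmean(means)   # should sit close to the population mean, 1
spread = statistics.pstdev(means)  # should sit close to 1 / sqrt(30)

# For a normal distribution, roughly 68% of values lie within one sd of the mean
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(center, spread, within_one_sd)
```

Even though individual draws are far from bell-shaped, the fraction of sample means within one standard deviation of the centre comes out close to the value expected for a normal distribution.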

Let us look at an application of the Central Limit Theorem. Suppose a man travelling through the desert runs out of fuel. He dials the emergency number to contact government services, but he happens to be at the edge of cell range, so his voice is noisy and cannot be heard clearly by the executive trying to help him at the other end of the call. It would be great if the executive could clean up the noise using the signals received by some 100 nearby towers.

The signals can be denoted by X1, …, X100, where Xi = S + Yi.

Here S = the true signal being sent to the towers

and Yi = the noise in the signal received by tower i.

Here we can assume that the noises Y1, …, Y100 are independent and identically distributed, with mean 0 and variance σ². We may also assume that the noise has a normal distribution, although the Central Limit Theorem does not require it. The executive can clean up the signal by applying a simple average:

X̄ = (X1 + · · · + X100) / 100 = S + (Y1 + · · · + Y100) / 100

Now, by the Central Limit Theorem, we know that

(Y1 + · · · + Y100) / 100 is approximately N(0, σ² / 100) (a Gaussian distribution)

Hence, averaging the 100 signals cuts the standard deviation of the noise from σ to σ / 10, so by understanding the nature of the noise we can reduce it considerably.
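The tower example can be imitated numerically. In the sketch below the true signal S = 5, the noise level σ = 2 and the uniform shape of the noise are all hypothetical values chosen for illustration; the noise is deliberately not normal, to show that the averaging argument does not depend on it.

```python
import math
import random
import statistics

def averaged_reading(true_signal, sigma, num_towers, rng):
    """Average num_towers noisy readings X_i = S + Y_i of one true signal S."""
    # Uniform noise on [-a, a] has mean 0 and variance a^2 / 3, so a = sigma * sqrt(3)
    half_width = sigma * math.sqrt(3.0)
    readings = [
        true_signal + rng.uniform(-half_width, half_width)
        for _ in range(num_towers)
    ]
    return statistics.fmean(readings)

rng = random.Random(42)             # fixed seed so the run is repeatable
S, SIGMA, TOWERS = 5.0, 2.0, 100    # hypothetical signal, noise sd and tower count

# Repeat the whole experiment many times to see how much the average still wobbles
averages = [averaged_reading(S, SIGMA, TOWERS, rng) for _ in range(5000)]

recovered_signal = statistics.fmean(averages)
noise_sd_after = statistics.pstdev(averages)  # theory predicts SIGMA / sqrt(TOWERS)

print(recovered_signal, noise_sd_after, SIGMA / math.sqrt(TOWERS))
```

The spread of the averaged reading should land near σ / 10, a tenfold reduction compared with a single tower.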

Gaussian Distributions Can Be Used To Solve Common Problems

As mentioned earlier, scientists in many fields use Gaussian distributions to solve commonly occurring scientific problems. In physics, the Gaussian is the distribution that maximises entropy for a given mean and variance, which corresponds to a fixed average energy. Hence the Gaussian distribution governs each velocity component of a particle in a bottle of gas at a given temperature.

There are many operations on Gaussians that give interesting results. For example:

  1. The Fourier transform of a Gaussian is a Gaussian
  2. The sum of two independent Gaussian random variables is Gaussian
  3. The convolution of a Gaussian with another Gaussian is a Gaussian
  4. The product of two Gaussian density functions is, up to a constant factor, a Gaussian
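Two of these closure properties have simple closed forms, sketched below. The helper names are illustrative. The sum rule assumes the two random variables are independent, and the product rule applies to the density functions, not to the product of the random variables themselves.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def sum_of_gaussians(mu1, sigma1, mu2, sigma2):
    """Mean and sd of X + Y for independent X ~ N(mu1, sigma1^2), Y ~ N(mu2, sigma2^2).
    The density of X + Y is the convolution of the two densities."""
    return mu1 + mu2, math.sqrt(sigma1 ** 2 + sigma2 ** 2)

def product_of_gaussian_pdfs(mu1, sigma1, mu2, sigma2):
    """Mean and sd of the Gaussian that pdf1(x) * pdf2(x) is proportional to."""
    precision = 1.0 / sigma1 ** 2 + 1.0 / sigma2 ** 2   # precisions (1 / variance) add
    mu = (mu1 / sigma1 ** 2 + mu2 / sigma2 ** 2) / precision
    return mu, math.sqrt(1.0 / precision)

# Means add and variances add: 3^2 + 4^2 = 5^2
print(sum_of_gaussians(1.0, 3.0, 2.0, 4.0))

# The ratio below is the same at every x, so the product is a scaled Gaussian
mu_p, sigma_p = product_of_gaussian_pdfs(0.0, 1.0, 2.0, 1.0)
ratios = [
    normal_pdf(x, 0.0, 1.0) * normal_pdf(x, 2.0, 1.0) / normal_pdf(x, mu_p, sigma_p)
    for x in (-1.0, 0.0, 1.0, 2.5)
]
print(mu_p, sigma_p, ratios)
```

The product rule is the one behind combining two noisy estimates of the same quantity: the result is centred between the two means and is narrower than either input.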

In Fourier analysis, a suitably scaled Gaussian function is an eigenfunction of the Fourier transform, which means that its Fourier transform has the same Gaussian shape. In another application, the blood pressure of adult humans is often modelled as approximately following a Gaussian distribution.
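The eigenfunction property can be checked numerically. The sketch below assumes the Fourier transform convention with the kernel exp(−2πikx), under which the function exp(−πx²) is its own transform; the integration range and step size are arbitrary choices that happen to be more than accurate enough for such a rapidly decaying function.

```python
import math

def gaussian(x):
    """The Gaussian exp(-pi * x^2), which should equal its own Fourier transform."""
    return math.exp(-math.pi * x * x)

def fourier_transform_at(k, half_range=8.0, step=0.01):
    """Approximate the integral of gaussian(x) * exp(-2*pi*i*k*x) dx by a Riemann sum.
    The function is even, so the sine (imaginary) part cancels and only cosine remains."""
    n = int(round(2.0 * half_range / step))
    total = 0.0
    for i in range(n + 1):
        x = -half_range + i * step
        total += gaussian(x) * math.cos(2.0 * math.pi * k * x)
    return total * step

# Compare the transform with the original function at a few frequencies
for k in (0.0, 0.5, 1.0):
    print(k, fourier_transform_at(k), gaussian(k))
```

With other conventions or other widths the transform is still a Gaussian, but its width and height are rescaled.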


Abhijeet Katte
As a thorough data geek, most of Abhijeet's day is spent in building and writing about intelligent systems. He also has deep interests in philosophy, economics and literature.
