Last updated January 31, 2022

Major data distributions a data scientist should know

The different distributions of data and their properties are one area of statistics that a data scientist needs to understand with complete clarity.

Published on January 30, 2022
by Sreejani Bhattacharyya

Statistics forms the foundation of data science. Anyone trying to build a career in data science needs a firm grasp of statistical concepts and an understanding of how they can be applied in business settings. The different distributions of data and their properties are one such area, and a data scientist needs to understand them with complete clarity.

Let us take a look at a few of the most common distributions a data scientist encounters in their career.

Normal distribution

In a normal distribution, most of the values cluster in the middle and taper off symmetrically towards either extreme. It is also called a Gaussian distribution, and it appears as a bell curve when shown graphically. The mean, median and mode are all the same in a normal distribution, and the skew is zero. In the standard normal distribution, the mean is zero and the standard deviation is 1.

In a normal distribution, the midpoint (the mean) has the maximum frequency. The proportion of the area under the curve lying between the mean and any given distance from the mean is constant when that distance is measured in standard deviation units. For example, about 68% of the values fall within one standard deviation of the mean, whatever the mean and standard deviation happen to be.
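The constant-proportion property can be checked with a few lines of Python. This is a minimal sketch using the standard library's `statistics.NormalDist`; the helper name `area_within` and the example mean of 50 and standard deviation of 12 are assumptions made for illustration.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Compare the standard normal with an arbitrary (made-up) normal distribution.
for k in (1, 2, 3):
    standard = area_within(k)
    shifted = area_within(k, mu=50.0, sigma=12.0)
    print(f"within {k} sd: standard={standard:.4f}  mean=50, sd=12: {shifted:.4f}")
```

Both columns should agree: roughly 68%, 95% and 99.7% of the area lies within one, two and three standard deviations, regardless of the mean and standard deviation chosen.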

Values from a normal distribution are often expressed as standard scores, or Z scores. A Z score gives the distance between an actual score and the mean, measured in standard deviations.
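As a rough illustration, a Z score is just the value minus the mean, divided by the standard deviation. The sketch below assumes a small made-up list of test scores and treats it as the whole population (hence `pstdev`); the function name `z_scores` is hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw values to Z scores: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]

scores = [52, 60, 68, 75, 95]  # made-up test scores, for illustration only
z = z_scores(scores)

for raw, standardised in zip(scores, z):
    print(f"score {raw}: z = {standardised:+.2f}")
```

A positive Z score means the value lies above the mean and a negative one means it lies below. After the conversion, the scores have a mean of zero and a standard deviation of one.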

Bernoulli distribution

In a Bernoulli distribution, the random variable has only two possible values. (A random variable is a variable whose value depends on the outcome of an experiment. Random variables are of two types: discrete and continuous.)

A Bernoulli distribution is a discrete distribution. It describes a single trial (called a Bernoulli trial) with two possible outcomes, success and failure. A Bernoulli trial is one of the simplest experiments conducted in statistics. Examples of Bernoulli trials include a coin toss, or a roll of a die when only one result counts as success (say, rolling a six or not). The probabilities of the mutually exclusive events that make up all the possible outcomes have to sum to one.

The two possible outcomes in the Bernoulli distribution are indicated by n=0 and n=1. Here, n=1 indicates success and has probability p, while n=0 indicates failure and has probability 1-p (0<=p<=1).
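The definition above translates almost directly into code. The following is a minimal sketch, assuming a success probability of 0.3 chosen purely for illustration; `bernoulli_pmf` and `bernoulli_trial` are hypothetical helper names.

```python
import random

def bernoulli_pmf(n, p):
    """P(X = n) for a Bernoulli variable: p for n=1, 1-p for n=0, zero otherwise."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n == 1:
        return p
    if n == 0:
        return 1.0 - p
    return 0.0

def bernoulli_trial(p, rng):
    """Run one trial: return 1 (success) with probability p, else 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)  # seeded so the run is repeatable
p = 0.3                 # assumed success probability
draws = [bernoulli_trial(p, rng) for _ in range(10_000)]
success_rate = sum(draws) / len(draws)

print("P(success) + P(failure) =", bernoulli_pmf(1, p) + bernoulli_pmf(0, p))
print("observed success rate   =", success_rate)
```

Over many repeated trials, the observed share of successes should settle close to p.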

Uniform distribution

The uniform distribution is one of the simplest statistical distributions to understand. It is a probability distribution in which all the possible outcomes are equally likely to occur. Graphically, we can think of it as a straight horizontal line. Uniform distributions are of two types: discrete and continuous.

A discrete uniform distribution has a finite number of outcomes, each equally likely. A continuous uniform distribution can take any value within an interval, so it has an infinite number of possible outcomes, with every value in the interval equally likely.
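Both types can be sketched in Python. The discrete case below simulates a fair six-sided die, and the continuous case defines the flat density on an interval from a to b. The die, the interval and the helper name `uniform_pdf` are assumptions made for illustration.

```python
import random
from collections import Counter

# Discrete uniform: each face of a fair die has probability 1/6.
rng = random.Random(1)  # seeded so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(60_000)]
freq = Counter(rolls)
shares = {face: freq[face] / len(rolls) for face in range(1, 7)}

def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]: flat inside, zero outside."""
    if a >= b:
        raise ValueError("a must be smaller than b")
    return 1.0 / (b - a) if a <= x <= b else 0.0

for face, share in shares.items():
    print(f"face {face}: {share:.3f}")
print("density at 3 on [2, 6]:", uniform_pdf(3, 2, 6))
```

Each face should appear in roughly one sixth of the rolls, and the continuous density has the same height everywhere inside the interval, which is the horizontal line described above.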

Poisson distribution

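A Poisson distribution is a probability distribution that shows how many times an event is likely to occur within a fixed interval of time or space. It is named after the French mathematician Siméon Denis Poisson. It is a discrete distribution, so the variable takes only specific values: the counts 0, 1, 2 and so on. It is also a limiting case of the binomial distribution: as the number of trials grows large and the probability of success in each trial becomes small, with the expected number of successes held fixed, the binomial distribution approaches the Poisson distribution.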
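The link to the binomial distribution can be seen numerically. This sketch assumes an average rate of 3 events per interval (an arbitrary value) and compares the probability of exactly 2 events under a binomial distribution with more and more trials against the Poisson probability. The function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

lam = 3.0  # assumed average number of events per interval
k = 2

# Keep n * p fixed at lam while the number of trials n grows.
for n in (10, 100, 10_000):
    print(f"binomial n={n:>6}: {binomial_pmf(k, n, lam / n):.5f}")
print(f"poisson          : {poisson_pmf(k, lam):.5f}")
```

As n increases, the binomial probabilities move closer to the Poisson value.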

T-distribution

The t-distribution resembles the normal distribution and is used mainly when the sample size is small and the population standard deviation is unknown. It is also known as Student's t-distribution. Like the standard normal distribution, it is bell-shaped and symmetrical around zero, but its shape changes with the degrees of freedom. It has heavier tails, and therefore greater dispersion, than the standard normal distribution. As the degrees of freedom increase, the t-distribution approximates the standard normal distribution more and more closely.
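The effect of the degrees of freedom can be illustrated by evaluating the density directly. The sketch below implements the standard formula for the Student's t density using only the Python standard library and compares it with the standard normal density. The function name `t_pdf` and the evaluation point of 3 are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma keeps the gamma-function ratio stable for large df.
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

normal_pdf = NormalDist().pdf  # standard normal density

# Far from the centre, the t density is larger than the normal density.
for df in (2, 10, 100):
    print(f"df={df:>3}: t density at 3 = {t_pdf(3.0, df):.5f}")
print(f"normal: density at 3   = {normal_pdf(3.0):.5f}")
```

With few degrees of freedom, the density three units from the centre is noticeably larger than the normal density, which is what heavier tails means. The gap shrinks as the degrees of freedom grow.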

Student's t-distribution ranges from –∞ to ∞ (infinity). Some important applications of the t-distribution include the test of a hypothesis about the population mean, the test of a hypothesis about the difference between two means, and the test of a hypothesis about the difference between two means with dependent samples.
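For the first of these applications, the test statistic is simple to compute by hand. The sketch below shows a one-sample t statistic, assuming a small made-up sample and a hypothesised population mean of 5.0. It stops short of a p-value, because the cumulative t-distribution is not part of the Python standard library. The function name `one_sample_t` is hypothetical.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for testing whether the population mean equals mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    standard_error = stdev(sample) / math.sqrt(n)  # stdev uses the n-1 (sample) formula
    return (mean(sample) - mu0) / standard_error, n - 1

sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]  # made-up measurements, for illustration only
t_stat, df = one_sample_t(sample, mu0=5.0)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The resulting statistic is then compared with a t-distribution that has n-1 degrees of freedom, using a table or a statistics library.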

Log-normal distribution

A log-normal distribution is the probability distribution of a random variable whose logarithm is normally distributed. A log-normally distributed random variable takes only positive real values.
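This relationship is easy to verify by simulation. The sketch below draws log-normal samples with the standard library's `random.lognormvariate`, then takes their logarithms. The parameters 0.5 and 0.4 (the mean and standard deviation of the underlying normal distribution) are arbitrary assumptions.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(42)  # seeded so the run is repeatable
mu, sigma = 0.5, 0.4     # assumed parameters of the underlying normal distribution

samples = [rng.lognormvariate(mu, sigma) for _ in range(20_000)]
logs = [math.log(x) for x in samples]

print("smallest sample :", min(samples))   # always positive
print("mean of logs    :", mean(logs))     # close to mu
print("std dev of logs :", stdev(logs))    # close to sigma
```

Every sample is positive, and the logarithms of the samples have a mean and standard deviation close to the parameters of the underlying normal distribution.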

Sreejani Bhattacharyya

I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at sreejani.bhattacharyya@analyticsindiamag.com
