# Major data distributions a data scientist should know


Statistics forms the foundation of data science. Anyone building a career in data science needs a firm grasp of statistical concepts and of how they apply in business settings. The common data distributions and their properties are one area of statistics that a data scientist needs to understand thoroughly.

Let us take a look at a few of the most common distributions a data scientist encounters in their career.

### Normal distribution

In a normal distribution, most values cluster around the middle and taper off symmetrically towards either extreme. It is also called a Gaussian distribution, and it appears as a bell curve when plotted. Because the curve is symmetric, its skew is zero and the mean, median and mode all coincide. The standard normal distribution is the special case with a mean of zero and a standard deviation of 1.

The midpoint of a normal distribution, the mean, has the maximum frequency. For every normal distribution, a constant proportion of the area under the curve lies within a given distance of the mean when that distance is measured in standard deviation units: roughly 68% within one standard deviation, 95% within two, and 99.7% within three.

Values from a normal distribution are often expressed as standard scores, or Z scores. A Z score gives the distance between an actual score and the mean in terms of standard deviations.
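As a minimal sketch of these two ideas, the snippet below computes a Z score and the area under the standard normal curve within a given number of standard deviations. It uses only Python's standard library; the helper names `z_score` and `area_within`, and the test-score figures, are assumptions made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def z_score(value, mean, std_dev):
    """Distance of `value` from `mean`, measured in standard deviations."""
    return (value - mean) / std_dev


def area_within(k):
    """Share of the area under the curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Hypothetical example: a score of 85 where scores have mean 70 and standard deviation 10.
print(z_score(85, mean=70, std_dev=10))  # 1.5

# The proportions are the same for every normal distribution.
for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The loop should report proportions of roughly 0.68, 0.95 and 0.997 for one, two and three standard deviations.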

### Bernoulli distribution

In a Bernoulli distribution, the random variable has only two possible values. (A random variable is a variable whose value depends on the outcome of an experiment. Random variables are of two types: discrete and continuous.)

A Bernoulli distribution is a discrete distribution. It describes a single trial (called a Bernoulli trial) with two possible outcomes, success and failure. A Bernoulli trial is one of the simplest experiments in statistics. Examples include a coin toss, or a roll of a die when only a yes/no result is recorded, such as whether a six came up. The probabilities of the mutually exclusive events that make up all the possible outcomes have to sum to one.

The two possible outcomes in the Bernoulli distribution are denoted n=0 and n=1. Here, n=1 indicates success and has probability p, while n=0 indicates failure and has probability 1-p, where 0<=p<=1.
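The definition above can be sketched in a few lines of standard-library Python. The function names `bernoulli_pmf` and `bernoulli_trial`, the value p = 0.3 and the seed are all assumptions chosen for illustration.

```python
import random


def bernoulli_pmf(n, p):
    """Probability of outcome n (1 = success, 0 = failure) for success probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n == 1:
        return p
    if n == 0:
        return 1.0 - p
    return 0.0  # a Bernoulli variable takes no other values


def bernoulli_trial(p, rng):
    """Run a single Bernoulli trial: returns 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


p = 0.3  # assumed success probability, for illustration only
rng = random.Random(42)  # fixed seed so the run is repeatable
trials = [bernoulli_trial(p, rng) for _ in range(10_000)]

print(bernoulli_pmf(1, p) + bernoulli_pmf(0, p))  # the two probabilities sum to one
print(sum(trials) / len(trials))  # observed share of successes, close to p
```

Repeating the trial many times is only a way to check the single-trial probability empirically; each call to `bernoulli_trial` is one Bernoulli trial.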

### Uniform distribution

The uniform distribution is one of the simplest statistical distributions to understand. It is a probability distribution in which all the possible outcomes are equally likely. Graphically, its probability function is a straight horizontal line over the range of possible outcomes. Uniform distributions are of two types: discrete and continuous.

A discrete uniform distribution has a finite number of outcomes, each with the same probability. A continuous uniform distribution covers an interval with infinitely many possible values, and any two sub-intervals of equal length are equally likely.
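To make the two types concrete, here is a small sketch that simulates both with Python's standard library. The fair die, the interval from 2 to 10, the seed and the helper `continuous_uniform_pdf` are assumptions picked for illustration.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable

# Discrete uniform: a fair six-sided die has six outcomes, each with probability 1/6.
rolls = [rng.randint(1, 6) for _ in range(60_000)]
counts = Counter(rolls)
discrete_freqs = {face: counts[face] / len(rolls) for face in range(1, 7)}
print(discrete_freqs)  # every face appears with frequency close to 1/6


def continuous_uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]: flat inside, zero outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


# Continuous uniform: any value between a and b can occur.
a, b = 2.0, 10.0
draws = [rng.uniform(a, b) for _ in range(60_000)]
share_lower_half = sum(x < (a + b) / 2 for x in draws) / len(draws)

print(continuous_uniform_pdf(5.0, a, b))  # 0.125, the constant height 1 / (b - a)
print(share_lower_half)  # close to 0.5: equal-length halves are equally likely
```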

### Poisson distribution

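A Poisson distribution is a probability distribution that shows how many times an event is likely to occur within a fixed interval of time or space, given that events occur independently at a constant average rate. It is named after the French mathematician Siméon Denis Poisson. It is a discrete distribution: the variable is a count, so it takes only non-negative integer values. It is a limiting case of the binomial distribution, reached as the number of trials grows large and the probability of success on each trial shrinks while their product stays fixed.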
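The relationship to the binomial distribution can be checked numerically. The sketch below, using only the standard library, compares binomial probabilities with the Poisson probability while the number of trials grows and the expected count stays fixed. The rate of 3 events per interval and the function names are assumptions for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam per interval."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


lam = 3.0  # assumed average number of events per interval
k = 2      # the count whose probability we compare

for n in (10, 100, 10_000):
    # Keep n * p fixed at lam while n grows and p shrinks.
    print(n, round(binomial_pmf(k, n, lam / n), 5))

print("Poisson", round(poisson_pmf(k, lam), 5))  # about 0.22404
```

As n increases, the binomial values move steadily closer to the Poisson value.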

### T-distribution

The t-distribution, also known as Student’s t-distribution, is closely related to the normal distribution. It is used mainly when the sample size is small and the population standard deviation is unknown. Like the standard normal distribution, it is bell-shaped and symmetrical about zero. Its shape changes with the degrees of freedom. It has heavier tails, and so greater dispersion, than the standard normal distribution. As the degrees of freedom increase, it approximates the standard normal distribution more closely.

The t-distribution ranges from –∞ to ∞. Some important applications include testing a hypothesis about a population mean, testing a hypothesis about the difference between two means, and testing a hypothesis about the difference between two means with dependent (paired) samples.
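The effect of the degrees of freedom can be illustrated with a short sketch. Python's standard library has no built-in t-distribution, so the hypothetical helper `t_pdf` below implements the standard density formula directly and compares it with the standard normal density at a point in the tail. The evaluation point x = 3 is an arbitrary choice.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work with log-gamma so the normalising constant does not overflow for large df.
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


normal_pdf = NormalDist().pdf  # density of the standard normal distribution

x = 3.0  # a point well out in the tail
for df in (2, 10, 100, 1000):
    print(df, round(t_pdf(x, df), 5))
print("normal", round(normal_pdf(x), 5))
```

With few degrees of freedom the tail density is several times the normal value, which is the greater dispersion described above. It shrinks towards the normal value as the degrees of freedom grow.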

### Log-normal distribution

A log-normal distribution is the probability distribution of a random variable whose logarithm is normally distributed. A log-normally distributed random variable takes only positive real values.
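A quick way to see this definition at work is to draw log-normal samples and take their logarithms. The sketch below uses `random.lognormvariate` from the standard library; the parameters `mu` and `sigma`, the sample size and the seed are assumptions chosen for illustration.

```python
import math
import random
from statistics import mean, stdev

mu, sigma = 0.5, 0.25  # assumed mean and standard deviation of the underlying normal
rng = random.Random(7)  # fixed seed so the run is repeatable

# lognormvariate draws values whose natural logarithm is normal with mean mu and std dev sigma.
samples = [rng.lognormvariate(mu, sigma) for _ in range(20_000)]
logs = [math.log(x) for x in samples]

print(min(samples) > 0)       # True: every draw is a positive real value
print(round(mean(logs), 2))   # close to mu
print(round(stdev(logs), 2))  # close to sigma
```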

I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at sreejani.bhattacharyya@analyticsindiamag.com
