10 Most Popular Statistical Hypothesis Testing Methods Using Python

Decision making and storytelling are two important facets of a data scientist’s job. Models can be tweaked and computational power can be scaled up, but the choice of a particular test or method has major implications across the product lifecycle. From cost-cutting to life-saving applications, hypothesis testing is prevalent throughout statistics, and with the rise of statistical machine learning these tests have become more accessible through Python’s ever-growing, task-specific libraries.

Statistical tests are commonly classified as parametric or non-parametric. Parametric tests assume that the data follow a particular distribution, most commonly a Gaussian (normal) distribution. If this assumption fails, non-parametric tests are considered for hypothesis testing instead.

Here we list a few widely used statistical tests (parametric and non-parametric) available in Python:

Chi-Squared Test

The chi-squared test is well known even to those who are just starting out with statistical machine learning. It is used here to check whether two categorical variables are related or independent. The observations used to build the contingency table are assumed to be independent.

Python Code

from scipy.stats import chi2_contingency
table = ...
stat, p, dof, expected = chi2_contingency(table)
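
As a minimal sketch of how the call above might be used, the example below runs the test on a small 2x2 table. The counts and the 0.05 significance level are made-up assumptions for illustration, not data from this article. For a 2x2 table, chi2_contingency applies Yates’ continuity correction by default.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table (made-up counts):
# rows are two groups, columns are two outcomes.
table = [[30, 10],
         [10, 30]]

stat, p, dof, expected = chi2_contingency(table)

alpha = 0.05  # assumed significance level
verdict = "dependent" if p < alpha else "independent"
print(f"chi2={stat:.2f}, dof={dof}, p={p:.4g} -> variables look {verdict}")
```

The returned expected array holds the counts that would be expected if the two variables were independent, which is useful for checking that no cell is too sparse.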

Student’s t-test

Tests whether the means of two independent samples are significantly different.

Assumptions: observations in each sample are independent and identically distributed (iid), normally distributed, and have the same variance.

Python Code

from scipy.stats import ttest_ind
data1, data2 = ...
stat, p = ttest_ind(data1, data2)
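
A small sketch with made-up measurements shows how the result could be read. The data and the 0.05 threshold are assumptions for illustration. When the equal-variance assumption is doubtful, passing equal_var=False runs Welch’s t-test instead.

```python
from scipy.stats import ttest_ind

# Hypothetical measurements from two independent groups (made-up values).
data1 = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1]
data2 = [5.9, 6.1, 6.0, 6.2, 5.8, 6.0]

# Classic Student's t-test (assumes equal variances).
stat, p = ttest_ind(data1, data2)

# Welch's t-test, which drops the equal-variance assumption.
welch_stat, welch_p = ttest_ind(data1, data2, equal_var=False)

alpha = 0.05  # assumed significance level
print(f"Student: t={stat:.2f}, p={p:.4g}, reject H0: {p < alpha}")
print(f"Welch:   t={welch_stat:.2f}, p={welch_p:.4g}, reject H0: {welch_p < alpha}")
```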

Analysis of Variance Test (ANOVA)

ANOVA is another widely used test; it checks whether the means of two or more independent samples differ significantly. The observations in each sample are assumed to be independent, normally distributed and to have the same variance.

Python Code

from scipy.stats import f_oneway
samples = ...  # a list of two or more samples
stat, p = f_oneway(*samples)
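
The following sketch applies one-way ANOVA to three made-up groups; the values and the 0.05 threshold are illustrative assumptions only.

```python
from scipy.stats import f_oneway

# Hypothetical measurements for three independent groups (made-up values).
data1 = [5.1, 4.9, 5.0, 5.2, 4.8]
data2 = [5.0, 5.1, 4.9, 5.2, 5.0]
data3 = [6.1, 5.9, 6.0, 6.2, 5.8]

stat, p = f_oneway(data1, data2, data3)

alpha = 0.05  # assumed significance level
print(f"F={stat:.2f}, p={p:.4g}, at least one mean differs: {p < alpha}")
```

A significant result only says that at least one group mean differs; it does not say which one.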

Shapiro-Wilk Test

This test checks whether a data sample was drawn from a Gaussian distribution.

Python Code

from scipy.stats import shapiro
data = ...
stat, p = shapiro(data)
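
To see how the statistic behaves, the sketch below compares a bell-shaped sample with a strongly skewed one. Both samples are synthetic and built from evenly spaced quantiles purely for illustration, so they are assumptions rather than real data.

```python
import numpy as np
from scipy.stats import norm, shapiro

# Deterministic, hypothetical samples built from evenly spaced quantiles.
probs = (np.arange(1, 51) - 0.5) / 50
bell_shaped = norm.ppf(probs)   # closely follows the Gaussian shape
skewed = np.exp(bell_shaped)    # log-normal shape, strongly right-skewed

stat_bell, p_bell = shapiro(bell_shaped)
stat_skew, p_skew = shapiro(skewed)

print(f"bell-shaped: W={stat_bell:.3f}, p={p_bell:.4g}")
print(f"skewed:      W={stat_skew:.3f}, p={p_skew:.4g}")
```

A small p-value is evidence against normality; a large one only means the test found no evidence against it.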

D’Agostino’s K^2 Test

Like the Shapiro-Wilk test, this test checks whether a data sample has a Gaussian distribution. It does so by combining the sample’s skewness and kurtosis into a single statistic.

Python Code

from scipy.stats import normaltest
data = ...
stat, p = normaltest(data)
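
The sketch below illustrates what the K^2 statistic is made of: in SciPy it equals the sum of the squared z-scores from skewtest and kurtosistest, compared against a chi-squared distribution with two degrees of freedom. The exponential sample and the seed are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import chi2, kurtosistest, normaltest, skewtest

# Hypothetical right-skewed sample (arbitrary seed and distribution).
rng = np.random.default_rng(42)
data = rng.exponential(scale=1.0, size=200)

stat, p = normaltest(data)

# Rebuild the statistic from its two components.
z_skew, _ = skewtest(data)
z_kurt, _ = kurtosistest(data)
k2 = z_skew**2 + z_kurt**2
p_manual = chi2.sf(k2, df=2)

print(f"normaltest: K2={stat:.2f}, p={p:.4g}")
print(f"manual:     K2={k2:.2f}, p={p_manual:.4g}")
```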

Pearson’s Correlation Coefficient

Measures the strength of the linear relationship between two samples and tests whether that correlation is significantly different from zero.

Python Code

from scipy.stats import pearsonr
data1, data2 = ...
corr, p = pearsonr(data1, data2)
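
A short sketch with made-up paired values that lie close to a straight line; the data are assumptions for illustration.

```python
from scipy.stats import pearsonr

# Hypothetical paired observations: data2 is roughly 2 * data1 plus small noise.
data1 = [1, 2, 3, 4, 5, 6, 7, 8]
data2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

corr, p = pearsonr(data1, data2)

print(f"Pearson r={corr:.4f}, p={p:.4g}")
```

The coefficient ranges from -1 to 1, with values near either end indicating a strong linear relationship.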

Spearman’s Rank Correlation

Tests whether two samples have a monotonic relationship. The only requirement is that the observations in each sample can be ranked.

Python Code

from scipy.stats import spearmanr
data1, data2 = ...
corr, p = spearmanr(data1, data2)
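
The difference between a monotonic and a linear relationship can be sketched with made-up data in which one variable is the cube of the other. Because the ranks agree perfectly, the rank correlation is at its maximum, while Pearson’s coefficient is lower since the relationship is not a straight line.

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: data2 grows monotonically, but not linearly, with data1.
data1 = [1, 2, 3, 4, 5, 6, 7, 8]
data2 = [x**3 for x in data1]

rho, p = spearmanr(data1, data2)
r, _ = pearsonr(data1, data2)

print(f"Spearman rho={rho:.4f}, p={p:.4g}")
print(f"Pearson r={r:.4f}")
```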

Mann-Whitney U Test

A non-parametric test that checks whether the distributions of two independent samples are equal.

Python Code

from scipy.stats import mannwhitneyu
data1, data2 = ...
stat, p = mannwhitneyu(data1, data2)
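
A minimal sketch on two made-up samples follows. The alternative is passed explicitly as "two-sided" because the default has differed between SciPy versions; the data and threshold are illustrative assumptions.

```python
from scipy.stats import mannwhitneyu

# Hypothetical samples in which every value of data1 is below every value of data2.
data1 = [1.1, 2.3, 1.8, 2.0, 1.5, 2.2, 1.9]
data2 = [3.4, 4.1, 3.8, 4.5, 3.9, 4.2, 3.6]

stat, p = mannwhitneyu(data1, data2, alternative="two-sided")

alpha = 0.05  # assumed significance level
print(f"U={stat}, p={p:.4g}, distributions differ: {p < alpha}")
```

Because the two samples do not overlap at all, the U statistic is zero here, which is the most extreme value it can take.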

Kruskal-Wallis H Test

Like the Mann-Whitney U test, the Kruskal-Wallis test assumes that the observations in each sample are independent and can be ranked. It checks whether the distributions of two or more independent samples are equal.

Python Code

from scipy.stats import kruskal
samples = ...  # a list of two or more samples
stat, p = kruskal(*samples)
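
The sketch below runs the test on three made-up, non-overlapping groups; the values and the 0.05 threshold are assumptions for illustration.

```python
from scipy.stats import kruskal

# Hypothetical samples from three independent groups (made-up, no ties).
data1 = [1.1, 1.5, 1.8, 2.0, 2.3]
data2 = [3.1, 3.4, 3.6, 3.9, 4.2]
data3 = [5.0, 5.3, 5.7, 6.1, 6.4]

stat, p = kruskal(data1, data2, data3)

alpha = 0.05  # assumed significance level
print(f"H={stat:.2f}, p={p:.4g}, distributions differ: {p < alpha}")
```

As with ANOVA, a significant result does not identify which group differs from the others.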

Friedman Test

The Friedman test checks whether the distributions of two or more paired samples are equal. Note that SciPy’s friedmanchisquare requires at least three samples.

Python Code

from scipy.stats import friedmanchisquare
samples = ...  # a list of three or more paired samples
stat, p = friedmanchisquare(*samples)
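
A sketch with made-up paired data: six hypothetical subjects, each measured under three conditions, so position i in every list belongs to the same subject. The values are assumptions for illustration.

```python
from scipy.stats import friedmanchisquare

# Hypothetical repeated measurements: one value per subject and condition.
data1 = [7.0, 9.9, 8.5, 5.1, 10.3, 6.4]    # condition A
data2 = [7.8, 10.6, 9.1, 5.9, 11.0, 7.2]   # condition B
data3 = [8.9, 11.4, 10.2, 6.8, 12.1, 8.0]  # condition C

stat, p = friedmanchisquare(data1, data2, data3)

alpha = 0.05  # assumed significance level
print(f"chi2={stat:.2f}, p={p:.4g}, distributions differ: {p < alpha}")
```

The test works on the ranks within each subject, so only the ordering of the conditions per subject matters, not the size of the differences.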


The probability of rejecting the null hypothesis is a function of five factors: whether the test is one- or two-tailed, the level of significance, the standard deviation, the amount of deviation from the null hypothesis, and the number of observations. That said, statistical tests are also subject to criticism. For instance, interpreting p-values is tricky when multiple comparisons are made, because p-values depend on both the data observed and the data that might have been observed but wasn’t. A statistician, analyst or data scientist should therefore keep in mind that statistical significance does not imply practical significance and that correlation does not imply causation. Every test is only a means to an end, and that end is often vague.
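
How the number of observations affects the probability of rejecting the null hypothesis can be sketched with a small simulation. Everything here is an illustrative assumption: the helper name estimate_power, the effect size of 0.5 standard deviations, the number of trials and the seed. The sketch repeatedly draws two normal samples whose means truly differ and counts how often a two-sided t-test rejects at the chosen significance level.

```python
import numpy as np
from scipy.stats import ttest_ind

def estimate_power(n, effect, sd=1.0, alpha=0.05, trials=2000, seed=0):
    """Estimate the rejection rate of a two-sided t-test by simulation.

    Hypothetical helper: n is the size of each sample and effect is the
    true difference between the two population means.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sd, size=(trials, n))
    b = rng.normal(effect, sd, size=(trials, n))
    _, p = ttest_ind(a, b, axis=1)  # one test per simulated pair of samples
    return float(np.mean(p < alpha))

power_small = estimate_power(n=10, effect=0.5)
power_large = estimate_power(n=100, effect=0.5)

print(f"estimated power with n=10 per group:  {power_small:.2f}")
print(f"estimated power with n=100 per group: {power_large:.2f}")
```

Changing sd, alpha or effect in the same way shows the influence of the other factors listed above.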

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
