Z-Tests vs T-Tests: How To Choose Among Two Important Hypothesis Tests

This article examines the conditions under which a z-test or a t-test is appropriate, and then implements both tests in Python.

As data science professionals, we have all heard the buzzword hypothesis testing. A hypothesis is an assumption that we make about a population parameter, and hypothesis testing is the procedure for checking whether the sample data are consistent with that assumption. We should know which fundamental test to use for a given statistical analysis.

Z-Test

The dataset is downloaded from here.

A z-test compares a sample mean with a hypothesized population mean, or compares the means of two samples. The sample is assumed to follow a Gaussian distribution. A z-test is used when population parameters such as the standard deviation are known.

Null Hypothesis: The population mean is equal to the hypothesized value

Alternate Hypothesis: The population mean is not equal to the hypothesized value

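Using the formula below, we can calculate the z-statistic:

z = (x̄ - μ) / (σ / √n)

x̄ = sample mean

μ = population mean under the null hypothesis

σ = standard deviation of the population

n = sample size

σ / √n = standard error of the sample mean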

If the p-value is lower than the significance level (0.05 here), we reject the null hypothesis; otherwise, we fail to reject it.
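
The formula and decision rule above can be sketched by hand with only the Python standard library. This is a minimal illustration, not the article's implementation: the function name one_sample_z_test and the input numbers are assumptions made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu, sigma, n):
    """Return the z-statistic and two-sided p-value for a one-sample z-test."""
    standard_error = sigma / sqrt(n)  # standard error of the sample mean
    z = (sample_mean - mu) / standard_error
    # Two-sided p-value: chance of a value at least this extreme under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers, chosen only for illustration
z, p_value = one_sample_z_test(sample_mean=153, mu=156, sigma=12, n=36)
print("z =", z, ", p =", p_value)
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Here σ is passed in as a known quantity, which is the situation the z-test is designed for.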

One-Sample Z test

Let’s take a hypothesized population mean of 156 for the bp_before column of this blood pressure dataset.

Null Hypothesis: The mean of bp_before does not differ from 156

Alternate Hypothesis: The mean of bp_before differs from 156

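import pandas as pd
from scipy import stats
from statsmodels.stats import weightstats as stests

# Load the blood pressure dataset (columns bp_before and bp_after)
df = pd.read_csv("ztest.csv")

# One-sample z-test: is the mean of bp_before equal to 156?
ztest, pval = stests.ztest(df['bp_before'], x2=None, value=156)
print(float(pval))
if pval < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")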

From the result above, the p-value is greater than 0.05, so we fail to reject the null hypothesis.

Two Sample Z-test

H0: The means of the two samples are the same

H1: The means of the two samples are not the same

# Two-sample z-test: do bp_before and bp_after have the same mean?
ztest, pval1 = stests.ztest(df['bp_before'], x2=df['bp_after'], value=0, alternative='two-sided')
print(float(pval1))
if pval1 < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

The p-value is less than 0.05, so the null hypothesis is rejected. There is a significant difference between the means of the two groups.
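
To make the two-sample calculation less of a black box, here is a rough sketch of a two-sample z-statistic built from a pooled sample variance. To my understanding, this matches the default behaviour of the statsmodels ztest call (usevar='pooled'), but treat that as an assumption. The function name two_sample_z_test and the readings are hypothetical and are not taken from the blood pressure dataset. The samples are far too small for a real z-test and only keep the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z_test(a, b, value=0.0):
    """Two-sided two-sample z-test using a pooled sample variance."""
    n_a, n_b = len(a), len(b)
    # Pooled variance from the two sample variances (n - 1 denominators)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    z = (mean(a) - mean(b) - value) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical readings, chosen only for illustration
group_a = [150, 160, 155, 148, 162, 158]
group_b = [145, 158, 150, 149, 155, 143]
z, p_value = two_sample_z_test(group_a, group_b)
print("z =", z, ", p =", p_value)
```

Note that the sketch treats the two groups as independent samples, which is what a two-sample z-test assumes.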

T-test

The dataset can be downloaded from here.

The t-test compares a sample mean with a hypothesized value, or compares the means of two groups. The sample is assumed to follow a Gaussian distribution. A t-test is used when population parameters such as the standard deviation are not known and have to be estimated from the sample.

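For two independent samples with equal population variances, we can calculate the t-statistic with the following formula:

t = (x̄1 - x̄2) / (sp √(1/n1 + 1/n2))

sp = √(((n1 - 1) s1² + (n2 - 1) s2²) / (n1 + n2 - 2))

x̄1 = sample 1 mean

x̄2 = sample 2 mean

s1 = sample 1 standard deviation

s2 = sample 2 standard deviation

sp = pooled standard deviation

n1 = sample 1 size

n2 = sample 2 size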

One-Sample T-Test

The masses of a sample of n = 20 acorns are m = 8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0, 7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, and 7.0 g. We need to check whether the average mass of this sample differs from the average mass of all acorns, μ = 10.0 g.

Null Hypothesis: x̄ – μ = 0, that is there is no significant difference.

Alternate Hypothesis: x̄ – μ ≠ 0 (two-sided test)

Critical t-value for α = 0.05 (two-sided, 19 degrees of freedom): t* = 2.093

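import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats  # statistical tests and distributions

# Population mean and standard deviation of the 'points' column of the wine
# dataset (ddof=0 gives the population standard deviation).
# These values are not used in the t-tests below.
wine_data = pd.read_csv("winemag-data-130k-v2.csv")
x = wine_data['points']
mu = x.mean()
sigma = x.std(ddof=0)
print("mu: ", mu, ", sigma:", sigma)

# Draw a random sample of 30 values from a normal distribution
# (mean 9.2, standard deviation 1.5), rounded to one decimal place
x = np.random.normal(loc=9.2, scale=1.5, size=30).round(1)
print(x)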

# One-sample t-test
x = [8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0,
     7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, 7.0]
mu = 10
t_critical = 2.093  # two-sided, alpha = 0.05, 19 degrees of freedom
x_bar = np.array(x).mean()
s = np.array(x).std(ddof=1)  # divide by N - 1 for the sample standard deviation
N = len(x)
SE = s/np.sqrt(N)  # standard error of the mean
t = (x_bar - mu)/SE
print("t-statistic: ", t)
# A one-sample t-test that also gives the p-value can be done with scipy as follows:
t, p = stats.ttest_1samp(x, mu)
print("t = ", t, ", p = ", p)
if p < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

Since p is less than 0.05, we reject the null hypothesis. There is a statistically significant difference between the sample mean and the population mean of 10 g.
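
The critical value t* = 2.093 quoted earlier gives a second route to the same decision: reject the null hypothesis when |t| exceeds t*. A minimal sketch with only the standard library, using the acorn masses from the text (the names t_stat and reject_h0 are introduced here for illustration):

```python
from math import sqrt
from statistics import mean, stdev

x = [8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0,
     7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, 7.0]
mu = 10.0
t_critical = 2.093  # two-sided, alpha = 0.05, 19 degrees of freedom

standard_error = stdev(x) / sqrt(len(x))  # stdev divides by n - 1
t_stat = (mean(x) - mu) / standard_error

# Two-sided test: reject H0 when the statistic falls in either tail
reject_h0 = abs(t_stat) > t_critical
print("t =", round(t_stat, 3), ", reject H0:", reject_h0)
```

Comparing |t| with the critical value and comparing p with 0.05 are two views of the same test, so they lead to the same conclusion.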

Two-Sample T-Test

The masses of N1 = 20 acorns from oak trees upwind and N2 = 30 acorns from oak trees downwind of the same coal power plant are measured.

Null Hypothesis: x̄1 = x̄2, or x̄2 – x̄1 = 0, that is, there is no difference between the sample means

Alternate Hypothesis: x̄2 < x̄1, or x̄2 – x̄1 < 0, that is, there is a difference between the sample means

# sample upwind
x1 = [10.8, 10.0, 8.2, 9.9, 11.6, 10.1, 11.3, 10.3, 10.7, 9.7,
      7.8, 9.6, 9.7, 11.6, 10.3, 9.8, 12.3, 11.0, 10.4, 10.4]
# sample downwind
x2 = [7.8, 7.5, 9.5, 11.7, 8.1, 8.8, 8.8, 7.7, 9.7, 7.0,
      9.0, 9.7, 11.3, 8.7, 8.8, 10.9, 10.3, 9.6, 8.4, 6.6,
      7.2, 7.6, 11.5, 6.6, 8.6, 10.5, 8.4, 8.5, 10.2, 9.2]
# unequal sample sizes; assume equal population variances (pooled t-test)
t_critical = 1.677  # one-sided, alpha = 0.05, 48 degrees of freedom
n1 = len(x1)
n2 = len(x2)
dof1 = n1 - 1
dof2 = n2 - 1
dof = dof1 + dof2  # total degrees of freedom
s1 = np.std(x1, ddof=1)
s2 = np.std(x2, ddof=1)
x1_bar = np.mean(x1)
x2_bar = np.mean(x2)
sp = np.sqrt((dof1*s1**2 + dof2*s2**2)/dof)  # pooled standard deviation
se = sp*np.sqrt(1/n1 + 1/n2)  # standard error of the difference in means
t = (x2_bar - x1_bar)/se
print("t-statistic", t)
# a two-sample independent t-test is done with scipy as follows
# NOTE: the p-value given is two-sided; the one-sided p-value is p/2 when t
# has the sign predicted by the alternative hypothesis (here, t < 0)
t, p_twosided = stats.ttest_ind(x2, x1, equal_var=True)
print("t = ", t, ", p_twosided = ", p_twosided, ", p_onesided =", p_twosided/2)

Since p is less than 0.05, we reject the null hypothesis. There is a statistically significant difference between the means of the two samples.
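
Because the alternative hypothesis here is one-sided (x̄2 < x̄1), the critical-value check looks only at the lower tail: reject when t is below -t*. The sketch below repeats the pooled calculation with the standard library and applies that rule, using the one-sided critical value 1.677 for 48 degrees of freedom. The names t_stat and reject_h0 are introduced here for illustration.

```python
from math import sqrt
from statistics import mean, variance

x1 = [10.8, 10.0, 8.2, 9.9, 11.6, 10.1, 11.3, 10.3, 10.7, 9.7,
      7.8, 9.6, 9.7, 11.6, 10.3, 9.8, 12.3, 11.0, 10.4, 10.4]
x2 = [7.8, 7.5, 9.5, 11.7, 8.1, 8.8, 8.8, 7.7, 9.7, 7.0,
      9.0, 9.7, 11.3, 8.7, 8.8, 10.9, 10.3, 9.6, 8.4, 6.6,
      7.2, 7.6, 11.5, 6.6, 8.6, 10.5, 8.4, 8.5, 10.2, 9.2]
t_critical = 1.677  # one-sided, alpha = 0.05, 48 degrees of freedom

n1, n2 = len(x1), len(x2)
# Pooled variance: sample variances weighted by their degrees of freedom
pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean(x2) - mean(x1)) / standard_error

# H1 says x2 has the smaller mean, so only a sufficiently negative t supports it
reject_h0 = t_stat < -t_critical
print("t =", round(t_stat, 3), ", reject H0:", reject_h0)
```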

Paired T-Test

The paired sample t-test is also called a dependent sample t-test. Let’s take an example from a blood pressure dataset. We need to compare the mean blood pressure of the same individuals before and after treatment.

H0: The mean difference between the two samples is 0

H1: The mean difference between the two samples is not 0

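import pandas as pd
from scipy import stats

df = pd.read_csv("ztest.csv")
# Summary statistics for the two paired columns
print(df[['bp_before', 'bp_after']].describe())
# Paired t-test on the before and after readings of the same individuals
ttest, pval = stats.ttest_rel(df['bp_before'], df['bp_after'])
print(pval)
if pval < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")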

As the p-value is less than 0.05, we reject the null hypothesis: the mean blood pressure before and after treatment is not the same.
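
A paired t-test is equivalent to a one-sample t-test on the per-individual differences, tested against a mean of 0. The sketch below shows that calculation with the standard library. The before and after readings are hypothetical values for five people and are not taken from ztest.csv.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired readings, chosen only for illustration
before = [150, 160, 155, 148, 162]
after = [145, 158, 150, 149, 155]

# One difference per individual; pairing removes between-person variation
differences = [b - a for b, a in zip(before, after)]
n = len(differences)

# One-sample t-statistic of the differences against a mean of 0
t_stat = mean(differences) / (stdev(differences) / sqrt(n))
degrees_of_freedom = n - 1
print("t =", round(t_stat, 4), ", degrees of freedom =", degrees_of_freedom)
```

Working on the differences is what distinguishes the paired test from the two-sample test, which treats the two columns as independent groups.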

Conclusion

This post gave an overview of when to use the z-test and the t-test in statistical analysis. We can extend the analysis further by looking at other statistical tests such as ANOVA and the Chi-Square test. I hope this article has helped.

The complete code for the above implementation is available in AIM’s GitHub repository. Please visit this link to find the notebook with this code.

Ankit Das

A data analyst with expertise in statistical analysis, data visualization ready to serve the industry using various analytical platforms. I look forward to having in-depth knowledge of machine learning and data science. Outside work, you can find me as a fun-loving person with hobbies such as sports and music.
