Z-Tests vs T-Tests: How To Choose Among Two Important Hypothesis Tests

Ankit Das

Most data science professionals have come across the term hypothesis testing. A hypothesis is an assumption we make about a population parameter, and hypothesis testing is the procedure for checking whether sample data are consistent with that assumption. It helps to know which fundamental test to use for a given statistical analysis.

This article looks at the conditions under which a Z-test or a T-test is appropriate, and then implements these tests in Python.



Z-Test

The dataset is downloaded from here.

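A z-test compares a sample mean with a hypothesized population mean, or compares the means of two samples. It assumes the data follow a Gaussian distribution and is used when population parameters such as the standard deviation are known (or, as a common rule of thumb, when the sample is large, roughly n > 30).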

Null Hypothesis: The population mean is equal to the hypothesized value μ, so the sample mean differs from μ only by chance

Alternate Hypothesis: The population mean is not equal to the hypothesized value μ

Using the formula below, we can calculate the z-statistic:


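z = (x̄ − μ) / (σ / √n)

x̄ = sample mean

μ = population mean under the null hypothesis

σ = standard deviation of the population

n = sample size

σ / √n = standard error of the sample mean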

If the p-value is lower than 0.05, we reject the null hypothesis; otherwise, we fail to reject (informally, "accept") the null hypothesis.
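To make the formula and the decision rule concrete, here is a minimal sketch of a one-sample z-test computed by hand with only the Python standard library. The function name and the input numbers (sample mean, hypothesized mean, known σ, and n) are made up for illustration and do not come from the blood pressure dataset.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z-test with known sigma."""
    standard_error = sigma / math.sqrt(n)          # sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error       # z = (x_bar - mu) / SE
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # area in both tails
    return z, p_value

# Hypothetical inputs, chosen only to illustrate the calculation
z, p = one_sample_z_test(sample_mean=158.0, mu0=156.0, sigma=10.0, n=100)
print("z =", round(z, 3), ", p =", round(p, 4))
if p < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")
```

With these illustrative numbers the standard error is 1.0, so z is simply the difference between the two means.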

One-Sample Z-Test

Let’s take a hypothesized population mean of 156 for this blood pressure dataset.

Null Hypothesis: The mean of bp_before does not differ from 156

Alternate Hypothesis: The mean of bp_before differs from 156

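import pandas as pd
from statsmodels.stats import weightstats as stests

# Load the blood pressure data (file name taken from the paired t-test code later in this article)
df = pd.read_csv("ztest.csv")

# One-sample z-test: does the mean of bp_before differ from 156?
# Note: statsmodels estimates the standard deviation from the sample itself
ztest, pval = stests.ztest(df['bp_before'], x2=None, value=156)
print(float(pval))
if pval < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")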

From the above result, we can see that the p-value is greater than 0.05, so we fail to reject the null hypothesis.

Two-Sample Z-Test

H0: The means of the two samples are the same

H1: The means of the two samples are not the same

# Two-sample z-test: do the means of bp_before and bp_after differ?
ztest, pval1 = stests.ztest(df['bp_before'], x2=df['bp_after'], value=0, alternative='two-sided')
print(float(pval1))
if pval1 < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

The p-value is less than 0.05, so the null hypothesis is rejected. There is a significant difference between the means of the two groups.

T-Test

The dataset can be downloaded from here.

A t-test compares a sample mean with a hypothesized value, or compares the means of two groups. It assumes the data follow a Gaussian distribution and is used when population parameters such as the standard deviation are not known and must be estimated from the sample.
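One way to see how the two tests relate is to compare their critical values. The sketch below, assuming SciPy is installed, prints the two-sided 5% critical value of the t-distribution for a few arbitrarily chosen degrees of freedom next to the standard normal value. The t value is larger for small samples and approaches the normal value as the sample grows.

```python
from scipy import stats

# Two-sided 5% critical value of the standard normal distribution
z_critical = stats.norm.ppf(0.975)

# The degrees of freedom below are arbitrary illustrative choices
for dof in (5, 19, 30, 100, 1000):
    t_critical = stats.t.ppf(0.975, dof)
    print(f"df={dof:5d}  t*={t_critical:.3f}  z*={z_critical:.3f}")
```

The gap between t* and z* reflects the extra uncertainty from estimating the standard deviation from the sample.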

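For two independent samples with equal population variances, we can calculate the t-statistic with the following formula:

t = (x̄1 − x̄2) / (sp · √(1/n1 + 1/n2))

sp = √(((n1 − 1)·s1² + (n2 − 1)·s2²) / (n1 + n2 − 2))

x̄1 = sample 1 mean

x̄2 = sample 2 mean

s1 = sample 1 standard deviation

s2 = sample 2 standard deviation

sp = pooled standard deviation

n1 = sample 1 size

n2 = sample 2 size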
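One common form of the two-sample t-statistic, and the one the two-sample code later in this article computes, is the pooled-variance version. Here is a minimal sketch using only the standard library; the function name and the tiny input lists are hypothetical and chosen so the arithmetic is easy to follow.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(sample1, sample2):
    """Two-sample t-statistic assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    var1, var2 = variance(sample1), variance(sample2)  # sample variances (n - 1 denominator)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / standard_error

# Hypothetical data: both samples have variance 1 and their means differ by 1
t_stat = pooled_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print("t =", round(t_stat, 4))
```

Swapping the order of the two samples only flips the sign of the statistic.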

One-Sample T-Test

The masses of a sample of n = 20 acorns are m = 8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0, 7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, and 7.0 g. We need to check whether the average mass of this sample differs from the average mass of all acorns, μ = 10.0 g.

Null Hypothesis: x̄ – μ = 0, that is, there is no significant difference.

Alternate Hypothesis: x̄ – μ ≠ 0 (two-sided test)

t-critical for the chosen alpha level (α = 0.05, two-sided, 19 degrees of freedom): t* = 2.093
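The critical value does not have to be read from a table. As a small sketch, assuming SciPy is available and assuming a two-sided test at α = 0.05 (the level used for the decisions elsewhere in this article), it can be computed from the t-distribution's inverse CDF:

```python
from scipy import stats

alpha = 0.05   # assumed significance level
dof = 20 - 1   # degrees of freedom: n - 1 for a one-sample test with n = 20

# Two-sided test: put alpha / 2 in each tail
t_critical = stats.t.ppf(1 - alpha / 2, dof)
print(round(t_critical, 3))
```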

import numpy as np
import scipy.stats as stats

# One-sample t-test
x = [8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0,
     7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, 7.0]
mu = 10             # hypothesized population mean (g)
t_critical = 2.093  # two-sided critical value for alpha = 0.05, df = 19
x_bar = np.mean(x)
s = np.std(x, ddof=1)  # ddof=1 divides by N - 1 to give the sample standard deviation
N = len(x)
SE = s / np.sqrt(N)    # standard error of the mean
t = (x_bar - mu) / SE
print("t-statistic: ", t)
# SciPy's one-sample t-test returns the same statistic together with the p-value:
t, p = stats.ttest_1samp(x, mu)
print("t = ", t, ", p = ", p)
if p < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

Since p is less than 0.05, we reject the null hypothesis. There is a statistically significant difference between the sample mean and the population mean of 10 g.

Two-Sample T-Test

The masses of N1 = 20 acorns from oak trees upwind of a coal power plant and N2 = 30 acorns from oak trees downwind of the same plant are measured.

Null Hypothesis: x̄1 = x̄2, or x̄2 – x̄1 = 0, that is, there is no difference between the sample means

Alternate Hypothesis: x̄2 < x̄1, or x̄2 – x̄1 < 0, that is, the downwind sample mean is smaller than the upwind sample mean

# sample upwind
x1 = [10.8, 10.0, 8.2, 9.9, 11.6, 10.1, 11.3, 10.3, 10.7, 9.7,
      7.8, 9.6, 9.7, 11.6, 10.3, 9.8, 12.3, 11.0, 10.4, 10.4]
# sample downwind
x2 = [7.8, 7.5, 9.5, 11.7, 8.1, 8.8, 8.8, 7.7, 9.7, 7.0,
      9.0, 9.7, 11.3, 8.7, 8.8, 10.9, 10.3, 9.6, 8.4, 6.6,
      7.2, 7.6, 11.5, 6.6, 8.6, 10.5, 8.4, 8.5, 10.2, 9.2]
# unequal sample sizes; assume equal population variance
t_critical = 1.677  # one-sided critical value for alpha = 0.05, df = 48
n1 = len(x1)
n2 = len(x2)
dof1 = n1 - 1
dof2 = n2 - 1
dof = dof1 + dof2   # degrees of freedom of the pooled estimate
s1 = np.std(x1, ddof=1)
s2 = np.std(x2, ddof=1)
x1_bar = np.mean(x1)
x2_bar = np.mean(x2)
sp = np.sqrt((dof1 * s1**2 + dof2 * s2**2) / dof)  # pooled standard deviation
se = sp * np.sqrt(1/n1 + 1/n2)                     # standard error of the difference
t = (x2_bar - x1_bar) / se
print("t-statistic", t)
# a two-sample independent t-test is done with scipy as follows
# NOTE: the p-value given is two-sided; when t has the sign predicted by the
# alternative hypothesis, the one-sided p-value is p/2
t, p_twosided = stats.ttest_ind(x2, x1, equal_var=True)
print("t = ", t, ", p_twosided = ", p_twosided, ", p_onesided =", p_twosided/2)

Since p is less than 0.05, we reject the null hypothesis. There is a statistically significant difference between the means of the two samples.
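Rather than halving the two-sided p-value by hand, recent SciPy releases (1.6 and later, to my knowledge) accept an `alternative` argument in `ttest_ind`. The sketch below reuses the two acorn samples and is only meant to show that both routes agree for this one-sided hypothesis.

```python
from scipy import stats

# Upwind and downwind acorn masses from the example above
x1 = [10.8, 10.0, 8.2, 9.9, 11.6, 10.1, 11.3, 10.3, 10.7, 9.7,
      7.8, 9.6, 9.7, 11.6, 10.3, 9.8, 12.3, 11.0, 10.4, 10.4]
x2 = [7.8, 7.5, 9.5, 11.7, 8.1, 8.8, 8.8, 7.7, 9.7, 7.0,
      9.0, 9.7, 11.3, 8.7, 8.8, 10.9, 10.3, 9.6, 8.4, 6.6,
      7.2, 7.6, 11.5, 6.6, 8.6, 10.5, 8.4, 8.5, 10.2, 9.2]

# Two-sided test, then halve the p-value manually
t_two, p_two = stats.ttest_ind(x2, x1, equal_var=True)

# One-sided test: the alternative is mean(x2) < mean(x1)
t_less, p_less = stats.ttest_ind(x2, x1, equal_var=True, alternative="less")

print("manual one-sided p =", p_two / 2)
print("scipy one-sided p  =", p_less)
```

Halving is only valid when the t-statistic has the sign the alternative hypothesis predicts, which is why the explicit `alternative` argument is the safer choice.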

Paired T-Test

The paired sample t-test is also called a dependent sample t-test. Let’s take an example from a blood pressure dataset. We need to check whether the mean blood pressure of the same individuals differs before and after treatment.

H0: The mean difference between the two samples is 0

H1: The mean difference between the two samples is not 0

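import pandas as pd
from scipy import stats

df = pd.read_csv("ztest.csv")
# Summary statistics for the two columns (print is needed outside a notebook)
print(df[['bp_before', 'bp_after']].describe())
# Paired t-test on the before and after measurements of the same individuals
ttest, pval = stats.ttest_rel(df['bp_before'], df['bp_after'])
print(pval)
if pval < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")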

As the p-value is less than 0.05, we reject the null hypothesis: the mean blood pressure before and after treatment is not the same.
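A paired t-test is equivalent to a one-sample t-test on the per-individual differences, tested against a mean of 0. The sketch below checks this with SciPy; the six before and after readings are hypothetical values invented for illustration, not rows from the article's dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after readings for the same six individuals
before = np.array([150.0, 162.0, 148.0, 171.0, 155.0, 160.0])
after = np.array([145.0, 158.0, 150.0, 165.0, 151.0, 154.0])

# Paired t-test on the two columns
t_rel, p_rel = stats.ttest_rel(before, after)

# One-sample t-test on the differences against a mean of 0
t_one, p_one = stats.ttest_1samp(before - after, 0.0)

print("paired:      t =", t_rel, ", p =", p_rel)
print("differences: t =", t_one, ", p =", p_one)
```

This is why the paired test needs both columns to have the same length and the same row order.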

Conclusion

This post gave an overview of when to use a z-test and when to use a t-test. The analysis can be extended to other statistical tests such as ANOVA and the Chi-Square Test. I hope this article has been helpful.

The complete code of the above implementation is available at the AIM’s GitHub repository. Please visit this link to find the notebook of this code.
