Is the z-test more appropriate than the t-test for testing hypotheses when the population standard deviation is known and the sample size is more than 30?

When to use the z-test vs t-test?

When you know the population standard deviation, you should use the z-test; when you estimate the standard deviation from the sample, you should use the t-test.
The t-distribution has heavier tails (leptokurtic kurtosis) than the normal distribution to compensate for the extra uncertainty that comes from estimating the standard deviation: the sample standard deviation is itself a statistic with its own sampling variability. Usually, we don't know the population standard deviation, so we use the t-test.

You should use the t-test! For a normal population, the t-test is always the correct test when you estimate the standard deviation from the sample. I guess the reason for the confusion is historical. The degrees of freedom equal the sample size minus one. When the sample size is greater than 30, the t-distribution is very similar to the normal distribution, and as the degrees of freedom go to infinity, the t-distribution converges to the normal distribution.

In the past, people used printed tables to look up cumulative probabilities. A t-table needs a separate set of values for every DF value, so the z-table is more detailed and more accurate than the t-table. For large samples, people therefore used the z-table as an approximation.
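To see this convergence numerically, the base-R sketch below compares two-sided 5% critical values of the t-distribution, from qt(), with the normal value from qnorm(). The degrees-of-freedom values are arbitrary examples.

```r
# Two-sided 5% critical values: t-distribution (several df) vs standard normal
z_crit <- qnorm(0.975)            # normal critical value, about 1.96
df     <- c(3, 9, 29, 99, 999)    # arbitrary example degrees of freedom
t_crit <- qt(0.975, df)           # t critical values for each df
print(data.frame(df = df, t_crit = round(t_crit, 4), z_crit = round(z_crit, 4)))
```

The t critical values are always larger than 1.96 and shrink toward it as the degrees of freedom grow, which is why both tests give similar answers for large samples.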

Z distribution vs t distribution

The chart shows the leptokurtic shape of the t-distribution (DF = 4) compared to the normal distribution (Z).
As with any probability distribution, the total area under each curve equals one. The normal distribution is higher close to the center, while the t-distribution is higher in the tails.
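The same comparison can be checked numerically. This base-R sketch evaluates both densities at a few arbitrary points with dnorm() and dt(), then integrates each density over the whole real line with integrate().

```r
# Compare the standard normal density with the t density (df = 4)
x    <- c(0, 1, 2, 3, 4)
dens <- data.frame(x = x, normal = dnorm(x), t_df4 = dt(x, df = 4))
print(round(dens, 4))

# Total area under each density: both should be 1 up to numerical error
area_normal <- integrate(dnorm, -Inf, Inf)$value
area_t4     <- integrate(dt, -Inf, Inf, df = 4)$value   # df is passed on to dt()
```

The normal column is larger at x = 0 and x = 1, while the t column is larger from x = 2 outward, matching the heavier tails in the chart.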

Table of Contents

Z distribution vs t distribution
Z-test type I error - using sample standard deviation
Z-test type I error - using population standard deviation
About Hypothesis Testing
What is p-value?
Significance of p-value
One-tailed Test
Two tailed Test
What is Z-test?
One-sample Z-test
Two-sample Z-test
What is T-test?
One sample T-test
Two-sample T-test
T-test vs Z-test

Z-test type I error - using sample standard deviation

The following simulation draws 300,000 samples from a normal population for each sample size and compares the sample mean to the true mean, using the t-test and the z-test with a significance level of 0.05.
In both tests, we use the sample standard deviation.
Since the null hypothesis is true, we expect the type I error (the probability of rejecting a true H0) to be 0.05, as this is the definition of the significance level. For the z-test, even with a sample size of 30, the type I error is about 0.06 instead of 0.05; in other words, the simulation rejected H0 in about 6% of the cases instead of 5%!
For the t-test, the type I error is around 0.05, as expected. The following chart shows the actual type I error in the simulation.

Blue Z - The actual type I error for the z-test when using the sample standard deviation.

Green T - The actual type I error for the t-test.

T vs Z - type I error chart

The following table lists the simulated type I error of the z-test when using the sample standard deviation.

Sample Size	Type I error
4	0.1443
5	0.1214
6	0.1069
7	0.0973
8	0.0903
10	0.082
12	0.0758
15	0.0697
17	0.068
20	0.0652
25	0.0615
30	0.0595
35	0.0589
40	0.0573
45	0.0563
50	0.0558
60	0.0546
80	0.0523
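The table can also be checked without simulation. For a normal population, the statistic (X̄ − μ)/(S/√n) that uses the sample standard deviation S follows a t-distribution with n − 1 degrees of freedom, so comparing it with the normal critical value 1.96 gives an exact type I error of 2·P(T(n−1) < −1.96). A short base-R sketch:

```r
# Exact type I error of a two-sided z-test (alpha = 0.05) that plugs in the
# sample SD: the statistic is t-distributed with n - 1 df, but it is compared
# with the normal critical value.
alpha  <- 0.05
z_crit <- qnorm(1 - alpha / 2)
n_vec  <- c(4, 5, 6, 7, 8, 10, 12, 15, 17, 20, 25, 30, 35, 40, 45, 50, 60, 80)
type1_exact <- 2 * pt(-z_crit, df = n_vec - 1)
print(data.frame(n = n_vec, type1_exact = round(type1_exact, 4)))
```

These exact values can be compared row by row with the simulated values in the table above; both decrease toward 0.05 as n grows.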

Z-test type I error - using population standard deviation

The following simulation draws 300,000 samples from a normal population and compares the sample mean to the true mean. This time we know the population standard deviation.
Blue Z (S) - The actual type I error for the z-test when using the sample standard deviation.
Red Z (σ) - The actual type I error for the z-test when using the population standard deviation.
When the population standard deviation is known, we should use it rather than the estimated sample standard deviation, and the simulation chart confirms the expected result.

Why does the following chart look the same as the t-test vs z-test type I error chart?

The green line, in the previous chart, shows the type I error of the t-test, which is the correct test when using the sample standard deviation S. The red line, in the current chart, shows the type I error of the z-test with σ, which is the correct test when σ is known.

For any correct statistical test, we expect the type I error to be around the significance level (α).

Z-test type I error chart

library(BSDA)  # provides z.test()
reps   <- 300000  # number of simulations per sample size
mu1    <- 40      # true population mean (H0 is true)
sigma1 <- 12      # true population SD
n_vec  <- c(4, 5, 6, 7, 8, 10, 12, 15, 17, 20, 25, 30, 35, 40, 45, 50, 60, 80)
pvt <- numeric(length(n_vec))  # simulated type I error of the t-test
pvz <- numeric(length(n_vec))  # simulated type I error of the z-test with sample S
j <- 1
for (n1 in n_vec) {  # sample size
  pvalues_t <- numeric(reps)
  pvalues_z <- numeric(reps)
  set.seed(1)
  for (i in 1:reps) {
    x1 <- rnorm(n1, mu1, sigma1)  # take a sample
    s1 <- sd(x1)                  # sample standard deviation
    pvalues_t[i] <- t.test(x1, mu = mu1, alternative = "two.sided")$p.value
    pvalues_z[i] <- z.test(x1, alternative = "two.sided", mu = mu1, sigma.x = s1)$p.value
  }
  pvt[j] <- mean(pvalues_t < 0.05)  # share of rejected (true) null hypotheses
  pvz[j] <- mean(pvalues_z < 0.05)
  j <- j + 1
}

Z-test with sample S vs z-test with σ

We used the same code, but instead of the t-test we used a z-test with the known sigma1:
pvalues_z0[i] <- z.test(x1, alternative = "two.sided", mu = mu1, sigma.x = sigma1)$p.value
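Calling t.test() and z.test() once per sample is slow for 300,000 repetitions. A vectorized base-R sketch (no BSDA) that computes both z statistics directly for one sample size could look like this; reps and n are arbitrary choices, not the settings used for the charts.

```r
set.seed(1)
reps   <- 1e5   # fewer repetitions than the article, to run quickly
n      <- 10    # one sample size from the table
mu1    <- 40    # true mean (H0 is true)
sigma1 <- 12    # true population SD
z_crit <- qnorm(0.975)

x    <- matrix(rnorm(reps * n, mu1, sigma1), nrow = reps)  # one sample per row
xbar <- rowMeans(x)
s    <- sqrt(rowSums((x - xbar)^2) / (n - 1))               # row-wise sample SD

z_s     <- (xbar - mu1) / (s / sqrt(n))       # z statistic using the sample S
z_sigma <- (xbar - mu1) / (sigma1 / sqrt(n))  # z statistic using the known sigma

type1_s     <- mean(abs(z_s) > z_crit)      # rejection rate with S
type1_sigma <- mean(abs(z_sigma) > z_crit)  # rejection rate with sigma
```

type1_s should land near 2 * pt(-1.96, n - 1), which is inflated above 0.05, while type1_sigma should land near 0.05.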

Are the observed changes in mean statistically significant?

This is a central question in many analyses, and questions like it are excellent candidates for hypothesis testing, also called significance testing.

Various test statistics, such as the t-test and the z-test, are used to test hypotheses, and these two tests are the main subject of this blog.

We will cover the following main topics:

About Hypothesis Testing
What is Z-test?
What is T-test?
Z-test vs T-test
Conclusion

About Hypothesis Testing

Let’s start with a simple situation: you are a company monitoring the daily clicks on your blogs, and you want to analyze whether the outcomes of the current month are different from those of the previous month.

For example, are they different because of a particular marketing campaign, or for some other reason?

To check this, hypothesis testing is performed in terms of a null hypothesis and an alternative hypothesis.

Hypotheses are predictive statements that can be tested to establish a relationship between an independent variable and one or more dependent variables.

Here, the research question is converted into:

Null hypothesis (H0), which states that there is “no difference”, and
Alternative hypothesis (H1), which states that there is “a difference in the population”.

Assuming that the average number of blog clicks was 2000 per day before the marketing campaign, and that you believe the population now has a higher average number of clicks because of the campaign, the hypotheses are:

$$H_0: \mu = 2000, \qquad H_1: \mu > 2000$$

Here the observed sample mean is greater than 2000, and the population mean expected under H0 is 2000. The next step is to compute a test statistic that compares the two means.

What is p-value?

The calculated value of the test statistic is converted into a p-value, which indicates whether the outcome is statistically significant.

In brief, a p-value is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed in the sample data; it ranges from 0% to 100%. These values are usually written as decimals, so a p-value of 5% is written as 0.05.

Lower p-values indicate stronger evidence against the null hypothesis, because the observed data would be unlikely if the null hypothesis were true.

For example, a p-value of 0.01 means that, if the null hypothesis were true, there would be a 1% probability of observing a result at least as extreme as the one obtained. A p-value of 0.05 or less is commonly accepted as statistically significant.
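The definition can be illustrated by simulation: generate many z statistics under a true null hypothesis and count how often they are at least as extreme as the observed one. The observed value z_obs = 2.1 is hypothetical.

```r
set.seed(42)
z_obs   <- 2.1                              # hypothetical observed z statistic
p_exact <- 2 * pnorm(-abs(z_obs))           # two-sided p-value from the normal CDF
z_null  <- rnorm(1e6)                       # z statistics simulated with H0 true
p_sim   <- mean(abs(z_null) >= abs(z_obs))  # share at least as extreme as z_obs
round(c(exact = p_exact, simulated = p_sim), 4)
```

Both numbers estimate the same quantity: how often chance alone, under H0, would produce a result this extreme.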

Here, the test statistic is a numerical summary of the data that is compared to what would be expected under the null hypothesis.

It can take many forms, such as the t-statistic of a t-test (usually used when the sample is small), the z-statistic of a z-test (preferred when the sample is large), or the F-statistic of an ANOVA test.

The level of significance, denoted by α (alpha), is the probability of rejecting the null hypothesis when it is true, i.e. the accepted type I error rate. In general, α is set to 1%, 5%, or 10%.

Confidence level: 1 − α is the confidence level, the probability of not rejecting the null hypothesis when it is true.

For instance, with a significance level of 0.05, we reject the null hypothesis when the p-value is small (p ≤ 0.05), because such a result is strong evidence against the null hypothesis.

If the p-value is greater than 0.05, we fail to reject the null hypothesis, because the evidence for the alternative hypothesis is too weak; this does not prove that the null hypothesis is true.

Significance of p-value

The p-value is only one piece of information about whether the null hypothesis should be rejected.

Conventionally, the following rules of thumb are used to describe the strength of the evidence against the null hypothesis:

If p > 0.10: the observed difference is “not significant”
If 0.05 < p ≤ 0.10: the observed difference is “marginally significant”
If 0.01 < p ≤ 0.05: the observed difference is “significant”
If p ≤ 0.01: the observed difference is “highly significant”
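These rules can be written as a small helper with cut(). The name significance_label() is hypothetical; the intervals are right-closed, so, for example, p = 0.05 is labeled “significant”.

```r
# Map p-values to the conventional labels; intervals are
# [0, 0.01], (0.01, 0.05], (0.05, 0.10], (0.10, 1]
significance_label <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.01, 0.05, 0.10, 1),
                   labels = c("highly significant", "significant",
                              "marginally significant", "not significant"),
                   right = TRUE, include.lowest = TRUE))
}

significance_label(c(0.003, 0.01, 0.03, 0.07, 0.2))
```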

One-tailed Test

With a significance level of 0.05, a one-tailed test places all of α in the single direction of interest: the entire rejection region of 0.05 lies in one tail of the test statistic's distribution.

A test is one-tailed when the alternative hypothesis is stated in terms of “less than” or “greater than”, but not both. A direction must be selected before testing.

It detects an effect in one direction only, not in the other.

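A one-tailed test can take two forms, where μ0 denotes the hypothesized population mean:

Left tailed test, used when the alternative hypothesis is “less than”:

$$H_0: \mu = \mu_0, \qquad H_1: \mu < \mu_0$$

Right tailed test, used when the alternative hypothesis is “greater than”:

$$H_0: \mu = \mu_0, \qquad H_1: \mu > \mu_0$$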

Two tailed Test

With a significance level of 0.05, a two-tailed test splits α between the two directions, placing 0.025 in each tail of the test statistic's distribution.

Two tailed test

A two-tailed test is used when the alternative hypothesis is not stated as “greater than” or “less than”, but as “there is a difference” between values (such as sample means), or as “the observed value is not equal to the expected value”.

Because no direction needs to be specified before testing, a two-tailed test considers the possibility of both a positive and a negative effect.
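The difference between one-tailed and two-tailed tests shows up in the critical values and p-values. In this base-R sketch for a z statistic, z_obs = 1.8 is a hypothetical value chosen so that the two kinds of test reach different decisions at α = 0.05.

```r
alpha <- 0.05
z_obs <- 1.8                         # hypothetical observed z statistic

crit_left  <- qnorm(alpha)           # left-tailed: reject if z < crit_left
crit_right <- qnorm(1 - alpha)       # right-tailed: reject if z > crit_right
crit_two   <- qnorm(1 - alpha / 2)   # two-tailed: reject if |z| > crit_two

p_left  <- pnorm(z_obs)                      # P(Z <= z_obs)
p_right <- pnorm(z_obs, lower.tail = FALSE)  # P(Z >= z_obs)
p_two   <- 2 * pnorm(-abs(z_obs))            # both tails

c(p_left = p_left, p_right = p_right, p_two = p_two)
```

Here the right-tailed p-value is below 0.05 while the two-tailed p-value is above it, because the two-tailed test splits α between both tails.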

What is Z-test?

The z-test is a statistical test used to analyze whether two population means are different when the variances are known and the sample size is large.

The test statistic is assumed to have a normal distribution, and the population standard deviation must be known to perform an accurate z-test.

A z-statistic, or z-score, expresses how many standard deviations a value lies from the mean of a group of values. It is computed with population parameters, such as the population standard deviation, and is used to test a hypothesis.

For example, the null hypothesis may be “the sample mean is the same as the population mean”, and the alternative hypothesis “the sample mean is not the same as the population mean”.

One-sample Z-test

The one-sample z-statistic is computed for testing hypotheses as

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

where:
A random sample of size n is selected from a normally distributed population with mean μ and variance σ², and
X̄ is the sample mean, with a sample size greater than 30.
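A minimal base-R sketch of the one-sample z-test from summary statistics. The function name z_test_one() and the click numbers (a mean of 2100 clicks over 36 days, σ = 400) are hypothetical, chosen to fit the blog-click example; BSDA::z.test() computes the same test from raw data.

```r
# One-sample z-test from summary statistics (population sigma known)
z_test_one <- function(xbar, mu0, sigma, n,
                       alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z <- (xbar - mu0) / (sigma / sqrt(n))
  p <- switch(alternative,
              two.sided = 2 * pnorm(-abs(z)),
              less      = pnorm(z),
              greater   = pnorm(z, lower.tail = FALSE))
  list(z = z, p.value = p)
}

# Hypothetical numbers for H0: mu = 2000 vs H1: mu > 2000
res <- z_test_one(xbar = 2100, mu0 = 2000, sigma = 400, n = 36,
                  alternative = "greater")
```

With these numbers z = 1.5 and the right-tailed p-value is about 0.067, so H0 would not be rejected at α = 0.05.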

Two-sample Z-test

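The above formula is used for the one-sample z-test. For a two-sample z-test, the z-statistic is

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where X̄1 and X̄2 are the two sample means, μ1 − μ2 is the hypothesized difference between the population means (0 under the usual null hypothesis), σ1² and σ2² are the known population variances, and n1 and n2 are the sample sizes.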
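A matching base-R sketch of the two-sample z-test from summary statistics. The function name z_test_two() and all the numbers are hypothetical, for illustration only.

```r
# Two-sample z-test from summary statistics (both population SDs known)
z_test_two <- function(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0 = 0,
                       alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  se <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)   # SD of the difference in means
  z  <- ((xbar1 - xbar2) - delta0) / se
  p  <- switch(alternative,
               two.sided = 2 * pnorm(-abs(z)),
               less      = pnorm(z),
               greater   = pnorm(z, lower.tail = FALSE))
  list(z = z, p.value = p)
}

# Hypothetical groups: means 52 and 50, known SDs 5 and 6, 50 observations each
res2 <- z_test_two(xbar1 = 52, xbar2 = 50, sigma1 = 5, sigma2 = 6, n1 = 50, n2 = 50)
```

Here z is about 1.81 and the two-sided p-value is about 0.07, so the difference is not significant at α = 0.05.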

What is T-test?

The t-test is used to assess how significant the difference between two groups is; it tells whether the difference in means between two separate groups could have occurred by chance.

The test assumes normally distributed data, its test statistic follows a t-distribution, and the population standard deviation is unknown.

The t-score is the ratio of the difference between two groups to the variability within the groups. The greater the t-score, the greater the difference between the groups; the smaller the t-score, the more similar the groups are.

For example, a t-score of 2 indicates that the groups are twice as different from each other as the values are within each group.

Also, the larger the t-value obtained from a t-test, the more likely it is that the results are repeatable:

A larger t-score indicates that the groups are different.
A smaller t-score indicates that the groups are similar.

Mainly, there are three types of t-test:

An independent samples t-test compares the means of two groups.
A paired samples t-test compares means from the same group at different times, such as six months apart.
A one-sample t-test tests the mean of a group against a known mean.
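All three types are available through base R's t.test(). The sketch below uses simulated, hypothetical scores; it also shows that a paired t-test is the same as a one-sample t-test on the within-subject differences.

```r
set.seed(1)
before <- rnorm(20, mean = 50, sd = 10)          # hypothetical scores at time 1
after  <- before + rnorm(20, mean = 2, sd = 5)   # same subjects, six months later
other  <- rnorm(20, mean = 55, sd = 10)          # an independent group

one_sample  <- t.test(before, mu = 50)                   # one group vs a known mean
independent <- t.test(before, other, var.equal = FALSE)  # two groups (Welch)
paired      <- t.test(after, before, paired = TRUE)      # same group, two times

# The paired test equals a one-sample test on the differences
diff_test <- t.test(after - before, mu = 0)
```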

One sample T-test

The one-sample t-statistic is used for hypothesis testing when:

The population variance is unknown and the sample size is smaller than 30,
The sample standard deviation is used in place of the population standard deviation, and
The data are normally or approximately normally distributed.
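The one-sample t-statistic has the same form as the one-sample z-statistic, with the sample standard deviation s in place of σ: t = (X̄ − μ)/(s/√n), compared with a t-distribution with n − 1 degrees of freedom. This base-R sketch computes it by hand for a small hypothetical sample and checks it against t.test().

```r
x   <- c(2.1, 1.9, 2.4, 2.0, 2.6, 1.8, 2.3, 2.2)  # hypothetical sample, n = 8
mu0 <- 2                                         # hypothesized population mean
n   <- length(x)

t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))  # uses the sample SD s
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)   # two-sided p-value

res <- t.test(x, mu = mu0)                       # built-in one-sample t-test
```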

Two-sample T-test
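One common form of the two-sample t-statistic is Welch's version, which does not assume equal population variances: t = (X̄1 − X̄2)/√(s1²/n1 + s2²/n2), with the Welch–Satterthwaite degrees of freedom. This base-R sketch computes it by hand for two hypothetical groups and compares it with t.test(), whose default (var.equal = FALSE) is the Welch test.

```r
x <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.4)  # hypothetical group 1
y <- c(10.9, 11.8, 11.2, 10.5, 11.6)        # hypothetical group 2

vx <- var(x) / length(x)                    # squared standard error, group 1
vy <- var(y) / length(y)                    # squared standard error, group 2

t_welch  <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

res <- t.test(x, y)                         # Welch two-sample t-test
```

Setting var.equal = TRUE in t.test() gives the pooled-variance version instead.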

T-test vs Z-test

Choosing which test statistic to use under which conditions can be tricky. The diagram below compares the z-test and the t-test under specific conditions.

Comparing T-test and Z-test

Because the sample size differs from analysis to analysis, a test suited to the sample size at hand should be chosen. For example, the z-test is used when the sample size is large, generally n > 30.

The t-test, in contrast, is used when the sample size is small, usually n < 30, where n is the sample size.

The t-test is a statistical test used to analyze whether the means of two populations differ when the standard deviation is not known.

The z-test is a parametric test used to determine whether the means of two datasets differ when the standard deviation is known.

The t-test and the z-test use different distributions to evaluate the test statistic and draw conclusions in hypothesis testing.

Notably, the t-test is based on Student’s t-distribution, and the z-test relies on the normal distribution.

In both tests, the population variance plays a key role in obtaining the t-score and the z-score.

The population variance is known in the z-test, whereas in the t-test it is unknown and is estimated from the sample.

Some major assumptions are made when conducting either a t-test or a z-test.

In a t-test,

All data points are assumed to be independent.
Sample values are collected and recorded accurately.
The test works on smaller samples: n should not exceed thirty, but should not be less than five.

In the z-test,

All data points are independent.
The sample size is assumed to be large: n should exceed thirty.
Under the null hypothesis, z follows a normal distribution with mean zero and variance one.

Conclusion

The t-test and the z-test are key tests for determining whether the difference between a sample and a population is significant. Although their formulas are similar, the choice of test depends on the sample size and on whether the population standard deviation is known.

From the above discussion, we can conclude that the t-test and the z-test are relatively similar but differ in applicability: by the common rule of thumb, the t-test is applied when the sample size is less than 30, and the z-test when the sample size exceeds 30.

There are other essential differences as well, discussed throughout this blog. We hope this clarifies the differences between the z-test and the t-test.