Because our inferences about the population mean rely on the sample mean, we focus on the distribution of the sample mean. Is it normal? What if our population is not normally distributed or we don’t know anything about the distribution of our population?
The Central Limit Theorem states that the sampling distribution of the sample means will approach a normal distribution as the sample size increases.
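A small simulation can make this statement concrete. The sketch below is illustrative only: the exponential population, the sample sizes, the seed, and the helper names `sample_means` and `skewness` are all assumptions chosen for the demonstration, not part of the original text. It draws repeated samples from a strongly right-skewed population and reports how the skewness of the sample means shrinks toward 0 (the value for a symmetric, normal-like shape) as n grows.

```python
import random
import statistics


def sample_means(sample_size, n_samples=2000, seed=0):
    """Means of repeated samples from a skewed (exponential, mean 1) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


# Hypothetical sample sizes, chosen only to show the trend.
for n in (2, 10, 50):
    means = sample_means(n)
    print(f"n={n:>2}  mean of sample means={statistics.fmean(means):.3f}  "
          f"skewness={skewness(means):.3f}")
```

The mean of the sample means stays near the population mean of 1, while the skewness falls as the sample size increases.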
The Central Limit Theorem tells us that, regardless of the shape of our population, the sampling distribution of the sample mean becomes approximately normal as the sample size increases.

Sampling Distribution of the Sample Proportion

The population proportion (p) is a parameter that is estimated as commonly as the mean, so it is just as important to understand the distribution of the sample proportion. With proportions, each element either has the characteristic you are interested in or it does not. The sample proportion (p̂) is calculated by

$\hat{p} = \frac{x}{n}$

where x is the number of elements in your sample with the characteristic and n is the sample size.

Example 1

You are studying the number of cavity trees in the Monongahela National Forest for wildlife habitat. You have a sample size of n = 950 trees and, of those trees, x = 238 trees with cavities. The sample proportion is:

$\hat{p} = \frac{238}{950} \approx 0.25$

The distribution of the sample proportion has a mean of $\mu_{\hat{p}} = p$ and a standard deviation of $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$. The sample proportion is approximately normally distributed if n is very large and p̂ isn’t close to 0 or 1. We can also use the following relationship to assess normality when the parameter being estimated is p, the population proportion:

$n\hat{p}(1-\hat{p}) \ge 10$

Confidence Intervals

In the preceding chapter we learned that populations are characterized by descriptive measures called parameters. Inferences about parameters are based on sample statistics. We now want to estimate population parameters and assess the reliability of our estimates based on our knowledge of the sampling distributions of these statistics.

Point Estimates

We start with a point estimate. This is a single value computed from the sample data that is used to estimate the population parameter of interest.
We use point estimates to construct confidence intervals for unknown parameters.
Example 2

We are 95% confident that our interval contains the population mean bear weight. If we drew 100 samples of the same size from the same population and built a confidence interval from each, we would expect about 95 of them to contain the true parameter (the population mean weight) and about five of them to miss it.

Figure 1. Confidence intervals from twenty-five different samples.

In this example, twenty-five samples from the same population gave these 95% confidence intervals. In the long term, 95% of all samples give an interval that contains µ, the true (but unknown) population mean.

Level of confidence is expressed as a percent.
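The long-run interpretation can be checked with a short simulation. This is a rough sketch under stated assumptions: the population values (mean 100, standard deviation 15), the sample size, the seed, and the function name `coverage` are hypothetical and are not taken from the bear-weight example. It builds many 95% intervals from repeated samples of a normal population with known σ and counts how often the interval captures the true mean.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu=100.0, sigma=15.0, n=25, level=0.95, trials=2000, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided critical value
    margin = z * sigma / n ** 0.5               # same margin for every sample
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / trials


print(f"Share of intervals containing mu: {coverage():.3f}")
```

The printed share should land close to 0.95. Any single interval either contains µ or it does not, so the 95% describes the procedure rather than one particular interval.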
What does this really mean?
$\bar{x} - E < \mu < \bar{x} + E$ or $\hat{p} - E < p < \hat{p} + E$, where E is the margin of error. The confidence is based on area under a normal curve, so the assumption of normality must be met (see Chapter 1).

Confidence Intervals about the Mean (μ) when the Population Standard Deviation (σ) is Known

A confidence interval takes the form of: point estimate ± margin of error.

The point estimate
The margin of error
The critical value
Figure 2. The middle 95% area under a standard normal curve.
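Table 1. Common critical values (Z-scores).

Construction of a confidence interval about μ when σ is known:

1) Find the critical value $Z_{\alpha/2}$ for the chosen level of confidence.
2) Compute the margin of error $E = Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$.
3) Form the interval $\bar{x} \pm E$.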
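The three-step construction (critical value, margin of error, interval) can be sketched in Python with only the standard library. The function name `z_interval` is an illustrative assumption. The inputs are the Jones Lake figures from Example 3 (x̄ = 57.8, σ = 15.4, n = 22).

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, level=0.95):
    """Confidence interval for the mean when sigma is known.

    Returns (lower, upper, margin_of_error).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # step 1: critical value
    margin = z * sigma / sqrt(n)                # step 2: margin of error
    return xbar - margin, xbar + margin, margin  # step 3: interval


for level in (0.95, 0.99):
    low, high, margin = z_interval(57.8, 15.4, 22, level)
    print(f"{level:.0%}: {57.8} ± {margin:.3f} -> ({low:.2f}, {high:.2f})")
```

The 95% result agrees with the hand calculation. At 99% the code uses the unrounded critical value (about 2.576) rather than the table value 2.575, so its margin differs from the hand result in the third decimal place.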
Example 3

Construct a confidence interval about the population mean.

Researchers have been studying p-loading in Jones Lake for many years. It is known that water clarity (measured with a Secchi disk) is normally distributed with a population standard deviation of σ = 15.4 in. A random sample of 22 measurements was taken at various points on the lake with a sample mean of x̄ = 57.8 in. The researchers want you to construct a 95% confidence interval for μ, the mean water clarity.

1) $Z_{\alpha/2} = 1.96$
2) $E = Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{15.4}{\sqrt{22}} = 6.435$
3) $\bar{x} \pm E = 57.8 \pm 6.435$

The 95% confidence interval for the mean water clarity is (51.36, 64.24). We can be 95% confident that this interval contains the population mean water clarity for Jones Lake.

Now construct a 99% confidence interval for μ, the mean water clarity, and interpret.

1) $Z_{\alpha/2} = 2.575$
2) $E = Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 2.575 \cdot \frac{15.4}{\sqrt{22}} = 8.454$
3) $\bar{x} \pm E = 57.8 \pm 8.454$

The 99% confidence interval for the mean water clarity is (49.35, 66.25). We can be 99% confident that this interval contains the population mean water clarity for Jones Lake.

As the level of confidence increased from 95% to 99%, the width of the interval increased. As the probability (area under the normal curve) increased, the critical value increased, resulting in a wider interval.

Software Solutions

Minitab

You can use Minitab to construct this 95% confidence interval (Excel does not construct confidence intervals about the mean when the population standard deviation is known). Select Basic Statistics>1-sample Z. Enter the known population standard deviation and select the required level of confidence.

Figure 3. Minitab screen shots for constructing a confidence interval.

One-Sample Z: depth
The assumed standard deviation = 15.4

Variable   N   Mean  StDev  SE Mean  95% CI
depth     22  57.80  11.60     3.28  (51.36, 64.24)

Confidence Intervals about the Mean (μ) when the Population Standard Deviation (σ) is Unknown

In practice, we often don’t know the population standard deviation (σ). We can use the sample standard deviation (s) in place of σ.
However, because of this change, we can’t use the standard normal distribution to find the critical values necessary for constructing a confidence interval. The Student’s t-distribution was created for situations when σ is unknown. William Gosset, who published under the pseudonym “Student,” worked as a quality control engineer for Guinness Brewery in Dublin. He found errors in his testing and knew they were due to the use of s instead of σ. He created this distribution to deal with the problem of an unknown population standard deviation and small sample sizes. A portion of the t-table is shown below.

Table 2. Portion of the Student’s t-table.

Example 4

Find the critical value for a 95% confidence interval with a sample size of n = 13. With n − 1 = 12 degrees of freedom and an area of α/2 = 0.025 in each tail, the t-table gives a critical value of 2.179.
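When a t-table is not at hand, the critical value can be approximated numerically. The sketch below is one possible approach, not the method used in the text: it integrates the Student's t density with Simpson's rule and inverts the result by bisection, using only the standard library. The function names `t_pdf`, `t_cdf`, and `t_critical`, the step count, and the search bounds are assumptions made for illustration. A statistics package would normally supply this quantile directly.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * (log_df_pi(df))
    return exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def log_df_pi(df):
    """log(df * pi), split out to keep t_pdf readable."""
    from math import log
    return log(df * pi)


def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(level, df):
    """Two-sided critical value: the t with central area equal to `level`."""
    target = 0.5 + level / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):            # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.95, 12), 3))    # n = 13, so df = 12 -> 2.179
print(round(t_critical(0.95, 1000), 3))  # large df: close to the Z value 1.96
```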
The critical values from the Student’s t-distribution approach the critical values from the standard normal distribution as the sample size (n) increases.

Table 3. Critical values from the Student’s t-table.

Using the standard normal curve, the critical value for a 95% confidence interval is 1.96. You can see how different sample sizes will change the critical value and thus the confidence interval, especially when the sample size is small.

Construction of a Confidence Interval about μ when σ is Unknown

1) Find the critical value $t_{\alpha/2}$ with n − 1 degrees of freedom.
2) Compute the margin of error $E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$.
3) Form the interval $\bar{x} \pm E$.
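As a quick illustration of this construction, the sketch below computes the margin of error and interval from a sample mean, a sample standard deviation, and a critical value read from a t-table. The function name `t_interval` is an assumption. The inputs are the lake pH figures from Example 5 (x̄ = 6.443, s = 0.7120, n = 22, with table values 2.831 for 99% and 1.721 for 90% at 21 degrees of freedom).

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the critical value looked up for n - 1 degrees of freedom.
    Returns (lower, upper, margin_of_error).
    """
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin, margin


for label, t_crit in (("99%", 2.831), ("90%", 1.721)):
    low, high, margin = t_interval(6.443, 0.7120, 22, t_crit)
    print(f"{label}: 6.443 ± {margin:.4f} -> ({low:.3f}, {high:.3f})")
```

Passing the critical value in as an argument keeps the example free of third-party dependencies and mirrors the table lookup done by hand.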
Example 5

Researchers studying the effects of acid rain in the Adirondack Mountains collected water samples from 22 lakes. They measured the pH (acidity) of the water and want to construct a 99% confidence interval about the mean lake pH for this region. The sample mean is 6.4429 with a sample standard deviation of 0.7120. They do not know anything about the distribution of the pH of this population, and the sample is small (n<30), so they look at a normal probability plot.

Figure 4. Normal probability plot.

The plot indicates that the data are approximately normally distributed. Now construct the 99% confidence interval about the mean pH.

1) $t_{\alpha/2} = 2.831$
2) $E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = 2.831 \cdot \frac{0.7120}{\sqrt{22}} = 0.4297$
3) $\bar{x} \pm E = 6.443 \pm 0.4297$

The 99% confidence interval about the mean pH is (6.013, 6.873). We are 99% confident that this interval contains the mean lake pH for this lake population.

Now construct a 90% confidence interval about the mean pH for these lakes.

1) $t_{\alpha/2} = 1.721$
2) $E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = 1.721 \cdot \frac{0.7120}{\sqrt{22}} = 0.2612$
3) $\bar{x} \pm E = 6.443 \pm 0.2612$

The 90% confidence interval about the mean pH is (6.182, 6.704). We are 90% confident that this interval contains the mean lake pH for this lake population.

Notice how the width of the interval decreased as the level of confidence decreased from 99% to 90%.

Construct a 90% confidence interval about the mean lake pH using Excel and Minitab.

Software Solutions

Minitab

For Minitab, enter the data in the spreadsheet and select Basic statistics and 1-sample t-test.

One-Sample T: pH

Variable   N   Mean  StDev  SE Mean  90% CI
pH        22  6.443  0.712    0.152  (6.182, 6.704)

Excel

For Excel, enter the data in the spreadsheet and select descriptive statistics. Check Summary Statistics and select the level of confidence.

Mean                      6.442909
Standard Error            0.151801
Median                    6.4925
Mode                      #N/A
Standard Deviation        0.712008
Sample Variance           0.506956
Kurtosis                  -0.5007
Skewness                  -0.60591
Range                     2.338
Minimum                   5.113
Maximum                   7.451
Sum                       141.744
Count                     22
Confidence Level(90.0%)   0.26121

Excel gives you the sample mean in the first line (6.442909) and the margin of error in the last line (0.26121).
You must complete the computation yourself to obtain the interval (6.442909 ± 0.26121).

Confidence Intervals about the Population Proportion (p)

Frequently, we are interested in estimating the population proportion (p), instead of the population mean (µ). For example, you may need to estimate the proportion of trees infected with beech bark disease, or the proportion of people who support “green” products. The parameter p can be estimated in the same way as we estimated µ, the population mean.

The Sample Proportion
The Assumption of Normality when Estimating Proportions
Constructing a Confidence Interval about the Population Proportion

Constructing a confidence interval about the proportion follows the same three steps we have used in previous examples.
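A minimal Python sketch of those three steps for a proportion follows. The function name `proportion_interval` is an illustrative assumption, and the normality check uses the common rule of thumb n·p̂(1 − p̂) ≥ 10, which is assumed here rather than quoted from the text. The inputs are the soybean germination counts from Example 6 (421 germinated out of 500).

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, level=0.95):
    """Normal-approximation confidence interval for a population proportion.

    Returns (p_hat, lower, upper).
    """
    p_hat = x / n                                # point estimate
    # Assumed rule of thumb for using the normal approximation.
    if n * p_hat * (1 - p_hat) < 10:
        raise ValueError("normal approximation may not be appropriate")
    z = NormalDist().inv_cdf(0.5 + level / 2)    # step 1: critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)   # step 2: margin of error
    return p_hat, p_hat - margin, p_hat + margin  # step 3: interval


p_hat, low, high = proportion_interval(421, 500)
print(f"p-hat = {p_hat:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With these inputs the interval rounds to (0.810, 0.874), or roughly 81.0% to 87.4%.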
Example 6

A botanist has produced a new variety of hybrid soybean that is better able to withstand drought. She wants to construct a 95% confidence interval about the germination rate (percent germination). She randomly selected 500 seeds and found that 421 have germinated.

First, compute the point estimate:

$\hat{p} = \frac{x}{n} = \frac{421}{500} = 0.842$

Check normality:

$n\hat{p}(1-\hat{p}) = 500(0.842)(1 - 0.842) = 66.5 \ge 10$

You can assume a normal distribution. Now construct the confidence interval:

1) $Z_{\alpha/2} = 1.96$
2) $E = Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96 \sqrt{\frac{0.842(0.158)}{500}} = 0.032$
3) $\hat{p} \pm E = 0.842 \pm 0.032$

The 95% confidence interval for the germination rate is (81.0%, 87.4%). We can be 95% confident that this interval contains the true germination rate for this population.

Software Solutions

Minitab

You can use Minitab to compute the confidence interval. Select STAT>Basic stats>1-proportion. Select summarized data and enter the number of events (421) and the number of trials (500). Click Options and select the correct confidence level. Check “test and interval based on normal distribution” if the assumption of normality has been verified.

Test and CI for One Proportion

Sample    X    N  Sample p  95% CI
1       421  500  0.842000  (0.810030, 0.873970)

Using the normal approximation.

Excel

Excel does not compute confidence intervals for estimating the population proportion.

Confidence Interval Summary

Which method do I use? The first question to ask yourself is: Which parameter are you trying to estimate? If it is the mean (µ), then ask yourself: Is the population standard deviation (σ) known? If yes, then follow the next 3 steps:

Confidence Interval about the Population Mean (µ) when σ is Known
If no, follow these 3 steps:

Confidence Interval about the Population Mean (µ) when σ is Unknown
If you want to construct a confidence interval about the population proportion, follow these 3 steps:

What do sampling distributions describe the distribution of?

A sampling distribution is a probability distribution of a statistic that is obtained through repeated sampling of a specific population. It describes a range of possible outcomes for a statistic, such as the mean or mode of some variable, of a population.

What do you mean by parameters and statistic in sampling distribution?

A parameter is a number describing a whole population (e.g., population mean), while a statistic is a number describing a sample (e.g., sample mean).

Do sampling distributions describe parameters?

The sampling distribution of a (sample) statistic is important because it enables us to draw conclusions about the corresponding population parameter based on a random sample. For example, when we draw a random sample from a normally distributed population, the sample mean is a statistic.

What are distribution parameters in statistics?

A parameter of a distribution is a number or a vector of numbers describing some characteristic of that distribution.