10. Comparing Two Population Means
10.3 Difference between Two Variances – the F Distributions
Here we have to assume that the two populations (as opposed to sample mean distributions) have a distribution that is almost normal as shown in Figure 10.2.
Figure 10.2: Two normal populations lead to two $\chi^2$ distributions that represent distributions of sample variances. The $F$ distribution results when you build up a distribution of the ratio of the two $\chi^2$ sample values.
The ratio $s_1^2/s_2^2$ follows an $F$-distribution if $\sigma_1^2 = \sigma_2^2$. That distribution has two degrees of freedom: one for the numerator (d.f.N. or $\nu_1$) and one for the denominator (d.f.D. or $\nu_2$). So we denote the distribution more specifically as $F_{\nu_1, \nu_2}$. For the case of Figure 10.2, $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$. The $F$ ratio, in general, is the result of the following stochastic process. Let $X_1$ be a random variable produced by a stochastic process with a $\chi^2_{\nu_1}$ distribution and let $X_2$ be a random variable produced by a stochastic process with a $\chi^2_{\nu_2}$ distribution. Then the random variable $F = \frac{X_1/\nu_1}{X_2/\nu_2}$ will, by definition, have an $F_{\nu_1, \nu_2}$ distribution.
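The stochastic process just described can be imitated numerically. The following is a minimal sketch, not part of the original text: the function name `simulate_f_ratios` and the choice of 25 and 17 degrees of freedom are illustrative assumptions, and the two chi-square variables are drawn independently, as the definition of the $F$ distribution requires. It relies on the fact that a chi-square variable with $\nu$ degrees of freedom is a gamma variable with shape $\nu/2$ and scale 2.

```python
import random
import statistics


def simulate_f_ratios(nu1, nu2, n_draws, seed=0):
    """Draw F values as (X1/nu1)/(X2/nu2) with independent chi-square X1, X2."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_draws):
        # Chi-square with nu degrees of freedom = gamma(shape=nu/2, scale=2).
        x1 = rng.gammavariate(nu1 / 2, 2)
        x2 = rng.gammavariate(nu2 / 2, 2)
        ratios.append((x1 / nu1) / (x2 / nu2))
    return ratios


# Illustrative degrees of freedom (an assumption made for this sketch).
ratios = simulate_f_ratios(nu1=25, nu2=17, n_draws=20000)

# For nu2 > 2 the mean of an F distribution is nu2 / (nu2 - 2).
print(statistics.mean(ratios), 17 / 15)
```

A histogram of `ratios` should show the right-skewed shape described in the text, and the two printed numbers should be close to each other.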
The exact shape of the $F_{\nu_1, \nu_2}$ distribution depends on the choice of $\nu_1$ and $\nu_2$, but it roughly looks like a $\chi^2$ distribution as shown in Figure 10.3.
$F$ and $t$ are related :
$$F_{1, \nu} = t_\nu^2$$
so the $t$ statistic can be viewed as a special case of the $F$ statistic.
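The relation between the two statistics follows from their standard definitions. As a sketch, let $Z$ be a standard normal variable and $X$ an independent $\chi^2_\nu$ variable (the symbols $Z$ and $X$ are introduced here only for illustration):

```latex
t_\nu = \frac{Z}{\sqrt{X/\nu}}
\quad\Longrightarrow\quad
t_\nu^2 = \frac{Z^2/1}{X/\nu} \sim F_{1,\nu},
\qquad \text{since } Z^2 \sim \chi^2_1 .
```

Squaring a $t$ variable therefore gives a ratio of two scaled chi-square variables with 1 and $\nu$ degrees of freedom, which is exactly the form of an $F$ variable.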
For comparing variances, we are interested in the following hypothesis pairs :
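Right-tailed | Left-tailed | Two-tailed |
$H_0: \sigma_1^2 \le \sigma_2^2$ | $H_0: \sigma_1^2 \ge \sigma_2^2$ | $H_0: \sigma_1^2 = \sigma_2^2$ |
$H_1: \sigma_1^2 > \sigma_2^2$ | $H_1: \sigma_1^2 < \sigma_2^2$ | $H_1: \sigma_1^2 \ne \sigma_2^2$ |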
We’ll always compare variances ($\sigma^2$) and not standard deviations ($\sigma$) to keep life simple.
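The test statistic is
$$F_{\text{test}} = \frac{s_1^2}{s_2^2}$$
where (for finding the critical statistic), $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.
Note that $F_{\text{test}} = 1$ when $s_1^2 = s_2^2$, a fact you can use to get a feel for the meaning of this test statistic.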
Values for the various $F$ critical values are given in the F Distribution Table in the Appendix. We will denote a critical value of $F$ with the notation :
$$F_{\text{crit}} = F_{\alpha, \nu_1, \nu_2}$$
Where:
$\alpha$ = Type I error rate
$\nu_1$ = d.f.N.
$\nu_2$ = d.f.D.
The F Distribution Table gives critical values for small right-tail areas only. This means that it is useless, as it stands, for a left-tailed test. But that does not mean we cannot do a left-tailed test. A left-tailed test is easily converted into a right-tailed test by switching the assignments of populations 1 and 2. To get the assignments correct in the first place, then, always define populations 1 and 2 so that $s_1^2 > s_2^2$; that is, assign population 1 so that it has the largest sample variance. Do this even for a two-tailed test because we will have no idea what $F_{\text{crit}}$ on the left side of the distribution is.
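The assignment rule can be written as a small helper. This is a hedged sketch rather than a prescribed procedure: the function name `f_test_statistic` and the numbers in the usage line are hypothetical and do not come from the text. The helper puts the larger sample variance in the numerator and returns the test statistic with its degrees of freedom, assuming the usual $n - 1$ for each sample.

```python
def f_test_statistic(var_a, n_a, var_b, n_b):
    """Return (F_test, nu1, nu2) with the larger sample variance as population 1."""
    if var_a <= 0 or var_b <= 0:
        raise ValueError("sample variances must be positive")
    if var_a >= var_b:
        s1_sq, n1, s2_sq, n2 = var_a, n_a, var_b, n_b
    else:
        # Swap the labels so that population 1 has the larger sample variance.
        s1_sq, n1, s2_sq, n2 = var_b, n_b, var_a, n_a
    return s1_sq / s2_sq, n1 - 1, n2 - 1


# Hypothetical numbers, not taken from the text.
f_test, nu1, nu2 = f_test_statistic(var_a=4.0, n_a=10, var_b=9.0, n_b=16)
print(f_test, nu1, nu2)  # 2.25 15 9
```

Because of the swap, the returned statistic is never less than 1. For a one-tailed test, remember that swapping the labels also reverses the direction of the alternative hypothesis.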
Example 10.3 : Given the following data for smokers and non-smokers (maybe it’s about some sort of disease occurrence, who cares, let’s focus on dealing with the numbers), test if the population variances are equal or not at significance level $\alpha$.
Smokers | Nonsmokers |
Note that $s_1^2 > s_2^2$ so we’re good to go.
Solution :
1. Hypothesis.
$$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_1: \sigma_1^2 \ne \sigma_2^2$$
2. Critical statistic.
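Use the F Distribution Table; it is a bunch of tables labeled by “$\alpha$” that we will designate as $\alpha_T$, the table values that signify right-tail areas. Since this is a two-tailed test, we need $\alpha_T = \alpha/2$. Next we need the degrees of freedom:
$$\nu_1 = n_1 - 1 = 25, \qquad \nu_2 = n_2 - 1 = 17$$
So the critical statistic is
$$F_{\text{crit}} = F_{\alpha/2,\, 25,\, 17}$$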
3. Test statistic.
$$F_{\text{test}} = \frac{s_1^2}{s_2^2} = 3.6$$
With this test statistic, we can estimate the $p$-value using the F Distribution Table. To find $p$, look up all the numbers with d.f.N. = 25 and d.f.D. = 17 (24 and 17 are the closest in the tables, so use those) in all of the F Distribution Tables and form your own table. For each column in your table, record $\alpha_T$ and the $F$ value corresponding to the degrees of freedom of interest. Again, $\alpha_T$ corresponds to $p/2$ for a two-tailed test. So make a row above the $\alpha_T$ row with $p = 2\alpha_T$. (For a one-tailed test, we would put $p = \alpha_T$.)
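$p$ (two-tailed) | 0.20 | 0.10 | 0.05 | 0.02 | 0.01 | |
$\alpha_T$ (table right-tail area) | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 | |
$F$ | 1.84 | 2.19 | 2.56 | 3.08 | 3.51 | 3.6 is over here somewhere, so $p < 0.01$ |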
Notice how we put an upper limit on $p$ because $F_{\text{test}}$ was larger than all the $F$ values in our little table.
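The table lookup can be sketched in code. This is an illustrative helper, not a method given in the text: the name `bracket_p_value` is hypothetical, and the tabulated pairs are copied from the little table above (right-tail area and $F$ value for d.f.N. = 24, d.f.D. = 17). It returns a lower and an upper bound on $p$, using `None` where the table provides no bound.

```python
# (right-tail area, F value) pairs copied from the little table above.
TABLE_ROW = [(0.10, 1.84), (0.05, 2.19), (0.025, 2.56), (0.01, 3.08), (0.005, 3.51)]


def bracket_p_value(f_test, table_row, two_tailed=True):
    """Return (lower, upper) bounds on p; None means the table gives no bound."""
    factor = 2 if two_tailed else 1  # p = 2 * area for a two-tailed test
    lower, upper = None, None
    # Walk from the largest right-tail area (smallest F) to the smallest area.
    for alpha_t, f_value in sorted(table_row, reverse=True):
        if f_test >= f_value:
            upper = factor * alpha_t  # p is no larger than this
        else:
            lower = factor * alpha_t  # p is larger than this
            break
    return lower, upper


print(bracket_p_value(3.6, TABLE_ROW))  # (None, 0.01), i.e. p < 0.01
```

A test statistic that falls between two tabulated values, say 2.3, would give `(0.05, 0.1)`, meaning $0.05 < p < 0.10$ for a two-tailed test.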
Let’s take a graphical look at why we use $p = 2\alpha_T$ in the little table and $\alpha_T = \alpha/2$ for finding $F_{\text{crit}}$ for two-tailed tests :
But in a two-tailed test we want $\alpha$ split on both sides:
4. Decision.
Reject $H_0$. The $p$-value estimate supports this : $p < \alpha$.
5. Interpretation.
There is enough evidence to conclude, at significance level $\alpha$ with an $F$-test, that the variance of the smoker population is different from that of the non-smoker population.
▢