10. Comparing Two Population Means

10.3 Difference between Two Variances – the F Distributions

Here we have to assume that the two populations themselves (as opposed to the distributions of sample means) are approximately normally distributed, as shown in Figure 10.2.

Figure 10.2: Two normal populations lead to two \chi^{2} distributions that represent distributions of sample variances. The F distribution results when you build up a distribution of the ratio of the two \chi^{2} sample values.

The ratio \frac{s_{1}^{2}}{s_{2}^{2}} follows an F distribution if \sigma_{1} = \sigma_{2}. That F distribution has two degrees of freedom: one for the numerator (d.f.N. or \nu_{1}) and one for the denominator (d.f.D. or \nu_{2}), so we denote the distribution more specifically as F_{\nu_{1}, \nu_{2}}. For the case of Figure 10.2, \nu_{1} = n_{1} - 1 and \nu_{2} = n_{2} - 1. The F ratio, in general, is the result of the following stochastic process. Let X_{1} be a random variable produced by a stochastic process with a \chi^{2}_{\nu_{1}} distribution and let X_{2} be a random variable produced by a stochastic process with a \chi^{2}_{\nu_{2}} distribution. Then the random variable F = \frac{X_{1}/\nu_{1}}{X_{2}/\nu_{2}} will, by definition, have an F_{\nu_{1}, \nu_{2}} distribution.
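A quick simulation can make this definition concrete; the sketch below assumes NumPy and SciPy are available, and the particular degrees of freedom are illustrative choices only:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu1, nu2 = 25, 17      # illustrative degrees of freedom
n = 200_000            # number of simulated F values

# Draw chi-square variates and form the scaled ratio (X1/nu1)/(X2/nu2).
x1 = rng.chisquare(nu1, size=n)
x2 = rng.chisquare(nu2, size=n)
f_sim = (x1 / nu1) / (x2 / nu2)

# The simulated 95th percentile should track the theoretical
# F_{25,17} quantile.
print(np.quantile(f_sim, 0.95))     # simulated
print(stats.f.ppf(0.95, nu1, nu2))  # theoretical
```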

The exact shape of the F_{\nu_1, \nu_2} distribution depends on the choice of \nu_1 and \nu_2, but it roughly looks like a \chi^2 distribution, as shown in Figure 10.3.


Figure 10.3: A generic F distribution.

F and t are related :

    \[F_{1,\nu} = t^{2}_{\nu}\]

so the t statistic can be viewed as a special case of the F statistic.
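This relationship is easy to verify numerically; a minimal sketch assuming SciPy, with \nu = 17 as an arbitrary choice:

```python
from scipy import stats

nu = 17   # arbitrary denominator degrees of freedom
q = 0.95

# P(F_{1,nu} <= x) = P(t_nu^2 <= x) = P(|t_nu| <= sqrt(x)), so the
# q-quantile of F_{1,nu} is the square of the (1+q)/2 quantile of t_nu.
f_q = stats.f.ppf(q, 1, nu)
t_q = stats.t.ppf((1 + q) / 2, nu)
print(f_q, t_q**2)
```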

For comparing variances, we are interested in the following hypothesis pairs :

Right-tailed:  H_0: \sigma^2_1 \leq \sigma^2_2  versus  H_1: \sigma^2_1 > \sigma^2_2
Left-tailed:   H_0: \sigma^2_1 \geq \sigma^2_2  versus  H_1: \sigma^2_1 < \sigma^2_2
Two-tailed:    H_0: \sigma^2_1 = \sigma^2_2     versus  H_1: \sigma^2_1 \neq \sigma^2_2

We’ll always compare variances (\sigma^2) and not standard deviations (\sigma) to keep life simple.

The test statistic is

    \[ F_{\rm test} = F_{\nu_1, \nu_2} = \frac{s^{2}_{1}}{s^{2}_{2}} \]

where (for finding the critical statistic), \nu_{1} = n_{1} - 1 and \nu_{2} = n_{2} - 1.

Note that F_{\nu_1, \nu_2} = 1 when s_{1}^{2}=s_{2}^{2}, a fact you can use to get a feel for the meaning of this test statistic.

Values for the various F critical values are given in the F Distribution Table in the Appendix. We will denote a critical value of F with the notation :

    \[F_{\rm crit} = F_{\alpha, \hspace{.1in}\nu_{1}, \hspace{.1in} \nu_2}\]


\alpha = Type I error rate
\nu_{1} = d.f.N.
\nu_{2} = d.f.D.
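If software is available, the table lookup can be done directly; this sketch assumes SciPy and reproduces the table entry F_{0.025, 24, 17} = 2.56 that appears in Example 10.3 below:

```python
from scipy import stats

# Right-tail critical value: F_crit = F_{alpha_T, nu1, nu2} has
# area alpha_T to its right under the F_{nu1, nu2} density.
alpha_T, nu1, nu2 = 0.025, 24, 17
f_crit = stats.f.ppf(1 - alpha_T, nu1, nu2)
print(round(f_crit, 2))
```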

The F Distribution Table gives critical values for small right-tail areas only. This means they cannot be used directly for a left-tailed test. But that does not mean we cannot do a left-tailed test: a left-tailed test is easily converted into a right-tailed test by switching the assignments of populations 1 and 2. To get the assignments correct in the first place, always define populations 1 and 2 so that s^{2}_{1} \geq s^{2}_{2}; that is, assign population 1 so that it has the larger sample variance. Do this even for a two-tailed test, because the table gives us no way to find F_{\rm crit} on the left side of the distribution.
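In code, the assignment rule is a one-line swap; a sketch in plain Python (the variance and sample-size values are placeholders):

```python
# Placeholder sample summaries: (sample variance, sample size) per group.
s2_a, n_a = 10.0, 18
s2_b, n_b = 36.0, 26

# Relabel so that population 1 has the larger sample variance; then
# F_test = s2_1 / s2_2 >= 1 and only right-tail critical values are needed.
(s2_1, n1), (s2_2, n2) = sorted([(s2_a, n_a), (s2_b, n_b)], reverse=True)

F_test = s2_1 / s2_2
print(F_test, n1 - 1, n2 - 1)   # F ratio with d.f.N. and d.f.D.
```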

Example 10.3 : Given the following data for smokers and non-smokers (maybe it’s about some sort of disease occurrence; who cares, let’s focus on dealing with the numbers), test whether the population variances are equal or not at \alpha = 0.05.

Smokers             Nonsmokers
n_{1} = 26          n_{2} = 18
s_{1}^{2} = 36      s_{2}^{2} = 10

Note that s_{1}^{2} > s_{2}^{2} so we’re good to go.

Solution :

1. Hypothesis.

    \[H_0: \sigma^2_1 = \sigma^2_2\]
    \[H_1: \sigma^2_1 \neq \sigma^2_2\]

2. Critical statistic.

Use the F Distribution Table; it is a collection of tables labeled by “\alpha”, which we will designate as \alpha_{T}; the table values signify right-tail areas. Since this is a two-tailed test, we need \alpha_{T} = \alpha/2. Next we need the degrees of freedom:

    \[\mbox{d.f.N.} = \nu_{1} = n_{1} - 1 = 26-1 = 25\]

    \[\mbox{d.f.D.} = \nu_{2} = n_{2} - 1 = 18-1 = 17\]

So the critical statistic is

    \[F_{\rm crit} = F_{\alpha/2, \nu_1, \nu_2} = F_{0.05/2, 25, 17} = F_{0.025, 25, 17} = 2.56.\]

3. Test statistic.

    \[F_{\nu_1, \nu_2} = \frac{s^2_1}{s^2_2}\]

    \[F_{\rm test} = F_{25, 17} = \frac{36}{10} = 3.6\]

With this test statistic, we can estimate the p-value using the F Distribution Table. To find p, look up all the values with d.f.N. = 25 and d.f.D. = 17 (24 \& 17 are the closest in the tables, so use those) across the F Distribution Table and form your own table. For each column in your table, record \alpha_{T} and the F value corresponding to the degrees of freedom of interest. Again, \alpha_{T} corresponds to p/2 for a two-tailed test, so make a row above the \alpha_{T} row with p = 2\alpha_{T}. (For a one-tailed test, we would put p = \alpha_{T}.)

p = 2\alpha_{T}    0.20    0.10    0.05    0.02    0.01
\alpha_{T}         0.10    0.05    0.025   0.01    0.005
F                  1.84    2.19    2.56    3.08    3.51

F_{\rm test} = 3.6 is larger than all of these F values, so p < 0.01.
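The table only brackets p; software gives it exactly. A sketch assuming SciPy:

```python
from scipy import stats

F_test, nu1, nu2 = 3.6, 25, 17
# Two-tailed p-value: twice the right-tail area beyond F_test.
p = 2 * stats.f.sf(F_test, nu1, nu2)
print(p)   # consistent with the table bracket p < 0.01
```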

Notice how we put an upper limit on p because F_{\rm test} was larger than all the F values in our little table.

Let’s take a graphical look at why we use p = 2\alpha_{T} in the little table and \alpha_{T} = \alpha/2 for finding F_{\rm crit} for two-tailed tests :


But in a two-tailed test we want \alpha split on both sides:


4. Decision.

Reject H_{0}. The p-value estimate supports this :

    \[ ( p < 0.01) < (\alpha = 0.05) \]

5. Interpretation.

There is enough evidence to conclude, at \alpha = 0.05 with an F-test, that the variance of the smoker population is different from that of the non-smoker population.
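For reference, the entire five-step test above can be reproduced in a few lines; this sketch assumes SciPy:

```python
from scipy import stats

# Data from Example 10.3.
n1, s2_1 = 26, 36.0   # smokers (assigned the larger sample variance)
n2, s2_2 = 18, 10.0   # non-smokers
alpha = 0.05

nu1, nu2 = n1 - 1, n2 - 1
F_test = s2_1 / s2_2
F_crit = stats.f.ppf(1 - alpha / 2, nu1, nu2)  # alpha/2 in the right tail
p = 2 * stats.f.sf(F_test, nu1, nu2)

print(F_test, round(F_crit, 2), p)
print("reject H0" if F_test > F_crit else "fail to reject H0")
```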