12. ANOVA

12.2 Post hoc Comparisons

If H_{0} is rejected in a one-way ANOVA, you will frequently want to know where the differences in the means are. For example, if we tested H_{0}: \mu_{1} = \mu_{2} = \mu_{3} and rejected H_{0} in a one-way ANOVA, then we will want to know whether \mu_{1} \neq \mu_{2}, \mu_{2} \neq \mu_{3}, etc.

To see which means are different after doing an ANOVA we could just compare all possible pairs of means using t-tests. But such an approach is no good because the assumed type I error rate, \alpha, of each individual t-test would no longer describe the testing procedure as a whole. The overall error rate would be higher because in making such multiple comparisons you incur a greater chance of making at least one type I error.
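The size of the inflation is easy to see: if the m tests were independent, the chance of at least one false rejection would be 1 - (1 - \alpha)^{m}. A minimal Python sketch (the independence assumption is only approximate for pairwise t-tests that share data, so this shows the trend rather than the exact rate):

```python
# Family-wise error rate for m independent tests, each at level alpha.
# Pairwise t-tests on the same dataset are not truly independent, so
# this is an approximation of the inflation, not an exact figure.
alpha = 0.05
for m in (1, 3, 10):
    fwer = 1 - (1 - alpha) ** m
    print(f"{m} tests: chance of at least one type I error = {fwer:.3f}")
```

With just three pairwise tests the effective error rate is already about 0.14, nearly three times the nominal 0.05.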

So we need to correct our test statistic and/or the corresponding \alpha value when we do such multiple comparisons. We will cover two such multiple comparison approaches in detail :

  1. Scheffé test
  2. Tukey test

and we will look at the Bonferroni approach.

Doing multiple comparisons after an ANOVA is known as post hoc testing. It is the traditional approach for comparing several means. The opening “omnibus” ANOVA lets you know if there are any differences at all. If you fail to reject the ANOVA H_{0} then you are done. Only when you reject H_{0} do you put in the effort of comparing means pairwise. This traditional approach, designed to minimize the necessary calculations, is not the only way to compare multiple means. The other approach is to forget about the ANOVA and use t-tests to compare means pairwise or in combinations[1] of means until you use up the N degrees of freedom in the dataset. Here we will stick with the traditional approach.

12.2.1 Scheffé test

The test statistic for the Scheffé test is

    \[ F_{s,{\rm test}} = F_{s} = \frac{(\bar{x}_{i} - \bar{x}_{j})^{2}}{s_{\rm W}^{2} \left( \frac{1}{n_{i}} + \frac{1}{n_{j}} \right)} \]

Note that F_{s} is basically a t^{2} quantity (recall that F_{1,\nu} = t^{2}) but with a pooled estimate s_{p}^{2} of the common population variance \sigma^{2} given by the value of s_{\rm W}^{2} from the ANOVA. In other words F_{s} uses information from all of the data to estimate \sigma^{2} instead of from just groups i and j as a t-test would (see Equation 10.5). Note that the Scheffé test does not require equal group sizes n_{i}.

The critical statistic is a modification of the critical statistic from the ANOVA :

    \[ F^{\prime}_{\rm crit} = F^{\prime} = (k-1) F_{\alpha,\nu_{1}, \nu_{2}} = (k-1) F_{\rm crit, ANOVA} \]

where \nu_{1} and \nu_{2} are the ANOVA degrees of freedom. The critical statistic is the same for all pairwise comparisons regardless of the sample sizes, n_{i} and n_{j}, of the pair of groups being compared.
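As a computational sketch, one pairwise Scheffé comparison can be written as a short Python function; scipy is assumed here only to look up F_{\alpha, \nu_{1}, \nu_{2}} (any F table would do the same job):

```python
from scipy.stats import f

def scheffe(xbar_i, xbar_j, n_i, n_j, s_w2, k, nu2, alpha=0.05):
    """One pairwise Scheffé comparison from ANOVA summary statistics.

    s_w2 is the within-group variance s_W^2 and nu2 its degrees of
    freedom from the omnibus ANOVA; k is the number of groups.
    Returns (F_s, F_crit); reject H0 when F_s > F_crit.
    """
    F_s = (xbar_i - xbar_j) ** 2 / (s_w2 * (1 / n_i + 1 / n_j))
    F_crit = (k - 1) * f.ppf(1 - alpha, k - 1, nu2)
    return F_s, F_crit
```

For instance, scheffe(15.5, 4.0, 6, 6, 45.5, 3, 15) reproduces the first comparison of Example 12.2 below.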

Example 12.2 : The ANOVA of Example 12.1 found that at least one of the three means was different from the others. Use the Scheffé test to find the significant differences between the means. There has to be at least one.

Solution :

0. Data reduction.

Collect the necessary information from the omnibus ANOVA. We’ll need:

    \[ n_{1}=n_{2}=n_{3} = 6 \hspace*{2em} \bar{x}_{1} = 15.5,\: \bar{x}_{2} = 4.0,\: \bar{x}_{3} = 5.83 \]

    \[ s_{\rm W}^{2} = 45.5 \hspace*{2em} F_{\rm crit, ANOVA} = F_{0.05, 2, 15} = 3.68 \]

1. Hypotheses.

There are 3 hypotheses pairs to test :

    \[H_{0}: \mu_{1} = \mu_{2}, \hspace{1cm} H_{0}: \mu_{1} = \mu_{3}, \hspace{1cm} H_{0}: \mu_{2} = \mu_{3}\]

    \[H_{1}: \mu_{1} \neq \mu_{2}, \hspace{1cm} H_{1}: \mu_{1} \neq \mu_{3}, \hspace{1cm} H_{1}: \mu_{2} \neq \mu_{3}\]

2. Critical statistic.

One value for all three hypothesis tests:

    \[F^{\prime}_{\rm crit} = (k-1) F_{0.05, 2, 15} = (3-1)(3.68) = (2)(3.68) = 7.36\]

3. Test statistic.

There are three of them:

\mu_{1} vs. \mu_{2} :

    \[F_{s} = \frac{(\bar{x}_{1} - \bar{x}_{2})^{2}}{s_{\rm W}^{2} \left( \frac{1}{n_{1}} + \frac{1}{n_{2}}  \right)}=\frac{(15.5 - 4.0)^{2}}{45.5 \left( \frac{1}{6} + \frac{1}{6}  \right)} = 8.72\]

\mu_{1} vs. \mu_{3} :

    \[F_{s} = \frac{(\bar{x}_{1} - \bar{x}_{3})^{2}}{s_{\rm W}^{2} \left( \frac{1}{n_{1}} + \frac{1}{n_{3}}  \right)}=\frac{(15.5 - 5.83)^{2}}{45.5 \left( \frac{1}{6} + \frac{1}{6}  \right)} = 6.17\]

\mu_{2} vs. \mu_{3} :

    \[F_{s} = \frac{(\bar{x}_{2} - \bar{x}_{3})^{2}}{s_{\rm W}^{2} \left( \frac{1}{n_{2}} + \frac{1}{n_{3}}  \right)} =\frac{(4.0 - 5.83)^{2}}{45.5 \left( \frac{1}{6} + \frac{1}{6}  \right)} = 0.22\]

4. Decision.

For \mu_{1} vs. \mu_{2}, F_{s} = 8.72 > F^{\prime}_{\rm crit}, so reject H_{0}. For \mu_{1} vs. \mu_{3}, F_{s} = 6.17 < F^{\prime}_{\rm crit}, so do not reject H_{0}. For \mu_{2} vs. \mu_{3}, F_{s} = 0.22 < F^{\prime}_{\rm crit}, so do not reject H_{0}.

5. Interpretation.

The results of the Scheffé test at \alpha = 0.05 lead to the conclusion that only the mean numbers of interchange employees on toll roads 1 and 2 are significantly different.

12.2.2 Tukey Test

The test statistic for the Tukey test is

    \[q = \frac{|\bar{x}_{i} - \bar{x}_{j}|}{\sqrt{s_{\rm W}^{2}/n}}\]

where, again, s_{\rm W}^{2} is from the omnibus ANOVA, \bar{x}_{i} is the mean of group i and we must have equal sample sizes for all groups: n = n_{i} for all i. There is a Tukey test statistic for unequal n, and it is used by SPSS, but we won’t cover that here.

The critical statistic, q_{\mbox{crit}}, comes from a table of critical values from a new distribution called the q distribution. The critical values are tabulated in the Tukey Test Critical Values Table. To use this table, you need two numbers going in :

  1. k = number of groups
  2. \nu = \nu_{2} = degrees of freedom for s_{\rm W}^{2}

Reject H_{0} when q > q_{\mbox{crit}}. In this case we don’t have a picture of the q distribution handy (although it is basically the absolute value of t), so we just use the q > q_{\mbox{crit}} rule similar to how we use the p-value.
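If you have scipy available, the table lookup can be reproduced numerically: the q distribution is implemented there as the “studentized range” distribution (this tooling detail is an assumption about scipy, not part of the tables used in this course):

```python
from scipy.stats import studentized_range

# q_crit for alpha = 0.05 with k = 3 groups and nu = 15 degrees of
# freedom (the values used in the next example); this should match the
# tabulated value of about 3.67.
q_crit = studentized_range.ppf(1 - 0.05, 3, 15)
print(round(q_crit, 2))
```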

Example 12.3 : Repeat Example 12.2 using the Tukey test instead of the Scheffé test.

Solution :

0. Data reduction.

We use the same data from the omnibus ANOVA :

    \[ n_{1}=n_{2}=n_{3} = 6 \hspace*{2em} \bar{x}_{1} = 15.5,\: \bar{x}_{2} = 4.0,\: \bar{x}_{3} = 5.83 \]

    \[ s_{\rm W}^{2} = 45.5 \hspace*{2em} F_{\rm crit, ANOVA} = F_{0.05, 2, 15} = 3.68 \]

1. Hypotheses.

The 3 hypotheses pairs to test are the same :

    \[H_{0}: \mu_{1} = \mu_{2}, \hspace{1cm} H_{0}: \mu_{1} = \mu_{3}, \hspace{1cm} H_{0}: \mu_{2} = \mu_{3}\]

    \[H_{1}: \mu_{1} \neq \mu_{2}, \hspace{1cm} H_{1}: \mu_{1} \neq \mu_{3}, \hspace{1cm} H_{1}: \mu_{2} \neq \mu_{3}\]

2. Critical statistic.

Use the Tukey Test Critical Values Table. Go into the table with

  • Number of groups = k = 3.
  • \nu = \nu_{2} = N - k = nk - k = (6)(3) - 3 = 18 - 3 = 15.

and \alpha = 0.05 to find

    \[ q_{\mbox{crit}} = 3.67 \]

3. Test statistic.

Again, there are three of them :

\mu_{1} vs. \mu_{2} :

    \[ q = \frac{| \overline{x}_{1} - \overline{x}_{2} |}{\sqrt{s_{W}^{2}/n}} = \frac{| 15.5 - 4.0 |}{\sqrt{45.5/6}} = 4.18 \]

\mu_{1} vs. \mu_{3}:

    \[ q = \frac{| \overline{x}_{1} - \overline{x}_{3} |}{\sqrt{s_{W}^{2}/n}} = \frac{| 15.5 - 5.83 |}{\sqrt{45.5/6}} = 3.51 \]

\mu_{2} vs. \mu_{3}:

    \[ q = \frac{| \overline{x}_{2} - \overline{x}_{3} |}{\sqrt{s_{W}^{2}/n}} = \frac{| 4.0 - 5.83 |}{\sqrt{45.5/6}} = 0.66 \]

4. Decision.

Reject H_{0} when q > q_{\mbox{crit}}. This only happens for one hypothesis pair : For \mu_{1} vs. \mu_{2}, reject H_{0}. For \mu_{1} vs. \mu_{3}, do not reject H_{0}. For \mu_{2} vs. \mu_{3}, do not reject H_{0}.

5. Interpretation.

The results of the Tukey test at \alpha = 0.05 lead to the conclusion that only the mean numbers of interchange employees on toll roads 1 and 2 are significantly different. (Same result as the Scheffé test. Usually the two tests agree, but when they don’t, you need to use some kind of non-mathematical judgement.)
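The three q statistics above can be rechecked from the summary numbers alone; this is a plain-Python sketch, with q_{\mbox{crit}} taken from the table rather than computed:

```python
import math

s_w2, n = 45.5, 6          # within-group variance and common group size
means = {1: 15.5, 2: 4.0, 3: 5.83}
q_crit = 3.67              # from the Tukey critical values table
se = math.sqrt(s_w2 / n)   # common denominator sqrt(s_W^2 / n)
for i, j in [(1, 2), (1, 3), (2, 3)]:
    q = abs(means[i] - means[j]) / se
    print(f"mu_{i} vs. mu_{j}: q = {q:.2f}, reject H0: {q > q_crit}")
```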

12.2.3 Bonferroni correction

A more conservative (less power) approach to multiple comparisons (post hoc testing) is to use Bonferroni’s method. The fundamental idea of the Bonferroni correction is to add the probabilities of making individual type I errors to get an overall type I error rate.

Implementing the idea is simple. Do a bunch of t-tests and multiply each p-value by a correction factor C. There are a number of ways to choose C (you will have to dig to find out which method SPSS uses). The easiest (and most conservative) is to set C equal to the number of pairwise comparisons done. So if you have k groups then C is given by the binomial coefficient:

    \[ C = \left( \begin{array}{c} k \\ 2 \end{array} \right). \]

Another way is to look at the total degrees of freedom, \nu_{\mbox{pairs}}, associated with the pairwise t-tests and compare it to the total degrees of freedom in the data, \nu = N (or one could argue \nu = N-1), to come up with

    \[ C = \frac{\nu_{\mbox{pairs}} + \nu}{\nu}. \]

Since there is some ambiguity as to what we should use for C, we will not do Bonferroni post hoc testing by hand. However, be able to recognize Bonferroni results in SPSS, treating the value of C as an SPSS black-box parameter.
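As an illustration of the simplest choice of C (and not of SPSS’s internal method), here is a Python sketch; the three small data groups are made up for the example:

```python
from math import comb
from scipy.stats import ttest_ind

# Hypothetical data: three groups of four observations each.
groups = [[12, 18, 15, 16], [3, 5, 4, 6], [5, 7, 6, 8]]
k = len(groups)
C = comb(k, 2)  # number of pairwise comparisons: C(3, 2) = 3
for i in range(k):
    for j in range(i + 1, k):
        p = ttest_ind(groups[i], groups[j]).pvalue
        p_corr = min(1.0, C * p)  # Bonferroni-corrected p-value, capped at 1
        print(f"groups {i + 1} vs. {j + 1}: corrected p = {p_corr:.4f}")
```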


  1. Combinations of means may be compared using "contrasts". For example \mu_{1} + \mu_{2} might be compared with 2\mu_{3}. Contrasts are not covered in Psy 234.