12. ANOVA

12.2 Post hoc Comparisons

If H_{0} is rejected in a one-way ANOVA, you will frequently want to know where the differences in the means are. For example, if we tested H_{0}: \mu_{1} = \mu_{2} = \mu_{3} and rejected H_{0} in a one-way ANOVA, then we will want to know whether \mu_{1} \neq \mu_{2}, \mu_{2} \neq \mu_{3}, etc.

To see which means are different after doing an ANOVA we could just compare all possible pairs of means using t-tests. But such an approach is no good because the assumed type I error rate, \alpha, of each individual t-test would no longer describe the testing procedure as a whole. The overall error rate would be higher because in making such multiple comparisons you incur a greater chance of making at least one type I error.
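The size of the inflation is easy to see: if the m tests were independent, the chance of at least one false rejection would be 1 - (1 - \alpha)^{m}. A minimal Python sketch (the independence assumption is only approximate for pairwise t-tests that share data, so this shows the trend rather than the exact rate):

```python
# Family-wise error rate for m independent tests, each at level alpha.
# Pairwise t-tests on the same dataset are not truly independent, so
# this is an approximation of the inflation, not an exact figure.
alpha = 0.05
for m in (1, 3, 10):
    fwer = 1 - (1 - alpha) ** m
    print(f"{m} tests: chance of at least one type I error = {fwer:.3f}")
```

With just three pairwise tests the effective error rate is already about 0.14, nearly three times the nominal 0.05.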

So we need to correct our test statistic and/or the corresponding \alpha value when we do such multiple comparisons. We will cover two such multiple comparison approaches in detail :

  1. Scheffé test
  2. Tukey test

and we will look at the Bonferroni approach.

Doing multiple comparisons after an ANOVA is known as post hoc testing. It is the traditional approach for comparing several means. The opening “omnibus” ANOVA lets you know if there are any differences at all. If you fail to reject the ANOVA H_{0} then you are done. Only when you reject H_{0} do you put in the effort of comparing means pairwise. This traditional approach, designed to minimize the necessary calculations, is not the only way to compare multiple means. The other approach is to forget about the ANOVA and use t-tests to compare means pairwise or in combinations[1] of means until you use up the N degrees of freedom in the dataset. Here we will stick with the traditional approach.

12.2.1 Scheffé test

The test statistic for the Scheffé test is

    \[ F_{s,{\rm test}} = F_{s} = \frac{(\bar{x}_{i} - \bar{x}_{j})^{2}}{s_{\rm W}^{2} \left( \frac{1}{n_{i}} + \frac{1}{n_{j}} \right)} \]

Note that F_{s} is basically a t^{2} quantity (recall that F_{1,\nu} = t^{2}) but with a pooled estimate s_{p}^{2} of the common population variance \sigma^{2} given by the value of s_{\rm W}^{2} from the ANOVA. In other words F_{s} uses information from all of the data to estimate \sigma^{2} instead of from just groups i and j as a t-test would (see Equation 10.5). Note that the Scheffé test does not require equal group sizes n_{i}.

The critical statistic is a modification of the critical statistic from the ANOVA :

    \[ F^{\prime}_{\rm crit} = F^{\prime} = (k-1) F_{\alpha,\nu_{1}, \nu_{2}} = (k-1) F_{\rm crit, ANOVA} \]

where \nu_{1} and \nu_{2} are the ANOVA degrees of freedom. The critical statistic is the same for all pairwise comparisons regardless of the sample sizes, n_{i} and n_{j}, of the pair of groups being compared.
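As a computational sketch, one pairwise Scheffé comparison can be written as a short Python function; scipy is assumed here only to look up F_{\alpha, \nu_{1}, \nu_{2}} (any F table would do the same job):

```python
from scipy.stats import f

def scheffe(xbar_i, xbar_j, n_i, n_j, s_w2, k, nu2, alpha=0.05):
    """One pairwise Scheffé comparison from ANOVA summary statistics.

    s_w2 is the within-group variance s_W^2 and nu2 its degrees of
    freedom from the omnibus ANOVA; k is the number of groups.
    Returns (F_s, F_crit); reject H0 when F_s > F_crit.
    """
    F_s = (xbar_i - xbar_j) ** 2 / (s_w2 * (1 / n_i + 1 / n_j))
    F_crit = (k - 1) * f.ppf(1 - alpha, k - 1, nu2)
    return F_s, F_crit
```

For instance, scheffe(15.5, 4.0, 6, 6, 45.5, 3, 15) reproduces the first comparison of Example 12.2 below.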

Example 12.2 : The ANOVA of Example 12.1 found that at least one of the three means was different from the others. Use the Scheffé test to find the significant differences between the means. There has to be at least one.

Solution :

0. Data reduction.

Collect the necessary information from the omnibus ANOVA. We’ll need:

    \[ n_{1}=n_{2}=n_{3} = 6 \hspace*{2em} \bar{x}_{1} = 15.5,\: \bar{x}_{2} = 4.0,\: \bar{x}_{3} = 5.83 \]

    \[ s_{\rm W}^{2} = 45.5 \hspace*{2em} F_{\rm crit, ANOVA} = F_{0.05, 2, 15} = 3.68 \]

1. Hypotheses.

There are 3 hypotheses pairs to test :

    \[H_{0}: \mu_{1} = \mu_{2}, \hspace{1cm} H_{0}: \mu_{1} = \mu_{3}, \hspace{1cm} H_{0}: \mu_{2} = \mu_{3}\]

    \[H_{1}: \mu_{1} \neq \mu_{2}, \hspace{1cm} H_{1}: \mu_{1} \neq \mu_{3}, \hspace{1cm} H_{1}: \mu_{2} \neq \mu_{3}\]

2. Critical statistic.

One value for all three hypothesis tests:

    \[F^{\prime}_{\rm crit} = (k-1) F_{0.05, 2, 15} = (3-1)(3.68) = (2)(3.68) = 7.36\]

3. Test statistic.

There are three of them:

\mu_{1} vs. \mu_{2} :

    \[F_{s} = \frac{(\bar{x}_{1} - \bar{x}_{2})^{2}}{s_{\rm W}^{2} \left( \frac{1}{n_{1}} + \frac{1}{n_{2}}  \right)}=\frac{(15.5 - 4.0)^{2}}{45.5 \left( \frac{1}{6} + \frac{1}{6}  \right)} = 8.72\]

\mu_{1} vs. \mu_{3} :

    \[F_{s} = \frac{(\bar{x}_{1} - \bar{x}_{3})^{2}}{s_{\rm W}^{2} \left( \frac{1}{n_{1}} + \frac{1}{n_{3}}  \right)}=\frac{(15.5 - 5.83)^{2}}{45.5 \left( \frac{1}{6} + \frac{1}{6}  \right)} = 6.17\]

\mu_{2} vs. \mu_{3} :

    \[F_{s} = \frac{(\bar{x}_{2} - \bar{x}_{3})^{2}}{s_{\rm W}^{2} \left( \frac{1}{n_{2}} + \frac{1}{n_{3}}  \right)} =\frac{(4.0 - 5.83)^{2}}{45.5 \left( \frac{1}{6} + \frac{1}{6}  \right)} = 0.22\]

4. Decision.

For \mu_{1} vs. \mu_{2}, F_{s} = 8.72 > F^{\prime}_{\rm crit}, so reject H_{0}. For \mu_{1} vs. \mu_{3}, F_{s} = 6.17 < F^{\prime}_{\rm crit}, so do not reject H_{0}. For \mu_{2} vs. \mu_{3}, F_{s} = 0.22 < F^{\prime}_{\rm crit}, so do not reject H_{0}.

5. Interpretation.

The results of the Scheffé test at \alpha = 0.05 lead to the conclusion that only the mean numbers of interchange employees on toll roads 1 and 2 are significantly different.

12.2.2 Tukey Test

The test statistic for the Tukey test is

    \[q = \frac{|\bar{x}_{i} - \bar{x}_{j}|}{\sqrt{s_{\rm W}^{2}/n}}\]

where, again, s_{\rm W}^{2} is from the omnibus ANOVA, \bar{x}_{i} is the mean of group i and we must have equal sample sizes for all groups: n = n_{i} for all i. There is a Tukey test statistic for unequal n, and it is used by SPSS, but we won’t cover that here.

The critical statistic, q_{\mbox{crit}}, comes from a table of critical values from a new distribution called the q distribution. The critical values are tabulated in the Tukey Test Critical Values Table. To use this table, you need two numbers going in :

  1. k = number of groups
  2. \nu = \nu_{2} = degrees of freedom for s_{\rm W}^{2}

Reject H_{0} when q > q_{\mbox{crit}}. In this case we don’t have a picture of the q distribution handy (although it is basically the absolute value of t), so we just use the q > q_{\mbox{crit}} rule similar to how we use the p-value.
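If you have scipy available, the table lookup can be reproduced numerically: the q distribution is implemented there as the “studentized range” distribution (this tooling detail is an assumption about scipy, not part of the tables used in this course):

```python
from scipy.stats import studentized_range

# q_crit for alpha = 0.05 with k = 3 groups and nu = 15 degrees of
# freedom (the values used in the next example); this should match the
# tabulated value of about 3.67.
q_crit = studentized_range.ppf(1 - 0.05, 3, 15)
print(round(q_crit, 2))
```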

Example 12.3 : Repeat Example 12.2 using the Tukey test instead of the Scheffé test.

Solution :

0. Data reduction.

We use the same data from the omnibus ANOVA :

    \[ n_{1}=n_{2}=n_{3} = 6 \hspace*{2em} \bar{x}_{1} = 15.5,\: \bar{x}_{2} = 4.0,\: \bar{x}_{3} = 5.83 \]

    \[ s_{\rm W}^{2} = 45.5 \hspace*{2em} F_{\rm crit, ANOVA} = F_{0.05, 2, 15} = 3.68 \]

1. Hypotheses.

The 3 hypotheses pairs to test are the same :

    \[H_{0}: \mu_{1} = \mu_{2}, \hspace{1cm} H_{0}: \mu_{1} = \mu_{3}, \hspace{1cm} H_{0}: \mu_{2} = \mu_{3}\]

    \[H_{1}: \mu_{1} \neq \mu_{2}, \hspace{1cm} H_{1}: \mu_{1} \neq \mu_{3}, \hspace{1cm} H_{1}: \mu_{2} \neq \mu_{3}\]

2. Critical statistic.

Use the Tukey Test Critical Values Table. Go into the table with

  • Number of groups = k = 3.
  • \nu = \nu_{2} = N - k = nk - k = (6)(3) - 3 = 18 - 3 = 15.

and \alpha = 0.05 to find

    \[ q_{\mbox{crit}} = 3.67 \]

3. Test statistic.

Again, there are three of them :

\mu_{1} vs. \mu_{2} :

    \[ q = \frac{| \overline{x}_{1} - \overline{x}_{2} |}{\sqrt{s_{W}^{2}/n}} = \frac{| 15.5 - 4.0 |}{\sqrt{45.5/6}} = 4.18 \]

\mu_{1} vs. \mu_{3}:

    \[ q = \frac{| \overline{x}_{1} - \overline{x}_{3} |}{\sqrt{s_{W}^{2}/n}} = \frac{| 15.5 - 5.83 |}{\sqrt{45.5/6}} = 3.51 \]

\mu_{2} vs. \mu_{3}:

    \[ q = \frac{| \overline{x}_{2} - \overline{x}_{3} |}{\sqrt{s_{W}^{2}/n}} = \frac{| 4.0 - 5.83 |}{\sqrt{45.5/6}} = 0.66 \]

4. Decision.

Reject H_{0} when q > q_{\mbox{crit}}. This only happens for one hypothesis pair : For \mu_{1} vs. \mu_{2}, reject H_{0}. For \mu_{1} vs. \mu_{3}, do not reject H_{0}. For \mu_{2} vs. \mu_{3}, do not reject H_{0}.

5. Interpretation.

The results of the Tukey test at \alpha = 0.05 lead to the conclusion that only the mean numbers of interchange employees on toll roads 1 and 2 are significantly different. (Same result as the Scheffé test. Usually the two tests agree, but when they don’t, you need to use some kind of non-mathematical judgement.)
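The three q statistics above can be rechecked from the summary numbers alone; this is a plain-Python sketch, with q_{\mbox{crit}} taken from the table rather than computed:

```python
import math

s_w2, n = 45.5, 6          # within-group variance and common group size
means = {1: 15.5, 2: 4.0, 3: 5.83}
q_crit = 3.67              # from the Tukey critical values table
se = math.sqrt(s_w2 / n)   # common denominator sqrt(s_W^2 / n)
for i, j in [(1, 2), (1, 3), (2, 3)]:
    q = abs(means[i] - means[j]) / se
    print(f"mu_{i} vs. mu_{j}: q = {q:.2f}, reject H0: {q > q_crit}")
```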

12.2.3 Bonferroni correction

A more conservative (less power) approach to multiple comparisons (post hoc testing) is to use Bonferroni’s method. The fundamental idea of the Bonferroni correction is to add the probabilities of making individual type I errors to get an overall type I error rate.

Implementing the idea is simple. Do a bunch of t-tests and multiply each p-value by a correction factor C. There are a number of ways to choose C (you will have to dig to find out which method SPSS uses). The easiest (and most conservative) is to set C equal to the number of pairwise comparisons done. So if you have k groups then C is given by the binomial coefficient:

    \[ C = \left( \begin{array}{c} k \\ 2 \end{array} \right). \]

Another way is to look at the total degrees of freedom, \nu_{\mbox{pairs}}, associated with the pairwise t-tests and compare it to the total degrees of freedom in the data, \nu = N (or one could argue \nu = N-1), to come up with

    \[ C = \frac{\nu_{\mbox{pairs}} + \nu}{\nu}. \]

Since there is some ambiguity as to what we should use for C, we will not do Bonferroni post hoc testing by hand. However, be able to recognize Bonferroni results in SPSS, treating the value of C as an SPSS black-box parameter.
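As an illustration of the simplest choice of C (and not of SPSS’s internal method), here is a Python sketch; the three small data groups are made up for the example:

```python
from math import comb
from scipy.stats import ttest_ind

# Hypothetical data: three groups of four observations each.
groups = [[12, 18, 15, 16], [3, 5, 4, 6], [5, 7, 6, 8]]
k = len(groups)
C = comb(k, 2)  # number of pairwise comparisons: C(3, 2) = 3
for i in range(k):
    for j in range(i + 1, k):
        p = ttest_ind(groups[i], groups[j]).pvalue
        p_corr = min(1.0, C * p)  # Bonferroni-corrected p-value, capped at 1
        print(f"groups {i + 1} vs. {j + 1}: corrected p = {p_corr:.4f}")
```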


  1. Combinations of means may be compared using "contrasts". For example \mu_{1} + \mu_{2} might be compared with 2\mu_{3}. Contrasts are not covered in Psy 234.