15. Chi Squared: Goodness of Fit and Contingency Tables

15.1 Goodness of Fit

For both the \chi^{2} goodness of fit and the \chi^{2} contingency table tests, the test statistic is

    \[\chi^{2} = \sum_{i=1}^{C} \frac{(O_{i} - E_{i})^{2}}{E_{i}}\]

where
O_{i} = Observed frequency of category i (the measurement)
E_{i} = Expected frequency of category i (H_{0}).
C = number of categories.
For the goodness of fit test, the degrees of freedom for the critical statistic are \nu = C-1.

Limitation : Because the noise in the frequencies has a binomial distribution, the \chi^{2} test of frequencies is considered reliable only when all frequencies (O and E) are \geq 5.
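As a minimal sketch of the formula and the validity rule above, the following Python function computes \chi^{2} and refuses to run when any frequency is below 5. The function name chi_square_statistic, the min_count parameter and the die-roll counts are hypothetical, chosen only for illustration; they are not part of the course material.

```python
def chi_square_statistic(observed, expected, min_count=5):
    """Return the sum over categories of (O - E)^2 / E.

    Raises ValueError if the lists differ in length or if any
    observed or expected frequency is below min_count.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    too_small = [
        i + 1
        for i, (o, e) in enumerate(zip(observed, expected))
        if o < min_count or e < min_count
    ]
    if too_small:
        raise ValueError(f"categories {too_small} have a frequency below {min_count}")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, with 10 rolls expected per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

chi2 = chi_square_statistic(observed, expected)
dof = len(observed) - 1  # nu = C - 1 for the goodness of fit test
print(round(chi2, 3), dof)
```

The critical value is still read from the Chi-Square Distribution Table using the degrees of freedom computed on the last lines.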

Example 15.1 (Goodness of Fit example)

The advisor of an ecology club believes that the club consists of 10\% freshmen, 20\% sophomores, 40\% juniors and 30\% seniors. The actual membership this year consisted of 14 freshmen, 19 sophomores, 51 juniors and 16 seniors. At \alpha = 0.10 test the advisor’s conjecture.

Solution :

0. Data reduction. Compute the observed and expected frequencies. In this example the total number of students is 14 + 19 + 51 + 16 = 100 so if we label the categories as :

category 1 = freshmen
category 2 = sophomores
category 3 = juniors
category 4 = seniors
then E_{1} = 10, E_{2} = 20, E_{3} = 40, E_{4} = 30 (converting percentages to frequencies)
and O_{1} = 14, O_{2} = 19, O_{3} = 51, O_{4} = 16.

1. Hypotheses.

    \begin{eqnarray*} H_{0} &:& E_{1} = 10,\;\; E_{2} = 20, \;\; E_{3} = 40, \;\; E_{4} = 30 \\ H_{1} &:& E_{1},\;\; E_{2}, \;\; E_{3} \mbox{\ and } E_{4} \mbox{\ are not distributed the same as in } H_{0} \end{eqnarray*}

2. Critical statistic. Using the Chi-Square Distribution Table with \alpha = 0.10 and \nu = 4-1 = 3 (note that we only consider the right tail, as with F test statistics in ANOVA), we find

    \[ \chi^{2}_{\mbox{crit}} = 6.251 \]

3. Test statistic.

    \begin{eqnarray*} \chi^{2} & = & \sum_{i=1}^{4} \frac{(O_{i} - E_{i})^{2}}{E_{i}} \\ & = & \frac{(14 - 10)^{2}}{10} + \frac{(19 - 20)^{2}}{20} + \frac{(51 - 40)^{2}}{40} + \frac{(16 - 30)^{2}}{30} \\ & = & 11.208 \end{eqnarray*}

4. Decision.

Reject H_{0}, since \chi^{2} = 11.208 > \chi^{2}_{\mbox{crit}} = 6.251.

5. Interpretation. At \alpha = 0.10 there is enough evidence to reject the advisor’s conjecture. A plot of observed and expected frequencies (which we will plot as overlapping frequency polygons) shows how the observed frequencies are not a good fit to the expected frequencies :

Here the fit of the data to the H_{0} profile is not very good. If the fit between the observed frequencies (data) profile and the expected frequencies (H_{0}) profile is good, then \chi^{2} will be small.
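The arithmetic of Example 15.1 can be reproduced with a short Python sketch. The variable names are our own, and the critical value is simply copied from the Chi-Square Distribution Table rather than computed.

```python
# Example 15.1: ecology club membership.
labels = ["freshmen", "sophomores", "juniors", "seniors"]
observed = [14, 19, 51, 16]
percent = [10, 20, 40, 30]  # the advisor's conjecture (H0)

n = sum(observed)
expected = [n * p / 100 for p in percent]  # convert percentages to frequencies

# Contribution of each category to the test statistic.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(terms)

chi2_crit = 6.251  # table value for alpha = 0.10, nu = 4 - 1 = 3
reject = chi2 > chi2_crit

for label, term in zip(labels, terms):
    print(f"{label:10s} {term:.3f}")
print(f"chi2 = {chi2:.3f}, reject H0: {reject}")
```

Listing the individual terms shows which categories drive the result: here the seniors and juniors contribute most of the total.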

15.1.1 Test of Normality using the \chi^{2} goodness of fit test

To test the hypotheses :

H_{0} : The DV is normally distributed
H_{1} : The DV is not normally distributed

using the goodness of fit \chi^{2} test[1] we first need to define the number of categories to use. The choice of how many categories to use is a bit of an art[2]. To work our way through the example below, we’ll take the category definition as a given. Then we’ll find that we’ll have to change that definition in order to have a valid \chi^{2} test. This is how things will usually go in real life. The procedure for testing normality with a goodness of fit test is illustrated by example :

Example 15.2 : Suppose we have a dataset of 200 values of some measured DV. That is, suppose we have a sample of size n=200 from a single population. Suppose further that 90 \leq DV \leq 179. That is H=179, L=90 and the range is R = 89. Let us (arbitrarily) divide the range into G=6 categories. Then (recall Chapter 2) the class width is

    \[ W = \frac{R+1}{G} = \frac{90}{6} = 15. \]
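The class width and the resulting category boundaries can be checked with a few lines of Python. This is only a sketch: it assumes the boundaries start half a unit below L, which matches the boundaries in the frequency table of this example, and the variable names are our own.

```python
low, high, groups = 90, 179, 6  # L, H and G from the example

data_range = high - low                 # R = 89
width = (data_range + 1) / groups       # W = (R + 1) / G

# Assumption: the first boundary sits half a unit below L.
start = low - 0.5
boundaries = [(start + i * width, start + (i + 1) * width) for i in range(groups)]
midpoints = [(lo + hi) / 2 for lo, hi in boundaries]

print(width)
for (lo, hi), mid in zip(boundaries, midpoints):
    print(lo, hi, mid)
```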

Suppose, finally, that the frequency table for the data is :

Cat.  Cat. Boundaries  Freq f  Center X_{m}  fX_{m}  fX^{2}_{m}
1     89.5 to 104.5    24      97            2328    225,816
2     104.5 to 119.5   62      112           6944    777,728
3     119.5 to 134.5   72      127           9144    1,161,288
4     134.5 to 149.5   26      142           3692    524,264
5     149.5 to 164.5   12      157           1884    295,788
6     164.5 to 179.5   4       172           688     118,336
Sums: \sum f = n = 200, \sum fX_{m} = 24680, \sum fX^{2}_{m} = 3,103,220

At this point it will be useful for you to do a short exercise : Plot a histogram of this frequency table. If the data are normally distributed then the histogram will look approximately like a normal curve. The \chi^{2} goodness of fit test that we will do quantifies this eyeball test.

Next, compute \overline{x} and s using the sums from the table. Recall the group formulae :

    \[ \overline{x} = \frac{\sum f X_{m}}{n} = \frac{24680}{200} = 123.4 \]

and

    \begin{eqnarray*} s & = & \sqrt{\frac{\sum f X_{m}^{2} - \frac{(\sum fX_{m})^{2}}{n}}{n-1}} \\ & = & \sqrt{\frac{3,103,220 - \frac{(24680)^{2}}{200}}{199}} \\ & = & \sqrt{290} = 17.03 \end{eqnarray*}
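The grouped-data formulae above can be checked with the following Python sketch, which recomputes the three sums from the frequencies and class centers and then evaluates \overline{x} and s. The variable names are our own.

```python
import math

freqs = [24, 62, 72, 26, 12, 4]            # f for each category
centers = [97, 112, 127, 142, 157, 172]    # X_m for each category

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, centers))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, centers))

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
s = math.sqrt(variance)

print(n, sum_fx, sum_fx2)
print(round(mean, 1), round(s, 2))
```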

Now we are mostly ready to go through the \chi^{2} goodness of fit hypothesis test :

0. Data reduction.

The frequency table for our data gives the observed frequencies. Now we need to compute the expected frequencies by considering areas under the normal distribution that has the same mean, \overline{x} = 123.4, and standard deviation, s = 17.03, as our data. We’ll get those areas from the Standard Normal Distribution Table after z-transforming the category boundaries using the usual z = (x - \overline{x})/s. Once we have the area, A_{i}, for each category i, we convert it to the expected frequency using E_{i} = nA_{i}. These calculations are completed in the following table. Notice that we used -\infty and +\infty in place of the z-transforms of the lowest and highest category boundaries so that the very small areas in the tails of the z distribution are included. In the last column, the O_{i} = f_{i} are copied from the data frequency table.

Class  Class Boundaries  z-transformed      Standard Normal Distribution Table Areas  E_{i}  O_{i}
1      89.5 to 104.5     -\infty to -1.11   A_{1} = 0.1335                            26.7   24
2      104.5 to 119.5    -1.11 to -0.23     A_{2} = 0.2755                            55.1   62
3      119.5 to 134.5    -0.23 to 0.65      A_{3} = 0.3332                            66.64  72
4      134.5 to 149.5    0.65 to 1.53       A_{4} = 0.1948                            38.96  26
5      149.5 to 164.5    1.53 to 2.41       A_{5} = 0.0550                            11.0   12
6      164.5 to 179.5    2.41 to +\infty    A_{6} = 0.0080                            1.6    4
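The expected frequencies in this table can be reproduced with Python's standard library. The sketch below uses statistics.NormalDist in place of the printed Standard Normal Distribution Table; to mimic a table lookup it rounds each z to two decimals and each cumulative area to four decimals, which is our own assumption about how the table values were obtained.

```python
from statistics import NormalDist

mean, s, n = 123.4, 17.03, 200
inner_boundaries = [104.5, 119.5, 134.5, 149.5, 164.5]

std_normal = NormalDist()  # mean 0, standard deviation 1

# z-transform the boundaries, rounded as for a table lookup.
z = [round((b - mean) / s, 2) for b in inner_boundaries]

# Cumulative areas to the left of each boundary; the outer
# boundaries are replaced by -infinity (0.0) and +infinity (1.0).
cumulative = [0.0] + [round(std_normal.cdf(zi), 4) for zi in z] + [1.0]

# Area of each class and the corresponding expected frequency E_i = n * A_i.
areas = [round(hi - lo, 4) for lo, hi in zip(cumulative, cumulative[1:])]
expected = [round(n * a, 2) for a in areas]

print(z)
print(areas)
print(expected)
```

Without the rounding steps the expected frequencies differ slightly from the table, because the printed table carries only four decimal places.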

The areas on the z distribution look like :

Recall that the goodness of fit \chi^{2} test is valid only if all the frequencies are \geq 5. The frequencies of class 6 are too low. As a quick fix, we’ll combine classes 5 and 6 into a new class 5. The class width of this new class will be twice that of the other classes but we can live with that. So, finally, the observed and expected frequencies that we’ll use for the hypothesis test are :

Class i E_{i} O_{i}
1 26.7 24
2 55.1 62
3 66.64 72
4 38.96 26
5 12.6 16
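The quick fix of combining classes can be written as a small helper. This is a sketch under our own assumptions: the function name merge_small_tail is hypothetical, and it only merges classes at the upper end of the table, which is all this example needs.

```python
def merge_small_tail(expected, observed, min_count=5):
    """Merge the last class into its neighbour while either of its
    frequencies (expected or observed) is below min_count."""
    expected, observed = list(expected), list(observed)
    while len(expected) > 1 and (expected[-1] < min_count or observed[-1] < min_count):
        e_last, o_last = expected.pop(), observed.pop()
        expected[-1] += e_last
        observed[-1] += o_last
    return expected, observed


expected = [26.7, 55.1, 66.64, 38.96, 11.0, 1.6]
observed = [24, 62, 72, 26, 12, 4]

merged_expected, merged_observed = merge_small_tail(expected, observed)
print([round(e, 2) for e in merged_expected])
print(merged_observed)
```

A low-frequency class at the lower end, or in the middle of the table, would need a similar merge with its neighbour.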

1. Hypotheses.

H_{0} : The population is normally distributed.
H_{1} : The population is not normally distributed.

2. Critical statistic.

From the Chi-Square Distribution Table with \alpha = 0.05 and \nu = C - 1 = 5 - 1 = 4 we find

    \[ \chi^{2}_{\mbox{crit}} = 9.488 \]

3. Test statistic.

    \begin{eqnarray*} \chi^{2}_{\mbox{test}} & = & \frac{(24 - 26.7)^{2}}{26.7} + \frac{(62 - 55.1)^{2}}{55.1} + \frac{(72 - 66.64)^{2}}{66.64} + \frac{(26 - 38.96)^{2}}{38.96} + \frac{(16 - 12.6)^{2}}{12.6} \\ & = & 6.797 \end{eqnarray*}

4. Decision.

Do not reject H_{0}, since \chi^{2}_{\mbox{test}} = 6.797 < \chi^{2}_{\mbox{crit}} = 9.488.

5. Interpretation. The population appears to be normally distributed.
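The test statistic and decision of Example 15.2 can be verified with the short Python sketch below. It uses the five combined classes, follows the \nu = C - 1 rule used in this example, and copies the critical value from the Chi-Square Distribution Table; the variable names are our own.

```python
observed = [24, 62, 72, 26, 16]
expected = [26.7, 55.1, 66.64, 38.96, 12.6]

terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2_test = sum(terms)

dof = len(observed) - 1   # nu = C - 1 = 4
chi2_crit = 9.488         # table value for alpha = 0.05, nu = 4
reject = chi2_test > chi2_crit

print([round(t, 3) for t in terms])
print(f"chi2 = {chi2_test:.3f}, nu = {dof}, reject H0: {reject}")
```

Most of the statistic comes from class 4, where 26 values were observed against 38.96 expected.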

 


  1. This is a test of the assumptions that might underlie a test of interest. This test, like most hypothesis tests applied to test assumptions, will find the desired assumption to be true when you fail to reject H_{0}. There are other tests for normality that we don't cover in this course. One of the more popular tests for normality is the Kolmogorov-Smirnov test for comparing distributions.
  2. The number of categories to use when making a histogram is, in general, a wide open question.