10. Comparing Two Population Means

10.6 SPSS Lesson 6: Independent Sample t-Test

To follow along, load in the Data Set titled “pHLevel.sav”:

SPSS screenshot © International Business Machines Corporation.

This is the first time we have an independent variable, Species in this case, and it has two values, setosa and versicolor, that label the two populations. Notice, especially, that we do not have separate columns for each sample. There is only one dependent variable, Sepal.Length in this case. As we cover the more advanced statistical tests in later chapters (part of PSY 234) the nature and complexity of the independent variable will evolve but we will always have just one independent variable.
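This long (stacked) data layout, one dependent-variable column plus one grouping column, can be sketched outside SPSS as well. The following is a minimal illustration using pandas with made-up numbers (the values and group sizes are hypothetical, not the contents of the data set loaded above):

```python
import pandas as pd

# Hypothetical long-format data: ONE dependent column (Sepal.Length)
# and ONE grouping column (Species), the layout SPSS expects.
df = pd.DataFrame({
    "Species": ["setosa"] * 3 + ["versicolor"] * 3,   # independent variable
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9],   # dependent variable
})

# Each population is a subset of rows, not a separate column:
setosa = df.loc[df["Species"] == "setosa", "Sepal.Length"]
versicolor = df.loc[df["Species"] == "versicolor", "Sepal.Length"]
print(setosa.mean(), versicolor.mean())
```

The point is that the two samples are distinguished by the values of the grouping variable, not by living in separate columns.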

Running the t-test is easy: pick Analyze \rightarrow Compare Means \rightarrow Independent Samples T Test:

SPSS screenshot © International Business Machines Corporation.

Select Sepal.Length as the Test Variable (dependent variable) and Species as the Grouping Variable (independent variable):

SPSS screenshot © International Business Machines Corporation.

You need to do some work to let SPSS know that the two levels of the “grouping variable” are 1 and 2 (as can be seen in the Variable View window). So hit Define Groups… and enter:

SPSS screenshot © International Business Machines Corporation.

Hit Continue, then OK (the Options menu will allow you to set the confidence level percent) to get:

SPSS screenshot © International Business Machines Corporation.

The first table shows descriptive statistics for the two groups independently. These numbers, excluding the standard errors, can be plugged into the t_{\rm test} formulae for pencil-and-paper calculations.

The important table is the second table. First, what hypothesis are we testing? It is important to write it out explicitly:

(10.6)   \begin{align*} H_{0}: \;\; & \mu_{1} - \mu_{2} = 0 \\ H_{1}: \;\; & \mu_{1} - \mu_{2} \neq 0 \end{align*}

This, as you recall, is our test of interest. When we did this test by hand, we had to do a preliminary F test to see whether or not we could assume homoscedasticity:

(10.7)   \begin{align*} H_{0}: \;\; & \sigma_{1}^{2} = \sigma_{2}^{2} \\ H_{1}: \;\; & \sigma_{1}^{2} \neq \sigma_{2}^{2} \end{align*}

That preliminary test is given to us as Levene’s test in the first two columns of the second table. Levene’s test is similar to, but not exactly the same as, the F test we used, although it also uses F as its test statistic. Here we see F_{\rm test} = 12.061 with p=0.001, so we reject H_{0} and assume that the population variances are unequal. That means we look at only the second line of the second table, the one labelled “Equal variances not assumed”. SPSS computes t and p using both t formulae, but it does not decide for you which one is correct. You need to decide that yourself on the basis of Levene’s test.
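The same two-step logic, run Levene's test first, then pick the pooled (Student) or separate-variances (Welch) t-test accordingly, can be sketched with scipy. The data here are randomly generated stand-ins for the two Species groups, not the data set from this lesson:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical samples standing in for the two groups
group1 = rng.normal(loc=5.0, scale=0.35, size=50)
group2 = rng.normal(loc=5.9, scale=0.60, size=50)

# Step 1: Levene's test for equality of variances
lev_stat, lev_p = stats.levene(group1, group2)

# Step 2: choose the t-test variant based on the Levene result,
# mirroring the two rows of the SPSS output table
equal_var = lev_p >= 0.05   # True -> pooled (Student) t; False -> Welch t
t_stat, t_p = stats.ttest_ind(group1, group2, equal_var=equal_var)

print(f"Levene p = {lev_p:.3f}, equal variances assumed: {equal_var}")
print(f"t = {t_stat:.3f}, two-tailed p = {t_p:.4g}")
```

Note that, unlike SPSS, `ttest_ind` reports only the variant you ask for, so the Levene decision has to be made explicitly in code.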

Again, the information is fairly redundant. Looking across the second row we have t_{\rm test} = -3.741 (note that it is the same as the t in the first row; that is because the sample is large, making z a good approximation for both), \nu = 32 (recall from Equation (10.3) that \nu is generally fractional in the heteroscedastic case), and p = 0.001 (note that this is for a two-tailed hypothesis; if your hypothesis is one-tailed, divide p by 2). Next come \overline{x}_{1} - \overline{x}_{2} = -0.208 and the standard error, the denominator of the t test statistic formula (t is the mean difference over its standard error). The p value is small, so we reject H_{0}: the difference of the sample means is significant. The last two columns give the 95% confidence interval as

(10.8)   \begin{equation*} 0.75429 < \mu_{1} - \mu_{2} < 1.10571 \end{equation*}

Notice that zero is not in the confidence interval, consistent with rejecting H_{0}.
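The confidence interval and the heteroscedastic \nu can also be computed directly from group summary statistics. A sketch, using hypothetical summary numbers (not the SPSS output above), with the Welch-Satterthwaite formula for \nu and a t critical value from scipy:

```python
import math
from scipy import stats

# Hypothetical summary statistics for two groups (n, mean, std. deviation)
n1, mean1, s1 = 50, 5.006, 0.352
n2, mean2, s2 = 50, 5.936, 0.516

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of the difference

# Welch-Satterthwaite degrees of freedom (generally fractional)
nu = (s1**2 / n1 + s2**2 / n2) ** 2 / (
    (s1**2 / n1) ** 2 / (n1 - 1) + (s2**2 / n2) ** 2 / (n2 - 1)
)

t_crit = stats.t.ppf(0.975, nu)           # two-tailed 95% critical value
diff = mean1 - mean2
lo, hi = diff - t_crit * se, diff + t_crit * se
print(f"nu = {nu:.1f}, 95% CI: ({lo:.3f}, {hi:.3f})")
```

If zero falls outside the resulting interval, that is consistent with rejecting H_{0} at the corresponding \alpha, exactly the check made above.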

We can also make an error bar plot. Go through Graphs \rightarrow Legacy Dialogs \rightarrow Error Bar, then pick Simple and “Summaries for groups of cases” in the next menu:

SPSS screenshot © International Business Machines Corporation.

which results in:

SPSS screenshot © International Business Machines Corporation.

or you could generate a boxplot comparison:

SPSS screenshot © International Business Machines Corporation.

Finally, we throw in a couple of effect size (descriptive) measures. One is the standardized effect size defined as:

(10.9)   \begin{equation*} d = t \sqrt{\frac{n_{1} + n_{2}}{n_{1} n_{2}}} = \frac{\overline{x}_{1}-\overline{x}_{2}}{s_{p}} \end{equation*}

where s_{p} is the pooled standard deviation as given by Equation (10.4). Another measure is the strength of association

(10.10)   \begin{equation*} \eta^{2} = \frac{t^{2}}{t^{2} + (n_{1}+n_{2} -2)} \end{equation*}

which measures a kind of “correlation” between x_{1} and x_{2}. The larger t is, the closer \eta^{2} is to 1.
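Both effect size measures follow directly from the t statistic and the group sizes. A small sketch of Equations (10.9) and (10.10), plugging in the t from the SPSS output above and assuming, for illustration, 17 observations per group (so that n_{1} + n_{2} - 2 = 32, matching \nu = 32):

```python
import math

def effect_sizes(t, n1, n2):
    """Cohen's d and eta-squared from an independent-samples t statistic,
    following Equations (10.9) and (10.10)."""
    d = t * math.sqrt((n1 + n2) / (n1 * n2))
    eta_sq = t**2 / (t**2 + (n1 + n2 - 2))
    return d, eta_sq

# t from the test; group sizes are assumed for illustration
d, eta_sq = effect_sizes(t=-3.741, n1=17, n2=17)
print(f"d = {d:.3f}, eta^2 = {eta_sq:.3f}")
```

Because \eta^{2} depends on t only through t^{2}, the sign of t does not matter for the strength of association, while d keeps the sign of the mean difference.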
