14. Correlation and Regression

# 14.10 Multiple Regression

Multiple regression is to the linear regression we just covered as $k$-way ANOVA is to one-way ANOVA. In $k$-way ANOVA we have one DV and $k$ discrete IVs. With multiple regression we have one DV (univariate) and $k$ continuous IVs. We will label the DV with $y$ and the IVs with $x_1, x_2, \ldots, x_k$. The idea is to predict $y$ with $\hat{y}$ via

$$\hat{y} = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$

or, using summation notation,

$$\hat{y} = a + \sum_{j=1}^{k} b_j x_j$$

Sometimes we (and SPSS) write $b_0 = a$. The explicit formulas for the coefficients $a$ and $b_j$ are long, so we won't give them here; instead, we will rely on SPSS to compute the coefficients for us. Just the same, we should remember that the coefficients are computed using the least squares method, where the sum of the squared deviations is minimized. That is, $a$ and the $b_j$ are such that

$$E = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{n} \left(y_i - \left[a + \sum_{j=1}^{k} b_j x_{j,i}\right]\right)^2$$

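is minimized. (Here we are using $x_{j,i}$ to represent data point $i$ of the IV $x_j$.) If you like calculus and have a few minutes to spare, the equations for $a$ and the $b_j$ can be found by solving:

$$\frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b_j} = 0 \quad (j = 1, \ldots, k)$$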

for $a$ and the $b_j$. The result will contain all the familiar terms like $\sum x_j$, $\sum x_j^2$, $\sum x_j y$, etc. It also turns out that the “normal equations” for $a$ and the $b_j$ have a pattern that can be captured with a simple linear algebra equation, which we will see in Chapter 17.
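Working out these derivatives explicitly shows where those sums come from. Below is a short derivation sketch in the notation used above ($E$, $a$, $b_j$, $x_{j,i}$). The index $m$ is introduced here only to label the coefficient being differentiated:

$$
\begin{aligned}
\frac{\partial E}{\partial a} &= -2\sum_{i=1}^{n}\Bigl(y_i - a - \sum_{j=1}^{k} b_j x_{j,i}\Bigr) = 0
&&\Longrightarrow\quad \sum_{i=1}^{n} y_i = n\,a + \sum_{j=1}^{k} b_j \sum_{i=1}^{n} x_{j,i},\\
\frac{\partial E}{\partial b_m} &= -2\sum_{i=1}^{n} x_{m,i}\Bigl(y_i - a - \sum_{j=1}^{k} b_j x_{j,i}\Bigr) = 0
&&\Longrightarrow\quad \sum_{i=1}^{n} x_{m,i}\,y_i = a\sum_{i=1}^{n} x_{m,i} + \sum_{j=1}^{k} b_j \sum_{i=1}^{n} x_{m,i}\,x_{j,i}, \qquad m = 1, \ldots, k.
\end{aligned}
$$

This is a system of $k + 1$ linear equations in the $k + 1$ unknowns $a, b_1, \ldots, b_k$. For $k = 1$ it reduces to the two normal equations of simple linear regression: $\sum y = na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$.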

Some terminology: the $b_j$ (including $b_0 = a$) are known as partial regression coefficients.
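To see the least squares idea work without SPSS, here is a minimal pure-Python sketch. It builds the normal equations from sums of products of the data columns and solves them by Gaussian elimination. The function name `fit_multiple_regression` is our own illustrative choice and is not an SPSS or library routine. The sketch targets small, well-conditioned textbook data, not production use. It is run on the GPA/age/score data of Example 14.6.

```python
# Minimal least-squares sketch for y-hat = a + b_1 x_1 + ... + b_k x_k.
# Builds the (k+1) x (k+1) normal equations and solves them by Gaussian
# elimination with partial pivoting (pure Python, no external libraries).

def fit_multiple_regression(xs, y):
    """xs: list of k IV columns (each a list of n values); y: list of n values.
    Returns [a, b_1, ..., b_k]."""
    n = len(y)
    # Design columns: a column of 1s for the intercept a, then each x_j.
    cols = [[1.0] * n] + [list(map(float, x)) for x in xs]
    p = len(cols)
    # Normal equations M @ coef = v, where M[r][c] = sum_i cols[r][i] * cols[c][i]
    # and v[r] = sum_i cols[r][i] * y[i].
    m = [[sum(cr[i] * cc[i] for i in range(n)) for cc in cols] for cr in cols]
    v = [sum(cr[i] * y[i] for i in range(n)) for cr in cols]
    # Forward elimination on the augmented matrix [M | v].
    aug = [row + [rhs] for row, rhs in zip(m, v)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        s = sum(aug[r][c] * coef[c] for c in range(r + 1, p))
        coef[r] = (aug[r][p] - s) / aug[r][r]
    return coef


# Data from Example 14.6: x1 = GPA, x2 = age, y = score.
gpa = [3.2, 2.7, 2.5, 3.4, 2.2]
age = [22, 27, 24, 28, 23]
score = [550, 570, 525, 670, 490]

a, b1, b2 = fit_multiple_regression([gpa, age], score)
print(f"y-hat = {a:.2f} + {b1:.2f} x1 + {b2:.3f} x2")
```

Solving the normal equations directly is the easiest route to explain. Numerical libraries usually prefer QR- or SVD-based least squares, because those are more stable when the IVs are highly correlated.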

# 14.10.1: Multiple correlation coefficient, R

An overall correlation coefficient, $R$, can be computed using pairwise bivariate correlation coefficients as defined in Section 14.2. This overall correlation is defined as $R = r_{y\hat{y}}$, the bivariate correlation coefficient of the predicted values $\hat{y}$ versus the data $y$. For the case of 2 IVs, the formula is

$$R = \sqrt{\frac{r_{x_1 y}^2 + r_{x_2 y}^2 - 2\, r_{x_1 y}\, r_{x_2 y}\, r_{x_1 x_2}}{1 - r_{x_1 x_2}^2}}$$

where $r_{x_1 y}$ is the bivariate correlation coefficient between $x_1$ and $y$, etc. It is true that $0 \le R \le 1$, as with the bivariate $|r|$.
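The two-IV formula can be wrapped in a small function. This is a sketch, and the name `multiple_r_two_ivs` is our own. The correlation values in the calls are hypothetical and chosen only to show two properties. First, when $x_1$ and $x_2$ are uncorrelated ($r_{x_1 x_2} = 0$), the formula reduces to $R^2 = r_{x_1 y}^2 + r_{x_2 y}^2$. Second, a negative pairwise correlation does not make $R$ negative.

```python
import math


def multiple_r_two_ivs(r_x1y, r_x2y, r_x1x2):
    """Multiple correlation R of y on (x1, x2) from the three pairwise correlations."""
    denominator = 1 - r_x1x2**2
    if denominator == 0:
        # Formula breaks down when x1 and x2 are perfectly correlated.
        raise ValueError("|r_x1x2| must be less than 1")
    numerator = r_x1y**2 + r_x2y**2 - 2 * r_x1y * r_x2y * r_x1x2
    return math.sqrt(numerator / denominator)


# Hypothetical correlations with uncorrelated IVs: R^2 = 0.3^2 + 0.4^2 = 0.25.
print(round(multiple_r_two_ivs(0.3, 0.4, 0.0), 6))
# Flipping the sign of one correlation leaves R unchanged (and non-negative).
print(round(multiple_r_two_ivs(-0.3, 0.4, 0.0), 6))
```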

Example 14.6: Suppose that you have used SPSS to obtain the regression equation

$$\hat{y} = -44.81 + 87.64\,x_1 + 14.533\,x_2$$

for the following data:

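| Student | GPA, $x_1$ | Age, $x_2$ | Score, $y$ | $x_1^2$ | $x_2^2$ | $y^2$ | $x_1 y$ | $x_2 y$ | $x_1 x_2$ |
|---|---|---|---|---|---|---|---|---|---|
| A | 3.2 | 22 | 550 | 10.24 | 484 | 302500 | 1760 | 12100 | 70.4 |
| B | 2.7 | 27 | 570 | 7.29 | 729 | 324900 | 1539 | 15390 | 72.9 |
| C | 2.5 | 24 | 525 | 6.25 | 576 | 275625 | 1312.5 | 12600 | 60 |
| D | 3.4 | 28 | 670 | 11.56 | 784 | 448900 | 2278 | 18760 | 95.2 |
| E | 2.2 | 23 | 490 | 4.84 | 529 | 240100 | 1078 | 11270 | 50.6 |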

Compute the multiple correlation coefficient.

Solution:

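First we need to compute the pairwise correlations $r_{x_1 y}$, $r_{x_2 y}$, and $r_{x_1 x_2}$. (Note that $r_{x_1 y} = r_{y x_1}$, etc., because the correlation matrix is symmetric.) For example, using the column sums of the table ($n = 5$, $\sum x_1 = 14.0$, $\sum y = 2805$, $\sum x_1^2 = 40.18$, $\sum y^2 = 1592025$, $\sum x_1 y = 7967.5$):

$$r_{x_1 y} = \frac{n\sum x_1 y - \sum x_1 \sum y}{\sqrt{\left[n\sum x_1^2 - \left(\sum x_1\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{5(7967.5) - (14.0)(2805)}{\sqrt{\left[5(40.18) - 14.0^2\right]\left[5(1592025) - 2805^2\right]}} = \frac{567.5}{\sqrt{(4.9)(92100)}} \approx 0.845$$

In the same way, $r_{x_2 y} \approx 0.791$ and $r_{x_1 x_2} \approx 0.371$.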

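Now use these in the formula for $R$:

$$R = \sqrt{\frac{(0.845)^2 + (0.791)^2 - 2(0.845)(0.791)(0.371)}{1 - (0.371)^2}} = \sqrt{\frac{0.8438}{0.8624}} \approx 0.989$$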
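The computation in this example can be checked with a short Python sketch. The helper `pearson_r` is our own name. It uses the usual computational formula for the bivariate $r$. The sketch also checks the definition $R = r_{y\hat{y}}$ directly, by correlating the data $y$ with the predictions from the regression equation $\hat{y} = -44.81 + 87.64x_1 + 14.533x_2$ for these data. The two routes should agree up to the rounding of the coefficients.

```python
import math


def pearson_r(u, v):
    """Bivariate (Pearson) correlation via the computational formula."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(p * q for p, q in zip(u, v))
    suu = sum(p * p for p in u)
    svv = sum(q * q for q in v)
    return (n * suv - su * sv) / math.sqrt((n * suu - su**2) * (n * svv - sv**2))


# Example 14.6 data: x1 = GPA, x2 = age, y = score.
x1 = [3.2, 2.7, 2.5, 3.4, 2.2]
x2 = [22, 27, 24, 28, 23]
y = [550, 570, 525, 670, 490]

r_x1y, r_x2y, r_x1x2 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)

# Route 1: the two-IV formula built from the pairwise correlations.
R_formula = math.sqrt(
    (r_x1y**2 + r_x2y**2 - 2 * r_x1y * r_x2y * r_x1x2) / (1 - r_x1x2**2)
)

# Route 2: the definition R = r(y, y-hat), using the regression equation.
y_hat = [-44.81 + 87.64 * g + 14.533 * s for g, s in zip(x1, x2)]
R_definition = pearson_r(y, y_hat)

print(f"r_x1y = {r_x1y:.3f}, r_x2y = {r_x2y:.3f}, r_x1x2 = {r_x1x2:.3f}")
print(f"R from formula = {R_formula:.3f}, R = r(y, y-hat) = {R_definition:.3f}")
```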

# 14.10.2: Significance of R

Here we want to test the hypotheses

$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$

where $\rho$ is the population multiple correlation coefficient.

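To test the hypothesis we use

$$F_{\text{test}} = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}$$

with

$$\text{d.f.N} = k, \qquad \text{d.f.D} = n - k - 1$$

where $n$ is the number of data points and $k$ is the number of IVs.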

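(Note: This $F$-test is computed from $R$ rather than from sums of squares, but it is equivalent to the overall $F$ in the “ANOVA” table that SPSS outputs when you run a regression, because $R^2 = SS_{\text{regression}}/SS_{\text{total}}$. The two values differ only through the rounding of $R$.)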

Example 14.7: Continuing with Example 14.6, test the significance of $R$.

Solution:

1. Hypotheses.

$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$

2. Critical statistic. From the F Distribution Table with

$$\text{d.f.N} = k = 2, \qquad \text{d.f.D} = n - k - 1 = 5 - 2 - 1 = 2$$

find

$$F_{\text{crit}} = 19.00 \quad (\alpha = 0.05)$$

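3. Test statistic. With $R^2 = (0.989)^2 \approx 0.978$,

$$F_{\text{test}} = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} = \frac{0.978/2}{(1 - 0.978)/(5 - 2 - 1)} = \frac{0.489}{0.011} \approx 44.45$$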

4. Decision.

Since $F_{\text{test}} = 44.45 > F_{\text{crit}} = 19.00$, reject $H_0$.

5. Interpretation.

$R = 0.989$ is significant: taken together, GPA and age are significantly correlated with the score.
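As a check on the arithmetic in steps 2 to 4, here is a small Python sketch. The function names are our own. When $R^2$ is close to 1, $F_{\text{test}}$ is sensitive to how $R^2$ is rounded, so the sketch evaluates two values: $R^2 = 0.978$ (as above) and $R^2 = 0.9787$ (from the unrounded correlations). It also relies on a special fact that holds only when d.f.N = d.f.D = 2: the upper-tail probability is $P(F > f) = 1/(1 + f)$. This reproduces the table value $F_{\text{crit}} = 19.00$ at $\alpha = 0.05$ and gives an exact $p$-value for this example. For other degrees of freedom, use the F table or a statistics library instead.

```python
def f_statistic(r2, n, k):
    """F for H0: rho = 0, to be compared with F(d.f.N = k, d.f.D = n - k - 1)."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def upper_tail_p_df2_df2(f):
    """P(F > f) for an F distribution with d.f.N = d.f.D = 2 ONLY: 1 / (1 + f)."""
    return 1 / (1 + f)


n, k = 5, 2      # five students, two IVs (GPA and age)
alpha = 0.05
f_crit = 19.00   # F table value for d.f.N = 2, d.f.D = 2, alpha = 0.05

# Consistency check: the special-case formula leaves exactly alpha above f_crit.
assert abs(upper_tail_p_df2_df2(f_crit) - alpha) < 1e-12

for r2 in (0.978, 0.9787):  # R^2 as rounded in the text, and to four decimals
    f = f_statistic(r2, n, k)
    p = upper_tail_p_df2_df2(f)
    print(f"R^2 = {r2}: F = {f:.2f}, p = {p:.4f}, reject H0: {f > f_crit}")
```

Both values of $R^2$ give an $F$ far above 19.00, so the decision does not depend on the rounding.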

# 14.10.3: Other descriptions of correlation

1. Coefficient of multiple determination: $R^2$. This quantity still has the interpretation as the fraction of the variance in $y$ explained by the (multiple regression) model.

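2. Adjusted coefficient of multiple determination:

$$R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

$R^2_{\text{adj}}$ gives a better (less biased) estimate of the population value of $R^2$ by correcting for degrees of freedom, just as the sample variance $s^2$, with its degrees of freedom equal to $n - 1$, gives an unbiased estimate of the population variance $\sigma^2$.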
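The degrees-of-freedom correction can be sketched with the usual adjusted-$R^2$ formula, $R^2_{\text{adj}} = 1 - (1 - R^2)(n - 1)/(n - k - 1)$. We assume this standard form is the one intended here, and the function name `adjusted_r2` is our own. The first call uses the values of Examples 14.6 and 14.8 ($n = 5$, $k = 2$, $R^2 = 0.989^2 \approx 0.978$). The second call uses a hypothetical sample size of $n = 500$ to show that the penalty shrinks as $n$ grows relative to $k$.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n data points and k IVs; requires n > k + 1."""
    if n <= k + 1:
        raise ValueError("need n > k + 1 data points")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Small sample from the examples: a noticeable downward adjustment.
print(round(adjusted_r2(0.978, 5, 2), 3))
# Same R^2 with a hypothetical n = 500: the correction is almost negligible.
print(round(adjusted_r2(0.978, 500, 2), 3))
```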

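Example 14.8: Continuing Example 14.6, we had $R = 0.989$, so

$$R^2 = (0.989)^2 \approx 0.978$$

and

$$R^2_{\text{adj}} = 1 - \frac{(1 - 0.978)(5 - 1)}{5 - 2 - 1} = 1 - \frac{(0.022)(4)}{2} = 0.956$$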