14. Correlation and Regression
14.10 Multiple Regression
Multiple regression is to the linear regression we just covered as $n$-way ANOVA is to one-way ANOVA. In $n$-way ANOVA we have one DV and $n$ discrete IVs. With multiple regression we have one DV (univariate) and $k$ continuous IVs. We will label the DV with $y$ and the IVs with $x_1, x_2, \ldots, x_k$. The idea is to predict $y$ with $\hat{y}$ via

$$\hat{y} = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
or, using summation notation,

$$\hat{y} = a + \sum_{i=1}^{k} b_i x_i.$$
Sometimes we (and SPSS) write $b_0$ for $a$. The explicit formulae for the coefficients $a$ and $b_i$ are long, so we won't give them here; instead, we will rely on SPSS to compute the coefficients for us. Just the same, we should remember that the coefficients are computed using the least squares method, where the sum of the squared deviations is minimized. That is, $a$ and the $b_i$ are such that

$$E = \sum_{j=1}^{n} \left[ y_j - (a + b_1 x_{1,j} + b_2 x_{2,j} + \cdots + b_k x_{k,j}) \right]^2$$
is minimized. (Here we are using $(x_{1,j}, x_{2,j}, \ldots, x_{k,j}, y_j)$ to represent data point $j$.) If you like calculus and have a few minutes to spare, the equations for $a$ and the $b_i$ can be found by solving

$$\frac{\partial E}{\partial a} = 0 \qquad \text{and} \qquad \frac{\partial E}{\partial b_i} = 0 \quad \text{for } i = 1, \ldots, k$$
for $a$ and the $b_i$. The result will contain all the familiar terms like $\sum x_i$, $\sum x_i y$, $\sum x_i^2$, etc. It also turns out that the resulting "normal equations" for $a$ and the $b_i$ have a pattern that can be captured with a simple linear algebra equation, as we will see in Chapter 17.
Some terminology: the $b_i$ (including $b_0 = a$) are known as partial regression coefficients.
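Although we will rely on SPSS for the coefficients, seeing them computed once is instructive. Here is a minimal sketch (not part of the text; it assumes NumPy, and the variable names are illustrative) that fits the model by least squares to the data of Example 14.6 below:

```python
import numpy as np

# Data from Example 14.6 below: each row is one student.
X = np.array([[3.2, 22],    # GPA (x1), Age (x2)
              [2.7, 27],
              [2.5, 24],
              [3.4, 28],
              [2.2, 23]])
y = np.array([550, 570, 525, 670, 490])  # Score (y)

# Prepend a column of ones so the intercept a (= b0) is fitted too.
A = np.column_stack([np.ones(len(y)), X])

# lstsq minimizes the sum of squared deviations E defined above.
coeffs, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
a, b1, b2 = coeffs
print(f"y-hat = {a:.2f} + {b1:.2f} x1 + {b2:.2f} x2")
```

This solves the same normal equations mentioned above, just numerically rather than symbolically.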
14.10.1: Multiple regression coefficient, r
An overall correlation coefficient, $r$, can be computed using the pairwise bivariate correlation coefficients as defined in Section 14.2. This overall correlation is defined as $r = r_{y \hat{y}}$, the bivariate correlation coefficient of the predicted values $\hat{y}$ versus the data $y$. For the case of 2 IVs, the formula is

$$r = \sqrt{\frac{r_{x_1 y}^2 + r_{x_2 y}^2 - 2\, r_{x_1 y}\, r_{x_2 y}\, r_{x_1 x_2}}{1 - r_{x_1 x_2}^2}}$$
where $r_{x_1 y}$ is the bivariate correlation coefficient between $x_1$ and $y$, etc. It is true that $0 \leq r \leq 1$: as with the bivariate $r$, its magnitude is at most 1, but being a square root it is never negative.
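Translated into code, the two-IV formula is short. A sketch (the helper name `multiple_r` is ours, not standard):

```python
from math import sqrt

def multiple_r(r_x1y, r_x2y, r_x1x2):
    """Multiple correlation coefficient for two IVs, computed from
    the three pairwise bivariate correlations (formula above)."""
    numerator = r_x1y**2 + r_x2y**2 - 2 * r_x1y * r_x2y * r_x1x2
    return sqrt(numerator / (1 - r_x1x2**2))
```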
Example 14.6: Suppose that you have used SPSS to obtain the regression equation

$$\hat{y} = -44.81 + 87.64\, x_1 + 14.53\, x_2$$
for the following data:

| Student | GPA, $x_1$ | Age, $x_2$ | Score, $y$ | $x_1^2$ | $x_2^2$ | $y^2$ | $x_1 y$ | $x_2 y$ | $x_1 x_2$ |
|---|---|---|---|---|---|---|---|---|---|
| A | 3.2 | 22 | 550 | 10.24 | 484 | 302500 | 1760 | 12100 | 70.4 |
| B | 2.7 | 27 | 570 | 7.29 | 729 | 324900 | 1539 | 15390 | 72.9 |
| C | 2.5 | 24 | 525 | 6.25 | 576 | 275625 | 1312.5 | 12600 | 60.0 |
| D | 3.4 | 28 | 670 | 11.56 | 784 | 448900 | 2278 | 18760 | 95.2 |
| E | 2.2 | 23 | 490 | 4.84 | 529 | 240100 | 1078 | 11270 | 50.6 |
| $\Sigma$ | 14.0 | 124 | 2805 | 40.18 | 3102 | 1592025 | 7967.5 | 70120 | 349.1 |
Compute the multiple correlation coefficient.
Solution:
First we need to compute the pairwise correlations $r_{x_1 y}$, $r_{x_2 y}$, and $r_{x_1 x_2}$. (Note that $r_{x_1 y} = r_{y x_1}$, etc., because the correlation matrix is symmetric.) Using the column sums from the table in the bivariate formula of Section 14.2:

$$r_{x_1 y} = \frac{n \sum x_1 y - (\sum x_1)(\sum y)}{\sqrt{[n \sum x_1^2 - (\sum x_1)^2][n \sum y^2 - (\sum y)^2]}} = \frac{5(7967.5) - (14.0)(2805)}{\sqrt{[5(40.18) - 14.0^2][5(1592025) - 2805^2]}} = \frac{567.5}{\sqrt{(4.9)(92100)}} = 0.845$$

$$r_{x_2 y} = \frac{5(70120) - (124)(2805)}{\sqrt{[5(3102) - 124^2][5(1592025) - 2805^2]}} = \frac{2780}{\sqrt{(134)(92100)}} = 0.791$$

$$r_{x_1 x_2} = \frac{5(349.1) - (14.0)(124)}{\sqrt{[5(40.18) - 14.0^2][5(3102) - 124^2]}} = \frac{9.5}{\sqrt{(4.9)(134)}} = 0.371$$
Now use these in the formula for $r$:

$$r = \sqrt{\frac{(0.845)^2 + (0.791)^2 - 2(0.845)(0.791)(0.371)}{1 - (0.371)^2}} = \sqrt{\frac{0.8438}{0.8624}} = \sqrt{0.9784} = 0.989$$
▢
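To check the example by machine rather than by hand, the following sketch (reusing the hypothetical `multiple_r` helper above) recovers the pairwise correlations with NumPy and then the multiple $r$:

```python
import numpy as np

gpa   = np.array([3.2, 2.7, 2.5, 3.4, 2.2])   # x1
age   = np.array([22, 27, 24, 28, 23])        # x2
score = np.array([550, 570, 525, 670, 490])   # y

# Rows are variables, so this gives the 3x3 symmetric correlation matrix.
R = np.corrcoef([gpa, age, score])
r_x1y, r_x2y, r_x1x2 = R[0, 2], R[1, 2], R[0, 1]

print(round(r_x1y, 3), round(r_x2y, 3), round(r_x1x2, 3))  # 0.845 0.791 0.371
print(round(multiple_r(r_x1y, r_x2y, r_x1x2), 3))          # 0.989
```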
14.10.2: Significance of r
Here we want to test the hypotheses:

$$H_0 : \rho = 0 \qquad H_1 : \rho \neq 0$$

where $\rho$ is the population multiple regression correlation coefficient.
To test the hypothesis we use

$$F = \frac{r^2 / k}{(1 - r^2)/(n - k - 1)}$$

with

$$\nu_1 = \text{d.f.N.} = k \qquad \nu_2 = \text{d.f.D.} = n - k - 1$$

where:

- $n$ = sample size (number of data points)
- $k$ = number of IVs.
(Note: This "$F$-test" is similar to, but not the same as, the "ANOVA" output given by SPSS when you run a regression.)
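For readers who prefer software to tables, here is a sketch of the test as a function (the helper name is ours; `scipy.stats.f` supplies the $F$ distribution):

```python
from scipy.stats import f

def multiple_r_F_test(r, n, k, alpha=0.05):
    """F test of H0: rho = 0 for a multiple correlation coefficient r,
    with nu1 = k and nu2 = n - k - 1 degrees of freedom."""
    df_n, df_d = k, n - k - 1
    F = (r**2 / df_n) / ((1 - r**2) / df_d)
    F_crit = f.ppf(1 - alpha, df_n, df_d)  # critical value from the F table
    p = f.sf(F, df_n, df_d)                # right-tail p-value
    return F, F_crit, p
```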
Example 14.7: Continuing with Example 14.6, test the significance of $r = 0.989$.
Solution:
1. Hypotheses.

$$H_0 : \rho = 0 \qquad H_1 : \rho \neq 0$$
2. Critical statistic. From the $F$ Distribution Table with

$$\alpha = 0.05, \qquad \nu_1 = \text{d.f.N.} = k = 2, \qquad \nu_2 = \text{d.f.D.} = n - k - 1 = 5 - 2 - 1 = 2$$

find

$$F_{\text{crit}} = 19.00$$
3. Test statistic.

$$F = \frac{r^2 / k}{(1 - r^2)/(n - k - 1)} = \frac{0.9784 / 2}{(1 - 0.9784)/2} = \frac{0.4892}{0.0108} = 45.30$$
4. Decision.

Since $F = 45.30 > F_{\text{crit}} = 19.00$, reject $H_0$.
5. Interpretation.

$r = 0.989$ is significant.
▢
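Applying the hypothetical `multiple_r_F_test` helper from above reproduces this result (the small difference from the hand calculation is rounding, since $r$ was rounded to 0.989):

```python
F, F_crit, p = multiple_r_F_test(r=0.989, n=5, k=2)
print(round(F, 1), round(F_crit, 2), round(p, 3))  # 44.7 19.0 0.022; reject H0
```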
14.10.3: Other descriptions of correlation
- Coefficient of multiple determination: $r^2$. This quantity still has the interpretation as the fraction of variance explained by the (multiple regression) model.
- Adjusted $r^2$:

  $$r^2_{\text{adj}} = 1 - \frac{(1 - r^2)(n - 1)}{n - k - 1}$$

  gives a better (unbiased) estimate of the population value for $r^2$ by correcting for degrees of freedom, just as the sample $s^2$, with its degrees of freedom equal to $n - 1$, gives an unbiased estimate of the population $\sigma^2$.
Example 14.8: Continuing Example 14.6, we had $r = 0.989$, so

$$r^2 = 0.9784$$

and

$$r^2_{\text{adj}} = 1 - \frac{(1 - 0.9784)(5 - 1)}{5 - 2 - 1} = 1 - \frac{(0.0216)(4)}{2} = 0.957.$$
▢
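As a final numerical check of this example, a sketch (the function name is ours, not standard):

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of determination, per the formula above."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.9784, n=5, k=2), 3))  # 0.957
```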