14. Correlation and Regression
14.10 Multiple Regression
Multiple regression is to the linear regression we just covered as $n$-way ANOVA is to one-way ANOVA. In $n$-way ANOVA we have one DV and $n$ discrete IVs. With multiple regression we have one DV (univariate) and $k$ continuous IVs. We will label the DV with $y$ and the IVs with $x_1, x_2, \ldots, x_k$. The idea is to predict $y$ with

$$y' = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$

or, using summation notation,

$$y' = a + \sum_{i=1}^{k} b_i x_i .$$
Sometimes we (and SPSS) write $b_0 = a$. The explicit formulae for the coefficients $a$ and the $b_i$ are long, so we won't give them here; instead, we will rely on SPSS to compute the coefficients for us. Just the same, we should remember that the coefficients are computed using the least squares method, where the sum of the squared deviations is minimized. That is, $a$ and the $b_i$ are such that

$$\sum_{j=1}^{n} (y_j - y'_j)^2$$

is minimized. (Here we are using the subscript $j$ to represent data point $j$.) If you like calculus and have a few minutes to spare, the equations for $a$ and the $b_i$ can be found by solving

$$\frac{\partial}{\partial a} \sum_{j=1}^{n} (y_j - y'_j)^2 = 0 \qquad \text{and} \qquad \frac{\partial}{\partial b_i} \sum_{j=1}^{n} (y_j - y'_j)^2 = 0$$

for $a$ and the $b_i$. The result will contain all the familiar terms like $\sum x_i$, $\sum x_i^2$, $\sum x_i y$, etc. It also turns out that the "normal equations" for $a$ and the $b_i$ that result have a pattern that can be captured with a simple linear algebra equation that we will see in Chapter 17.
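Although we will let SPSS do this work, curious readers can check the least squares machinery with a few lines of code. Below is a minimal sketch, assuming Python with the numpy library is available (neither is part of this course's software); it finds $a$, $b_1$ and $b_2$ for the data of Example 14.6 below by minimizing the sum of squared deviations.

```python
# Illustration only (not the book's method): least squares coefficients
# computed with numpy instead of SPSS, using the Example 14.6 data below.
import numpy as np

x1 = np.array([3.2, 2.7, 2.5, 3.4, 2.2])  # GPA
x2 = np.array([22, 27, 24, 28, 23])       # age
y = np.array([550, 570, 525, 670, 490])   # test score

# Design matrix: a column of ones for the intercept a (= b0), then x1, x2.
X = np.column_stack([np.ones_like(x1), x1, x2])

# lstsq minimizes sum((y - X @ b)**2), the sum of squared deviations.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # approximately [-44.572, 87.679, 14.519]
```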
Some terminology: the $b_i$ (including $b_0 = a$) are known as partial regression coefficients.
14.10.1: Multiple regression coefficient, r
An overall correlation coefficient, $r$, can be computed using the pairwise bivariate correlation coefficients as defined in Section 14.2. This overall correlation is defined as $r = r_{y'y}$, the bivariate correlation coefficient of the predicted values $y'$ versus the data $y$. For the case of 2 IVs, the formula is

$$r = \sqrt{\frac{r^2_{yx_1} + r^2_{yx_2} - 2\, r_{yx_1}\, r_{yx_2}\, r_{x_1 x_2}}{1 - r^2_{x_1 x_2}}}$$

where $r_{yx_1}$ is the bivariate correlation coefficient between $y$ and $x_1$, etc. It is true that $-1 \leq r \leq 1$ as with the bivariate $r$; in fact $0 \leq r \leq 1$ here, because the least squares predictions $y'$ cannot be negatively correlated with the data $y$.
Example 14.6: Suppose that you have used SPSS to obtain the regression equation

$$y' = -44.572 + 87.679\, x_1 + 14.519\, x_2$$

for the following data:
| Student | GPA, $x_1$ | Age, $x_2$ | Score, $y$ | $x_1^2$ | $x_2^2$ | $y^2$ | $x_1 y$ | $x_2 y$ | $x_1 x_2$ |
|---|---|---|---|---|---|---|---|---|---|
| A | 3.2 | 22 | 550 | 10.24 | 484 | 302500 | 1760 | 12100 | 70.4 |
| B | 2.7 | 27 | 570 | 7.29 | 729 | 324900 | 1539 | 15390 | 72.9 |
| C | 2.5 | 24 | 525 | 6.25 | 576 | 275625 | 1312.5 | 12600 | 60 |
| D | 3.4 | 28 | 670 | 11.56 | 784 | 448900 | 2278 | 18760 | 95.2 |
| E | 2.2 | 23 | 490 | 4.84 | 529 | 240100 | 1078 | 11270 | 50.6 |
| $\sum$ | $\sum x_1 = 14.0$ | $\sum x_2 = 124$ | $\sum y = 2805$ | $\sum x_1^2 = 40.18$ | $\sum x_2^2 = 3102$ | $\sum y^2 = 1592025$ | $\sum x_1 y = 7967.5$ | $\sum x_2 y = 70120$ | $\sum x_1 x_2 = 349.1$ |
Compute the multiple correlation coefficient.
Solution:

First we need to compute the pairwise correlations $r_{yx_1}$, $r_{yx_2}$ and $r_{x_1 x_2}$. (Note that $r_{yx_1} = r_{x_1 y}$, etc., because the correlation matrix is symmetric.) Using the column sums from the table in the bivariate formula of Section 14.2:

$$r_{yx_1} = \frac{n \sum x_1 y - (\sum x_1)(\sum y)}{\sqrt{[\,n \sum x_1^2 - (\sum x_1)^2\,][\,n \sum y^2 - (\sum y)^2\,]}} = \frac{5(7967.5) - (14.0)(2805)}{\sqrt{[\,5(40.18) - (14.0)^2\,][\,5(1592025) - (2805)^2\,]}} = 0.845$$

and, similarly, $r_{yx_2} = 0.791$ and $r_{x_1 x_2} = 0.371$.

Now use these in

$$r = \sqrt{\frac{(0.845)^2 + (0.791)^2 - 2(0.845)(0.791)(0.371)}{1 - (0.371)^2}} = \sqrt{\frac{0.8438}{0.8624}} = 0.989 .$$
▢
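If you want to verify these numbers, the sketch below (again assuming Python with numpy, as an illustration rather than the book's method) computes the three pairwise correlations, applies the two-IV formula, and confirms that the result matches the defining correlation $r_{y'y}$.

```python
# Check of Example 14.6 (illustration only): pairwise correlations, the
# two-IV formula for r, and the definition r = r_{y'y}.
import numpy as np

x1 = np.array([3.2, 2.7, 2.5, 3.4, 2.2])
x2 = np.array([22, 27, 24, 28, 23])
y = np.array([550, 570, 525, 670, 490])

r_yx1 = np.corrcoef(y, x1)[0, 1]    # 0.845
r_yx2 = np.corrcoef(y, x2)[0, 1]    # 0.791
r_x1x2 = np.corrcoef(x1, x2)[0, 1]  # 0.371

# Two-IV formula for the multiple correlation coefficient.
r = np.sqrt((r_yx1**2 + r_yx2**2 - 2*r_yx1*r_yx2*r_x1x2) / (1 - r_x1x2**2))

# Definition: correlate the predicted values y' with the data y.
y_pred = -44.572 + 87.679*x1 + 14.519*x2
r_def = np.corrcoef(y, y_pred)[0, 1]

print(round(r, 3), round(r_def, 3))  # 0.989 0.989
```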
14.10.2: Significance of r
Here we want to test the hypotheses:

$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$

where $\rho$ is the population multiple regression correlation coefficient. To test the hypothesis we use the test statistic

$$F = \frac{r^2 / k}{(1 - r^2)/(n - k - 1)}$$

with degrees of freedom $\nu_1 = k$ and $\nu_2 = n - k - 1$, where $n$ is the number of data points and $k$ is the number of IVs; here, with 2 IVs, $k = 2$.
(Note: up to the rounding of $r$ in a hand calculation, this "$F$-test" is the same test as the one reported in the "ANOVA" table that SPSS outputs when you run a regression.)
Example 14.7: Continuing with Example 14.6, test the significance of $r$.
Solution:

1. Hypotheses.

$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$

2. Critical statistic. From the $F$ Distribution Table with $\nu_1 = k = 2$, $\nu_2 = n - k - 1 = 5 - 2 - 1 = 2$ and $\alpha = 0.05$ we find

$$F_{\text{crit}} = 19.00 .$$

3. Test statistic.

$$F = \frac{r^2 / k}{(1 - r^2)/(n - k - 1)} = \frac{(0.989)^2 / 2}{(1 - (0.989)^2)/2} = 44.7$$

4. Decision.

Since $F = 44.7 > F_{\text{crit}} = 19.00$, reject $H_0$.

5. Interpretation.

The multiple correlation coefficient $r$ is significant.
▢
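As a check on the arithmetic, the sketch below (assuming Python with the scipy library, again as an illustration only) recomputes the test statistic and pulls the critical value from the $F$ distribution instead of a printed table.

```python
# Check of Example 14.7 (illustration only): F statistic and critical value.
from scipy.stats import f

n, k, r = 5, 2, 0.989

F = (r**2 / k) / ((1 - r**2) / (n - k - 1))
F_crit = f.ppf(1 - 0.05, k, n - k - 1)  # critical value for alpha = 0.05

print(round(F, 1), round(F_crit, 2))  # 44.7 19.0
```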
14.10.3: Other descriptions of correlation
- Coefficient of multiple determination: $r^2$. This quantity still has the interpretation as the fraction of the variance explained by the (multiple regression) model.
- Adjusted $r^2$:

  $$r^2_{\text{adj}} = 1 - \left[ \frac{(1 - r^2)(n - 1)}{n - k - 1} \right]$$

  gives a better (less biased) estimate of the population value for $r^2$ by correcting for degrees of freedom, just as the sample variance $s^2$, with its degrees of freedom equal to $n - 1$, gives an unbiased estimate of the population variance $\sigma^2$.
Example 14.8: Continuing Example 14.6, we had $r = 0.989$, so

$$r^2 = (0.989)^2 = 0.978$$

and

$$r^2_{\text{adj}} = 1 - \left[ \frac{(1 - 0.978)(5 - 1)}{5 - 2 - 1} \right] = 1 - 0.044 = 0.956 .$$
▢
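The same arithmetic in code form (plain Python, illustration only):

```python
# Check of Example 14.8 (illustration only): r^2 and adjusted r^2.
n, k, r = 5, 2, 0.989

r2 = r**2
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(r2, 3), round(r2_adj, 3))  # 0.978 0.956
```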