"

14. Correlation and Regression

14.8 SPSS Lesson 11: Linear Regression

Open “Hypertension.sav” from the Data Sets:

SPSS screenshot © International Business Machines Corporation.

This dataset contains several variables from a study that is looking for a way to predict injury on the basis of strength, so y_{1} is the dependent (y) variable. For a single independent variable x, we’ll arbitrarily pick y_{2}. Next pick Analyze → Regression → Linear,

SPSS screenshot © International Business Machines Corporation.

and move the independent and dependent variables into the appropriate slots:

SPSS screenshot © International Business Machines Corporation.

You can look through the submenus if you like, but they primarily give options for multiple regression, which requires treating the independent variable as a vector instead of as a single number. This elevation of data from a number to a vector is the basis of multivariate statistics, so we’ll leave that for now. Running the analysis produces four output tables. You can ignore the “Variables Entered/Removed” table (it is for advanced multiple regression analysis). The other tables show:

The “Model Summary” table gives r and s_{\rm est}, plus r^{2} and r^{2}_{\rm adj}, which we’ll discuss when we look at multiple regression. The ANOVA table gives information about the significance of r (and therefore about the overall significance of the regression). We used t to test the significance of r. You can recover the magnitude of that t test statistic from F in the ANOVA table because t^{2} = F, so |t| = \sqrt{F}, with t taking the sign of the slope. Here p = 0.454, so the model fit is not significant (do not reject H_{0}).

Even though the fit is not significant, the regression can still be done, and it is reported in the last output table, “Coefficients”. The coefficients are reported in the B column. They are called Unstandardized Coefficients because the data, x and y, have not been z-transformed. The first line gives the intercept (a or b_{0}) and the second line gives the slope (b or b_{1}), so

    \begin{eqnarray*} y & = & a + bx \\ y & = & b_{0} + b_{1}x \\ y & = & 64.309 - 0.173 x \end{eqnarray*}
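To see how the numbers in these tables relate to each other outside of SPSS, here is a minimal Python sketch of a least-squares fit. It assumes the usual formulas for simple linear regression and uses a small made-up data set (not the values in “Hypertension.sav”); the function names `simple_regression` and `predict` are illustrative only. It computes a, b, r, the ANOVA F, and the t statistic for r, so you can check numerically that t^{2} = F. The `predict` helper simply evaluates the fitted line using the two coefficients reported above.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = a + b*x with the ANOVA F and the t for r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_yy = sum((yi - mean_y) ** 2 for yi in y)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b = ss_xy / ss_xx                      # slope
    a = mean_y - b * mean_x                # intercept
    r = ss_xy / math.sqrt(ss_xx * ss_yy)   # correlation coefficient

    ss_reg = b * ss_xy                     # regression sum of squares (1 df)
    ss_res = ss_yy - ss_reg                # residual sum of squares (n - 2 df)
    f_stat = ss_reg / (ss_res / (n - 2))   # F in the ANOVA table

    # t test statistic for the significance of r (n - 2 degrees of freedom)
    t_stat = r * math.sqrt((n - 2) / (1.0 - r * r))
    return {"a": a, "b": b, "r": r, "F": f_stat, "t": t_stat}


def predict(x_value, a=64.309, b=-0.173):
    """Evaluate y = a + b*x; defaults are the coefficients from the lesson."""
    return a + b * x_value


# Hypothetical illustration data, NOT the values in Hypertension.sav.
x = [12.0, 15.0, 9.0, 20.0, 17.0, 11.0, 14.0, 18.0]
y = [63.0, 60.0, 65.0, 62.0, 58.0, 61.0, 66.0, 59.0]

fit = simple_regression(x, y)
print("a =", fit["a"], " b =", fit["b"], " r =", fit["r"])
print("F =", fit["F"], " t^2 =", fit["t"] ** 2)  # these two should agree
print("prediction at x = 10 from the lesson's line:", predict(10))
```

The sketch does not compute p values, since those need the t or F distribution; SPSS reports them for you in the output tables.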

For each of the two regression coefficients, a standard error can be computed, along with a confidence interval, and the significance of the coefficient (H_{0}: b_{i} = 0) can be tested with a t test statistic. We haven’t covered that aspect of linear regression, but we can see the standard errors, t test statistics and associated p values in the “Coefficients” output table. Here b_{0}, the intercept, is significant while b_{1}, the slope, is not. The last thing to notice is Beta in the Standardized Coefficients column. Imagine that we z-transform our variables x and y to z_{x} and z_{y} and then do a linear regression on the z-transformed variables. Then the result would be

    \begin{eqnarray*} z_{y} & = & \beta z_{x} \\ z_{y} & = & -0.201 z_{x} \end{eqnarray*}

In this case the regression is still not significant; z-transforming can’t change that. There is no intercept because the mean of each of the z-transformed variables is zero, and this forces the intercept to be zero.
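The behaviour of the standardized regression can be checked with a short Python sketch. It assumes z-scores computed with the sample standard deviation (n - 1 in the denominator) and the standard formula for the standard error of the slope, s_{\rm est}/\sqrt{SS_{xx}}; the data are made up for illustration and the function names are hypothetical. It fits the regression on the raw data and again on the z-transformed data, so you can compare the intercept, the slope (Beta), r, and the t statistic of the slope.

```python
import math


def z_scores(values):
    """z-transform using the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient of x and y."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_yy = sum((yi - mean_y) ** 2 for yi in y)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return ss_xy / math.sqrt(ss_xx * ss_yy)


def fit_with_slope_t(x, y):
    """Return (intercept, slope, t statistic of the slope) for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_est = math.sqrt(ss_res / (n - 2))    # standard error of the estimate
    se_b = s_est / math.sqrt(ss_xx)        # standard error of the slope
    return a, b, b / se_b


# Hypothetical illustration data, NOT the values in Hypertension.sav.
x = [12.0, 15.0, 9.0, 20.0, 17.0, 11.0, 14.0, 18.0]
y = [63.0, 60.0, 65.0, 62.0, 58.0, 61.0, 66.0, 59.0]

a_raw, b_raw, t_raw = fit_with_slope_t(x, y)
a_z, beta, t_z = fit_with_slope_t(z_scores(x), z_scores(y))

print("raw fit:          a =", a_raw, " b =", b_raw, " t =", t_raw)
print("standardized fit: a =", a_z, " beta =", beta, " t =", t_z)
print("r =", pearson_r(x, y))
```

With one independent variable, the standardized slope should match r and the standardized intercept should be zero up to rounding error, while the t statistic of the slope should be the same for both fits, which is why z-transforming cannot change the significance.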

Finally, let’s see how to plot the regression line. Generate a scatterplot, double-click on the plot, and then click on the little icon that shows a line through scatterplot data:

SPSS screenshot © International Business Machines Corporation.

and

SPSS screenshot © International Business Machines Corporation.

The equation of the regression line is computed instantly and the line is plotted.