14. Correlation and Regression

# 14.5 Linear Regression

Linear regression gives us the best equation of a line through the scatter plot data in terms of *least squares*. Let’s begin with the equation of a line:

where is the intercept and is the slope.

The data, the collection of points, rarely lie on a perfect straight line in a scatter plot. So we write

as the equation of the best fit line. The quantity is the predicted value of (predicted from the value of ) and is the measured value of . Now consider :

The difference between the measured and predicted value at data point , , is the* deviation*. The quantity

is the *squared deviation*. The *sum of the squared deviations* is

The least squares solution for and is the solution that minimizes , the sum of squares, over all possible selections of and . Minimization problems are easily handled with differential calculus by solving the differential equations:

The solution to those two differential equations is

and

**Example 14.3** : Continue with the data from Example 14.1 and find the best fit line. The data again are:

Subject | |||||

A | 6 | 82 | 492 | 36 | 6724 |

B | 2 | 86 | 172 | 4 | 7396 |

C | 15 | 43 | 645 | 225 | 1849 |

D | 9 | 74 | 666 | 81 | 5476 |

E | 12 | 58 | 696 | 144 | 3364 |

F | 5 | 90 | 450 | 25 | 8100 |

G | 8 | 78 | 624 | 64 | 6084 |

Using the sums of the columns, compute:

and

So

▢

**14.5.1**: **Relationship between correlation and slope**

The relationship is

where

are the standard deviations of the and datasets considered separately.