# 14.5 Linear Regression

Linear regression finds the line that best fits the scatter plot data in the least squares sense. Let’s begin with the equation of a line:

$$y = a + bx$$

where $a$ is the intercept and $b$ is the slope.

The data, a collection of $(x, y)$ points, rarely lie on a perfect straight line in a scatter plot. So we write

$$\hat{y} = a + bx$$

as the equation of the best fit line. The quantity $\hat{y}$ is the predicted value of $y$ (predicted from the value of $x$) and $y$ is the measured value. Now consider a single data point $(x_i, y_i)$, whose predicted value is $\hat{y}_i = a + b x_i$.

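The difference between the measured and predicted value at data point $i$, $d_i = y_i - \hat{y}_i$, is the deviation. The quantity

$$d_i^2 = (y_i - \hat{y}_i)^2$$

is the squared deviation. The sum of the squared deviations is

$$E = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$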

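The least squares solution for $a$ and $b$ is the pair that minimizes the sum of squares, $E$, over all possible choices of $a$ and $b$. Minimization problems like this are handled with differential calculus: set the partial derivatives of $E$ with respect to $a$ and $b$ to zero,

$$\frac{\partial E}{\partial a} = 0 \quad \text{and} \quad \frac{\partial E}{\partial b} = 0$$

Because $E$ is quadratic in $a$ and $b$, these are two linear equations in two unknowns. Their solution is

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n \sum x^2 - (\sum x)^2}$$

and

$$b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$$

where $n$ is the number of data points and each sum runs over all of them.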
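To make the minimization concrete, here is a minimal Python sketch of the closed-form least squares solution, together with a check that nudging the coefficients away from it increases the sum of squared deviations. The function names `fit_line` and `sum_squared_deviations` and the small data set are hypothetical, chosen only for illustration; they do not come from the text.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b


def sum_squared_deviations(xs, ys, a, b):
    """Sum of (y_i - (a + b*x_i))**2 for the line with intercept a, slope b."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustrative data (not from the text).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
best = sum_squared_deviations(xs, ys, a, b)

# Moving either coefficient away from the solution should raise the sum.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sum_squared_deviations(xs, ys, a + da, b + db) > best

print(f"a = {a:.3f}, b = {b:.3f}, sum of squares = {best:.4f}")
```

The guard on `denom` covers the one case where the formulas break down: if every $x$ value is the same, the data define a vertical line and no slope can be computed.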

Example 14.3: Continuing with the data from Example 14.1, find the best fit line. The data again are:

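| Subject | $x$ | $y$ | $xy$ | $x^2$ | $y^2$ |
|---|---|---|---|---|---|
| A | 6 | 82 | 492 | 36 | 6724 |
| B | 2 | 86 | 172 | 4 | 7396 |
| C | 15 | 43 | 645 | 225 | 1849 |
| D | 9 | 74 | 666 | 81 | 5476 |
| E | 12 | 58 | 696 | 144 | 3364 |
| F | 5 | 90 | 450 | 25 | 8100 |
| G | 8 | 78 | 624 | 64 | 6084 |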

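Using the sums of the columns ($n = 7$, $\sum x = 57$, $\sum y = 511$, $\sum xy = 3745$, $\sum x^2 = 579$), compute:

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n \sum x^2 - (\sum x)^2} = \frac{(511)(579) - (57)(3745)}{7(579) - (57)^2} = \frac{82404}{804} \approx 102.493$$

and

$$b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2} = \frac{7(3745) - (57)(511)}{7(579) - (57)^2} = \frac{-2912}{804} \approx -3.622$$

So

$$\hat{y} = 102.493 - 3.622x$$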
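As a cross-check on the arithmetic, the following Python sketch recomputes the column sums and the two coefficients from the seven $(x, y)$ pairs in the table. The variable names are illustrative and not part of the text.

```python
# Data from the table in Example 14.3: (subject, x, y).
data = [
    ("A", 6, 82), ("B", 2, 86), ("C", 15, 43), ("D", 9, 74),
    ("E", 12, 58), ("F", 5, 90), ("G", 8, 78),
]

n = len(data)
sum_x = sum(x for _, x, _ in data)
sum_y = sum(y for _, _, y in data)
sum_xy = sum(x * y for _, x, y in data)
sum_x2 = sum(x * x for _, x, _ in data)

# Both coefficients share the same denominator.
denom = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # intercept
b = (n * sum_xy - sum_x * sum_y) / denom       # slope

print(f"sums: x={sum_x}, y={sum_y}, xy={sum_xy}, x^2={sum_x2}")
print(f"a = {a:.3f}, b = {b:.3f}")
# prints: a = 102.493, b = -3.622
```

The negative slope matches the downward trend in the data: subjects with larger $x$ values have smaller $y$ values.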

## 14.5.1 Relationship between correlation and slope

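The slope $b$ of the best fit line and the correlation coefficient $r$ are related by

$$r = b \, \frac{s_x}{s_y}$$

or, equivalently, $b = r \, s_y / s_x$, where

$$s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \quad \text{and} \quad s_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}$$

are the standard deviations of the $x$ and $y$ datasets considered separately.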
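The relationship can be checked numerically on the data from Example 14.3. The sketch below assumes the usual computational formula for the Pearson correlation coefficient $r$, which this section does not restate, and uses the standard library's `statistics.stdev` for the sample standard deviations.

```python
import math
import statistics

# x and y values for subjects A through G in Example 14.3.
xs = [6, 2, 15, 9, 12, 5, 8]
ys = [82, 86, 43, 74, 58, 90, 78]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Slope from the least squares formula.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Pearson correlation coefficient from the same column sums
# (assumed formula; not restated in this section).
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Sample standard deviations (n - 1 in the denominator).
s_x = statistics.stdev(xs)
s_y = statistics.stdev(ys)

# The two sides agree up to floating point rounding.
assert math.isclose(r, b * s_x / s_y)
print(f"r = {r:.3f}, b * s_x / s_y = {b * s_x / s_y:.3f}")
```

Because $s_x$ and $s_y$ are both positive, the check also illustrates that $r$ and $b$ always share the same sign.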