14. Correlation and Regression

# 14.5 Linear Regression

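Linear regression gives us the best equation of a line through the scatter plot data in terms of least squares. Let's begin with the equation of a line:

$$ y = a + bx $$

where $a$ is the intercept and $b$ is the slope. The data, the collection of $(x, y)$ points, rarely lie on a perfect straight line in a scatter plot. So we write

$$ \hat{y} = a + bx $$

as the equation of the best fit line. The quantity $\hat{y}$ is the predicted value of $y$ (predicted from the value of $x$) and $y$ is the measured value of $y$.

Now consider a single data point $(x_i, y_i)$. The difference between the measured and predicted value at data point $i$, $y_i - \hat{y}_i$, is the deviation. The quantity $(y_i - \hat{y}_i)^2$ is the squared deviation. The sum of the squared deviations is

$$ E = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2 $$

The least squares solution for $a$ and $b$ is the solution that minimizes $E$, the sum of squares, over all possible selections of $a$ and $b$. Minimization problems are easily handled with differential calculus by setting the partial derivatives of $E$ to zero and solving the resulting pair of equations:

$$ \frac{\partial E}{\partial a} = 0 \qquad \text{and} \qquad \frac{\partial E}{\partial b} = 0 $$

The solution to those two equations is

$$ a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\sum x^2 - \left(\sum x\right)^2} $$

and

$$ b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} $$

Example 14.3: Continue with the data from Example 14.1 and find the best fit line. The data again are: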

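| Subject | $x$ | $y$ | $xy$ | $x^2$ | $y^2$ |
|---|---|---|---|---|---|
| A | 6 | 82 | 492 | 36 | 6724 |
| B | 2 | 86 | 172 | 4 | 7396 |
| C | 15 | 43 | 645 | 225 | 1849 |
| D | 9 | 74 | 666 | 81 | 5476 |
| E | 12 | 58 | 696 | 144 | 3364 |
| F | 5 | 90 | 450 | 25 | 8100 |
| G | 8 | 78 | 624 | 64 | 6084 |
| Sum ($n = 7$) | 57 | 511 | 3745 | 579 | 38993 |

Using the sums of the columns, compute:

$$ a = \frac{(511)(579) - (57)(3745)}{7(579) - (57)^2} = \frac{82404}{804} \approx 102.493 $$

and

$$ b = \frac{7(3745) - (57)(511)}{7(579) - (57)^2} = \frac{-2912}{804} \approx -3.622 $$

So

$$ \hat{y} = 102.493 - 3.622x $$

# 14.5.1: Relationship between correlation and slope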

The relationship is

$$ r = b \, \frac{s_x}{s_y} $$

where $s_x$ and $s_y$ are the standard deviations of the $x$ and $y$ datasets considered separately.
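
The calculations in this section can be checked numerically. The following is a minimal Python sketch, not part of the original text, that applies the column-sum formulas for the intercept and slope to the data of Example 14.3 and then compares the correlation coefficient computed directly with the value obtained from the slope and the two standard deviations. The function names (`least_squares_line`, `sample_sd`, `pearson_r`) are illustrative choices, and the sketch assumes the sample standard deviation (dividing by $n - 1$); the ratio $s_x / s_y$ is the same if both are computed with $n$ instead.

```python
from math import sqrt

# Data from Example 14.3 (the x and y columns of the table).
x = [6, 2, 15, 9, 12, 5, 8]
y = [82, 86, 43, 74, 58, 90, 78]


def least_squares_line(x, y):
    """Return (a, b) for the line y_hat = a + b*x using column sums."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return a, b


def sample_sd(values):
    """Sample standard deviation (divides by n - 1; an assumption here)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def pearson_r(x, y):
    """Correlation coefficient computed directly from the column sums."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


a, b = least_squares_line(x, y)
r = pearson_r(x, y)
r_from_slope = b * sample_sd(x) / sample_sd(y)

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"r = {r:.3f}, b * s_x / s_y = {r_from_slope:.3f}")
```

The two printed values of the correlation should agree up to floating-point rounding, which illustrates that the slope and the correlation coefficient always share the same sign.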