14. Correlation and Regression
14.5 Linear Regression
Linear regression finds the line through the scatter plot data that is best in the least squares sense. Let’s begin with the equation of a line:
$$y = a + bx$$
where $a$ is the intercept and $b$ is the slope.
The data, the collection of $(x, y)$ points, rarely lie on a perfect straight line in a scatter plot. So we write
$$\hat{y} = a + bx$$
as the equation of the best fit line. The quantity $\hat{y}$ is the value of $y$ predicted from the value of $x$, and $y$ is the measured value. Now consider a single data point $(x_i, y_i)$, whose predicted value is $\hat{y}_i = a + b x_i$.
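The difference between the measured and predicted value at data point $i$, $d_i = y_i - \hat{y}_i$, is the deviation. The quantity
$$d_i^2 = (y_i - \hat{y}_i)^2$$
is the squared deviation. The sum of the squared deviations is
$$E = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
where $n$ is the number of data points.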
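As a minimal sketch of the quantities just defined, the following Python function computes the deviation at each data point and accumulates the sum of the squared deviations for a candidate line. The function name `sum_squared_deviations`, the argument names, and the four data points are illustrative assumptions, not values taken from the text.

```python
def sum_squared_deviations(xs, ys, a, b):
    """Return the sum of (y_i - (a + b*x_i))**2 for the line y = a + b*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = a + b * x      # predicted value at this x
        deviation = y - predicted  # measured minus predicted
        total += deviation ** 2
    return total


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# A horizontal line at the mean of ys, and a line that follows the trend.
e_flat = sum_squared_deviations(xs, ys, a=4.75, b=0.0)
e_tilted = sum_squared_deviations(xs, ys, a=0.0, b=1.9)
print(e_flat, e_tilted)
```

The line that follows the trend of the points gives a much smaller sum than the horizontal line, which is the sense in which one line fits better than another.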
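The least squares solution for $a$ and $b$ is the solution that minimizes $E$, the sum of squares, over all possible selections of $a$ and $b$. Minimization problems of this kind are handled with differential calculus by setting the partial derivatives of $E$ to zero:
$$\frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0$$
Written out, these conditions are $\sum (y_i - a - b x_i) = 0$ and $\sum x_i (y_i - a - b x_i) = 0$, two simultaneous linear equations in $a$ and $b$. Their solution is
$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n \sum x^2 - (\sum x)^2}, \qquad b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$$
where $n$ is the number of data points and each sum runs over all of them.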
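The closed-form solution can be sketched in Python as follows. This is a minimal illustration that assumes the intercept is called `a` and the slope `b`; the function name `least_squares_line` and the data are hypothetical and are not the data of any example in the text.

```python
def least_squares_line(xs, ys):
    """Return (a, b), the intercept and slope of the least squares line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Column sums, as they would appear at the bottom of a hand-worked table.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    denom = n * sum_xx - sum_x ** 2  # zero when all x values are identical
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    a = (sum_y * sum_xx - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f} x")
```

For these four points the column sums are 10 for $x$, 19 for $y$, 30 for $x^2$, and 57 for $xy$, so the slope is $(4 \cdot 57 - 10 \cdot 19)/(4 \cdot 30 - 10^2) = 38/20 = 1.9$ and the intercept is $(19 \cdot 30 - 10 \cdot 57)/20 = 0$.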
Example 14.3: Continue with the data from Example 14.1 and find the best fit line. The data again are:
Using the sums of the columns, compute:
14.5.1 Relationship between correlation and slope
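The relationship between the correlation coefficient $r$ and the slope $b$ of the best fit line is
$$r = b \, \frac{s_x}{s_y}$$
where $s_x$ and $s_y$ are the standard deviations of the $x$ and $y$ datasets considered separately.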
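A quick numerical check of this relationship, as a sketch: the code below computes the correlation coefficient, the least squares slope, and the two sample standard deviations separately, then confirms that the slope equals the correlation times the ratio of the $y$ standard deviation to the $x$ standard deviation. The data and variable names are hypothetical. The slope is computed from sums about the means, which is algebraically equivalent to the column-sum formula, and `statistics.stdev` is the standard-library sample standard deviation (with $n - 1$ in the denominator).

```python
import math
import statistics

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

mean_x = statistics.mean(xs)
mean_y = statistics.mean(ys)

# Sums of squares and cross products about the means.
ss_xx = sum((x - mean_x) ** 2 for x in xs)
ss_yy = sum((y - mean_y) ** 2 for y in ys)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

r = ss_xy / math.sqrt(ss_xx * ss_yy)  # correlation coefficient
b = ss_xy / ss_xx                     # least squares slope
s_x = statistics.stdev(xs)            # standard deviation of x alone
s_y = statistics.stdev(ys)            # standard deviation of y alone

# Slope recovered from the correlation and the two standard deviations.
print(math.isclose(b, r * s_y / s_x))
```

Because the same $n - 1$ factor appears in both standard deviations, it cancels in the ratio, so the check also holds if population standard deviations are used for both.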