14. Correlation and Regression
14.2 Correlation
The correlation coefficient we will use here is called the “Pearson product moment correlation coefficient” and will be represented by the following symbols :
— population correlation
— sample correlation
The correlation is always a number between and : and . If (or ) equals 0 then that means there is no correlation between and . A minus sign means a minus slope, a plus sign means a positive slope.
The formula for is[1] :
(14.1)
Example 14.1 : Compute the correlation between and for the data on Section 14.1 used for the scatter plot.
Solution : To compute , first make a table, fill in the data columns (on the right of the double vertical line below), fill in the other computed columns, sum the columns and finally plug the sums into the formula for :
Subject | |||||
---|---|---|---|---|---|
A | 6 | 82 | 492 | 36 | 6724 |
B | 2 | 86 | 172 | 4 | 7396 |
C | 15 | 43 | 645 | 225 | 1849 |
D | 9 | 74 | 666 | 81 | 5476 |
E | 12 | 58 | 696 | 144 | 3364 |
F | 5 | 90 | 450 | 25 | 8100 |
G | 8 | 78 | 624 | 64 | 6084 |
Plug in the numbers :
Here there is a strong negative relationship between and . That is, as goes up, goes down with a fair degree of certainty. Note the is not the slope. All we know here, from the correlation coefficient, is that the slope is negative and the scatterplot ellipse is long and skinny.
▢
Standard warning about correlation and causation : If you find that and are highly correlated (i.e. is close to or ) then you cannot say that causes or that causes or that there is and causal relationship between and at all. In other words, it is true that if causes or that causes then will be correlated with but the reverse implication does not logically follow. So beware of looking for relations between variables by looking at correlation alone. Simply finding correlations by themselves doesn’t prove anything.
The significance of is assessed by a hypothesis test of
To test this hypothesis, you need to convert to via:
and use to find . The Pearson Correlation Coefficient Critical Values Table offers a shortcut and lists critical values that correspond to the critical values.
Example 14.2 : Given , and , test if is significant.
Solution :
1. Hypothesis.
2. Critical statistic.
From the t Distribution Table with and for a two-tailed test find
As a short cut, you can also look in the Pearson Correlation Coefficient Critical Values Table for , to find the corresponding
3. Test statistic.
4. Decision.
Using the :
or using the Pearson Correlation Coefficient Critical Values Table short cut :
we conclude that we can reject .
5. Interpretation. The correlation is statistically significant at .
▢
- The formula for is the same with all and in the population used. ↵