14. Correlation and Regression

14.2 Correlation

The correlation coefficient we will use here is called the “Pearson product moment correlation coefficient” and will be represented by the following symbols:

\rho — population correlation

r — sample correlation

The correlation is always a number between -1 and +1: -1 \leq r \leq +1 and -1 \leq \rho \leq +1. If r (or \rho) equals 0, then there is no linear correlation between x and y. A negative sign means a negative slope; a positive sign means a positive slope.

The formula for r is[1]:

(14.1)   \begin{equation*} r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n (\sum x^{2}) - (\sum x)^{2}][n (\sum y^{2}) - (\sum y)^{2}]}} \end{equation*}
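As a computational aid, here is a minimal Python sketch of equation 14.1, built only from the running sums that appear in the formula (the function name pearson_r is just for illustration):

```python
import math

def pearson_r(x, y):
    """Pearson r via the computational formula (14.1), using raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator
```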

Example 14.1 : Compute the correlation between x and y for the data from Section 14.1 that were used for the scatter plot.

Solution : To compute r, first make a table: fill in the data columns (x and y), fill in the other computed columns, sum the columns, and finally plug the sums into the formula for r:

Subject     x      y       xy     x^{2}     y^{2}
A           6     82      492        36      6724
B           2     86      172         4      7396
C          15     43      645       225      1849
D           9     74      666        81      5476
E          12     58      696       144      3364
F           5     90      450        25      8100
G           8     78      624        64      6084
n = 7   \sum x = 57   \sum y = 511   \sum xy = 3745   \sum x^{2} = 579   \sum y^{2} = 38993

Plug in the numbers:

    \begin{eqnarray*} r & = & \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n (\sum x^{2}) - (\sum x)^{2}][n (\sum y^{2}) - (\sum y)^{2}]}}\\ & = & \frac{7(3745) - (57)(511)}{\sqrt{[7 (579) - (57)^{2}][7 (38993) - (511)^{2}]}}\\ & = & -0.944 \end{eqnarray*}
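As a quick check (a sketch, assuming the NumPy library is available), the same value comes out of np.corrcoef applied to the data from the table:

```python
import numpy as np

x = [6, 2, 15, 9, 12, 5, 8]
y = [82, 86, 43, 74, 58, 90, 78]

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))  # -0.944, matching the hand calculation above
```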

Here there is a strong negative relationship between x and y. That is, as x goes up, y goes down with a fair degree of certainty. Note that r is not the slope; all we know from the correlation coefficient is that the slope is negative and that the scatterplot ellipse is long and skinny.

Standard warning about correlation and causation : If you find that x and y are highly correlated (i.e. r is close to +1 or -1), then you cannot say that x causes y, that y causes x, or that there is any causal relationship between x and y at all. In other words, it is true that if x causes y, or y causes x, then x will be correlated with y, but the reverse implication does not logically follow. So beware of looking for relationships between variables by looking at correlation alone; finding a correlation by itself proves nothing about causation.

The significance of r is assessed by a hypothesis test of

    \[ H_{0}: \rho = 0 \;\;\;\;\;\; H_{1}: \rho \neq 0 \]

To test this hypothesis, you need to convert r to t via:

    \[ t = r \sqrt{\frac{n-2}{1 - r^{2}}} \label{tcorrformula} \]

and use \nu = n-2 to find t_{\mbox{crit}}. The Pearson Correlation Coefficient Critical Values Table offers a shortcut and lists critical r values that correspond to the critical t values.
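A minimal sketch of this conversion in Python, assuming SciPy is available for the critical value (the helper names r_to_t and t_critical are just for illustration); in practice you would look these values up in the tables:

```python
import math
from scipy import stats

def r_to_t(r, n):
    """Convert a sample correlation r to a t statistic with nu = n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def t_critical(alpha, n):
    """Two-tailed critical t value for nu = n - 2 degrees of freedom."""
    return stats.t.ppf(1 - alpha / 2, df=n - 2)
```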

Example 14.2 : Given r = 0.897, n = 6 and \alpha = 0.05, test whether r is significant.

Solution:

1. Hypothesis. H_{0}: \rho = 0 \ \ \ H_{1}: \rho \neq 0

2. Critical statistic.

From the t Distribution Table with \nu = n - 2 = 6 - 2 = 4 and \alpha = 0.05 for a two-tailed test find

    \[ t_{\mbox{crit}} = \pm 2.776 \]

As a short cut, you can also look in the Pearson Correlation Coefficient Critical Values Table for \alpha = 0.05, \nu = 4 to find the corresponding

    \[ r_{\mbox{crit}} = \pm 0.811 \]

3. Test statistic.

    \[ t_{\mbox{test}} = r \sqrt{\frac{n-2}{1 - r^{2}}} = 0.897 \sqrt{\frac{6-2}{1 - (0.897)^{2}}} = 4.059 \]

4. Decision.

Using the t statistic: |t_{\mbox{test}}| = 4.059 > t_{\mbox{crit}} = 2.776, so the test statistic falls in the rejection region.

Or, using the Pearson Correlation Coefficient Critical Values Table short cut: |r| = 0.897 > r_{\mbox{crit}} = 0.811.

Either way, we conclude that we can reject H_{0}.

5. Interpretation. The correlation is statistically significant at \alpha = 0.05.
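Reusing the r_to_t and t_critical sketches from above, the decision in Example 14.2 can be checked numerically:

```python
r, n, alpha = 0.897, 6, 0.05

t_test = r_to_t(r, n)          # about 4.06
t_crit = t_critical(alpha, n)  # about 2.78

# Reject H0 when the test statistic falls beyond the critical value
print(abs(t_test) > t_crit)    # True, so reject H0
```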


  1. The formula for \rho is the same, but with all of the x and y values in the population used.
