14.1 Scatter Plots

Gordon E. Sarty

14. Correlation and Regression

You can make a scatter plot of your data when you have values for two or more variables for each subject. Here we will only be interested in the case where we have a pair of variables (a 2D plot).

Of the two variables, for application to regression, one will be an independent variable (IV) and the other a dependent variable (DV). The IV is usually a variable that is known with a high degree of precision (like age). The idea with regression (when we get to it) is to come up with a formula that allows you to predict what the DV will be if you know the IV. We will use the symbol $x$ for the IV and $y$ for the DV.

The best way to understand a scatter plot is to make one. With the data:

Student	No. of absences, $x$	grade, $y$
A	6	82
B	2	86
C	15	43
D	9	74
E	12	58
F	5	90
G	8	78

the scatter plot is:
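As a rough way to reproduce such a plot from the table, here is a minimal text-only sketch in Python. The function name `text_scatter` and the grid size are assumptions made for this illustration, not part of the original text. It uses only the standard library so that it runs anywhere; if a plotting library such as matplotlib is available, `plt.scatter(absences, grades)` gives a graphical version.

```python
# Data from the table above: absences (x) and grade (y) for students A-G.
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]


def text_scatter(x, y, width=31, height=11):
    """Return a plain-text scatter plot of the (x, y) pairs as a list of rows.

    Hypothetical helper for illustration; assumes x and y each contain
    at least two distinct values.
    """
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Scale each value to a column/row index; row 0 is the top of the plot.
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - yi) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    return ["".join(row) for row in grid]


rows = text_scatter(absences, grades)
for row in rows:
    print("|" + row)          # vertical axis: grade, y
print("+" + "-" * len(rows[0]))  # horizontal axis: absences, x
```

The printed grid shows only the seven data points, falling from the upper left (few absences, high grade) to the lower right (many absences, low grade). It does not include the eyeballed line drawn on the original figure.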

A couple of things to notice in the plot:

1. An eyeballed best-fit line has been drawn through the scatter plot points. With regression we will calculate exactly what that best-fit line is.
2. If $x$ and $y$ are linearly related, the points will fall roughly inside an ellipse. If the ellipse is long and skinny, $x$ and $y$ are said to be highly correlated. If the ellipse is more like a circle, then $x$ and $y$ are not correlated.

By looking at a scatter plot you can judge whether $x$ and $y$ are linearly related. If your scatter plot looks like:

then you could conclude that $x$ and $y$ are not linearly related, and it will not make much sense to try to fit a line through the data or to compute a correlation coefficient.
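To put a number on "long and skinny" versus "not linearly related", the sketch below computes the Pearson correlation coefficient, which is treated properly later in this chapter. The helper `pearson_r` is a hypothetical name used only here. The curved data set is also invented for illustration (it is not the data behind the figure above); it was chosen because $y = x^2$ on symmetric $x$ values is a clear case of a strong but non-linear relationship.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of the paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


# Absences and grades from the table in this section.
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]
r_linear = pearson_r(absences, grades)

# Hypothetical curved data (not from the text): y = x^2 on symmetric x values.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [xi ** 2 for xi in curve_x]
r_curved = pearson_r(curve_x, curve_y)

print(f"absences vs. grade: r = {r_linear:.3f}")
print(f"curved example:     r = {r_curved:.3f}")
```

For the absences data the coefficient is close to $-1$ (about $-0.94$), matching the long, skinny, downward-sloping cloud of points. For the curved example it is zero even though $y$ is completely determined by $x$, which is why it pays to look at the scatter plot before fitting a line or quoting a correlation coefficient.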

License

Introduction to Applied Statistics for Psychology Students Copyright © 2022 by Gordon E. Sarty is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
