### The Correlation Coefficient r

Besides looking at the scatter plot and seeing that a line seems reasonable, how can you tell if the line is a good predictor?

Use the correlation coefficient as another indicator (besides the scatter plot) of the strength of the relationship between x and y.

The correlation coefficient, r, developed by Karl Pearson in the early 1900s, is a numerical measure of the strength and direction of the linear association between the independent variable x and the dependent variable y.
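The passage describes r without giving a formula. For reference, one standard way to write the sample correlation coefficient is shown below. The notation is an assumption made here, not taken from the text: the sums run over the n data pairs (x_i, y_i), and \bar{x} and \bar{y} denote the sample means of x and y.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator is positive when x and y tend to lie on the same side of their means and negative when they tend to lie on opposite sides, which is what gives r its direction.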

### What the VALUE of r tells us

• The value of r is always between –1 and +1: –1 ≤ r ≤ 1.

• The size of the correlation r indicates the strength of the linear relationship between x and y. Values of r close to –1 or to +1 indicate a stronger linear relationship between x and y.

• If r = 0, there is no linear relationship between x and y (no linear correlation).

• If r = 1, there is a perfect positive correlation. If r = –1, there is a perfect negative correlation. In both these cases, all of the original data points lie on a straight line. Of course, in the real world, this will not generally happen.
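The boundary cases in the list above can be checked numerically. The following is a minimal sketch, assuming the usual sample formula for r; the function name `pearson_r` and the small data sets are made up for illustration and do not come from the text.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]

# All points on a rising line: perfect positive correlation
r_up = pearson_r(xs, [2 * x + 1 for x in xs])

# All points on a falling line: perfect negative correlation
r_down = pearson_r(xs, [10 - 3 * x for x in xs])

# Points on y = x^2, symmetric about x = 0: no linear correlation
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_up, r_down, r_curve)
```

In the third data set, y is completely determined by x, yet r is 0. This illustrates that r measures only the linear part of a relationship.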

### What the SIGN of r tells us

• A positive value of r means that when x increases, y tends to increase, and when x decreases, y tends to decrease (positive correlation).

• A negative value of r means that when x increases, y tends to decrease, and when x decreases, y tends to increase (negative correlation).

• The sign of r is the same as the sign of the slope, b, of the best-fit line.
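The last point follows from the fact that the least-squares slope and r share the same numerator. The sketch below assumes the standard formulas b = Sxy / Sxx and r = Sxy / sqrt(Sxx · Syy); the helper name `slope_and_r` and the data values are hypothetical, chosen only to show one rising and one falling pattern.

```python
import math

def slope_and_r(xs, ys):
    """Return (b, r): least-squares slope and correlation coefficient (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx                   # slope of the best-fit line
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return b, r

xs = [1, 2, 3, 4, 5]

# y tends to increase with x
b_pos, r_pos = slope_and_r(xs, [2, 4, 5, 4, 6])

# y tends to decrease as x increases
b_neg, r_neg = slope_and_r(xs, [9, 7, 8, 5, 4])

print(b_pos, r_pos)
print(b_neg, r_neg)
```

Because Sxx and the square root are always positive, the sign of both b and r is decided by Sxy alone, so the two can never disagree.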

### The Coefficient of Determination

The quantity r^{2} is called the coefficient of determination. It is the square of the correlation coefficient, but it is usually stated as a percent rather than in decimal form. It has an interpretation in the context of the data:

• r^{2}, when expressed as a percent, represents the percent of variation in the dependent (predicted) variable y that can be explained by variation in the independent (explanatory) variable x using the regression (best-fit) line.

• 1 – r^{2}, when expressed as a percentage, represents the percent of the variation in y that is NOT explained by variation in x using the regression line. This can be seen as the scattering of the observed data points about the regression line.
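The two interpretations above can be checked against a direct calculation of explained variation. This is a minimal sketch with made-up data, assuming a least-squares line ŷ = a + bx; the variable names (`sse`, `sst`, `explained`) are chosen here for illustration. It compares r^{2} with 1 – SSE/SST, where SSE is the sum of squared residuals about the line and SST is the total sum of squared deviations of y about its mean.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]   # hypothetical data for illustration

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Least-squares line y-hat = a + b*x
b = sxy / sxx
a = mean_y - b * mean_x

# Correlation coefficient and coefficient of determination
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Variation about the line (unexplained) versus total variation in y
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sst = syy
explained = 1 - sse / sst

print(f"r^2           = {r_squared:.1%}")
print(f"1 - SSE/SST   = {explained:.1%}")
print(f"unexplained   = {1 - r_squared:.1%}")
```

For a least-squares line, the two quantities agree, which is why r^{2} can be read as the share of variation in y accounted for by the line and 1 – r^{2} as the share left in the scatter around it.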