UNDERSTANDING CORRELATION COEFFICIENT R

The Correlation Coefficient r

A scatter plot can show that a line seems reasonable, but how can you tell whether the line is a good predictor?
The correlation coefficient gives a second, numerical indicator of the strength of the relationship between x and y.
The correlation coefficient, r, developed by Karl Pearson in the early 1900s, measures the strength and direction of the linear association between the independent variable x and the dependent variable y.
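
As a minimal sketch of how r can be computed by hand, the function below uses the standard formula: the sum of the products of the deviations of x and y from their means, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the data (hours studied and exam scores) are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
r = pearson_r(hours, scores)
print(round(r, 3))  # prints 0.981
```

Here r is close to +1, which suggests a strong positive linear association in this made-up data set.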

What the VALUE of r tells us:

• The value of r is always between –1 and +1: –1 ≤ r ≤ 1.

• The magnitude of r indicates the strength of the linear relationship between x and y. Values of r close to –1 or +1 indicate a stronger linear relationship between x and y; values close to 0 indicate a weaker one.

• If r = 0, there is no linear relationship between x and y (no linear correlation), although x and y may still be related in a nonlinear way.

• If r = 1, there is a perfect positive correlation. If r = –1, there is a perfect negative correlation. In both these cases, all of the original data points lie on a straight line. Of course, in the real world, this will not generally happen.
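
The special values in the list above can be checked with a short sketch. It assumes a hypothetical helper pearson_r (written out here so the example is self-contained) and three made-up data sets: points on a rising line, points on a falling line, and points on a parabola that is symmetric about x = 0.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # all points on a rising line
r_down = pearson_r(xs, [-3 * x + 4 for x in xs])  # all points on a falling line
r_curve = pearson_r(xs, [x ** 2 for x in xs])     # all points on a parabola

print(r_up, r_down, r_curve)  # prints 1.0 -1.0 0.0
```

The parabola case is worth noting: y is completely determined by x, yet r = 0, because r measures only the linear part of the association.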

What the SIGN of r tells us:

• A positive value of r means that when x increases, y tends to increase, and when x decreases, y tends to decrease
(positive correlation).

• A negative value of r means that when x increases, y tends to decrease, and when x decreases, y tends to increase
(negative correlation).

• The sign of r is the same as the sign of the slope, b, of the best-fit line.
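
One way to see why r and the slope b share a sign: for the least-squares line, b is the sum of cross-products of the deviations divided by the sum of squared x deviations, and r has the same numerator over a positive denominator. The sketch below illustrates this; the function name slope_and_r and both data sets are hypothetical.

```python
import math

def slope_and_r(xs, ys):
    """Return (b, r): least-squares slope and correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx                    # slope of the best-fit line
    r = sxy / math.sqrt(sxx * syy)   # same numerator, positive denominator
    return b, r

xs = [1, 2, 3, 4, 5]
b_up, r_up = slope_and_r(xs, [2, 4, 5, 4, 6])      # y tends to rise with x
b_down, r_down = slope_and_r(xs, [9, 7, 8, 5, 4])  # y tends to fall with x

print(b_up > 0 and r_up > 0)      # prints True
print(b_down < 0 and r_down < 0)  # prints True
```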

The Coefficient of Determination

The quantity r² is called the coefficient of determination. It is the square of the correlation coefficient, and it is usually stated as a percent rather than as a decimal. It has an interpretation in the context of the data:

• r², when expressed as a percent, represents the percent of variation in the dependent (predicted) variable y that can be explained by variation in the independent (explanatory) variable x using the regression (best-fit) line.

• 1 – r², when expressed as a percent, represents the percent of the variation in y that is NOT explained by variation in x using the regression line. This can be seen as the scattering of the observed data points about the regression line.
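
The "explained variation" reading can be illustrated with a small sketch. It fits a least-squares line to hypothetical data (hours studied and exam scores, invented for this example), then compares r² with 1 minus the ratio of the residual sum of squares (scatter about the line) to the total sum of squares (scatter about the mean of y). The variable names sse and sst are this example's own labels.

```python
import math

# Hypothetical data: hours studied (x) and exam score (y).
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]

mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
sst = sum((y - my) ** 2 for y in ys)  # total variation in y

b = sxy / sxx    # slope of the best-fit line
a = my - b * mx  # intercept of the best-fit line
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation

r = sxy / math.sqrt(sxx * sst)
r_squared = r ** 2

print(f"{r_squared:.1%}")                      # prints 96.3%
print(math.isclose(r_squared, 1 - sse / sst))  # prints True
```

In this made-up data set, about 96.3% of the variation in scores is explained by the regression line, and the remaining 1 – r², about 3.7%, shows up as scatter of the points about the line.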
