3.6 Covariance, Correlation, and the Least Squares Regression Line
- Key Idea 1: Finding the least squares best-fitting line by trial-and-error is laborious; we need a shortcut...
- Key Idea 2: Developing this shortcut uses a new statistic - covariance - that measures how two variables change with respect to each other...
Discussion
- If the slope of the regression line is positive, then as one variable increases, so does the other. Such data have a positive covariance.
- If the slope of the regression line is negative, then as one variable increases, the other decreases. Such data have a negative covariance.
- The covariance of X and Y is the average of the products of the deviations of X and Y from their respective means.
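The definition of covariance can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `covariance` and the small data set are made up for this example, and it divides by n (a plain average, as in the definition here) rather than by n - 1, which some textbooks and software packages use instead.

```python
def covariance(xs, ys):
    """Average of the products of the deviations from the means (divides by n)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Multiply each X deviation by the matching Y deviation, then average.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical data, chosen only so the arithmetic is easy to check by hand.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov_xy = covariance(xs, ys)                       # Y tends to rise with X
cov_reversed = covariance(xs, list(reversed(ys)))  # Y tends to fall as X rises

print(round(cov_xy, 4), round(cov_reversed, 4))
```

With these numbers the first covariance is positive and the second is negative, matching the two cases in the discussion.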
Example Lab Exercise
- Key Idea 3: The formula for the slope of the least squares regression line is (covariance of X and Y) / (variance of X)...
Discussion Recall that the least squares line passes through the point of means $(\bar{X}, \bar{Y})$. With the slope $m$ known, the intercept follows as $C = \bar{Y} - m\bar{X}$. Hence, we now have an exact formula for the regression line $Y' = mX + C$.
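As a rough sketch of how the slope formula and the point of means combine, the snippet below computes m and C directly. The function name `least_squares_line` and the data are assumptions made for illustration. Both the covariance and the variance divide by n here; using n - 1 for both would give the same slope, because the factor cancels in the ratio.

```python
def least_squares_line(xs, ys):
    """Return (m, c) for the line Y' = m*X + c fitted by least squares."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    if var_x == 0:
        raise ValueError("X has zero variance, so the slope is undefined")
    m = cov_xy / var_x        # slope = covariance of X and Y / variance of X
    c = mean_y - m * mean_x   # the line must pass through the point of means
    return m, c


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, c = least_squares_line(xs, ys)
print(round(m, 4), round(c, 4))
```

For this data the slope works out to about 0.6 and the intercept to about 2.2, so the fitted line is roughly Y' = 0.6X + 2.2.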
Example
- Key Idea 4: The correlation coefficient measures the strength of the linear relationship between two variables...
Discussion
- Denoted by r or ρ (the Greek letter rho), it is defined so that its values lie between -1 and 1.
- The sign of r (positive or negative) is the same as the sign of the slope of the least squares best-fitting line for the data.
- The correlation is really a rescaling of the covariance (division by the product of the standard deviations, $S_x S_y$); the formula is
$$r = \frac{\mathrm{cov}(X, Y)}{S_x S_y}$$
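The rescaling of the covariance by the two standard deviations can be sketched as follows. The function name `correlation` and the data are hypothetical, and the covariance and standard deviations all divide by n; any consistent choice (n or n - 1 throughout) gives the same r.

```python
import math


def correlation(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov_xy / (s_x * s_y)


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_perfect = correlation(xs, [2 * x + 1 for x in xs])   # points exactly on a rising line
r_inverse = correlation(xs, [10 - 3 * x for x in xs])  # points exactly on a falling line

print(round(r, 4), round(r_perfect, 4), round(r_inverse, 4))
```

Points lying exactly on a line give r of 1 or -1 (up to floating-point rounding), with the sign matching the slope; scattered data falls strictly between those limits.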
Example 1, Example 2
- Key Idea 5: Once you have calculated a least-squares regression line, how well does it fit the data? We can measure this with r-squared (the square of the correlation coefficient)...
Discussion
- R-squared measures what proportion of the overall variation in Y is explained by the regression line.
- We start with the error variance, which is the mean squared estimation error: the average of the squared differences between each observed Y and the value Y' predicted by the line.
- Next, look at the ratio of the error variance to the ordinary variance of Y, to see what proportion of the overall variation remains unexplained.
- R-squared, the proportion of variation explained by the regression line, is just 1 minus this ratio: $R^2 = 1 - \dfrac{\text{error variance}}{\text{variance of } Y}$.
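The steps above can be traced in code. This is a sketch under stated assumptions: the function name `r_squared` and the data are invented for illustration, the line is fitted with slope = covariance / variance of X, and every average divides by n.

```python
def r_squared(xs, ys):
    """1 minus (error variance / variance of Y) for the least squares line."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("r-squared is undefined when a variable is constant")
    m = cov_xy / var_x
    c = mean_y - m * mean_x
    # Error variance: mean squared difference between observed and predicted Y.
    error_variance = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / n
    unexplained = error_variance / var_y   # proportion left unexplained
    return 1 - unexplained


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2 = r_squared(xs, ys)
print(round(r2, 4))
```

For this data the error variance is about 0.48 and the variance of Y is 1.2, so about 40% of the variation is unexplained and R-squared is about 0.6, which is also the square of the correlation coefficient for the same data.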
- Key Idea 6: Most often, you will calculate regressions and correlations using software. This section shows how to do so in Excel and how to interpret the output...