What is correlation?
When two variables tend to vary together, statisticians say that they covary, or are correlated. The correlation coefficient, r, quantifies the direction and magnitude of that correlation. It ranges from -1 to +1: the sign gives the direction, and the absolute value gives the strength.
Correlation is used when you have measured both the X and Y variables. It is not appropriate if X is a variable you manipulate.
The correlation analysis reports the value of the correlation coefficient. It does not create a regression line. If you want a best-fit line, choose linear regression.
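As a rough illustration of what the coefficient reports, the sketch below computes a Pearson r by hand for two small data sets. The function name `pearson_r` and the sample values are made up for this example, and the code assumes the usual product-moment definition of r. It is not meant to show how any particular program computes it.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each point from its variable's mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # co-variation of X and Y
    sxx = sum(a * a for a in dx)              # variation of X alone
    syy = sum(b * b for b in dy)              # variation of Y alone
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements (illustrative values only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2.1, 3.9, 6.2, 8.1, 9.8]    # tends to increase with x
falling = [9.8, 8.1, 6.2, 3.9, 2.1]   # tends to decrease with x

r_up = pearson_r(x, rising)     # close to +1
r_down = pearson_r(x, falling)  # close to -1
print(f"r (rising)  = {r_up:.4f}")
print(f"r (falling) = {r_down:.4f}")
```

The sign of r shows the direction of the relationship, and its absolute value shows how tightly the points track each other. No line is fitted at any step.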
Correlation vs. linear regression
Correlation and linear regression are not the same. Consider these differences:
| • | Correlation quantifies the degree to which two variables are related. Correlation does not find a best-fit line. You are simply computing a correlation coefficient (r) that tells you how much one variable tends to change when the other one does. |
| • | With correlation, you don't have to think about cause and effect. You simply quantify how well two variables relate to each other. With regression, you do have to think about cause and effect, because the regression line is determined as the best way to predict Y from X. |
| • | With correlation, it doesn't matter which of the two variables you call "X" and which you call "Y". You will get the same correlation coefficient if you swap the two. With linear regression, the decision of which variable you call "X" and which you call "Y" matters a lot, as you will get a different best-fit line if you swap the two. The line that best predicts Y from X is not the same as the line that best predicts X from Y. |
| • | Correlation is almost always used when you measure both variables. It is rarely appropriate when one variable is something you experimentally manipulate. With linear regression, the X variable is often something you experimentally manipulate (time, concentration...) and the Y variable is something you measure. |
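The point about swapping X and Y can be checked numerically. The sketch below uses made-up data and hypothetical helper names (`sums_of_squares`, `pearson_r`, `ls_slope`), and assumes ordinary least-squares regression. It shows that r is unchanged when the variables are swapped, while the line that predicts Y from X differs from the line that predicts X from Y.

```python
import math

def sums_of_squares(xs, ys):
    """Return (sxx, syy, sxy), the sums of squared and cross deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def pearson_r(xs, ys):
    """Correlation coefficient; symmetric in its two arguments."""
    sxx, syy, sxy = sums_of_squares(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def ls_slope(predictor, response):
    """Least-squares slope of the line predicting `response` from `predictor`."""
    s_pp, _, s_pr = sums_of_squares(predictor, response)
    return s_pr / s_pp

# Hypothetical paired measurements (illustrative values only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)          # same value: the labels do not matter

slope_y_on_x = ls_slope(x, y)   # line that best predicts Y from X
slope_x_on_y = ls_slope(y, x)   # line that best predicts X from Y

# Drawn on the same Y-versus-X axes, the X-from-Y line has slope 1 / slope_x_on_y
slope_x_on_y_same_axes = 1.0 / slope_x_on_y

print(f"r(x, y) = {r_xy:.3f}, r(y, x) = {r_yx:.3f}")
print(f"Y-from-X slope = {slope_y_on_x:.3f}")
print(f"X-from-Y slope on the same axes = {slope_x_on_y_same_axes:.3f}")
```

The two regression lines coincide only when the points fall exactly on a straight line (r equal to +1 or -1). Otherwise the product of the two slopes equals r squared, which is less than 1.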