Saturday, December 12, 2015

Remember that removing an outlier can greatly affect the correlation between two variables.

Suppose x and y take the following values:

x   1   1   1   2   2   2   3   3   3   1
y   8   9  10   8   9  10   8   9  10  10

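This data has a weak negative correlation, r = -0.9/6.9 ≈ -0.13. But if you remove the repeated data point (1,10) in the last column, the nine points that remain form a complete 3-by-3 block (every combination of x in {1, 2, 3} with y in {8, 9, 10}), and the correlation coefficient drops to exactly 0.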
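To check these numbers, you can compute Pearson's r directly. The Python sketch below uses a small hand-written helper, `pearson` (a name chosen here for illustration, not a library function), so it runs on any Python 3 version. On Python 3.10 or later, `statistics.correlation` should give the same values.

```python
import math

# Data from the table above; the repeated point (1, 10) is the last pair.
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 1]
y = [8, 9, 10, 8, 9, 10, 8, 9, 10, 10]


def pearson(xs, ys):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


r_all = pearson(x, y)
# Drop the last pair, (1, 10), leaving the full 3-by-3 grid of points.
r_grid = pearson(x[:-1], y[:-1])

print(f"with (1,10):    r = {r_all:.4f}")   # with (1,10):    r = -0.1304
print(f"without (1,10): r = {r_grid:.4f}")  # without (1,10): r = 0.0000
```

In the 3-by-3 grid every x value is paired with every y value equally often, so the cross-deviation sum Sxy is exactly 0. The extra (1,10) adds a point with below-average x and above-average y, which pulls Sxy negative.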

