Thursday, April 13, 2023

Spearman versus Pearson correlation: which is better?

The classic correlation is the Pearson formulation, a measure of linear correlation between two variables: the ratio of their covariance to the product of their standard deviations, cov(X, Y)/[stdev(X)*stdev(Y)]. A nice property of this correlation is that it is insensitive to linear transformations of location and scale. If X becomes a+bX and Y becomes c+dY, with b and d both > 0, the correlation will be the same. If you want to give this measure different interpretations, try the nice paper “Thirteen Ways to Look at the Correlation Coefficient”. It is a true workhorse for any investment analysis. So why mess with something that is easy to calculate and that everyone uses?
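As a rough illustration of the formula and the invariance property described above, here is a minimal sketch in plain Python. The function name `pearson`, the sample data, and the constants a=10, b=3, c=-5, d=0.5 are all made up for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation: cov(X, Y) / [stdev(X) * stdev(Y)]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.5]

r_original = pearson(x, y)

# Shift and rescale both variables: X -> a + bX, Y -> c + dY with b, d > 0.
x_shifted = [10 + 3 * v for v in x]
y_shifted = [-5 + 0.5 * v for v in y]
r_transformed = pearson(x_shifted, y_shifted)

print(r_original, r_transformed)
```

The two printed values should agree up to floating-point rounding, since location and scale changes with positive slopes leave the correlation unchanged.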

The answer is in the definition. It is a linear measure. If the relationship between X and Y is non-linear, you will either not capture it or you will get a false interpretation. Hence, there is a reason to look at the Spearman rho correlation. Spearman measures rank correlation, a nonparametric measure of the association between the ranks of two variables. It is looking for a monotone relationship. The Spearman correlation is the same as the Pearson correlation computed on the ranks. If one of the variables is ordinal, then use Spearman. If the distribution is non-normal and/or has some extreme values, then use Spearman. If the Spearman correlation is noticeably greater than the Pearson correlation, this suggests the relationship is monotonic but not linear. Making the relationship linear may require a transformation of one or both variables.
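To make the rank idea concrete, here is a small sketch that computes Spearman as the Pearson correlation of the ranks and applies it to a monotone but non-linear relationship. The helper names (`average_ranks`, `spearman`) and the choice of y = exp(x) as sample data are assumptions for illustration only; ties are given the average of their ranks, which is the usual convention.

```python
import math

def pearson(x, y):
    """Pearson correlation: cov(X, Y) / [stdev(X) * stdev(Y)]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical monotone, non-linear relationship: y grows exponentially in x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(v) for v in x]

rho = spearman(x, y)                            # perfect monotone relationship
r = pearson(x, y)                               # understates it: not linear
r_log = pearson(x, [math.log(v) for v in y])    # log transform restores linearity

print(rho, r, r_log)
```

In this made-up case Spearman is higher than Pearson, and taking the log of y brings the Pearson correlation back up, which is the kind of transformation the paragraph above has in mind.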

A good practice is to compute both, especially if one or both variables have outliers that make the relationship look non-linear.

