Sunday, October 24, 2021

Hampel outlier identification - Reducing the sensitivity of standard deviation


Outlier identification gives investors a systematic way to find prices or returns that are abnormal. Outliers are a reality, but they also distort time series and cross-market correlations, which may lead to erroneous investment conclusions.

There is the general view that data should never be thrown out and data should not be adjusted for outliers, but reality is more complex. Smooth data and you will soon be explaining away all periods of stress and crisis. Leave data untouched and you may have price series distorted for years. There is no easy answer, yet looking at the differences between smoothed versus rough data is a core part of the job of a good data analyst.

The first task is to identify outliers. Instead of using standard deviations, an alternative is the Hampel identifier, which is built on the median absolute deviation (MAD). The median is tracked over a rolling window, and observations that fall outside a band around it are flagged as outliers. For a Gaussian distribution, the MAD multiplied by a scale factor of about 1.4826 approximates the standard deviation. A common rule flags observations that lie more than 3 adjusted MADs from the median. Because the median and the MAD are robust to extreme values, this measure is far less distorted by the outliers themselves than a traditional mean and standard deviation tool.
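As a minimal sketch of the identifier described above, the snippet below compares a 3-sigma rule with a 3-adjusted-MAD rule on a small sample. The return values, function names, and the exact constant 1.4826 (the usual fuller form of the roughly 1.48 adjustment) are illustrative assumptions, not data or code from any particular source.

```python
from statistics import mean, median, pstdev

MAD_SCALE = 1.4826  # assumed constant: makes the MAD comparable to sigma for Gaussian data


def scaled_mad(values):
    """Median absolute deviation, scaled to approximate a Gaussian standard deviation."""
    center = median(values)
    return MAD_SCALE * median([abs(v - center) for v in values])


def hampel_flags(values, n_sigmas=3.0):
    """Flag points more than n_sigmas adjusted MADs away from the median."""
    center = median(values)
    spread = scaled_mad(values)
    return [abs(v - center) > n_sigmas * spread for v in values]


def sigma_flags(values, n_sigmas=3.0):
    """Flag points more than n_sigmas standard deviations away from the mean."""
    center = mean(values)
    spread = pstdev(values)
    return [abs(v - center) > n_sigmas * spread for v in values]


# Hypothetical daily returns in percent, with one extreme observation at the end.
returns = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 8.0]

mad_flags = hampel_flags(returns)  # only the 8.0 return is flagged
sd_flags = sigma_flags(returns)    # nothing is flagged
```

In this made-up sample the extreme return inflates both the mean and the standard deviation enough to hide itself from the 3-sigma rule, while the median and MAD barely move and the Hampel rule still catches it.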

Looking at an index like the SPX will show periods of market stress as measured by outliers. A Hampel analysis of returns will have a different interpretation than one applied to prices, but in either case it is a simple, easy-to-calculate tool that helps isolate data issues.


We can take a time series and create an envelope around it using a rolling MAD value. Outliers can then be identified and replaced. The usual replacement is the rolling median. Outlier detection can also be useful for mean reversion trading and for focusing on announcement event dates associated with outliers.
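The envelope and replacement step can be sketched as follows. This is one possible version under stated assumptions: a trailing window of 5 observations that includes the current point (so there is no look-ahead), a 3-adjusted-MAD band, and a hypothetical price series. A centered window is also common, and the function name and defaults here are illustrative only.

```python
from statistics import median


def rolling_hampel(series, window=5, n_sigmas=3.0, scale=1.4826):
    """Return (cleaned, flags, bands) using a trailing rolling median and MAD.

    bands[i] is the (lower, upper) envelope at i, or None until a full
    window of observations is available.
    """
    cleaned = list(series)
    flags = [False] * len(series)
    bands = [None] * len(series)
    for i in range(window - 1, len(series)):
        chunk = series[i - window + 1 : i + 1]  # trailing window of raw data
        center = median(chunk)
        spread = scale * median([abs(x - center) for x in chunk])
        bands[i] = (center - n_sigmas * spread, center + n_sigmas * spread)
        if abs(series[i] - center) > n_sigmas * spread:
            flags[i] = True
            cleaned[i] = center  # replace the outlier with the rolling median
    return cleaned, flags, bands


# Hypothetical prices with a single bad print at index 5.
prices = [100.0, 101.0, 100.5, 101.5, 100.8, 130.0, 101.2, 100.9, 101.1, 100.7]

cleaned, flags, bands = rolling_hampel(prices)
outlier_positions = [i for i, flagged in enumerate(flags) if flagged]
```

The flagged positions can be used on their own, for example as candidate dates to check against announcements, without ever applying the replacement. Note that a perfectly flat window gives a MAD of zero, so any move off the median would be flagged; a minimum spread may be worth adding in practice.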

There is value in finding outliers and adapting to market extremes. No single approach is best, but outlier analysis can be helpful for effective data reviews.
