Thursday, March 21, 2024

Maximal information coefficient - another way of finding data relationships



I have been disappointed with classic measures of correlation like the Pearson correlation. I have looked at Spearman rank correlation, which can detect monotonic non-linear relationships, but I would like simple measures that are more informative. Of course, there is the simplest method of all, visual inspection, but that can be very subjective.

I have recently come across the Maximal Information Coefficient (MIC), the flagship statistic of the MINE family (Maximal Information-based Nonparametric Exploration).

The MIC is a way of detecting dependence between two variables that may not be captured by correlation. It works by overlaying grids of bins on the scatterplot of the two variables and measuring how much each grid reduces uncertainty, that is, how much mutual information it captures. The resulting score lies between 0 and 1 and, for functional relationships, tends to be close to the coefficient of determination. It has roots in entropy and information theory, so it is closely tied with ideas driving advances in machine learning.
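To make the grid idea concrete, here is a minimal sketch in Python. The names (mic_sketch, mutual_info) are my own, and it is a simplified version: it searches only equal-frequency grids and assumes continuous data without ties, rather than using the full grid optimization from the original MINE paper.

import numpy as np

def mutual_info(counts):
    """Mutual information (in nats) of a 2-D contingency table."""
    n = counts.sum()
    pxy = counts / n
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))

def mic_sketch(x, y):
    """Simplified MIC: search equal-frequency grids with at most n**0.6 cells."""
    n = len(x)
    B = int(n ** 0.6)
    best = 0.0
    for nx in range(2, B // 2 + 1):
        for ny in range(2, B // nx + 1):
            # equal-frequency bin edges via quantiles (assumes continuous data)
            xe = np.quantile(x, np.linspace(0, 1, nx + 1))
            ye = np.quantile(y, np.linspace(0, 1, ny + 1))
            counts, _, _ = np.histogram2d(x, y, bins=[xe, ye])
            # normalise by log(min(nx, ny)) so the score lies in [0, 1]
            best = max(best, mutual_info(counts) / np.log(min(nx, ny)))
    return best

# noisy parabola: Pearson r is near 0, but mic_sketch should score it highly
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = x ** 2 + rng.normal(0, 0.05, 500)
print(round(mic_sketch(x, y), 2))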

An important component is the relative entropy: the mutual information behind MIC is the relative entropy (KL divergence) between the joint distribution of the two variables and the product of their marginals. Shannon information measures the amount of uncertainty or surprise in a random variable, so as the probability of an event increases there is less surprise. Hence, the MIC looks for grid patterns that lower the entropy, reducing uncertainty about one variable given the other, beyond what a linear relationship would explain.
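A small numeric illustration of that identity (the joint table below is made up for demonstration):

import numpy as np

# toy joint distribution of two binary variables (illustrative numbers only)
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px = pxy.sum(axis=1, keepdims=True)   # marginal of X
py = pxy.sum(axis=0, keepdims=True)   # marginal of Y

# mutual information = relative entropy (KL divergence) between the
# joint distribution and the product of the marginals
mi = np.sum(pxy * np.log(pxy / (px * py)))
print(mi)  # > 0 because X and Y are dependent; 0 would mean independence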

The full specification can be found in software implementations. I will not present the actual formula here, other than to say that it gives another interpretation of how two variables X and Y may relate.
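One well-known implementation is the third-party minepy package. A sketch of how it is typically used, assuming minepy is installed (exact defaults may vary by version):

import numpy as np
from minepy import MINE  # assumes the minepy package is installed

rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, 1000)
y = np.cos(3 * np.pi * x) + rng.normal(0, 0.1, 1000)  # strong nonlinear link

mine = MINE(alpha=0.6, c=15)  # grid-size parameters suggested in the paper
mine.compute_score(x, y)
print("MIC:    ", mine.mic())               # should be near 1 for this pattern
print("Pearson:", np.corrcoef(x, y)[0, 1])  # near 0: linear measure misses it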


 
