Thursday, March 14, 2024

SHAP and explainable AI - Getting to know your models

ML models are harder to understand than classic regression analysis. Some fear that ML is often a black box, but there are ways to make these models more transparent and interpretable. The most widely used tool for explainable AI is SHAP (SHapley Additive exPlanations), which uses game theory to measure each player's, or in this case each feature's, contribution to the outcome.

Each feature is assigned an importance value which represents its contribution to the model's output. Features with a positive (negative) SHAP value have a positive (negative) impact on the prediction. SHAP values are additive, so the individual feature contributions can be summed to recover the final prediction. While SHAP values can tell us each feature's contribution to the prediction, they cannot tell us about the quality of the model.
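
As a rough sketch of the additivity property (assuming the Python shap and scikit-learn packages are installed; the data and model below are made up for illustration, not anything from an actual forecast), the base value plus the per-feature SHAP values should reproduce the model's prediction up to numerical error:

# Minimal sketch of SHAP additivity; data and model are illustrative only.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.Explainer(model, X)   # dispatches to a suitable explainer for the model
sv = explainer(X[:10])                 # SHAP values for the first 10 rows

# Additivity: base value + sum of per-feature SHAP values ~ model prediction
reconstructed = sv.base_values + sv.values.sum(axis=1)
print(np.abs(reconstructed - model.predict(X[:10])).max())   # should be close to zero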

Given the properties of SHAP values, there are several ways to display their information. They can be displayed as a waterfall graph, which shows whether each feature is adding to or subtracting from the prediction. The SHAP values for an observation sum to f(x) - E[f(x)], the gap between that prediction and the model's average output. Averaging the absolute SHAP values across observations gives the overall importance of a feature. Note that SHAP values can be calculated for any prediction model.
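
Continuing the illustrative example above, a waterfall plot for a single observation starts at the base value E[f(X)] and shows how each feature pushes the prediction up or down from there (the call assumes a recent version of the shap package):

# Waterfall plot for the first observation from the sketch above.
shap.plots.waterfall(sv[0])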

It may be interesting to measure the impact of non-price information on a prediction by using SHAP values. There are many ways to use this tool to help refine forecasts and provide insight into non-linear relationships.

SHAP values tell us the importance of a specific feature for a specific observation, so the information is often displayed as a beeswarm plot, which shows the impact of the different observations associated with each feature. You can also use violin plots, which again show the SHAP values for the individual observations of a specific feature. Force, bar, and waterfall plots all tell us something about the drivers of our model, and all of these tools are available in Python; a rough sketch of the calls appears below.
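
A sketch of those plot calls, reusing the illustrative explainer from the earlier snippet (exact signatures can vary a bit across shap versions, so treat these as indicative rather than definitive):

# Global and per-observation views of the same SHAP values.
shap.plots.beeswarm(sv)    # one dot per observation per feature, colored by feature value
shap.plots.bar(sv)         # mean absolute SHAP value per feature, i.e. overall importance
shap.plots.violin(sv)      # distribution of SHAP values for each feature
shap.plots.force(sv[0], matplotlib=True)   # force plot for a single observation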
