Thursday, November 3, 2022

Survivorship bias - Need to know what is missing from the data


 

There is an old story about Abraham Wald, the great statistician, and the importance of measuring the right thing in any statistics problem. In WWII, the Air Force wanted to determine where to add armor to its bombers to improve their odds of survival, given the heavy losses from daylight bombing. The Air Force staff gathered statistics on all the bombers that came back from missions and looked at how often each area of the plane had been hit by flak or bullets. The idea was to add armor to the areas hit most often.

They proudly gave their extensive evidence to Professor Wald and asked him to validate their thinking on where to put the most armor. His response was direct: armor should be placed where there was no record of damage. This was just the opposite of what the other statisticians expected. His reasoning was simple and profound: "The bombers hit in those places never came back." The data analysts could count only the damage on the surviving bombers. There was no data for the bombers that crashed.
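To see the effect in numbers, here is a small simulation sketch. Everything in it is an assumption made up for illustration: the four plane sections, the chance that a hit in each section brings the plane down, and the number of hits per plane. None of it is historical data. Hits land evenly across all sections, but the tally taken from survivors alone makes the most dangerous sections look like the least hit.

```python
import random
from collections import Counter

# Hypothetical chance that a single hit in each section downs the plane.
# These numbers are invented for illustration, not historical figures.
LOSS_PROB_PER_HIT = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.05, "wings": 0.05}


def simulate(n_planes=10_000, hits_per_plane=3, seed=0):
    """Return hit counts per section for all planes and for survivors only."""
    rng = random.Random(seed)
    sections = list(LOSS_PROB_PER_HIT)
    all_hits = Counter()
    survivor_hits = Counter()
    for _ in range(n_planes):
        # Every section is equally likely to be hit.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # The plane returns only if it survives every one of its hits.
        survived = all(rng.random() > LOSS_PROB_PER_HIT[s] for s in hits)
        all_hits.update(hits)
        if survived:
            survivor_hits.update(hits)
    return all_hits, survivor_hits


def shares(counter):
    """Convert raw hit counts into each section's share of total hits."""
    total = sum(counter.values())
    return {s: counter[s] / total for s in LOSS_PROB_PER_HIT}


all_hits, survivor_hits = simulate()
all_share = shares(all_hits)
surv_share = shares(survivor_hits)
for s in LOSS_PROB_PER_HIT:
    print(f"{s:9s} all planes: {all_share[s]:.2f}  survivors only: {surv_share[s]:.2f}")
```

Under these assumptions, the "all planes" column sits near an even split, while the "survivors only" column shows far fewer engine and cockpit hits. An analyst who sees only the second column would armor the fuselage and wings, which is the mistake Wald caught.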

Ask what the data are counting, and then ask what the data are not counting. Databases often include only the survivors, so you also need to know what got away from the analysis.
