Saturday, December 4, 2021

Price-based and fundamental systems - Trusting data structures


“Trusting a black box model means that you trust not only the model’s equations, but also the entire database that it was built from.”
- Cynthia Rudin, AI researcher

This is an important concept to remember with any systematic model. Where are the data coming from? What data are used? How are the data manipulated and adjusted before they enter a model? How are the data cleaned? If there are fundamental data, are they taken from the original announcement? Are time series properly aligned with announcement times?

It may not be garbage in, garbage out, but the quality of the ingredients will affect the cake that comes out of the oven.

A price-based system earns a high degree of database trust: the source for decisions is well-defined and easy to manage. Yet, even in this case, the database structure needs to be reviewed. There are differences between closing and settlement prices, and I have found price differences between vendors for exchange-traded prices. For equities, there are issues with how to handle adjustments for dividends. For futures prices, there are issues with handling rolls.
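One way to review a price database is to compare two vendors date by date. The sketch below is only an illustration: the function name `price_discrepancies`, the tolerance, and the prices are all hypothetical, and a real check would also need to confirm that both vendors report the same price type (close versus settlement).

```python
def price_discrepancies(vendor_a, vendor_b, tolerance=1e-6):
    """Return (date, price_a, price_b, reason) for every date where the
    two vendors disagree or where one of them has no price."""
    issues = []
    for date in sorted(set(vendor_a) | set(vendor_b)):
        a = vendor_a.get(date)
        b = vendor_b.get(date)
        if a is None or b is None:
            issues.append((date, a, b, "missing"))
        elif abs(a - b) > tolerance:
            issues.append((date, a, b, "mismatch"))
    return issues


# Hypothetical prices for the same contract from two vendors.
vendor_a = {"2021-12-01": 100.00, "2021-12-02": 101.25, "2021-12-03": 99.50}
vendor_b = {"2021-12-01": 100.00, "2021-12-02": 101.50}

issues = price_discrepancies(vendor_a, vendor_b)
for issue in issues:
    print(issue)
```

Every flagged date is a question for the data process: which vendor is right, and which price would the model actually have traded on?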

The database problem increases greatly when non-price data are added. Employment data are revised. Fed announcements may occur before the close. Earnings announcements usually have other information embedded with the accounting data.
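Revisions are usually handled by storing every release of a number with its release time and asking what was known "as of" the decision time. The sketch below assumes a simple list of vintages; the helper `value_as_of`, the timestamps, and the values are hypothetical and are not taken from any actual data release.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical vintages for ONE reference month of an employment series:
# (release time, reported value), sorted by release time.
vintages = [
    (datetime(2021, 11, 5, 8, 30), 200),   # first print
    (datetime(2021, 12, 3, 8, 30), 235),   # revised a month later
]


def value_as_of(vintages, as_of):
    """Return the latest value released at or before `as_of`,
    or None if nothing had been released yet."""
    release_times = [t for t, _ in vintages]
    i = bisect_right(release_times, as_of)
    return vintages[i - 1][1] if i else None


print(value_as_of(vintages, datetime(2021, 11, 1)))   # before any release
print(value_as_of(vintages, datetime(2021, 11, 20)))  # only the first print is known
print(value_as_of(vintages, datetime(2021, 12, 4)))   # the revision is known
```

A backtest that reads only the latest revised value for every date is using information the model could not have had at the time.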

It is possible that small data differences can create a different return series for the same model. You are not investing in just a model. You are investing in a complete data management process. 
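A toy example makes the point concrete. The rule below (hold long for the next day if today's price is above yesterday's, otherwise stay flat) is a deliberately simple stand-in, not any particular model, and both price series are hypothetical. They differ by five cents on a single day, as a close and a settlement price might.

```python
def strategy_returns(prices):
    """Daily returns of a toy rule: hold long over the next day if
    today's price is above yesterday's, otherwise stay flat."""
    returns = []
    for t in range(1, len(prices) - 1):
        position = 1 if prices[t] > prices[t - 1] else 0
        returns.append(position * (prices[t + 1] / prices[t] - 1))
    return returns


# Hypothetical series: identical except for the second observation.
closes = [100.00, 100.00, 101.00, 102.00]
settles = [100.00, 100.05, 101.00, 102.00]

returns_on_closes = strategy_returns(closes)
returns_on_settles = strategy_returns(settles)

print(returns_on_closes)
print(returns_on_settles)
```

On the first series the rule is flat on the second day; on the other it is long and captures the move. The model is the same, and only the data changed.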
