Thursday, December 14, 2017
How much machine learning is your quant using? Not clear, if you have not defined terms
The current buzzword in quant investing is "machine learning". Many quants may like to appear smarter by peppering their strategy discussions with comments like, "We use machine learning to create new and enhance our existing models." Yet many investors don't fully appreciate that machine learning refers to a broad set of approaches to data analysis, many of which have been around for decades. Used without definition, it can be an all-encompassing term.
At the top of the machine learning taxonomy is the split between supervised and unsupervised learning. Supervised learning covers most cases: a function is inferred that maps input data to a specific output. It involves training a model on input data to reach a target outcome. Unsupervised learning is associated with searching and grouping input data to find relationships even if there is no specific target output. Within supervised learning there is a further breakdown between classification of events and regression, the mapping of sensitivities between inputs and outputs. Unsupervised learning can be defined as searching for clusters or similarity within a data set.
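The taxonomy above can be made concrete with a small sketch. The numbers are made-up toy data and the helper names (`fit_line`, `fit_threshold`, `two_means`) are hypothetical, chosen only for illustration; the point is simply that regression and classification both need a target, while clustering does not.

```python
from statistics import mean

# Hypothetical toy data: one input x, a numeric target y, and a 0/1 label.
xs = [1.0, 2.0, 3.0, 4.0, 8.0, 9.0, 10.0, 11.0]
ys = [2.1, 3.9, 6.2, 8.1, 15.8, 18.1, 20.2, 21.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Supervised, regression: least-squares sensitivity of y to x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_threshold(xs, labels):
    """Supervised, classification: cut on x with the fewest label errors."""
    pts = sorted(set(xs))
    cuts = [(a + b) / 2 for a, b in zip(pts, pts[1:])]

    def errors(cut):
        return sum((x > cut) != bool(lab) for x, lab in zip(xs, labels))

    return min(cuts, key=errors)


def two_means(xs, iters=20):
    """Unsupervised: split x into two clusters without using any target."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = mean(g0), mean(g1)
    return c0, c1


slope, intercept = fit_line(xs, ys)
cut = fit_threshold(xs, labels)
centers = two_means(xs)
print("regression slope:", round(slope, 3))
print("classification cut:", cut)
print("cluster centers:", centers)
```

Asking a manager which of these three kinds of problem a given model solves is often a quicker route to clarity than asking whether it is "machine learning".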
The real issue is understanding how managers go about their data analysis and which techniques they employ to tease out information on market behavior. How are they trying to classify data to find signals on return? How are they using regression to forecast or fit valuations to fundamentals? Or, how are they searching data without a specific relationship model in place? Do they use the right technique for the right problem? Will a different technique provide new insight on the same data? Investors have to ask why a specific technique will provide more useful results.
The key benefit of a broader set of machine learning techniques is that they can open up or restructure data sets to find new relationships and patterns that are not immediately obvious through linear regression, traditional time series, or simple rules. For example, decision trees and categorical analysis may be helpful in finding non-linear relationships that may not be immediately apparent in trends or regression.
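As a rough illustration of that point, the sketch below fits both a straight line and a one-split decision tree (a "stump") to a made-up threshold relationship. The data and the helpers `fit_line` and `fit_stump` are assumptions for this example only, not any manager's actual method.

```python
from statistics import mean

# Hypothetical non-linear relationship: the output switches on once x > 0.5.
xs = [i / 10 for i in range(-20, 21)]
ys = [1.0 if x > 0.5 else 0.0 for x in xs]


def mse(preds, actual):
    """Mean squared error between predictions and actual values."""
    return mean([(p - a) ** 2 for p, a in zip(preds, actual)])


def fit_line(xs, ys):
    """Ordinary least-squares line through the data."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_stump(xs, ys):
    """Depth-one regression tree: the single split that minimizes squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        split = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    return best[1], best[2], best[3]


slope, intercept = fit_line(xs, ys)
linear_mse = mse([slope * x + intercept for x in xs], ys)

split, left_mean, right_mean = fit_stump(xs, ys)
stump_mse = mse([right_mean if x > split else left_mean for x in xs], ys)

print("linear fit MSE:", round(linear_mse, 4))
print("stump split:", round(split, 2), "MSE:", round(stump_mse, 4))
```

On this toy data the stump recovers the threshold while the line can only approximate it. Real market data is noisy, so the gap would be far less clean.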
Given inexpensive computing power, large data sets, and new statistical techniques, new recurring patterns in prices may be found. Of course, the flip side to these atheoretical approaches is that data mining can be done to excess without regard to sampling bias or the power of the tests. This is why digging into the details of what a manager may mean by machine learning is so critical. If a manager cannot effectively explain the value of their techniques to an investor, then it is not likely that these tools will do their job.
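The data-mining risk can be sketched with a small simulation. Everything here is an assumption made for illustration: the returns and the 200 candidate signals are pure random noise, and the cutoff of 2 divided by the square root of the sample size is only a rough rule of thumb for a 5% significance level.

```python
import random
from statistics import mean


def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n_obs, n_signals = 60, 200  # e.g. 60 monthly returns, 200 candidate signals

# Returns and signals are independent noise: there is nothing to find.
returns = [rng.gauss(0, 1) for _ in range(n_obs)]
signals = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_signals)]

corrs = [abs(corr(s, returns)) for s in signals]
best = max(corrs)

# Rough 5% cutoff for a single test (an approximation, not an exact test).
threshold = 2 / n_obs ** 0.5
false_hits = sum(c > threshold for c in corrs)

print("best |correlation| among noise signals:", round(best, 3))
print("signals passing the single-test cutoff:", false_hits, "of", n_signals)
```

With enough candidates, some noise signals will clear a single-test cutoff purely by chance, which is why investors should ask how many relationships were tried before the reported one was found.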