Machine learning (ML) is probably the hottest thing in quantitative finance right now. But it’s also badly misunderstood.
For starters, it isn’t clear what machine learning actually is. The term conjures up images of artificially intelligent cyborgs poring over streams of financial data, coming up with novel trading strategies which they then test and modify – all without any human supervision. Some esoteric ML techniques look a little like this; genetic algorithms, for example, can modify themselves to improve their performance.
However, there’s more to machine learning than just this. Other methods described as machine learning look decidedly old-fashioned; many people even label classical statistical techniques like linear regression as ML. These older techniques require closely supervised learning – a human being has to specify the variables of interest and the general equation that relates them. The machine has to do no more than find a few parameter values.
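To make the "closely supervised" case concrete, here is a minimal sketch of a one-variable linear regression in plain Python. The variable names and the numbers are made up for illustration; they are not market data. The human picks the two variables and the straight-line form of the equation, and the code merely solves for the intercept and the slope.

```python
# Illustrative only: the figures below are invented, not real returns.

def fit_line(xs, ys):
    """Ordinary least squares for the model y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# The human chooses the variables and the form of the equation...
market_returns = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical x values
stock_returns = [-2.9, -1.6, 0.2, 1.4, 3.1]   # hypothetical y values

# ...and the machine only has to find two parameter values.
intercept, slope = fit_line(market_returns, stock_returns)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Everything interesting here, the choice of inputs and the shape of the relationship, was decided before the computer did any work.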
You probably think that machine learning is a recent innovation. This is incorrect. Most ML techniques have been around for decades – the exciting ‘new’ technology of neural networks dates back to the 1950s. Where ML draws from traditional statistics there is even more history: linear regression was invented in the 19th century.
However, two recent trends have brought machine learning into the limelight. Firstly, there is more raw computer power available to data scientists than ever before. Partly this is thanks to Moore’s Law – the long-running exponential growth in the number of transistors on a chip, and with it the performance of individual chips. But it’s also because of cloud computing, which allows ML programmers to access machines far more powerful than their local desktop or server cluster. As a result, computationally intensive ML techniques are now feasible.
The other change has been the availability of “big data”: larger data sets for ML to crunch. In the traditional data world, tick-level price data is now relatively cheap and accessible, giving a much richer data set than minute-by-minute prices. There has also been significant growth in alternative data such as social media posts, which in theory could give clues to the mood of consumers and thus the fortunes of individual stocks.
But these trends don’t automatically translate into huge profits for anyone trying to use machine learning to predict stock prices. Even the fanciest ML technique won’t be able to find a relationship that isn’t there. Worse still is the danger that it will discover a pattern that isn’t really there, or one that won’t persist in the future. This problem of “overfitting” is an issue with all attempts to predict the future using data from the past, but it is especially problematic for complicated ML methods. Where ML does find a relationship, it may just be discovering something that could have been found with more rudimentary tools.
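The danger of finding patterns that aren’t there can be shown with a small experiment. The sketch below is a toy setup, not a real ML method: the "returns" and the candidate "signals" are all independent random noise, and the sample sizes and seed are arbitrary choices. Searching across many candidates stands in for the flexibility of a complicated model. The code picks whichever signal correlates best with returns in a short historical window, then checks it on later data.

```python
import random

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

N_IN, N_OUT, N_SIGNALS = 50, 500, 200  # arbitrary illustrative sizes

# Synthetic "returns": pure noise, so there is nothing real to find.
returns = [rng.gauss(0.0, 1.0) for _ in range(N_IN + N_OUT)]

# Candidate "signals": also pure noise, unrelated to the returns.
signals = [[rng.gauss(0.0, 1.0) for _ in range(N_IN + N_OUT)]
           for _ in range(N_SIGNALS)]

# Pick the signal that looks best on the first N_IN observations.
best = max(signals, key=lambda s: abs(correlation(s[:N_IN], returns[:N_IN])))

in_sample = correlation(best[:N_IN], returns[:N_IN])
out_of_sample = correlation(best[N_IN:], returns[N_IN:])

print(f"in-sample correlation:     {in_sample:+.2f}")
print(f"out-of-sample correlation: {out_of_sample:+.2f}")
```

With enough candidates, one of them will almost always look impressive on the history it was selected on, even though by construction it has no predictive power at all. The out-of-sample figure is the honest one.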
Alternative data may also fail to live up to the hype. The chain of causality between many sources of alternative data and asset prices is probably quite tenuous, even if it exists. Most alternative data sets haven’t been around for very long, and ML techniques need long series of data points to find relatively weak effects.
Perhaps a more promising area is the ongoing battle between buy-side execution desks and the high-frequency proprietary traders that try to pick them off. Here, both sides can use ML to see the faint footprints of their competition in large data sets of tick-level price data, and modify their strategies accordingly.
Ironically, the real success stories for machine learning in finance are far away from the highly paid world of the front offices of banks and hedge funds. Instead they are to be found in the much less glamorous world of retail banking. ML is particularly good at identifying credit card and mortgage borrowers who are more likely to default on their payments.
These areas have large and well-established data sets for machine learning techniques to get their teeth into, but more importantly the behaviour of individual people seems to be more predictable than their interactions in financial markets. If you want a job in machine learning, it is probably here that you should focus your attention.