Swimming in data lakes and running through random forests (part 1)

12 Mar 2019


Why should investors take note of advances in the field of data science? And why now?

As my Asset Allocation colleagues keep telling me, I’m well overdue a blog post given that I’ve now been at LGIM for 18 months. But rather than write one single, lonely post, I thought I’d kick off with a series of blogs on one of my favourite topics: artificial intelligence and machine learning.

As Lars has written about recently, we have an overweight to the US tech sector in our portfolios and see opportunities in companies that can profit from advances in this field.

But rather than focus on the effects of AI on other industries, I'll concentrate on ways in which we can further incorporate advanced techniques into our investment process, starting in this post by answering: ‘Why now?’

 

A ‘data science revolution’ sounds impressive, doesn’t it? To tone things down just a little, what we’ve seen over the past 15 years are advances in three key areas, all of which are interlinked and in many ways dependent upon each other.

 

1) Data

An often-heard claim is that 90% of the world’s data has been generated in the past two years, although people have been saying that for at least the past five years, so it can’t be entirely accurate.

But the sentiment is undoubtedly true: new and often unstructured datasets, generated by our internet usage, mobile phones and satellites to name but a few sources, are ever more readily available. What is also true is that only a fraction of this ‘Big Data’ has been harnessed or analysed in any meaningful way so far.

2) Computing

We have seen a shift over the past decade from relying on CPUs (central processing units) alone to using GPUs (graphics processing units) for heavy computation. Whereas CPUs run programs sequentially, GPUs break them down into chunks that are processed simultaneously, thereby significantly improving speed and performance.

The move to what is referred to as massively parallel architecture was pioneered by companies such as Nvidia and has been an important contributor to recent advances in AI.
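To make the contrast concrete, here is a minimal sketch of the same calculation run on a CPU and then on a GPU. It is purely illustrative and assumes NumPy plus the open-source CuPy library and a CUDA-capable graphics card; CuPy mirrors the familiar NumPy interface but spreads the work across thousands of GPU cores in parallel.

```python
# Illustrative only: assumes NumPy and the open-source CuPy library are
# installed and a CUDA-capable GPU is available.
import numpy as np
import cupy as cp

n = 2000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

# CPU: the matrix product is computed on a handful of cores
c_cpu = a @ b

# GPU: the same product is split into many small chunks and computed
# simultaneously across thousands of cores, via an identical interface
c_gpu = cp.asarray(a) @ cp.asarray(b)
cp.cuda.Stream.null.synchronize()  # wait for the GPU work to finish

# the two results agree to floating-point tolerance
print(np.allclose(c_cpu, cp.asnumpy(c_gpu)))
```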

In addition, cloud computing has become ubiquitous. This not only means that storing large and unstructured datasets has become easier and more cost-effective, but also that processing power and analytical tools are now available in the cloud from providers such as AWS and Microsoft Azure. As a result, we no longer need powerful desktop computers or even local databases to take advantage of these improvements.
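As a hedged illustration of that last point, the sketch below pulls a dataset straight from cloud object storage into an analysis session. It assumes the open-source boto3 and pandas libraries, valid AWS credentials, and a hypothetical bucket and file name.

```python
# Illustrative only: the bucket and key names are hypothetical, and valid
# AWS credentials are assumed to be configured in the environment.
import io
import boto3
import pandas as pd

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="example-data-lake", Key="prices/daily_returns.csv")

# read the file straight into memory, with no local database or download step
returns = pd.read_csv(io.BytesIO(obj["Body"].read()))
print(returns.head())
```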

3) Analysis

Many of the principles of machine learning have been around for a long time in disciplines such as physics and chemistry, but due to computational constraints the more complex procedures remained largely theoretical.

Thanks to the advances in processing power previously mentioned, as well as the availability of open source function libraries in popular coding languages such as Python, the application of many of these models is now widely achievable.
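In the spirit of this post's title, here is a minimal sketch of how little code one of those open-source libraries demands: a random forest classifier fitted with scikit-learn. The features and target are synthetic placeholders, not anything from a real investment dataset.

```python
# Illustrative only: assumes scikit-learn and NumPy are installed; the data
# below is a synthetic placeholder.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 10))           # placeholder feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # placeholder binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# an ensemble of 200 decision trees, each trained on a bootstrapped sample
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print(f"Out-of-sample accuracy: {model.score(X_test, y_test):.2f}")
```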

Machine learning has benefited from these computational improvements, and large unstructured datasets would be of little use without advanced techniques to analyse them. It is the combination of these three pillars that has driven the data science revolution!

Next time I’ll give my take on what the terms artificial intelligence and machine learning really mean in the context of investing.

 

 

 

Tim Armitage

Quantitative Strategist

 

