John Arabadjis, head of markets macro strategy product and analytics at BNY Mellon, talks to Profit & Loss about the challenges associated with mining different data sources to produce new insights for investors.
Profit & Loss: You were recently hired by BNY Mellon in the newly created role of head of markets macro strategy product and analytics. What precipitated the need for the creation of this role?
John Arabadjis: BNY Mellon is making a concerted effort to build a very quantitatively based research product that leverages big data and modern analytic techniques. They already had the seeds of this in their market analytics team, but they wanted to bring in someone who has experience shepherding complex products to clients, sifting through huge amounts of data, building products that are very model-driven and developing new models for describing the market. In addition, I think that the firm wanted to bring its quant modelling efforts to the next level in order to rejuvenate its iFlow product and to add some new capabilities around it.
P&L: Is the big challenge for you obtaining all the external data that you need, or finding the right data internally from different segments of such a big organisation?
JA: I would say it’s both. But actually, I think the real challenge is the fact that data in and of itself tends to be riddled with errors, it tends to be very heterogeneous across different types of data and there’s even unstructured data we’d like to eventually play with. So, wrestling the data to the ground before you can even do anything with it is probably the biggest challenge.
“Some of these alternative data sets are structured and some are unstructured, but I think that on our menu, everything’s fair game”
P&L: How do you overcome these challenges?
JA: With a lot of blood, sweat and tears! You have to plow through it, you have to develop tools that allow you to look at huge amounts of data, find where the errors are, and understand what processes have generated the data. There’s a lot of legwork that has to go on, particularly if you’re tackling a new data source.
P&L: Can you give an example of an error? Is it that the data itself is wrong or is it two different kinds of data types that are being formatted differently? What are the typical errors that you struggle with?
JA: It’s all of the above, really. Imagine there’s data that’s manually entered somewhere and somebody types in two extra zeros, well that can kill an analytics result. There can be actual data errors, fat finger errors, there can be different processing errors, there can also be errors of interpretation. For example, if I’m getting data from two different vendors I might think that they’re the same, but it could be that they’re measuring the same thing but from different regions or different asset classes and so you might interpret them as being the same, but they’re not, and so you could end up aggregating things that should not be aggregated. Basically, if you can think of an error, it’s probably in the data.
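A fat-finger entry of the kind Arabadjis describes, such as two extra zeros typed into a price, can often be caught with a simple robust outlier screen. The sketch below is purely illustrative (it is not BNY Mellon's actual pipeline) and uses a median-absolute-deviation test to flag implausible values:

```python
# Illustrative sketch: flag likely fat-finger entries with a robust
# median-absolute-deviation (MAD) screen. Not any firm's actual pipeline.

def mad_outliers(values, threshold=5.0):
    """Return indices of values whose robust z-score exceeds threshold."""
    n = len(values)
    ordered = sorted(values)
    median = (ordered[n // 2] + ordered[(n - 1) // 2]) / 2
    deviations = sorted(abs(v - median) for v in values)
    mad = (deviations[n // 2] + deviations[(n - 1) // 2]) / 2
    if mad == 0:
        return []
    # 0.6745 scales the MAD so it is comparable to a standard deviation
    return [i for i, v in enumerate(values)
            if abs(v - median) * 0.6745 / mad > threshold]

# A price series where one entry was typed with "two extra zeros"
prices = [101.2, 100.8, 101.5, 10150.0, 101.1, 100.9]
print(mad_outliers(prices))  # → [3]: the fat-finger entry is flagged
```

Screens like this only catch values that are numerically implausible; the interpretation errors Arabadjis mentions, such as aggregating mismatched vendor data, still require manual due diligence.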
P&L: What kind of macro strategy products are currently employed at BNY Mellon, and how are you going to be changing them?
JA: We have an iFlow product, which tracks flows in asset classes around the world and is currently leveraged by the macro strategy team, who use it as an input for writing very differentiated market commentary. What we want to do is beef that up and broaden the metrics, while also tackling other data sources.
“FX is at the core of our Markets business, so from a client receptivity standpoint it makes sense to look at it first”
The idea is to take a good seed and grow it into a 2.0 version. We want to provide unique insights into market dynamics – what are investors and institutions doing around the world, and how does that affect the way market dynamics play out over different horizons, whether you’re talking a day, a week or a year? We want to give people differentiated insights using as many different data sources as possible, because everybody has the pricing data these days.
P&L: Talking about different data sources there, how much of a believer are you in the value of alternative data sources for financial markets?
JA: Well, “alternative data” broadly covers everything that’s not typical financial data harvested from exchanges, self-reported in regulatory filings, and so on. But what is alternative data really? There’s satellite imagery; there are firms that publish the number of cars in the parking lots of big box stores, images of ships crossing the water showing how deep they are sitting, and the locations of commercial airliners. Then there’s also environmental/social/governance – or ESG – investing, which includes data, for example, on the number of women on the board of directors of a publicly traded corporation or the different HR policies at various firms. The carbon dioxide output of a firm due to its electric load is alternative data and may have implications for the financial performance of the firm – it certainly has implications for its regulatory requirements, depending on the jurisdiction.
Now some of these alternative data sets are structured and some are unstructured, but I think that on our menu, everything’s fair game. However, we’re a pretty lean group, so we’re going to be prioritising our build out, concentrating on foreign exchange metrics initially and then branching out into multiple asset classes. And we’ll keep drawing a larger and larger circle around our input data sets as our bandwidth and our successes allow.
P&L: Why did you choose to focus on FX first?
JA: FX is at the core of our Markets business, so from a client receptivity standpoint it makes sense to look at it first. We have good sources of data, both internal and external, and it’s an area that we’re pretty familiar with. I also report into Jason Vitale, who is COO of BNY Mellon’s FX business, so the currency markets are the natural place to begin.
P&L: Can social media be important as an alternative data source?
JA: In doing research over the years, I’ve learnt that it depends on what you’re using social media for. The problem with many sources of unstructured data, like social media, is that they tend to be massive echo chambers. When something happens in the social media space, it’s difficult to calibrate unless you spend a lot of time understanding and doing due diligence on the sources.
“The problem with many sources of unstructured data, like social media, is that they tend to be massive echo chambers”
For example, you can get something that gets repeated a million times across Twitter by some very disreputable sources. On the other hand, you can get one influencer or trusted source that says something that may get repeated much less, but it actually contains much more solid information. So, from our perspective, I would put social media very far down on the list of data that we’ll go after. It’s just a really difficult problem.
P&L: Is there a challenge, in the so-called “Fake News Era,” around finding the right news information out there?
JA: The proliferation of media that does not adhere to sound editorial standards is a big problem for anybody chewing through media as a reputable data source. We’re not going to actually tackle unstructured media data for a while, so we’re not overly concerned with that yet. The amount of structured data alone will keep us busy for the foreseeable future.
P&L: Can you ever have such a thing as too much data, or is it simply a case of the more you have, the better?
JA: From the investor perspective, that’s exactly the problem: they are inundated with data and information. Even if the information is actually sound, there can be too much of it. I hate using the term “big data” because it’s overplayed, but the amount of data that’s now available is incredible. You may have heard this statistic, but every two years the amount of data that’s been produced and captured digitally is greater than the sum of all previous years’ worth of data. We’re still at the exponential growth stage, and that’s probably going to continue.
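The doubling statistic Arabadjis quotes is a generic property of exponential growth: if volume doubles roughly every two years, the latest two-year period alone produces more than all previous periods combined. A quick numerical check, in illustrative units rather than real measurements:

```python
# If data volume doubles every 2 years, the amount produced in the latest
# 2-year period exceeds the sum of all prior periods combined.
volumes = [1.0]            # arbitrary starting volume, illustrative units
for _ in range(10):        # ten successive 2-year doublings
    volumes.append(volumes[-1] * 2)

latest, prior = volumes[-1], sum(volumes[:-1])
print(latest, prior)       # → 1024.0 1023.0
print(latest > prior)      # → True, for any doubling period
```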
“We do have the problem of too much data, and that’s precisely the point of building these metrics out of that data”
So yes, we do have the problem of too much data, and that’s precisely the point of building these metrics out of that data: so that you can sift it and distill insights out of it for investors who don’t have the time to read all the material that’s available. They don’t have time to chew through every dataset and so we want to do the distilling for them in a very scalable way.
P&L: How have client demands regarding macro analytics tools changed in recent years?
JA: Put it this way: if you look 10 years ago, I don’t believe you would have ever heard someone say that they had a geopolitical risk indicator – that is a phrase that really didn’t exist back then. And what I mean by that is, some number that’s derived from a quantitative model which can characterise the geopolitical risks given the regulatory and political structure of a specific jurisdiction.
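As a sketch of what such an indicator might look like, the toy model below combines z-scored risk inputs into a single number. The component names, inputs and weights are entirely hypothetical, chosen only to illustrate the idea of reducing heterogeneous data to one metric; they are not any vendor's actual methodology:

```python
# Illustrative sketch only: one naive way to turn several risk inputs into
# a single "geopolitical risk indicator". All names and numbers below are
# hypothetical, not any firm's real methodology.

def z_score(value, history):
    """How many standard deviations the current value sits above its history."""
    mean = sum(history) / len(history)
    var = sum((h - mean) ** 2 for h in history) / len(history)
    return (value - mean) / var ** 0.5 if var else 0.0

def risk_indicator(current, history, weights):
    """Weighted sum of z-scored components; higher means more risk."""
    return sum(weights[k] * z_score(current[k], history[k]) for k in current)

history = {                                   # hypothetical past readings
    "sanctions_mentions": [3, 4, 2, 5, 4, 3],
    "election_volatility": [0.1, 0.2, 0.15, 0.1, 0.2, 0.12],
}
current = {"sanctions_mentions": 9, "election_volatility": 0.4}
weights = {"sanctions_mentions": 0.6, "election_volatility": 0.4}

print(round(risk_indicator(current, history, weights), 2))  # well above 0
```

Both components sit far above their historical ranges here, so the composite score is strongly positive; a reading near zero would indicate business as usual.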
That is now a real thing, because we have techniques to sift through data, we can put together these kinds of metrics, and we have access to the data that produces them. As a result, I think that consumers have now set their sights a bit higher: they think that if you can build something like that, then you can build a model that will give them the probability of a certain event happening, like a “#MeToo” scandal breaking out at a given company. They expect you to be able to provide simple characterisations of a very complicated process. In the era of Big Data, Netflix and Amazon suggest that you might want to watch this movie or buy this brand of toothpaste by running fairly complicated models using data from you and your closest million friends. I think that people expect these types of analytics now, so we are filling that void in financial services.
P&L: And finally, your background is in astrophysics. How has this shaped the way that you approach these kinds of data problems in financial markets?
JA: At first blush they have nothing to do with each other, except for the fact that I studied a lot of math and computer programming. But I think I do tackle these kinds of problems in the way that my training dictates, a way that has been hardwired into me, which is: look at everything as a scientific problem. Don’t look at a huge random set of data and use your cognitive biases to impose order on it because that’s really a psychological issue. I’m very aware of those kinds of problems from my previous career.
“Don’t look at a huge random set of data and use your cognitive biases to impose order on it because that’s really a psychological issue”
Let’s say you go outside at night and look at the stars. You might be able to find the Big Dipper, but when it comes down to it, your eyes are just picking up bright stars and forming a pattern. Stars in the Big Dipper really have nothing to do with each other; it’s just something your eyes and your brain are doing. So you’ve got to be careful of that when you’re trying to come up with explanations or theories for market dynamics. You don’t want to just pick out random events or random data points and attach meaning to them, you have to guard against cognitive bias. I think that’s one of the things that my training has helped me with in financial services.
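The constellation analogy has a direct statistical counterpart: scan enough independent random series and some pair will look strongly related by chance alone. The sketch below is a hypothetical illustration of that trap, not a description of any real dataset:

```python
# Illustrative sketch: among many purely random series, some pair will
# appear strongly correlated by chance alone, like stars forming a "dipper".
import random

random.seed(7)  # fixed seed so the demonstration is repeatable

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# 40 independent random "return" series, 20 observations each
series = [[random.gauss(0, 1) for _ in range(20)] for _ in range(40)]

best = max(abs(correlation(series[i], series[j]))
           for i in range(40) for j in range(i + 1, 40))
print(round(best, 2))  # some pair looks strongly related purely by chance
```

The defence is the one Arabadjis describes: treat any apparent pattern as a hypothesis to be tested out of sample, not as a discovery in itself.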