Coder Shows Back-Tested Data For Optimizing Trading Algorithm

Matt: Welcome to the channel, everybody! I have an exciting new face here today, and I know you haven’t seen him yet because he’s always on the back end of Nurp. His name is Marcin. He is part of the product development side of things and is always working on building out the best, most sophisticated code to ensure that our algorithms are always optimal and up-to-date. Welcome to the channel, Marcin.

Marcin: Welcome, thank you for having me.

Matt: No, it’s a real pleasure, and I appreciate you taking the time to sit here. I don’t really have much data of my own to go over today; it’s more about going over your data for the channel. Before we get into the exciting stuff you have, I’d love to know a little bit about your background, where you came from, and how you even got into the coding space.

Marcin: With pleasure. So, I’m a computer scientist by training. Right after graduating, I got a position at CERN, where I was a data analyst for one of the experiments. That was a really exciting job, where I was looking at data coming from the LHC, which is one of the most advanced experimental facilities in the world, so that was pretty exciting. Later, I got more interested in finance, and I also got a position at the pension fund of CERN, which is actually a huge fund because they manage roughly five billion US dollars. That’s a considerable amount of money, and it was run by a relatively small group of around 25 professionals, split into different asset classes. Initially, I joined as a quantitative analyst, trying to build models for optimal portfolio allocation, and over the years, I got into the position of Head of Quant, where I was running a small team and also managing a number of quantitative strategies.

Later, after leaving CERN, I joined another asset manager in Switzerland, where I was a senior portfolio manager and quantitative strategist responsible for setting up a system for systematic trading for the firm. I was also responsible for five portfolios running for clients, and that was my last position before joining Nurp. I’m very happy to be here.

Matt: Wow, that is a pretty serious roster of history you have there! First of all, anyone watching this who doesn’t understand what CERN is – I don’t understand either, but what I do know is they have pretty sophisticated machinery, and I think they’re working on really interesting things. I believe one of the projects I looked up was trying to split atoms, and is the whole facility underground? Is it exposed?

Marcin: Well, the facility is quite incredible, and there are many experiments dealing with different things. So, there were all sorts of ideas there – splitting the atom was one, creating antimatter was another one. The biggest part of CERN is what’s called the LHC, which stands for Large Hadron Collider. It’s basically a particle accelerator that is roughly 27 kilometers in circumference, and it’s underground, roughly 100 meters below Geneva and the surrounding areas. They accelerate particles close to the speed of light and collide them. At the point of collision, there are huge detectors—you could think of them like photo cameras—that take pictures of what’s happening when these particles collide. That’s very exciting because these cameras take millions of pictures per second and store them all in incredibly large databases. So, there’s plenty of work for computer scientists, engineers, data analysts, and, of course, physicists as well.

Matt: There’s no way to overstate how impressive that is! It just goes to show you’re definitely talented, and we are beyond happy that you’re on the Nurp team. I think anyone watching this can conclude that if you can analyze and put data together on matter and dark matter – things that we don’t even understand—I’m pretty sure you can put some code together to figure out financial markets.

Marcin: Yeah, both problems are difficult to solve.

Matt: Well, that is amazing. I think that’s enough of hearing how excellent your roster of history is, and I’d love to now dive into some of the product side of things that you’re working on. I believe you have a presentation that you wanted to share. I’m very interested – if you could go ahead and share your screen – so we could see what it is you’re looking at on a day-to-day basis and what you’ve gathered for the audience here today.

Marcin: Sure, one part of what I wanted to share with you is my recent work on the developments and changes to one of our main algorithms, namely “Fed.” I was working on correlations and portfolio construction. In the portfolio optimization framework, we are trying to solve a problem that many asset allocators and asset managers face. There is a fairly well-established framework to address this issue, based on Modern Portfolio Theory, introduced by Harry Markowitz in the 1950s. Later, Markowitz received a Nobel Prize for this work, so it’s a well-known, well-established framework.

Basically, the idea is that we have a portfolio, which is a combination of different investments – in our case, these are different FX pairs. The goal is to make the portfolio as profitable as possible while keeping the risk low. That’s the overall goal.

Here, I have a little diagram which is purely for illustration purposes and not based on real data, but it’s a good illustration of what’s happening. We have individual assets represented as dots in a space with two dimensions. On the X-axis, we have risk, and on the Y-axis, we have return. As you can imagine, we want to be as far to the left as possible to minimize risk, and as high up as possible to maximize return. That’s the overall goal of the portfolio.

Risk is measured by how much the returns of an investment vary from day to day. This variation is also called variance, which is why the portfolio we aim to build would be called a minimum variance portfolio. What we are really focusing on here is how we can build the safest portfolio possible using the assets at hand – in this case, the trading pairs of the Fed algorithm.

We can do this because not all the pairs are perfectly correlated. Some pairs correlate highly, while others exhibit zero or even negative correlation. By combining assets with low correlation, we can achieve superior returns with lower risk. This is the exercise I was trying to solve here, and I can show you some charts I generated.
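For readers who want to see the mechanics behind this, here is a minimal sketch of the minimum-variance idea: compute a correlation matrix to see how the assets move together, then use the classic closed-form minimum-variance weights. The pair names and returns below are simulated for illustration only; they are not Nurp data or the actual Fed optimization.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated daily returns for three hypothetical FX pairs (250 trading days).
n_days = 250
returns = rng.normal(0.0005, 0.01, size=(n_days, 3))
pairs = ["EURUSD", "GBPUSD", "USDJPY"]

# Correlation matrix: values near zero (or negative) mean diversification helps.
corr = np.corrcoef(returns, rowvar=False)

# Classic closed-form minimum-variance weights: w proportional to inv(Cov) @ 1.
cov = np.cov(returns, rowvar=False)
ones = np.ones(len(pairs))
raw = np.linalg.solve(cov, ones)
weights = raw / raw.sum()  # normalize so the weights sum to 1

print(dict(zip(pairs, weights.round(3))))
```

The key design point is that the weights come from the covariance structure, not from the individual returns: assets that are less correlated with the rest of the basket earn a larger weight.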

On the left, you see the different FX pairs. This chart starts from 2015, showing the evolution of various FX pairs. On the surface, it looks like there isn’t much correlation because they all seem to behave almost independently. However, if you look at the correlation matrix, it actually reveals plenty of correlation, with many green parts indicating relatively high correlation between the pairs.

There’s another lens we can use to examine this—by looking at the performance of the Fed algorithm trading different pairs. What you see here is a backtest of Fed, also starting from 2015, on the same pairs. The correlation of Fed trading these pairs is much lower than the correlation of the pairs themselves, as you can see. That’s a very good sign, but there is still room for improvement because some pairs do show some level of correlation.

If we look at our pre-selected pairs – those we’ve already chosen after a few rounds of screening – we could allocate them based on this micro-optimization, which would allocate to these pairs based on the correlations we observed in the FX pairs or based on the correlations we see between the Fed algorithm trading these pairs. What’s more important for us is how the Fed trades them rather than what the pairs are doing themselves, but both views have merit. We use both to get the total score. The higher the score, the more we should allocate to these pairs; the lower the score, the riskier the pair could be from a portfolio construction standpoint.

That’s one of the views we use to provide basic guidelines for our users. Another straightforward measure could be looking at the maximum historical drawdown of any given pair through the nine years of backtesting.
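Maximum historical drawdown is simple to compute from an equity curve: track the running peak and find the deepest peak-to-trough drop. The equity values below are made up for illustration; they are not Fed backtest data.

```python
import numpy as np

# Hypothetical account equity over time (illustrative values only).
equity = np.array([100, 110, 105, 120, 90, 95, 130, 115])

running_peak = np.maximum.accumulate(equity)       # highest value seen so far
drawdowns = (equity - running_peak) / running_peak  # drop from the peak, per step
max_drawdown = drawdowns.min()                      # most negative value

print(f"max drawdown: {max_drawdown:.1%}")  # deepest drop is 120 -> 90, i.e. -25.0%
```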

As you can see, there are two pairs that historically had slightly higher drawdowns. In the end, we decided to combine these two views—the correlation and the risk view, with the risk being perceived as the maximum drawdown – to come up with the final allocation. You can see this in the final column on the right. Pairs that receive a score of 1 would be considered base pairs, while those with a score of 0.5 would be considered higher-risk pairs.
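In the spirit of combining the two views, a scoring scheme like the one described could be sketched as follows. The pair names, score values, and the averaging-plus-threshold rule here are illustrative assumptions, not the actual Fed scoring logic.

```python
# Hypothetical per-pair scores: higher = better on that view.
corr_score = {"EURUSD": 0.9, "GBPUSD": 0.8, "AUDJPY": 0.4}  # less correlated = higher
risk_score = {"EURUSD": 0.9, "GBPUSD": 0.7, "AUDJPY": 0.5}  # smaller drawdown = higher

allocation = {}
for pair in corr_score:
    total = (corr_score[pair] + risk_score[pair]) / 2  # simple average of the two views
    # Assumed tiering rule: strong combined score -> base pair (1), else higher-risk (0.5).
    allocation[pair] = 1.0 if total >= 0.7 else 0.5

print(allocation)
```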

So basically, what we show to users might just be a table like this, but there is plenty of analysis, optimization, and a lot of work that goes on behind the scenes.

Another thing I could show you is the comparison of the new and old settings that we also generate. We’ve been trying to minimize the risk, and that was really the goal. By selecting only 11 pairs out of the 23 pairs we’ve been trading before and increasing the lot size just slightly, we were able to reduce the risk significantly. As you can see, the maximum drawdown for the strategy has been reduced from 73% to only 37%. Additionally, the number of drawdowns exceeding 10% over the last nine years was reduced from 16 to 6, which is a very, very good result. The volatility of the strategy was roughly cut in half.

So from this analysis, you can see that we managed to cut the risk of the Fed strategy in half. Of course, the question you might ask is, “How does the return look?” since risk and return often come together. As expected, the return is a little bit lower, but the reduction in return is very small compared to how much we managed to reduce the risk. The average yearly return was reduced from 29.8% to 25.2%, which is only a 4.6 percentage point reduction for taking on half the risk. I’m very happy with these results, and I think our users will appreciate this as well.

Matt: Yeah, I think that’s a very fair tradeoff. So essentially, we cut the risk by more than half while only diminishing the return by 4.6 percentage points.

Marcin: Correct.

Matt: That’s a very good tradeoff. Thank you for sharing all of that data with us – I found that very insightful. So basically, what you just presented was more about historical data, but also an evolution happening from an old model to a newer model. How soon is that being incorporated?

Marcin: Well, this is something that the whole team is working on, and it will be released in the coming days. We want to provide the results as quickly as possible but also ensure that what we present is thoroughly tested and very robust. That’s always the priority.

Matt: Nice. When it comes to testing, what is the typical standard amount of time where you deem a test finished and worthy or not worthy?

Marcin: That’s a good question. There’s an entire development process, and it could be viewed from many different perspectives. The main part involves developing the algorithm, testing different ideas, and running tests on various datasets. But then the question arises: How does this algorithm perform on different datasets or on data it has never seen before?

Here, we’re really touching on two subjects. One would be Monte Carlo testing—what we can expect from our algorithm going forward. The other is how we can ensure we’re not overfitting the strategy.

Matt: Okay, so let’s assume that myself and everyone watching this doesn’t understand the two terms you just mentioned. One of the terms was Monte Carlo, and the other was overfitting. Can you just give me a quick, layman’s term overview of what those can be?

Marcin: Sure, I can share a few slides to make it easier and walk you through an example. Monte Carlo testing is a method developed by Stanisław Ulam, a mathematician working on the Manhattan Project. The name “Monte Carlo” comes from his uncle, a gambler with whom he used to visit Monaco. As a mathematician and early computer scientist, Ulam was inspired by the world of gambling and sought to understand how randomness could impact different processes. The method he developed is now widely used and is fantastic for testing different outcomes. At Nurp, one way we employ the Monte Carlo method is as follows: When we conduct backtests, we have, say, 10 years of data for the algorithm’s returns, day by day. We can perform a lot of analysis with that data. For instance, we could put all the trades the algorithm made into a large basket. Then we could ask ourselves, “Can we simulate not only the actual years we’ve experienced but also hypothetical, random years we might encounter going forward?”

This is where the Monte Carlo method comes in handy. We can randomly draw trades from this big basket and create “buckets”—in this case, 10,000 little buckets. Each of these buckets will simulate one year of trading. Essentially, we’re asking, “What might a year of trading look like in the future based on what we’ve seen in the past?” This approach is extremely useful because it shows us all sorts of potential outcomes. It helps us realize that the historical data we have might only be a small sample of what could happen in the future.

To illustrate, if we simulate this and plot it on a chart, it might look something like this: Starting at zero, each line simulates one year of trading. In this example, the algorithm was making 430 trades per year on average. Some paths show very positive trades, while others could be quite unlucky. The key question is, “What does the distribution of these outcomes look like?” This is crucial because it gives us a sense of what to expect moving forward, purely based on randomness.

The data I’m showing you here isn’t related to any specific algorithm we have at Nurp—it’s from another algorithm I worked on that isn’t yet strong enough to be released. However, it illustrates that while this algorithm averaged roughly 30% returns, the distribution of outcomes is wide. You could have a year where you make zero, or a year where you make over 80% or even 100%.

This is where Monte Carlo testing really helps, as it allows us to analyze these potential outcomes. Ideally, we want an algorithm with a high expected return and a narrow standard deviation of outcomes around that positive expected value.

One last thing I wanted to show you is a type of analysis we can run based on this data. For example, the mean return in this case was 22%. However, in the worst 5% of outcomes, it would be less than -3%, meaning there’s a possibility of losing money. In the best 5% of cases, it could be close to 80%. So, you can see that the dispersion is quite wide, and Monte Carlo testing helps us identify potential issues and develop more robust algorithms moving forward.
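The bootstrap Marcin describes can be sketched in a few lines: draw past trades at random with replacement to build many simulated years, then look at the mean and the tails of the resulting distribution. The trade returns below are simulated stand-ins; a real run would start from actual backtest trades, and the specific percentile figures will differ from the ones on his slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the "big basket" of per-trade returns from a backtest
# (~10 years at ~430 trades per year).
trades = rng.normal(0.0007, 0.01, size=4300)

n_sims, trades_per_year = 10_000, 430
# Each row is one simulated "bucket": a year of 430 trades drawn at random
# from the basket, with replacement.
samples = rng.choice(trades, size=(n_sims, trades_per_year), replace=True)
yearly_returns = samples.sum(axis=1)  # simple additive yearly return

mean = yearly_returns.mean()
worst_5, best_5 = np.percentile(yearly_returns, [5, 95])
print(f"mean {mean:.1%}, worst 5% {worst_5:.1%}, best 5% {best_5:.1%}")
```

The output of interest is exactly the kind of dispersion described above: a single mean figure, plus the 5th and 95th percentiles showing how lucky or unlucky a random year could be.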

Matt: Wow, that’s really interesting. Now, about the other term you mentioned—overfitting—what does that mean?

Marcin: Overfitting is a big problem for anyone trying to build algorithmic trading systems. Essentially, we have historical data, and everyone tries to build something that works well on that data. However, we have a relatively short history in trading—maybe 10 to 20 years of data. That’s not much compared to, say, algorithms in image recognition, which are trained on millions of samples. Overfitting occurs when an algorithm is too closely tailored to historical data, making it perform well on past data but poorly on new, unseen data because it’s too rigid.

Matt: So, essentially, it’s about trying to over-optimize the algorithm, which can actually hinder its performance because it’s been overdone?

Marcin: Exactly. Overdoing it is bad. There’s a sweet spot where you optimize your algorithm enough, but not too much, so that it’s still flexible enough to handle new, real-world data.
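The standard guard against overfitting is to tune on one slice of history (in-sample) and judge on a slice the model never saw (out-of-sample). Here is a minimal sketch of that split using a toy moving-average rule on simulated prices; the rule and the numbers are illustrative, not any Nurp algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
prices = 100 + np.cumsum(rng.normal(0, 1, 1000))  # simulated price series

split = 700
in_sample, out_sample = prices[:split], prices[split:]

def strategy_pnl(series, window):
    """Toy rule: long when price is above its moving average, flat otherwise."""
    ma = np.convolve(series, np.ones(window) / window, mode="valid")
    signal = series[window - 1:-1] > ma[:-1]  # yesterday's signal, no look-ahead
    return np.sum(np.diff(series[window - 1:]) * signal)

# Pick the window that looks best in-sample...
windows = range(5, 60, 5)
best = max(windows, key=lambda w: strategy_pnl(in_sample, w))
# ...then the honest number is its performance out-of-sample.
print(best, strategy_pnl(out_sample, best))
```

If the in-sample winner collapses out-of-sample, the optimization was fitting noise rather than a real edge, which is exactly the failure mode being described.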

Matt: Thank you for all that data, Marcin. I found a lot of value in it, and I’m sure a large portion of the audience will, too. Many people have been curious about who’s working on these algorithms, and now they’ve met one of the key people behind the scenes. I’m looking forward to bringing more of the product team to the forefront so everyone can understand what they’re working on.

Before I let you go, I have one last question: What do you do the moment you wake up for Nurp on a day-to-day basis? What’s your top priority when you start your work?

Marcin: Great question. There’s plenty of work to do here, and I split my time into a few main areas. One is maintaining and improving existing algorithms, which involves checking if they’re performing as expected, ensuring they’re not overfitted, and making sure they still work as they should. That’s a big part of my job. Another significant aspect is constantly thinking of new ideas to improve existing algorithms, making them better, more robust, and able to handle unexpected events. As you know, financial markets are unpredictable—unlike physics, where we know what will happen if we drop something. In financial markets, things can go sideways, and that’s something we always need to consider. The last part, which I really enjoy, is developing new algorithms. This is especially exciting because we want to offer our clients the best possible selection of algorithms. Staying ahead of the competition and building the most robust trading systems is what makes me happy and drives my work.

Matt: Awesome. Well, I’ve had you for 26 minutes now, and I appreciate every single second. This has been a blast, and I look forward to meeting with you again and bringing in more people from your department to help everyone understand how hard we’re working on our algorithms. More importantly, we have a very robust quant department that’s diligently working on developing the best products on the planet.

Thank you again for your time, Marcin, and thanks to everyone for watching. If you got any value from this, please leave a like, comment, and subscribe. As always, everyone…

Please visit Coder Shows Back-Tested Data For Optimizing Trading Algorithm to watch the full interview on YouTube!

The post Coder Shows Back-Tested Data For Optimizing Trading Algorithm first appeared on Nurp.com.