Your backtest is probably wrong

Why the awesome results you're seeing in your backtest may break your heart in the future

Jan 28, 2024

So you have decided you want to become a quant trader. You have been reading a lot of articles and exchanging tweets with quant influencers on X (Twitter). You think Python is just too slow, and you only use R for its libraries.

Because you’re a quant trader now, you have to show everyone what you’re capable of. One of the influencers you follow posted a Sharpe ratio of 4, but "he keeps all the alpha for himself". So you go on Google and look for "Quant trading strategies with high sharpe". You spend a few hours reading and decide the easiest way to get a high Sharpe ratio is to run arbitrage strategies. You keep researching and discover pairs trading, and along the way you read phrases that fit your quant persona: "sophisticated trading strategy", "experienced traders" and "high-frequency trading".

"This is it", you say, "I'm going to do a pairs trading strategy in Crypto".

A week or two passes, and you get your first backtest results. The results are meh, but it's a good day because now you have "your baseline model" finished and can start what you enjoy the most, the "optimization phase". The first thing you do is try to increase the number of trades, because this is arbitrage, so more trades = more money.

Instead of daily sampling, you start using 1-hour bars, then 30-minute bars. You stop at 10-minute bars because that’s a lot of data and your Python prototype is taking too long. You take a screenshot of the tqdm() ETA and post "This is why I hate Python, 2h to run a backtest!!".

After modifying the lengths of your moving averages and adding different EMAs on top of SMAs that are on top of averaged z-scores… you finally get a good-looking PnL curve using 10-minute bars in your backtest. Time to get investors! You get some family and friends’ money and start trading live. Your parents are proud to say they have a high-frequency quant trader in the family.

Right after you go live, the strategy starts bleeding. You have some green days, but they are not enough to match the 45-degree PnL curve you had in the backtest.

All this hard work and all you get is losses. You have spent months building this thing, and now you have to tell your family and friends that maybe you’re not a high-frequency quant trader yet.

What went wrong?

What, where, how, when, why

Imagine you are trying to convince the best quant trader in the world that your trading strategy is robust. He’s not a friend you just chit-chat about trading with; he will ask for evidence behind every argument you make. Some questions you will need to have answers for:

What assets are you going to trade? Why?
Where (what exchanges) are you going to trade? Why?
Why not trade assets A, B and C in exchanges E, F and G?
Why would this strategy work?
What is the major risk for this strategy?
When was the last time this major risk occurred, and what happened?
What other things could go wrong?
How often does the effect this strategy is based on happen?
What are your profit expectations? Why?

Up until this point, you haven’t run any backtest. By now you know there’s a mispricing effect you can profit from, and you know every single detail about it; you know so much you could give a seminar on it. Now it’s time to “try” to profit from it.

Backtesting 101

Most backtests simulate market orders. They simply assume you can execute a market order of any size at the last traded price of each bar at the sampling you chose (10min, 60min, 1day). This may be fine with large bar sizes and small trade sizes. However, if you intend to trade 10min bars in an illiquid instrument, assuming perfect fills may be unrealistic.

If you go with market orders, you want to know the best bid-ask spread and the available liquidity. Say your strategy predicts a return of 1% in the next 10 minutes, but the best bid-ask spread is 2%: is it a good idea to send a market order? And what about liquidity? Say there’s $100k of liquidity sitting at the best ask, but your signal is telling you to buy $200k. The market order is going to wipe out the best ask and hit whatever levels sit deeper in the book. These levels are sometimes more than one tick apart: the best ask may be priced at $10.5, and you can end up completing the fill at $14.
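To make the liquidity point concrete, here is a minimal sketch of a market buy walking the ask side of an order book. The book is made up: only the $10.5 best ask with $100k behind it and the $14 level come from the example above, while the $12 level and the function name `simulate_market_buy` are assumptions for illustration. It ignores fees, latency and other traders' reactions.

```python
from typing import List, Tuple


def simulate_market_buy(
    asks: List[Tuple[float, float]], notional: float
) -> Tuple[float, float, float]:
    """Walk ask levels given as (price, notional available), best price first.

    Returns (average fill price, worst fill price, unfilled notional).
    """
    remaining = notional
    units = 0.0
    worst = float("nan")
    for price, available in asks:
        if remaining <= 0:
            break
        take = min(remaining, available)  # consume as much of this level as needed
        units += take / price
        remaining -= take
        worst = price
    spent = notional - remaining
    avg = spent / units if units > 0 else float("nan")
    return avg, worst, remaining


# Hypothetical book: $100k at the best ask, then thinner, wider levels.
book = [(10.5, 100_000.0), (12.0, 50_000.0), (14.0, 100_000.0)]
avg_price, worst_price, unfilled = simulate_market_buy(book, 200_000.0)
slippage = avg_price / book[0][0] - 1.0  # cost relative to the best ask
print(f"avg={avg_price:.4f} worst={worst_price} unfilled={unfilled} slippage={slippage:.2%}")
```

A backtest that fills the whole $200k at $10.5 is silently assuming this slippage away.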

With limit orders, the story doesn’t get much better. You have the potential opportunity cost of not getting fully executed, and if you try to trade amounts that are large relative to the instrument’s order book, you will need an execution strategy, which complicates things further. Let’s say you go with good ole TWAP (time-weighted average price). Your signal tells you to execute right now, but with TWAP it may take you seconds, minutes or hours to get fully executed. Is your signal’s prediction still strong enough after adding execution lag?
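As a rough illustration of the lag TWAP introduces, the sketch below splits a parent order into equal child orders spaced evenly over a window. The function name `twap_schedule`, the order size, the 10-minute window and the slice count are all assumptions for the example; a real execution algorithm would also handle partial fills, randomization and price limits.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def twap_schedule(
    total_qty: float, start: datetime, duration: timedelta, n_slices: int
) -> List[Tuple[datetime, float]]:
    """Split total_qty into n_slices equal child orders spaced evenly over duration."""
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    step = duration / n_slices
    child_qty = total_qty / n_slices
    return [(start + i * step, child_qty) for i in range(n_slices)]


# Hypothetical parent order: $200k worked over 10 minutes in 5 slices.
signal_time = datetime(2024, 1, 28, 12, 0, 0)
schedule = twap_schedule(200_000.0, signal_time, timedelta(minutes=10), 5)
for send_time, qty in schedule:
    lag = (send_time - signal_time).total_seconds()
    print(f"{send_time.time()} send {qty:,.0f} (lag {lag:.0f}s after signal)")
```

In a backtest, each child order should be filled at the price prevailing at its own send time, not at the price when the signal fired.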

Now say you have a flawless order simulator with all kinds of variables and execution strategies. The good news is that it may be good enough to justify putting real money into the market; the bad news is that you shouldn’t expect the backtest results to match live results. One thing that is very hard to simulate is market impact, or how other players react to your real orders. For instance, if you send a market order, as soon as you get a partial fill everybody is going to notice. Some people may think you’re a big dog and just follow your trade, or market makers may widen their quotes to make you pay more for your potentially informed trade.

Fees

Needless to say, fees must be accounted for, as well as rebates. They may also help you choose which execution strategy to go with, since exchanges generally incentivize liquidity provision: if you add liquidity, you get a discount or a rebate; if you remove liquidity, you pay a premium.
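A quick sketch of how much fees can matter for a small edge. The fee levels here (5 bps taker, 1 bp maker rebate) and the 10 bps gross edge are made-up numbers, not any exchange's real fee tiers, and the calculation is a first-order approximation that charges each leg's fee against the gross return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeeSchedule:
    # Fees as fractions of traded notional; a negative maker fee is a rebate.
    # Hypothetical values for illustration only.
    taker: float = 0.0005
    maker: float = -0.0001


def net_round_trip_return(gross_return: float, entry_fee: float, exit_fee: float) -> float:
    """First-order approximation: subtract the fee on each leg from the gross return."""
    return gross_return - entry_fee - exit_fee


fees = FeeSchedule()
gross = 0.001  # hypothetical 10 bps gross edge per round trip
as_taker = net_round_trip_return(gross, fees.taker, fees.taker)
as_maker = net_round_trip_return(gross, fees.maker, fees.maker)
print(f"taker on both legs: {as_taker:.4%}, maker on both legs: {as_maker:.4%}")
```

Under these assumed numbers the same signal breaks even when removing liquidity and keeps its edge (plus the rebate) when providing it, provided the limit orders actually get filled.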

If you’re smart and your strategy is very low risk, you can negotiate with exchanges. Try to talk to them after trading live for a bit, especially if you are providing liquidity. They may offer you better fee tiers if they like you.

Latency

What would your backtest look like if you executed with an added 200ms of latency? What if you were the first trader to see the quotes? Would you have an advantage over your competitors? Of course you would. Picture 1000 traders trading the same strategy: the last one to get the market signal will generally execute at the worst price. Latency optimization is out of scope for this post, but let’s just say you want to be as close as possible to the exchange, cut as much latency from your code execution as possible, and share information between all the components of your strategy as efficiently as possible.
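One simple way to ask the "added 200ms" question in a backtest is to fill each order at the first price observed after the signal time plus the latency, rather than at the signal-time price. The sketch below assumes sorted millisecond timestamps and a made-up price path; the helper name `fill_price_with_latency` is hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def fill_price_with_latency(
    timestamps_ms: Sequence[int],
    prices: Sequence[float],
    signal_ms: int,
    latency_ms: int,
) -> Optional[float]:
    """Return the first observed price at or after signal_ms + latency_ms.

    timestamps_ms must be sorted ascending. Returns None if no later price exists.
    """
    idx = bisect_left(timestamps_ms, signal_ms + latency_ms)
    return prices[idx] if idx < len(prices) else None


# Hypothetical price path drifting up after a buy signal at t=100ms.
ts = [0, 100, 200, 300, 400]
px = [10.00, 10.01, 10.03, 10.06, 10.10]
no_latency = fill_price_with_latency(ts, px, signal_ms=100, latency_ms=0)
with_latency = fill_price_with_latency(ts, px, signal_ms=100, latency_ms=200)
print(f"fill with 0ms: {no_latency}, fill with 200ms: {with_latency}")
```

Rerunning the whole backtest across a range of latencies gives a feel for how quickly the edge disappears as you get slower.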

Optimizing aka overfitting?

Before running the backtest, you should already have in mind a set of parameters that may not be the best but that makes sense given the evidence you gathered in the research stage, with your correlation plots, heat maps, linear regressions, etc.

Be careful about optimizing parameters after you have run your first backtest. If your backtest results are not in line with your research work, go back to the research stage and make sure the signals are correct. Check an execution plot, and add circles or arrows marking when the market event was received, when the signal was processed and when the order was executed.

Say you found some errors here and there and fixed them, and the PnL curve still looks ugly. Maybe your signal is not very secret. The better known a strategy or signal is, the more likely its edge will decay over time. It may be a good idea to go back to the research stage and find another thing to trade.

Leverage

Don’t play with fire at the beginning. Using leverage adds more variables and risks you have to consider. Now you have to become an expert in margin and liquidation. Trade with as little money as possible at first, and only try to enhance the results once the live results are in line with your expectations.

Monitoring

Imagine an airplane flying without all those indicators pilots have. In trading you are the pilot, and you have to know every single detail of your strategy. Build a monitoring system with metrics and alerts. These metrics have to cover the entire strategy pipeline, from data ingestion to order execution: correlation with simulated results using the same data as live trading, number of market events received, time to process data, number of signals in the last X period, average signal, number of orders sent, turnover, fees paid, margin, leverage, etc.
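As one example of such a metric, here is a minimal sketch of the "correlation with simulated results" check: compare live PnL per period against the PnL your simulator produces on the same data, and raise an alert when the two diverge. The 0.8 threshold and the function names are assumptions for illustration, not a recommended setting.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally sized series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def live_vs_sim_alert(
    live_pnl: Sequence[float], sim_pnl: Sequence[float], min_corr: float = 0.8
) -> bool:
    """Return True when live and simulated PnL correlate below min_corr."""
    return pearson(live_pnl, sim_pnl) < min_corr


# Hypothetical per-period PnL: live results vs. the simulator replaying the same data.
live = [120.0, -80.0, 40.0, -150.0, 60.0]
sim = [150.0, -60.0, 90.0, 30.0, 110.0]
print("alert!" if live_vs_sim_alert(live, sim) else "live tracks simulation")
```

The other metrics in the list (event counts, processing time, orders sent, fees paid) can follow the same pattern: compute, compare against an expected range, alert on deviation.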

Usually, the more monitoring the better. This part is underrated, but it will help you a lot when looking for patterns or for investors. It shows that you pay attention to every detail of the strategy and are aware of all the risks involved. This normally translates into people thinking “This trader knows what she’s doing” and potentially trusting you with their money.

Lastly, avoid breaking The Notorious B.I.G.’s rule number 7:

“Seven, this rule is so underrated
Keep your family and business completely separated”

(The Notorious B.I.G., Ten Crack Commandments)

Trading is extremely risky. It’s not like a normal job where, if you make a mistake, you still get paid at the end of the month. Here, if you are wrong, you are most likely going to pay for it. It can be even worse: sometimes you won’t know you’re wrong right away and will only notice once the losses have grown. Don’t drag your family and friends into this with you.

I hope this post helps the reader not to pull the trigger too fast when going live with a trading strategy. Algorithmic trading is not so simple. There are so many things that can go wrong, and here we covered just a few. The emphasis should be on understanding the effect you’re trying to trade, as well as all the risks and factors involved, before trading it with real money.

You Got This Trading

Aug 27

Yep - I like to tell people they can do whatever stupid thing they want as long as they take 1% of what they thought they would trade and use that instead.

Simon M

This is a great sister article to my recent post on back-testing and back-testing systems! https://thealgorithmicadvantage.substack.com/p/battle-of-the-back-testers : )

Diary of a Quant
