Backtest and validate AI hedge fund strategies with Sourcetable's natural language interface. Analyze historical performance, test signals, and optimize portfolios without complex coding.
Andrew Grosser
February 16, 2026 • 14 min read
Your momentum strategy showed 87% win rate on 2020-2021 data—backtested returns of +142%. You pitch it to investors. Then 2022 hits: -38% drawdown, win rate collapses to 34%. What went wrong? You curve-fit to a bull market, didn't test regime changes, ignored transaction costs, and overlooked slippage. This is why rigorous backtesting matters: validating strategies across market cycles, stress scenarios, and real-world frictions before risking capital.
Excel/Python backtesting means writing hundreds of lines: loading data, calculating signals, tracking positions, computing log returns with =LN(B2/B1), accounting for commissions, rebalancing costs, and slippage, then repeating for out-of-sample periods and parameter variations. One strategy takes days. Sourcetable eliminates this. Upload price data, describe your strategy ("Buy when RSI <30, sell when >70"), ask "Backtest on S&P 500, 2015-2024, show Sharpe ratio and max drawdown." Get complete performance analysis in seconds. Sign up free to start backtesting strategies.
Why do backtests show amazing returns that disappear in live trading?
Because lookahead bias lets your model "peek" at future data when making past decisions, using information that wouldn't have been available at the time. Example: your strategy buys when today's close is below the 20-day moving average. In Excel (newest row first), you calculate MA with =AVERAGE(B2:B21) and compare to today's close in B2. But real trading executes during the day, before the close is known, so you need yesterday's MA (using B3:B22) to make today's decision. Using B2 creates lookahead bias, artificially inflating backtest returns by 15-40% because you're trading on a close that hasn't printed yet.
Other common lookahead errors: using adjusted prices for corporate actions (splits, dividends) that weren't known at trade time, calculating indicators on full-day data when signals trigger intraday, rebalancing portfolios based on end-of-period returns, and using survivor-biased indices (current S&P 500 constituents, not historical). A momentum strategy backtested on today's S&P 500 stocks avoids all the delisted losers, and this survivorship bias inflates returns by 20-30% vs trading the actual historical index.
How do you prevent lookahead bias in complex multi-asset strategies?
Use point-in-time data and lag every indicator by one period before generating signals, so each signal sees only the previous period's indicator value. If calculating Monday's signal, use only data available through Friday close. In Excel, this means careful cell referencing: indicator in row 10 uses data from rows 1-9, trading signal in row 11 uses indicator from row 10. With 2,000 rows and 15 indicators, you're managing 30,000+ cell references, and one mistake invalidates the entire backtest.
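A minimal sketch of the one-period lag in plain Python, assuming prices are ordered oldest first and that the rule compares yesterday's close with the moving average ending yesterday. The function name and that exact comparison are illustrative assumptions, not Sourcetable's implementation.

```python
from statistics import mean

def lagged_sma_signals(closes, window=20):
    """Buy signals that use only data available before each trading day.

    closes is ordered oldest first. The signal for day t compares the
    close of day t-1 with the moving average of the `window` closes
    ending on day t-1, so day t's own close is never consulted.
    """
    signals = [False] * len(closes)
    for t in range(window, len(closes)):
        prior_window = closes[t - window:t]  # days t-window .. t-1
        prior_ma = mean(prior_window)
        signals[t] = closes[t - 1] < prior_ma
    return signals
```

A quick sanity check for any backtest built this way: changing the price on day t should never change the signal for day t.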
Sourcetable handles this automatically through temporal awareness. When you ask "Backtest momentum strategy: buy top quintile by 12-month return," the AI ensures each month's quintile ranking uses only data available through the prior month-end. For corporate actions, specify: "Use unadjusted prices and account for splits/dividends as they occur" and the model applies adjustments chronologically. Follow-up: "Compare results using today's index constituents vs historical constituents" reveals survivorship bias impact—typically showing 15-25% inflated returns from survivor bias.
Let's backtest a dual momentum strategy (relative + absolute momentum) across three distinct market regimes: 2020 COVID crash/recovery, 2021 bull market, 2022 bear market. This demonstrates in-sample vs out-of-sample testing, regime analysis, and transaction cost impact.
Step 1: Define the strategy (January 2020)
Strategy rules: Monthly rebalancing, invest in top 3 sector ETFs by 6-month relative momentum (returns vs other sectors), but only if absolute momentum positive (6-month return >0). If fewer than 3 sectors qualify, allocate remaining capital to cash. Position weight: equal-weight top 3 sectors.
Upload 11 sector ETF data (XLK, XLF, XLV, XLE, XLY, XLP, XLI, XLB, XLU, XLRE, XLC) from January 2018 to December 2024 into Sourcetable. Ask: "Calculate 6-month momentum for all sectors each month, rank, and show top 3 with positive absolute momentum."
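The selection rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the input is a hypothetical dictionary of 6-month returns per ticker, and "CASH" is a placeholder label for the unallocated portion.

```python
def select_dual_momentum(six_month_returns, top_n=3):
    """Return target weights for one rebalance date.

    six_month_returns maps ticker -> 6-month return (0.08 means +8%).
    Each of the top_n slots gets an equal weight; a slot whose sector has
    non-positive absolute momentum is left in cash instead.
    """
    ranked = sorted(six_month_returns.items(), key=lambda kv: kv[1], reverse=True)
    slot = 1.0 / top_n
    weights = {ticker: slot for ticker, ret in ranked[:top_n] if ret > 0}
    weights["CASH"] = 1.0 - slot * (len(weights))
    return weights
```

For example, with only two sectors showing positive 6-month returns, the function allocates one third to each of them and one third to cash.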
Step 2: Run in-sample backtest (2018-2019)
Start with in-sample period to validate strategy logic. Ask: "Backtest dual momentum strategy on 2018-2019 data, show annual returns, max drawdown, and Sharpe ratio." AI calculates:
In-sample results look promising: beat S&P in down year (2018), matched in up year (2019), solid risk-adjusted returns. But this is just validation that logic works—not proof the strategy will work forward.
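For readers who want to check the headline metrics by hand, here is a small sketch of total return, max drawdown, and annualized Sharpe ratio computed from monthly returns. It assumes simple (not log) monthly returns and annualizes with the square root of 12; the risk-free rate defaults to zero as a simplification.

```python
import math
from statistics import mean, stdev

def total_return(monthly_returns):
    """Compound a series of simple monthly returns."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth - 1.0

def max_drawdown(monthly_returns):
    """Largest peak-to-trough decline of the equity curve (negative fraction)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in monthly_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst

def annualized_sharpe(monthly_returns, monthly_risk_free=0.0):
    """Mean excess return over its sample standard deviation, scaled by sqrt(12)."""
    excess = [r - monthly_risk_free for r in monthly_returns]
    return mean(excess) / stdev(excess) * math.sqrt(12)
```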
Step 3: Out-of-sample test on 2020 (COVID crash)
How does the strategy perform through unprecedented volatility?
Ask: "Backtest 2020 with monthly transaction breakdown." Sourcetable shows month-by-month:
Key insight: strategy sacrificed upside (missed April-May recovery) to protect downside (avoided March collapse). Risk-adjusted performance strong: Sharpe 0.94 vs S&P 0.52. The absolute momentum filter worked—moved to cash when momentum turned negative.
Step 4: Test on 2021 bull market
Ask: "Backtest 2021, compare to buy-and-hold S&P." Results:
In strong trending bull markets, simple buy-and-hold often beats sector rotation—transaction costs and timing friction drag returns. Strategy still delivered solid absolute gains (+24.8%) with lower volatility (14.2% vol vs S&P 17.8%), but relative performance lagged.
Step 5: Test on 2022 bear market
The critical test: how does strategy handle sustained decline? Ask: "Backtest 2022 with sector allocation each month."
Strategy's value proposition confirmed: preserves capital in bear markets through absolute momentum filter. Sitting in cash for 6 months meant forgoing any gains but avoiding catastrophic losses. Over 2020-2022 combined (three distinct regimes), strategy delivered +28.1% total vs S&P +18.4%, with significantly lower max drawdown (-18.2% vs -33.8%).
Step 6: Analyze transaction cost impact
Monthly rebalancing generates significant turnover. Ask Sourcetable: "Calculate transaction costs assuming 0.1% per trade (bid-ask spread + commissions). Show net returns." AI adds up monthly trades:
Transaction costs reduced returns by 5.6 percentage points, yet the strategy still outperformed the S&P by 4.1pp net of costs (+22.5% vs +18.4%). This highlights the importance of cost-aware backtesting. Ask: "Compare monthly vs quarterly rebalancing." Quarterly rebalancing cuts costs to -2.1pp (roughly a 60% reduction) while maintaining 90% of excess return, making it the better risk-adjusted approach.
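A rough sketch of how a per-trade cost assumption turns into a drag on returns. It charges the 0.1% rate on every dollar bought or sold at each rebalance and, as a simplifying assumption, ignores weight drift between rebalances. The function names and the "CASH" label are hypothetical.

```python
def rebalance_cost(old_weights, new_weights, cost_rate=0.001):
    """Cost of moving between two weight sets, as a fraction of portfolio value.

    Buys and sells of each ETF both count as traded volume; moves into or
    out of cash are captured through the ETF side of the trade.
    """
    tickers = (set(old_weights) | set(new_weights)) - {"CASH"}
    traded = sum(abs(new_weights.get(t, 0.0) - old_weights.get(t, 0.0))
                 for t in tickers)
    return traded * cost_rate

def net_returns(gross_returns, weight_history, cost_rate=0.001):
    """Subtract each month's rebalancing cost from that month's gross return."""
    net, previous = [], {}
    for gross, weights in zip(gross_returns, weight_history):
        net.append(gross - rebalance_cost(previous, weights, cost_rate))
        previous = weights
    return net
```

Rebalancing less often shortens `weight_history` changes and therefore the traded volume, which is the mechanism behind the monthly vs quarterly comparison.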
How do you test if optimal parameters from backtesting will work in future?
Use walk-forward analysis: optimize parameters on a training window, test on an out-of-sample period, then roll the window forward and repeat. Simple example: momentum lookback period. Should you use 3-month, 6-month, or 12-month momentum? Optimizing on full 2018-2024 data shows 6-month is best (+34% return), but did you just curve-fit? Walk-forward tests whether 6-month stays optimal across time.
Process: Train on 2018-2019 (24 months), test on 2020 (12 months). Then train on 2019-2020, test on 2021. Then train on 2020-2021, test on 2022. If the parameter that's optimal in training consistently works in testing, it's robust. If the optimal parameter keeps changing (6-month best in period 1, 3-month best in period 2, 12-month best in period 3), you're overfitting: the parameter doesn't have predictive power.
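The rolling train/test loop described here can be sketched as follows. The `score` callback is a hypothetical stand-in for whatever backtest you run on a given lookback and period; everything else is plain Python.

```python
def walk_forward(years, candidates, score, train_years=2, test_years=1):
    """Roll a training window forward and test the winning parameter each time.

    score(candidate, period_years) is a caller-supplied function returning a
    performance number (higher is better) for that candidate on that period.
    """
    results = []
    last_start = len(years) - train_years - test_years
    for start in range(0, last_start + 1, test_years):
        train = years[start:start + train_years]
        test = years[start + train_years:start + train_years + test_years]
        best = max(candidates, key=lambda c: score(c, train))
        results.append({"train": train, "test": test, "chosen": best,
                        "test_score": score(best, test)})
    return results
```

Called with `years=[2018, 2019, 2020, 2021, 2022]`, this produces the three splits listed above. If `chosen` is the same in every split and `test_score` stays acceptable, the parameter looks stable; if `chosen` jumps around, that is the overfitting warning sign.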
In Excel, walk-forward analysis means rebuilding your backtest 5-7 times with different data windows, manually tracking which parameters optimize in each training period, testing those parameters in hold-out periods, and aggregating results. For three parameters with 5 values each (125 combinations × 7 walk-forward periods = 875 backtests), you're managing tens of thousands of formulas.
Sourcetable automates this. Ask: "Run walk-forward optimization on momentum lookback (test 3, 6, 9, 12-month), using 24-month training and 12-month testing windows, rolling monthly from 2018-2024." AI performs complete walk-forward analysis, showing:
This validates 6-month momentum has genuine predictive power, not just curve-fit to specific historical period. You can confidently deploy this parameter in live trading knowing it performed consistently across multiple out-of-sample tests.
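Historical backtests show what happened, but Monte Carlo simulation shows what could happen under different random outcomes. Take your strategy's historical trade sequence and randomize the order: if your strategy produced 100 trades over 5 years, Monte Carlo reshuffles them 10,000 times to generate a distribution of possible outcomes. Some sequences will hit winning streaks early, others losing streaks. Reordering alone changes the path (drawdowns and streaks) but not the mean or volatility of the trades, so distributions of total return and Sharpe ratio come from resampling trades with replacement rather than pure reshuffling. This reveals the outcome range arising purely from randomness, helping separate skill from luck.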
Why this matters: your backtest showed 32% total return with max drawdown -18%. But what if the winning trades came at the end and losing trades came first? Monte Carlo reveals that in 15% of random sequences, max drawdown exceeded -30%—same trades, different order, much worse intermediate experience. Your actual -18% max drawdown was somewhat lucky timing. Investors need to know the strategy could have experienced -30% drawdowns with different luck.
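A minimal sketch of the reshuffling experiment for drawdowns, using only the standard library. It assumes each trade is expressed as a simple return on the whole portfolio and that trades compound one after another; the function names are illustrative.

```python
import random

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst

def shuffled_drawdowns(trade_returns, n_sims=10_000, seed=0):
    """Max drawdown of the same trades under n_sims random orderings."""
    rng = random.Random(seed)
    trades = list(trade_returns)
    results = []
    for _ in range(n_sims):
        rng.shuffle(trades)
        results.append(max_drawdown(trades))
    return results

def share_worse_than(drawdowns, threshold):
    """Fraction of simulations with a drawdown deeper than threshold (e.g. -0.30)."""
    return sum(dd < threshold for dd in drawdowns) / len(drawdowns)
```

Calling `share_worse_than(shuffled_drawdowns(trades), -0.30)` gives the kind of "share of orderings with a drawdown beyond -30%" figure discussed above.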
How do you determine if strategy performance is statistically significant or just luck?
Compare the actual Sharpe ratio to a Monte Carlo distribution: if the actual Sharpe is in the top 5% of simulated outcomes, the strategy likely has an edge. Your dual momentum strategy delivered 1.18 Sharpe over 2018-2022. Resampling its trades 10,000 times shows: median Sharpe 0.87, 95th percentile Sharpe 1.32. Your actual 1.18 falls at the 78th percentile, above median but not exceptional. This suggests modest edge, not strong alpha.
For comparison, if your actual Sharpe was 1.45 (above the 95th percentile of 1.32), you could claim with 95% confidence that results aren't pure luck. Conversely, if actual Sharpe was 0.92 (55th percentile), your outperformance is likely noise. Monte Carlo provides statistical rigor to performance claims.
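One way to build such a Sharpe distribution is sketched below. Note the swapped-in technique: because reordering trades leaves their mean and standard deviation unchanged, this sketch resamples trades with replacement (a bootstrap) instead of shuffling them. The Sharpe here is per trade and not annualized, and all function names are illustrative.

```python
import random
from statistics import mean, stdev

def sharpe(returns):
    """Per-trade Sharpe ratio: mean return over sample standard deviation."""
    return mean(returns) / stdev(returns)

def bootstrap_sharpes(trade_returns, n_sims=10_000, seed=0):
    """Sharpe ratios of n_sims samples drawn with replacement from the trades."""
    rng = random.Random(seed)
    n = len(trade_returns)
    sims = []
    for _ in range(n_sims):
        sample = rng.choices(trade_returns, k=n)
        if len(set(sample)) > 1:  # skip degenerate samples with zero volatility
            sims.append(sharpe(sample))
    return sims

def percentile_rank(value, distribution):
    """Percentage of simulated values at or below the given value."""
    return 100.0 * sum(x <= value for x in distribution) / len(distribution)
```

`percentile_rank(sharpe(trades), bootstrap_sharpes(trades))` then reports where the realized figure sits within the simulated range.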
Sourcetable runs Monte Carlo automatically. Ask: "Run 10,000 Monte Carlo simulations of my dual momentum strategy, show distribution of total returns, max drawdowns, and Sharpe ratios." AI randomizes trade sequences, calculates metrics for each simulation, generates histograms showing: 50th percentile return +24% (your actual +28% is 72nd percentile), 50th percentile max DD -16% (your -18% is 58th percentile), 50th percentile Sharpe 0.87 (your 1.18 is 78th percentile). Conclusion: strategy has modest positive edge but outcomes highly sensitive to timing luck.
Markets alternate between regimes: trending (strong directional moves), mean-reverting (sideways oscillation), volatile (large swings), calm (low volatility). Momentum strategies thrive in trending regimes, fail in mean-reverting. Conversely, mean-reversion strategies profit from sideways markets but get crushed in trends. Without regime analysis, you'll deploy strategies in conditions where they're statistically likely to fail.
Classify historical periods by regime using VIX (volatility), moving average slopes (trend strength), or correlation (dispersion). Then backtest your strategy separately in each regime. A complete picture requires 4 regime-specific backtests: uptrending + low vol, uptrending + high vol, downtrending + low vol, downtrending + high vol. If your strategy loses money in 3 of 4 regimes but makes outsized gains in the one regime that dominated your backtest period, you have a fragile strategy.
How do you identify which regime the market is currently in?
Use regime indicators updated in real-time: VIX level, 50-day vs 200-day MA relationship, equity correlation, sector dispersion. Combination rule: Uptrend = SPY above 200-day MA + VIX below 20, Downtrend = SPY below 200-day MA + VIX above 25, High vol = VIX above 30 regardless of trend, Low vol = VIX below 15 + low sector correlation. These regimes have different probabilities and durations—uptrend low vol is most common (45% of months since 2000), downtrend high vol is rarest (8% of months) but causes most damage.
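The combination rule can be written down directly, which also makes its gaps visible. In this sketch the thresholds are the ones quoted above, while the 0.5 cutoff for "low sector correlation" is a made-up placeholder since the text gives no number. Because the four rules overlap and do not cover every case, the function returns a list of matching labels rather than a single regime.

```python
def classify_regime(spy_close, spy_ma200, vix, sector_correlation=None, low_corr=0.5):
    """Return every regime label whose rule matches the current readings.

    low_corr is a hypothetical cutoff for 'low sector correlation'.
    """
    labels = []
    if spy_close > spy_ma200 and vix < 20:
        labels.append("uptrend")
    if spy_close < spy_ma200 and vix > 25:
        labels.append("downtrend")
    if vix > 30:
        labels.append("high_vol")
    if vix < 15 and sector_correlation is not None and sector_correlation < low_corr:
        labels.append("low_vol")
    return labels or ["unclassified"]
```

For instance, SPY above its 200-day MA with VIX at 22 matches none of the rules, so any production version would need an explicit policy for such in-between readings.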
Ask Sourcetable: "Classify each month 2000-2024 by regime (uptrend/downtrend × low/high vol), backtest my momentum strategy in each regime separately." AI generates regime-specific performance:
This reveals: strategy only works in uptrends. In downtrends (37% of months), you lose money. Your overall positive returns came from being long during bull markets. To improve, add regime filter: only run momentum strategy when market is in uptrend regime, otherwise move to cash or bonds. Sourcetable can backtest this modification: "Rerun strategy but go to cash when regime is downtrend + high vol." This single change improved Sharpe from 1.18 to 1.54 by avoiding the worst-performing regime.
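A sketch of the cash filter, assuming monthly strategy returns and a list of regime labels observed at the end of each month. To avoid lookahead, the decision for a month uses the previous month's regime; that lag and the zero return on cash are assumptions of this illustration.

```python
def apply_regime_filter(strategy_returns, regimes, cash_return=0.0):
    """Replace a month's return with cash when the prior month was downtrend + high vol.

    regimes[i] holds the labels observed at the END of month i, so month i's
    position is decided from regimes[i - 1].
    """
    filtered = []
    for i, ret in enumerate(strategy_returns):
        prior = set(regimes[i - 1]) if i > 0 else set()
        in_cash = {"downtrend", "high_vol"} <= prior
        filtered.append(cash_return if in_cash else ret)
    return filtered
```

The lag means the filter still takes the first bad month and sits out the first month of a recovery, which is worth keeping in mind when comparing filtered and unfiltered Sharpe ratios.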