Backtesting Guide

Backtesting validates forecast accuracy by testing models on historical data. This guide covers rolling-origin backtests, performance metrics, and parameter optimization.

Related:

GLOSSARY.md — Definitions of MAE, RMSE, Sharpe ratio, etc.
FORECAST.md — Forecasting methods
FORECAST_GENERATE.md — Detailed forecast options
VOLATILITY.md — Volatility forecasting

Key Concepts

What is Backtesting?

Backtesting answers: "How well would this forecast method have performed on past data?"

Testing a model on the same data it was fitted to rewards overfitting. Backtesting instead:

Picks historical "anchor" points
At each anchor, generates a forecast using only data available at that time
Compares the forecast to what actually happened
Aggregates error metrics across all test points

Rolling-Origin Backtest

The standard backtesting approach in mtdata:

Timeline: [----history----][forecast horizon]
                          ^
                       anchor

Parameters:

steps: Number of anchor points to test
spacing: Bars between anchor points
horizon: How far ahead each forecast predicts

Example: steps=20, spacing=10, horizon=12 creates 20 test points, each 10 bars apart, each forecasting 12 bars ahead.
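As a rough illustration of how steps, spacing, and horizon interact, the sketch below places anchors backward from the end of a series and scores a naive (last value) forecast at each one. The function names and the exact anchor placement are assumptions made for illustration; mtdata may position anchors differently.

```python
def rolling_origin_anchors(n_bars, steps, spacing, horizon):
    """Return anchor indices, oldest first. An anchor is the last bar the model may see.

    Assumption: the newest anchor leaves exactly `horizon` bars after it.
    """
    last_anchor = n_bars - horizon - 1
    anchors = [last_anchor - i * spacing for i in range(steps)]
    # Drop anchors that would fall before the start of the series.
    return sorted(a for a in anchors if a >= 0)


def naive_backtest(closes, steps, spacing, horizon):
    """Average MAE of a naive forecast (repeat the last close) across all anchors."""
    maes = []
    for anchor in rolling_origin_anchors(len(closes), steps, spacing, horizon):
        history = closes[: anchor + 1]                      # data available at the anchor
        actual = closes[anchor + 1 : anchor + 1 + horizon]  # what happened afterwards
        forecast = [history[-1]] * horizon
        maes.append(sum(abs(f - a) for f, a in zip(forecast, actual)) / horizon)
    if not maes:
        raise ValueError("not enough bars for a single anchor")
    return sum(maes) / len(maes)


# Synthetic, steadily rising closes (illustrative data only).
closes = [1.10 + 0.0001 * i for i in range(300)]
anchors = rolling_origin_anchors(len(closes), steps=20, spacing=10, horizon=12)
print(len(anchors), anchors[0], anchors[-1])
print(naive_backtest(closes, steps=20, spacing=10, horizon=12))
```

Note that steps=20 with spacing=10 and horizon=12 needs at least 203 bars of data in this layout, plus whatever history the method itself requires before the oldest anchor.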

Quick Start

Compare Forecasting Methods

mtdata-cli forecast_backtest_run EURUSD --timeframe H1 --horizon 12 \
  --methods "theta sf_autoarima analog" --steps 20 --spacing 10

Single Method with Custom Parameters

mtdata-cli forecast_backtest_run EURUSD --timeframe H1 --horizon 12 \
  --methods theta --params "alpha=0.3" --steps 30

Volatility Backtest

mtdata-cli forecast_backtest_run EURUSD --timeframe H1 --horizon 12 \
  --quantity volatility --methods "ewma parkinson garch" --steps 20

Command Reference

mtdata-cli forecast_backtest_run <SYMBOL> [OPTIONS]

Core Parameters

Parameter	Default	Description
`symbol`	(required)	Trading symbol (e.g., EURUSD)
`--timeframe`	H1	Candle timeframe
`--horizon`	12	Bars to forecast at each anchor
`--steps`	5	Number of test anchors
`--spacing`	20	Bars between anchors
`--methods`	auto	Space- or comma-separated method names

Method Parameters

Parameter	Description
`--params`	Parameters applied to all methods (JSON or `k=v`)
`--params-per-method`	Per-method parameters: `{"theta": {"seasonality": 24}}`

Example with per-method params:

mtdata-cli forecast_backtest_run EURUSD --horizon 12 \
  --methods "theta arima" \
  --params-per-method '{"theta": {"alpha": 0.3}, "arima": {"p": 2, "d": 1, "q": 2}}'

Quantity

Parameter	Options	Description
`--quantity`	`price`, `return`, `volatility`	What to forecast

Notes:

return uses log returns (ln(close_t / close_{t-1})), which are often more stationary than prices.
volatility backtests compare predicted volatility vs realized volatility; use volatility methods like ewma, garch, har_rv.
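The log-return formula above can be checked with a few lines of Python. This is a standalone sketch on made-up closes, not mtdata's internal code:

```python
import math


def log_returns(closes):
    """ln(close_t / close_{t-1}) for each consecutive pair of closes."""
    return [math.log(curr / prev) for prev, curr in zip(closes, closes[1:])]


closes = [1.0500, 1.0521, 1.0510, 1.0538]  # illustrative prices
returns = log_returns(closes)

# Log returns are additive: their sum equals ln(last_close / first_close).
total = sum(returns)
print(returns)
print(total, math.log(closes[-1] / closes[0]))
```

Additivity is one practical reason to forecast log returns: a multi-bar return is just the sum of the per-bar values.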

Examples:

# Forecast returns instead of prices
mtdata-cli forecast_backtest_run EURUSD --quantity return

# Backtest volatility methods
mtdata-cli forecast_backtest_run EURUSD --quantity volatility --methods "ewma garch"

Trade Simulation

Parameter	Default	Description
`--slippage-bps`	0.0	Transaction cost in basis points (1 bp = 0.01%)
`--trade-threshold`	0.0	Minimum expected return to trigger a trade

Example with trading costs:

# Simulate 2 bps slippage per side (4 bps round-trip)
mtdata-cli forecast_backtest_run EURUSD --horizon 12 --methods theta \
  --slippage-bps 2 --trade-threshold 0.0005
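To make the effect of the two flags concrete, here is one plausible way a single simulated trade could be scored. The rule (go long above the threshold, short below its negative, otherwise stay flat, and charge slippage once on entry and once on exit) is an assumption for illustration; the guide does not spell out mtdata's exact trade logic.

```python
def simulate_trade(entry_price, exit_price, expected_return,
                   slippage_bps=0.0, trade_threshold=0.0):
    """Score one trade. Hypothetical helper, not part of mtdata."""
    if expected_return > trade_threshold:
        direction = 1
    elif expected_return < -trade_threshold:
        direction = -1
    else:
        # Forecast move too small to justify a trade.
        return {"position": "flat", "trade_return": 0.0}

    gross = direction * (exit_price / entry_price - 1.0)
    cost = 2 * slippage_bps / 10_000  # assumed: slippage paid per side, entry and exit
    return {
        "position": "long" if direction == 1 else "short",
        "trade_return": gross - cost,
    }


# Prices and expected return taken from the per-anchor sample in this guide.
trade = simulate_trade(1.0538, 1.0552, expected_return=0.00094,
                       slippage_bps=2, trade_threshold=0.0005)
print(trade)

# A weaker signal falls under the threshold and is skipped.
skipped = simulate_trade(1.0538, 1.0552, expected_return=0.0003,
                         slippage_bps=2, trade_threshold=0.0005)
print(skipped)
```

Under these assumptions, 2 bps per side removes 0.0004 from every trade, which is large relative to a typical per-trade return on short horizons.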

Preprocessing Options

Parameter	Description
`--denoise`	Denoising method (e.g., `ema`, `kalman`)
`--denoise-params`	Denoising parameters
`--features`	Feature engineering spec
`--dimred-method`	Dimensionality reduction (e.g., `pca`)
`--dimred-params`	Dim reduction parameters

Dimred methods supported by the forecasting pipeline: pca, tsne, selectkbest (requires scikit-learn).

Tip: for forecast_backtest_run, pass dimred params as JSON:

mtdata-cli forecast_backtest_run EURUSD --horizon 12 --methods mlf_lightgbm \
  --features '{"include":["close","volume"]}' \
  --dimred-method pca --dimred-params '{"n_components":5}'

Example with denoising:

mtdata-cli forecast_backtest_run EURUSD --horizon 12 --methods theta \
  --denoise ema --denoise-params "alpha=0.2"

Understanding Output

Aggregate Metrics

{
  "results": {
    "theta": {
      "success": true,
      "avg_mae": 0.00142,
      "avg_rmse": 0.00186,
      "avg_directional_accuracy": 0.583,
      "win_rate": 0.625,
      "successful_tests": 20,
      "num_tests": 20
    }
  }
}

Metric	Description	Good Value
`avg_mae`	Mean Absolute Error (average)	Lower is better
`avg_rmse`	Root Mean Squared Error (average)	Lower is better
`avg_directional_accuracy`	Fraction of correct direction predictions	> 0.55
`win_rate`	Fraction of profitable trades	> 0.50
`successful_tests`	Tests that completed without error	= num_tests
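The error metrics in the table can be reproduced by hand for a single anchor. The sketch below uses the standard MAE and RMSE definitions; the directional-accuracy rule (compare the sign of each bar-to-bar move in the forecast with the same move in the actuals) is an assumption, since the guide does not define it precisely.

```python
import math


def mae(forecast, actual):
    """Mean absolute error."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def rmse(forecast, actual):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))


def directional_accuracy(forecast, actual):
    """Fraction of bar-to-bar moves where forecast and actual agree in sign (assumed rule)."""
    hits = 0
    for i in range(1, len(actual)):
        forecast_move = forecast[i] - forecast[i - 1]
        actual_move = actual[i] - actual[i - 1]
        if forecast_move * actual_move > 0:
            hits += 1
    return hits / (len(actual) - 1)


# Illustrative four-bar forecast and outcome.
forecast = [1.0542, 1.0545, 1.0543, 1.0549]
actual = [1.0540, 1.0548, 1.0550, 1.0547]

print(mae(forecast, actual))
print(rmse(forecast, actual))
print(directional_accuracy(forecast, actual))
```

The avg_ prefixed values in the aggregate output are then presumably these per-anchor numbers averaged over all successful tests.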

Trading Performance Metrics

When --slippage-bps or --trade-threshold is set:

{
  "metrics": {
    "avg_return_per_trade": 0.00082,
    "win_rate": 0.625,
    "sharpe_ratio": 1.45,
    "max_drawdown": 0.034,
    "calmar_ratio": 2.12,
    "cumulative_return": 0.0164,
    "annual_return": 0.087,
    "num_trades": 20,
    "trades_per_year": 365
  }
}

Metric	Description	Good Value
`sharpe_ratio`	Risk-adjusted return	> 1.0
`max_drawdown`	Largest peak-to-trough decline	< 0.10 (10%)
`calmar_ratio`	Annual return / max drawdown	> 1.0
`cumulative_return`	Total return over test period	> 0
`win_rate`	Fraction of profitable trades	> 0.50
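For readers who want to sanity-check these numbers, the sketch below computes the same kinds of metrics from a list of per-trade returns. The annualization (mean over standard deviation, scaled by the square root of trades per year, risk-free rate of zero) and the compounding used for drawdown are common conventions assumed here, not confirmed details of mtdata.

```python
import math
import statistics


def sharpe_ratio(trade_returns, trades_per_year):
    """Annualized Sharpe ratio with a zero risk-free rate (assumed convention)."""
    mean = statistics.fmean(trade_returns)
    stdev = statistics.stdev(trade_returns)  # sample standard deviation
    return mean / stdev * math.sqrt(trades_per_year)


def max_drawdown(trade_returns):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity = peak = 1.0
    worst = 0.0
    for r in trade_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def calmar_ratio(annual_return, drawdown):
    """Annual return divided by max drawdown."""
    return annual_return / drawdown


# Illustrative per-trade returns (not real results).
trade_returns = [0.01, -0.02, 0.015, -0.005, 0.02]
drawdown = max_drawdown(trade_returns)
win_rate = sum(r > 0 for r in trade_returns) / len(trade_returns)

print(sharpe_ratio(trade_returns, trades_per_year=365))
print(drawdown)
print(win_rate)
print(calmar_ratio(0.10, drawdown))
```

With only a handful of trades, all of these figures are noisy, so thresholds such as "> 1.0" are best read as rough guides.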

Per-Anchor Details

Add --json to see individual test results:

{
  "details": [
    {
      "anchor": "2025-12-15 14:00",
      "success": true,
      "mae": 0.00128,
      "rmse": 0.00165,
      "directional_accuracy": 0.636,
      "forecast": [1.0542, 1.0545, ...],
      "actual": [1.0540, 1.0548, ...],
      "entry_price": 1.0538,
      "exit_price": 1.0552,
      "expected_return": 0.00094,
      "position": "long",
      "trade_return": 0.00133
    }
  ]
}

Method Comparison

Default Methods

If --methods is not specified, the backtest uses available classical methods:

naive, drift, seasonal_naive, theta, fourier_ols
Plus sf_autoarima, sf_theta if statsforecast is installed

Comparing Categories

Fast baselines:

mtdata-cli forecast_backtest_run EURUSD --horizon 12 \
  --methods "naive drift theta seasonal_naive" --steps 30

Statistical models:

mtdata-cli forecast_backtest_run EURUSD --horizon 12 \
  --methods "sf_autoarima sf_autoets sf_theta" --steps 30

ML models:

mtdata-cli forecast_backtest_run EURUSD --horizon 12 \
  --methods "mlf_lightgbm mlf_rf" --steps 20

Foundation models:

mtdata-cli forecast_backtest_run EURUSD --horizon 24 \
  --methods "chronos2 chronos_bolt" --steps 15
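Once a comparison run has finished, the aggregate results can be ranked programmatically. The sketch below assumes the JSON shape shown under "Aggregate Metrics" (a results object keyed by method name); the naive and sf_autoarima entries are hypothetical values added only to make the ranking visible.

```python
import json

# "theta" values come from the sample output in this guide; the others are hypothetical.
payload = json.loads("""
{
  "results": {
    "theta": {"success": true, "avg_rmse": 0.00186, "avg_directional_accuracy": 0.583},
    "naive": {"success": true, "avg_rmse": 0.00171, "avg_directional_accuracy": 0.510},
    "sf_autoarima": {"success": false}
  }
}
""")


def rank_methods(results, metric="avg_rmse", mode="min"):
    """Return method names ordered best-first, skipping failed or incomplete entries."""
    usable = {
        name: stats
        for name, stats in results.items()
        if stats.get("success") and stats.get(metric) is not None
    }
    return sorted(usable, key=lambda name: usable[name][metric], reverse=(mode == "max"))


by_rmse = rank_methods(payload["results"], metric="avg_rmse", mode="min")
by_direction = rank_methods(payload["results"], metric="avg_directional_accuracy", mode="max")
print(by_rmse)
print(by_direction)
```

The two rankings can disagree, as they do here, so it helps to decide up front which metric matters for the intended use.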

Parameter Optimization

Genetic Search (`forecast_tune_genetic`)

Automatically find optimal parameters for a forecasting method:

mtdata-cli forecast_tune_genetic EURUSD --timeframe H1 --method theta \
  --horizon 12 --steps 20 --spacing 10 \
  --metric avg_rmse --mode min \
  --population 20 --generations 10

Genetic Parameters

Parameter	Default	Description
`--method`	(required)	Method to optimize
`--metric`	`avg_rmse`	Metric to optimize
`--mode`	`min`	`min` to minimize, `max` to maximize
`--population`	12	Population size per generation
`--generations`	10	Number of generations
`--crossover-rate`	0.6	Probability of crossover
`--mutation-rate`	0.3	Probability of mutation
`--seed`	None	Random seed for reproducibility
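To give a feel for what population, generations, crossover rate, mutation rate, and seed control, here is a deliberately tiny genetic search over one float parameter. It is a generic textbook-style sketch with an assumed averaging crossover and Gaussian mutation, not the algorithm mtdata implements, and the quadratic objective stands in for a real backtest metric.

```python
import random


def genetic_minimize(objective, low, high, population=12, generations=10,
                     crossover_rate=0.6, mutation_rate=0.3, seed=None):
    """Minimize objective(x) for x in [low, high] with a minimal genetic loop."""
    rng = random.Random(seed)  # a fixed seed makes the search reproducible
    pop = [rng.uniform(low, high) for _ in range(population)]
    for _ in range(generations):
        ranked = sorted(pop, key=objective)
        parents = ranked[: max(2, population // 2)]  # keep the better half
        children = [ranked[0]]                       # elitism: best candidate survives
        while len(children) < population:
            a, b = rng.sample(parents, 2)
            # Crossover: blend two parents; otherwise copy one parent unchanged.
            child = (a + b) / 2 if rng.random() < crossover_rate else a
            # Mutation: small random nudge, clipped back into range below.
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.1 * (high - low))
            children.append(min(high, max(low, child)))
        pop = children
    return min(pop, key=objective)


def toy_objective(alpha):
    """Stand-in for avg_rmse as a function of alpha; the minimum is at 0.3."""
    return (alpha - 0.3) ** 2


best = genetic_minimize(toy_objective, 0.05, 0.5, seed=42)
print(best, toy_objective(best))
```

In a real tuning run every objective call is a full backtest, so the cost grows roughly with population times generations.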

Available Metrics

Metric	Mode	Description
`avg_mae`	min	Minimize mean absolute error
`avg_rmse`	min	Minimize root mean squared error
`avg_directional_accuracy`	max	Maximize direction accuracy
`win_rate`	max	Maximize profitable trades
`sharpe_ratio`	max	Maximize risk-adjusted return
`calmar_ratio`	max	Maximize return/drawdown ratio

Custom Search Space

Define which parameters to search:

mtdata-cli forecast_tune_genetic EURUSD --method theta \
  --search-space '{"seasonality": {"type": "int", "min": 12, "max": 48}}'

Search space format:

{
  "param_name": {
    "type": "int" | "float" | "categorical",
    "min": 0,
    "max": 100,
    "log": false,          // For float: use log scale
    "choices": [...]       // For categorical
  }
}
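One way to read the format above is as a recipe for drawing candidate parameter sets. The sketch below samples from such a specification; the sampling rules (inclusive integer bounds, log-uniform draws when log is true) are assumptions about how a tuner might interpret the fields, and the learning_rate and trend entries are illustrative.

```python
import math
import random


def sample_param(spec, rng):
    """Draw one value for a single parameter specification."""
    kind = spec["type"]
    if kind == "categorical":
        return rng.choice(spec["choices"])
    low, high = spec["min"], spec["max"]
    if kind == "int":
        return rng.randint(low, high)  # inclusive on both ends
    if kind == "float":
        if spec.get("log", False):
            # Log-uniform: equal probability per order of magnitude.
            return math.exp(rng.uniform(math.log(low), math.log(high)))
        return rng.uniform(low, high)
    raise ValueError(f"unknown parameter type: {kind!r}")


def sample_candidate(space, rng):
    """Draw one full parameter set from a search space."""
    return {name: sample_param(spec, rng) for name, spec in space.items()}


space = {
    "seasonality": {"type": "int", "min": 12, "max": 48},
    "learning_rate": {"type": "float", "min": 0.001, "max": 0.3, "log": True},
    "trend": {"type": "categorical", "choices": [True, False]},
}

rng = random.Random(0)
candidates = [sample_candidate(space, rng) for _ in range(200)]
print(candidates[0])
```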

Default Search Spaces

Each method has sensible defaults. Examples:

Method	Parameters Searched
`theta`	alpha (0.05-0.5)
`arima`	p (0-3), d (0-2), q (0-3)
`fourier_ols`	m (8-96), K (1-6), trend (true/false)
`sf_autoarima`	seasonality, stepwise, d, D
`mlf_lightgbm`	n_estimators, learning_rate, num_leaves, max_depth

Practical Examples

Example 1: Find Best Method for Scalping

# Short horizon, tight spacing
mtdata-cli forecast_backtest_run EURUSD --timeframe M5 --horizon 6 \
  --methods "naive theta fourier_ols sf_autoarima" \
  --steps 50 --spacing 12 \
  --slippage-bps 1 --trade-threshold 0.0003

What to look for:

Highest win_rate with positive avg_return_per_trade
Low max_drawdown
sharpe_ratio > 1.0

Example 2: Optimize Theta for Swing Trading

# Step 1: Find optimal alpha
mtdata-cli forecast_tune_genetic EURUSD --timeframe H4 --method theta \
  --horizon 48 --steps 30 --spacing 24 \
  --metric sharpe_ratio --mode max \
  --population 20 --generations 15

# Step 2: Backtest with optimal params
mtdata-cli forecast_backtest_run EURUSD --timeframe H4 --horizon 48 \
  --methods theta --params "alpha=0.25" \
  --steps 50 --slippage-bps 2

Example 3: Compare Volatility Methods

mtdata-cli forecast_backtest_run EURUSD --timeframe H1 --horizon 12 \
  --quantity volatility \
  --methods "ewma parkinson garch har_rv" \
  --steps 30 --spacing 24

Output interpretation:

forecast_sigma: Predicted volatility
realized_sigma: Actual volatility that occurred
mae: Error between forecast and realized

Example 4: Robust Testing with Denoising

# Test if denoising improves accuracy
mtdata-cli forecast_backtest_run EURUSD --horizon 12 --methods theta \
  --steps 30 --denoise ema --denoise-params "alpha=0.3"

# Compare to non-denoised
mtdata-cli forecast_backtest_run EURUSD --horizon 12 --methods theta \
  --steps 30

Example 5: Walk-Forward Optimization

Simulate real-world model updates:

# Period 1: Optimize on first 6 months
mtdata-cli forecast_tune_genetic EURUSD --method theta --horizon 12 \
  --steps 50 --spacing 24 --metric avg_rmse

# Record best params, then test on next 3 months with those params
mtdata-cli forecast_backtest_run EURUSD --horizon 12 --methods theta \
  --params "seasonality=24" --steps 30 --spacing 24

# Repeat: re-optimize, test out-of-sample
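The optimize-then-test cycle can be planned as a series of index windows over the bar history. The helper below is a hypothetical sketch of that bookkeeping, with the window sizes chosen purely for illustration; it does not call mtdata and does not select date ranges for the commands above.

```python
def walk_forward_windows(n_bars, train_size, test_size):
    """Return (train, test) index ranges as half-open (start, end) pairs.

    Each test window immediately follows its training window, and the
    whole pair then slides forward by one test window.
    """
    windows = []
    start = 0
    while start + train_size + test_size <= n_bars:
        train = (start, start + train_size)
        test = (start + train_size, start + train_size + test_size)
        windows.append((train, test))
        start += test_size
    return windows


windows = walk_forward_windows(n_bars=1000, train_size=400, test_size=200)
for train, test in windows:
    print(f"optimize on bars {train[0]}..{train[1] - 1}, test on bars {test[0]}..{test[1] - 1}")
```

Because every test window lies strictly after its training window, the tuned parameters are always evaluated on data they have not seen.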

Interpreting Results

Good Results Checklist

✅ avg_rmse is small relative to price volatility
✅ avg_directional_accuracy > 0.55 (better than random)
✅ win_rate > 0.50 with positive avg_return_per_trade
✅ sharpe_ratio > 1.0
✅ max_drawdown < 10-15%
✅ Results consistent across different spacing values

Warning Signs

⚠️ Very high accuracy on backtests but poor live results → overfitting
⚠️ successful_tests << num_tests → method fails frequently
⚠️ avg_rmse much larger than avg_mae → outlier errors
⚠️ max_drawdown > 20% → high risk
⚠️ Results vary wildly with small parameter changes → unstable

Avoiding Overfitting

Use enough test points: steps ≥ 20, so that averaged metrics are not dominated by a handful of anchors
Test across timeframes: Method should work on H1, H4, D1
Test across symbols: Don't optimize for a single pair
Out-of-sample validation: Reserve recent data for final test
Realistic costs: Include --slippage-bps and --trade-threshold

Performance Tips

Speed Optimization

Reduce steps for initial screening:

--steps 10 --spacing 30  # Quick check
--steps 50 --spacing 10  # Full validation

Use fast methods first:
- naive, theta, seasonal_naive are instant
- sf_autoarima, chronos2 are slower

Limit genetic search:

--population 15 --generations 8  # Quick
--population 30 --generations 20 # Thorough

Parallelization

Run multiple backtests in parallel (different terminals):

# Terminal 1
mtdata-cli forecast_backtest_run EURUSD --methods theta --steps 30

# Terminal 2
mtdata-cli forecast_backtest_run GBPUSD --methods theta --steps 30

Quick Reference

Task	Command
Compare methods	`mtdata-cli forecast_backtest_run EURUSD --methods "theta arima analog" --steps 20`
With trading costs	`--slippage-bps 2 --trade-threshold 0.0005`
Volatility backtest	`--quantity volatility --methods "ewma garch"`
With denoising	`--denoise ema --denoise-params "alpha=0.2"`
Optimize params	`mtdata-cli forecast_tune_genetic EURUSD --method theta --metric avg_rmse`
JSON output	`--json`
