Backtesting validates forecast accuracy by testing models on historical data. This guide covers rolling-origin backtests, performance metrics, and parameter optimization.
Related:
- GLOSSARY.md — Definitions of MAE, RMSE, Sharpe ratio, etc.
- FORECAST.md — Forecasting methods
- FORECAST_GENERATE.md — Detailed forecast options
- VOLATILITY.md — Volatility forecasting
Backtesting answers: "How well would this forecast method have performed on past data?"
Evaluating a model on the same data it was fitted to hides overfitting and overstates accuracy. Instead, backtesting:
- Picks historical "anchor" points
- At each anchor, generates a forecast using only data available at that time
- Compares the forecast to what actually happened
- Aggregates error metrics across all test points
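The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not mtdata's implementation: the `naive_forecast` baseline, the `backtest` helper, and the toy price list are all assumptions made for the example.

```python
from statistics import mean

def naive_forecast(history, horizon):
    """Baseline forecaster: repeat the last observed value for every future step."""
    return [history[-1]] * horizon

def backtest(series, anchors, horizon, forecaster=naive_forecast):
    """Rolling-origin backtest returning (per-anchor MAE list, average MAE).

    `anchors` are indices into `series`. The forecast made at anchor `a` sees
    only series[:a + 1] and is scored against the next `horizon` values.
    """
    per_anchor_mae = []
    for a in anchors:
        history = series[: a + 1]                   # data available at the anchor
        actual = series[a + 1 : a + 1 + horizon]    # what happened afterwards
        if len(actual) < horizon:
            continue                                # not enough future data to score
        forecast = forecaster(history, horizon)
        per_anchor_mae.append(mean(abs(f - y) for f, y in zip(forecast, actual)))
    if not per_anchor_mae:
        raise ValueError("no anchor had enough future data to be scored")
    return per_anchor_mae, mean(per_anchor_mae)

# Hypothetical toy series, used only to exercise the loop
prices = [1.00, 1.01, 1.03, 1.02, 1.04, 1.05, 1.03, 1.06]
per_anchor, avg_mae = backtest(prices, anchors=[2, 4], horizon=2)
print(per_anchor, avg_mae)
```

The key property is that `history` never contains data from after the anchor, so each forecast is made under the same information constraints as a live forecast.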
The standard backtesting approach in mtdata:
```text
Timeline: [----history----][forecast horizon]
                          ^
                        anchor
```
Parameters:
- steps: Number of anchor points to test
- spacing: Bars between anchor points
- horizon: How far ahead each forecast predicts
Example: steps=20, spacing=10, horizon=12 creates 20 test points, each 10 bars apart, each forecasting 12 bars ahead.
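To make the arithmetic concrete, the sketch below lays out anchor positions for given `steps`, `spacing`, and `horizon` values. The `anchor_indices` helper is hypothetical, and it assumes anchors are counted back from the end of the data with the most recent anchor placed `horizon` bars before the last bar; mtdata's exact placement may differ.

```python
def anchor_indices(n_bars, steps, spacing, horizon):
    """Return anchor indices (oldest first) for a rolling-origin backtest.

    Assumption: the newest anchor sits `horizon` bars before the last bar so
    its forecast can be fully scored; older anchors are `spacing` bars apart.
    """
    last_anchor = n_bars - 1 - horizon
    anchors = [last_anchor - k * spacing for k in range(steps)]
    if anchors[-1] < 0:
        raise ValueError("not enough bars for this steps/spacing/horizon")
    return sorted(anchors)

anchors = anchor_indices(n_bars=1000, steps=20, spacing=10, horizon=12)
# Bars covered by the test window, excluding history needed before the oldest anchor
span = (anchors[-1] - anchors[0]) + 12
print(len(anchors), anchors[0], anchors[-1], span)
```

Under these assumptions the test window for `steps=20, spacing=10, horizon=12` covers `(20 - 1) * 10 + 12 = 202` bars, plus whatever history the method needs before the oldest anchor.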
```bash
# Compare several methods
mtdata-cli forecast_backtest_run EURUSD --timeframe H1 --horizon 12 \
  --methods "theta sf_autoarima analog" --steps 20 --spacing 10

# Pass parameters to a method
mtdata-cli forecast_backtest_run EURUSD --timeframe H1 --horizon 12 \
  --methods theta --params "alpha=0.3" --steps 30

# Backtest volatility methods
mtdata-cli forecast_backtest_run EURUSD --timeframe H1 --horizon 12 \
  --quantity volatility --methods "ewma parkinson garch" --steps 20
```

Command syntax:

```bash
mtdata-cli forecast_backtest_run <SYMBOL> [OPTIONS]
```

| Parameter | Default | Description |
|---|---|---|
| `symbol` | (required) | Trading symbol (e.g., EURUSD) |
| `--timeframe` | `H1` | Candle timeframe |
| `--horizon` | `12` | Bars to forecast at each anchor |
| `--steps` | `5` | Number of test anchors |
| `--spacing` | `20` | Bars between anchors |
| `--methods` | auto | Space- or comma-separated method names |

| Parameter | Description |
|---|---|
| `--params` | Parameters applied to all methods (JSON or `k=v`) |
| `--params-per-method` | Per-method parameters, e.g. `{"theta": {"seasonality": 24}}` |

Example with per-method params:

```bash
mtdata-cli forecast_backtest_run EURUSD --horizon 12 \
  --methods "theta arima" \
  --params-per-method '{"theta": {"alpha": 0.3}, "arima": {"p": 2, "d": 1, "q": 2}}'
```

| Parameter | Options | Description |
|---|---|---|
| `--quantity` | `price`, `return`, `volatility` | What to forecast |

Notes:
- `return` uses log returns (`ln(close_t / close_{t-1})`), which are often closer to stationary than prices.
- `volatility` backtests compare predicted volatility with realized volatility; use volatility methods such as `ewma`, `garch`, `har_rv`.

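As a quick illustration of the log-return formula, the following sketch computes `ln(close_t / close_{t-1})` for a short list of closes. The `log_returns` helper and the sample closes are made up for the example.

```python
import math

def log_returns(closes):
    """ln(close_t / close_{t-1}) for each consecutive pair of closes."""
    return [math.log(curr / prev) for prev, curr in zip(closes, closes[1:])]

closes = [1.0500, 1.0521, 1.0510, 1.0538]   # hypothetical closing prices
rets = log_returns(closes)

# Log returns are additive: their sum equals ln(last close / first close).
total = sum(rets)
print(rets, total)
```

Additivity is one practical reason to work in log returns: a multi-bar return is simply the sum of the per-bar returns.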
Examples:
```bash
# Forecast returns instead of prices
mtdata-cli forecast_backtest_run EURUSD --quantity return

# Backtest volatility methods
mtdata-cli forecast_backtest_run EURUSD --quantity volatility --methods "ewma garch"
```

| Parameter | Default | Description |
|---|---|---|
| `--slippage-bps` | `0.0` | Transaction cost in basis points (1 bp = 0.01%) |
| `--trade-threshold` | `0.0` | Minimum expected return to trigger a trade |

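One plausible reading of these two parameters is sketched below. The `simulate_trade` helper is hypothetical, and the rules it encodes (long above the threshold, short below its negative, slippage charged once on entry and once on exit) are assumptions rather than a description of mtdata's internals.

```python
def simulate_trade(entry_price, exit_price, expected_return,
                   trade_threshold=0.0, slippage_bps=0.0):
    """Return (position, net_return) for one anchor.

    Assumptions (not confirmed against mtdata):
    - long if expected_return > threshold, short if < -threshold, else flat;
    - slippage is paid per side, so the round-trip cost is 2 * slippage_bps.
    """
    if expected_return > trade_threshold:
        direction = 1
    elif expected_return < -trade_threshold:
        direction = -1
    else:
        return "flat", 0.0
    gross = direction * (exit_price / entry_price - 1.0)
    round_trip_cost = 2 * slippage_bps / 10_000     # 1 bp = 0.01% = 0.0001
    position = "long" if direction == 1 else "short"
    return position, gross - round_trip_cost

# Forecast predicts +0.094%, which clears a 0.05% threshold, so a trade is taken
position, net = simulate_trade(1.0538, 1.0552, expected_return=0.00094,
                               trade_threshold=0.0005, slippage_bps=2)
print(position, round(net, 6))

# A weaker signal stays below the threshold and produces no trade
print(simulate_trade(1.0538, 1.0552, expected_return=0.0003, trade_threshold=0.0005))
```

Raising the threshold filters out weak signals, while slippage reduces the return of every trade that is taken, so the two settings interact: a threshold below the round-trip cost admits trades that cannot be profitable on expectation.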
Example with trading costs:

```bash
# Simulate 2 bps slippage per side (4 bps round-trip)
mtdata-cli forecast_backtest_run EURUSD --horizon 12 --methods theta \
  --slippage-bps 2 --trade-threshold 0.0005
```

| Parameter | Description |
|---|---|
| `--denoise` | Denoising method (e.g., `ema`, `kalman`) |
| `--denoise-params` | Denoising parameters |
| `--features` | Feature engineering spec |
| `--dimred-method` | Dimensionality reduction (e.g., `pca`) |
| `--dimred-params` | Dimensionality reduction parameters |

Dimred methods supported by the forecasting pipeline: `pca`, `tsne`, `selectkbest` (requires scikit-learn).

Tip: for `forecast_backtest_run`, pass dimred params as JSON:

```bash
mtdata-cli forecast_backtest_run EURUSD --horizon 12 --methods mlf_lightgbm \
  --features '{"include":["close","volume"]}' \
  --dimred-method pca --dimred-params '{"n_components":5}'
```

Example with denoising:
```bash
mtdata-cli forecast_backtest_run EURUSD --horizon 12 --methods theta \
  --denoise ema --denoise-params "alpha=0.2"
```

A backtest reports aggregate metrics per method:

```json
{
  "results": {
    "theta": {
      "success": true,
      "avg_mae": 0.00142,
      "avg_rmse": 0.00186,
      "avg_directional_accuracy": 0.583,
      "win_rate": 0.625,
      "successful_tests": 20,
      "num_tests": 20
    }
  }
}
```

| Metric | Description | Good Value |
|---|---|---|
| `avg_mae` | Mean absolute error, averaged across anchors | Lower is better |
| `avg_rmse` | Root mean squared error, averaged across anchors | Lower is better |
| `avg_directional_accuracy` | Fraction of correct direction predictions | > 0.55 |
| `win_rate` | Fraction of profitable trades | > 0.50 |
| `successful_tests` | Tests that completed without error | = `num_tests` |

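For reference, the per-anchor quantities behind these averages can be computed as in the sketch below. The `forecast_errors` helper is hypothetical, and its definition of directional accuracy (forecast and actual on the same side of the last known price) is an assumption; mtdata may define direction differently, for example bar to bar.

```python
import math

def forecast_errors(forecast, actual, last_price):
    """MAE, RMSE and directional accuracy for a single anchor.

    Assumption: a step is directionally correct when forecast and actual lie
    on the same side of `last_price`, the last close known at the anchor.
    """
    errors = [f - a for f, a in zip(forecast, actual)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    hits = sum(
        1 for f, a in zip(forecast, actual)
        if (f - last_price) * (a - last_price) > 0
    )
    return {"mae": mae, "rmse": rmse, "directional_accuracy": hits / n}

# Hypothetical three-step forecast scored against what actually happened
m = forecast_errors(
    forecast=[1.0542, 1.0545, 1.0550],
    actual=[1.0540, 1.0548, 1.0535],
    last_price=1.0538,
)
print(m)
```

The `avg_*` metrics are then the means of these per-anchor values over all successful tests.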
When `--slippage-bps` or `--trade-threshold` is set:

```json
{
  "metrics": {
    "avg_return_per_trade": 0.00082,
    "win_rate": 0.625,
    "sharpe_ratio": 1.45,
    "max_drawdown": 0.034,
    "calmar_ratio": 2.12,
    "cumulative_return": 0.0164,
    "annual_return": 0.087,
    "num_trades": 20,
    "trades_per_year": 365
  }
}
```

| Metric | Description | Good Value |
|---|---|---|
| `sharpe_ratio` | Risk-adjusted return | > 1.0 |
| `max_drawdown` | Largest peak-to-trough decline | < 0.10 (10%) |
| `calmar_ratio` | Annual return / max drawdown | > 1.0 |
| `cumulative_return` | Total return over test period | > 0 |
| `win_rate` | Fraction of profitable trades | > 0.50 |

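The sketch below shows one common way to derive such metrics from a list of per-trade returns. The `trading_metrics` helper is hypothetical and rests on stated assumptions: returns compound, the Sharpe ratio uses a zero risk-free rate and is annualised with `sqrt(trades_per_year)`, and drawdown is measured on the compounded equity curve. mtdata's own conventions may differ, so the numbers it reports need not match this calculation.

```python
import math
from statistics import mean, stdev

def trading_metrics(trade_returns, trades_per_year):
    """Summary statistics for per-trade returns (needs at least two trades)."""
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in trade_returns:
        equity *= 1.0 + r                              # compound each trade
        peak = max(peak, equity)
        max_dd = max(max_dd, (peak - equity) / peak)   # peak-to-trough decline
    cumulative = equity - 1.0
    # Scale the observed growth to one year of trading at the same pace
    annual = (1.0 + cumulative) ** (trades_per_year / len(trade_returns)) - 1.0
    sd = stdev(trade_returns)
    sharpe = (mean(trade_returns) / sd * math.sqrt(trades_per_year)
              if sd > 0 else float("nan"))
    calmar = annual / max_dd if max_dd > 0 else float("inf")
    win_rate = sum(1 for r in trade_returns if r > 0) / len(trade_returns)
    return {
        "cumulative_return": cumulative,
        "annual_return": annual,
        "max_drawdown": max_dd,
        "sharpe_ratio": sharpe,
        "calmar_ratio": calmar,
        "win_rate": win_rate,
    }

# Hypothetical trade returns; trades_per_year=4 makes annual == cumulative here
tm = trading_metrics([0.01, -0.005, 0.02, -0.01], trades_per_year=4)
print(tm)
```

Because the Sharpe and Calmar ratios depend on the annualisation convention, compare them only between runs that use the same timeframe, horizon, and spacing.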
Add `--json` to see individual test results:

```json
{
  "details": [
    {
      "anchor": "2025-12-15 14:00",
      "success": true,
      "mae": 0.00128,
      "rmse": 0.00165,
      "directional_accuracy": 0.636,
      "forecast": [1.0542, 1.0545, ...],
      "actual": [1.0540, 1.0548, ...],
      "entry_price": 1.0538,
      "exit_price": 1.0552,
      "expected_return": 0.00094,
      "position": "long",
      "trade_return": 0.00133
    }
  ]
}
```

If `--methods` is not specified, the backtest uses available classical methods:
- `naive`, `drift`, `seasonal_naive`, `theta`, `fourier_ols`
- Plus `sf_autoarima`, `sf_theta` if statsforecast is installed

Fast baselines:
```bash
mtdata-cli forecast_backtest_run EURUSD --horizon 12 \
  --methods "naive drift theta seasonal_naive" --steps 30
```

Statistical models:

```bash
mtdata-cli forecast_backtest_run EURUSD --horizon 12 \
  --methods "sf_autoarima sf_autoets sf_theta" --steps 30
```

ML models:

```bash
mtdata-cli forecast_backtest_run EURUSD --horizon 12 \
  --methods "mlf_lightgbm mlf_rf" --steps 20
```

Foundation models:

```bash
mtdata-cli forecast_backtest_run EURUSD --horizon 24 \
  --methods "chronos2 chronos_bolt" --steps 15
```

Automatically find optimal parameters for a forecasting method:
```bash
mtdata-cli forecast_tune_genetic EURUSD --timeframe H1 --method theta \
  --horizon 12 --steps 20 --spacing 10 \
  --metric avg_rmse --mode min \
  --population 20 --generations 10
```

| Parameter | Default | Description |
|---|---|---|
| `--method` | (required) | Method to optimize |
| `--metric` | `avg_rmse` | Metric to optimize |
| `--mode` | `min` | `min` to minimize, `max` to maximize |
| `--population` | `12` | Population size per generation |
| `--generations` | `10` | Number of generations |
| `--crossover-rate` | `0.6` | Probability of crossover |
| `--mutation-rate` | `0.3` | Probability of mutation |
| `--seed` | None | Random seed for reproducibility |

| Metric | Mode | Description |
|---|---|---|
| `avg_mae` | `min` | Minimize mean absolute error |
| `avg_rmse` | `min` | Minimize root mean squared error |
| `avg_directional_accuracy` | `max` | Maximize direction accuracy |
| `win_rate` | `max` | Maximize profitable trades |
| `sharpe_ratio` | `max` | Maximize risk-adjusted return |
| `calmar_ratio` | `max` | Maximize return/drawdown ratio |

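To show how population size, generations, crossover rate, mutation rate, and seed interact, here is a deliberately small genetic search over a single float parameter. It is a generic sketch, not the algorithm mtdata runs: the `genetic_minimize` function, its selection and blending rules, and the toy objective standing in for `avg_rmse` are all assumptions.

```python
import random

def genetic_minimize(objective, low, high, population=12, generations=10,
                     crossover_rate=0.6, mutation_rate=0.3, seed=None):
    """Minimise objective(x) for one float parameter x in [low, high]."""
    if population < 2:
        raise ValueError("population must be at least 2")
    rng = random.Random(seed)                       # seeded for reproducibility
    pop = [rng.uniform(low, high) for _ in range(population)]
    for _ in range(generations):
        ranked = sorted(pop, key=objective)
        parents = ranked[: max(2, population // 2)]   # selection: keep better half
        children = [ranked[0]]                        # elitism: best always survives
        while len(children) < population:
            a, b = rng.sample(parents, 2)
            # Crossover blends two parents; otherwise the child copies one parent
            child = (a + b) / 2 if rng.random() < crossover_rate else a
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.1 * (high - low))   # mutation: random nudge
            children.append(min(high, max(low, child)))       # clamp to the bounds
        pop = children
    return min(pop, key=objective)

# Toy objective standing in for avg_rmse as a function of alpha (minimum at 0.25)
toy_rmse = lambda alpha: (alpha - 0.25) ** 2
best = genetic_minimize(toy_rmse, 0.05, 0.5, population=20, generations=15, seed=42)
print(best)
```

For a metric used with `--mode max`, the same loop applies to the negated objective. In a real tuning run each objective evaluation is a full backtest, so the cost scales roughly with `population × generations`.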
Define which parameters to search:

```bash
mtdata-cli forecast_tune_genetic EURUSD --method theta \
  --search-space '{"seasonality": {"type": "int", "min": 12, "max": 48}}'
```

Search space format:

```jsonc
{
  "param_name": {
    "type": "int" | "float" | "categorical",
    "min": 0,
    "max": 100,
    "log": false,      // For float: use log scale
    "choices": [...]   // For categorical
  }
}
```

Each method has a default search space that is used when `--search-space` is omitted. Examples:

| Method | Parameters Searched |
|---|---|
| `theta` | `alpha` (0.05-0.5) |
| `arima` | `p` (0-3), `d` (0-2), `q` (0-3) |
| `fourier_ols` | `m` (8-96), `K` (1-6), `trend` (true/false) |
| `sf_autoarima` | `seasonality`, `stepwise`, `d`, `D` |
| `mlf_lightgbm` | `n_estimators`, `learning_rate`, `num_leaves`, `max_depth` |

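The search-space format can be read as a recipe for drawing candidate parameter sets. The sketch below samples one candidate from such a spec. `sample_params` is a hypothetical helper, and treating `min`/`max` as inclusive bounds and `log` as uniform sampling in log space are assumptions about the semantics.

```python
import math
import random

def sample_params(search_space, rng):
    """Draw one candidate parameter set from a search-space dict."""
    params = {}
    for name, spec in search_space.items():
        kind = spec["type"]
        if kind == "int":
            params[name] = rng.randint(spec["min"], spec["max"])   # inclusive bounds
        elif kind == "float":
            if spec.get("log", False):
                # Sample uniformly in log space, then map back (needs min > 0)
                lo, hi = math.log(spec["min"]), math.log(spec["max"])
                params[name] = math.exp(rng.uniform(lo, hi))
            else:
                params[name] = rng.uniform(spec["min"], spec["max"])
        elif kind == "categorical":
            params[name] = rng.choice(spec["choices"])
        else:
            raise ValueError(f"unknown parameter type: {kind!r}")
    return params

# Illustrative spec mixing the three parameter types
space = {
    "seasonality": {"type": "int", "min": 12, "max": 48},
    "alpha": {"type": "float", "min": 0.05, "max": 0.5, "log": True},
    "trend": {"type": "categorical", "choices": [True, False]},
}
candidate = sample_params(space, random.Random(0))
print(candidate)
```

Log-scale sampling is mainly useful for parameters such as learning rates, where values spanning several orders of magnitude should be explored evenly.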
```bash
# Short horizon, tight spacing
mtdata-cli forecast_backtest_run EURUSD --timeframe M5 --horizon 6 \
  --methods "naive theta fourier_ols sf_autoarima" \
  --steps 50 --spacing 12 \
  --slippage-bps 1 --trade-threshold 0.0003
```

What to look for:
- Highest `win_rate` with positive `avg_trade_return`
- Low `max_drawdown`
- `sharpe_ratio` > 1.0

```bash
# Step 1: Find optimal alpha
mtdata-cli forecast_tune_genetic EURUSD --timeframe H4 --method theta \
  --horizon 48 --steps 30 --spacing 24 \
  --metric sharpe_ratio --mode max \
  --population 20 --generations 15

# Step 2: Backtest with the tuned params (alpha=0.25 stands in for the value found in step 1)
mtdata-cli forecast_backtest_run EURUSD --timeframe H4 --horizon 48 \
  --methods theta --params "alpha=0.25" \
  --steps 50 --slippage-bps 2
```

```bash
# Compare volatility methods
mtdata-cli forecast_backtest_run EURUSD --timeframe H1 --horizon 12 \
  --quantity volatility \
  --methods "ewma parkinson garch har_rv" \
  --steps 30 --spacing 24
```

Output interpretation:
- `forecast_sigma`: Predicted volatility
- `realized_sigma`: Actual volatility that occurred
- `mae`: Error between forecast and realized

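As a rough picture of what a volatility backtest compares at each anchor, the sketch below forecasts sigma with an EWMA recursion and scores it against realized sigma over the horizon. Both helpers are hypothetical, and the specific definitions used here (realized sigma as the square root of summed squared log returns, a decay of 0.94, variance scaled linearly with the horizon) are assumptions rather than mtdata's documented formulas.

```python
import math

def realized_sigma(returns):
    """Realised volatility over a window: sqrt of the sum of squared returns."""
    return math.sqrt(sum(r * r for r in returns))

def ewma_sigma(returns, lam=0.94, horizon=1):
    """EWMA volatility forecast for the next `horizon` bars.

    Recursion: var_t = lam * var_{t-1} + (1 - lam) * r_t^2, seeded with the
    first squared return. The per-bar variance is scaled by `horizon`.
    """
    var = returns[0] ** 2
    for r in returns[1:]:
        var = lam * var + (1.0 - lam) * r * r
    return math.sqrt(var * horizon)

history = [0.001, -0.002, 0.0015, -0.0005, 0.0025]   # log returns up to the anchor
future = [0.002, -0.001, 0.001]                      # log returns after the anchor

forecast_sigma = ewma_sigma(history, lam=0.94, horizon=len(future))
actual_sigma = realized_sigma(future)
abs_error = abs(forecast_sigma - actual_sigma)
print(forecast_sigma, actual_sigma, abs_error)
```

Averaging `abs_error` over all anchors gives a MAE-style score that can be compared across volatility methods, provided every method is scored against the same realized measure.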
```bash
# Test if denoising improves accuracy
mtdata-cli forecast_backtest_run EURUSD --horizon 12 --methods theta \
  --steps 30 --denoise ema --denoise-params "alpha=0.3"

# Compare to non-denoised
mtdata-cli forecast_backtest_run EURUSD --horizon 12 --methods theta \
  --steps 30
```

Simulate real-world model updates:

```bash
# Period 1: Optimize on first 6 months
mtdata-cli forecast_tune_genetic EURUSD --method theta --horizon 12 \
  --steps 50 --spacing 24 --metric avg_rmse

# Record best params, then test on next 3 months with those params
mtdata-cli forecast_backtest_run EURUSD --horizon 12 --methods theta \
  --params "seasonality=24" --steps 30 --spacing 24

# Repeat: re-optimize, test out-of-sample
```

- ✅ `avg_rmse` is small relative to price volatility
- ✅ `avg_directional_accuracy` > 0.55 (better than random)
- ✅ `win_rate` > 0.50 with positive `avg_trade_return`
- ✅ `sharpe_ratio` > 1.0
- ✅ `max_drawdown` < 10-15%
- ✅ Results consistent across different spacing values

- `successful_tests` << `num_tests` → method fails frequently
- `avg_rmse` much larger than `avg_mae` → outlier errors
- `max_drawdown` > 20% → high risk

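The RMSE-versus-MAE warning sign follows from how the two metrics weight errors: RMSE squares each error, so a single large miss raises it far more than it raises MAE. The sketch below compares two made-up error sets with the same MAE; the `mae_rmse` helper and the numbers are illustrative only.

```python
import math

def mae_rmse(errors):
    """Return (MAE, RMSE) for a list of forecast errors."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse

even = [0.001, -0.001, 0.001, -0.001]       # errors of similar size
spiky = [0.0001, -0.0001, 0.0001, 0.0037]   # same MAE, one large miss

for name, errs in (("even", even), ("spiky", spiky)):
    mae, rmse = mae_rmse(errs)
    print(f"{name}: mae={mae:.4f} rmse={rmse:.4f} ratio={rmse / mae:.2f}")
```

RMSE is never smaller than MAE, and the two are equal only when every error has the same magnitude. A ratio well above 1, as in the `spiky` case, suggests that a few anchors dominate the error and are worth inspecting individually with `--json`.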
- Use enough test points: `steps` ≥ 20, so that averaged metrics are not dominated by a handful of anchors
- Test across timeframes: Method should work on H1, H4, D1
- Test across symbols: Don't optimize for a single pair
- Out-of-sample validation: Reserve recent data for final test
- Realistic costs: Include `--slippage-bps` and `--trade-threshold`

- Reduce steps for initial screening:

  ```bash
  --steps 10 --spacing 30   # Quick check
  --steps 50 --spacing 10   # Full validation
  ```

- Use fast methods first:
  - `naive`, `theta`, `seasonal_naive` run almost instantly
  - `sf_autoarima`, `chronos2` are slower

- Limit genetic search:

  ```bash
  --population 15 --generations 8    # Quick
  --population 30 --generations 20   # Thorough
  ```

Run multiple backtests in parallel (different terminals):
```bash
# Terminal 1
mtdata-cli forecast_backtest_run EURUSD --methods theta --steps 30

# Terminal 2
mtdata-cli forecast_backtest_run GBPUSD --methods theta --steps 30
```

| Task | Command |
|---|---|
| Compare methods | `mtdata-cli forecast_backtest_run EURUSD --methods "theta arima analog" --steps 20` |
| With trading costs | `--slippage-bps 2 --trade-threshold 0.0005` |
| Volatility backtest | `--quantity volatility --methods "ewma garch"` |
| With denoising | `--denoise ema --denoise-params "alpha=0.2"` |
| Optimize params | `mtdata-cli forecast_tune_genetic EURUSD --method theta --metric avg_rmse` |
| JSON output | `--json` |

- GLOSSARY.md — MAE, RMSE, Sharpe ratio definitions
- FORECAST.md — Forecasting methods overview
- FORECAST_GENERATE.md — Forecast generation options
- DENOISING.md — Preprocessing options