CVNTrade — Tuning Protocol
Version: 1.0 Date: 2026-04-14 Issue: #499 Governing ADR: ADR-56 (every change A/B testable by design)
1. Purpose
This document defines the systematic process by which every parameter of the CVNTrade ML trading pipeline is selected, validated, and locked. No parameter is set by intuition, copied from another system, or left at a library default. Every choice is empirically validated through controlled ablation testing.
Audience: Anyone who needs to understand WHY the pipeline is configured the way it is — engineers, auditors, investors, regulators.
Guarantee: For every parameter in this document, there is:
1. A hypothesis explaining why this value was chosen
2. An FTF ablation run comparing it against the alternatives
3. A statistical test showing it is the best option (BH-corrected p < 0.05), or an explicit "no significant difference" verdict
4. A committee review validating the decision
Companion plan: see F1_BUY_BOOST_PLAN.md for the active 13-track plan to break the chronic f1_buy 0.40-0.46 plateau (committee-approved 2026-04-27 round 3, session 9d4942cb).
2. Methodology — Lock-and-Advance
Principle
Test ONE factor at a time (ceteris paribus). All other parameters are held at their locked baseline value. This isolates the effect of each change.
Process per factor
1. Define variants → env var + FTF factor (ADR-56)
2. Run FTF ablation → 5 folds × 5 cryptos × 3 costs × N variants
3. Analyze results → Sortino, CI, pairwise BH comparison
4. Committee review → score ≥ 8 to proceed
5. Lock winner → lock_winner(factor, variant)
6. Update BASE_ENV → winner becomes new baseline
7. Advance to next factor → all subsequent tests use updated baseline
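The lock-and-advance loop can be sketched in a few lines of Python. This is a minimal illustration, not the FTF implementation: `lock_and_advance`, `toy_evaluate` and the scores are hypothetical, each variant is modeled as a dict of env-var overrides (in the spirit of ADR-56), and a simple score margin stands in for the BH-corrected comparison. Variants are listed simplest-first so that ties resolve to the cheapest option.

```python
from typing import Callable, Dict, Tuple

Env = Dict[str, str]


def lock_and_advance(
    base_env: Env,
    factors: Dict[str, Dict[str, Env]],
    evaluate: Callable[[Env], float],
    margin: float = 0.0,
) -> Tuple[Env, Dict[str, str]]:
    """Tune one factor at a time; each winner becomes the baseline for the next factor.

    `factors` maps factor name -> {variant name -> env-var overrides}, with the
    variants listed from simplest/cheapest to most complex.
    """
    env = dict(base_env)
    locked: Dict[str, str] = {}
    for factor, variants in factors.items():
        # Ceteris paribus: only this factor's env vars differ from the current baseline.
        scores = {name: evaluate({**env, **overrides}) for name, overrides in variants.items()}
        best_score = max(scores.values())
        # Stand-in for the BH-corrected test: the simplest variant within `margin`
        # of the best score wins (Occam's razor when differences are negligible).
        winner = next(name for name in variants if scores[name] >= best_score - margin)
        locked[factor] = winner
        env.update(variants[winner])  # Lock and advance: the winner becomes the new baseline.
    return env, locked


def toy_evaluate(env: Env) -> float:
    """Hypothetical mean-Sortino scores, purely illustrative."""
    score = 1.0
    score += {"15": 0.0, "30": 0.3, "50": 0.32}[env["CVN_HPO_N_TRIALS"]]
    score += {"50": -0.2, "150": 0.0, "300": 0.01}[env["CVN_EARLY_STOPPING_ROUNDS"]]
    return score


if __name__ == "__main__":
    base = {"CVN_HPO_N_TRIALS": "30", "CVN_EARLY_STOPPING_ROUNDS": "150"}
    factors = {
        "hpo_budget": {v: {"CVN_HPO_N_TRIALS": v} for v in ("15", "30", "50")},
        "early_stopping": {v: {"CVN_EARLY_STOPPING_ROUNDS": v} for v in ("50", "150", "300")},
    }
    env, locked = lock_and_advance(base, factors, toy_evaluate, margin=0.05)
    print(locked)  # {'hpo_budget': '30', 'early_stopping': '150'}
```

With `margin=0.05`, 50 HPO trials are not enough better than 30 to win, which mirrors the "lock the simplest variant" rule for non-significant differences.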
Statistical Standards
| Criterion | Threshold |
|---|---|
| Significance | BH-corrected p < 0.05 |
| Effect size | Cohen's d reported (no minimum, but d < 0.2 = negligible) |
| Power | ≥ 63 trades per variant (d=0.5, α=5%, power=80%) |
| Minimum trades per fold | ≥ 30 (below = underpowered warning) |
| Confidence intervals | Bootstrap 95% CI on all metrics |
| Outlier protection | Sortino capped ±20, PF capped 50, runs < 3 trades excluded |
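The significance, effect-size and power criteria can be reproduced with the Python standard library. The sketch below (function names are illustrative, not the FTF code) implements the Benjamini-Hochberg step-up procedure, Cohen's d with a pooled standard deviation, and the normal-approximation sample size behind the "≥ 63 trades" threshold.

```python
import math
import statistics
from statistics import NormalDist
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Step-up BH procedure: True where the null hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k with p_(k) <= (k / m) * alpha; all hypotheses ranked 1..k are rejected.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def trades_per_variant(d: float = 0.5, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-sided, two-sample comparison (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


if __name__ == "__main__":
    print(trades_per_variant())  # 63
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))  # [True, False, False, False]
```

The normal approximation gives 63 trades per variant for d = 0.5, α = 5% and power = 80%; an exact two-sample t-test calculation gives a slightly larger 64.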
If no significant difference
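Lock the simplest/cheapest variant (Occam's razor) and document the "no significant difference" verdict. The test is still valuable: it shows that, at the tested sample size, the parameter has no detectable effect on the target metric, so it can be chosen on simplicity and cost grounds.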
3. Protocol Phases
Phase 0 — Calibration Baseline
Purpose: Establish the starting point. Verify the pipeline works end-to-end.
| Factor | Variants tested | Winner | Evidence |
|---|---|---|---|
| calibration | none, isotonic, platt | isotonic | No significant difference (p=0.96). All three viable. |
Decision: Lock isotonic (user selection). Calibration has negligible impact on Sortino.
Phase 1a — Data Foundation
Purpose: How we prepare the data BEFORE training. These choices affect every downstream component.
| Factor | Question | Variants | Env var |
|---|---|---|---|
| timeframe | What candle resolution? | 5m, 15m, 30m, 1h | CVN_TIMEFRAME |
| fold_size | How long is each training window? | 6m, 9m, 12m, 18m | CVN_TRAIN_WINDOW_MONTHS |
| n_features | How many features does the model see? | top_30, top_50, top_100, full | CVN_MAX_FEATURES |
| atr_period | ATR lookback for label generation? | 10, 14, 20 bars | CVN_ATR_PERIOD |
| purge_embargo | Gap between train/test to prevent leakage? | various purge/embargo combos | CVN_PURGE_BARS, CVN_EMBARGO_BARS |
Status: ✅ LOCKED — Winners applied to BASE_ENV (2026-04-16)
Why these matter:
- timeframe: Determines the granularity of patterns the model can learn. 5m = noisy but granular. 1h = smoother but fewer samples.
- fold_size: Short folds = more recent data but less training signal. Long folds = more data but older patterns that may no longer hold.
- n_features: Too few = model can't learn complex patterns. Too many = overfitting risk (mitigated by XGBoost regularization).
- atr_period: Controls the TP/SL levels in the triple barrier. Shorter = more responsive to recent volatility. Longer = more stable.
- purge_embargo: Prevents label leakage between train and test. Too small = leakage risk. Too large = wasted data.
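A minimal sketch of how purge and embargo gaps carve bars out of a training set, assuming bars are indexed 0..n-1, the test window is contiguous, and both gaps are measured in bars (the locked values are 20 and 10). It follows the purged k-fold convention, where bars after the test window may be used for training; a pure walk-forward fold that only trains on earlier bars needs just the pre-test purge. `purged_train_indices` is a hypothetical helper, not the pipeline's function.

```python
from typing import List


def purged_train_indices(n_bars: int, test_start: int, test_end: int,
                         purge: int = 20, embargo: int = 10) -> List[int]:
    """Training bar indices for one fold, excluding purged and embargoed bars.

    The test window is the half-open range [test_start, test_end).
    - Purge: drop the `purge` bars before test_start, whose forward-looking
      labels could overlap the test window.
    - Embargo: drop the `embargo` bars after test_end, which are serially
      correlated with the end of the test window.
    """
    lo = max(0, test_start - purge)
    hi = min(n_bars, test_end + embargo)
    return [i for i in range(n_bars) if i < lo or i >= hi]


if __name__ == "__main__":
    train = purged_train_indices(n_bars=200, test_start=100, test_end=120)
    print(len(train), train[79], train[80])  # 150 79 130
```

Larger gaps remove more bars around every test window, which is the "too large = wasted data" side of the trade-off.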
Phase 1b — Training Core
Purpose: How we configure the model training process itself. The highest-leverage phase.
| Factor | Question | Variants | Env var | Why it matters |
|---|---|---|---|---|
| cusum_training_mode | Do we filter data before training? | disabled, relaxed_1_5, legacy_3_0 | CVN_CUSUM_TRAINING_MODE | Critical: CUSUM removes 95% of training data. Disabling it gives 20× more samples. |
| class_balancing | Do we reweight minority classes? | OFF, ON | CVN_CLASS_BALANCING | 70% of labels are HOLD. Without balancing, model defaults to HOLD and misses BUY signals. |
| hpo_objective | What does the optimizer maximize? | fbeta_buy, precision_recall_auc, f1_macro, sortino_net | CVN_HPO_OBJECTIVE | Classification metrics ≠ trading profit. sortino_net optimizes what we actually care about. |
| early_stopping | When to stop training? | 50, 150, 300 rounds | CVN_EARLY_STOPPING_ROUNDS | Too early = undertrained. Too late = overfit. |
| hpo_budget | How many HPO trials? | 15, 30, 50 | CVN_HPO_N_TRIALS | More trials = better HP search but slower. Diminishing returns after ~30. |
Status: NEXT (Sprint 1 fixes deployed, cusum_training_mode triggered)
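To make the class_balancing factor concrete: one common scheme, assumed here (ADR-46 may specify a different one), weights each class inversely to its frequency, w_c = n / (k · n_c), which is the same formula scikit-learn uses for `class_weight="balanced"`. The resulting per-sample weights could be passed to the model, for example through the `sample_weight` argument of XGBoost's scikit-learn `fit`. The 70/20/10 HOLD/BUY/SELL split below is illustrative, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Inverse-frequency weights: w_c = n_samples / (n_classes * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def sample_weights(labels: Sequence[str]) -> List[float]:
    """Per-sample weights, one per label, from the class weights."""
    weights = balanced_class_weights(labels)
    return [weights[y] for y in labels]


if __name__ == "__main__":
    labels = ["HOLD"] * 70 + ["BUY"] * 20 + ["SELL"] * 10
    print({c: round(w, 3) for c, w in balanced_class_weights(labels).items()})
    # {'HOLD': 0.476, 'BUY': 1.667, 'SELL': 3.333}
```

In this example a BUY sample counts 3.5× as much as a HOLD sample in the loss, and each class contributes the same total weight, so the model can no longer minimize loss by defaulting to HOLD.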
Market hypothesis behind these choices:
The system targets short-term mean-reversion at regime transition points in DeFi altcoins. CUSUM detects regime shifts. The ML model predicts which transitions will mean-revert profitably. The triple barrier captures the reversion (TP) or limits the loss (SL).
For this to work, the model needs:
- Enough training data to learn transition patterns (→ relax CUSUM during training)
- Balanced exposure to BUY signals (→ class balancing)
- Optimization for trading profit, not classification accuracy (→ sortino_net)
Phase 2 — Model Architecture
Purpose: What type of model and classification scheme.
| Factor | Question | Variants | Env var | Why it matters |
|---|---|---|---|---|
| classification_mode | 2-class or 3-class? | 3class, binary_balanced, binary_precision | CVN_BINARY_CLASSIFICATION | 3-class wastes capacity on SELL (we only go long). Binary focuses 100% on the BUY decision. |
| model_type | Which ML algorithm? | xgboost, lightgbm, catboost | CVN_MODEL_TYPE | Different inductive biases. XGBoost = baseline. LightGBM = faster. CatBoost = better on categoricals. |
| objective_beta | Precision/recall trade-off? | β=0.5, 1.0, 2.0 | CVN_BUY_BETA | β<1 favors precision (fewer but better trades). β>1 favors recall (more trades). |
Status: PLANNED (after Phase 1b locked)
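The objective_beta factor trades precision against recall through the F-beta score, F_β = (1 + β²)·P·R / (β²·P + R). A minimal sketch for the BUY class only, assuming string labels; `fbeta_buy` is an illustrative name and not necessarily how `CVN_BUY_BETA` is consumed by the pipeline.

```python
from typing import Sequence


def fbeta_buy(y_true: Sequence[str], y_pred: Sequence[str], beta: float = 1.0,
              positive: str = "BUY") -> float:
    """F-beta score for the BUY class only (other classes count as negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Trigger-happy model: P = 0.5, R = 1.0
    eager = (["BUY", "BUY", "HOLD", "HOLD"], ["BUY"] * 4)
    # Selective model: P = 1.0, R = 0.25
    picky = (["BUY"] * 4, ["BUY", "HOLD", "HOLD", "HOLD"])
    for beta in (0.5, 1.0, 2.0):
        print(beta, round(fbeta_buy(*eager, beta=beta), 3), round(fbeta_buy(*picky, beta=beta), 3))
```

With β = 0.5 the selective model scores higher (0.625 vs about 0.556); with β = 2 the ranking flips (about 0.294 vs 0.833), which is exactly the lever the objective_beta factor tests.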
Phase 3 — Signal Generation
Purpose: How the model's predictions are filtered before becoming trade signals.
| Factor | Question | Variants | Env var | Why it matters |
|---|---|---|---|---|
| meta_labeling | Secondary model validates primary? | OFF, ON_03, ON_05, ON_07 | CVN_USE_META_LABEL | Meta-label can filter false positives but adds complexity and requires separate training. |
| cusum_threshold | How sensitive is the regime detector? | h=2.0, 3.0, 5.0 | CVN_CUSUM_THRESHOLD_H | Lower h = more events (more trades, more noise). Higher h = fewer events (fewer trades, cleaner). |
| adaptive_event_engine | Dynamic CUSUM threshold? | OFF, ON | CVN_ADAPTIVE_EVENT_ENGINE | Static CUSUM doesn't adapt to regime changes. Adaptive adjusts h based on rolling volatility. |
| confidence_threshold | Minimum model confidence to trade? | 0.3, 0.4, 0.5, 0.6 | CVN_THRESHOLD_BUY | Lower = more trades (higher recall). Higher = fewer but better trades (higher precision). |
Status: PLANNED
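The cusum_threshold and adaptive_event_engine factors both act on a symmetric CUSUM filter. A minimal sketch in the style of López de Prado's event filter, assuming the filter runs on per-bar returns with a threshold of h × σ (baseline h = 3.0): passing a single σ gives the static filter, while passing a per-bar σ series (for example a rolling volatility estimate) approximates the adaptive variant. `cusum_events` is a hypothetical name, and how the pipeline estimates σ is not specified here.

```python
from typing import List, Sequence, Union


def cusum_events(returns: Sequence[float], h: float,
                 sigma: Union[float, Sequence[float]]) -> List[int]:
    """Symmetric CUSUM filter: bar indices where cumulative drift exceeds h * sigma."""
    events: List[int] = []
    s_pos = s_neg = 0.0
    for t, r in enumerate(returns):
        # Static filter: one sigma for all bars. Adaptive: one sigma per bar.
        sig = sigma if isinstance(sigma, (int, float)) else sigma[t]
        threshold = h * sig
        s_pos = max(0.0, s_pos + r)  # accumulated upward drift
        s_neg = min(0.0, s_neg + r)  # accumulated downward drift
        if s_neg < -threshold:
            s_neg = 0.0  # reset after an event
            events.append(t)
        elif s_pos > threshold:
            s_pos = 0.0
            events.append(t)
    return events


if __name__ == "__main__":
    print(cusum_events([0.02] * 10, h=3.0, sigma=0.01))  # [1, 3, 5, 7, 9]
```

Raising h, or feeding a higher σ during volatile stretches, makes the filter fire less often, which is the "fewer events, cleaner" end of the cusum_threshold trade-off.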
Phase 4 — Execution Rules
Purpose: How we structure each trade (entry, exit, time limit).
| Factor | Question | Variants | Env var | Why it matters |
|---|---|---|---|---|
| signal_mode | Instant or confirmed execution? | ldp (instant), legacy_confirm (2-candle) | CVN_USE_LDP_PIPELINE | LdP = faster execution, no missed trades. Legacy = confirmation reduces false signals but delays entry. |
| triple_barrier | SL/TP/Horizon settings? | Various ATR multiplier combos | CVN_SL_MULT, CVN_TP_MULT, CVN_HORIZON_HOURS | Defines the risk/reward of each trade. Wider SL = fewer stops but larger losses. Higher TP = larger wins but fewer hits. |
Status: PLANNED
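The triple barrier ties the ATR period, the SL/TP multipliers and the horizon together. A minimal sketch, assuming the ATR is a simple average of true ranges (Wilder smoothing is a common alternative), barriers are checked against closes only (intrabar highs and lows are ignored), and the horizon is counted in bars (5 bars = 5 hours at the locked 1h timeframe). The defaults mirror the current baseline of SL 1.5 ATR and TP 3.0 ATR; the function names are hypothetical.

```python
from typing import Sequence


def atr(highs: Sequence[float], lows: Sequence[float], closes: Sequence[float],
        period: int = 20) -> float:
    """Simple-average ATR over the last `period` true ranges."""
    true_ranges = [
        max(highs[i] - lows[i], abs(highs[i] - closes[i - 1]), abs(lows[i] - closes[i - 1]))
        for i in range(1, len(closes))
    ]
    if len(true_ranges) < period:
        raise ValueError("need at least period + 1 bars")
    return sum(true_ranges[-period:]) / period


def triple_barrier_label(closes: Sequence[float], entry_idx: int, atr_value: float,
                         sl_mult: float = 1.5, tp_mult: float = 3.0, horizon: int = 5) -> str:
    """TP hit first -> BUY, SL hit first -> SELL, neither within `horizon` bars -> HOLD."""
    entry = closes[entry_idx]
    take_profit = entry + tp_mult * atr_value
    stop_loss = entry - sl_mult * atr_value
    for price in closes[entry_idx + 1: entry_idx + 1 + horizon]:
        if price >= take_profit:
            return "BUY"
        if price <= stop_loss:
            return "SELL"
    return "HOLD"


if __name__ == "__main__":
    # ATR = 1.0: TP at 103.0, SL at 98.5
    print(triple_barrier_label([100, 101, 102, 104], 0, atr_value=1.0))  # BUY
    print(triple_barrier_label([100, 99, 98], 0, atr_value=1.0))  # SELL
```

With a 2:1 TP/SL ratio, the TP barrier is twice as far from entry as the SL, so fewer BUY labels are expected than SELL labels unless the model finds a real edge.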
Phase 5 — Signal Filters
Purpose: Post-inference filters that gate which signals become trades.
| Factor | Question | Variants | Env var | Why it matters |
|---|---|---|---|---|
| trend_filter | Only trade with the trend? | OFF, ON_EMA20, ON_EMA50 | CVN_USE_TREND_FILTER | Prevents counter-trend trades. May reduce drawdown but also reduces opportunity. |
| regime_filter | Block hostile regimes? | OFF, ON | CVN_USE_REGIME_FILTER | Prevents trading during high-volatility crashes. May miss recovery bounces. |
| cooldown_policy | Minimum time between trades? | none, 5min, 15min | CVN_TRADE_COOLDOWN_SECONDS | Prevents overtrading after stops by limiting rapid re-entries right after a stop-out (the automated equivalent of revenge trading). |
| concurrency_limit | Max simultaneous positions? | 1, 2, 3 | CVN_MAX_CONCURRENT | 1 = concentrated bets (higher Sortino variance). 3 = diversified (lower variance but lower per-trade impact). |
Default policy: All filters disabled by default unless FTF ablation proves they improve Sortino (BH p < 0.05). Committee approval required to enable (ADR-52).
Status: PLANNED
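The cooldown_policy and concurrency_limit filters can be read as two gates applied to the stream of BUY signals. A minimal sketch, assuming every accepted position stays open for a fixed time and the cooldown is measured from the previous accepted entry; the real pipeline may measure cooldown from the last exit instead, and `gate_signals` with its parameters is hypothetical.

```python
from typing import List, Optional, Sequence


def gate_signals(signal_times: Sequence[float], hold_seconds: float,
                 cooldown_seconds: float = 0.0, max_concurrent: int = 1) -> List[float]:
    """Return the BUY signal timestamps (seconds) that pass the cooldown and concurrency gates."""
    accepted: List[float] = []
    open_until: List[float] = []  # exit times of currently open positions
    last_entry: Optional[float] = None
    for t in sorted(signal_times):
        # Drop positions that have already exited by time t.
        open_until = [exit_t for exit_t in open_until if exit_t > t]
        if len(open_until) >= max_concurrent:
            continue  # concurrency limit reached
        if last_entry is not None and t - last_entry < cooldown_seconds:
            continue  # still inside the cooldown window
        accepted.append(t)
        open_until.append(t + hold_seconds)
        last_entry = t
    return accepted


if __name__ == "__main__":
    signals = [0, 100, 200, 4000]
    print(gate_signals(signals, hold_seconds=3600))  # [0, 4000]
    print(gate_signals(signals, hold_seconds=3600, max_concurrent=3))  # [0, 100, 200, 4000]
```

With the current baseline (max 1 position, no cooldown), the concurrency gate alone already suppresses signal clusters while a position is open; the cooldown only adds filtering once more than one position is allowed or positions close quickly.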
Phase 6 — Cost & Risk
Purpose: How we model transaction costs and manage portfolio risk.
| Factor | Question | Variants | Env var | Why it matters |
|---|---|---|---|---|
| cost_model | Base transaction fee? | 10, 15, 30 bps | CVN_TRADE_FEE_BPS | Lower cost = more trades profitable. 15 bps is realistic for DeFi perps (maker+taker). |
| slippage_model | How much slippage? | none, linear, nonlinear | CVN_SLIPPAGE_IMPACT_FACTOR | Nonlinear: base + impact × √(size/volume). More realistic for illiquid DeFi tokens. |
| kelly_sizing | Position sizing method? | OFF, half_kelly, full_kelly | CVN_USE_KELLY | Kelly maximizes long-term growth. Half-Kelly is more conservative (lower drawdown). |
Status: PLANNED
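The cost and sizing formulas above can be written out directly. A minimal sketch, assuming slippage is expressed as a fraction of price, size and volume are in the same units, the fee in bps is the all-in fee for the trade, and each trade resolves at either TP or SL (no timeouts, no costs) for the Kelly calculation. The 0.45 win rate is illustrative only, and all function names are hypothetical.

```python
import math


def slippage_fraction(size: float, volume: float, base: float = 0.0, impact: float = 0.001) -> float:
    """Nonlinear slippage: base + impact * sqrt(size / volume), as a fraction of price."""
    return base + impact * math.sqrt(size / volume)


def total_cost_fraction(size: float, volume: float, fee_bps: float = 15.0,
                        base: float = 0.0, impact: float = 0.001) -> float:
    """Fee (converted from basis points) plus slippage, as a fraction of notional."""
    return fee_bps / 10_000 + slippage_fraction(size, volume, base, impact)


def kelly_fraction(p_win: float, payoff_ratio: float) -> float:
    """Binary Kelly fraction f* = p - (1 - p) / b, floored at 0 when there is no edge."""
    return max(0.0, p_win - (1.0 - p_win) / payoff_ratio)


if __name__ == "__main__":
    # An order worth 1% of volume: sqrt(0.01) = 0.1, so impact adds 1 bp on top of the 15 bps fee.
    print(total_cost_fraction(size=1_000, volume=100_000))  # about 0.0016
    b = 3.0 / 1.5  # TP 3.0 ATR vs SL 1.5 ATR gives a payoff ratio of 2
    full = kelly_fraction(p_win=0.45, payoff_ratio=b)
    print(full, full / 2)  # full Kelly about 0.175, half Kelly about 0.0875
```

The square root makes slippage grow sub-linearly with order size, while the Kelly floor at zero means a strategy without a positive edge is never sized up.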
Phase 7 — Operations
Purpose: Runtime behavior and safety mechanisms.
| Factor | Question | Variants | Env var |
|---|---|---|---|
| drift_detection | Monitor for model degradation? | OFF, ON | CVN_DRIFT_ACTION |
| system_status | Active trading or shadow mode? | active, shadow | CVN_SYSTEM_STATUS |
Status: PLANNED
Phase 8 — Holdout Validation
Purpose: Final out-of-sample validation before production deployment.
Process:
1. Run the fully locked configuration on holdout fold (fold 1, most recent 2 months)
2. Compare to baselines: buy-and-hold, random entry, naive (ADR-29)
3. Verify: Sortino > 1.5× random, net expectancy > 0, trades ≥ 30
4. Committee review (score ≥ 8)
5. Promote model to Production in MLflow (ADR-2: manual only)
6. Staged rollout: shadow → canary (10%) → full (ADR-42: atomic per crypto)
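The quantitative part of step 3 can be expressed as a small gate function. A minimal sketch, assuming per-trade returns that are already net of costs, a per-trade (non-annualized) Sortino with target return 0, and the ±20 cap from the outlier-protection rule; `sortino` and `passes_holdout` are hypothetical names.

```python
import math
from typing import Sequence


def sortino(returns: Sequence[float], target: float = 0.0, cap: float = 20.0) -> float:
    """Per-trade Sortino ratio (not annualized), clipped to [-cap, cap]."""
    excess = [r - target for r in returns]
    mean_excess = sum(excess) / len(excess)
    # Downside deviation: root mean square of the below-target part, over all trades.
    downside_dev = math.sqrt(sum(min(0.0, x) ** 2 for x in excess) / len(excess))
    if downside_dev == 0.0:
        ratio = cap if mean_excess > 0 else 0.0
    else:
        ratio = mean_excess / downside_dev
    return max(-cap, min(cap, ratio))


def passes_holdout(model_returns: Sequence[float], random_returns: Sequence[float],
                   min_trades: int = 30) -> bool:
    """Holdout gate: enough trades, Sortino > 1.5x the random-entry baseline, positive expectancy."""
    n = len(model_returns)
    if n < min_trades:
        return False
    expectancy = sum(model_returns) / n  # returns assumed net of fees and slippage
    # Implemented literally: if the random baseline's Sortino is negative,
    # multiplying it by 1.5 lowers the bar rather than raising it.
    return sortino(model_returns) > 1.5 * sortino(random_returns) and expectancy > 0


if __name__ == "__main__":
    model = [0.02, -0.01, 0.03, -0.02] * 10  # 40 trades
    random_entry = [0.01, -0.02] * 20
    print(round(sortino(model), 4), passes_holdout(model, random_entry))  # 0.4472 True
```

The trade-count check runs first, so an underpowered holdout fails the gate regardless of how good its Sortino looks.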
4. Current Baseline (BASE_ENV)
Every parameter below is the CURRENT production configuration. Changes require FTF validation + committee approval.
Data Preparation
| Parameter | Value | Locked by | Phase |
|---|---|---|---|
| Timeframe | 1h | Phase 1a | ✅ Locked |
| Train window | 18 months | Phase 1a | ✅ Locked |
| History depth | 24 months | Fixed | — |
| Feature count | top 50 | Phase 1a | ✅ Locked |
| ATR period | 20 bars | Phase 1a | ✅ Locked |
| Purge bars | 20 (strict) | Phase 1a | ✅ Locked |
| Embargo bars | 10 (strict) | Phase 1a | ✅ Locked |
| CUSUM training mode | enabled (legacy) | — | Phase 1b (testing) |
Model Training
| Parameter | Value | Locked by | Phase |
|---|---|---|---|
| Model type | XGBoost | — | Phase 2 |
| Classification | 3-class (SELL/HOLD/BUY) | — | Phase 2 |
| HPO objective | precision_recall_auc | — | Phase 1b (testing) |
| HPO trials | 30 | — | Phase 1b |
| Early stopping | 150 rounds | — | Phase 1b |
| Class balancing | ON | ADR-46 | Phase 1b |
| Calibration | isotonic | Phase 0 | ✅ Locked |
| Binary classification | OFF (3-class) | — | Phase 2 |
Signal Generation
| Parameter | Value | Locked by | Phase |
|---|---|---|---|
| CUSUM filter (inference) | ON, h=3.0σ | — | Phase 3 |
| Meta-label | OFF | — | Phase 3 |
| Adaptive event engine | OFF | — | Phase 3 |
| Confidence threshold | 0.4 (HPO-tuned) | — | Phase 3 |
Execution
| Parameter | Value | Locked by | Phase |
|---|---|---|---|
| Pipeline mode | LdP (instant) | Fixed | — |
| SL multiplier | 1.5 ATR | — | Phase 4 |
| TP multiplier | 3.0 ATR | — | Phase 4 |
| Horizon | 5 hours | — | Phase 4 |
Filters
| Parameter | Value | Locked by | Phase |
|---|---|---|---|
| Trend filter | OFF | — | Phase 5 |
| Regime filter | OFF | — | Phase 5 |
| Meta-label filter | OFF | — | Phase 5 |
| Concurrency | max 1 | — | Phase 5 |
| Cooldown | 0s (none) | — | Phase 5 |
Cost & Risk
| Parameter | Value | Locked by | Phase |
|---|---|---|---|
| Trade fee | 15 bps | — | Phase 6 |
| Slippage model | nonlinear (impact=0.001) | — | Phase 6 |
| Kelly sizing | OFF | — | Phase 6 |
| Max daily drawdown | 10% | Fixed | — |
5. Audit Trail
Every locked parameter has a traceable chain:
FTF Run (run_id) → PostgreSQL (finetune_results)
→ FTF Report (committee/reports/)
→ Committee Session (committee/sessions/{id}_committee.json)
→ lock_winner() call (results/ftf_locked_config.json)
→ BASE_ENV update (PR with CodeRabbit review)
→ Helm deploy (CI/CD)
All data is retained. Any decision can be replayed.
6. Glossary
| Term | Definition |
|---|---|
| FTF | Fine-Tuning Framework — the ablation testing engine |
| ADR-56 | Every pipeline change gated by env var + FTF factor |
| Lock | A parameter value validated by FTF and committee, set as new baseline |
| BASE_ENV | The current locked configuration in ablation_matrix.py |
| Ceteris paribus | "All else equal" — only ONE factor varies per test |
| BH correction | Benjamini-Hochberg false discovery rate control for multiple comparisons |
| Sortino | Risk-adjusted return (downside deviation only). Primary trading metric. |
| CUSUM | Cumulative Sum control chart — detects regime changes in volatility |
| Triple barrier | Labeling method: TP hit → BUY, SL hit → SELL, timeout → HOLD |
| Walk-forward | OOS validation: train on past, test on future, slide window forward |
| Holdout | Final validation fold never seen during ablation (fold 1, most recent) |