CVNTrade — Tuning Protocol

Version: 1.0
Date: 2026-04-14
Issue: #499
Governing ADR: ADR-56 (every change A/B testable by design)

1. Purpose

This document defines the systematic process by which every parameter of the CVNTrade ML trading pipeline is selected, validated, and locked. No parameter is set by intuition, copied from another system, or left at a library default. Every choice is empirically validated through controlled ablation testing.

Audience: Anyone who needs to understand WHY the pipeline is configured the way it is — engineers, auditors, investors, regulators.

Guarantee: For every parameter in this document, there are:

1. A hypothesis explaining why this value was chosen
2. An FTF ablation run testing it against the alternatives
3. A statistical test (BH-corrected p < 0.05) or an explicit "no significant difference" verdict
4. A committee review validating the decision

Companion plan: see F1_BUY_BOOST_PLAN.md for the active 13-track plan to break the chronic f1_buy 0.40-0.46 plateau (committee-approved 2026-04-27 round 3, session 9d4942cb).

2. Methodology — Lock-and-Advance

Principle

Test ONE factor at a time (ceteris paribus). All other parameters are held at their locked baseline value. This isolates the effect of each change.

Process per factor

1. Define variants         → env var + FTF factor (ADR-56)
2. Run FTF ablation        → 5 folds × 5 cryptos × 3 costs × N variants
3. Analyze results         → Sortino, CI, pairwise BH comparison
4. Committee review        → score ≥ 8 to proceed
5. Lock winner             → lock_winner(factor, variant)
6. Update BASE_ENV         → winner becomes new baseline
7. Advance to next factor  → all subsequent tests use updated baseline
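As a rough illustration of steps 1, 2, 5 and 6, the sketch below enumerates one run per (variant, fold, crypto, cost) combination while overriding a single env var on top of the baseline. The baseline values shown, the crypto placeholders, the cost levels, and the helper names `build_runs` and `advance_baseline` are assumptions made for this example; they are not the actual FTF or `lock_winner()` API.

```python
from itertools import product

# Hypothetical excerpt of a baseline; the real BASE_ENV is defined by the project.
BASE_ENV = {"CVN_TIMEFRAME": "1h", "CVN_ATR_PERIOD": "20"}


def build_runs(base_env, env_var, variants, folds, cryptos, costs_bps):
    """Enumerate ablation runs in which only `env_var` differs from the baseline."""
    runs = []
    for variant, fold, crypto, cost in product(variants, folds, cryptos, costs_bps):
        env = dict(base_env)          # copy, so the baseline is never mutated
        env[env_var] = str(variant)   # ceteris paribus: exactly one override
        runs.append({"env": env, "fold": fold, "crypto": crypto, "cost_bps": cost})
    return runs


def advance_baseline(base_env, env_var, winner):
    """Return a new baseline with the winning variant applied (steps 5 and 6)."""
    return {**base_env, env_var: str(winner)}


runs = build_runs(
    BASE_ENV,
    "CVN_ATR_PERIOD",
    variants=[10, 14, 20],
    folds=range(1, 6),
    cryptos=["C1", "C2", "C3", "C4", "C5"],  # placeholder symbols
    costs_bps=[10, 15, 30],                  # assumed cost levels
)
print(len(runs))  # 5 folds x 5 cryptos x 3 costs x 3 variants
```

Returning a new dictionary from `advance_baseline` rather than editing the baseline in place keeps every earlier baseline reproducible, which matches the replay goal of the audit trail.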

Statistical Standards

Criterion	Threshold
Significance	BH-corrected p < 0.05
Effect size	Cohen's d reported (no minimum, but d < 0.2 = negligible)
Power	≥ 63 trades per variant (d=0.5, α=5%, power=80%)
Minimum trades per fold	≥ 30 (below = underpowered warning)
Confidence intervals	Bootstrap 95% CI on all metrics
Outlier protection	Sortino capped at ±20, profit factor (PF) capped at 50, runs with fewer than 3 trades excluded
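For readers unfamiliar with the BH correction named in the table, here is a minimal pure-Python sketch of Benjamini-Hochberg adjusted p-values. It illustrates the standard procedure only; the function name `bh_adjust` is an assumption, and the FTF may well use a library routine instead.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Four hypothetical pairwise comparisons against the baseline variant.
raw = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(raw)
significant = [p < 0.05 for p in adj]
```

In this made-up example, three raw p-values fall below 0.05 but only the first survives the correction, which is the situation the "BH-corrected p < 0.05" criterion is meant to guard against.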

If no significant difference

Lock the simplest/cheapest variant (Occam's razor) and document "no significant difference". The test is still valuable: it shows that, at the tested sample size, the parameter has no detectable effect on the outcome.
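How much weight a "no significant difference" verdict carries depends on statistical power. The sketch below reproduces the 63-trades-per-variant figure from the Statistical Standards table under one plausible reading: a two-sided, two-sample comparison using the normal approximation. The source does not state which test the figure was derived from, so treat this as an assumption.

```python
from math import ceil
from statistics import NormalDist


def trades_per_variant(d, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect effect size d (Cohen's d).

    Normal approximation for a two-sided, two-sample comparison:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


n_medium = trades_per_variant(0.5)  # medium effect, as in the table
n_small = trades_per_variant(0.2)   # small effect needs far more trades
```

An exact t-test calculation gives a slightly larger number than the normal approximation, so a run that sits right at the threshold is at the edge of the stated power.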

3. Protocol Phases

Phase 0 — Calibration Baseline

Purpose: Establish the starting point. Verify the pipeline works end-to-end.

Factor	Variants tested	Winner	Evidence
`calibration`	none, isotonic, platt	isotonic	No significant difference (p=0.96). All three viable.

Decision: Lock isotonic (user selection). Calibration has negligible impact on Sortino.

Phase 1a — Data Foundation

Purpose: How we prepare the data BEFORE training. These choices affect every downstream component.

Factor	Question	Variants	Env var
`timeframe`	What candle resolution?	5m, 15m, 30m, 1h	`CVN_TIMEFRAME`
`fold_size`	How long is each training window?	6m, 9m, 12m, 18m	`CVN_TRAIN_WINDOW_MONTHS`
`n_features`	How many features does the model see?	top_30, top_50, top_100, full	`CVN_MAX_FEATURES`
`atr_period`	ATR lookback for label generation?	10, 14, 20 bars	`CVN_ATR_PERIOD`
`purge_embargo`	Gap between train/test to prevent leakage?	various purge/embargo combos	`CVN_PURGE_BARS`, `CVN_EMBARGO_BARS`

Status: ✅ LOCKED — Winners applied to BASE_ENV (2026-04-16)

Why these matter:

- timeframe: Determines the granularity of patterns the model can learn. 5m = noisy but granular. 1h = smoother but fewer samples.
- fold_size: Short folds = more recent data but less training signal. Long folds = more data but older patterns that may no longer hold.
- n_features: Too few = model can't learn complex patterns. Too many = overfitting risk (mitigated by XGBoost regularization).
- atr_period: Controls the TP/SL levels in the triple barrier. Shorter = more responsive to recent volatility. Longer = more stable.
- purge_embargo: Prevents label leakage between train and test. Too small = leakage risk. Too large = wasted data.
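To make the purge/embargo trade-off concrete, the following sketch computes which bar indices remain usable for training around a test block. The function name and the exact semantics (purge applied before the test block, embargo after it) are assumptions for illustration; the defaults mirror the locked values of 20 purge bars and 10 embargo bars.

```python
def purged_train_indices(n_bars, test_start, test_end, purge_bars=20, embargo_bars=10):
    """Bar indices usable for training around the test block [test_start, test_end).

    Bars within `purge_bars` before the test block are dropped because their
    labels may look into the test period; bars within `embargo_bars` after it
    are dropped because their features may still reflect test-period data.
    """
    before = range(0, max(0, test_start - purge_bars))
    after = range(min(n_bars, test_end + embargo_bars), n_bars)
    return list(before) + list(after)


train = purged_train_indices(n_bars=100, test_start=50, test_end=60)
# Bars 30..49 are purged, 50..59 are the test block, 60..69 are embargoed.
```

In a pure walk-forward setup the `after` part is empty for the most recent fold; the embargo then only matters when a later training window starts right after an earlier test block.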

Phase 1b — Training Core

Purpose: How we configure the model training process itself. The highest-leverage phase.

Factor	Question	Variants	Env var	Why it matters
`cusum_training_mode`	Do we filter data before training?	disabled, relaxed_1_5, legacy_3_0	`CVN_CUSUM_TRAINING_MODE`	Critical: CUSUM removes 95% of training data. Disabling it gives 20× more samples.
`class_balancing`	Do we reweight minority classes?	OFF, ON	`CVN_CLASS_BALANCING`	70% of labels are HOLD. Without balancing, model defaults to HOLD and misses BUY signals.
`hpo_objective`	What does the optimizer maximize?	fbeta_buy, precision_recall_auc, f1_macro, sortino_net	`CVN_HPO_OBJECTIVE`	Classification metrics ≠ trading profit. `sortino_net` optimizes what we actually care about.
`early_stopping`	When to stop training?	50, 150, 300 rounds	`CVN_EARLY_STOPPING_ROUNDS`	Too early = undertrained. Too late = overfit.
`hpo_budget`	How many HPO trials?	15, 30, 50	`CVN_HPO_N_TRIALS`	More trials = better HP search but slower. Diminishing returns after ~30.

Status: NEXT (Sprint 1 fixes deployed, cusum_training_mode triggered)

Market hypothesis behind these choices:

The system targets short-term mean-reversion at regime transition points in DeFi altcoins. CUSUM detects regime shifts. The ML model predicts which transitions will mean-revert profitably. The triple barrier captures the reversion (TP) or limits the loss (SL).

For this to work, the model needs:

- Enough training data to learn transition patterns (→ relax CUSUM during training)
- Balanced exposure to BUY signals (→ class balancing)
- Optimization for trading profit, not classification accuracy (→ sortino_net)
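One common way to implement the class balancing mentioned above is inverse-frequency sample weighting. The sketch below is an illustration, not the pipeline's confirmed method; the 70/15/15 label split is made up to match the "70% of labels are HOLD" figure, and `balanced_sample_weights` is a hypothetical helper.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


# Hypothetical label distribution: 70% HOLD, 15% BUY, 15% SELL.
labels = ["HOLD"] * 70 + ["BUY"] * 15 + ["SELL"] * 15
weights = balanced_sample_weights(labels)
hold_weight, buy_weight = weights[0], weights[70]
```

With these weights every class contributes the same total weight to the loss, so the model can no longer minimize it by defaulting to HOLD.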

Phase 2 — Model Architecture

Purpose: What type of model and classification scheme.

Factor	Question	Variants	Env var	Why it matters
`classification_mode`	2-class or 3-class?	3class, binary_balanced, binary_precision	`CVN_BINARY_CLASSIFICATION`	3-class wastes capacity on SELL (we only go long). Binary focuses 100% on the BUY decision.
`model_type`	Which ML algorithm?	xgboost, lightgbm, catboost	`CVN_MODEL_TYPE`	Different inductive biases. XGBoost = baseline. LightGBM = faster. CatBoost = better on categoricals.
`objective_beta`	Precision/recall trade-off?	β=0.5, 1.0, 2.0	`CVN_BUY_BETA`	β<1 favors precision (fewer but better trades). β>1 favors recall (more trades).

Status: PLANNED (after Phase 1b locked)
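The effect of the `objective_beta` variants can be seen with the standard F-beta formula. The two precision/recall profiles below are invented for illustration and do not come from any FTF run.

```python
def fbeta(precision, recall, beta):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical models: one selective, one that trades more often.
selective = {"precision": 0.6, "recall": 0.3}
frequent = {"precision": 0.4, "recall": 0.6}

for beta in (0.5, 1.0, 2.0):
    s = fbeta(selective["precision"], selective["recall"], beta)
    f = fbeta(frequent["precision"], frequent["recall"], beta)
    print(f"beta={beta}: selective={s:.3f} frequent={f:.3f}")
```

At β=0.5 the selective profile scores higher, while at β=2.0 the frequent profile wins, which is the precision/recall trade-off the factor is designed to test.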

Phase 3 — Signal Generation

Purpose: How the model's predictions are filtered before becoming trade signals.

Factor	Question	Variants	Env var	Why it matters
`meta_labeling`	Secondary model validates primary?	OFF, ON_03, ON_05, ON_07	`CVN_USE_META_LABEL`	Meta-label can filter false positives but adds complexity and requires separate training.
`cusum_threshold`	How sensitive is the regime detector?	h=2.0, 3.0, 5.0	`CVN_CUSUM_THRESHOLD_H`	Lower h = more events (more trades, more noise). Higher h = fewer events (fewer trades, cleaner).
`adaptive_event_engine`	Dynamic CUSUM threshold?	OFF, ON	`CVN_ADAPTIVE_EVENT_ENGINE`	Static CUSUM doesn't adapt to regime changes. Adaptive adjusts h based on rolling volatility.
`confidence_threshold`	Minimum model confidence to trade?	0.3, 0.4, 0.5, 0.6	`CVN_THRESHOLD_BUY`	Lower = more trades (higher recall). Higher = fewer but better trades (higher precision).

Status: PLANNED
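The relationship between `cusum_threshold` and event count can be illustrated with a symmetric CUSUM filter. This is a generic sketch that assumes the input is a series of standardized returns (so h is in σ units); it is not the project's event engine, and `cusum_events` is a hypothetical name.

```python
def cusum_events(returns, h):
    """Symmetric CUSUM filter: indices where cumulative drift exceeds +h or -h."""
    events = []
    s_pos = 0.0  # accumulated upward drift
    s_neg = 0.0  # accumulated downward drift
    for i, r in enumerate(returns):
        s_pos = max(0.0, s_pos + r)
        s_neg = min(0.0, s_neg + r)
        if s_pos > h:
            events.append(i)
            s_pos = 0.0  # reset after signalling an upward event
        elif s_neg < -h:
            events.append(i)
            s_neg = 0.0  # reset after signalling a downward event
    return events


steady_drift = [1.0] * 10  # toy series: constant +1 sigma per bar
for h in (2.0, 3.0, 5.0):
    print(h, cusum_events(steady_drift, h))
```

On the toy series the number of events drops as h rises, matching the "lower h = more events, higher h = fewer events" description in the table.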

Phase 4 — Execution Rules

Purpose: How we structure each trade (entry, exit, time limit).

Factor	Question	Variants	Env var	Why it matters
`signal_mode`	Instant or confirmed execution?	ldp (instant), legacy_confirm (2-candle)	`CVN_USE_LDP_PIPELINE`	LdP = faster execution, no missed trades. Legacy = confirmation reduces false signals but delays entry.
`triple_barrier`	SL/TP/Horizon settings?	Various ATR multiplier combos	`CVN_SL_MULT`, `CVN_TP_MULT`, `CVN_HORIZON_HOURS`	Defines the risk/reward of each trade. Wider SL = fewer stops but larger losses. Higher TP = larger wins but fewer hits.

Status: PLANNED
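A minimal sketch of how the three `triple_barrier` settings interact for a long position is shown below. The defaults follow the current baseline (SL 1.5 ATR, TP 3.0 ATR, 5-hour horizon, taken here as 5 bars on the 1h timeframe). The function name and the tie-break rule (stop-loss checked first when both barriers are touched in the same bar) are assumptions for this example.

```python
def triple_barrier_label(entry, atr, highs, lows, sl_mult=1.5, tp_mult=3.0, horizon_bars=5):
    """Label a long entry: BUY if TP is hit first, SELL if SL is hit first, else HOLD."""
    take_profit = entry + tp_mult * atr
    stop_loss = entry - sl_mult * atr
    # Only the bars inside the horizon count; later bars are ignored.
    for high, low in zip(highs[:horizon_bars], lows[:horizon_bars]):
        if low <= stop_loss:    # assumption: conservative tie-break, SL first
            return "SELL"
        if high >= take_profit:
            return "BUY"
    return "HOLD"               # neither barrier hit before the time limit


label = triple_barrier_label(
    entry=100.0, atr=2.0,
    highs=[101, 103, 106.5], lows=[99, 100, 102],
)
```

With an entry of 100 and an ATR of 2, the barriers sit at 106 and 97, so widening `sl_mult` or raising `tp_mult` directly shifts how many bars end up labelled HOLD.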

Phase 5 — Signal Filters

Purpose: Post-inference filters that gate which signals become trades.

Factor	Question	Variants	Env var	Why it matters
`trend_filter`	Only trade with the trend?	OFF, ON_EMA20, ON_EMA50	`CVN_USE_TREND_FILTER`	Prevents counter-trend trades. May reduce drawdown but also reduces opportunity.
`regime_filter`	Block hostile regimes?	OFF, ON	`CVN_USE_REGIME_FILTER`	Prevents trading during high-volatility crashes. May miss recovery bounces.
`cooldown_policy`	Minimum time between trades?	none, 5min, 15min	`CVN_TRADE_COOLDOWN_SECONDS`	Prevents overtrading after stops, i.e. immediate re-entry right after a loss (the model's analogue of revenge trading).
`concurrency_limit`	Max simultaneous positions?	1, 2, 3	`CVN_MAX_CONCURRENT`	1 = concentrated bets (higher Sortino variance). 3 = diversified (lower variance but lower per-trade impact).

Default policy: All filters stay disabled unless an FTF ablation shows they improve Sortino (BH p < 0.05). Enabling a filter requires committee approval (ADR-52).

Status: PLANNED

Phase 6 — Cost & Risk

Purpose: How we model transaction costs and manage portfolio risk.

Factor	Question	Variants	Env var	Why it matters
`cost_model`	Base transaction fee?	10, 15, 30 bps	`CVN_TRADE_FEE_BPS`	Lower cost = more trades profitable. 15 bps is realistic for DeFi perps (maker+taker).
`slippage_model`	How much slippage?	none, linear, nonlinear	`CVN_SLIPPAGE_IMPACT_FACTOR`	Nonlinear: `base + impact × √(size/volume)`. More realistic for illiquid DeFi tokens.
`kelly_sizing`	Position sizing method?	OFF, half_kelly, full_kelly	`CVN_USE_KELLY`	Kelly maximizes long-term growth. Half-Kelly is more conservative (lower drawdown).

Status: PLANNED
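The nonlinear slippage formula from the table, `base + impact × √(size/volume)`, can be written out as follows. The source does not state the units of the result or the value of `base`, so the zero default for `base` and the reading of the output as a price fraction are assumptions; `impact=0.001` is the baseline value.

```python
from math import sqrt


def slippage(size, volume, base=0.0, impact=0.001):
    """Square-root impact model: base + impact * sqrt(size / volume)."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return base + impact * sqrt(size / volume)


small = slippage(size=1.0, volume=100.0)   # order is 1% of bar volume
large = slippage(size=4.0, volume=100.0)   # order is 4% of bar volume
```

Because of the square root, quadrupling the order size only doubles the modelled slippage, which is what distinguishes this variant from the linear one.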

Phase 7 — Operations

Purpose: Runtime behavior and safety mechanisms.

Factor	Question	Variants	Env var
`drift_detection`	Monitor for model degradation?	OFF, ON	`CVN_DRIFT_ACTION`
`system_status`	Active trading or shadow mode?	active, shadow	`CVN_SYSTEM_STATUS`

Status: PLANNED

Phase 8 — Holdout Validation

Purpose: Final out-of-sample validation before production deployment.

Process:

1. Run the fully locked configuration on the holdout fold (fold 1, most recent 2 months)
2. Compare to baselines: buy-and-hold, random entry, naive (ADR-29)
3. Verify: Sortino > 1.5× random, net expectancy > 0, trades ≥ 30
4. Committee review (score ≥ 8)
5. Promote model to Production in MLflow (ADR-2: manual only)
6. Staged rollout: shadow → canary (10%) → full (ADR-42: atomic per crypto)
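The numeric checks in step 3 can be expressed as a small gate function. This is a sketch under the thresholds stated above; the function name, argument names, and return shape are hypothetical, and the committee review in step 4 remains a human decision.

```python
def holdout_gate(sortino, random_sortino, net_expectancy, n_trades):
    """Evaluate the step-3 holdout criteria; returns (passed, per-check results)."""
    checks = {
        # Note: if the random baseline is negative, 1.5x of it is an easy bar,
        # so this check is only meaningful together with the other two.
        "sortino_vs_random": sortino > 1.5 * random_sortino,
        "net_expectancy_positive": net_expectancy > 0,
        "enough_trades": n_trades >= 30,
    }
    return all(checks.values()), checks


passed, details = holdout_gate(
    sortino=1.8, random_sortino=1.0, net_expectancy=0.002, n_trades=45
)
```

Returning the individual checks alongside the overall verdict makes it easy to record in the audit trail which criterion blocked a promotion.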

4. Current Baseline (BASE_ENV)

Every parameter below is part of the CURRENT production configuration. Changes require FTF validation + committee approval.

Data Preparation

Parameter	Value	Locked by	Status / pending phase
Timeframe	1h	Phase 1a	✅ Locked
Train window	18 months	Phase 1a	✅ Locked
History depth	24 months	Fixed	—
Feature count	top 50	Phase 1a	✅ Locked
ATR period	20 bars	Phase 1a	✅ Locked
Purge bars	20 (strict)	Phase 1a	✅ Locked
Embargo bars	10 (strict)	Phase 1a	✅ Locked
CUSUM training mode	enabled (legacy)	—	Phase 1b (testing)

Model Training

Parameter	Value	Locked by	Status / pending phase
Model type	XGBoost	—	Phase 2
Classification	3-class (SELL/HOLD/BUY)	—	Phase 2
HPO objective	precision_recall_auc	—	Phase 1b (testing)
HPO trials	30	—	Phase 1b
Early stopping	150 rounds	—	Phase 1b
Class balancing	ON	ADR-46	Phase 1b
Calibration	isotonic	Phase 0	✅ Locked
Binary classification	OFF (3-class)	—	Phase 2

Signal Generation

Parameter	Value	Locked by	Status / pending phase
CUSUM filter (inference)	ON, h=3.0σ	—	Phase 3
Meta-label	OFF	—	Phase 3
Adaptive event engine	OFF	—	Phase 3
Confidence threshold	0.4 (HPO-tuned)	—	Phase 3

Execution

Parameter	Value	Locked by	Status / pending phase
Pipeline mode	LdP (instant)	Fixed	—
SL multiplier	1.5 ATR	—	Phase 4
TP multiplier	3.0 ATR	—	Phase 4
Horizon	5 hours	—	Phase 4

Filters

Parameter	Value	Locked by	Status / pending phase
Trend filter	OFF	—	Phase 5
Regime filter	OFF	—	Phase 5
Meta-label filter	OFF	—	Phase 5
Concurrency	max 1	—	Phase 5
Cooldown	0s (none)	—	Phase 5

Cost & Risk

Parameter	Value	Locked by	Status / pending phase
Trade fee	15 bps	—	Phase 6
Slippage model	nonlinear (impact=0.001)	—	Phase 6
Kelly sizing	OFF	—	Phase 6
Max daily drawdown	10%	Fixed	—

5. Audit Trail

Every locked parameter has a traceable chain:

FTF Run (run_id) → PostgreSQL (finetune_results)
    → FTF Report (committee/reports/)
    → Committee Session (committee/sessions/{id}_committee.json)
    → lock_winner() call (results/ftf_locked_config.json)
    → BASE_ENV update (PR with CodeRabbit review)
    → Helm deploy (CI/CD)

All data is retained. Any decision can be replayed.

6. Glossary

Term	Definition
FTF	Fine-Tuning Framework — the ablation testing engine
ADR-56	Every pipeline change gated by env var + FTF factor
Lock	A parameter value validated by FTF and committee, set as new baseline
BASE_ENV	The current locked configuration in `ablation_matrix.py`
Ceteris paribus	"All else equal" — only ONE factor varies per test
BH correction	Benjamini-Hochberg false discovery rate control for multiple comparisons
Sortino	Risk-adjusted return (downside deviation only). Primary trading metric.
CUSUM	Cumulative Sum control chart — detects regime changes in volatility
Triple barrier	Labeling method: TP hit → BUY, SL hit → SELL, timeout → HOLD
Walk-forward	OOS validation: train on past, test on future, slide window forward
Holdout	Final validation fold never seen during ablation (fold 1, most recent)
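Since Sortino is the primary metric throughout this protocol, the sketch below shows one way to compute it together with the outlier protections from the Statistical Standards table (cap at ±20, runs with fewer than 3 trades excluded). It works on per-trade returns without annualization and uses a zero target return; both choices, and the function name, are assumptions rather than the FTF's confirmed definition.

```python
from math import sqrt


def sortino(returns, target=0.0, cap=20.0, min_trades=3):
    """Capped Sortino ratio on per-trade returns; None if the run is excluded."""
    if len(returns) < min_trades:
        return None  # too few trades: excluded from analysis
    excess = [r - target for r in returns]
    mean_excess = sum(excess) / len(excess)
    # Downside deviation: root mean square of the below-target part only.
    downside = sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside == 0.0:
        # No losing trades: the ratio is unbounded, so fall back to the cap.
        if mean_excess > 0:
            return cap
        return -cap if mean_excess < 0 else 0.0
    return max(-cap, min(cap, mean_excess / downside))


ratio = sortino([0.02, -0.01, 0.03, -0.01])
no_losses = sortino([0.01, 0.02, 0.03])   # hits the +20 cap
excluded = sortino([0.01, -0.02])          # fewer than 3 trades
```

The cap keeps a run with almost no losing trades from dominating fold averages, which would otherwise distort the pairwise comparisons.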