Plan dossier — Track 1 : BTC cross-asset features
Date : 2026-04-30
Story : CVN-N001-EE-S04 (OP wp#43)
GH issue : #715
Author : Dominique (operator) + Claude
Session type : plan_review (per ADR-68)
Severity : P2 — quick-win bundle Track 4, tier 1 data lever (different lever from Track 5/6 ABANDONED loss-function attempts and Track 9 In testing calibration tier).
Sequencing : per F1_BUY_BOOST_PLAN.md §6 Phase 1 — Track 1 is the next pick after Track 9 enters In testing. Cross-track lesson : training signal manipulation (Track 5+6) is not the productive lever ; Track 1 + Track 12 are the data-tier alternatives.
1. Context — why now, why this lever
Tracks 5 (label smoothing + cleanlab, both branches ABANDONED) and 6 (focal loss, all 4 variants ABANDONED) showed that training signal manipulation does not help at the current dataset / labelling regime. Track 9 (per-regime threshold) is in testing — calibration-tier lever, distinct from training signal. Track 1 is the first data-tier lever : it expands the input space rather than tweaking the loss / labels / threshold. The hypothesis is independent of whether Track 9 LOCKs.
CVNTrade currently trades altcoins on 15m candles using own-asset features only (~300 enriched columns from OHLCV + technical indicators per cvntrade_enrich.py). The model is blind to the BTC macro state — yet altcoins are highly BTC-correlated. The F1 plan §5 Track 1 hypothesises that adding a small set of BTC cross-asset features lifts f1_buy materially because the model gains visibility on the dominant regime driver.
2. Hypothesis (falsifiable)
Adding 6 BTC cross-asset features lifts f1_buy materially over the BTC-blind baseline at the current dataset / model regime. Specifically :
- H0 (null) : `mean(f1_buy | btc_features=enabled) - mean(f1_buy | disabled)` is indistinguishable from 0 (CI95 includes 0) → ABANDON.
- H1 (alternative) : Δf1_buy ≥ +0.020 (Story-specific bar — higher than the +0.015 baseline because Track 1 has the largest expected lift per F1 plan §5 : +0.03 to +0.06) with 95 % bootstrap CI excluding 0, AND ≥ 4/5 cryptos individually improve, AND Cohen's d ≥ 0.3.
The hypothesis is falsifiable per the same gate criteria as Tracks 5 / 6 / 9, with a tightened f1_buy bar (+0.020 instead of the standard +0.015) reflecting the higher expected effect size.
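The H1 conjunction above can be sketched as a single check. This is a minimal illustration, not the FTF implementation: the function names, the percentile bootstrap, the paired form of Cohen's d (mean of the deltas divided by their standard deviation) and the reading of "individually improve" as "per-crypto mean delta > 0" are all assumptions, and the symbols in the usage lines are placeholders.

```python
import random
import statistics


def bootstrap_ci95(deltas, n_resamples=2000, seed=0):
    """Percentile bootstrap CI95 for the mean of per-fold f1_buy deltas."""
    rng = random.Random(seed)
    n = len(deltas)
    means = sorted(
        statistics.fmean(rng.choices(deltas, k=n)) for _ in range(n_resamples)
    )
    lo = means[int(0.025 * n_resamples)]
    hi = means[int(0.975 * n_resamples) - 1]
    return lo, hi


def cohens_d_paired(deltas):
    """Assumed effect-size form: mean delta over the std of the deltas."""
    return statistics.fmean(deltas) / statistics.stdev(deltas)


def h1_holds(deltas_by_crypto, min_lift=0.020, min_d=0.3, min_improved=4):
    """deltas_by_crypto maps a symbol to its per-fold (enabled - disabled) f1_buy deltas."""
    all_deltas = [d for ds in deltas_by_crypto.values() for d in ds]
    lo, _hi = bootstrap_ci95(all_deltas)
    improved = sum(statistics.fmean(ds) > 0 for ds in deltas_by_crypto.values())
    return (
        statistics.fmean(all_deltas) >= min_lift  # tightened +0.020 bar
        and lo > 0                                # CI95 excludes 0 on the positive side
        and improved >= min_improved              # at least 4 of 5 cryptos improve
        and cohens_d_paired(all_deltas) >= min_d  # effect size
    )


# Placeholder symbols and made-up deltas, for illustration only.
example = {s: [0.03, 0.04, 0.05, 0.03, 0.04] for s in ("AAA", "BBB", "CCC", "DDD", "EEE")}
print(h1_holds(example))
```

Keeping the four conditions in one boolean makes it hard to report a "pass" when only the mean lift clears the bar.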
3. Variant matrix
6 variants : the F1 plan §4.2 convention (5 unique configs per FTF factor, including baseline) plus the `btc_full_purge10` sensitivity variant added per CR pass 1 reco #11 :
| Variant | What it does | Features added | Notes |
|---|---|---|---|
| `none` (baseline) | Existing FE pipeline, BTC-blind | 0 | Reference |
| `btc_min` | Minimal BTC features — directionality only | `btc_return_1h`, `btc_return_4h`, `btc_return_24h` | Cheapest variant — tests "BTC direction is enough" |
| `btc_full` | Full F1-plan feature set | `btc_return_1h`, `btc_return_4h`, `btc_return_24h`, `btc_realized_vol_24h`, `btc_z_score_close`, `btc_correlation_15m_lag5` | Canonical Track 1 variant per F1 plan §5 |
| `btc_full_purge0` | Same as `btc_full` but no purging (`purge_bars=0`) | Same 6 | Pre-registered "leakage-permitted" sanity — proves purging is doing real work (CR rec #2) — informational only, NOT a candidate for lock |
| `btc_full_purge10` | Same as `btc_full` with `purge_bars=10` (sensitivity) | Same 6 | Empirically justifies `purge_bars=20` default (CR pass 1 reco #11). Could become the locked variant if it dominates. |
| `btc_vol_only` | Volatility-only features (no direction) | `btc_realized_vol_24h`, `btc_z_score_close`, `btc_correlation_15m_lag5` | Tests "regime detection without direction" |
6 variants (5 candidates + 1 sensitivity). Per-fold aggregation across folds : standard FTF protocol — bootstrap CI95 + Cohen's d + BH-corrected p-values per F1 plan §7.
Why the purge0 variant is a sanity check, not a candidate : if `btc_full_purge0` outperforms `btc_full` on f1_buy, that is evidence of leakage (without the purge, the model exploits BTC information in ways that are impossible in production). If it underperforms or matches, the purge is doing real work and the locked variant is `btc_full`. The variant is shipped explicitly in the FTF matrix to make the leakage check part of the gate review, not an afterthought.
4. Implementation path
4.1 Cross-asset data path (enrichment_api.py, cvntrade_enrich.py)
The current enrichment pipeline (CVNTrade_Enrich.process(df: pd.DataFrame, feature_name: str, mode: str = "train") -> pd.DataFrame) is single-asset only. Track 1 extends it to accept an optional cross-asset reference :
- Extend `EnrichmentConfig` (in `src/commun/pipeline/contracts.py`) with the BTC configuration fields : `btc_features_enabled: bool = False`, `btc_features_set: Literal["min", "full", "vol_only"] = "full"`, `btc_purge_bars: int = 20`, `btc_embargo_bars: int = 10`. The BTC OHLCV DataFrame itself is NOT an `EnrichmentConfig` field — it's runtime data, passed as a separate parameter to the enrichment functions (configuration vs. data separation).
- Extend the `CVNTrade_Enrich.process()` signature to accept an optional `btc_ohlcv: Optional[pd.DataFrame]` kwarg (default `None`). When `btc_ohlcv` is `None` AND `btc_features_enabled` is `False`, behaviour is bit-identical to the current code (regression bar). When `btc_features_enabled` is `True`, `btc_ohlcv` MUST be passed (ADR-25 fail-fast).
- New module `commun/pipeline/btc_features.py` with `compute_btc_features(target_ohlcv, btc_ohlcv, feature_set, purge_bars)` invoked from the enrichment pipeline ONLY when `btc_features_enabled=True`. Otherwise no BTC columns appear in the output.
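The configuration vs. data separation described above can be sketched as follows. The class name `EnrichmentConfigSketch` and the helper `should_compute_btc_features` are hypothetical stand-ins for illustration; only the four field names, their defaults and the fail-fast rule come from the plan.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class EnrichmentConfigSketch:
    """Hypothetical stand-in for the BTC fields added to EnrichmentConfig."""
    btc_features_enabled: bool = False
    btc_features_set: Literal["min", "full", "vol_only"] = "full"
    btc_purge_bars: int = 20
    btc_embargo_bars: int = 10


def should_compute_btc_features(config: EnrichmentConfigSketch,
                                btc_ohlcv: Optional[object]) -> bool:
    """Return True when BTC columns must be produced; fail fast on a missing frame.

    btc_ohlcv is runtime data and deliberately not a field of the config.
    """
    if config.btc_features_enabled and btc_ohlcv is None:
        raise RuntimeError(
            "btc_features_enabled=True but btc_ohlcv was not passed (ADR-25 fail-fast)"
        )
    return config.btc_features_enabled


# Default config: BTC-blind path, no BTC columns expected.
print(should_compute_btc_features(EnrichmentConfigSketch(), None))
```

Because the defaults leave the feature disabled, existing callers that never pass `btc_ohlcv` stay on the pre-Track-1 path.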
4.1bis Feature contract pinning via MLflow artefact (CR pass 1 BLOCKER #2 resolution)
⚠️ Status note (CR pass 4) : the persistence + inference loader of `enrichment_config.json` is OUT-OF-SCOPE for this PR — this section describes the target architecture that the follow-up PR wires. PR #792 ships the dataclass extension + feature computation + FTF factor + tests + docs ; the autotrainer write + `InferenceAPI.from_mlflow(run_id)` read land in the follow-up. Until the follow-up merges, models trained with `btc_features_enabled=True` MUST NOT be deployed to inference (the InferenceAPI doesn't yet know how to read the pinned config). The PR description's "Out of scope" section, `mlops_readiness.md` §7 and the `EnrichmentConfig` docstring `.. note::` cover this constraint.
The env var CVN_BTC_FEATURES_* is a TRAINING-TIME signal only. Per ADR-23 (features version-pinned, fail-fast), the inference path must NOT read these env vars — the contract travels with the model artefact (when the follow-up wires the persistence path) :
- At training time (follow-up PR), the autotrainer reads the env vars to decide which `EnrichmentConfig` to use. The trained model's MLflow artefacts include a new `enrichment_config.json` capturing :
  - `btc_features_enabled: bool`
  - `btc_features_set: str`
  - `btc_purge_bars: int`
  - `btc_embargo_bars: int`
  - `feature_names_with_btc: list[str]` (pinned ordered list of BTC columns produced by this model)
- At inference time (follow-up PR), `InferenceAPI` loads `enrichment_config.json` from the model's MLflow artefacts (alongside the existing `feature_names`). The `EnrichmentConfig` for the inference call is derived from the model's pinned config, not from the runtime env. If the env disagrees with the model's pinned config, fail fast per ADR-25 with an explicit error listing the mismatch (no silent imputation).
- `feature_names` validation (follow-up PR) : at inference, the input DataFrame's columns are compared to the model's pinned `feature_names`. Missing columns OR extra columns → `RuntimeError`. This is the regression bar that catches the case where someone deploys a BTC-enabled model under a BTC-blind enrichment pipeline (or vice versa).
This mirrors the pattern already used for ThresholdCalibrator and PerRegimeThresholdCalibrator (the calibrator artefact pins its regime_detector_version ; loading under a different version raises RuntimeError per ADR-25).
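A minimal sketch of the pinned contract, assuming the JSON keys listed above. The helper names `pin_enrichment_config` and `validate_feature_names` are hypothetical, the `rsi_14` column is a made-up own-asset feature, and nothing here touches MLflow: the payload is only serialised and read back in memory.

```python
import json


def pin_enrichment_config(enabled, feature_set, purge_bars, embargo_bars, btc_columns):
    """Serialise the training-time BTC config into the JSON payload stored with the model."""
    return json.dumps(
        {
            "btc_features_enabled": enabled,
            "btc_features_set": feature_set,
            "btc_purge_bars": purge_bars,
            "btc_embargo_bars": embargo_bars,
            "feature_names_with_btc": list(btc_columns),
        },
        sort_keys=True,
    )


def validate_feature_names(pinned_names, actual_columns):
    """Raise when the inference frame has missing or extra columns vs the pinned list."""
    missing = [c for c in pinned_names if c not in actual_columns]
    extra = [c for c in actual_columns if c not in pinned_names]
    if missing or extra:
        raise RuntimeError(f"feature_names mismatch: missing={missing} extra={extra}")


btc_cols = ["btc_return_1h", "btc_return_4h", "btc_return_24h"]
pinned = json.loads(pin_enrichment_config(True, "min", 20, 10, btc_cols))

# Hypothetical full feature list of the model: one own-asset column plus the BTC ones.
pinned_feature_names = ["rsi_14"] + pinned["feature_names_with_btc"]
validate_feature_names(pinned_feature_names, ["rsi_14"] + btc_cols)  # passes silently
```

Feeding the same model a BTC-blind frame (only `rsi_14`) would raise and list the three missing BTC columns, which is the mismatch the section wants surfaced.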
4.2 BTC OHLCV loading (cvntrade_etl_pipeline.py)
The existing ETL pipeline already loads BTC OHLCV via _fetch_binance_data("BTCUSDT", mode=..., timeframe=...). For training, BTC and target-asset OHLCV are fetched on the same window + same timeframe, then both passed to enrichment. For paper/live, the streaming kernel pre-fetches a rolling BTC window and passes it on every candle.
Loading happens at the ETL orchestration layer — the enrichment layer never reaches out for BTC data itself (per ADR-25 fail-fast : if btc_features_enabled=True and btc_ohlcv=None, raise RuntimeError).
4.3 ADR-14 purging invariant (committee F1 plan v2 rec #2)
The hard contract per ADR-14 : at training time t, BTC features may use only data ≤ t - purge_bars. This prevents look-ahead leakage from BTC's future state into the altcoin's training labels.
Implementation : _ajouter_features_btc computes the 6 features on the BTC OHLCV, then shifts every BTC feature column by purge_bars bars before joining to the target altcoin's index. The shift is the formal proof of the invariant ; a regression test asserts btc_return_1h.iloc[i] at time t_i was computed from BTC data ≤ t_i - purge_bars × bar_duration (15 min × 20 = 5 hours for default).
Defaults : purge_bars=20 and embargo_bars=10 (per F1 plan §5 Track 1). The btc_full_purge0 variant overrides purge_bars=0 for the leakage-detection sanity check.
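The shift invariant can be checked on a toy series. This is a sketch with pandas on synthetic closes; `purge_shift` is a hypothetical helper name, and the real `_ajouter_features_btc` may be organised differently.

```python
import pandas as pd


def purge_shift(feature: pd.Series, purge_bars: int) -> pd.Series:
    """Delay a BTC feature by purge_bars rows so row i only sees data up to i - purge_bars."""
    return feature.shift(purge_bars)


purge_bars = 20
btc_close = pd.Series(range(100, 200), dtype="float64")  # synthetic 15m closes

raw_return_1h = btc_close.pct_change(4)            # value before purging
btc_return_1h = purge_shift(raw_return_1h, purge_bars)

i = 60
# Row i of the purged feature equals the unpurged value from purge_bars rows earlier.
assert btc_return_1h.iloc[i] == raw_return_1h.iloc[i - purge_bars]
# The first purge_bars + 4 rows are NaN: 4 from pct_change(4), 20 from the shift.
assert btc_return_1h.iloc[: purge_bars + 4].isna().all()
```

With the default `purge_bars=20` on 15m candles, row `i` therefore carries BTC information that is at least 5 hours old.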
4.4 The 6 BTC features — exact definitions
| Feature | Definition | Window |
|---|---|---|
| `btc_return_1h` | `pct_change(4)` (4 bars on 15m = 1 h) | 4 bars |
| `btc_return_4h` | `pct_change(16)` | 16 bars |
| `btc_return_24h` | `pct_change(96)` | 96 bars |
| `btc_realized_vol_24h` | `pct_change(1).rolling(96).std()` (raw 15m-return standard deviation over 24 h ; un-annualised for direct comparability with the existing FE pipeline's per-bar vol features) | 96 bars |
| `btc_z_score_close` | `(close - rolling_mean(96)) / rolling_std(96)` | 96 bars |
| `btc_correlation_15m_lag5` | `target.pct_change(1).rolling(96).corr(BTC.pct_change(1).shift(5))` | 96 bars + 5-bar lag |
The lagged correlation feature is computed on the target altcoin × BTC pair (not on BTC alone). It uses the target's own OHLCV that's already in scope.
After computation, every column is shifted with `.shift(purge_bars)` before being joined to the target index — the output column at row i has only BTC information from row i - purge_bars and earlier.
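The six definitions and the final shift translate almost line for line into pandas. The sketch below assumes both close series already share the same index and contain no gaps; the function name `compute_btc_features_sketch` and the synthetic random-walk prices are illustrative only, and the real module takes full OHLCV frames plus a `feature_set` argument.

```python
import numpy as np
import pandas as pd


def compute_btc_features_sketch(target_close: pd.Series,
                                btc_close: pd.Series,
                                purge_bars: int = 20) -> pd.DataFrame:
    """Compute the 6 BTC features on 15m closes, then delay them by purge_bars."""
    btc_ret = btc_close.pct_change(1)
    rolling_close = btc_close.rolling(96)
    feats = pd.DataFrame(
        {
            "btc_return_1h": btc_close.pct_change(4),
            "btc_return_4h": btc_close.pct_change(16),
            "btc_return_24h": btc_close.pct_change(96),
            "btc_realized_vol_24h": btc_ret.rolling(96).std(),
            "btc_z_score_close": (btc_close - rolling_close.mean()) / rolling_close.std(),
            # BTC returns are lagged by 5 bars BEFORE the 96-bar rolling correlation.
            "btc_correlation_15m_lag5": target_close.pct_change(1)
            .rolling(96)
            .corr(btc_ret.shift(5)),
        }
    )
    return feats.shift(purge_bars)


# Synthetic random-walk prices, for illustration only.
rng = np.random.default_rng(0)
n = 400
btc_close = pd.Series(30000.0 * np.exp(np.cumsum(rng.normal(0.0, 0.002, n))))
target_close = pd.Series(50.0 * np.exp(np.cumsum(rng.normal(0.0, 0.004, n))))

out = compute_btc_features_sketch(target_close, btc_close, purge_bars=20)
print(out.dropna().shape)
```

The warm-up is visible in the output: `btc_return_1h` is first non-null at row 4 + `purge_bars`, and the 96-bar features need 96 more bars of BTC history on top of the purge.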
4.5 FTF factor + guardrail
Add factor=btc_features to src/commun/finetune/ablation_matrix.py under DATA_FACTORS (per ADR-56) with the 6 variants of §3. The factor gates the `CVN_BTC_FEATURES_*` env vars :
| Variant | env vars |
|---|---|
| `none` | `CVN_BTC_FEATURES_ENABLED=0` |
| `btc_min` | `CVN_BTC_FEATURES_ENABLED=1`, `CVN_BTC_FEATURES_SET=min`, `CVN_BTC_PURGE_BARS=20`, `CVN_BTC_EMBARGO_BARS=10` |
| `btc_full` | `CVN_BTC_FEATURES_ENABLED=1`, `CVN_BTC_FEATURES_SET=full`, `CVN_BTC_PURGE_BARS=20`, `CVN_BTC_EMBARGO_BARS=10` |
| `btc_full_purge0` | `CVN_BTC_FEATURES_ENABLED=1`, `CVN_BTC_FEATURES_SET=full`, `CVN_BTC_PURGE_BARS=0`, `CVN_BTC_EMBARGO_BARS=0` |
| `btc_full_purge10` | `CVN_BTC_FEATURES_ENABLED=1`, `CVN_BTC_FEATURES_SET=full`, `CVN_BTC_PURGE_BARS=10`, `CVN_BTC_EMBARGO_BARS=5` |
| `btc_vol_only` | `CVN_BTC_FEATURES_ENABLED=1`, `CVN_BTC_FEATURES_SET=vol_only`, `CVN_BTC_PURGE_BARS=20`, `CVN_BTC_EMBARGO_BARS=10` |
Guardrail in src/commun/finetune/guardrails.py (per ADR-58) — _validate_btc_features :
- `CVN_BTC_FEATURES_SET` ∈ `{min, full, vol_only}` — reject other values.
- `CVN_BTC_FEATURES_ENABLED=1` ⇒ the ETL pipeline MUST pass `btc_ohlcv` to enrichment. Fail-fast at training entry-point if the BTC dataframe is missing.
- `CVN_BTC_FEATURES_ENABLED=0` with `CVN_BTC_FEATURES_SET` set ⇒ orphaned override, reject (typical copy-paste leak).
- `CVN_BTC_PURGE_BARS` and `CVN_BTC_EMBARGO_BARS` ∈ `[0, 200]` — `0` is the sanity-check variant, anything > 200 is most likely a typo (15 m × 200 = 50 hours is way more than the model's horizon). The `BTC_` prefix avoids collision with `CVN_PURGE_BARS` used by the global purged k-fold infrastructure (`src/training/cv/purged_kfold.py`).
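The env-level checks can be sketched as one pure function over a dict of env vars. Assumptions to note: the function name `validate_btc_features_env` is hypothetical, rejection is modelled as `ValueError` (the plan only says "reject"), and `CVN_BTC_FEATURES_ENABLED` is assumed to accept only `0` or `1`. The "ETL must pass `btc_ohlcv`" check is left out because it needs the runtime frame, not the env.

```python
ALLOWED_SETS = {"min", "full", "vol_only"}
BAR_VARS = ("CVN_BTC_PURGE_BARS", "CVN_BTC_EMBARGO_BARS")


def validate_btc_features_env(env: dict) -> None:
    """Raise ValueError on an inconsistent CVN_BTC_* configuration."""
    enabled_raw = env.get("CVN_BTC_FEATURES_ENABLED", "0")
    if enabled_raw not in {"0", "1"}:
        raise ValueError(f"CVN_BTC_FEATURES_ENABLED must be 0 or 1, got {enabled_raw!r}")

    feature_set = env.get("CVN_BTC_FEATURES_SET")
    if feature_set is not None and feature_set not in ALLOWED_SETS:
        raise ValueError(f"CVN_BTC_FEATURES_SET={feature_set!r} not in {sorted(ALLOWED_SETS)}")

    # Orphaned override: a feature set is configured while the feature is disabled.
    if enabled_raw == "0" and feature_set is not None:
        raise ValueError("CVN_BTC_FEATURES_SET is set but CVN_BTC_FEATURES_ENABLED=0")

    for name in BAR_VARS:
        raw = env.get(name)
        if raw is None:
            continue
        value = int(raw)  # a non-integer string also raises ValueError
        if not 0 <= value <= 200:
            raise ValueError(f"{name}={value} outside [0, 200]")


# The btc_full_purge0 row of the variant table passes.
validate_btc_features_env(
    {
        "CVN_BTC_FEATURES_ENABLED": "1",
        "CVN_BTC_FEATURES_SET": "full",
        "CVN_BTC_PURGE_BARS": "0",
        "CVN_BTC_EMBARGO_BARS": "0",
    }
)
```

Taking the env as a plain dict rather than reading `os.environ` directly keeps the guardrail easy to unit-test, one case per rule.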
4.6 Tests
- `tests/unit/test_enrich_btc_features.py` — unit tests for `_ajouter_features_btc` :
  - happy path : 6 features computed correctly on synthetic OHLCV pair
  - shape : output has same row count as input target
  - shift invariant : assert `btc_return_1h` at row `i` equals `btc_pct_change(4)` at row `i - purge_bars` (formal proof of ADR-14)
  - missing BTC ohlcv with `btc_features_enabled=True` raises `RuntimeError` (ADR-25)
  - `btc_features_enabled=False` produces zero BTC columns (regression bar — pre-Track-1 behaviour bit-identical)
- `tests/integration/test_track1_btc_features.py` — 6-variant FTF matrix end-to-end on small synthetic dataset, asserts per-variant determinism + correct env var routing
- `tests/unit/test_ftf_guardrails.py` — extend with the new env var validation (5+ test cases per `_validate_btc_features` checks)
4.7 Observability + MLOps readiness
- New event `event=btc_features_applied feature_set=... purge_bars=... n_features=...` indexed in Loki (per ADR-32) — emitted once per enrichment run.
- Grafana panel "BTC features purge invariant" : checks the lag between the first non-null row of `btc_return_1h` and the target's first row (must equal `purge_bars`). Sanity-check on the running pipeline.
- MLOps readiness file `documentation/stories/CVN-N001-EE-S04/mlops_readiness.md` filled per ADR-70 before merge.
- New runbook `documentation/runbooks/runbook_btc_features_drift.md` (P2) added per committee CR pass 2 reco v2.5 — covers KS-test alerts, BTC-altcoin correlation drift, BTC OHLCV quality alerts, `enrichment_config_mismatch` (P1 fail-fast), pre-LOCK rollback dry-run failure handling.
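The `event=key=value` line for the new event could be produced with a small formatter like the one below. `format_event` is a hypothetical helper, and the logger name is arbitrary; how the project actually ships log lines to Loki is not shown here.

```python
import logging


def format_event(event: str, **fields) -> str:
    """Render a structured log line as space-separated key=value pairs."""
    parts = [f"event={event}"] + [f"{key}={value}" for key, value in fields.items()]
    return " ".join(parts)


logger = logging.getLogger("btc_features_sketch")  # arbitrary logger name

line = format_event(
    "btc_features_applied", feature_set="full", purge_bars=20, n_features=6
)
logger.info(line)  # emitted once per enrichment run
print(line)  # event=btc_features_applied feature_set=full purge_bars=20 n_features=6
```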
5. Acceptance gate (per F1 plan §6)
The 6 official gates apply, with one tightening :
| Gate | Threshold |
|---|---|
| F1_buy lift | mean Δf1_buy ≥ +0.020 with 95 % bootstrap CI excluding 0 (Story-specific tightened from the standard +0.015) |
| Joint metric | Δexpectancy ≥ 0 AND Δsortino ≥ 0 AND Δmax_drawdown ≤ +1 % |
| Stability | per-fold variance of f1_buy ≤ 0.05 |
| Per-asset | f1_buy improves on ≥ 4/5 cryptos |
| Sample size | ≥ 50 BUY trades / fold |
| MLOps | documentation/stories/CVN-N001-EE-S04/mlops_readiness.md complete |
Mandatory leakage check (committee F1 plan v2 rec #2 + CR pass 1 reco #9) — replace the arbitrary +0.005 threshold with a paired t-test (BH-corrected across 5 cryptos × 5 folds = 25 paired observations) on f1_buy(btc_full_purge0) - f1_buy(btc_full). If the paired difference is significantly positive (BH-corrected p < 0.05) → leakage suspected → ABANDON Track 1 pending root-cause investigation. The statistical bar replaces the arbitrary effect-size threshold with a pre-registered hypothesis test, immune to noise floor calibration. This is a hard gate independent of the other 6.
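The two building blocks of the leakage gate can be sketched without any statistics library: the paired t statistic on the `purge0 - full` differences, and a Benjamini-Hochberg adjustment of a family of p-values. How the family is formed is an assumption here (for instance one one-sided paired test per crypto); the p-value itself would come from a t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel(..., alternative="greater")`, which is not called in this sketch. Function names are hypothetical.

```python
import math
import statistics


def paired_t_statistic(diffs):
    """t statistic of the mean paired difference f1_buy(purge0) - f1_buy(full)."""
    n = len(diffs)
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjustment.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up numbers, for illustration only.
print(round(paired_t_statistic([0.01, 0.02, 0.03]), 4))
print([round(p, 4) for p in bh_adjust([0.01, 0.04, 0.03, 0.5])])
```

Under this reading, leakage is suspected when an adjusted p-value for a significantly positive difference falls below 0.05.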
If every gate clears → operator decision lock (Console flip the chosen variant in ftf_config.base_env, ADR-59). If any gate fails on every variant → abandon. If a variant clears AT MOST one gate beyond the F1_buy gate → keep available.
6. Out of scope
- Order book microstructure features (F1 plan Track 2) — separate track, deferred to big-bet bundle, gated on Track 1 + Track 9 outcome.
- Other cross-asset references (ETH, SOL, BTC dominance, total market cap) — Track 1 is BTC-only by design ; if it LOCKs, expanding to ETH could be a follow-up Story under the same Epic.
- Adaptive purge_bars (per-regime or volatility-adaptive purging) — premature ; constant `purge_bars=20` per ADR-14 standard is the contract.
- Online BTC feature computation in paper/live — the streaming kernel needs a rolling BTC window. v1 ships as backtest-only ; paper/live integration is the natural next sprint if Track 1 LOCKs (separate Story under CVN-N001-EE).
- Cross-asset in inference cache — cache key extension to include BTC OHLCV hash. v1 disables cache when `btc_features_enabled=True` (fail-safe ; Track 12 will revisit).
7. Falsifiability + rollback
- Falsifiability : the gate criteria above (especially the +0.020 f1_buy bar with CI95 excluding 0 + per-asset 4/5) are pre-registered. If the FTF sweep produces Δf1 ∈ [-0.01, +0.015] with CI95 including 0, that's the H0 outcome — ABANDON cleanly. If Δf1 ≥ +0.015 but < +0.020, that's "encouraging but doesn't meet bar" — `keep available` (could be combined with a future Track for joint lift).
- Rollback (CR pass 1 BLOCKER #1 resolution — model-switching, not env-flag flipping) : if Track 1 LOCKs and a production regression appears, the rollback path is switching the deployed model artefact to a baseline-trained model (i.e. trained without BTC features, with `btc_features_enabled=False` pinned in its `enrichment_config.json`).
- The MLOps promotion workflow already handles model artefact swaps per ADR-15 + ADR-42 (atomic per-crypto promotion). The operator promotes the previous BTC-blind champion via the standard Console flow on `mlflow_promotion`, NOT via `CVN_BTC_FEATURES_ENABLED=0`.
- The runtime env var `CVN_BTC_FEATURES_ENABLED` is training-time only ; flipping it on a deployed BTC-enabled model would cause a `feature_names` shape mismatch at inference (caught by the §4.1bis ADR-23 contract, raises `RuntimeError`).
- Mandatory pre-LOCK artefact : every Track-1 LOCK must keep the previous BTC-blind champion as a deployable rollback target in MLflow Registry (tagged `champion_btc_blind`). The promotion script enforces this — no Track-1 model becomes champion without a fallback model registered.
- Hot-fix path for code bugs : standard PR, retrain the model with the fix, atomic promotion. No runtime env-flag toggle.
- Why this is safer than env-flag rollback : env-flag flipping at inference would either dimension-mismatch the model (caught) or silently impute zero (not caught — the ADR-23 violation flagged by committee). Model-switching keeps the feature contract intact end-to-end.
8. Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| BTC OHLCV gaps (Binance feed outages) propagate to altcoin training | medium | medium | The pd.merge left-join on target index handles missing BTC bars by emitting NaN ; the existing FE pipeline drops rows with > X% NaN. Add a Loki alert if BTC NaN rate > 5% over a fold |
| Look-ahead leakage despite the purge | low | high | Mandatory btc_full_purge0 variant in the FTF matrix surfaces leakage as a gate violation. Plus the unit test that formally verifies the shift invariant |
| Cross-asset features cause MLflow artefact bloat | low | low | The 6 BTC columns are negligible vs the 300+ existing FE columns. Drop one with --cov-report if it ever matters |
| Concept drift : BTC's relationship with altcoins changes (e.g., post-halving regime shift) | medium | high | Track-level review at every quarterly model retrain. If correlations drift > 3σ, file an issue to revisit Track 1 ; quarterly cadence per #709 MLOps readiness template §3 |
| Purging too aggressive eats predictive signal | low | medium | The btc_min (3 features × purge_bars=20) variant tests a thinner feature set ; if it dominates btc_full, suggests over-engineering not over-purging |
| Operator forgets to load BTC OHLCV in the streaming kernel | medium | high | ADR-25 fail-fast at construction time : btc_features_enabled=True + btc_ohlcv=None raises with explicit error |
| BTC OHLCV cost (Binance API quota) blows up the training pipeline budget | low | low | BTC is fetched once per training run (already cached at the ETL level) and added as a join — minimal overhead |
9. Why this is not the next loss-function attempt
Per the cross-track lesson recorded in F1_BUY_BOOST_PLAN.md §6 Outcomes : Tracks 5 + 6 closed ABANDONED on training signal manipulation (label engineering + loss function). Track 1 is :
- Data-tier (tier 1 of the F1 plan, distinct from tier 2 LABEL ENGINEERING + tier 3 LOSS FUNCTION + tier 5 CALIBRATION).
- Input-space expansion (the model sees more, not different signal).
- Pre-training (operates on the model's input, not its training signal nor decision rule).
If Track 1 also abandons, the lesson generalizes more strongly across all tiers and the next pick should pivot to Track 12 (fractional differentiation + feature interactions) per the F1 plan §6 implication block. If Track 1 locks, data-tier becomes the productive lever and Track 12 is naturally aligned.
10. Cross-references
- F1 plan §5 Track 1 + §6 sequencing
- ADRs : ADR-14 (purging+embargo standard), ADR-25 (no silent fallback), ADR-32 (event=key=value structured logs), ADR-56 (every change FTF-testable), ADR-58 (every factor → guardrail + integration test), ADR-70 (MLOps readiness mandatory)
- Existing infra :
  - `src/ETL/cvntrade_enrich.py:59` (entry-point — `CVNTrade_Enrich.process`)
  - `src/commun/pipeline/enrichment_api.py:46` (modern wrapper — `EnrichmentAPI`)
  - `src/commun/pipeline/contracts.py:20` (extends `EnrichmentConfig` with the 4 BTC config fields ; BTC OHLCV is passed as a separate parameter to enrichment, not stored on the config)
  - `src/training/cv/purged_kfold.py:41-46` (canonical purge_bars pattern, env var `CVN_PURGE_BARS`)
  - `src/ETL/cvntrade_etl_pipeline.py:365` (BTC OHLCV loader — `_fetch_binance_data("BTCUSDT", ...)`)
  - `src/commun/finetune/ablation_matrix.py:89` (DATA_FACTORS list — will register `btc_features` factor here)
  - `tests/unit/test_enrichment_service.py:29` (SAMPLE_OHLCV fixture)
- Sister Tracks : Track 5 results (ABANDON), Track 6 results (ABANDON), Track 9 results (pending FTF sweep verdict)
- Production filter chain : `architecture/FILTER_FUNNEL.md` — Track 1 sits at the FE step (pre-inference)
11. Committee plan_review v1 triage (session 62d756a9, 2026-04-30)
v1 verdict : REJECTED / EXECUTION_RISK — split consensus across 5 experts (architect 7.5, ml-engineer 7.5, ops 7.0, data-scientist 8.0, crypto-trader 7.5 — avg 7.5). 2 blockers + 11 recommendations.
Reason cited : "The proposed rollback mechanism is fundamentally flawed and violates ADR-23, posing a severe silent degradation risk due to feature contract mismatch between trained models and the inference pipeline."
11.1 Blockers (architectural — required pre-impl)
| # | Blocker | Source | Resolution |
|---|---|---|---|
| 1 | Rollback via env-flag (`CVN_BTC_FEATURES_ENABLED=0`) violates ADR-23 — flipping at inference creates feature dimension mismatch or silent imputation against a model trained with BTC features | expert-ops + expert-architect | §7 rewritten — rollback is now model-artefact switching via the existing MLOps promotion workflow (ADR-15 + ADR-42). The env var is downgraded to training-time only. Mandatory : every Track-1 LOCK keeps the previous BTC-blind champion as a registered fallback. |
| 2 | Global env vars create a brittle feature contract — runtime misconfig → wrong-shape input or runtime errors | expert-architect | §4.1bis added — feature contract is pinned in MLflow artefact metadata (enrichment_config.json + feature_names_with_btc). At inference, InferenceAPI derives EnrichmentConfig from the model's pinned config, NOT from runtime env. ADR-25 fail-fast on env↔artefact mismatch. Mirrors the existing regime_detector_version pinning pattern. |
11.2 Recommendations integrated pre-impl (locked into the plan)
| # | Recommendation | Source | Integration |
|---|---|---|---|
| 1 | Revise rollback to model-switching | expert-ops + expert-architect | §7 rewritten (also blocker resolution) |
| 2 | Strengthen feature contract via MLflow metadata | expert-architect | §4.1bis added (also blocker resolution) |
| 4 | Plan deployment_review for live | expert-ops + expert-crypto-trader | §6 amended — explicit acknowledgement that paper/live integration requires a separate deployment_review session before live promotion. v1 ships as backtest-only. |
| 5 | Document BTC OHLCV provenance + quality monitoring | expert-ml-eng + expert-architect + expert-crypto-trader | §4.2 amended + new §4.2bis — BTC source = Binance via existing _fetch_binance_data("BTCUSDT", ...) ; document known limitations (Binance only, no exchange aggregation, post-2017 data, no listings prior to BTC-USDT pair availability). New monitoring : outlier detection on returns (> 5σ flagged) + volume anomalies (drop > 80 % vs 30d median) + wick-to-body ratios (> 5 flagged) — emitted as Loki events btc_ohlcv_quality_alert reason=.... |
| 6 | Verify backtest cost model realism | expert-crypto-trader | §5 amended — explicit confirmation that expectancy_net and sortino gates use the F1 plan §4 cost formula : gross_pnl - taker_fee - spread - slippage - funding (round-trip ≈ 45 bps interim per the v3 cost assumption). Pinned in the FTF results dossier template. Updated cost model from Track 2 dynamic slippage will retroactively apply when it lands. |
| 9 | Replace +0.005 leakage threshold with statistical test | expert-data-scientist | §5 leakage gate rewritten — paired t-test with BH correction across 25 paired observations. Hypothesis-test bar replaces the effect-size threshold. |
| 11 | Sensitivity for `purge_bars` | expert-architect + expert-data-scientist | §3 matrix extended — added btc_full_purge10 variant. Now 6 variants : none, btc_min, btc_full, btc_full_purge0 (leakage check), btc_full_purge10 (sensitivity), btc_vol_only. |
11.3 Recommendations applied at impl time
| # | Recommendation | When |
|---|---|---|
| 3 | Continuous concept/data drift monitoring | Phase 4 — extends §4.7 observability with feature distribution monitoring (KS test on each BTC feature vs training distribution, weekly window) + Grafana panel "BTC features drift" + alert if KS p < 0.01 over 14 days |
| 7 | Specific tests : NaN propagation, window alignment, no-future-leak | Phase 1 — extends §4.6 unit test list ; tests below in §11.4 |
| 10 | Per-asset metrics in FTF results | Phase 5 — extends FTF results dossier table (already mandated by F1 plan §6 per-asset gate, just makes the per-asset trade count + variance explicit) |
11.4 Tests added per CR pass 1 reco #7
Beyond the 5 unit tests already listed in §4.6 :
- NaN propagation test : feed BTC OHLCV with random NaN gaps → assert that the target altcoin's row at time `t` is dropped only if BTC's row at `t - purge_bars` was NaN ; downstream rows unaffected.
- Window alignment test : assert that `btc_correlation_15m_lag5` at row `i` uses `target.pct_change(1).rolling(96)` from rows `[i-95, i]` paired with `BTC.pct_change(1).shift(5).rolling(96)` from rows `[i-100, i-5]` — formal proof that BTC's 5-bar lag is applied BEFORE the rolling window, not after.
- No-future-leak test : compute `_ajouter_features_btc` on synthetic data where BTC's last 50 rows have a step-function spike. Assert that the spike does NOT appear in any target row at index `< n - 50 + purge_bars` (the spike starts at BTC row `n - 50`, so the purge shift may surface it only `purge_bars` rows later). Catches subtle off-by-one errors in the shift logic.
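A sketch of the no-future-leak idea on a single feature, assuming pandas and a flat synthetic BTC series. `purged_return` is a hypothetical helper standing in for the real feature function. With a spike starting at BTC row `n - 50`, the earliest output row allowed to change is `n - 50 + purge_bars`; every earlier row must be identical to the no-spike run.

```python
import numpy as np
import pandas as pd


def purged_return(btc_close: pd.Series, bars: int, purge_bars: int) -> pd.Series:
    """pct_change over `bars`, delayed by purge_bars (stand-in for one BTC feature)."""
    return btc_close.pct_change(bars).shift(purge_bars)


n, spike_len, purge_bars = 300, 50, 20

flat = pd.Series(np.full(n, 100.0))   # BTC without any move
spiked = flat.copy()
spiked.iloc[n - spike_len:] = 150.0   # step-function spike over the last 50 rows

baseline = purged_return(flat, 4, purge_bars)
with_spike = purged_return(spiked, 4, purge_bars)

# Rows where the spike changed the feature (NaN warm-up rows compare as equal).
changed = (baseline.fillna(0.0) != with_spike.fillna(0.0)).to_numpy()
first_changed = int(np.flatnonzero(changed)[0])

# The spike must not surface before purge_bars rows after its first BTC row.
assert first_changed == n - spike_len + purge_bars
assert not changed[: n - spike_len + purge_bars].any()
```

An off-by-one in the shift (for example `shift(purge_bars - 1)`) moves `first_changed` to 269 and fails the first assertion.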
11.5 Recommendations deferred (out of scope for S04)
- Reco #8 — Cache re-enablement : flagged as Track 12 work. v1 disables cache when `btc_features_enabled=True` (fail-safe ; Track 12 will revisit cache key extension to include BTC OHLCV hash). Documented in §6 out-of-scope.
11.6 §6 amendment — deployment_review for paper/live
Online BTC feature computation in paper/live — the streaming kernel needs a rolling BTC window. v1 ships as backtest-only ; paper/live integration requires a dedicated `deployment_review` committee session covering staged rollout (canary → shadow → live), real-time drift detection, live feedback loops, kill-switch validation. This is a separate Story under CVN-N001-EE if Track 1 LOCKs.
11.7 Net effect on §4 implementation path
- 2 blockers fixed (model-artefact feature contract + model-switching rollback).
- 7 recos applied directly (rollback rewrite, feature contract pinning, BTC provenance + quality monitoring, statistical leakage test, purge_bars sensitivity variant, deployment_review acknowledgement, cost model realism confirmation).
- 3 recos applied at impl time (drift monitoring, NaN/window/no-future-leak tests, per-asset metrics).
- 1 reco deferred (cache re-enablement — Track 12).
Verdict re-submitted to committee in v2 round (session ID TBD) with this triage section explicit. Re-submission expected to upgrade to PASSED EXECUTION_RISK.
11bis. Committee plan_review v2 triage (session 6519ed97, 2026-04-30)
v2 verdict : PASSED / EXECUTION_RISK — strong consensus across 5 experts, 0 blockers, 7 new recos.
Reason cited : "The plan successfully addresses the v1 blockers regarding ADR-23 compliance and rollback mechanisms, but new execution risks related to cache integrity, MLflow artefact validation, and live model swap atomicity are identified."
The two architectural rewrites (§4.1bis feature contract + §7 model-switching rollback) are accepted. Implementation may proceed.
11bis.1 Recommendations integrated pre-impl (5 of 7)
| # | Recommendation | Source | Integration |
|---|---|---|---|
| v2.1 | Checksum validation for MLflow artefacts | all 5 experts | §4.1bis amended — enrichment_config.json and feature_names_with_btc ship with their SHA256 in the MLflow registry tags. At load, InferenceAPI recomputes the hashes and raises RuntimeError per ADR-25 if any drift (catches partial uploads + tampering). Pre-promotion hook in the MLOps workflow validates the hashes before the artefact is registered. |
| v2.2 | Cache key includes BTC features state | all 5 experts | §4.bis added — when btc_features_enabled=True, the L2 cache key is extended with + btc_first_ts + btc_last_ts + btc_features_set so a BTC-enabled enrichment cannot collide with a BTC-blind one for the same target window. 5-line patch in commun/cache/. Track 12 will revisit a hashed-window key for a tighter contract. |
| v2.5 | Drift response runbook | expert-ops + ml-eng + crypto-trader | §4.7 amended — new documentation/runbooks/runbook_btc_features_drift.md (P2) covers KS-test alert response : revert to BTC-blind champion if KS p < 0.01 over 14 days OR per-feature distribution drift > 3σ from training distribution, with quantitative thresholds for revisiting Track 1. |
| v2.6 | Stress-case tests | expert-crypto-trader | §4.6 amended — new tests for synthetic BTC flash crash (-30% spike in 4 bars) + halving-like step (10% baseline shift) ; assert that the per-fold f1_buy doesn't degrade > 0.05 vs a no-stress run. |
| v2.7 | ≥ 50 BUY trades/fold pre-FTF validation | expert-crypto-trader | §5 amended — operator runs 1 fold of btc_full on BTCUSDC (acts as a pre-flight check) and verifies sample size BEFORE triggering the full FTF sweep. If fail, the FTF sweep is aborted and the run gets a sample-size diagnostic dossier, not a verdict dossier. |
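The checksum validation of reco v2.1 can be sketched with the standard library. The helper names are hypothetical and the artefact bytes are built in memory; reading the expected hash from MLflow registry tags is outside this sketch.

```python
import hashlib
import json


def sha256_hex(payload: bytes) -> str:
    """Hex SHA256 digest of an artefact's raw bytes."""
    return hashlib.sha256(payload).hexdigest()


def verify_artefact(name: str, payload: bytes, expected_sha256: str) -> None:
    """Fail fast when the loaded artefact does not match its registered checksum."""
    actual = sha256_hex(payload)
    if actual != expected_sha256:
        raise RuntimeError(
            f"{name}: checksum mismatch (expected {expected_sha256}, got {actual})"
        )


# At registration time: hash the exact bytes that are uploaded.
config_bytes = json.dumps(
    {"btc_features_enabled": True, "btc_features_set": "full"}, sort_keys=True
).encode("utf-8")
registered_hash = sha256_hex(config_bytes)

# At load time: recompute and compare before using the config.
verify_artefact("enrichment_config.json", config_bytes, registered_hash)
```

Hashing the serialised bytes rather than the parsed dict means a partial upload or a re-serialisation with different key order is also caught.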
11bis.2 Recommendations applied at impl + deployment time (1 of 7)
| # | Recommendation | When |
|---|---|---|
| v2.3 | Atomic MLflow promotion + race conditions | Deferred to deployment_review session per §6 (pre-paper/live). The session covers blue/green, circuit breakers, pre-swap health checks. v1 ships backtest-only ; no live model swap on this Story. |
11bis.3 Recommendations applied at LOCK time (1 of 7)
| # | Recommendation | When |
|---|---|---|
| v2.4 | Pre-LOCK rollback dry run in staging | At LOCK decision time. Operator runs the registered champion_btc_blind for 24h shadow on the same day's data ; assert that its feature_names match the inference-time enrichment output (no schema drift) AND its f1_buy on the shadow window is ≥ baseline - 0.01 (basic sanity). Documented in mlops_readiness.md §5 rollback plan as a mandatory pre-promotion gate. |
11bis.4 Net path forward
- 0 v2 blockers → impl proceeds.
- 5 v2 recos applied pre-impl (checksum, cache key, drift runbook, stress tests, sample-size pre-flight).
- 1 v2 reco deferred to deployment_review (race conditions during live swap).
- 1 v2 reco applied at LOCK time (rollback dry run).
Together with v1 triage : 7 v1 + 5 v2 = 12 recos applied pre-impl, 3 v1 + 1 v2 = 4 recos at impl time, 1 v2 at LOCK time, 1 v1 deferred (Track 12 cache).
11bis.5 EXECUTION_RISK acknowledgment
The EXECUTION_RISK code remains in v2 because Track 1 introduces the first cross-asset feature in CVN history — committee correctly flags that the architectural patterns (feature contract pinning, model-switching rollback, checksum validation) are new to the codebase and carry execution risk in their first integration. The risk is acknowledged + budgeted into the impl phase (extra rigor on tests + observability, NOT cutting corners on the patterns themselves).
Question for the committee (v2)
v1 verdict : REJECTED / EXECUTION_RISK due to ADR-23 violation in the rollback mechanism + brittle feature contract via runtime env vars. Both blockers addressed in §4.1bis (feature contract pinned in MLflow artefact, env var becomes training-time only) + §7 (rollback via model-artefact switching, not env-flag flipping). All 11 recos triaged in §11 — 7 applied pre-impl, 3 at impl time, 1 deferred to Track 12.
Re-validate : is the new feature contract pinning pattern (MLflow `enrichment_config.json` + `feature_names_with_btc`, derived from artefact at inference, ADR-25 fail-fast on env↔artefact mismatch) sufficient to satisfy ADR-23 ? Is the model-switching rollback path (atomic promotion of the BTC-blind champion via existing MLOps workflow) sufficient to safely revert ? Are there remaining hidden modes (e.g. cache key collision between BTC-blind and BTC-enabled enrichment outputs, race condition during live model swap) the v1 dossier missed ?