MLOps readiness · CVN-N001-EE-S04 · BTC cross-asset features (Track 1, F1_buy boost)
Story : CVN-N001-EE-S04 (wp#43) · GH issue #715
Owner : @dococeven (DRI for production behaviour of this change)
Filled on : 2026-04-30
Reviewed by committee :
- v1 session 62d756a9 — REJECTED / EXECUTION_RISK (2 architectural blockers)
- v2 session 6519ed97 — PASSED / EXECUTION_RISK (strong consensus, 0 blockers, 7 forward-looking recos)
1. Production monitoring (MUST)
| Metric | Type | Source | Dashboard | Threshold (warn / crit) | Owner |
|---|---|---|---|---|---|
| `event=btc_features_applied feature_set=... n_features=... purge_bars=...` | counter (training-time) | `commun.pipeline.btc_features.compute_btc_features` (Loki) | Grafana `cvntrade-track1-btc-features` panel "Feature set distribution" | sample size mismatch vs FTF run config → warn | @dococeven |
| Per-feature distribution (mean, std, KS-statistic vs training distribution) | gauge | offline analysis on inference logs (committee CR pass 2 reco v2.5) | Grafana panel "BTC features drift" | KS p < 0.01 over 14 days → warn ; > 3σ per-feature → crit | @dococeven |
| `event=btc_ohlcv_quality_alert reason=outlier_returns\|outlier_volume\|wick_to_body` (committee CR pass 1 reco #5) | counter | ETL orchestration layer | Grafana panel "BTC OHLCV quality" | any reason fires > 5 times in 1 h → warn | @dococeven |
| `event=enrichment_config_mismatch model_run_id=... env_value=... artefact_value=...` | counter (failure) | `InferenceAPI._enforce_btc_artefact_consistency` (ADR-25 fail-fast) | Grafana panel "BTC artefact contract" | any occurrence → P1 (model deployed under wrong env config) | @dococeven |
| `f1_buy` per fold per crypto with BTC features | gauge | FTF results dossier table (committee reco #10) | offline ; not Grafana | per-fold variance > 0.05 → ABANDON variant (gate criterion 3) | operator |
Required minima covered :
- ✅ prediction-rate metric — signals.buy_proba distribution (existing) + new event=btc_features_applied for training-time tagging
- ✅ outcome metric — f1_buy per fold per crypto with BTC variant attribution
- ✅ health metric — event=btc_ohlcv_quality_alert + event=enrichment_config_mismatch (ADR-25 fail-fast contract)
All metrics tagged with the FTF variant id (none / btc_min / btc_full / btc_full_purge0 / btc_full_purge10 / btc_vol_only) per ADR-30.
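The warn rule for `event=btc_ohlcv_quality_alert` (any reason firing more than 5 times in 1 h) can be sketched as a sliding-window counter over parsed log lines. This is only an illustration of the threshold logic, not the production Grafana/Loki rule: the `key=value` layout is taken from the table above, while `parse_event`, `QualityAlertWindow`, and the way timestamps are supplied are hypothetical names and choices introduced here.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def parse_event(line: str) -> dict[str, str]:
    """Split a 'key=value key=value' log line into a dict (assumed log layout)."""
    return dict(token.split("=", 1) for token in line.split() if "=" in token)


class QualityAlertWindow:
    """Counts quality alerts per reason inside a sliding time window (hypothetical helper)."""

    def __init__(self, max_events: int = 5, window: timedelta = timedelta(hours=1)):
        self.max_events = max_events
        self.window = window
        self._seen: dict[str, deque] = defaultdict(deque)

    def observe(self, ts: datetime, line: str) -> bool:
        """Record one log line; return True when its reason exceeds the threshold."""
        fields = parse_event(line)
        if fields.get("event") != "btc_ohlcv_quality_alert":
            return False
        times = self._seen[fields.get("reason", "unknown")]
        times.append(ts)
        # Drop timestamps that have fallen out of the window.
        while times and ts - times[0] > self.window:
            times.popleft()
        return len(times) > self.max_events
```

Feeding the same `reason=outlier_returns` line once per minute makes `observe` return `True` from the sixth line onwards; each reason is counted independently, which matches the "any reason" wording of the threshold.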
2. Alerting & runbooks (MUST)
- Runbook P2 : `runbook_btc_features_drift.md`, which handles concept drift on the 6 BTC features (committee CR pass 2 reco v2.5) + BTC OHLCV quality alerts + enrichment artefact contract mismatch.
- Alerts :
  - `event=enrichment_config_mismatch` fires → P1 alert routed to `@dococeven` (model deployed under wrong env, ADR-23 violation)
  - KS test p < 0.01 on any BTC feature over 14 days → P2 alert routed to `@dococeven` (drift, follow runbook §1)
  - `event=btc_ohlcv_quality_alert reason=outlier_returns` > 5 times / hour → P2 alert (BTC feed degradation)
  - `btc_full_purge0` outperforms `btc_full` (paired t-test, BH-corrected p < 0.05) in FTF results → leakage suspected → ABANDON Track 1 (mandatory hard gate per dossier §5)
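The Benjamini-Hochberg (BH) step of the leakage alert can be sketched as follows. This assumes the paired t-tests have already produced one raw p-value per crypto, which is an assumption about the test family (the dossier defines the actual one). The function names, the `(mean difference, p-value)` input shape, and the rule "any crypto with a positive difference and adjusted p < 0.05" are all hypothetical; the t-test itself is not shown.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def leakage_suspected(results: dict[str, tuple[float, float]], alpha: float = 0.05) -> bool:
    """results maps crypto -> (mean f1_buy of purge0 minus full, raw p-value).

    Flags leakage only when purge0 is better (positive difference) AND the
    BH-adjusted p-value is below alpha. The aggregation rule is an assumption.
    """
    names = list(results)
    adjusted = benjamini_hochberg([results[name][1] for name in names])
    return any(results[name][0] > 0 and q < alpha for name, q in zip(names, adjusted))
```

Checking the sign of the difference as well as the adjusted p-value keeps a significant result in the other direction (`btc_full` beating `btc_full_purge0`) from triggering the ABANDON gate.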
3. Drift detection (MUST)
| Drift type | Detection method | Threshold | Action |
|---|---|---|---|
| Per-feature distribution drift (committee reco v2.5) | KS test on each BTC feature vs training distribution, weekly window | KS p < 0.01 over 14 days | runbook §1 — investigate or rollback to champion_btc_blind |
| BTC-altcoin correlation drift (committee reco #3) | rolling 30d Pearson correlation of target vs BTC returns | drift > 3σ from training-time correlation | runbook §2 — quarterly review trigger |
| BTC OHLCV quality | outlier detection on returns / volume / wick-to-body | per-bar threshold (5σ returns ; 80% volume drop ; wick-to-body > 5) | runbook §3 + alert |
| Cross-regime f1_buy variance | per-regime f1 in FTF results dossier | per-fold variance > 0.05 | gate 3 of F1 plan §6 — block lock |
| `enrichment_config.json` SHA256 mismatch (committee reco v2.1) | SHA256 of artefact recomputed on load vs MLflow registry tag | strict equality | RuntimeError per ADR-25 (catches partial uploads + tampering) |
| Class distribution drift (existing) | PSI on y_true per fold | PSI > 0.2 | existing playbook |
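The per-feature KS check in the first row can be illustrated with a dependency-free sketch. It computes the two-sample KS statistic from the empirical CDFs and an asymptotic p-value from the Kolmogorov series, which is an approximation suited to large samples; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead. The function names and the feature-to-samples dict layout are hypothetical, samples are assumed non-empty, and the "over 14 days" persistence condition is not modelled.

```python
import math
from bisect import bisect_right


def ks_statistic(reference: list[float], live: list[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(live)
    d = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def ks_pvalue(d: float, n: int, m: int, terms: int = 100) -> float:
    """Asymptotic Kolmogorov p-value for statistic d and sample sizes n, m."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the true value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def drifted_features(reference: dict[str, list[float]],
                     live: dict[str, list[float]],
                     alpha: float = 0.01) -> list[str]:
    """Return the features whose live sample differs from training at level alpha."""
    flagged = []
    for name, ref_values in reference.items():
        d = ks_statistic(ref_values, live[name])
        if ks_pvalue(d, len(ref_values), len(live[name])) < alpha:
            flagged.append(name)
    return flagged
```

With large inference logs even tiny shifts yield p < 0.01, so a check like this is usually read together with the KS statistic itself rather than the p-value alone.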
4. Staged rollout (MUST)
| Stage | Surface | Duration | Gate |
|---|---|---|---|
| 1 | FTF sweep on defi_top5 (5 cryptos × 5 folds × 6 variants = 150 rows) | run-completion | every gate of F1 plan §6 + tightened f1_buy ≥ +0.020 + mandatory leakage check via paired t-test on purge0 vs full |
| 2 | Pre-FTF sample-size pre-flight (committee reco v2.7) | 1 fold of btc_full on BTCUSDC | ≥ 50 BUY trades / fold ; if fail, FTF sweep aborted |
| 3 | Pre-LOCK rollback dry run (committee reco v2.4) | 24h shadow on the champion_btc_blind fallback model | feature_names schema match + f1_buy ≥ baseline - 0.01 |
| 4 | Operator promotion → live for 1 crypto (BTCUSDC), 7 days | 7 d | f1_buy ≥ baseline + 0.020 ; max_drawdown ≤ baseline + 1 % |
| 5 | Rollout to all 5 defi-top5 cryptos | continuous | quarterly drift review per §3 |
Per ADR-59, the lock decision is a Console-driven model promotion (atomic per-crypto promotion per ADR-15 + ADR-42), NOT a runtime env-flag toggle. The artefact-pinned config (per ADR-23) means rollback = deploying the champion_btc_blind model, not flipping a switch.
Paper/live integration is NOT in this Story — covered by a separate deployment_review session (committee CR pass 1 reco #4).
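The stage 1 row count and the stage 4 gate from the table above can be checked with a few lines. This is a sketch under stated assumptions: the variant ids come from §1, but the crypto symbols are placeholders (the defi_top5 members are not listed here), and "baseline + 1 %" is read as one percentage point expressed as a fraction (0.01). `sweep_rows` and `stage4_gate` are hypothetical names.

```python
from itertools import product

VARIANTS = ["none", "btc_min", "btc_full", "btc_full_purge0",
            "btc_full_purge10", "btc_vol_only"]
# Placeholder symbols: the real defi_top5 members are not listed in this document.
CRYPTOS = [f"CRYPTO_{i}" for i in range(1, 6)]
FOLDS = range(5)


def sweep_rows() -> list[tuple[str, int, str]]:
    """Enumerate one (crypto, fold, variant) row per FTF training run."""
    return list(product(CRYPTOS, FOLDS, VARIANTS))


def stage4_gate(f1_buy: float, max_drawdown: float,
                baseline_f1_buy: float, baseline_max_drawdown: float) -> bool:
    """Stage 4 gate: f1_buy >= baseline + 0.020 and max_drawdown <= baseline + 0.01.

    Drawdowns are assumed to be positive fractions (0.10 means 10 %).
    """
    return (f1_buy >= baseline_f1_buy + 0.020
            and max_drawdown <= baseline_max_drawdown + 0.01)
```

`len(sweep_rows())` gives the 150 rows quoted for stage 1 (5 × 5 × 6).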
5. Rollback plan (MUST)
| Symptom | Action | Reversal latency |
|---|---|---|
| `event=enrichment_config_mismatch` (model loaded under wrong env, ADR-25 fail-fast) | Console promotion of the registered champion_btc_blind model (atomic per-crypto promotion per ADR-15 + ADR-42) | < 5 minutes |
| Production f1_buy regression > 0.02 over 7 d | same Console promotion of champion_btc_blind | < 5 minutes |
| BTC OHLCV feed degraded (> 5% NaN over a fold) | revert to champion_btc_blind ; investigate Binance API status | < 5 minutes |
| Bug in compute_btc_features itself (e.g. window misalignment) | hot-fix PR ; DOES NOT require redeploying ; revert via Console promotion until fix lands | < 5 minutes for revert ; ~1 hour for fix-and-retrain |
| BTC-altcoin concept drift (correlation > 3σ from training) | quarterly re-fit cadence ; revert to champion_btc_blind if revert is needed before retrain completes | < 5 min revert ; 1 sprint to retrain |
| Pre-LOCK dry run fails (committee reco v2.4) | LOCK decision blocked ; investigation required before any promotion | 0 (LOCK not approved) |
The rollback path is symmetric : every Track-1 variant ships with a registered champion_btc_blind fallback as part of the LOCK gate. The env var CVN_BTC_FEATURES_ENABLED is training-time only ; flipping it on a deployed model would either dimension-mismatch (caught by ADR-23) or silently impute zero (ruled out by §4.1bis pinning).
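The fail-fast contract described above (artefact-pinned config, SHA256 check on load, `RuntimeError` on mismatch) can be sketched as two small checks. This is not the actual `InferenceAPI._enforce_btc_artefact_consistency` implementation: the function names, the `btc_features_enabled` JSON key, the truthy-string parsing of `CVN_BTC_FEATURES_ENABLED`, and the error messages are assumptions made for illustration, and the expected hash is passed in directly rather than read from the MLflow registry.

```python
import hashlib
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def verify_enrichment_config(path: Path, expected_sha256: str) -> dict:
    """Recompute the artefact hash on load and fail fast on any mismatch."""
    raw = path.read_bytes()
    actual = hashlib.sha256(raw).hexdigest()
    if actual != expected_sha256:
        raise RuntimeError(
            f"{path.name} SHA256 mismatch: expected {expected_sha256}, got {actual}"
        )
    return json.loads(raw)


def enforce_env_matches_artefact(config: dict,
                                 env: Optional[Mapping[str, str]] = None) -> None:
    """Raise if the process env disagrees with the value pinned in the artefact."""
    env = os.environ if env is None else env
    raw = env.get("CVN_BTC_FEATURES_ENABLED")
    if raw is None:
        return  # env var unset: the pinned artefact value is authoritative
    env_value = raw.strip().lower() in {"1", "true", "yes"}
    artefact_value = bool(config.get("btc_features_enabled", False))
    if env_value != artefact_value:
        raise RuntimeError(
            "event=enrichment_config_mismatch "
            f"env_value={env_value} artefact_value={artefact_value}"
        )
```

Raising instead of falling back to the env value is what makes the env var effectively training-time only: a deployed model either runs with its pinned config or does not load.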
6. Owner & DRI (MUST)
- DRI : `@dococeven`
- Backup : `@cvntrade-ml`
- Escalation : `@cvntrade-architect` (architectural drift on the feature contract or rollback workflow) ; `@cvntrade-ops` (production incident impacting SL/TP behaviour or kill-switch)
7. Known follow-up: cache key extension (committee CR pass 2 reco v2.2)
The L2 cache key for enrichment outputs (in commun/cache/) currently does not include the BTC OHLCV identity. When btc_features_enabled=True mixes BTC features into a target's enrichment, two parallel runs (one BTC-blind, one BTC-enabled) for the same target window could write colliding cache entries.
v1 known debt : the FTF sweep runs the BTC-enabled and BTC-blind variants in separate runs (separate run_id) so cache collision between them within a single run is moot. Cross-run cache pollution is a Track 12 concern.
Track 12 follow-up : extend the cache key with btc_first_ts + btc_last_ts + btc_features_set so BTC-blind and BTC-enabled enrichments coexist safely in cache. To be filed as a separate issue before Track 1 LOCK ; deferred per committee CR pass 2 (reco "v1 OR Track 12" — operator chose Track 12 to keep the v1 PR scope tight).
Pre-LOCK gate : if Track 1 clears all 6 official gates, the cache extension MUST land before live promotion (dry-run check : same target window enriched twice with none and btc_full variants in the same cache namespace must not collide).
This is a known-debt acknowledgement, not a blocker — committee verdict v2 accepts the deferral with the pre-LOCK gate above.
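A minimal sketch of the proposed key extension, assuming the key is derived by hashing a canonical JSON payload. The current key layout in commun/cache/ is not shown in this document, so the base fields (`target`, `window_start`, `window_end`) and the function name are hypothetical; only `btc_first_ts`, `btc_last_ts`, and `btc_features_set` come from the follow-up described above.

```python
import hashlib
import json
from typing import Optional


def enrichment_cache_key(target: str, window_start: int, window_end: int,
                         btc_first_ts: Optional[int] = None,
                         btc_last_ts: Optional[int] = None,
                         btc_features_set: str = "none") -> str:
    """Build a deterministic key; BTC identity fields separate blind and enabled runs."""
    payload = {
        "target": target,
        "window": [window_start, window_end],
        "btc_first_ts": btc_first_ts,
        "btc_last_ts": btc_last_ts,
        "btc_features_set": btc_features_set,
    }
    # sort_keys + compact separators give a stable byte representation to hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The pre-LOCK dry-run check then reduces to asserting that the `none` key and the `btc_full` key for the same target window differ, while repeated calls with identical inputs return the same key.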
Sign-off checklist (gate before PR merge)
- §1-§6 all complete
- Plan dossier `2026-04-30-track1-btc-features-plan.md` v2 PASSED committee `plan_review` (session `6519ed97`)
- Runbook `runbook_btc_features_drift.md` lands in this PR
- All 6 official gates of F1 plan §6 met OR explicit `keep available` / `abandon` verdict ; checked at FTF sweep completion, post-merge
- Mandatory leakage check : `btc_full_purge0` does NOT outperform `btc_full` (paired t-test BH p ≥ 0.05) ; checked at FTF sweep completion
- `champion_btc_blind` rollback model registered + 24h shadow dry run passed ; checked at LOCK time, post-FTF
- Expert Committee `pr_review` PASSED ; runs against this PR before merge (mandatory per ADR-68 for substantial ML changes)