
MLOps readiness — CVN-N001-EE-S03 — Per-regime threshold (Track 9, F1_buy boost)

Story: CVN-N001-EE-S03 (wp#42) · GH issue #714
Owner: @dococeven (DRI for production behaviour of this change)
Filled on: 2026-04-30
Reviewed by committee: session c560b67a (plan_review PASSED EXECUTION_RISK ; verdict OK with 11 recos triaged in plan dossier §11)


1. Production monitoring (MUST)

| Metric | Type | Source | Dashboard | Threshold (warn / crit) | Owner |
|---|---|---|---|---|---|
| event=threshold_applied source=per_regime\|global_no_regime\|global_unknown_regime\|global_floor\|global_low_regime_confidence | counter | InferenceAPI._resolve_thresholds (Loki) | Grafana cvntrade-track9-per-regime-threshold panel "Threshold source distribution" | per_regime < 50 % over 1 h → warn ; < 20 % → crit (means most inferences fall back to global, defeating the Track) | @dococeven |
| event=per_regime_threshold_fallback regime=... reason=insufficient_samples\|insufficient_positives\|negative_expectancy | counter (training-time) | PerRegimeThresholdCalibrator.fit (Loki) | Grafana panel "Fallback reasons" | any single reason on > 3 of 5 folds → warn (suggests systemic issue, not noise ; committee reco #4) | @dococeven |
| event=regime_classified regime=... confidence=... (committee reco #1) | gauge / counter | existing regime_detector.classify_regime Loki line + new emission in inference path | Grafana panel "Regime confidence distribution" | confidence < 0.6 over > 10 % of inferences over 1 h → warn | @dococeven |
| f1_buy per fold per regime (committee reco #7) | gauge | FTF results dossier table (extended) | offline analysis ; not a Grafana panel | per-fold variance > 0.05 → ABANDON variant (gate criterion 3) | operator |
| inference_latency_p99 (existing) | histogram | OpenTelemetry span on predict_single | existing Grafana cvntrade-inference-latency | p99 +50 % vs pre-Track-9 baseline → warn (regime classification adds work) | @dococeven |
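
The warn / crit rule on the threshold_applied source distribution can be sketched in a few lines. This is a minimal illustration, not the Grafana alert definition: the function name `per_regime_alert_level` and the idea of passing one window of `source` values as a list are assumptions made for the example; only the 50 % and 20 % cut-offs come from the table.

```python
from collections import Counter
from typing import Iterable, Optional

WARN_SHARE = 0.50  # per_regime share below this over the window -> warn
CRIT_SHARE = 0.20  # per_regime share below this over the window -> crit


def per_regime_alert_level(sources: Iterable[str]) -> Optional[str]:
    """Return 'crit', 'warn' or None for one window of threshold_applied events.

    `sources` holds the `source=` value of each event in the window
    (hypothetical input shape; the real data lives in Loki).
    """
    counts = Counter(sources)
    total = sum(counts.values())
    if total == 0:
        return None  # no inferences in the window: nothing to evaluate
    share = counts["per_regime"] / total
    if share < CRIT_SHARE:
        return "crit"
    if share < WARN_SHARE:
        return "warn"
    return None
```

Checking the crit bound first matters: a share under 20 % is also under 50 %, so the stricter level has to win.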

Required minima covered :

  • ✅ prediction-rate metric — signals.buy_proba distribution (existing) + new event=threshold_applied for source attribution
  • ✅ outcome metric — f1_buy per fold per regime + expectancy_net_realized (existing)
  • ✅ health metric — event=per_regime_threshold_fallback + existing inference_latency_p99

All metrics tagged with FTF factor / variant id (per_regime_f1 / per_regime_expectancy / per_regime_f1_with_floor / coarse_3regime / none) per ADR-30.

2. Alerting & runbooks (MUST)

  • Runbook P2 : runbook_per_regime_threshold_drift.md — handles drift > 2σ from training threshold over 7 d (committee reco #6 backstop) + low regime confidence + missing artefact at inference + negative expectancy on > 50 % of folds.
  • Alerts :
      • event=per_regime_threshold_fallback reason=insufficient_positives fires on > 50 % of folds for any regime → P2 alert routed to @dococeven
      • event=regime_classified confidence < 0.6 fires on > 10 % of inferences over 1 h → P2 alert
      • event=regime_rejected reason=negative_expectancy fires on > 50 % of folds for any regime → P2 alert (committee reco #4)
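
The "> 50 % of folds for any regime" condition used by the fold-based alerts can be sketched as below. The function `regimes_to_alert`, the `(fold, regime, reason)` tuple shape and the regime labels in the usage note are hypothetical; the real events are Loki log lines whose exact fields are not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def regimes_to_alert(
    events: Iterable[Tuple[int, str, str]],  # (fold index, regime, reason)
    n_folds: int,
    reason: str,
    max_fraction: float = 0.5,
) -> List[str]:
    """List regimes where `reason` occurred on more than `max_fraction` of folds."""
    if n_folds <= 0:
        raise ValueError("n_folds must be positive")
    folds_by_regime: Dict[str, Set[int]] = defaultdict(set)
    for fold, regime, event_reason in events:
        if event_reason == reason:
            # A set de-duplicates repeated events from the same fold.
            folds_by_regime[regime].add(fold)
    return sorted(
        regime
        for regime, folds in folds_by_regime.items()
        if len(folds) / n_folds > max_fraction
    )
```

With 5 folds, a regime that hits `insufficient_positives` on 3 folds (60 %) is returned, while one that hits it on 2 folds (40 %) is not.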

3. Drift detection (MUST)

| Drift type | Detection method | Threshold | Action |
|---|---|---|---|
| Per-regime threshold drift | rolling 7d comparison vs regime_detector_version artefact baseline (Track 9 dedicated) | per-regime threshold drifts > 2σ from training | runbook §3 : investigate or roll back |
| Regime frequency drift | proportion of inferences per regime, 7d window vs val-set baseline | proportional drift > 20 % on any regime | runbook §4 : pre-deploy stability validation (committee reco #3) |
| Class distribution drift (existing) | PSI on y_true per fold | PSI > 0.2 | existing playbook |
| regime_detector_version mismatch | pin enforcement at MLflow load time (PerRegimeThresholdCalibrator.from_metadata) | strict equality | RuntimeError per ADR-25 (committee reco #8) |
| Cross-regime f1_buy variance | per-regime f1 in FTF results dossier | per-fold variance > 0.05 | gate 3 of F1 plan §6 : block lock |
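
For reference, the PSI check on class labels can be sketched with the standard Population Stability Index formula, PSI = Σ (a_i − e_i) · ln(a_i / e_i) over classes. This is a generic, standard-library illustration and not the existing playbook's code; the function name `psi` and the `eps` guard against empty classes are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def psi(expected: Sequence[Hashable], actual: Sequence[Hashable],
        eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of categorical labels.

    `expected` is the baseline sample (e.g. training y_true), `actual` the
    sample under test. Proportions are floored at `eps` so that a class
    missing from one side does not produce log(0).
    """
    if len(expected) == 0 or len(actual) == 0:
        raise ValueError("both samples must be non-empty")
    e_counts, a_counts = Counter(expected), Counter(actual)
    total = 0.0
    for label in set(e_counts) | set(a_counts):
        e = max(e_counts[label] / len(expected), eps)
        a = max(a_counts[label] / len(actual), eps)
        total += (a - e) * math.log(a / e)
    return total
```

A baseline with 10 % positives compared against a fold with 40 % positives gives a PSI well above the 0.2 threshold in the table, while identical distributions give 0.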

4. Staged rollout (MUST)

| Stage | Surface | Duration | Gate |
|---|---|---|---|
| 1 | FTF sweep on defi_top5 (5 cryptos × 5 folds × 5 variants = 125 rows) | run-completion | every gate of F1 plan §6 must clear → operator decision lock / keep available / abandon |
| 2 | Shadow inference on BTCUSDC paper trading, 7 days | 7 d | ≤ 5 % delta on cumulative P&L vs baseline ; no RuntimeError from artefact loading |
| 3 | Operator Console flip CVN_THRESHOLD_PER_REGIME=1 for 1 crypto live (BTCUSDC) | 7 d | f1_buy ≥ baseline + 0.015 ; max_drawdown ≤ baseline + 1 % |
| 4 | Rollout to all 5 defi-top5 cryptos | continuous | quarterly drift review per §3 |
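
The stage 3 gate combines two conditions, which can be written out as a small check. This is a sketch under stated assumptions: `LiveMetrics` and `stage3_gate_passes` are hypothetical names, drawdown is assumed to be stored as a positive fraction, and "+ 1 %" is read as one percentage point of absolute slack. If the plan dossier means a relative 1 %, the comparison needs to change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveMetrics:
    f1_buy: float
    max_drawdown: float  # positive fraction, e.g. 0.08 means 8 % (assumption)


def stage3_gate_passes(
    candidate: LiveMetrics,
    baseline: LiveMetrics,
    min_f1_gain: float = 0.015,      # f1_buy >= baseline + 0.015
    max_drawdown_slack: float = 0.01,  # max_drawdown <= baseline + 1 point (assumed absolute)
) -> bool:
    """True only when both the f1_buy gain and the drawdown bound hold."""
    f1_ok = candidate.f1_buy >= baseline.f1_buy + min_f1_gain
    drawdown_ok = candidate.max_drawdown <= baseline.max_drawdown + max_drawdown_slack
    return f1_ok and drawdown_ok
```

Both conditions must hold: an f1_buy gain bought with a drawdown beyond the slack fails the gate.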

Per ADR-59, the lock decision is a Console flip (no code change). The artefact stays per-version in MLflow ; rolling back the model also rolls back the per-regime calibrator.

5. Rollback plan (MUST)

| Symptom | Action | Reversal latency |
|---|---|---|
| RuntimeError from PerRegimeThresholdCalibrator.from_metadata (schema or regime_version drift) | Console flip CVN_THRESHOLD_PER_REGIME=0 (ADR-59) : fully reverts to global threshold, no model retrain | < 1 minute |
| Production f1_buy regression > 0.02 over 7 d | same Console flip | < 1 minute |
| Per-regime calibrator drift alert (P2) over 14 d | flip the variant via Console UI (per_regime_f1_with_floor instead of per_regime_f1) to dampen the deviation | < 1 minute |
| Bug in calibrator code itself (parser drift, off-by-one in routing) | hot-fix PR + redeploy console-next (no model retrain needed since artefact stays valid) | < 1 hour |
| Regime detector hardening required (committee reco #6, contingent on Track 9 LOCK) | follow-up Story under CVN-N001-EE next sprint ; old behaviour available as fallback | ~1 sprint |

The rollback is symmetric : every variant lives under one env flag (CVN_THRESHOLD_PER_REGIME) plus optional method/floor/grouping overrides ; flipping the master flag to 0 reverts cleanly.
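
The master-flag behaviour can be illustrated as follows. Only the flag name CVN_THRESHOLD_PER_REGIME and its 0 / 1 values come from this document; the function `select_threshold`, its arguments, the regime labels and the choice to treat an unset flag as 0 are assumptions for the sketch, and the method/floor/grouping overrides are left out.

```python
import os
from typing import Mapping, Optional


def select_threshold(
    global_threshold: float,
    per_regime_thresholds: Mapping[str, float],
    regime: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> float:
    """Pick the decision threshold for one inference.

    With the master flag at anything other than "1" (including unset, by
    assumption) the global threshold is always used, which is the rollback path.
    """
    if env.get("CVN_THRESHOLD_PER_REGIME", "0") != "1":
        return global_threshold
    if regime is None:
        return global_threshold  # no regime available: fall back to global
    # Unknown regime label: fall back to global rather than raising.
    return per_regime_thresholds.get(regime, global_threshold)
```

Because the flag is read on each call in this sketch, flipping it to 0 takes effect on the next inference without touching the calibrator artefact.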

6. Owner & DRI (MUST)

  • DRI : @dococeven
  • Backup : @cvntrade-ml
  • Escalation : @cvntrade-architect (architectural drift) ; @cvntrade-ops (production incident impacting kill-switch / SL/TP behaviour)

Sign-off checklist (gate before PR merge)

  • §1-§6 all complete
  • Plan dossier 2026-04-29-track9-per-regime-threshold-plan.md PASSED committee plan_review (session c560b67a)
  • Runbook runbook_per_regime_threshold_drift.md lands in this PR
  • All 6 official gates of F1 plan §6 met OR explicit keep available / abandon verdict in the results dossier — checked at FTF sweep completion, post-merge (Story closes only after the operator decision per workflow §2.5)
  • Expert Committee pr_review PASSED — runs against this PR before merge (mandatory per ADR-68 for substantial ML changes)