MLOps readiness — CVN-N001-EI-S03 — Split / regime reconstruction (Block 3b)¶
- Story: CVN-N001-EI-S03 · wp#226 · GH #1058 · Epic CVN-N001-EI (#1055)
- Template:
documentation/templates/TEMPLATE_mlops_readiness.md(ADR-70) - Plan dossier:
documentation/reviews/2026-05-28-cvn-n001-ei-s03-split-regime-reconstruction-plan.md(committeeplan_review0854dc31PASSED) - Decision:
documentation/reviews/2026-05-28-cvn-n001-ei-s03-regime-signal-source-decision.md(Option B unconditional)
Nature of this change (read first): S03 ships (a) the S41 split-ablation diagnostic — a two-layer (Airflow + Hamilton) instrument that re-splits and re-fits a model under six split families to measure whether
best_iter+ AUC move materially (the C-d test), and resolves the S02 temporal-autocorrelation reserve; and (b) theCVN_SPLIT_MODEgated production split — an off-by-default, A/B-testable capability (ADR-56) that inserts a purged walk-forward embargo gap, NOT promoted to the production default (ADR-2 — awaits the S03 + S04 verdicts). The diagnostic audits models; it ships none. The gated split changes nothing in production whileCVN_SPLIT_MODE=off(the default). The behavioural sections (drift / staged rollout / expectancy / canary) are therefore N/A — justified for this PR; promotion of the gated split is a separate, future decision with its own readiness.
1. Production monitoring (MUST)¶
| Metric | Type | Source | Threshold (warn / crit) | Owner |
|---|---|---|---|---|
s41_verdict.status |
counter (per status) | Loki {namespace="cvntrade"} \|= "event=s41_verdict" |
warn on severity=error (INCONCLUSIVE_TOOLING); review on BASELINE_LEAK_INFLATED (HALT-level — leak signal) |
dococeven |
s41_cell_outcome / s41_group_outcome |
counter | Loki \|= "event=s41_cell_outcome" / "event=s41_group_outcome" |
n/a (diagnostic verdict) | dococeven |
s41_embargo_violation |
counter | Loki \|= "event=s41_embargo_violation" |
crit on any — a guarded split produced train/test overlap or short embargo (a code bug) | dococeven |
s41_power_estimate.powered |
gauge (bool) | Loki \|= "event=s41_power_estimate" |
n/a (false → family-6 underpowered → INCONCLUSIVE_UNDERPOWERED) |
dococeven |
split_mode_applied.mode |
counter | Loki \|= "event=split_mode_applied" |
alert if mode != off unexpectedly (the gated split must stay off until promoted) |
dococeven |
- Prediction-rate metric: N/A — adds no prediction. The diagnostic re-fits to measure AUC, not to serve.
- Outcome metric: N/A — no trading-outcome change (the gated split is off by default).
2. Alerting & runbooks (MUST)¶
s41_embargo_violation(crit) → the leak guard fired in a guarded family: aregime_splitbug. Runbook: inspect the cell'sfamily/kind, fixregime_split, re-run. (Should be impossible — unit-tested — hence crit if seen in prod.)split_mode_applied mode != offwhile no promotion decision exists → someone setCVN_SPLIT_MODEinftf_config; confirm it was an intentional A/B run, else reset tooff(Console, ADR-59).BASELINE_LEAK_INFLATEDgroup verdict → the program is under a leak HALT signal (the S02 reserve direction); escalate per the plan §1 routing before any modeling conclusion.
3. Drift detection (MUST)¶
N/A — no feature / label / architecture / calibration change. S03 adds a diagnostic + a gated-off split capability. While CVN_SPLIT_MODE=off (default) the production train/test boundary is byte-identical to today. Production drift detection is owned by the N010 Drift Store epics, unaffected here.
4. Staged rollout (MUST)¶
N/A — diagnostic + gated-off capability. The S41 diagnostic is operator-triggered (no traffic, no predictions). The CVN_SPLIT_MODE split is off by default and not promoted (ADR-2); there is nothing to shadow / canary / roll out in this PR. If the S03 + S04 verdicts later justify promoting the purged-WF split to the production default, that promotion is a separate Story with its own staged rollout + canary + this template re-filled — explicitly out of scope here.
5. Rollback plan (MUST)¶
| Mechanism | Description | Revert SLA |
|---|---|---|
| Config flag | CVN_SPLIT_MODE=off (the default) — disables the purged-WF split, no deploy needed (Console, ADR-59) |
< 1 min |
| Git revert | Revert the S03 commits on main — removes the diagnostic + the gated split wiring. No model / feature / production-config artefact involved (the gated split was never promoted). |
< 5 min (next DAG-sync + build) |
6. Owner & DRI (MUST)¶
- Owner / DRI: dococeven
- Diagnostic verdict authority: committee (the S41 group verdict + the §12 routing feed a committee
experiment_review, not an automatic production action — and a control-group result triggers a generalisation Story, not a program-wide ship, per the decision dossier §reconciliation).
Sign-off checklist (gate before PR merge)¶
- §1 monitoring :
s41_verdict/s41_embargo_violation/s41_power_estimate/split_mode_applieddefined + Loki-discoverable ; prediction/outcome N/A-justified (no behavioural change while gated off) - §2 alerting : embargo-violation (crit) +
split_mode != off+BASELINE_LEAK_INFLATEDrunbooks stated - §3 drift : N/A-justified (no feature/label/architecture/calibration change)
- §4 rollout : N/A-justified (diagnostic + gated-off split ; promotion = separate Story)
- §5 rollback :
CVN_SPLIT_MODE=offflag (< 1 min) + git revert (< 5 min) - §6 owner : dococeven ; verdict authority = committee
- Story OP comment links the committee
plan_reviewsession0854dc31/ OP Meeting #234