ADR-0070 — MLOps readiness plan template is mandatory before any ML production merge
Status: active
Date: 2026-04-28
Introduced by: CVN-N001-EF-S01 (#709), F1_buy boost committee session 8db2529d
Supersedes: none
Context
The F1_buy boost plan committee review (session 8db2529d, Ops finding) flagged a recurring failure mode in the project: ML changes have been merging to production with monitoring, drift detection, and rollback paths defined informally, sometimes in chat and sometimes not at all. When a change degrades silently, the operator discovers it days later through aggregate dashboards rather than through a paged alert.
Concrete observations from the last quarter:
- The variance feature-selection method shipped to production for ~2 weeks emitting `1.000291` for every feature after `StandardScaler` (issue #706, root cause structural). No alert fired because no metric measured "feature selection variance dispersion": there was no template forcing the question.
- The predictions hook silently dropped 100% of OOS predictions for 1 fold (#700/#701) because a `reset_index()` exception was caught silently. The runbook for "`predictions_captured=False` > X% of folds" did not exist; we noticed via a per-row spot-check.
- The Phase 2 rerun was launched without a documented canary stage. When a flaw surfaces mid-run, rollback is "kill all variants and restart", a 4-hour cost.
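Neither incident needed sophisticated tooling to catch. The sketch below is minimal and assumes two inputs are available as plain lists: the per-feature scores a variance-based selector ranks on, and the per-fold `predictions_captured` flags. The function names, alert names, and thresholds are hypothetical; this is not the project's monitoring code.

It also shows why a constant score is expected once features are standardised. Every column then has the same variance: 1.0 with ddof=0, or n/(n-1) with ddof=1. A variance ranking is therefore a tie.

```python
from statistics import mean, pstdev, variance


def standardise(column: list[float]) -> list[float]:
    """Centre and divide by the population std (ddof=0), as StandardScaler does."""
    mu, sd = mean(column), pstdev(column)
    return [(x - mu) / sd for x in column]


def variance_dispersion(per_feature_variances: list[float]) -> float:
    """Spread of the scores a variance-based selector ranks on; ~0 means no signal."""
    return pstdev(per_feature_variances)


def missing_capture_rate(predictions_captured: list[bool]) -> float:
    """Fraction of folds whose predictions_captured flag is False."""
    if not predictions_captured:
        return 1.0  # no fold reported at all counts as a failure, not a pass
    return predictions_captured.count(False) / len(predictions_captured)


def fired_alerts(per_feature_variances: list[float], predictions_captured: list[bool],
                 min_dispersion: float = 1e-9, max_missing_rate: float = 0.0) -> list[str]:
    """Names of the (hypothetical) alerts that would fire for these inputs."""
    fired = []
    if variance_dispersion(per_feature_variances) < min_dispersion:
        fired.append("feature_selection_variance_dispersion_collapsed")
    if missing_capture_rate(predictions_captured) > max_missing_rate:
        fired.append("predictions_captured_false")
    return fired


# Three features with very different raw scales.
features = [[1.0, 2.0, 3.0, 4.0], [10.0, 50.0, 20.0, 90.0], [0.1, 0.3, 0.2, 0.9]]
scores = [variance(standardise(col)) for col in features]  # sample variance, ddof=1
# Every score is n/(n-1) = 4/3 here, so the ranking is a tie.
print(fired_alerts(scores, [True, True, False]))
# → ['feature_selection_variance_dispersion_collapsed', 'predictions_captured_false']
```

Setting `max_missing_rate` to 0 mirrors the #700/#701 case: a single fold losing its predictions is already worth a page.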
The committee finding was: "Mandate a Comprehensive MLOps Readiness Plan for Each Track: Before any track proceeds to implementation, require a detailed plan covering: (1) Production Monitoring, (2) Alerting & Runbooks, (3) Drift Detection, (4) Staged Rollout Strategy."
We need to convert this from a one-off recommendation into a binding gate.
Decision
Every ML Story whose implementation touches model training, label generation, feature engineering, the inference path, or trading filters MUST publish a filled `TEMPLATE_mlops_readiness.md` BEFORE its implementation PR can be merged.
The template lives at `documentation/templates/TEMPLATE_mlops_readiness.md` and has 6 mandatory sections: §1 production monitoring, §2 alerting & runbooks, §3 drift detection, §4 staged rollout, §5 rollback plan, §6 owner & DRI.
The filled version is committed to one of:
- `documentation/stories/<cvn_id>/mlops_readiness.md` (preferred for substantial Stories), or
- a `## MLOps readiness` section pasted into the GitHub issue body, with the OP Story's `arch_notes` custom field pointing to the issue (lighter for small Stories).
The PR description MUST link the filled file (or the issue section) and the committee pr_review session that approved it.
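Before a merge gate can check the PR link, it has to find the filled plan in one of the two locations listed above. The sketch below is minimal and rests on three assumptions:
- The gate has a checkout of the repo and the raw issue body.
- The function name and the `"issue:..."` return convention are hypothetical.
- When both locations exist, the dossier file wins.

```python
import re
from pathlib import Path
from typing import Optional

# An H2 line matching the anchor this ADR names for the issue variant.
READINESS_H2 = re.compile(r"^## MLOps readiness[ \t]*$", re.MULTILINE)


def locate_readiness_plan(repo_root: Path, cvn_id: str,
                          issue_body: Optional[str]) -> Optional[str]:
    """Return where the filled plan lives, or None if the gate must block.

    Assumption: the Story dossier file is preferred when both locations exist.
    """
    dossier = repo_root / "documentation" / "stories" / cvn_id / "mlops_readiness.md"
    if dossier.is_file():
        return dossier.relative_to(repo_root).as_posix()
    if issue_body and READINESS_H2.search(issue_body):
        return "issue:## MLOps readiness"
    return None
```

A `None` result means there is nothing for the PR description to link, so the gate can fail fast before the committee is involved.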
| Story scope | Template required | Sections enforced |
|---|---|---|
| Touches `src/training/`, `src/commun/{pipeline,inference,filters,labels}/`, model artefacts, or the FTF factor matrix | YES | all 6 |
| Touches `src/backtest/` only (no live trading impact) | YES | §1, §2, §5 (no rollout or drift since not live) |
| Touches `src/commun/audit/` or pure observability code | YES (lite) | §1, §2, §5, §6 (audit failures are silent; need rollback path + DRI) |
| Pure docs / dashboards / FTF-only experiment with no model promotion path | NO | — |
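A merge gate could derive the enforced sections from the diff. The sketch below is minimal and assumes changed paths arrive as repo-relative strings. `SCOPE_RULES` and `required_sections` are hypothetical names.

Only the path-based rows of the table are encoded. Three scopes need detection this sketch does not attempt:
- model artefacts;
- the FTF factor matrix;
- "pure observability code" outside `src/commun/audit/`.

When a diff spans several rows, the union of their sections applies. A diff touching both `src/backtest/` and `src/training/` therefore gets all 6.

```python
from fnmatch import fnmatch

ALL_SECTIONS = frozenset({1, 2, 3, 4, 5, 6})

# Path-based rows of the scope table (fnmatch's "*" also matches "/").
SCOPE_RULES = [
    (("src/training/*", "src/commun/pipeline/*", "src/commun/inference/*",
      "src/commun/filters/*", "src/commun/labels/*"), ALL_SECTIONS),
    (("src/commun/audit/*",), frozenset({1, 2, 5, 6})),
    (("src/backtest/*",), frozenset({1, 2, 5})),
]


def required_sections(changed_paths: list[str]) -> frozenset:
    """Union of the sections enforced by every row the diff touches.

    An empty result means no template is required (e.g. a docs-only diff).
    """
    required: set = set()
    for path in changed_paths:
        for patterns, sections in SCOPE_RULES:
            if any(fnmatch(path, pattern) for pattern in patterns):
                required |= sections
                break
    return frozenset(required)
```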
The committee `pr_review` (per ADR-68) verifies that the filled template is present, complete per the table above, and passes the §Sign-off checklist. PRs without a sign-off are blocked at the merge gate.
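Part of the completeness check can be automated ahead of the committee. The sketch below is minimal and assumes two heading conventions that are not taken from the template:
- each filled section starts with a Markdown heading beginning with its number (e.g. `## §3 Drift detection`);
- sections contain no sub-headings.

The function names are also hypothetical. A SKIP entry in the format required by the Invariants passes this check only when its justification is filled in. The committee can still reject it.

```python
import re

HEADING = re.compile(r"^#{1,6}[ \t]+(.*)$", re.MULTILINE)
SECTION_NO = re.compile(r"§(\d)\b")
SKIP_MARKER = "**SKIP \u2014 JUSTIFICATION**:"  # the ADR's literal marker (\u2014 is the em dash)


def split_sections(markdown: str) -> dict[int, str]:
    """Map '§N' headings to their body; a body ends at the next heading of any level."""
    heads = list(HEADING.finditer(markdown))
    sections: dict[int, str] = {}
    for i, head in enumerate(heads):
        number = SECTION_NO.match(head.group(1))
        if number is None:
            continue  # e.g. the document title or a '## Sign-off' heading
        end = heads[i + 1].start() if i + 1 < len(heads) else len(markdown)
        sections[int(number.group(1))] = markdown[head.end():end].strip()
    return sections


def completeness_problems(markdown: str, required: set[int]) -> list[str]:
    """Reasons the filled plan fails the check; an empty list means complete."""
    sections = split_sections(markdown)
    problems = []
    for n in sorted(required):
        body = sections.get(n, "")
        if not body:
            problems.append(f"§{n}: missing or empty")
        elif SKIP_MARKER in body:
            why = body.split(SKIP_MARKER, 1)[1].strip()
            if why in ("", "<why>"):
                problems.append(f"§{n}: SKIP without a justification")
    return problems
```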
Invariants
- Template lives at `documentation/templates/TEMPLATE_mlops_readiness.md`: it is the single source of truth and all Stories copy from it. Updates to the template are themselves ADR-eligible if they remove sections.
- Filled file path is predictable: either `documentation/stories/<cvn_id>/mlops_readiness.md` or the GH issue with a `## MLOps readiness` H2 anchor. The PR template (when introduced) auto-links one of the two.
- Skipping a MUST section requires written justification: an entry of the form `**SKIP — JUSTIFICATION**: <why>` directly in the section, not in chat and not in a separate doc. The committee may reject the justification.
- DRI is a single human: never a team alias, never "the operator". The handle named in §6 is accountable for the next 90 days. Reassignment requires editing the file and commenting on the OP Story.
- Rollback must be config-only: §5 must document a revert path that does NOT require a code deploy (FTF factor flag, MLflow alias flip, or `ftf_config` edit per ADR-59). If a code deploy is the only revert path, the change carries structural debt, which triggers an ADR-56 review.
- Canary stage names a specific crypto: §4 must name the crypto hosting the canary (e.g. "BTCUSDC because deepest order book"), not "we'll pick later". Vague rollout plans defeat the gate.
- Sunset date is 90 days after full rollout: every change is reviewed at +90d and either made permanent or revisited. This forces accountability for "we shipped it and forgot".
- Drift thresholds are calibration-eligible: the template proposes defaults (PSI 0.2/0.5, perf-gap 0.05/0.10), but Stories MAY justify per-crypto-volatility values in §3 with a one-line rationale. The defaults are not sacred; the discipline of naming a threshold is.
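§3 needs a computed number to compare against these thresholds. PSI sums (a_i - e_i) * ln(a_i / e_i) over bins, where e_i and a_i are the reference and current shares of bin i.

The sketch below is minimal and rests on several assumptions:
- Each default pair is read as (warn, critical); the ADR gives the pairs without naming the levels.
- Bins are cut at deciles of the reference window.
- Empty bins are floored at a small `eps`.
- `psi`, `classify`, and the constant names are hypothetical.

```python
import math
from bisect import bisect_right

PSI_DEFAULTS = (0.2, 0.5)          # assumed (warn, critical)
PERF_GAP_DEFAULTS = (0.05, 0.10)   # assumed (warn, critical)


def psi(reference: list[float], current: list[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index with bins cut at quantiles of the reference sample."""
    ref = sorted(reference)
    cuts = [ref[len(ref) * k // n_bins] for k in range(1, n_bins)]

    def shares(xs: list[float]) -> list[float]:
        counts = [0] * n_bins
        for x in xs:
            counts[bisect_right(cuts, x)] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(xs), eps) for c in counts]

    e, a = shares(reference), shares(current)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def classify(value: float, warn: float, critical: float) -> str:
    """Map a drift statistic onto ok / warn / critical."""
    if value >= critical:
        return "critical"
    if value >= warn:
        return "warn"
    return "ok"
```

A performance gap (for instance, validation metric minus live metric) goes through the same `classify` with `PERF_GAP_DEFAULTS`. A Story that calibrates per crypto would pass its own pair instead of the defaults and record the one-line rationale in §3.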
Alternatives rejected
- Per-Epic MLOps section instead of per-Story — too coarse: Stories within the same Epic (e.g., F1_buy boost S01-S05) ship at different times and need separate canary strategies. The template fits per-Story to match merge granularity.
- Wiki page or Confluence-style central document — centralised pages drift from the actual change set; the template lives in the Story dossier so it travels with the diff.
- Soft recommendation, no merge gate — exact failure mode of the past quarter. Without the gate, the template becomes a "we should fill this someday" artefact.
- Auto-generated from code (e.g., a pytest plugin scanning for `@mlops_ready` decorators) — over-engineered for our current cadence (~1 ML PR / week). Revisit if cadence reaches > 5 / week.
Consequences
- Positive: every ML production change carries a written rollback plan and a named DRI. "Who do I page when this breaks at 3am?" has an answer in the PR description.
- Positive: drift detection becomes a default, not an afterthought. New tracks ship with PSI + concept-drift wiring from day one.
- Positive: synergistic with ADR-69 (OpenProject orchestrator) — the filled template lives in the Story dossier, which is the artifact the OP Story points to.
- Positive: synergistic with ADR-68 (committee = default review channel) — the `pr_review` session has a checklist to verify, not a blank slate.
- Negative: ~30-60 min overhead per ML Story to fill the template. Acceptable given the cost of one silent regression (~4-8h to diagnose).
- Negative: the gate creates one more committee dependency on the merge path — mitigated by keeping the template short (1 page when filled).
- Neutral: the template is opinionated about minima (≥ 1 P1 alert, ≥ 7d shadow, etc.). These can be adjusted via ADR amendment if proven wrong.
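Several checks are mechanical enough to run before the committee reads the prose: the minima above, plus the single-DRI, named-canary, and 90-day sunset invariants. The sketch below is minimal and assumes:
- the filled template has already been parsed into a hypothetical `ReadinessPlan` record;
- DRIs are written as `@handle`;
- the P1 alert belongs to §2 and the shadow stage to §4.

None of these conventions is defined by this ADR.

```python
from dataclasses import dataclass
from datetime import date, timedelta

SUNSET_AFTER = timedelta(days=90)
VAGUE_CANARY = {"", "tbd", "later", "we'll pick later"}


@dataclass(frozen=True)
class ReadinessPlan:
    """Hypothetical structured view of a filled template."""
    p1_alerts: int        # §2: number of P1 alerts wired
    shadow_days: int      # §4: length of the shadow stage
    canary_crypto: str    # §4: e.g. "BTCUSDC"
    dri: str              # §6: assumed to be an "@handle"
    full_rollout: date
    sunset: date


def minima_violations(plan: ReadinessPlan,
                      team_aliases: frozenset = frozenset()) -> list[str]:
    """Return one message per violated minimum or invariant."""
    violations = []
    if plan.p1_alerts < 1:
        violations.append("§2: at least one P1 alert is required")
    if plan.shadow_days < 7:
        violations.append("§4: shadow stage shorter than 7 days")
    if plan.canary_crypto.strip().lower() in VAGUE_CANARY:
        violations.append("§4: canary crypto is not named")
    dri = plan.dri.strip()
    # Rejects team aliases and non-handles such as "the operator".
    if not dri.startswith("@") or dri in team_aliases:
        violations.append("§6: DRI must be a single human handle")
    if plan.sunset != plan.full_rollout + SUNSET_AFTER:
        violations.append("sunset must be full rollout + 90 days")
    return violations
```

If an ADR amendment changes a minimum, only the constant or comparison in this function would move.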
Rollback
This ADR is a process decision. Rolling it back means removing this file and reverting the references to ADR-70 in the template and in CLAUDE.md. The filled-template files in Story dossiers stay (they are useful documentation regardless of policy).
If the gate proves systematically wasteful (e.g., > 20% of ML Stories blocked at merge purely for template completeness over a sample of ≥ 20 Stories, with zero correlated production incidents prevented), revisit the section list before retiring the policy.
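The review trigger is concrete enough to compute from merge-gate records. The sketch below is minimal: the function and argument names are hypothetical, and the counts would come from whatever record the committee keeps.

```python
def should_revisit_gate(stories_sampled: int, blocked_for_completeness_only: int,
                        incidents_prevented: int, min_sample: int = 20,
                        max_block_rate: float = 0.20) -> bool:
    """True when the 'systematically wasteful' condition above holds."""
    if stories_sampled < min_sample:
        return False  # sample too small to judge
    block_rate = blocked_for_completeness_only / stories_sampled
    return block_rate > max_block_rate and incidents_prevented == 0
```

A `True` result triggers a revision of the section list, not retirement. The block-rate comparison is strict, so exactly 20% does not trigger.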
References
- ADR-25 — No silent fallback in ML pipelines (the policy this template operationalises)
- ADR-26 — Grafana as single entry point (where §1 metrics surface)
- ADR-30 — Structured logs as stable interface (how §1 metrics propagate)
- ADR-32, ADR-33 — log_event format + closed event catalogue (the contract §1 builds on)
- ADR-56 — Every pipeline change must be FTF-testable / A/B testable (the rollback flag in §5)
- ADR-59 — All pipeline params in PostgreSQL ftf_config (the config-only revert in §5)
- ADR-68 — Expert Committee = default review channel (the `pr_review` that enforces this gate)
- ADR-69 — OpenProject is the project orchestrator (where the filled template lives in the Story dossier)
- `documentation/templates/TEMPLATE_mlops_readiness.md` — the template itself
- Issues: #707 (F1_buy boost plan), #709 (this Story), #729 (Epic CVN-N001-EF)
- Committee sessions: `8db2529d` (original Ops finding source), `3e0a3008` (this ADR's `plan_review`, PASSED / OK avg 8.6, 9 recommendations applied to the template and scope table)