ADR-0070: MLOps readiness plan template is mandatory before any ML production merge

Status: active
Date: 2026-04-28
Introduced by: CVN-N001-EF-S01 (#709), F1_buy boost committee session 8db2529d
Supersedes: none

Context

The F1_buy boost plan committee review (session 8db2529d, Ops finding) flagged a recurring failure mode in the project: ML changes have been merging to production with monitoring, drift detection, and rollback paths defined informally — sometimes by chat, sometimes not at all. When a change degrades silently, the operator discovers it days later via aggregate dashboards rather than via a paged alert.

Concrete observations from the last quarter:

The variance feature-selection method ran in production for ~2 weeks while emitting 1.000291 for every feature after StandardScaler (issue #706, structural root cause). No alert fired because no metric measured "feature selection variance dispersion", and no template forced the question.
The predictions hook dropped 100% of OOS predictions for 1 fold (#700/#701) because a reset_index() exception was caught and swallowed. No runbook existed for "predictions_captured=False > X% of folds"; we noticed only through a per-row spot-check.
The Phase 2 rerun was launched without a documented canary stage. When a flaw surfaces mid-run, the only rollback is "kill all variants and restart", at a cost of 4 hours.
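The first two observations each reduce to a metric that could have been alerted on. The sketch below is illustrative only: the function names and the sample data are hypothetical, and it uses the standard library rather than the project's pipeline. It relies on one known property: StandardScaler scales with the population standard deviation (ddof=0), so a sample variance (ddof=1) taken afterwards equals n/(n-1) for every column, whatever the data. Whether that is the mechanism behind #706 is an assumption, not something the issue summary above confirms.

```python
import statistics


def standardize(column):
    """Scale to zero mean and unit population variance (ddof=0).

    Constant columns (zero spread) are not handled in this sketch.
    """
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(x - mean) / std for x in column]


def variance_dispersion(columns):
    """Spread (max - min) of the per-feature sample variances (ddof=1)."""
    variances = [statistics.variance(col) for col in columns]
    return max(variances) - min(variances)


def uncaptured_fold_rate(folds):
    """Share of folds whose predictions_captured flag is False."""
    missed = sum(1 for fold in folds if not fold["predictions_captured"])
    return missed / len(folds)


# Hypothetical data: three features on very different scales.
raw = [
    [1.0, 2.0, 3.0, 4.0, 50.0],
    [0.1, 0.2, 0.1, 0.4, 0.3],
    [100.0, -100.0, 30.0, 7.0, 0.0],
]
scaled = [standardize(col) for col in raw]

# Hypothetical fold records; only the flag name comes from the text above.
folds = [
    {"predictions_captured": True},
    {"predictions_captured": False},
    {"predictions_captured": True},
    {"predictions_captured": True},
]
```

After scaling, the dispersion collapses to roughly zero, so variance-based selection can no longer rank features. An alert on "dispersion below a floor" or on "uncaptured fold rate above X" would page instead of waiting for a dashboard review; both thresholds are left for the Story to set.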

The committee finding was: "Mandate a Comprehensive MLOps Readiness Plan for Each Track: Before any track proceeds to implementation, require a detailed plan covering: (1) Production Monitoring, (2) Alerting & Runbooks, (3) Drift Detection, (4) Staged Rollout Strategy."

We need to convert this from a one-off recommendation into a binding gate.

Decision

Every ML Story whose implementation touches model training, label generation, feature engineering, inference path, or trading filters MUST publish a filled TEMPLATE_mlops_readiness.md BEFORE its implementation PR can be merged.

The template lives at documentation/templates/TEMPLATE_mlops_readiness.md and has 6 mandatory sections: §1 production monitoring, §2 alerting & runbooks, §3 drift detection, §4 staged rollout, §5 rollback plan, §6 owner & DRI.

The filled version is committed to one of:

documentation/stories/<cvn_id>/mlops_readiness.md (preferred for substantial Stories), or
a ## MLOps readiness section pasted into the GitHub issue body, with the OP Story arch_notes custom field (cf) pointing to the issue (lighter for small Stories).

The PR description MUST link the filled file (or the issue section) and the committee pr_review session that approved it.

| Story scope | Template required | Sections enforced |
|---|---|---|
| Touches `src/training/`, `src/commun/{pipeline,inference,filters,labels}/`, model artefacts, FTF factor matrix | YES | all 6 |
| Touches `src/backtest/` only (no live trading impact) | YES | §1, §2, §5 (no rollout/drift since not live) |
| Touches `src/commun/audit/` or pure observability code | YES (lite) | §1 + §2 + §5 + §6 (audit failures are silent; need rollback path + DRI) |
| Pure docs / dashboards / FTF-only experiment with no model promotion path | NO | none |

The committee pr_review (per ADR-68) verifies that the filled template is present, complete per the table above, and passes the §Sign-off checklist. PRs without a sign-off are blocked at the merge gate.
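The path-based rows of the scope table can be resolved mechanically from a PR's changed files. The following is a minimal sketch, not the committee's tooling: `SCOPE_RULES` and `required_sections` are hypothetical names, and taking the union of sections when a PR touches several scopes is an assumption (the table only says `src/backtest/` "only"). Criteria that are not paths (model artefacts, the FTF factor matrix, "pure observability code") still need a human call.

```python
ALL_SECTIONS = frozenset({1, 2, 3, 4, 5, 6})

# (path prefixes, enforced section numbers), transcribed from the scope table.
SCOPE_RULES = [
    (
        (
            "src/training/",
            "src/commun/pipeline/",
            "src/commun/inference/",
            "src/commun/filters/",
            "src/commun/labels/",
        ),
        ALL_SECTIONS,
    ),
    (("src/backtest/",), frozenset({1, 2, 5})),
    (("src/commun/audit/",), frozenset({1, 2, 5, 6})),
]


def required_sections(changed_paths):
    """Return the sorted section numbers a PR must fill.

    Assumption: when several rules match, the union applies, so a PR
    touching both backtest and training code needs all 6 sections.
    An empty list means no rule matched (e.g. a docs-only PR).
    """
    required = set()
    for path in changed_paths:
        for prefixes, sections in SCOPE_RULES:
            # str.startswith accepts a tuple of candidate prefixes.
            if path.startswith(prefixes):
                required |= sections
    return sorted(required)
```

For example, `required_sections(["src/backtest/run.py"])` yields sections 1, 2 and 5, while adding any file under `src/training/` raises the requirement to all six.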

Invariants

Template lives at documentation/templates/TEMPLATE_mlops_readiness.md — single source of truth, all Stories copy from there. Updates to the template are themselves ADR-eligible if they remove sections.
Filled file path is predictable — either documentation/stories/<cvn_id>/mlops_readiness.md or the GH issue with a ## MLOps readiness H2 anchor. PR template (when introduced) auto-links one of the two.
Skipping a MUST section requires written justification — entry of the form **SKIP — JUSTIFICATION**: <why> directly in the section. Not in chat, not in a separate doc. Committee may reject.
DRI is a single human — never a team alias, never "the operator". The handle named in §6 is accountable for the next 90 days. Re-assignment requires editing the file and commenting on the OP Story.
Rollback must be config-only — §5 must document a revert path that does NOT require a code deploy (FTF factor flag, MLflow alias flip, or ftf_config edit per ADR-59). If a code deploy is the only revert path, the change has structural debt that triggers ADR-56 review.
Canary stage names a specific crypto — §4 must name the crypto hosting the canary (e.g. "BTCUSDC because deepest order book"), not "we'll pick later". Vague rollout plans defeat the gate.
Sunset date is 90 days post full rollout — every change is reviewed at +90d to either become permanent or be revisited. Forces accountability for "we shipped it and forgot".
Drift thresholds are calibration-eligible: the template proposes defaults (PSI 0.2/0.5, perf-gap 0.05/0.10), but Stories MAY justify per-crypto-volatility values in §3 with a one-line rationale. The defaults are not sacred; the discipline of naming a threshold is.
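For readers unfamiliar with PSI, here is a minimal sketch of the Population Stability Index over pre-binned counts, using the usual formula: the sum over bins of (actual share - expected share) × ln(actual share / expected share). Reading the "0.2/0.5" defaults as a warn level and a critical level is an assumption, as are the function names and the `eps` floor used to avoid log(0) on empty bins.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two histograms on the same bins."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        # Floor each share at eps so empty bins do not produce log(0).
        e_share = max(expected / expected_total, eps)
        a_share = max(actual / actual_total, eps)
        total += (a_share - e_share) * math.log(a_share / e_share)
    return total


def classify_drift(value, warn=0.2, critical=0.5):
    """Map a PSI value to a level; the defaults mirror the template's 0.2/0.5."""
    if value >= critical:
        return "critical"
    if value >= warn:
        return "warn"
    return "ok"
```

A Story that justifies per-crypto values in §3 would pass its own `warn` and `critical` arguments instead of relying on the defaults.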

Alternatives rejected

Per-Epic MLOps section instead of per-Story: too coarse. Stories within the same Epic (e.g., F1_buy boost S01-S05) ship at different times and need separate canary strategies. The template is per-Story to match merge granularity.
Wiki page or Confluence-style central document: centralised pages drift from the actual change set; the template lives in the Story dossier so it travels with the diff.
Soft recommendation, no merge gate: this is exactly the failure mode of the past quarter. Without the gate, the template becomes a "we should fill this someday" artefact.
Auto-generated from code (e.g., pytest plugin scanning for @mlops_ready decorators): over-engineered for our current cadence (~1 ML PR / week). Revisit if cadence exceeds 5 ML PRs / week.

Consequences

Positive: every ML production change carries a written rollback plan and a named DRI. "Who do I page when this breaks at 3am?" has an answer in the PR description.
Positive: drift detection becomes a default, not an afterthought. New tracks ship with PSI + concept-drift wiring from day one.
Positive: synergistic with ADR-69 (OpenProject orchestrator) — the filled template lives in the Story dossier, which is the artifact the OP Story points to.
Positive: synergistic with ADR-68 (committee = default review channel) — pr_review session has a checklist to verify, not a blank slate.
Negative: ~30-60 min overhead per ML Story to fill the template. Acceptable given the cost of one silent regression (~4-8h to diagnose).
Negative: the gate creates one more committee dependency on the merge path — mitigated by keeping the template short (1 page when filled).
Neutral: the template is opinionated about minima (≥ 1 P1 alert, ≥ 7d shadow, etc.). These can be adjusted via ADR amendment if proven wrong.
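The minima named in the last bullet, together with the DRI and canary invariants, are mechanical enough for a pre-review check. The sketch below is a hypothetical helper, not part of the template: the `Readiness` fields and the `violations` function are invented for illustration, and treating any handle containing "/" as a team alias is an assumption based on GitHub's `@org/team` mention format.

```python
from dataclasses import dataclass


@dataclass
class Readiness:
    """Hypothetical summary of a filled template's checkable fields."""

    p1_alerts: int
    shadow_days: int
    dri: str
    canary_crypto: str


def violations(plan):
    """Return human-readable problems; an empty list means the minima hold."""
    problems = []
    if plan.p1_alerts < 1:
        problems.append("needs at least one P1 alert")
    if plan.shadow_days < 7:
        problems.append("needs at least 7 days of shadow running")
    dri = plan.dri.strip()
    # Assumption: "/" marks a team alias such as @org/team.
    if not dri or "/" in dri or dri.lower() == "the operator":
        problems.append("DRI must be a single human handle")
    if not plan.canary_crypto.strip():
        problems.append("canary stage must name a crypto")
    return problems
```

Such a check would only complement the committee pr_review: it cannot judge whether a runbook or a rollback path is actually credible.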

Rollback

This ADR governs process only. To roll back, remove this ADR file and revert the references to ADR-70 in the template and CLAUDE.md. The filled-template files in Story dossiers stay (they are useful documentation regardless of policy).

If the gate proves systematically wasteful (e.g., > 20 % of ML Stories blocked at merge purely for template completeness over a sample of ≥ 20 Stories, with zero correlated production incidents prevented), revisit the section list before retiring the policy.
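The revisit criterion above can be written as a single predicate. This is a sketch with hypothetical names; the numbers (sample of at least 20 Stories, more than 20 % blocked, zero incidents prevented) come straight from the paragraph, while how "blocked purely for template completeness" and "incidents prevented" get counted is left to whoever runs the review.

```python
def gate_needs_revisit(
    stories_reviewed,
    blocked_for_completeness_only,
    incidents_prevented,
    min_sample=20,
    max_blocked_share=0.20,
):
    """True when the gate looks systematically wasteful per the criterion."""
    if stories_reviewed < min_sample:
        # Not enough evidence yet; keep the gate as-is.
        return False
    blocked_share = blocked_for_completeness_only / stories_reviewed
    return blocked_share > max_blocked_share and incidents_prevented == 0
```

With 20 Stories reviewed, 5 blocked for completeness alone, and no incident prevented, the predicate returns True; at exactly 4 of 20 it stays False because the threshold is strict.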

References

ADR-25 — No silent fallback in ML pipelines (the policy this template operationalises)
ADR-26 — Grafana as single entry point (where §1 metrics surface)
ADR-30 — Structured logs as stable interface (how §1 metrics propagate)
ADR-32, ADR-33 — log_event format + closed event catalogue (the contract §1 builds on)
ADR-56 — Every pipeline change must be FTF-testable / A/B testable (the rollback flag in §5)
ADR-59 — All pipeline params in PostgreSQL ftf_config (the config-only revert in §5)
ADR-68 — Expert Committee = default review channel (the pr_review that enforces this gate)
ADR-69 — OpenProject is the project orchestrator (where the filled template lives in the Story dossier)
documentation/templates/TEMPLATE_mlops_readiness.md — the template itself
Issues: #707 (F1_buy boost plan), #709 (this Story), #729 (Epic CVN-N001-EF)
Committee sessions: 8db2529d (original Ops finding source), 3e0a3008 (this ADR's plan_review, PASSED / OK avg 8.6, 9 recommendations applied to template + scope-table)