MLOps readiness — `CVN-N001-EI-S04 — LightGBM capacity ablation (Block 2)`

Story: CVN-N001-EI-S04 · GH #1059 · OP wp#227 · Epic CVN-N001-EI (wp#223)
Owner: dococeven (DRI for production behaviour of this change)
Filled on: 2026-06-01
Reviewed by committee: plan_review session 38f920fa PASSED (Mistral 8/7/8/8/8 unanimous; Gemini cap → CONSOLIDATOR_ERROR, judged on Mistral per operator policy). PR pr_review pending; scope: src/commun/finetune/diagnostic/ tooling.

Nature of this change (read first): this Story is diagnostic-harness tooling, not a production-ML change. It adds the Block 2 LightGBM capacity ablation (s42), a read-only diagnostic that replays the S07-pinned captured fold against a grid of LGB capacity hyperparameters varied one at a time: num_leaves as the primary axis, plus exploratory learning_rate, min_child_samples, lambda_l2, and min_gain_to_split. Its purpose is to decide Epic status B: is the prod LGB under- or over-capacity for the signal?

The ablation produces verdicts about a trained model (B_CAPACITY_OK / B_DEFAULTS_OVERFIT / B_SYSTEMATIC / B_SYSTEMATIC_OVERFIT / B_PER_ASSET / INCONCLUSIVE_*); it never ships one. It changes nothing in production predictions, calibration, thresholds, features, labels, or trading: the harness audits models, it does not train production artefacts. It reuses the S07 warm-pin path (s41_io._pin_load/_pin_store) and inherits the S03 Q1.g harness feature_name fix.

The behavioural sections below (drift, staged rollout, expectancy, canary) are therefore marked N/A with a justification, because the Story is itself a diagnostic-tooling addition.
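As an illustration of the one-at-a-time grid described above, the following minimal sketch enumerates grid points where each point changes exactly one hyperparameter from a baseline. The baseline and axis values, and the names `BASELINE`, `AXES`, and `one_at_a_time_grid`, are hypothetical placeholders; the actual s42 grid resolves from ftf_config and may be structured differently.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical baseline and axis values, for illustration only.
# The real values are resolved from ftf_config, not hard-coded.
BASELINE: Dict[str, Any] = {
    "num_leaves": 31,
    "learning_rate": 0.05,
    "min_child_samples": 20,
    "lambda_l2": 0.0,
    "min_gain_to_split": 0.0,
}

AXES: Dict[str, List[Any]] = {
    "num_leaves": [7, 15, 31, 63, 127],   # primary axis
    "learning_rate": [0.01, 0.05, 0.1],   # exploratory axes below
    "min_child_samples": [5, 20, 100],
    "lambda_l2": [0.0, 1.0, 10.0],
    "min_gain_to_split": [0.0, 0.1],
}


def one_at_a_time_grid(
    baseline: Dict[str, Any], axes: Dict[str, List[Any]]
) -> Iterator[Tuple[str, Any, Dict[str, Any]]]:
    """Yield (axis, value, params); params differs from baseline on `axis` only."""
    for axis, values in axes.items():
        for value in values:
            params = dict(baseline)  # copy so the baseline is never mutated
            params[axis] = value
            yield axis, value, params


points = list(one_at_a_time_grid(BASELINE, AXES))
```

A one-at-a-time design keeps the grid size linear in the number of axis values (here the sum of the list lengths) rather than their product, at the cost of not probing interactions between hyperparameters.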

1. Production monitoring (MUST)

Metric	Type	Source	Dashboard	Threshold (warn / crit)	Owner
`s42_verdict.outcome`	counter (per outcome)	Loki `{namespace="cvntrade"} |= "event=s42_verdict"`	diagnostic-harness panel (TBD link)	warn on any `INCONCLUSIVE_*` outcome (`INCONCLUSIVE_TOOLING`/`_UNDERPOWERED`/`_UNCOVERED`): the diagnostic could not conclude	dococeven
`s42_group_verdict.outcome`	counter (per outcome)	Loki `|= "event=s42_group_verdict"`	same	warn on `INCONCLUSIVE_GROUP_COVERAGE` (group synthesis under coverage floor)	dococeven
`s42_io_parquet_load_failed`	counter	Loki `|= "event=s42_io_parquet_load_failed"` (captured-fold load guard)	same	warn on any emission (`severity=error`; capture/pin missing)	dococeven
`s42_io_hp_resolve_failed`	counter	Loki `|= "event=s42_io_hp_resolve_failed"` (ADR-90 HP resolver fail-loud)	same	warn on any emission (ftf_config key missing)	dococeven
`s42_io_sha_recompute_failed` / `s42_s22a1_raised`	counter	Loki `|= "event=s42_io_sha_recompute_failed"` / `|= "event=s42_s22a1_raised"`	same	n/a (diagnostic provenance / probe-exception capture → `INCONCLUSIVE_TOOLING`)	dococeven

Required minima, assessed for a diagnostic-tooling Story:
- Prediction-rate metric: N/A. The Story adds no prediction; production prediction-rate metrics are unchanged.
- Outcome metric: N/A. No trading-outcome change; the harness produces verdicts about model capacity, not trades.
- Health metric: ✅. Every error path is legible in Loki (s42_verdict/s42_group_verdict with an explicit outcome, plus the s42_io_* fail-loud guards) instead of an opaque Airflow traceback. Per ADR-25, every error path yields an INCONCLUSIVE_* verdict, never an exception raised to the operator UI.
- Tagging: per ADR-30 (crypto, fold_id, outcome). No FTF variant applies (diagnostic, not an A/B factor).
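The "every error path yields an INCONCLUSIVE_* verdict" rule can be pictured with a small wrapper. This is a sketch under assumptions, not the harness code: `Verdict`, `run_probe_safely`, the logger name, and the exact log-line layout are hypothetical, chosen only so that the line contains the `event=s42_verdict` substring the Loki filter matches and carries the ADR-30 tags.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("s42_sketch")  # hypothetical logger name


@dataclass(frozen=True)
class Verdict:
    outcome: str
    crypto: str
    fold_id: str
    detail: str = ""


def run_probe_safely(probe: Callable[[], str], crypto: str, fold_id: str) -> Verdict:
    """Run a probe; convert any exception into an INCONCLUSIVE_TOOLING verdict."""
    try:
        verdict = Verdict(probe(), crypto, fold_id)
    except Exception as exc:  # deliberately broad: nothing may escape to the operator UI
        verdict = Verdict("INCONCLUSIVE_TOOLING", crypto, fold_id, detail=repr(exc))
    # One structured line per verdict, tagged with crypto, fold_id and outcome.
    logger.info(
        "event=s42_verdict crypto=%s fold_id=%s outcome=%s detail=%s",
        verdict.crypto, verdict.fold_id, verdict.outcome, verdict.detail,
    )
    return verdict
```

Because the success and failure paths emit the same event, a per-outcome counter over these lines sees tooling failures as just another outcome value rather than as missing data.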

2. Alerting & runbooks (MUST)

SKIP — JUSTIFICATION: diagnostic-tooling change with no silent-revenue-loss failure mode. The ablation is operator-triggered and read-only w.r.t. trading; its failure degrades diagnosis, not revenue. The relevant failure surface (a probe error not surfacing) is handled by ADR-25 — every error path emits a structured INCONCLUSIVE_* verdict to Loki/Grafana (ADR-26/30), the operator's real channel. No P1 page is warranted.

3. Drift detection (MUST)

N/A — no feature / label / architecture / calibration change. This Story replays a pinned, content-addressed captured fold (the S07 warm pin) against a capacity grid; it does not alter any production model, its inputs, or its outputs. Pinning the fold is precisely what removes live-data drift between re-audits (diagnostic reproducibility), distinct from production drift detection (owned by the N010 Drift Store epics).
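To make "content-addressed" concrete: a pin of this kind is typically verified by recomputing a digest over the captured file and comparing it with the recorded one. The sketch below assumes SHA-256 over the raw file bytes; the function names `sha256_of_file` and `verify_pin` are hypothetical, and the actual s41_io pin format is not shown in this document.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pin(path: Union[str, Path], expected_sha256: str) -> bool:
    """True only if the file on disk still matches the recorded digest."""
    return sha256_of_file(path) == expected_sha256.lower()
```

If the digest does not match, the replay is no longer auditing the same fold, which is the situation a guard such as s42_io_sha_recompute_failed would be expected to surface.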

4. Staged rollout (MUST)

N/A — non-behavioural, operator-triggered tooling. No traffic, no predictions, no trading behaviour. The ablation runs only when an operator triggers the diagnostic DAG. The smoke filters (axes_subset / points_subset_per_axis, plan §5.6) only reduce the grid for fast dry-runs — they never touch a production path. There is nothing to shadow / canary / full-rollout. No canary crypto needed.
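The effect of the smoke filters can be sketched as a pure function over the grid definition. The semantics assumed here are a guess for illustration: `axes_subset` is taken to be a collection of axis names to keep, and `points_subset_per_axis` a maximum number of points kept per axis (the first N). Plan §5.6 defines the real behaviour, and `apply_smoke_filters` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List, Optional


def apply_smoke_filters(
    axes: Dict[str, List[Any]],
    axes_subset: Optional[Iterable[str]] = None,
    points_subset_per_axis: Optional[int] = None,
) -> Dict[str, List[Any]]:
    """Return a reduced copy of the grid; the input grid is left untouched."""
    keep = None if axes_subset is None else set(axes_subset)
    reduced: Dict[str, List[Any]] = {}
    for axis, values in axes.items():
        if keep is not None and axis not in keep:
            continue  # axis excluded from this dry-run
        kept = list(values)
        if points_subset_per_axis is not None:
            kept = kept[:points_subset_per_axis]  # assumed: keep the first N points
        reduced[axis] = kept
    return reduced


full = {"num_leaves": [7, 15, 31, 63], "lambda_l2": [0.0, 1.0, 10.0]}
smoke = apply_smoke_filters(full, axes_subset=["num_leaves"], points_subset_per_axis=2)
```

With both filters left at `None` the function returns the full grid, so a filter of this shape can only shrink the set of points, never add to it.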

5. Rollback plan (MUST)

Mechanism	Description	Revert SLA
Don't trigger	The diagnostic DAG is operator-triggered, schedule=None (ADR-18). Not triggering it = zero effect on any production surface.	immediate
Git revert	Revert the PR commits on `main` — removes the `s42` module + DAG. No model/feature/config artefact involved.	< 5 min (next DAG-sync + harness build)

Required minima:
- Config-only flip: the ablation is purely additive diagnostic code (a new s42 module + DAG), so there is nothing to flip off; not triggering it is the off state. The capacity HP grid resolves from ftf_config (ADR-90) and fails loud on a missing key (s42_io_hp_resolve_failed), never silently defaulting (ADR-25).
- "Tested in shadow": exercised by the unit suite (tests/unit/test_s42_*, 32 fast tests covering preconditions, stats primitives, the _decide_s42 verdict matrix, group synthesis, and the ADR-0094 Inv 7 from_training_cache call_count==0 spy), plus the test_s42_parity oracle and a DAG-smoke import. This is equivalent verification for a non-behavioural tooling change.
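The "fails loud, never silently defaulting" behaviour of the HP resolver might look like the following sketch. It assumes ftf_config can be treated as a flat mapping; `HPResolveError`, `resolve_hp`, and the log-line layout are hypothetical and only echo the s42_io_hp_resolve_failed event name used in §1.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger("s42_sketch")  # hypothetical logger name


class HPResolveError(KeyError):
    """Raised when a required hyperparameter key is absent from the config."""


def resolve_hp(ftf_config: Mapping[str, Any], key: str) -> Any:
    """Return ftf_config[key]; log and raise instead of falling back to a default."""
    if key not in ftf_config:
        logger.error("event=s42_io_hp_resolve_failed severity=error key=%s", key)
        raise HPResolveError(key)
    return ftf_config[key]
```

Note the absence of any `dict.get(key, default)` call: a default here would let a misconfigured run produce a verdict for a grid nobody asked for. In line with ADR-25, a caller higher up would be expected to catch the error and turn it into an INCONCLUSIVE_* verdict rather than let it reach the operator UI.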

6. Owner & DRI (MUST)

DRI: dococeven
Backup DRI: dococeven (solo-operator project; no separate backup at this time)
Decision authority: dococeven (revert = don't-trigger / git revert, no committee needed)
Sunset date: 2026-08-30 (90 days) — by then the Block 2 capacity verdict is recorded against Epic status B and the s42 ablation is either retained as a standing harness probe or removed.

Sign-off checklist (gate before PR merge)

§1 monitoring: verdict outcomes (s42_verdict, s42_group_verdict) + fail-loud guards (s42_io_*) defined + Loki-discoverable; prediction/outcome N/A-justified (no behavioural change)
§2 alerting: SKIP-justified (no silent-revenue-loss failure mode; every error path surfaces via INCONCLUSIVE_* verdict, ADR-25)
§3 drift: N/A-justified (replays a pinned captured fold; no feature/label/architecture/calibration change)
§4 rollout: N/A-justified (non-behavioural, operator-triggered tooling; smoke filters only shrink the grid)
§5 rollback: don't-trigger + git-revert paths documented (additive diagnostic code, no artefact pinning)
§6 DRI: dococeven named, sunset 2026-08-30
Dependency declarations (F15): no NEW Python import introduced; H1 deptry gate green on this PR
Committee pr_review session reviewed this filled template (ADR-68 scope: src/commun/finetune/diagnostic/ harness tooling)
Story OP comment links the committee session id (pending pr_review)
