MLOps readiness · CVN-N001-EI-S04 · LightGBM capacity ablation (Block 2)
Story: CVN-N001-EI-S04 — GH #1059 · OP wp#227 · Epic CVN-N001-EI (wp#223)
Owner: dococeven (DRI for production behaviour of this change)
Filled on: 2026-06-01
Reviewed by committee: `plan_review` session `38f920fa` PASSED (Mistral 8/7/8/8/8, unanimous; Gemini hit its cap → CONSOLIDATOR_ERROR, so the plan was judged on Mistral alone per operator policy). PR `pr_review` pending (scope: `src/commun/finetune/diagnostic/` tooling).
Nature of this change (read first): this Story is diagnostic-harness tooling, not a production-ML change. It adds the Block 2 LightGBM capacity ablation (`s42`), a read-only diagnostic that replays the S07-pinned captured fold against a grid of LGB capacity hyperparameters (primary axis `num_leaves`, plus exploratory axes `learning_rate` / `min_child_samples` / `lambda_l2` / `min_gain_to_split`, varied one at a time) to decide Epic status B: is the prod LGB under- or over-capacity for the signal? It produces verdicts about a trained model (`B_CAPACITY_OK` / `B_DEFAULTS_OVERFIT` / `B_SYSTEMATIC` / `B_SYSTEMATIC_OVERFIT` / `B_PER_ASSET` / `INCONCLUSIVE_*`) and never ships one. It changes nothing in production predictions, calibration, thresholds, features, labels, or trading: the harness audits models; it does not train production artefacts. It reuses the S07 warm-pin path (`s41_io._pin_load` / `_pin_store`) and inherits the S03 Q1.g harness `feature_name` fix. The behavioural sections below (drift, staged rollout, expectancy, canary) are therefore N/A, each with its justification, because the Story is itself a diagnostic-tooling addition.
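The one-at-a-time design described above can be sketched as follows. This is a minimal, hypothetical illustration: the baseline values, the grid points and the `one_at_a_time_grid` helper are placeholders, not the `s42` module's actual grid or API.

```python
"""Hypothetical one-at-a-time (OAT) capacity grid; all values are illustrative."""

# Hypothetical baseline: the prod LGB capacity HPs that the ablation perturbs.
BASELINE = {
    "num_leaves": 31,
    "learning_rate": 0.05,
    "min_child_samples": 20,
    "lambda_l2": 0.0,
    "min_gain_to_split": 0.0,
}

# Primary axis first, then the exploratory axes (illustrative points only).
AXES = {
    "num_leaves": [7, 15, 63, 127],
    "learning_rate": [0.01, 0.1],
    "min_child_samples": [5, 100],
    "lambda_l2": [1.0, 10.0],
    "min_gain_to_split": [0.01, 0.1],
}


def one_at_a_time_grid(baseline, axes):
    """Yield (axis, value, params); each params differs from baseline on one axis only."""
    for axis, values in axes.items():
        for value in values:
            if value == baseline[axis]:
                continue  # skip points identical to the baseline
            params = dict(baseline)  # copy so the baseline is never mutated
            params[axis] = value
            yield axis, value, params


if __name__ == "__main__":
    grid = list(one_at_a_time_grid(BASELINE, AXES))
    print(len(grid))  # 12 configs: 4 + 2 + 2 + 2 + 2
```

Because each configuration perturbs a single axis, the number of fits grows with the sum of points per axis (12 here) rather than their product, at the cost of not observing interactions between axes.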
1. Production monitoring (MUST)
| Metric | Type | Source | Dashboard | Threshold (warn / crit) | Owner |
|---|---|---|---|---|---|
| `s42_verdict.outcome` | counter (per outcome) | Loki `{namespace="cvntrade"} \|= "event=s42_verdict"` | diagnostic-harness panel (TBD link) | warn on any `INCONCLUSIVE_*` outcome (`INCONCLUSIVE_TOOLING` / `_UNDERPOWERED` / `_UNCOVERED`): the diagnostic could not conclude | dococeven |
| `s42_group_verdict.outcome` | counter (per outcome) | Loki `\|= "event=s42_group_verdict"` | same | warn on `INCONCLUSIVE_GROUP_COVERAGE` (group synthesis under coverage floor) | dococeven |
| `s42_io_parquet_load_failed` | counter | Loki `\|= "event=s42_io_parquet_load_failed"` (captured-fold load guard) | same | warn on any emission (severity=error: capture/pin missing) | dococeven |
| `s42_io_hp_resolve_failed` | counter | Loki `\|= "event=s42_io_hp_resolve_failed"` (ADR-90 HP resolver, fail-loud) | same | warn on any emission (`ftf_config` key missing) | dococeven |
| `s42_io_sha_recompute_failed` / `s42_s22a1_raised` | counter | Loki `\|= "event=s42_io_sha_recompute_failed"` / `\|= "event=s42_s22a1_raised"` | same | n/a (diagnostic provenance / probe-exception capture → `INCONCLUSIVE_TOOLING`) | dococeven |
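The `s42_io_hp_resolve_failed` row describes a fail-loud resolver. A minimal sketch of that pattern, assuming `ftf_config` behaves like a plain mapping; `resolve_capacity_hps` and `HPResolveError` are hypothetical names, not the ADR-90 resolver's actual API:

```python
import logging

logger = logging.getLogger("s42_sketch")  # hypothetical logger name


class HPResolveError(KeyError):
    """Hypothetical error type: a required capacity HP is missing from ftf_config."""


def resolve_capacity_hps(ftf_config, keys):
    """Return {key: ftf_config[key]} for every key, or fail loud listing every gap.

    Deliberately no ftf_config.get(key, default): a missing key is an error,
    never a silent default.
    """
    missing = [k for k in keys if k not in ftf_config]
    if missing:
        # One structured line that a |= "event=s42_io_hp_resolve_failed" filter matches.
        logger.error(
            "event=s42_io_hp_resolve_failed severity=error missing=%s", ",".join(missing)
        )
        raise HPResolveError(missing)
    return {k: ftf_config[k] for k in keys}
```

Collecting every missing key before raising means one failed run reports the whole gap in `ftf_config`, rather than one key per re-trigger.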
Required minima, assessed for a diagnostic-tooling Story:
- Prediction-rate metric: N/A. The Story adds no prediction path; production prediction-rate metrics are unchanged.
- Outcome metric: N/A. There is no trading-outcome change; the harness produces verdicts about model capacity, not trades.
- Health metric: ✅ Every error path is legible in Loki (`s42_verdict` / `s42_group_verdict` with an explicit outcome, plus the `s42_io_*` fail-loud guards) instead of surfacing as an opaque Airflow traceback. Per ADR-25, every error path yields an `INCONCLUSIVE_*` verdict, never an exception raised to the operator UI.
- Tagged per ADR-30 (`crypto`, `fold_id`, `outcome`). No FTF variant applies (this is a diagnostic, not an A/B factor).
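The Loki sources above rely on each event being one structured line carrying the ADR-30 tags. A minimal sketch of such a line and of the `|=` filter, which in LogQL keeps lines that contain the given substring. `format_event` and `line_filter_contains` are hypothetical helpers, and the `crypto` / `fold_id` values are illustrative:

```python
def format_event(event: str, **fields: object) -> str:
    """Render one logfmt-style line: event first, then fields in sorted key order."""
    parts = [f"event={event}"]
    for key in sorted(fields):
        value = str(fields[key])
        # Quote values containing spaces or quotes so the line stays parseable.
        if " " in value or '"' in value:
            value = '"' + value.replace('"', '\\"') + '"'
        parts.append(f"{key}={value}")
    return " ".join(parts)


def line_filter_contains(line: str, needle: str) -> bool:
    """Mimic LogQL's |= line filter: keep the line if it contains the substring."""
    return needle in line


line = format_event("s42_verdict", crypto="BTC", fold_id=3, outcome="B_CAPACITY_OK")
print(line)  # event=s42_verdict crypto=BTC fold_id=3 outcome=B_CAPACITY_OK
```

Note that `|= "event=s42_verdict"` does not match `event=s42_group_verdict` lines, so the two counters in the table stay disjoint.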
2. Alerting & runbooks (MUST)
SKIP. JUSTIFICATION: diagnostic-tooling change with no silent-revenue-loss failure mode. The ablation is operator-triggered and read-only with respect to trading; a failure degrades diagnosis, not revenue. The relevant failure surface (a probe error that never surfaces) is handled by ADR-25: every error path emits a structured `INCONCLUSIVE_*` verdict to Loki/Grafana (ADR-26/30), which is the operator's real channel. No P1 page is warranted.
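The ADR-25 guarantee (a probe error becomes a verdict, not a raised exception) can be pictured as a wrapper around each probe. This is a sketch under assumptions: `Verdict`, `never_raise` and `broken_probe` are hypothetical names, not the `s42` code:

```python
import functools
import logging
from dataclasses import dataclass

logger = logging.getLogger("s42_sketch")  # hypothetical logger name


@dataclass(frozen=True)
class Verdict:
    """Hypothetical verdict record: an outcome label plus free-text detail."""

    outcome: str
    detail: str = ""


def never_raise(probe):
    """Wrap a probe so that any Exception becomes an INCONCLUSIVE_TOOLING verdict."""

    @functools.wraps(probe)
    def wrapper(*args, **kwargs):
        try:
            return probe(*args, **kwargs)
        except Exception as exc:  # deliberate catch-all: no raise reaches the operator UI
            logger.error(
                "event=s42_verdict outcome=INCONCLUSIVE_TOOLING error=%s",
                type(exc).__name__,
            )
            return Verdict("INCONCLUSIVE_TOOLING", detail=repr(exc))

    return wrapper


@never_raise
def broken_probe():
    raise ValueError("captured fold missing")
```

Catching `Exception` rather than `BaseException` keeps `KeyboardInterrupt` and `SystemExit` propagating, so an operator can still stop a run.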
3. Drift detection (MUST)
N/A: no feature / label / architecture / calibration change. This Story replays a pinned, content-addressed captured fold (the S07 warm pin) against a capacity grid; it does not alter any production model, its inputs, or its outputs. Pinning the fold is precisely what removes live-data drift between re-audits (diagnostic reproducibility). That is a separate concern from production drift detection, which is owned by the N010 Drift Store epics.
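Content addressing is what makes the replay reproducible: the pin is the hash of the fold's bytes, and the hash is recomputed on load. A minimal sketch, assuming a flat directory of files named by SHA-256; `pin_store` and `pin_load` are hypothetical stand-ins, not `s41_io._pin_store` / `_pin_load`:

```python
import hashlib
from pathlib import Path


def pin_store(root: Path, payload: bytes) -> str:
    """Write payload under its own SHA-256 hex digest and return the digest (the pin)."""
    digest = hashlib.sha256(payload).hexdigest()
    path = root / f"{digest}.bin"
    if not path.exists():  # same content, same address: storing twice is a no-op
        path.write_bytes(payload)
    return digest


def pin_load(root: Path, digest: str) -> bytes:
    """Read a pinned payload back and recompute its SHA-256 before trusting it."""
    payload = (root / f"{digest}.bin").read_bytes()
    if hashlib.sha256(payload).hexdigest() != digest:
        # A mismatch like this is the kind of event a provenance guard would report.
        raise ValueError(f"pin {digest[:12]} failed SHA recompute")
    return payload
```

Two audits that load the same pin are guaranteed to see byte-identical folds, which is why live-data drift cannot enter between re-audits.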
4. Staged rollout (MUST)
N/A: non-behavioural, operator-triggered tooling with no traffic, no predictions, and no trading behaviour. The ablation runs only when an operator triggers the diagnostic DAG. The smoke filters (`axes_subset` / `points_subset_per_axis`, plan §5.6) only shrink the grid for fast dry-runs and never touch a production path. There is nothing to shadow, canary, or fully roll out, so no canary crypto is needed.
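A minimal sketch of how such smoke filters can only shrink a grid. It assumes `axes_subset` is an optional list of axis names and `points_subset_per_axis` an optional cap on the points kept per axis; plan §5.6 may define them differently, and `apply_smoke_filters` is a hypothetical helper:

```python
def apply_smoke_filters(axes, axes_subset=None, points_subset_per_axis=None):
    """Return a reduced copy of axes; the input mapping is never mutated."""
    kept = {}
    for axis, points in axes.items():
        if axes_subset is not None and axis not in axes_subset:
            continue  # axis dropped from the dry-run
        if points_subset_per_axis is not None:
            points = points[:points_subset_per_axis]  # keep the first N points only
        kept[axis] = list(points)
    return kept


# Illustrative axes, not the s42 grid.
AXES = {
    "num_leaves": [7, 15, 63, 127],
    "learning_rate": [0.01, 0.1],
    "lambda_l2": [1.0, 10.0],
}
print(apply_smoke_filters(AXES, axes_subset=["num_leaves"], points_subset_per_axis=2))
# {'num_leaves': [7, 15]}
```

Because the filter only removes entries, a smoke run can never introduce a configuration that the full grid does not already contain.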
5. Rollback plan (MUST)
| Mechanism | Description | Revert SLA |
|---|---|---|
| Don't trigger | The diagnostic DAG is operator-triggered, `schedule=None` (ADR-18). Not triggering it has zero effect on any production surface. | immediate |
| Git revert | Revert the PR commits on `main`, which removes the `s42` module + DAG. No model/feature/config artefact is involved. | < 5 min (next DAG-sync + harness build) |
Required minima:
- Config-only flip: the ablation is purely additive diagnostic code (a new `s42` module + DAG), so there is nothing to flip off; not triggering it is the off state. The capacity HP grid resolves from `ftf_config` (ADR-90) and fails loud on a missing key (`s42_io_hp_resolve_failed`), never silently defaulting (ADR-25).
- "Tested in shadow": exercised by the unit suite (`tests/unit/test_s42_*`, 32 fast tests: preconditions, stats primitives, the `_decide_s42` verdict matrix, group synthesis, and the ADR-0094 Inv 7 `from_training_cache` `call_count == 0` spy), plus the `test_s42_parity` oracle and a DAG-smoke import. For a non-behavioural tooling change this is the equivalent verification.
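The `call_count == 0` spy can be pictured with `unittest.mock.patch.object(..., wraps=...)`, which records calls while keeping the real behaviour. This is a sketch: `ModelLoader`, `from_pinned_fold` and `run_ablation_point` are hypothetical stand-ins, not the `s42` code under test:

```python
from unittest import mock


class ModelLoader:
    """Hypothetical stand-in for the object that exposes from_training_cache."""

    @classmethod
    def from_training_cache(cls, key):
        raise RuntimeError("would touch the training cache")

    @classmethod
    def from_pinned_fold(cls, payload):
        return ("pinned", payload)


def run_ablation_point(payload):
    """Hypothetical ablation step: it must only read the pinned fold."""
    return ModelLoader.from_pinned_fold(payload)


def test_ablation_never_reads_training_cache():
    # wraps= makes a spy: calls are recorded and forwarded to the real method.
    with mock.patch.object(
        ModelLoader, "from_training_cache", wraps=ModelLoader.from_training_cache
    ) as spy:
        result = run_ablation_point(b"fold")
    assert result == ("pinned", b"fold")
    assert spy.call_count == 0  # the invariant: the training cache is never consulted
```

A spy (rather than a stub that raises) keeps the test honest: if the code path ever did call the training cache, the recorded `call_count` would fail the assertion even where the exception is swallowed.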
6. Owner & DRI (MUST)
- DRI: `dococeven`
- Backup DRI: `dococeven` (solo-operator project; no separate backup at this time)
- Decision authority: `dococeven` (revert = don't-trigger / git revert; no committee needed)
- Sunset date: 2026-08-30 (90 days). By then the Block 2 capacity verdict is recorded against Epic status B, and the `s42` ablation is either retained as a standing harness probe or removed.
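The 90-day sunset can be checked against the "Filled on" date in the header (2026-06-01) with the standard library:

```python
from datetime import date, timedelta

filled_on = date(2026, 6, 1)  # "Filled on" date from the header
sunset = filled_on + timedelta(days=90)
print(sunset.isoformat())  # 2026-08-30
```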
Sign-off checklist (gate before PR merge)
- §1 monitoring: verdict outcomes (`s42_verdict`, `s42_group_verdict`) + fail-loud guards (`s42_io_*`) defined and Loki-discoverable; prediction/outcome N/A-justified (no behavioural change)
- §2 alerting: SKIP-justified (no silent-revenue-loss failure mode; every error path surfaces via an `INCONCLUSIVE_*` verdict, ADR-25)
- §3 drift: N/A-justified (replays a pinned captured fold; no feature/label/architecture/calibration change)
- §4 rollout: N/A-justified (non-behavioural, operator-triggered tooling; smoke filters only shrink the grid)
- §5 rollback: don't-trigger + git-revert paths documented (additive diagnostic code, no artefact pinning)
- §6 DRI: `dococeven` named, sunset 2026-08-30
- Dependency declarations (F15): no NEW Python import introduced; H1 `deptry` gate green on this PR
- Committee `pr_review` session reviewed this filled template (ADR-68 scope: `src/commun/finetune/diagnostic/` harness tooling)
- Story OP comment links the committee session id (pending `pr_review`)