
MLOps readiness — CVN-N001-EI-S04 — LightGBM capacity ablation (Block 2)

Story: CVN-N001-EI-S04 · GH #1059 · OP wp#227 · Epic CVN-N001-EI (wp#223)
Owner: dococeven (DRI for production behaviour of this change)
Filled on: 2026-06-01
Reviewed by committee: plan_review session 38f920fa PASSED (Mistral 8/7/8/8/8, unanimous; Gemini cap → CONSOLIDATOR_ERROR, so judged on Mistral alone per operator policy). PR pr_review pending (scope: src/commun/finetune/diagnostic/ tooling).

Nature of this change (read first): this Story is diagnostic-harness tooling, not a production-ML change. It adds the Block 2 LightGBM capacity ablation (s42). This read-only diagnostic replays the S07-pinned captured fold against a grid of LGB capacity hyperparameters, varied one at a time: the primary axis num_leaves, plus the exploratory axes learning_rate, min_child_samples, lambda_l2 and min_gain_to_split. Its purpose is to decide Epic status B: is the prod LGB under- or over-capacity for the signal?

The ablation produces verdicts about a trained model (B_CAPACITY_OK / B_DEFAULTS_OVERFIT / B_SYSTEMATIC / B_SYSTEMATIC_OVERFIT / B_PER_ASSET / INCONCLUSIVE_*). It never ships one. It changes nothing in production predictions, calibration, thresholds, features, labels, or trading: the harness audits models and does not train production artefacts. It inherits the S07 warm-pin path (s41_io._pin_load / _pin_store) and the S03 Q1.g harness feature_name fix. The behavioural sections below (drift, staged rollout, expectancy, canary) are therefore marked N/A, each with a justification, because the Story is itself a diagnostic-tooling addition.
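The one-at-a-time design described above can be sketched as follows. This is a hedged illustration, not the s42 implementation. The baseline values and axis points are hypothetical, because the real grid resolves from ftf_config. GridPoint and one_at_a_time_grid are illustrative names. Only the LightGBM parameter names come from the text. Each grid point moves a single hyperparameter away from the baseline and keeps every other one fixed.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical baseline and axis values, for illustration only: the real
# grid is resolved from ftf_config (ADR-90), not hard-coded here.
BASELINE: Dict[str, float] = {
    "num_leaves": 31,
    "learning_rate": 0.05,
    "min_child_samples": 20,
    "lambda_l2": 0.0,
    "min_gain_to_split": 0.0,
}

AXES: Dict[str, List[float]] = {
    "num_leaves": [7, 15, 31, 63, 127],  # primary axis
    "learning_rate": [0.01, 0.05, 0.1],  # exploratory axes from here on
    "min_child_samples": [5, 20, 100],
    "lambda_l2": [0.0, 1.0, 10.0],
    "min_gain_to_split": [0.0, 0.1],
}


@dataclass
class GridPoint:
    axis: str  # the single hyperparameter moved away from the baseline
    value: float
    params: Dict[str, float] = field(default_factory=dict)


def one_at_a_time_grid(
    baseline: Dict[str, float], axes: Dict[str, List[float]]
) -> List[GridPoint]:
    """Vary one hyperparameter at a time, holding all others at the baseline."""
    points = []
    for axis, values in axes.items():
        if axis not in baseline:
            raise KeyError(f"axis {axis!r} has no baseline value")
        for value in values:
            params = dict(baseline)  # copy so grid points never share state
            params[axis] = value
            points.append(GridPoint(axis=axis, value=value, params=params))
    return points
```

With these illustrative values the grid has 16 points (5 + 3 + 3 + 3 + 2). Some points equal the baseline value on their own axis (for example num_leaves=31), so they reproduce the baseline config. A real harness might deduplicate them or keep them as an in-grid control.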


1. Production monitoring (MUST)

  • s42_verdict.outcome. Type: counter (per outcome). Source: Loki {namespace="cvntrade"} |= "event=s42_verdict". Dashboard: diagnostic-harness panel (TBD link). Threshold (warn / crit): warn on any INCONCLUSIVE_* outcome (INCONCLUSIVE_TOOLING / _UNDERPOWERED / _UNCOVERED), meaning the diagnostic could not conclude. Owner: dococeven.
  • s42_group_verdict.outcome. Type: counter (per outcome). Source: Loki |= "event=s42_group_verdict". Dashboard: same panel. Threshold (warn / crit): warn on INCONCLUSIVE_GROUP_COVERAGE (group synthesis below the coverage floor). Owner: dococeven.
  • s42_io_parquet_load_failed. Type: counter. Source: Loki |= "event=s42_io_parquet_load_failed" (captured-fold load guard). Dashboard: same panel. Threshold (warn / crit): warn on any emission (severity=error; capture/pin missing). Owner: dococeven.
  • s42_io_hp_resolve_failed. Type: counter. Source: Loki |= "event=s42_io_hp_resolve_failed" (ADR-90 HP resolver, fail-loud). Dashboard: same panel. Threshold (warn / crit): warn on any emission (ftf_config key missing). Owner: dococeven.
  • s42_io_sha_recompute_failed / s42_s22a1_raised. Type: counter. Source: Loki |= "event=s42_io_sha_recompute_failed" / |= "event=s42_s22a1_raised". Dashboard: same panel. Threshold (warn / crit): n/a (diagnostic provenance / probe-exception capture → INCONCLUSIVE_TOOLING). Owner: dococeven.
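The warn thresholds above are meant to be evaluated in Loki/Grafana. The Python sketch below only illustrates the same rules outside Loki: it counts outcomes per event and applies each warn condition. It assumes logfmt-style lines. Only the `event=...` token is confirmed by the Loki filters in this section. The outcome / crypto / fold_id field names follow the ADR-30 tags and are an assumption. This is neither the harness's emitter nor an alert rule.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed logfmt line layout, e.g.
#   event=s42_verdict crypto=BTC fold_id=3 outcome=B_CAPACITY_OK
# Only the "event=..." token is confirmed by the Loki filters; the other
# field names are assumptions based on the ADR-30 tags.
_KV = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split a logfmt line into key -> value, unquoting quoted values."""
    return {key: value.strip('"') for key, value in _KV.findall(line)}


def outcome_counts(lines: Iterable[str], event: str) -> Counter:
    """Count verdict outcomes for one event name (e.g. s42_verdict)."""
    counts: Counter = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("event") == event:
            counts[fields.get("outcome", "<missing>")] += 1
    return counts


def should_warn(counts: Counter, prefix: str = "INCONCLUSIVE_") -> bool:
    """Warn rule for the verdict events: any outcome starting with the prefix."""
    return any(outcome.startswith(prefix) for outcome, n in counts.items() if n > 0)


def any_emission(lines: Iterable[str], event: str) -> bool:
    """Warn rule for the s42_io_* guards: a single emission is enough."""
    return any(parse_logfmt(line).get("event") == event for line in lines)
```

The default `INCONCLUSIVE_` prefix covers both verdict events, because INCONCLUSIVE_GROUP_COVERAGE also starts with that prefix.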

Required minima, assessed for a diagnostic-tooling Story:
  • Prediction-rate metric: N/A. The Story adds no prediction, and production prediction-rate metrics are unchanged.
  • Outcome metric: N/A. There is no trading-outcome change; the harness produces verdicts about model capacity, not trades.
  • Health metric: ✅. Every error path is legible in Loki (s42_verdict / s42_group_verdict with an explicit outcome, plus the s42_io_* fail-loud guards) instead of surfacing as an opaque Airflow traceback. Per ADR-25, every error path yields an INCONCLUSIVE_* verdict, never an exception raised to the operator UI.
  • Tagging: events are tagged per ADR-30 (crypto, fold_id, outcome). No FTF variant applies (diagnostic, not an A/B factor).
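Below is a minimal sketch of the ADR-25 "never raise to the operator" contract. It assumes a probe is a zero-argument callable that returns an outcome string. run_probe_no_raise and the logger name are hypothetical and are not the harness's API. A failing probe is downgraded to INCONCLUSIVE_TOOLING, and every verdict is logged as one structured line carrying the ADR-30 tags.

```python
import logging
from typing import Callable

logger = logging.getLogger("s42.harness")  # hypothetical logger name

INCONCLUSIVE_TOOLING = "INCONCLUSIVE_TOOLING"


def run_probe_no_raise(probe: Callable[[], str], *, crypto: str, fold_id: int) -> str:
    """Run one capacity probe under an ADR-25-style 'never raise' contract.

    Any Exception from the probe is logged with its traceback and turned
    into an INCONCLUSIVE_TOOLING verdict instead of propagating. The final
    verdict is emitted as one structured line tagged crypto / fold_id /
    outcome.
    """
    try:
        outcome = probe()
    except Exception:  # broad on purpose; BaseException (e.g. Ctrl-C) still propagates
        logger.exception("probe failed crypto=%s fold_id=%s", crypto, fold_id)
        outcome = INCONCLUSIVE_TOOLING
    logger.info("event=s42_verdict crypto=%s fold_id=%s outcome=%s", crypto, fold_id, outcome)
    return outcome
```

The sketch catches `Exception` rather than `BaseException`, so an operator interrupt still stops the run and is not recorded as a tooling verdict.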

2. Alerting & runbooks (MUST)

SKIP. JUSTIFICATION: diagnostic-tooling change with no silent-revenue-loss failure mode. The ablation is operator-triggered and read-only with respect to trading, so a failure degrades the diagnosis, not revenue. The relevant failure surface is a probe error that never surfaces. ADR-25 covers it: every error path emits a structured INCONCLUSIVE_* verdict to Loki/Grafana (ADR-26/30), the channel the operator actually watches. No P1 page is warranted.

3. Drift detection (MUST)

N/A: no feature, label, architecture, or calibration change. This Story replays a pinned, content-addressed captured fold (the S07 warm pin) against a capacity grid. It does not alter any production model, its inputs, or its outputs. Pinning the fold is precisely what removes live-data drift between re-audits (diagnostic reproducibility). That is distinct from production drift detection, which the N010 Drift Store epics own.
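A few lines show what "content-addressed" buys here. The fold is identified by a digest of its bytes, so any change to the data changes its address, and the replay refuses to run on altered bytes. This sketch assumes SHA-256 over the raw file bytes. load_pinned and PinMismatch are hypothetical and do not reproduce s41_io._pin_load. Under ADR-25, a caller would turn the exception into an INCONCLUSIVE_* verdict rather than let it reach the operator.

```python
import hashlib
from pathlib import Path


class PinMismatch(Exception):
    """The pinned bytes no longer hash to the digest recorded at pin time."""


def load_pinned(path: Path, expected_sha256: str) -> bytes:
    """Return the pinned fold's bytes only if they match their content address."""
    data = path.read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        # Fail loud: replaying a fold that silently changed would defeat reproducibility.
        raise PinMismatch(f"{path}: expected sha256 {expected_sha256}, got {actual}")
    return data
```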

4. Staged rollout (MUST)

N/A: non-behavioural, operator-triggered tooling. No traffic, no predictions, no trading behaviour. The ablation runs only when an operator triggers the diagnostic DAG. The smoke filters (axes_subset / points_subset_per_axis, plan §5.6) only shrink the grid for fast dry runs and never touch a production path. There is nothing to shadow, canary, or fully roll out, and no canary crypto is needed.
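The claim that the smoke filters "only shrink the grid" can be made concrete with a small sketch. The parameter names axes_subset and points_subset_per_axis come from the text. Their exact semantics in plan §5.6 are not shown here, so this sketch assumes points_subset_per_axis means "keep the first k points". apply_smoke_filters is a hypothetical helper. By construction it returns a sub-grid of its input.

```python
from typing import Dict, List, Optional, Sequence


def apply_smoke_filters(
    axes: Dict[str, List[float]],
    axes_subset: Optional[Sequence[str]] = None,
    points_subset_per_axis: Optional[int] = None,
) -> Dict[str, List[float]]:
    """Shrink an ablation grid for a dry run; it can only remove, never add."""
    if axes_subset is not None:
        unknown = set(axes_subset) - set(axes)
        if unknown:
            # Fail loud rather than silently running an empty or partial grid.
            raise ValueError(f"unknown axes in axes_subset: {sorted(unknown)}")
    selected = {
        name: list(values)
        for name, values in axes.items()
        if axes_subset is None or name in axes_subset
    }
    if points_subset_per_axis is not None:
        # Assumed semantics: keep the first k points of each axis.
        selected = {name: values[:points_subset_per_axis] for name, values in selected.items()}
    return selected
```

With both filters left at None, the full grid passes through unchanged. That unfiltered grid is the normal, non-smoke run.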

5. Rollback plan (MUST)

  • Don't trigger. The diagnostic DAG is operator-triggered with schedule=None (ADR-18), so not triggering it has zero effect on any production surface. Revert SLA: immediate.
  • Git revert. Revert the PR commits on main, which removes the s42 module and its DAG. No model, feature, or config artefact is involved. Revert SLA: < 5 min (next DAG sync + harness build).

Required minima:
  • Config-only flip: the ablation is purely additive diagnostic code (a new s42 module + DAG), so there is nothing to flip off; not triggering it is the off state. The capacity HP grid resolves from ftf_config (ADR-90) and fails loud on a missing key (s42_io_hp_resolve_failed), never silently defaulting (ADR-25).
  • "Tested in shadow": the change is exercised by the unit suite (tests/unit/test_s42_*: 32 fast tests). These cover preconditions, stats primitives, the _decide_s42 verdict matrix, group synthesis, and an ADR-0094 Inv 7 spy asserting from_training_cache call_count == 0. The test_s42_parity oracle and a DAG-smoke import add to this. For a non-behavioural tooling change, that is the equivalent verification.
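The fail-loud resolution rule can be sketched as below, assuming ftf_config behaves like a flat mapping. resolve_hp, HPResolveError, the logger name and the key names are hypothetical. The sketch is not the ADR-90 resolver itself. It shows one design choice: a missing key is treated as an error, not as a cue to fall back to LightGBM's built-in default. A silent default would mean the ablation measures a configuration nobody asked for.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger("s42.io")  # hypothetical logger name


class HPResolveError(LookupError):
    """A capacity hyperparameter is missing from the config."""


def resolve_hp(ftf_config: Mapping[str, Any], key: str) -> Any:
    """Return ftf_config[key]; never fall back to a library default."""
    if key not in ftf_config:
        # Emit the fail-loud guard event, then stop instead of defaulting.
        logger.error("event=s42_io_hp_resolve_failed key=%s", key)
        raise HPResolveError(key)
    return ftf_config[key]
```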

6. Owner & DRI (MUST)

  • DRI: dococeven
  • Backup DRI: dococeven (solo-operator project; no separate backup at this time)
  • Decision authority: dococeven (revert = don't-trigger / git revert, no committee needed)
  • Sunset date: 2026-08-30 (90 days) — by then the Block 2 capacity verdict is recorded against Epic status B and the s42 ablation is either retained as a standing harness probe or removed.

Sign-off checklist (gate before PR merge)

  • §1 monitoring: verdict outcomes (s42_verdict, s42_group_verdict) and fail-loud guards (s42_io_*) defined and Loki-discoverable; prediction/outcome N/A-justified (no behavioural change)
  • §2 alerting: SKIP-justified (no silent-revenue-loss failure mode; every error path surfaces via an INCONCLUSIVE_* verdict, ADR-25)
  • §3 drift: N/A-justified (replays a pinned captured fold; no feature/label/architecture/calibration change)
  • §4 rollout: N/A-justified (non-behavioural, operator-triggered tooling; smoke filters only shrink the grid)
  • §5 rollback: don't-trigger and git-revert paths documented (additive diagnostic code, no artefact pinning)
  • §6 DRI: dococeven named, sunset 2026-08-30
  • Dependency declarations (F15): no NEW Python import introduced; H1 deptry gate green on this PR
  • Committee pr_review session reviewed this filled template (ADR-68 scope: src/commun/finetune/diagnostic/ harness tooling)
  • Story OP comment links the committee session id (pending pr_review)