
Plan dossier — CVN-N001-EI-S01 : Learning-curve instrumentation (Block 1)

Story: CVN-N001-EI-S01 · GH #1056 · OP wp#224 · Epic CVN-N001-EI (#1055, wp#223)
Type: implementation plan (ADR-68)
Date: 2026-05-24

The approach was already approved by committee experiment_review 24745ff4 (PASSED) as Block 1 of the best_iter=1 diagnostic program; this dossier is the per-Story implementation plan that satisfies the G3 guardrail.

Problem

The training harness discards the per-round eval trajectory: lightgbm_dag.py:174 used log_evaluation(period=0) with no record_evaluation callback, so best_iteration cannot be explained from Loki alone. This is the §11.1 observability gap from the best_iter=1 diagnostic study. Block 1 of the §12 plan instruments the learning curves so that every future best_iteration is self-explaining.

Approach (Hamilton-native, per operator)

  • New emit_learning_curve in nodes/log_emit.py — compact per-(valid_name, metric) summary (first / min / last, 1-indexed argmin/argmax, n_rounds, first→min drop). The argmin of val binary_logloss is the LGB best_iteration. No raw per-round array dump (ADR-35 INFO summary, ADR-37 Loki budget).
  • One first-class Hamilton node per model (lgb_learning_curve / xgb_learning_curve / cb_learning_curve), consuming the trajectory the trainer now returns, emitting event=learning_curve, and wired as an input to <model>_artifact so it always runs. NOT a side-effect buried in the trainer.
  • Capture sources: LGB record_evaluation on [val] only · XGB evals_result (train+val, already evaluated for early-stop) · CB get_evals_result() (CatBoostError-safe).
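
The summary described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the harness code: the field names (first_to_min_drop, model, and so on), the logger name, and the helper summarize_curve are hypothetical. It assumes the trajectory has the nested shape {valid_name: {metric: [per-round values]}}, which is the shape the LightGBM record_evaluation callback, XGBoost evals_result, and CatBoost get_evals_result() all produce.

```python
import json
import logging
from typing import Dict, List

logger = logging.getLogger("training_harness")  # hypothetical logger name

# Assumed trajectory shape: {valid_name: {metric: [per-round values]}}
Trajectory = Dict[str, Dict[str, List[float]]]


def summarize_curve(values: List[float]) -> Dict[str, float]:
    """Reduce one per-round metric series to a handful of scalars."""
    if not values:
        return {"n_rounds": 0}
    n = len(values)
    # min()/max() over indices return the first occurrence on ties.
    argmin = min(range(n), key=values.__getitem__)
    argmax = max(range(n), key=values.__getitem__)
    return {
        "n_rounds": n,
        "first": values[0],
        "min": values[argmin],
        "last": values[-1],
        "argmin": argmin + 1,  # 1-indexed, comparable to best_iteration
        "argmax": argmax + 1,  # 1-indexed
        "first_to_min_drop": values[0] - values[argmin],
    }


def emit_learning_curve(model: str, trajectory: Trajectory) -> List[dict]:
    """Log one compact INFO event per (valid_name, metric); no raw arrays."""
    events = []
    for valid_name, metrics in trajectory.items():
        for metric, values in metrics.items():
            event = {
                "event": "learning_curve",
                "model": model,
                "valid_name": valid_name,
                "metric": metric,
                **summarize_curve(values),
            }
            logger.info(json.dumps(event))
            events.append(event)
    return events


def lgb_learning_curve(lgb_trajectory: Trajectory) -> List[dict]:
    """Hamilton-style node: the parameter name refers to an upstream node
    (lgb_trajectory is a hypothetical name for the trainer's returned trajectory)."""
    return emit_learning_curve("lightgbm", lgb_trajectory)
```

With a validation curve that only rises, such as [0.50, 0.60, 0.70], the summary reports argmin = 1 and first_to_min_drop = 0.0, which is the signature a best_iter=1 run would leave in the logs.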

Files

  • src/training/harness/nodes/log_emit.py: emit_learning_curve
  • src/training/harness/dags/models/{lightgbm,xgboost,catboost}_dag.py — trajectory return + dedicated node + artifact wiring
  • tests/unit/training_harness/test_logging_contract.py — 3-model parity + summary-math unit tests

Risks & mitigations

  • Behavioural drift via early-stopping. Mitigated: LGB valid_sets unchanged ([val] only); capture only observes what training already computes. best_iteration bit-identical. (Adding train to LGB valid_sets was rejected for exactly this reason.)
  • Loki volume. Mitigated: compact summary scalars, not 300-float arrays.
  • CatBoost API variance. Mitigated: get_evals_result() wrapped, falls back to {} (no crash, ADR-25).
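
The CatBoost fallback could look roughly like the sketch below. The helper name safe_cb_evals_result is hypothetical, and the decision to also tolerate a missing method or a non-dict return value is an assumption beyond what the plan states. The stand-in exception class exists only so the sketch imports without CatBoost installed.

```python
from typing import Any, Dict, List

try:
    from catboost import CatBoostError
except ImportError:  # keeps the sketch importable without CatBoost installed
    class CatBoostError(Exception):
        """Stand-in used only when catboost is unavailable."""


def safe_cb_evals_result(model: Any) -> Dict[str, Dict[str, List[float]]]:
    """Return the model's eval trajectory, or {} if it cannot be read."""
    try:
        result = model.get_evals_result()
    except (CatBoostError, AttributeError):
        # Assumption: an unreadable trajectory must never fail the training run.
        return {}
    # Guard against None or any unexpected return type.
    return result if isinstance(result, dict) else {}
```

An empty dict flows through a per-(valid_name, metric) summary loop as zero events, so the downstream node still runs without special-casing the failure.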

Success criteria

  • event=learning_curve emitted by all 3 models with an identical field schema (parity contract).
  • 203 harness tests green; lint clean; non-behavioural (existing metrics unchanged).
  • MLOps readiness filed (ADR-70): documentation/stories/CVN-N001-EI-S01/mlops_readiness.md.
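
One way to check the parity criterion is to compare the field sets of the events each model emits. The sketch below is an assumption about how such a check might be written, not the contents of test_logging_contract.py; schema_mismatches and the example field names are hypothetical.

```python
from typing import Dict, Iterable, Set


def schema_mismatches(events_by_model: Dict[str, Iterable[dict]]) -> Dict[str, Set[str]]:
    """Return {model: fields missing from at least one of its events}.

    The reference schema is the union of all fields seen across all models,
    so any divergence shows up as a missing field somewhere.
    """
    schemas = {m: [set(e) for e in evs] for m, evs in events_by_model.items()}
    all_fields: Set[str] = set().union(*(s for ss in schemas.values() for s in ss))
    mismatches: Dict[str, Set[str]] = {}
    for model, event_schemas in schemas.items():
        missing: Set[str] = set()
        for schema in event_schemas:
            missing |= all_fields - schema
        if missing:
            mismatches[model] = missing
    return mismatches
```

A unit test would then assert that schema_mismatches returns an empty dict for the events captured from the three models.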