Plan dossier: CVN-N001-EI-S01, Learning-curve instrumentation (Block 1)
Story: CVN-N001-EI-S01 — GH #1056 · OP wp#224 · Epic CVN-N001-EI (#1055, wp#223)
Type: implementation plan (ADR-68). The approach was already approved by committee experiment_review 24745ff4 (PASSED) as Block 1 of the best_iter=1 diagnostic program — this dossier is the per-Story implementation plan that satisfies the G3 guardrail.
Date: 2026-05-24
Problem
The training harness discards the per-round eval trajectory (lightgbm_dag.py:174 used log_evaluation(period=0) with no record_evaluation), so best_iteration is not explainable from Loki alone — the §11.1 observability gap from the best_iter=1 diagnostic study. Block 1 of the §12 plan: instrument the learning curves so every future best_iteration is self-explaining.
Approach (Hamilton-native, per operator)
- New `emit_learning_curve` in `nodes/log_emit.py`: a compact per-`(valid_name, metric)` summary (first / min / last, 1-indexed `argmin`/`argmax`, `n_rounds`, first→min drop). The `argmin` of val `binary_logloss` is the LGB `best_iteration`. No raw per-round array dump (ADR-35 INFO summary, ADR-37 Loki budget).
- One first-class Hamilton node per model (`lgb_learning_curve` / `xgb_learning_curve` / `cb_learning_curve`) that consumes the trajectory the trainer now returns, emits `event=learning_curve`, and is wired as an input to `<model>_artifact` so it always runs. NOT a side-effect buried in the trainer.
- Capture sources: LGB `record_evaluation` on `[val]` only · XGB `evals_result` (train+val, already evaluated for early-stop) · CB `get_evals_result()` (`CatBoostError`-safe).
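The summary math and the node wiring described above can be sketched as follows. This is a minimal illustration, not the code in `nodes/log_emit.py`: the helper names `summarize_curve` and `learning_curve_events`, the field name `first_to_min_drop`, and the input names `lgb_evals_result` and `lgb_model` are assumptions made for the example. It relies only on the nested `{valid_name: {metric: [values]}}` shape that LightGBM's `record_evaluation`, XGBoost's `evals_result` and CatBoost's `get_evals_result()` all produce, and on Hamilton's convention that a function's name is the node name and its parameters are the upstream nodes.

```python
from typing import Dict, List, Sequence


def summarize_curve(values: Sequence[float]) -> Dict[str, float]:
    """Collapse one per-round metric trajectory into a few scalars."""
    values = list(values)
    if not values:
        # Nothing was recorded (e.g. the capture fell back to {}).
        return {"n_rounds": 0}
    lo, hi = min(values), max(values)
    return {
        "first": values[0],
        "min": lo,
        "last": values[-1],
        # 1-indexed so argmin lines up with a 1-based best_iteration;
        # list.index returns the earliest round on ties.
        "argmin": values.index(lo) + 1,
        "argmax": values.index(hi) + 1,
        "n_rounds": len(values),
        "first_to_min_drop": values[0] - lo,  # hypothetical field name
    }


def learning_curve_events(model: str, evals_result: dict) -> List[dict]:
    """Build one summary event per (valid_name, metric) pair."""
    events = []
    for valid_name, metrics in evals_result.items():
        for metric, values in metrics.items():
            events.append({
                "event": "learning_curve",
                "model": model,
                "valid_name": valid_name,
                "metric": metric,
                **summarize_curve(values),
            })
    return events


# Hamilton-style nodes: the function name is the node name, and each
# parameter names an upstream node. Input names here are assumptions.
def lgb_learning_curve(lgb_evals_result: dict) -> List[dict]:
    """Dedicated node that turns the returned trajectory into events."""
    return learning_curve_events("lgb", lgb_evals_result)


def lgb_artifact(lgb_model: object, lgb_learning_curve: List[dict]) -> dict:
    """Depends on the curve node, so requesting the artifact runs it."""
    return {"model": lgb_model, "learning_curve": lgb_learning_curve}


if __name__ == "__main__":
    # A best_iter=1 style trajectory: validation loss only gets worse.
    history = {"val": {"binary_logloss": [0.25, 0.5, 0.75]}}
    for event in lgb_learning_curve(history):
        print(event)
```

Because the artifact function takes the curve node as a parameter, a driver that requests the artifact cannot skip the emission, which is the point of making it a node rather than a trainer side effect.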
Files
- `src/training/harness/nodes/log_emit.py`: `emit_learning_curve`
- `src/training/harness/dags/models/{lightgbm,xgboost,catboost}_dag.py`: trajectory return + dedicated node + artifact wiring
- `tests/unit/training_harness/test_logging_contract.py`: 3-model parity + summary-math unit tests
Risks & mitigations
- Behavioural drift via early-stopping. Mitigated: LGB `valid_sets` unchanged (`[val]` only); capture only observes what training already computes, so `best_iteration` stays bit-identical. (Adding `train` to LGB `valid_sets` was rejected for exactly this reason.)
- Loki volume. Mitigated: compact summary scalars, not 300-float arrays.
- CatBoost API variance. Mitigated: `get_evals_result()` is wrapped and falls back to `{}` (no crash, ADR-25).
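The CatBoost mitigation above amounts to a guarded call. A minimal sketch, assuming a hypothetical helper name `safe_evals_result`; the stand-in exception class exists only so the example runs where `catboost` is not installed, and treating a `None` or empty result as `{}` is an added assumption rather than something the plan specifies:

```python
from typing import Any, Dict, List

try:
    from catboost import CatBoostError
except ImportError:
    # catboost is not installed: define a stand-in so the sketch still runs.
    class CatBoostError(Exception):
        """Stand-in for catboost.CatBoostError."""


def safe_evals_result(model: Any) -> Dict[str, Dict[str, List[float]]]:
    """Return the model's eval history, or {} if CatBoost refuses."""
    try:
        result = model.get_evals_result()
    except CatBoostError:
        # Losing the trajectory must never fail the training run.
        return {}
    # Assumption: treat None or an empty history the same as "no data".
    return result or {}
```

Returning `{}` keeps the downstream summary node on the same code path for all three models: an empty history simply produces no events.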
Success criteria
- `event=learning_curve` emitted by all 3 models with an identical field schema (parity contract).
- 203 harness tests green; lint clean; non-behavioural (existing metrics unchanged).
- MLOps readiness filed (ADR-70): `documentation/stories/CVN-N001-EI-S01/mlops_readiness.md`.
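The parity contract in the first criterion can be checked by comparing field names across models. The sketch below is illustrative only and is not the contents of `test_logging_contract.py`; the helper name `schema_mismatches` and the sample field names are assumptions.

```python
from typing import Dict, List


def schema_mismatches(events_by_model: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each model to the fields it lacks relative to the others.

    An empty result means every model emitted the same set of fields.
    """
    all_fields = set()
    for event in events_by_model.values():
        all_fields |= set(event)
    mismatches = {}
    for model, event in events_by_model.items():
        missing = sorted(all_fields - set(event))
        if missing:
            mismatches[model] = missing
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample events; real ones carry the full summary.
    sample = {"event": "learning_curve", "metric": "logloss", "argmin": 1}
    events = {"lgb": dict(sample), "xgb": dict(sample), "cb": dict(sample)}
    assert schema_mismatches(events) == {}
```

Reporting the missing fields per model, rather than a bare pass or fail, makes a broken contract point directly at the DAG that drifted.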