Hotfix v2 Dossier: Track 5 Production Failure (XGBoost feature-name mismatch)
Date: 2026-04-28
PR: #754
Issue: #753
Story: CVN-N001-EE-S01 (Track 5, OP wp#40 In testing)
Author: Dominique (operator) + Claude
Session type: pr_review (per ADR-68 — substantive code in src/training/)
Severity: CRITICAL (production FTF sweep — second failure on the same surface in <24h)
Predecessor: Hotfix v1 dossier (PR #752 merged commit 7e1e0eeb)
1. What happened: chronology
| When | Event |
|---|---|
| 2026-04-28 12:25 UTC | Operator triggered FTF sweep; incident #1: Hamilton validate_inputs raised on DataFrame inputs (~75% of variants failed) |
| 2026-04-28 ~14:00 UTC | Hotfix v1 (PR #752) merged after committee pr_review PASSED — added defensive coercion DataFrame → ndarray |
| 2026-04-28 ~15:30 UTC | Operator re-triggered FTF sweep — incident #2 (this dossier) |
| 2026-04-28 ~16:00 UTC | Hotfix v2 (PR #754) opened |
2. Production failure observed (incident #2)
```
ValueError: data did not contain feature names, but the following fields are expected:
open, high, low, close, BBL_8_2_0, BBM_8_2_0, BBU_8_2_0, RSI_14, MACD_12_26_9, ...
```
Failure path: `xgb.train(params, dtrain, evals=[(dtrain, "train"), (dval, "val")], ...)` raised inside XGBoost's `_validate_features`.
3. Root cause analysis
After hotfix v1, the coercion layer in `apply_label_pipeline` converted `X_train` from a DataFrame to an ndarray, which drops the feature names. The trainer never passes `X_val` through the pipeline, so it stays a DataFrame with feature names. The XGBoost call sequence becomes:
```python
# src/training/XGBoost/cvntrade_XGBoost_trainer.py
X_train, y_train = datasets["train"]  # DataFrame, Series
X_train, y_train, sw = apply_label_pipeline(X_train, y_train, ...)  # → ndarray (post hotfix v1)
X_val, y_val = datasets["val"]  # DataFrame, Series (UNTOUCHED)

dtrain = xgb.DMatrix(X_train, label=y_train, weight=sw)  # no feature_names
dval = xgb.DMatrix(X_val, label=y_val)  # has feature_names
xgb.train(params, dtrain, evals=[(dtrain, "train"), (dval, "val")], ...)  # ← BOOM
```
XGBoost's `_validate_features` cross-checks `dtrain` against `dval` and raises when one carries feature names and the other does not. Hotfix v1's defensive coercion thus broke an implicit, undocumented contract that the trainer relied on: either both eval-set DMatrices carry feature names, or neither does.
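The implicit contract could be made explicit before any DMatrix is built. The sketch below is one hypothetical way to do that, not code from the repository: `feature_names_of` and `assert_feature_name_parity` are illustrative names, and the sketch assumes a pandas DataFrame is the only input type that carries feature names into a DMatrix.

```python
from typing import Optional

import numpy as np
import pandas as pd


def feature_names_of(X) -> Optional[list]:
    """Column names for a DataFrame, None for anything else (assumption of this sketch)."""
    if isinstance(X, pd.DataFrame):
        return [str(c) for c in X.columns]
    return None


def _describe(names: Optional[list]) -> str:
    return "no feature names" if names is None else f"{len(names)} named columns"


def assert_feature_name_parity(named_inputs: dict) -> None:
    """Fail fast when the feature matrices disagree on feature names.

    named_inputs maps a label such as "train" or "val" to its feature matrix.
    """
    names = {label: feature_names_of(X) for label, X in named_inputs.items()}
    distinct = {None if n is None else tuple(n) for n in names.values()}
    if len(distinct) > 1:
        summary = ", ".join(f"{label}: {_describe(n)}" for label, n in names.items())
        raise TypeError(f"feature-name mismatch across eval sets ({summary})")


# Hypothetical usage: a DataFrame validation set next to a coerced training set.
X_val = pd.DataFrame({"open": [1.0, 2.0], "close": [1.5, 2.5]})
X_train = np.asarray(X_val)  # mimics the ndarray produced by the v1 coercion

assert_feature_name_parity({"train": X_val, "val": X_val})  # same names: passes
try:
    assert_feature_name_parity({"train": X_train, "val": X_val})
except TypeError as exc:
    print(exc)
```

A check of this kind reports the mismatch in the trainer's own terms (which set lost its names) instead of surfacing later as an XGBoost `ValueError`. It also rejects inputs whose column names differ or are ordered differently.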
4. Why hotfix v1's tests didn't catch it
Hotfix v1 added `TestDataFrameCoercion` in `tests/unit/training/labels/test_label_pipeline.py`. Those tests exercised `apply_label_pipeline` in isolation: DataFrame in, ndarray out, with no downstream XGBoost call. They satisfied the Hamilton contract but never replayed the full trainer codepath.
Methodology gap: unit tests of a transform are not enough; its integration with downstream consumers must be tested at the boundary.
5. Hotfix v2: what changed
The fix preserves the input type across the round trip. Internally the transform still coerces to ndarray (Hamilton requires it), but on output it re-wraps the result in the original type with the original metadata.
Patch summary:
- `apply_label_pipeline` saves `X_was_dataframe`, `X_columns`, `y_was_series`, `y_name`, `w_was_series`, `w_name` before the coercion block
- A nested `_restore_input_types(x_out, y_out, w_out)` helper re-wraps the outputs to their original types
- The helper is invoked on both return paths: the identity short-circuit and the Hamilton path
- Net contract: `apply_label_pipeline` is a type-preserving transform (DataFrame in → DataFrame out with the same column names; ndarray in → ndarray out)
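The save, coerce, restore pattern in the list above can be illustrated with a minimal sketch. It is not the patched `apply_label_pipeline`: it uses a decorator rather than a nested helper, the names `type_preserving` and `drop_negative_labels` are hypothetical, sample weights are left out for brevity, and the row-index handling for filtered rows is an assumption of the sketch rather than something the patch summary states.

```python
import numpy as np
import pandas as pd


def type_preserving(transform):
    """Wrap an ndarray-only transform so it hands back the caller's input types.

    `transform(X, y)` is assumed to return (X_out, y_out, kept), where `kept`
    is a boolean mask over the input rows (hypothetical signature).
    """

    def wrapper(X, y):
        # Record the input metadata BEFORE any coercion.
        x_was_dataframe = isinstance(X, pd.DataFrame)
        y_was_series = isinstance(y, pd.Series)
        x_columns = X.columns if x_was_dataframe else None
        x_index = X.index if x_was_dataframe else None
        y_name = y.name if y_was_series else None
        y_index = y.index if y_was_series else None

        # The inner transform only ever sees ndarrays.
        x_out, y_out, kept = transform(np.asarray(X), np.asarray(y))

        # Re-wrap to the original types, keeping the labels of surviving rows.
        if x_was_dataframe:
            x_out = pd.DataFrame(x_out, columns=x_columns, index=x_index[kept])
        if y_was_series:
            y_out = pd.Series(y_out, name=y_name, index=y_index[kept])
        return x_out, y_out

    return wrapper


def drop_negative_labels(X, y):
    """Stand-in for a row-shrinking filter mode: keep rows whose label is >= 0."""
    kept = y >= 0
    return X[kept], y[kept], kept


# Hypothetical usage: DataFrame in, DataFrame out, with one row filtered away.
X = pd.DataFrame({"open": [1.0, 2.0, 3.0], "close": [1.5, 2.5, 3.5]})
y = pd.Series([1, -1, 0], name="label")
X_out, y_out = type_preserving(drop_negative_labels)(X, y)
print(type(X_out).__name__, list(X_out.columns), len(X_out))
```

Capturing the metadata before the coercion matters because `np.asarray` discards column names, the Series name, and the index, so none of them can be recovered from the transform's output afterwards.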
6. New tests: TestTrainerEndToEndDataFrame
Located in tests/integration/test_track5_label_smoothing.py (6 tests added).
The class replays the exact trainer codepath, not the transform in isolation:
```python
def _trainer_xgb_train(self, X_train, y_train, X_val, y_val, sample_weights):
    # Same DMatrix construction and eval list as the production trainer.
    dtrain = xgb.DMatrix(X_train, label=y_train, weight=sample_weights)
    dval = xgb.DMatrix(X_val, label=y_val)
    # Keep the returned booster so the predictions below can be checked.
    booster = xgb.train(params, dtrain, num_boost_round=5,
                        evals=[(dtrain, "train"), (dval, "val")], verbose_eval=False)
    preds = booster.predict(dval)
    assert preds.shape == (X_val.shape[0],)
    assert (preds >= 0.0).all() and (preds <= 1.0).all()
```
Coverage:
1. Four parametrized variants of the `label_smoothing` × `cleanlab` matrix (baseline, mild × off, none × filter, mild × reweight); all four run `xgb.train` end-to-end through the trainer pattern
2. `test_ndarray_inputs_still_return_ndarray`: the reverse direction, with no surprise type promotion
3. `test_dataframe_columns_preserved_through_filter_mode`: column metadata survives the row-shrinking filter mode
The regression bar was proven by a temporary revert: with `_restore_input_types` removed, 5 of these 6 tests fail with the production error message. With the helper re-applied, all 6 pass.
7. Validation
- `pytest tests/unit/training/labels/ tests/integration/test_track5_label_smoothing.py tests/unit/test_ftf_guardrails.py` → 187 passed
- `black --line-length=120` → 1 file reformatted, all green
- `mkdocs build --strict` → green (only pre-existing INFO links)
8. Question for the committee
Is hotfix v2 a complete fix and a sufficient regression bar for this class of bug, or are there sibling failure modes still latent in `apply_label_pipeline` (e.g., dtype coercion, sparse matrix support, weight Series alignment, NaN handling) that should be added to the test class before merge?

Bonus: what is the right durable mechanism to prevent a third incident on this surface (e.g., a trainer-side type contract assertion, an ADR change, a test fixture parity rule between unit and integration tests, …)?
9. Linked context
- ADR-25 (no silent fallback / fail-fast)
- ADR-61 (batch DAGs use Hamilton)
- ADR-68 (Expert Committee for substantive PR)
- ADR-69 (OpenProject as orchestrator): wp#40 stays in `In testing` until the Phase 3 sweep produces results
- OPERATIONS §17 (incident #1) + §17.2 (incident #2, to be added post-merge)
- 6 forward-looking recommendations from the previous committee (DataFrame fixture parity, production smoke gate, cache layer audit): backlog