Hotfix Dossier: Track 5 Production Failure (DataFrame coercion)
Date: 2026-04-28
PR: #751 (commit ee64ad33)
Issue: #750
Story: CVN-N001-EE-S01 (Track 5, OP wp#40 In testing)
Author: Dominique (operator) + Claude
Session type: pr_review (per ADR-68 — substantive code in src/training/)
Severity: CRITICAL (production FTF sweep failure)
1. Production failure observed
Operator triggered FTF sweep on 2026-04-28 12:25 UTC with config:
factor=label_smoothing,cleanlab
crypto_group=defi_top5
phase=manual
power_mode=standard
confirm_long_run=true
Airflow logs showed cascading failures:
[12:25:34] regime_trainer.py:927 ERROR - event=weighted_variant_failed variant=reweight crypto=OPUSDC
ValueError: 2 errors encountered:
Error: Type requirement mismatch.
Expected X_train:<class 'numpy.ndarray'>
got [...DataFrame with columns open, xgb_accel_amplitude_ratio_24_grp2, ...]
[12:25:34] ablation_runner.py:936 ERROR - event=training_variant_error variant=reweight crypto=OPUSDC fold=3 error=training_failed
2. Root cause analysis
apply_label_pipeline in src/training/labels/label_pipeline.py:556 invokes Hamilton:
dr = driver.Driver({}, this_module, adapter=base.SimplePythonGraphAdapter(base.DictResult()))
outputs = dr.execute(["final_X", "final_y", "final_weights"], inputs={
    "raw_y_train": y_train,
    "X_train": X_train,
    ...
})
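The Hamilton nodes (e.g. cv_pred_probs, smoothed_y) declare np.ndarray type hints for these inputs, as the "Expected X_train:<class 'numpy.ndarray'>" line in the log above shows.
Hamilton's validate_inputs (in driver.py:608) raises ValueError when the runtime input type doesn't match the declared type hint.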
Production trainer feeds pd.DataFrame for X_train and pd.Series for y_train (cf. cvntrade_XGBoost_trainer.py:144-145 datasets unpacking) — the upstream cvntrade_autonomous_orchestrator.py propagates pandas types through the cache layer.
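The mismatch described above can be reproduced without Hamilton. The following is a minimal sketch of strict input validation against type hints; it is not Hamilton's implementation. `check_inputs` is a hypothetical helper, and the `smoothed_y` body is a stand-in, since the real node bodies are not shown in this dossier.

```python
import inspect
from typing import get_type_hints

import numpy as np
import pandas as pd


def smoothed_y(raw_y_train: np.ndarray, X_train: np.ndarray) -> np.ndarray:
    """Stand-in node: only the ndarray type hints matter for this sketch."""
    return raw_y_train.astype(float)


def check_inputs(fn, inputs: dict) -> list:
    """Return one message per input whose runtime type violates fn's type hint."""
    hints = get_type_hints(fn)
    errors = []
    for name in inspect.signature(fn).parameters:
        expected = hints.get(name)
        if expected is not None and not isinstance(inputs[name], expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(inputs[name]).__name__}"
            )
    return errors


# Pandas containers, as fed by the production trainer.
X_df = pd.DataFrame({"open": [1.0, 2.0, 3.0]})
y_sr = pd.Series([0, 1, 0])

pandas_errors = check_inputs(smoothed_y, {"raw_y_train": y_sr, "X_train": X_df})
numpy_errors = check_inputs(
    smoothed_y, {"raw_y_train": y_sr.to_numpy(), "X_train": X_df.to_numpy()}
)
print(pandas_errors)  # one message per mismatched input
print(numpy_errors)   # empty once both inputs are ndarrays
```

A pd.DataFrame or pd.Series is not an instance of np.ndarray, so both pandas inputs are rejected before the node body runs, while the same data converted with to_numpy() passes.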
3. Test gap
tests/unit/training/labels/test_label_pipeline.py:_make_imbalanced_dataset returns (np.ndarray, np.ndarray). The integration test in tests/integration/test_track5_label_smoothing.py reuses the same fixture pattern. No test covered DataFrame inputs.
This is the test-vs-production type drift class of bug. The test signal was clean (181/181 pass at merge time) but didn't exercise the production codepath.
4. Fix
Defensive coercion at apply_label_pipeline entry, BEFORE the Hamilton driver invocation:
import pandas as pd
if isinstance(X_train, pd.DataFrame):
    X_train = X_train.to_numpy()
if isinstance(y_train, pd.Series):
    y_train = y_train.to_numpy()
if base_sample_weights is not None and isinstance(base_sample_weights, pd.Series):
    base_sample_weights = base_sample_weights.to_numpy()
- Conditional → no-op when input is already ndarray (existing tests unchanged)
- Pandas import in try/except → helper stays runnable in minimal envs without pandas (defensive, even if pandas is always present in this repo)
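The conditional coercion and the optional pandas import described in these two points can be combined into one small helper. This is a minimal sketch and not the code from PR #751: `to_ndarray` is a hypothetical name, and setting `pd = None` on ImportError is one assumed way to handle an environment without pandas.

```python
import numpy as np

try:
    import pandas as pd
except ImportError:  # assumption: fall back to a pass-through when pandas is absent
    pd = None


def to_ndarray(value):
    """Convert pandas containers to np.ndarray; return anything else unchanged."""
    if pd is not None and isinstance(value, (pd.DataFrame, pd.Series)):
        return value.to_numpy()
    return value


# ndarray and None inputs take the no-op path, so existing callers are unaffected.
arr = np.arange(6).reshape(3, 2)
assert to_ndarray(arr) is arr
assert to_ndarray(None) is None
```

Because None passes through unchanged, an optional argument such as base_sample_weights needs no separate None check in this form.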
Plus a new test class TestDataFrameCoercion (4 cases):
1. Identity off-path with DataFrame inputs (no Hamilton invocation, just the short-circuit)
2. Smoothing-only with DataFrame inputs (Hamilton runs, no cleanlab CV)
3. Full Hamilton + cleanlab filter mode with DataFrame inputs (cleanlab CV runs, suspect mask, filter-mode dataset shrinkage)
4. base_sample_weights as pd.Series with cleanlab reweight mode
5. Why not "fix the type hints to accept Union[ndarray, DataFrame]"
Considered. Rejected because:
- Hamilton's strict validation is a feature, not a bug: it catches type drift at validate-time, before the function body runs
- Loosening the type hints would make every Hamilton node accept either type, but the downstream code (cleanlab, np.where, etc.) would still need defensive handling
- Coercing once at entry is the minimal change; the Hamilton dataflow stays strictly typed
- Per ADR-25 spirit (no silent fallback): coercion is explicit, observable in code review, regression-tested
6. Severity / blast radius
- CRITICAL for FTF sweep results: Track 5 gate decision blocked until fix merges
- NOT IMPACTING LIVE TRADING: Track 5 code is FTF-only, no production trade decisions go through apply_label_pipeline outside the sweep context
- Bug existed since PR #734 merge (commit 77aa6389, ~6 hours ago)
- The identity short-circuit baseline path (label_smoothing=none × cleanlab=off) is unaffected: it returns early before any Hamilton call
- No data corruption: the failed variants raise loudly and are recorded as training_failed in finetune_results; they don't pollute the analysis (they're filterable by status)
7. Verification
- 4 new TestDataFrameCoercion tests pass
- Pre-existing 177 tests still pass (no regression)
- mkdocs build --strict SUCCESS
- black --line-length=120 src/training/labels/ tests/unit/training/labels/ clean
- CI all green on PR #751
- Operator re-triggers FTF sweep post-merge → 125+ clean rows in finetune_results
8. Lessons / follow-up
- Test fixture parity: going forward, every new apply_* helper that integrates with the trainer MUST have at least one test fixture that uses pd.DataFrame + pd.Series (matching production). To be added to the MLOps readiness template (ADR-70) §6 sign-off checklist as a recommended item, OR enforced by a per-helper unit test convention.
- Hamilton validation: the type-hint contract is load-bearing for catching this class of bug. Don't loosen it; coerce at entry instead.
- Production smoke before declaring victory: Track 5 was committee-approved and merged with green tests, but the production smoke (FTF sweep) wasn't part of the merge gate. Consider adding a "smoke variant" Story step that runs 1 crypto × 1 fold × 1 variant on the cluster as a pre-merge gate for substantive ML code (out of scope here, possible follow-up under CVN-N010 or process ADR amendment).
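The fixture-parity convention from the first lesson could look roughly like the sketch below. It is only an illustration under stated assumptions: `make_dataset_variants`, `check_helper_on_all_variants`, and `coercing_helper` are hypothetical names, the data is synthetic, and the stand-in helper takes the place of a real apply_* function.

```python
import numpy as np
import pandas as pd


def make_dataset_variants(n_rows: int = 20, n_features: int = 3, seed: int = 0) -> dict:
    """Return the same synthetic (X, y) data as ndarrays and as pandas objects."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_rows, n_features))
    y = rng.integers(0, 2, size=n_rows)
    columns = [f"f{i}" for i in range(n_features)]
    return {
        "numpy": (X, y),
        "pandas": (pd.DataFrame(X, columns=columns), pd.Series(y, name="label")),
    }


def check_helper_on_all_variants(helper) -> None:
    """Run helper(X, y) on every variant and require identical ndarray outputs."""
    variants = make_dataset_variants()
    results = {name: helper(X, y) for name, (X, y) in variants.items()}
    X_ref, y_ref = results["numpy"]
    for name, (X_out, y_out) in results.items():
        assert isinstance(X_out, np.ndarray), name
        assert isinstance(y_out, np.ndarray), name
        np.testing.assert_array_equal(X_out, X_ref)
        np.testing.assert_array_equal(y_out, y_ref)


def coercing_helper(X, y):
    """Hypothetical stand-in for an apply_* helper that coerces at entry."""
    return np.asarray(X), np.asarray(y)


check_helper_on_all_variants(coercing_helper)
```

A helper that passes its inputs through without coercion fails the isinstance check on the "pandas" variant, which is the signal the ndarray-only fixture could not give. In a pytest suite, the variant names could equally be fed to pytest.mark.parametrize.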
9. Questions for committee
- Is the entry-point coercion the right boundary, or should the Hamilton nodes accept Union[ndarray, DataFrame] directly?
- Should we add a regression test fixture convention (always include a DataFrame variant) to the MLOps readiness template?
- Should we add a "production smoke" gate (1×1×1 sweep) to the merge process for ML code touching the FTF surface?
- Severity classification: do we treat this as a bug that needs a post-mortem entry in documentation/OPERATIONS.md, or is the PR + test coverage enough?
10. Acceptance criteria
- PR #751 mergeable (CI green)
- Committee pr_review ≥ ACCEPTED, no blockers
- Operator re-triggers FTF sweep → 125+ rows succeed
- Track 5 gate decision can proceed (per F1 plan §6)
11. References
- Track 5 implementation: PR #734 commit 77aa6389
- Plan dossier: documentation/reviews/2026-04-28-track5-label-smoothing-plan.md
- Plan committee: session 8a202a18 PASSED OK 8.7
- pr_review committee round 1: session 989a6567 REJECTED METHODOLOGY_FLAW (3 blockers fixed) → round 2 1bde4bc2 PASSED OK 8.2
- ADR-25 (no silent fallback, the basis of explicit coercion + tests)
- ADR-68 (this committee call)
- ADR-69 (Story discipline: wp#40 stays In testing)
- Issues: #712 (parent Story), #750 (this hotfix)