
Hotfix Dossier — Track 5 Production Failure (DataFrame coercion)

Date: 2026-04-28
PR: #751 (commit ee64ad33)
Issue: #750
Story: CVN-N001-EE-S01 (Track 5, OP wp#40 In testing)
Author: Dominique (operator) + Claude
Session type: pr_review (per ADR-68: substantive code in src/training/)
Severity: CRITICAL (production FTF sweep failure)


1. Production failure observed

The operator triggered an FTF sweep on 2026-04-28 at 12:25 UTC with the following config:

factor=label_smoothing,cleanlab
crypto_group=defi_top5
phase=manual
power_mode=standard
confirm_long_run=true

The Airflow logs showed cascading failures:

[12:25:34] regime_trainer.py:927 ERROR - event=weighted_variant_failed variant=reweight crypto=OPUSDC
ValueError: 2 errors encountered:
  Error: Type requirement mismatch.
  Expected X_train:<class 'numpy.ndarray'>
  got [...DataFrame with columns open, xgb_accel_amplitude_ratio_24_grp2, ...]
[12:25:34] ablation_runner.py:936 ERROR - event=training_variant_error variant=reweight crypto=OPUSDC fold=3 error=training_failed

2. Root cause analysis

apply_label_pipeline (src/training/labels/label_pipeline.py:556) invokes the Hamilton driver:

dr = driver.Driver({}, this_module, adapter=base.SimplePythonGraphAdapter(base.DictResult()))
outputs = dr.execute(["final_X", "final_y", "final_weights"], inputs={
    "raw_y_train": y_train,
    "X_train": X_train,
    ...
})

The Hamilton nodes (e.g. cv_pred_probs, smoothed_y) declare np.ndarray type hints:

def cv_pred_probs(X_train: np.ndarray, smoothed_y: np.ndarray, ...) -> Optional[np.ndarray]: ...

Hamilton's validate_inputs (in driver.py:608) raises ValueError when the runtime input type doesn't match the declared type hint.

The production trainer passes a pd.DataFrame for X_train and a pd.Series for y_train (cf. the datasets unpacking in cvntrade_XGBoost_trainer.py:144-145); the upstream cvntrade_autonomous_orchestrator.py propagates pandas types through the cache layer. Neither is an np.ndarray, so validation fails before any node body runs.
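The mismatch can be reproduced in miniature. The sketch below is a simplified stand-in for a validate-time type check, not Hamilton's actual implementation: check_inputs_against_hints is a hypothetical helper that compares each runtime input against the node's annotation with isinstance, and the toy node and two-column frame are made up for illustration.

```python
import inspect
from typing import get_type_hints

import numpy as np
import pandas as pd


def smoothed_y(raw_y_train: np.ndarray, X_train: np.ndarray) -> np.ndarray:
    """Toy node: its annotations are the contract being checked."""
    return raw_y_train.astype(float)


def check_inputs_against_hints(node, inputs: dict) -> list:
    """Return one message per input whose runtime type violates the node's hint.

    Hypothetical, simplified stand-in for a validate-time check.
    """
    hints = get_type_hints(node)
    errors = []
    for name in inspect.signature(node).parameters:
        expected = hints[name]
        if not isinstance(inputs[name], expected):
            errors.append(f"Expected {name}:{expected} got {type(inputs[name]).__name__}")
    return errors


# Production-shaped inputs: pandas containers, as the trainer passes them.
X_df = pd.DataFrame({"open": [1.0, 2.0, 3.0], "close": [1.5, 2.5, 3.5]})
y_series = pd.Series([0, 1, 0])
pandas_errors = check_inputs_against_hints(
    smoothed_y, {"raw_y_train": y_series, "X_train": X_df}
)

# Test-shaped inputs: plain ndarrays, as the original fixtures produced them.
numpy_errors = check_inputs_against_hints(
    smoothed_y, {"raw_y_train": y_series.to_numpy(), "X_train": X_df.to_numpy()}
)
```

With pandas inputs both parameters are flagged; with ndarray inputs the list is empty, which is why an ndarray-only test suite never saw the failure.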

3. Test gap

tests/unit/training/labels/test_label_pipeline.py:_make_imbalanced_dataset returns (np.ndarray, np.ndarray). The integration test in tests/integration/test_track5_label_smoothing.py reuses the same fixture pattern. No test covered DataFrame inputs.

This is a test-versus-production type-drift bug: the test signal was clean (181/181 passing at merge time), but no test exercised the production code path.

4. Fix

The fix adds defensive coercion at the entry of apply_label_pipeline, before the Hamilton driver is invoked:

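try:
    import pandas as pd
except ImportError:  # minimal environment without pandas: nothing to coerce
    pd = None

if pd is not None:
    # Hamilton nodes are typed np.ndarray; convert pandas containers once, at entry.
    if isinstance(X_train, pd.DataFrame):
        X_train = X_train.to_numpy()
    if isinstance(y_train, pd.Series):
        y_train = y_train.to_numpy()
    if isinstance(base_sample_weights, pd.Series):  # None is never a Series
        base_sample_weights = base_sample_weights.to_numpy()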
  • Conditional → no-op when input is already ndarray (existing tests unchanged)
  • Pandas import in try/except → helper stays runnable in minimal envs without pandas (defensive, even if pandas is always present in this repo)

Plus a new test class TestDataFrameCoercion (4 cases):

  1. Identity off-path with DataFrame inputs (no Hamilton invocation, just the short-circuit)
  2. Smoothing-only with DataFrame inputs (Hamilton runs, no cleanlab CV)
  3. Full Hamilton + cleanlab filter mode with DataFrame inputs (cleanlab CV runs, suspect mask, filter-mode dataset shrinkage)
  4. base_sample_weights as pd.Series with cleanlab reweight mode
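A parity check of this kind can be sketched as follows. This is an illustration under assumptions, not the actual TestDataFrameCoercion code: make_imbalanced_dataset (with its as_pandas flag) and coerce_inputs are hypothetical stand-ins for the test fixture and for the entry-point coercion, and the column names are invented.

```python
import numpy as np
import pandas as pd


def make_imbalanced_dataset(as_pandas: bool, n: int = 20, seed: int = 0):
    """Build a small imbalanced dataset, optionally in production-shaped pandas containers."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    y = (np.arange(n) % 5 == 0).astype(int)  # every fifth row is a positive
    if as_pandas:
        return pd.DataFrame(X, columns=["f0", "f1", "f2"]), pd.Series(y, name="label")
    return X, y


def coerce_inputs(X_train, y_train):
    """Hypothetical stand-in for the entry-point coercion in apply_label_pipeline."""
    if isinstance(X_train, pd.DataFrame):
        X_train = X_train.to_numpy()
    if isinstance(y_train, pd.Series):
        y_train = y_train.to_numpy()
    return X_train, y_train


def check_parity():
    """Both input flavours must reach the pipeline body as identical ndarrays."""
    X_np, y_np = coerce_inputs(*make_imbalanced_dataset(as_pandas=False))
    X_pd, y_pd = coerce_inputs(*make_imbalanced_dataset(as_pandas=True))
    assert isinstance(X_pd, np.ndarray) and isinstance(y_pd, np.ndarray)
    np.testing.assert_array_equal(X_np, X_pd)
    np.testing.assert_array_equal(y_np, y_pd)
    return X_pd.shape, int(y_pd.sum())


shape, n_positive = check_parity()
```

Generating both flavours from one fixture keeps the ndarray and pandas cases from drifting apart as the fixture evolves.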

5. Why not "fix the type hints to accept Union[ndarray, DataFrame]"

Considered. Rejected because:

  • Hamilton's strict validation is a feature, not a bug: it catches type drift at validate-time, before the function body runs
  • Loosening the type hints would make every Hamilton node accept either type, but the downstream code (cleanlab, np.where, etc.) would still need defensive handling
  • Coercing once at entry is the minimal change; the Hamilton dataflow stays strictly typed
  • In the spirit of ADR-25 (no silent fallback), the coercion is explicit, observable in code review, and regression-tested
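To illustrate why widening the hints alone would not be enough, consider one operation that behaves differently on the two container types. The dossier does not list the exact downstream operations, so positional row selection is an assumed example, and select_rows is a hypothetical helper.

```python
import numpy as np
import pandas as pd

X_np = np.arange(12.0).reshape(4, 3)
X_df = pd.DataFrame(X_np, columns=["open", "high", "low"])
keep_rows = np.array([0, 2])  # e.g. row positions that survive a filter step


def select_rows(X, rows):
    """Downstream-style positional row selection, written with ndarrays in mind."""
    return X[rows]


ndarray_result = select_rows(X_np, keep_rows)  # rows 0 and 2

try:
    select_rows(X_df, keep_rows)
    dataframe_outcome = "ok"
except KeyError:
    # DataFrame.__getitem__ treats an integer array as column labels, not row positions.
    dataframe_outcome = "KeyError"

# Coercing once at entry restores ndarray semantics for every later step.
coerced_result = select_rows(X_df.to_numpy(), keep_rows)
```

A node typed as Union[ndarray, DataFrame] would pass validation here and then fail inside the function body, which is the later, noisier failure mode that strict validation is meant to avoid.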

6. Severity / blast radius

  • CRITICAL for FTF sweep results — Track 5 gate decision blocked until fix merges
  • NOT IMPACTING LIVE TRADING — Track 5 code is FTF-only, no production trade decisions go through apply_label_pipeline outside the sweep context
  • Bug present since PR #734 merged (commit 77aa6389), about 6 hours before this dossier was written
  • The identity short-circuit baseline path (label_smoothing=none × cleanlab=off) is unaffected — it returns early before any Hamilton call
  • No data corruption: the failed variants raise loudly and are recorded as training_failed in finetune_results; they do not pollute the analysis because they are filterable by status

7. Verification

  • 4 new TestDataFrameCoercion tests pass
  • Pre-existing 177 tests still pass (no regression)
  • mkdocs build --strict SUCCESS
  • black --line-length=120 src/training/labels/ tests/unit/training/labels/ clean
  • CI all green on PR #751
  • Operator re-triggers FTF sweep post-merge → 125+ clean rows in finetune_results

8. Lessons / follow-up

  • Test fixture parity: going forward, every new apply_* helper that integrates with the trainer MUST have at least one test fixture that uses pd.DataFrame + pd.Series (matching production). This should be added to the MLOps readiness template (ADR-70) §6 sign-off checklist as a recommended item, or enforced by a per-helper unit test convention.
  • Hamilton validation: the type-hint contract is load-bearing for catching this class of bug, so do not loosen it. Coerce at entry instead.
  • Production smoke before declaring victory: Track 5 was committee-approved and merged with green tests, but the production smoke (FTF sweep) was not part of the merge gate. Consider adding a "smoke variant" Story step that runs 1 crypto × 1 fold × 1 variant on the cluster as a pre-merge gate for substantive ML code (out of scope here; possible follow-up under CVN-N010 or a process ADR amendment).

9. Questions for committee

  1. Is the entry-point coercion the right boundary, or should the Hamilton nodes accept Union[ndarray, DataFrame] directly?
  2. Should we add a regression test fixture convention (always include a DataFrame variant) to the MLOps readiness template?
  3. Should we add a "production smoke" gate (1×1×1 sweep) to the merge process for ML code touching the FTF surface?
  4. Severity classification: do we treat this as a bug that needs a post-mortem entry in documentation/OPERATIONS.md, or is the PR + test coverage enough?

10. Acceptance criteria

  • PR #751 mergeable (CI green)
  • Committee pr_review ≥ ACCEPTED, no blockers
  • Operator re-triggers FTF sweep → 125+ rows succeed
  • Track 5 gate decision can proceed (per F1 plan §6)

11. References

  • Track 5 implementation: PR #734, commit 77aa6389
  • Plan dossier: documentation/reviews/2026-04-28-track5-label-smoothing-plan.md
  • Plan committee: session 8a202a18 PASSED OK 8.7
  • pr_review committee round 1: session 989a6567 REJECTED METHODOLOGY_FLAW (3 blockers fixed) → round 2 1bde4bc2 PASSED OK 8.2
  • ADR-25 (no silent fallback; basis of the explicit coercion + tests)
  • ADR-68 (this committee call)
  • ADR-69 (Story discipline; wp#40 stays In testing)
  • Issues: #712 (parent Story), #750 (this hotfix)