Plan dossier — CVN-N001-EE-S19 : Harness over-trade fix

Story : CVN-N001-EE-S19 (OP wp#165, GH #940)
Parent epic : CVN-N001-EE — F1_buy boost (10-track plan)
Predecessor : CVN-N001-EE-S18 (closed 2026-05-14, OP wp#154)
Author : Operator + Claude
Date : 2026-05-14
Status : v2 — committee plan_review PASSED_WITH_REVISIONS 2026-05-14 (session 1f4335a2, OP Meeting #135, strong consensus, 0 blockers, 8 recommendations integrated below — see §13)

1. Context

S18 (closed 2026-05-14) ran a 4-step diagnostic chain on the AAVEUSDC fold=3 cell to localise the post-#891 harness regression. Steps 0-4 ruled out hypotheses H1-H7 of the parent dossier §5.1 :

Step	Verdict	What it ruled out
Step 0	PASS — replay reproduces canonical f1=0.3520	reproducibility OK
Step 1	(capture, no verdict)	full Optuna trial trace serialized to parquet
Step 2	(analysis dossier)	side-by-side legacy vs harness code diff
Step 3	REFUTED on H2 (`valid_sets` composition)	both `[train,val]` and `[val]` produce `best_iter=1`
Step 4	NO_DIVERGENCE on F1-F6	data is clean (label, features, drift, iter-1 probe all PASS)

Phase A logs (chained DAG run 2026-05-14 14:44) revealed the actual mechanism :

event=training_complete model_type=lightgbm best_iteration=1 training_time_sec=2.465
  theta_picked=0.2 f1_buy_val=0.352 auc_buy_val=0.6461 rate_buy_val=0.4611
event=signal_funnel raw_buy_signals=1210 final_trades=251 primary_killer=concurrency
event=weighted_variant_evaluated sortino=-9.512 n_trades=251 return=-91.35%

The model is NOT broken at the booster level (AUC 0.65, calibration acceptable) — it's broken at the post-training θ selection : scale_pos_weight=4.71 (auto-injected by class_balance.py) inflates positive-class probas, the harness θ-sweep over [0.05, 0.95] finds θ=0.2 because that's where f1_val maxes on the inflated distribution, and the model emits BUY 46 % of the time on val → catastrophic backtest.
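As a rough illustration of this coupling, the sketch below assumes an idealised, well-calibrated binary classifier, for which weighting the positive class by w multiplies the predicted odds by w. It also assumes that scale_pos_weight is computed as n_neg / n_pos, which is the conventional choice but is not confirmed here for class_balance.py. The function names inflate and deflate are hypothetical.

```python
def inflate(p: float, w: float) -> float:
    """Probability after multiplying the odds p / (1 - p) by w (idealised model)."""
    return w * p / (w * p + (1.0 - p))


def deflate(q: float, w: float) -> float:
    """Inverse of inflate: recover the unweighted probability from q."""
    return q / (q + w * (1.0 - q))


W = 4.71  # scale_pos_weight reported for the AAVEUSDC fold=3 cell

# Assumption: W = n_neg / n_pos, so the implied positive base rate is 1 / (1 + W).
base_rate = 1.0 / (1.0 + W)

# Map the picked threshold (0.2 on the weighted scale) back to the unweighted scale.
theta_unweighted = deflate(0.2, W)

print(f"implied base rate : {base_rate:.3f}")
print(f"theta=0.2 deflated: {theta_unweighted:.3f}")
```

Under these assumptions, θ=0.2 on the weighted scale corresponds to roughly 0.05 on the unweighted scale, while the implied positive base rate is about 0.18. That is at least consistent with a model that emits BUY on 46 % of validation rows.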

The S18 Step 5 dossier (committee experiment_review PASSED_WITH_REVISIONS) re-scoped this as H8 = scale_pos_weight auto-injection × wide θ-sweep range coupling. H8 was not in the parent dossier's H1-H7 enumeration — gap in the original analysis.

The committee also validated 3 latent bugs found by operator code audit (§6.bis of Step 5 dossier) :

Bug 1 — class_balance.py:55-62 fails fast on n_pos == 0 only, NOT on n_neg == 0 → silent scale_pos_weight=0 on degenerate splits
Bug 2 — theta_sweep.py:59 + eval_metrics.py:69 are missing labels=[0, 1] → wrong f1 on mono-class splits
Bug 3 — adapters/lgb.py:42-43 strips DataFrame column names via .to_numpy() → silently wrong predictions if column order drifts. LGB-only (XGB + CB safe).

S19 is the remediation Story.

2. Goals

Restore reasonable trading rate on the LGB harness path : rate_buy_val ≤ 0.20 (from current 0.46) on the AAVEUSDC fold=3 cell, with no degradation > 5 % on f1_buy_val vs the post-S17 canonical reference (0.3520485).
Backtest sortino > 0 on AAVEUSDC fold=3 + cross-fold validation cells.
Fail-fast on degenerate training splits (Bug 1) — no more silent scale_pos_weight=0.
Correct f1 measurement on mono-class splits (Bug 2) — Optuna no longer optimises on broken signal.
Preserve DataFrame column order at LGB inference (Bug 3) — eliminate the silent feature-scrambling risk in backtest / live inference.
Per-model θ-sweep range configurable via the existing ADR-90 PG-keys mechanism — no hardcoded constants.

3. Non-goals

Disabling scale_pos_weight globally (S18 committee Option A REJECTED — insufficient evidence on XGB/CB regression risk).
Adding scale_pos_weight to Optuna search (S18 committee Option D REJECTED — same f1-on-val attractor, would re-converge to the same over-trading optimum).
Re-designing the θ-sweep to optimise sortino instead of f1_val (Option E — major scope, future Story).
Touching XGB or CB adapters / θ-sweep (the regression is LGB-specific per the S18 evidence ; XGB has its own f1=0.089 issue covered in a separate Story scope).
Modifying the FTF data prep, labeling, or feature pipeline (Step 4 confirmed the data is clean).
Touching the autonomous trainer entry point (the regression is in the harness post-training nodes).

4. Architecture

4.1 Two-PR split (committee directive)

The committee verdict (Step 5 dossier §9) explicitly rejected Option F (bundling H8 + bug fixes in one PR) in favour of a clean revert envelope and independent CR cycles. S19 ships as 2 PRs :

PR	Scope	Surface	Behavioural change
S19 main	H8 fix (LGB θ range + over-trade guard) + Bug 2 (BOTH files)	5 files : `theta_sweep.py`, `eval_metrics.py`, `lightgbm_dag.py`, `hyperparams.py` registration, PG seed	Yes — alters the θ pick on LGB
S19-hardening	3 defensive fixes : Bug 1 (`class_balance.py`) + Bug 3 (`adapters/lgb.py`) + bool extension to `hyperparams.py`	3 files : `class_balance.py`, `adapters/lgb.py`, `hyperparams.py` (bool branch + `_PARAM_TYPES` entry)	No on healthy paths ; defensive fail-fast on degenerate paths

CR PR #941 r3 BLOCKER #1 resolved : the v3 split had Bug 2 (theta_sweep.py:59 + eval_metrics.py:69) in the hardening PR, but theta_sweep.py is also modified by the main PR (θ range + guard). That contradicted the "independently mergeable" claim and would have produced a merge conflict regardless of merge order. Resolution : Bug 2 (both files) MOVED to S19 main, where theta_sweep.py is touched anyway and eval_metrics.py shares the same precision_recall_fscore_support signature. S19-hardening becomes strictly orthogonal to S19 main on the file surface.

The 2 PRs now overlap on a single file, hyperparams.py, and only on different lines (see the table below) ; theta_sweep.py is no longer shared. Recommended merge order remains : hardening first (defensive, low risk), then main (behavioural, higher review cost). The hardening PR also lands the bool extension to hyperparams.py, which the main PR depends on for FEATURE_ORDER_STRICT (per CR r3 reco #5, main must NOT use the temporary os.environ.get bridge).

Concern	Pre-r3 (v3)	Post-r3 (v4)
`theta_sweep.py`	Touched by BOTH (Bug 2 + θ range) — merge conflict	Touched ONLY by main
`eval_metrics.py`	Touched by hardening (Bug 2)	Touched ONLY by main (bundled with `theta_sweep.py` Bug 2)
`class_balance.py`	Touched ONLY by hardening (Bug 1)	Unchanged — only by hardening
`adapters/lgb.py`	Touched ONLY by hardening (Bug 3)	Unchanged — only by hardening
`hyperparams.py`	Touched by main (param registration)	Touched by BOTH but on DIFFERENT lines — main adds new keys to `_PARAM_TYPES`, hardening adds bool branch + `FEATURE_ORDER_STRICT` entry. Merge order : hardening first to avoid conflict in `_PARAM_TYPES` dict.
`lightgbm_dag.py`	Touched by main only	Touched by main only

4.2 S19 main PR — H8 fix design

4.2.1 Per-model θ-sweep range via ADR-90 keys

The current theta_sweep.pick_threshold_on_val() hard-codes the candidates np.linspace(0.05, 0.95, 19) + [0.5]. S19 main adds two kwargs theta_min + theta_max :

# src/training/harness/nodes/theta_sweep.py
def pick_threshold_on_val(
    y_val: np.ndarray,
    p_buy_val: np.ndarray,
    *,
    theta_min: float = 0.05,
    theta_max: float = 0.95,
) -> ThetaPick:
    """..."""
    # CR PR #941 r2 reco #1 — when (theta_min, theta_max) is restricted (e.g.
    # [0.30, 0.40] for the LGB tightening), the canonical [0.5] anchor MUST
    # NOT be appended unconditionally — otherwise the picker can return θ=0.5
    # outside the configured bounds, contradicting the AC.
    if not (0.0 < theta_min < theta_max < 1.0):
        raise ValueError(
            f"pick_threshold_on_val: invalid bounds — "
            f"requires 0 < theta_min ({theta_min}) < theta_max ({theta_max}) < 1"
        )
    candidates = np.linspace(theta_min, theta_max, 19)
    if theta_min <= 0.5 <= theta_max:
        candidates = np.concatenate([candidates, [0.5]])
    candidates = np.unique(candidates)
    # rest unchanged
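The candidate-grid logic above can be exercised in isolation. The following is a minimal sketch, assuming only NumPy ; build_candidates is a hypothetical helper name, not part of the harness.

```python
import numpy as np


def build_candidates(theta_min: float = 0.05, theta_max: float = 0.95) -> np.ndarray:
    """Hypothetical helper: 19-point grid plus the 0.5 anchor when it is in range."""
    if not (0.0 < theta_min < theta_max < 1.0):
        raise ValueError(
            f"requires 0 < theta_min ({theta_min}) < theta_max ({theta_max}) < 1"
        )
    candidates = np.linspace(theta_min, theta_max, 19)
    if theta_min <= 0.5 <= theta_max:
        candidates = np.concatenate([candidates, [0.5]])
    # np.unique sorts and drops exact duplicates only; values that differ by
    # floating-point rounding are kept as separate candidates.
    return np.unique(candidates)


tight = build_candidates(0.30, 0.40)  # LGB tightening: anchor excluded
wide = build_candidates()             # default range: anchor included

print(tight.min(), tight.max(), len(tight))
print(0.5 in wide)
```

Because np.unique removes exact duplicates only, the default grid may end up with two near-identical values around 0.5 if np.linspace does not land on 0.5 exactly. This is harmless for the pick itself, but it is worth keeping in mind if a test asserts on n_candidates.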

The caller (lightgbm_dag.py) resolves the LGB bounds via the ADR-90 resolver :

# src/training/harness/dags/models/lightgbm_dag.py
from commun.finetune.hyperparams import resolve

theta_min = resolve("LGB", tf, "THETA_MIN", fallback=0.30)  # legacy lgbm_config.py:117
theta_max = resolve("LGB", tf, "THETA_MAX", fallback=0.40)
lgb_theta_pick = pick_threshold_on_val(y_val, p_buy_val, theta_min=theta_min, theta_max=theta_max)

PG seeding (Console UI, NOT git per ADR-59) :

CVN_HPO_LGB_5M_THETA_MIN = 0.30
CVN_HPO_LGB_5M_THETA_MAX = 0.40
(and per-timeframe entries if needed)

theta_sweep.py stays generic ; only LGB callers tighten the range. CB and any future model keep the wide default by not passing the kwargs.
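To make the key scheme and the fallback semantics concrete, here is a small sketch that uses a plain mapping as a stand-in for PG. Both pg_key and resolve_sketch are hypothetical names ; the real commun.finetune.hyperparams.resolve may differ in signature, parsing and error type.

```python
from typing import Mapping, Optional


def pg_key(model: str, timeframe: str, param: str) -> str:
    """Build a key following the CVN_HPO_<MODEL>_<TF>_<PARAM> scheme."""
    return f"CVN_HPO_{model}_{timeframe}_{param}".upper()


def resolve_sketch(
    store: Mapping[str, str],
    model: str,
    timeframe: str,
    param: str,
    fallback: Optional[float] = None,
) -> float:
    """Hypothetical stand-in for hyperparams.resolve, backed by a plain mapping."""
    key = pg_key(model, timeframe, param)
    if key in store:
        return float(store[key])
    if fallback is None:
        # Mirrors the documented semantics: fallback=None means "raise if unset".
        raise RuntimeError(f"{key} is not seeded and no fallback was given")
    return fallback


# Only THETA_MIN is seeded in this toy store.
seeded = {"CVN_HPO_LGB_5M_THETA_MIN": "0.30"}

theta_min = resolve_sketch(seeded, "LGB", "5M", "THETA_MIN", fallback=0.30)
theta_max = resolve_sketch(seeded, "LGB", "5M", "THETA_MAX", fallback=0.40)  # unset, uses fallback
```

With code fallbacks of 0.30 and 0.40, an unseeded key still yields the tightened range rather than the legacy wide one.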

4.2.2 Over-trade guard (committee enhancement)

Inside theta_sweep.pick_threshold_on_val(), after the best θ is computed, calculate the resulting rate_buy_val and emit a structured warning if it exceeds a threshold :

# src/training/harness/nodes/theta_sweep.py — additional logic
def pick_threshold_on_val(
    y_val,
    p_buy_val,
    *,
    theta_min=0.05,
    theta_max=0.95,
    rate_buy_warn_threshold: float = 0.20,
    rate_buy_fail_threshold: float = 0.25,
) -> ThetaPick:
    # ... pick best_t as before ...
    rate_buy_val = float((p_buy_val >= best_t).mean())  # share of val rows emitted as BUY
    if rate_buy_val > rate_buy_warn_threshold:
        log_event(
            logger,
            "theta_overtrade_warning",
            theta_picked=best_t,
            rate_buy_val=rate_buy_val,
            rate_buy_warn_threshold=rate_buy_warn_threshold,
            rate_buy_fail_threshold=rate_buy_fail_threshold,
            f1_buy=best_f1,
        )
    if rate_buy_val > rate_buy_fail_threshold:
        raise RuntimeError(
            f"theta_sweep over-trade guard fired : rate_buy_val={rate_buy_val:.3f} > "
            f"fail_threshold={rate_buy_fail_threshold:.3f} (theta={best_t})"
        )
    return ThetaPick(threshold=best_t, f1_buy=best_f1, n_candidates=len(candidates))

Both thresholds are PG-mandatory : there is no optional / fallback=None semantics, because the existing hyperparams.resolve API treats fallback=None as "raise if unset", which is incompatible with an opt-in unset state (CR PR #941 r2 reco #3).

CVN_HPO_LGB_5M_RATE_BUY_WARN_THRESHOLD = 0.20 (committee reco #2 : initial heuristic ; the S21 calibration spike in §10 is meant to replace it)
CVN_HPO_LGB_5M_RATE_BUY_FAIL_THRESHOLD = 0.25 (committee reco #5 : 5 percentage points above warn, enables fail-fast on extreme over-trade)

The behaviour band is :

rate_buy_val ≤ 0.20 → run proceeds, no log
0.20 < rate_buy_val ≤ 0.25 → run proceeds, event=theta_overtrade_warning emitted
rate_buy_val > 0.25 → event=theta_overtrade_warning emitted, then the run fails with RuntimeError

To restore warn-only behaviour temporarily (e.g. for a debugging window), the operator sets CVN_HPO_LGB_5M_OVERTRADE_GUARD_MODE = warn_only via Console UI (CR r3 reco #7 — replaces the previous FAIL=1.0 magic-number escape with an explicit semantic flag, aligning with ADR-90 patterns). The two accepted values :

warn_only — only the event=theta_overtrade_warning is emitted ; the RATE_BUY_FAIL_THRESHOLD is ignored, no RuntimeError is ever raised
fail (default) — full behaviour as described above ; warn at 0.20, fail at 0.25

The OVERTRADE_GUARD_MODE is a string-typed PG key, parsed by hyperparams.resolve with an enum guard :

# in pick_threshold_on_val(...)
mode = resolve("LGB", tf, "OVERTRADE_GUARD_MODE", fallback="fail")
if mode not in {"warn_only", "fail"}:
    raise ValueError(f"OVERTRADE_GUARD_MODE must be 'warn_only' or 'fail', got {mode!r}")
if rate_buy_val > rate_buy_warn_threshold:
    log_event(logger, "theta_overtrade_warning", ..., guard_mode=mode)
if mode == "fail" and rate_buy_val > rate_buy_fail_threshold:
    raise RuntimeError(...)
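The band and the mode flag can be checked together on synthetic data. The sketch below assumes NumPy ; overtrade_verdict is a hypothetical helper that returns a single verdict string, whereas the harness code logs a warning and then raises.

```python
import numpy as np


def overtrade_verdict(
    p_buy_val: np.ndarray,
    theta: float,
    warn_threshold: float = 0.20,
    fail_threshold: float = 0.25,
    mode: str = "fail",
) -> str:
    """Hypothetical helper: classify a picked theta as 'ok', 'warn' or 'fail'."""
    if mode not in {"warn_only", "fail"}:
        raise ValueError(f"mode must be 'warn_only' or 'fail', got {mode!r}")
    # Share of validation rows that would be emitted as BUY at this theta.
    rate_buy_val = float((np.asarray(p_buy_val) >= theta).mean())
    if mode == "fail" and rate_buy_val > fail_threshold:
        return "fail"
    if rate_buy_val > warn_threshold:
        return "warn"
    return "ok"


# Synthetic probabilities: 100 evenly spaced values 0.00, 0.01, ..., 0.99.
p = np.arange(100) / 100.0

print(overtrade_verdict(p, 0.855))                    # 14 % of rows are BUY
print(overtrade_verdict(p, 0.775))                    # 22 % of rows are BUY
print(overtrade_verdict(p, 0.505))                    # 49 % of rows are BUY
print(overtrade_verdict(p, 0.505, mode="warn_only"))  # same rate, guard relaxed
```

Returning a verdict instead of raising keeps the sketch easy to test ; the production guard would map "warn" to log_event and "fail" to RuntimeError.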

_PARAM_TYPES extension (in S19-hardening alongside the bool branch) :

"OVERTRADE_GUARD_MODE": str,  # enum-validated at the call site

4.2.3 Files touched (S19 main)

src/training/harness/nodes/theta_sweep.py          # add 4 kwargs + over-trade guard logic + Bug 2 (labels=[0, 1], line 59)
src/training/harness/nodes/eval_metrics.py         # Bug 2 (labels=[0, 1], line 69)
src/training/harness/dags/models/lightgbm_dag.py   # resolve LGB θ bounds + warn / fail thresholds ; pass to pick_threshold_on_val
src/commun/finetune/hyperparams.py                 # register THETA_MIN, THETA_MAX, RATE_BUY_WARN_THRESHOLD, RATE_BUY_FAIL_THRESHOLD in _PARAM_TYPES
documentation/adr/0090-...md                       # (optional) add a note about the new param family
tests/unit/training_harness/nodes/test_theta_sweep.py  # tests for the bounds + over-trade guard + Bug 2
tests/unit/training_harness/nodes/test_eval_metrics.py # Bug 2 mono-class regression

PG migration : seed 4 PG keys via Console UI (per ADR-59 — Console-only, no git PR).

4.3 S19-hardening PR — 3 bug fixes

Bug 1 — class_balance.py:55-62 :

# BEFORE
if n_pos == 0:
    raise ValueError("compute_class_balance: degenerate training labels — n_pos=0. ...")

# AFTER
if n_pos == 0 or n_neg == 0:
    raise ValueError(
        f"compute_class_balance: degenerate training labels — "
        f"n_pos={n_pos}, n_neg={n_neg}. Cannot train binary on a single-class split (ADR-25 fail-fast)."
    )
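The reason n_neg == 0 is dangerous can be shown with a few lines. This sketch assumes the weight is computed as n_neg / n_pos (consistent with the "silent scale_pos_weight=0" symptom, but not confirmed for class_balance.py) ; scale_pos_weight_sketch is a hypothetical name.

```python
import numpy as np


def scale_pos_weight_sketch(y_train: np.ndarray) -> float:
    """Hypothetical helper: n_neg / n_pos with a guard on both degenerate cases."""
    y = np.asarray(y_train)
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    if n_pos == 0 or n_neg == 0:
        # Without the n_neg check, an all-positive split would return 0.0 silently.
        raise ValueError(f"degenerate training labels: n_pos={n_pos}, n_neg={n_neg}")
    return n_neg / n_pos


# Healthy split: 471 negatives for 100 positives.
healthy = np.array([0] * 471 + [1] * 100)
print(scale_pos_weight_sketch(healthy))
```

An all-positive or all-negative array passed to the same function raises ValueError instead of producing a weight.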

Bug 2 — theta_sweep.py:59 + eval_metrics.py:69 (ships in S19 main per §4.1 ; documented here alongside the other audit findings) :

Both call sites :

# BEFORE
_, _, f1, _ = precision_recall_fscore_support(y, y_pred, average=None, zero_division=0)
f1_buy = float(f1[1]) if len(f1) > 1 else 0.0

# AFTER
_, _, f1, _ = precision_recall_fscore_support(
    y, y_pred, labels=[0, 1], average=None, zero_division=0,
)
f1_buy = float(f1[1])  # always safe with explicit labels

eval_metrics.evaluate_split_binary line 69 gets the same treatment for the precision/recall/f1 unpacking.
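The effect of the missing labels argument can be reproduced on a mono-class split. This is a minimal demonstration, assuming scikit-learn and NumPy are installed ; the data is synthetic.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Mono-class split: every label and every prediction is the BUY class (1).
y = np.ones(10, dtype=int)
y_pred = np.ones(10, dtype=int)

# Without labels=, scikit-learn infers the label set from the data, so the
# per-class arrays have a single entry here.
_, _, f1_inferred, _ = precision_recall_fscore_support(
    y, y_pred, average=None, zero_division=0
)
f1_buy_before = float(f1_inferred[1]) if len(f1_inferred) > 1 else 0.0

# With labels=[0, 1] the arrays always have two entries, index 1 being BUY.
_, _, f1_explicit, _ = precision_recall_fscore_support(
    y, y_pred, labels=[0, 1], average=None, zero_division=0
)
f1_buy_after = float(f1_explicit[1])

print(len(f1_inferred), f1_buy_before)
print(len(f1_explicit), f1_buy_after)
```

On this split the pre-fix expression reports f1_buy = 0.0 even though every BUY prediction is correct, while the explicit-label call reports 1.0.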

Bug 3 — adapters/lgb.py:42-43 :

# BEFORE
def predict_proba(self, x: Union[pd.DataFrame, np.ndarray]) -> np.ndarray:
    if isinstance(x, pd.DataFrame):
        x = x.to_numpy()                       # ← strips column names
    if self.best_iteration is not None:
        raw = self._native.predict(x, num_iteration=int(self.best_iteration))
    ...

# AFTER
def predict_proba(self, x: Union[pd.DataFrame, np.ndarray]) -> np.ndarray:
    if isinstance(x, pd.DataFrame):
        expected = list(self._native.feature_name())
        actual = list(x.columns)
        if actual != expected:
            missing = set(expected) - set(actual)
            extra = set(actual) - set(expected)
            if missing or extra:
                raise ValueError(
                    f"LGBAdapter.predict_proba: feature mismatch — "
                    f"missing={sorted(missing)}, extra={sorted(extra)}"
                )
            x = x[expected]                    # reorder to training-time schema
        # leave as DataFrame — lgb.Booster.predict accepts both
    if self.best_iteration is not None:
        raw = self._native.predict(x, num_iteration=int(self.best_iteration))
    else:
        raw = self._native.predict(x)
    proba = np.asarray(raw, dtype=float)
    if proba.ndim == 1:
        return np.column_stack([1.0 - proba, proba])
    return proba

Defensive : raises if columns don't match the training schema (rather than silently .to_numpy() and predict on whatever is passed).

Committee reco #3 (expert-crypto-trader, expert-ops) — add a strict mode flag to allow a transition window. The default is strict=True (raise as above) but operators can flip to strict=False via PG (CVN_HPO_LGB_5M_FEATURE_ORDER_STRICT=false) for an emergency graceful fallback.

Implementation note (CR PR #941 r2 reco #4) : the existing commun.finetune.hyperparams.resolve() API only handles int / float per _PARAM_TYPES ; passing "false" would crash on the float-fallback parse. S19-hardening MUST extend _PARAM_TYPES with a bool entry for FEATURE_ORDER_STRICT :

# src/commun/finetune/hyperparams.py — additional case in the parser
_PARAM_TYPES: Final = {
    # ... existing entries ...
    "FEATURE_ORDER_STRICT": bool,
}

# In the parse function — explicit bool branch
def _parse_value(raw: str, expected_type: type, key: str):
    ...
    if expected_type is bool:
        s = raw.strip().lower()
        if s in {"true", "1", "yes", "on"}:
            return True
        if s in {"false", "0", "no", "off"}:
            return False
        raise RuntimeError(f"hyperparams.resolve: cannot parse {raw!r} as bool for {key!r}")
    ...

CR r3 reco #5 + #6 — bool extension is now a HARD prerequisite. The bool entry + parse branch land in S19-hardening (alongside Bug 1 + Bug 3) — NOT delayed. The temporary os.environ.get bridge in the LGB adapter is REMOVED before S19-hardening merges ; the adapter calls hyperparams.resolve("LGB", "5M", "FEATURE_ORDER_STRICT", fallback=True) from the start. Documented in the S19-hardening PR body as a hard merge-order dependency for S19 main.

The S19-hardening PR body explicitly lists, in its description :

"S19 main PR (#NNN) MUST NOT merge before this PR — lightgbm_dag.py depends on the bool extension landing here"
"If the deadline (30 days from this PR's merge) elapses without S19 main landing, this PR is reverted to roll back the unused bool branch — scheduled in the team agenda"

The patch :

# AFTER (final v2 with strict mode)
def predict_proba(self, x: Union[pd.DataFrame, np.ndarray]) -> np.ndarray:
    if isinstance(x, pd.DataFrame):
        expected = list(self._native.feature_name())
        actual = list(x.columns)
        if actual != expected:
            missing = set(expected) - set(actual)
            extra = set(actual) - set(expected)
            if missing or extra:
                # Hard mismatch — schema is wrong, not just reordered.
                # Raise regardless of strict (no safe reorder possible).
                raise ValueError(
                    f"LGBAdapter.predict_proba: feature mismatch — "
                    f"missing={sorted(missing)}, extra={sorted(extra)}"
                )
            # Same columns, different order — strict vs warn+reorder
            if _is_strict_mode():  # reads CVN_HPO_LGB_5M_FEATURE_ORDER_STRICT, defaults True
                raise ValueError(
                    f"LGBAdapter.predict_proba: column order mismatch (strict mode). "
                    f"expected={expected[:5]}... actual={actual[:5]}..."
                )
            log_event(
                logger,
                "lgb_adapter_column_reorder",
                expected_first5=expected[:5],
                actual_first5=actual[:5],
            )
            x = x[expected]  # reorder to training-time schema
        # leave as DataFrame — lgb.Booster.predict accepts both
    if self.best_iteration is not None:
        raw = self._native.predict(x, num_iteration=int(self.best_iteration))
    else:
        raw = self._native.predict(x)
    proba = np.asarray(raw, dtype=float)
    if proba.ndim == 1:
        return np.column_stack([1.0 - proba, proba])
    return proba

Default strict=True keeps the ADR-25 fail-fast posture as the contract ; the strict=False PG escape hatch is a transition-window concession (mirrors ADR-90 fallback pattern). Hard mismatches (missing / extra features) ALWAYS raise — only column-order divergence on identical sets falls back to WARN+reorder.
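The three outcomes (hard mismatch, order mismatch in strict mode, reorder in non-strict mode) can be isolated from LightGBM and pandas. The sketch below works on plain lists of column names ; plan_column_order and the feature names are hypothetical.

```python
from typing import List, Sequence


def plan_column_order(
    expected: Sequence[str], actual: Sequence[str], strict: bool = True
) -> List[str]:
    """Hypothetical helper: return the column order to predict with, or raise."""
    expected, actual = list(expected), list(actual)
    if actual == expected:
        return expected
    missing = set(expected) - set(actual)
    extra = set(actual) - set(expected)
    if missing or extra:
        # Different feature sets: no safe reorder exists, strict or not.
        raise ValueError(
            f"feature mismatch: missing={sorted(missing)}, extra={sorted(extra)}"
        )
    if strict:
        raise ValueError("column order mismatch (strict mode)")
    # Same features, different order: fall back to the training-time order.
    return expected


trained = ["rsi", "macd", "vol"]   # illustrative training-time schema
shuffled = ["vol", "rsi", "macd"]  # same features, different order

print(plan_column_order(trained, shuffled, strict=False))
```

In the adapter, the returned list would be used as x[expected] ; keeping the decision in a pure function makes the four adapter test cases listed in §11 straightforward to write.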

Files touched (S19-hardening) :

src/training/harness/nodes/class_balance.py        # Bug 1 — extend n_pos guard with n_neg
src/training/harness/adapters/lgb.py               # Bug 3 — preserve / reorder columns + raise on mismatch
src/commun/finetune/hyperparams.py                 # bool branch + FEATURE_ORDER_STRICT / OVERTRADE_GUARD_MODE entries in _PARAM_TYPES
tests/unit/training_harness/nodes/test_class_balance.py    # n_neg=0 raises
tests/unit/training_harness/adapters/test_lgb_adapter.py   # column-order preservation + mismatch raise
# Bug 2 files (theta_sweep.py, eval_metrics.py and their tests) ship in S19 main per §4.1

5. Implementation plan

Phase 1 — Plan dossier + committee review (this PR)

Plan dossier finalised, committee plan_review PASSED.
Story transitions New → Specified.

Phase 2 — S19-hardening PR (defensive first, lower risk)

Branch fix/CVN-N001-EE-S19-hardening
3 bug fixes (1 per file) + unit tests for each (4 test files)
CR cycle (4-5 rounds per memory rule)
Committee pr_review per ADR-68 (touches src/training/harness/)
Merge

Phase 3 — S19 main PR (H8 behavioural fix)

Branch feat/CVN-N001-EE-S19-theta-range-and-overtrade-guard
ADR-90 param registration (THETA_MIN, THETA_MAX, RATE_BUY_WARN_THRESHOLD, RATE_BUY_FAIL_THRESHOLD)
theta_sweep.py API extension (4 new kwargs)
lightgbm_dag.py wiring (resolve bounds + warn threshold + pass through)
Unit tests for bounds + guard
CR cycle
Committee pr_review
Merge — but gate by cross-fold validation success before pushing to prod

Phase 4 — Cross-fold validation (operator-driven)

Run diagnostic__s18_step1_4_chain with the SAME parameters on 4 cells :

AAVEUSDC fold=3 (the canary)
OPUSDC fold=3
LDOUSDC fold=4
ETHUSDC fold=3 (added per committee reco #1 — control cell with healthier baseline, ADR-14 alignment)

For each cell, capture BEFORE/AFTER metrics :

Metric	Source event	Acceptance threshold
`theta_picked`	`event=training_complete`	`0.30 ≤ θ ≤ 0.40` (the new bound)
`rate_buy_val`	`event=training_complete`	`≤ 0.20` (down from 0.46)
`raw_buy_signals`	`event=signal_funnel`	down at least 50 % vs BEFORE
`final_trades`	`event=signal_funnel`	down — exact target depends on cell
`sortino`	`event=weighted_variant_evaluated`	`> 0` (up from -9.5)
`return`	`event=weighted_variant_evaluated`	`> -50%` (loss-bounded)
`f1_buy_val`	`event=training_complete`	within 5 % of post-S17 reference (no model-quality regression)

PG seeding for the new ADR-90 keys (Console UI) BEFORE the validation runs.

Committee reco #5 (expert-ops) : RATE_BUY_FAIL_THRESHOLD is seeded in PG at 0.25 (5 percentage points above the warn threshold), enabling fail-fast on extreme over-trade cases instead of leaving it unset (silent-degradation risk). Runs in the (0.20, 0.25] band only emit the warning ; above 0.25 the run fails.

Committee reco #6 (expert-architect, expert-ops) : a pre-deploy CI check verifies that the 4 new ADR-90 keys + OVERTRADE_GUARD_MODE + FEATURE_ORDER_STRICT (6 keys total ; CR r3 reco #8) are seeded in PG before any deploy that includes the Phase 3 PR. Implementation : a new gate in the deploy CI that calls hyperparams.resolve("LGB", "5M", "THETA_MIN") etc. without the fallback ; failure to resolve = build fails with an actionable message. Keys checked :

CVN_HPO_LGB_5M_THETA_MIN
CVN_HPO_LGB_5M_THETA_MAX
CVN_HPO_LGB_5M_RATE_BUY_WARN_THRESHOLD
CVN_HPO_LGB_5M_RATE_BUY_FAIL_THRESHOLD
CVN_HPO_LGB_5M_OVERTRADE_GUARD_MODE
CVN_HPO_LGB_5M_FEATURE_ORDER_STRICT
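A gate of this kind could look like the sketch below. It checks a plain mapping rather than PG, and missing_keys / predeploy_gate are hypothetical names ; the real gate would go through hyperparams.resolve as described above.

```python
from typing import List, Mapping

REQUIRED_KEYS = (
    "CVN_HPO_LGB_5M_THETA_MIN",
    "CVN_HPO_LGB_5M_THETA_MAX",
    "CVN_HPO_LGB_5M_RATE_BUY_WARN_THRESHOLD",
    "CVN_HPO_LGB_5M_RATE_BUY_FAIL_THRESHOLD",
    "CVN_HPO_LGB_5M_OVERTRADE_GUARD_MODE",
    "CVN_HPO_LGB_5M_FEATURE_ORDER_STRICT",
)


def missing_keys(store: Mapping[str, str]) -> List[str]:
    """Return the required keys that are absent from the store, in order."""
    return [key for key in REQUIRED_KEYS if key not in store]


def predeploy_gate(store: Mapping[str, str]) -> None:
    """Exit with an actionable message when any required key is unseeded."""
    missing = missing_keys(store)
    if missing:
        raise SystemExit(
            "pre-deploy gate: seed these PG keys via Console UI before deploying: "
            + ", ".join(missing)
        )


# Toy store in which only the two theta bounds have been seeded.
partial = {
    "CVN_HPO_LGB_5M_THETA_MIN": "0.30",
    "CVN_HPO_LGB_5M_THETA_MAX": "0.40",
}
print(missing_keys(partial))
```

Listing every missing key in one message, rather than stopping at the first, saves the operator a round trip per key.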

Loki query {namespace="cvntrade"} |~ "event=hpo_fallback_applied" is monitored post-deploy ; any hit on these 6 keys triggers an oncall page.

Committee reco #11 (expert-ops) — pre-deploy LGB strict-mode compatibility check : before flipping FEATURE_ORDER_STRICT=true in prod, run a dry-run backtest with strict=false for 1 hour against live FE-pipeline output ; verify that event=lgb_adapter_column_reorder does NOT fire (= no upstream caller silently reorders columns). If any reorder event is logged, fix the upstream caller before flipping strict=true. Documented as a Phase 4 prerequisite step.

Committee reco #7 (expert-architect) : Phase 4 cross-fold validation includes a rollback test — manually trigger an over-trade fail (set RATE_BUY_FAIL_THRESHOLD=0.10 so the guard fires), verify the run errors as expected, then re-set to 0.25 via Console UI in seconds. Documents the rollback latency.

Committee reco #8 (expert-architect, expert-ml-engineer) : the Bug 3 raise behaviour (LGB column-mismatch) is documented in a new ADR (e.g. ADR-91 — to be drafted in a follow-up PR after S19-hardening merges) formalising the column-order contract between the harness and the backtest engine. Until the ADR lands, the contract is documented in the S19-hardening PR body + in the LGBAdapter.predict_proba docstring.

Phase 5 — Post-validation deploy + Story closure

If all 4 cells PASS the acceptance thresholds → push to prod via the standard deploy CI path
Trigger an FTF mini-sweep (1 crypto, 1 fold) post-deploy ; verify event=theta_overtrade_warning does NOT fire spuriously on a fresh prod run
OP wp#165 transitions In testing → Tested → Closed per ADR-81
Closure note on wp#165 with PR SHA + Loki snapshot of the 4 cell verdicts

6. Risk analysis

Risk	Likelihood	Impact	Mitigation
`[0.30, 0.40]` range too narrow → LGB f1 degrades > 5 % on some folds	Medium	Medium (model-quality regression)	Cross-fold validation pre-merge ; if any cell fails the f1 threshold, widen the range to `[0.25, 0.45]` and re-validate
Over-trade guard fires spuriously on healthy folds (false positive)	Low	Low in the (0.20, 0.25] band (warning only) ; a run fails only above 0.25	Operator can flip OVERTRADE_GUARD_MODE to warn_only via PG for a transition window ; post-deploy mini-sweep verifies no spurious fires
Bug 3 fix raises on a healthy backtest call where columns happen to be reordered (false alarm)	Medium	Medium (backtest fails instead of silently mispredicting)	Defensive raise IS the contract per ADR-25 ; the alternative (silent mispredict) is the actual bug. If the backtest engine reorders columns, the engine MUST be fixed to pass them in training order. Document this in PR body.
The 2 PRs land in the wrong order (main before hardening) → main is harder to revert	Low	Low	Merge strategy doc'd in PR body of S19 main : "DO NOT MERGE before S19-hardening (#NNN)"
PG seeding forgotten before deploy → fallback to legacy `0.05-0.95` range fires (the bug we're fixing)	Medium	High (regression re-emerges silently)	`lightgbm_dag.py` uses `resolve(..., fallback=0.30)` and `fallback=0.40` so the fix IS the default ; `event=hpo_fallback_applied` Loki-queryable to detect any unintended fallback path
Cross-fold validation reveals H8 isn't the dominant cause on OPUSDC/LDOUSDC	Medium	Medium	Treat as new finding ; halt merge ; reopen S18 (or open S20) to investigate per-crypto variance
Bug 3 fix exposes a hidden upstream bug (FE pipeline reorders columns sometimes)	Medium	High	The exposure IS the goal — fail-fast surfaces it. Fix the upstream caller separately if it manifests.

7. Rollback procedure

Per phase :

Phase 2 (hardening) PR not merged : close the PR without merging.
Phase 2 merged but Bug 3 starts raising in backtest : revert PR via git revert <SHA> + emergency PR ; investigate the upstream caller that passes mis-ordered columns.
Phase 3 (main) PR not merged : close the PR without merging.
Phase 3 merged but cross-fold validation reveals problem : revert PR + rollback PG seeds via Console (set THETA_MIN/MAX back to 0.05/0.95 or unset).
Cutover after Phase 4 validation passes but issue surfaces in prod : git revert + helm rollback if needed ; PG seeds are Console-rollback in seconds.

RTO : git revert + CI build + helm upgrade ≈ 15 min.

8. Definition of done

Hard criteria for closing the Story (all must pass) :

S19-hardening PR merged on main (3 bug fixes + 4 unit test files)
S19 main PR merged on main (θ-sweep range + over-trade guard + ADR-90 registrations + unit tests)
PG seeds CVN_HPO_LGB_5M_THETA_MIN/MAX + CVN_HPO_LGB_5M_RATE_BUY_WARN_THRESHOLD + CVN_HPO_LGB_5M_RATE_BUY_FAIL_THRESHOLD set via Console UI
Cross-fold validation passed on 4 cells (AAVEUSDC fold=3 + OPUSDC fold=3 + LDOUSDC fold=4 + ETHUSDC fold=3 control) per §5 Phase 4 thresholds
Post-deploy FTF mini-sweep emits no spurious event=theta_overtrade_warning
Loki query {namespace="cvntrade"} |~ "event=theta_overtrade_warning" returns the validation runs only (audit trail)
OP wp#165 transitions In testing → Tested → Closed per ADR-81 with closure note + PR SHAs

9. ADR alignment

ADR-25 (no silent fallback) — Bug 1 fix (fail-fast on degenerate splits) + over-trade guard fail-mode + Bug 3 raise on column mismatch
ADR-31/32/33 (structured logging) — event=theta_overtrade_warning follows key=value format
ADR-59 (Console-only PG params) — new THETA_MIN/MAX + warn/fail thresholds via Console, NOT in git
ADR-68 (Expert Committee) — pr_review mandatory per PR (touches src/training/harness/)
ADR-77 (MkDocs SSoT) — this dossier under documentation/reviews/
ADR-89 (training harness as plugin registry) — preserved ; per-model θ range fits the existing plugin model
ADR-90 (training hyperparams in PG) — extends the existing CVN_HPO_<MODEL>_<TF>_<PARAM> scheme to a new param family (THETA_*)

10. Out-of-scope follow-ups (filed separately)

CVN-N001-EE-S20 : XGB-specific f1=0.089 fix (parent dossier §3.1 cited the canary number ; the XGB regression is structurally different — fixed θ=0.5, no θ-sweep — and likely a different mechanism than H8) — to be opened post-S19 close
CVN-N001-EE-S21 (committee r2 reco #9 — confirmed) : data-driven calibration spike for the over-trade thresholds + θ-sweep bounds per crypto, using the 4 cross-fold validation cells as the starting dataset. Replaces the current "initial heuristic" basis for 0.20 / 0.25 with empirical per-crypto bounds. Out-of-scope for S19 (heuristic is good enough for the H8 unblock).
Pre-#891 baseline empirical re-establishment : run train_with_fixed_params_lgbm with scale_pos_weight=1.0 + θ=0.4 on the captured fold to anchor the f1≈0.42 reference cited in S18 parent dossier §3 (S18 Step 5 §7 question #3) — could be a 1-day spike Story.
Live inference / execution kill switch (CR r3 reco #3) — already in scope under ADR-71 + Epic CVN-N001-EG (kill-switch implementation Story tracked separately ; design dossier documentation/design/CVN-N001-EF-S02-kill-switch-design.md). S19 does NOT implement the kill switch ; it relies on the pre-existing kill-switch contract for emergency halt independent of code rollback.
CVN-N001-EE-S22 (NEW, CR r3 reco #4) : continuous data / label / concept drift detection for production LGB models — metrics (KS test on feature distribution, label-prior drift, prediction-distribution drift), thresholds, alerting, runbooks. Currently no automated detection ; relies on FTF sweep cadence + operator manual inspection. Major scope ; out-of-scope for S19.
ADR-91 : formalise the LGB column-order contract between harness adapter and backtest engine (S18 Step 5 reco #8) — already filed as a follow-up after S19-hardening merges.

11. Test plan

Unit (S19-hardening)

tests/unit/training_harness/nodes/test_class_balance.py
  - test_compute_class_balance_raises_on_n_neg_zero  # NEW
  - test_compute_class_balance_raises_on_n_pos_zero  # existing — keep as regression guard

tests/unit/training_harness/nodes/test_theta_sweep.py
  - test_pick_threshold_returns_correct_f1_when_y_all_positive  # NEW (Bug 2 regression)
  - test_pick_threshold_returns_correct_f1_when_y_pred_all_positive  # NEW
  - test_pick_threshold_default_bounds_unchanged  # regression guard

tests/unit/training_harness/nodes/test_eval_metrics.py
  - test_evaluate_split_binary_returns_correct_f1_mono_class  # NEW

tests/unit/training_harness/adapters/test_lgb_adapter.py
  - test_predict_proba_strict_true_raises_on_column_order_mismatch    # NEW (CR PR #941 r2 reco split)
  - test_predict_proba_strict_false_reorders_and_logs                 # NEW (CR PR #941 r2 reco split)
  - test_predict_proba_raises_on_missing_feature_regardless_of_strict # NEW (hard mismatch always raises)
  - test_predict_proba_raises_on_extra_feature_regardless_of_strict   # NEW (hard mismatch always raises)
  - test_predict_proba_accepts_ndarray_unchanged                      # regression guard

Unit (S19 main)

tests/unit/training_harness/nodes/test_theta_sweep.py
  - test_pick_threshold_respects_theta_min_kwarg                       # NEW
  - test_pick_threshold_respects_theta_max_kwarg                       # NEW
  - test_pick_threshold_excludes_anchor_05_when_outside_bounds         # NEW (CR PR #941 r2 reco #1 — bounds violation regression)
  - test_pick_threshold_includes_anchor_05_when_inside_bounds          # NEW (positive case for the same logic)
  - test_pick_threshold_raises_on_invalid_bounds                       # NEW (theta_min >= theta_max OR bounds outside (0, 1))
  - test_pick_threshold_emits_overtrade_warning_above_warn_threshold   # NEW
  - test_pick_threshold_raises_on_overtrade_above_fail_threshold       # NEW (mandatory fail threshold per CR r2 reco #2/3)
  - test_pick_threshold_no_warning_below_warn_threshold                # NEW

tests/unit/finetune/test_hyperparams_bool_extension.py
  - test_resolve_parses_true_variants_when_param_type_bool             # NEW (CR PR #941 r2 reco #4 — bool extension)
  - test_resolve_parses_false_variants_when_param_type_bool            # NEW
  - test_resolve_raises_on_unparseable_bool_literal                    # NEW
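
The bounds and anchor behaviour targeted by the `test_pick_threshold_*` cases above could look like the sketch below. It is written under assumptions: the `step` parameter, the tie-break toward the higher θ, and the helper names `build_theta_grid` and `f1_binary` are illustrative only. The `[0.05, 0.95]` defaults mirror the sweep range quoted in §1.

```python
import math
from typing import List, Sequence, Tuple


def build_theta_grid(theta_min: float, theta_max: float, step: float = 0.05) -> List[float]:
    """Candidate thresholds in [theta_min, theta_max]; 0.5 is added only if in bounds."""
    if not (0.0 < theta_min < theta_max < 1.0):
        raise ValueError(f"invalid bounds: theta_min={theta_min} theta_max={theta_max}")
    if step <= 0.0:
        raise ValueError(f"step must be > 0, got {step}")
    # Small epsilon guards against float error such as 17.999999999999996.
    n_steps = math.floor((theta_max - theta_min) / step + 1e-9)
    grid = {round(theta_min + i * step, 6) for i in range(n_steps + 1)}
    if theta_min <= 0.5 <= theta_max:
        grid.add(0.5)  # anchor, never allowed to violate the bounds
    return sorted(grid)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive class; 0.0 when there is nothing to score."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pick_threshold(
    y_true: Sequence[int],
    proba: Sequence[float],
    theta_min: float = 0.05,
    theta_max: float = 0.95,
    step: float = 0.05,
) -> Tuple[float, float]:
    """Return (theta, f1) maximising F1 on the bounded grid."""
    best_theta, best_f1 = 0.0, -1.0
    for theta in build_theta_grid(theta_min, theta_max, step):
        y_pred = [1 if p >= theta else 0 for p in proba]
        score = f1_binary(y_true, y_pred)
        if score >= best_f1:  # assumption: ties go to the higher theta
            best_theta, best_f1 = theta, score
    return best_theta, best_f1
```

Narrowing the range to `theta_min=0.30, theta_max=0.40` yields the grid `[0.3, 0.35, 0.4]`, so the 0.5 anchor is excluded. Breaking ties toward the higher θ is a design choice of this sketch, picked because it favours fewer BUY signals.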

Unit (S19 main — CR r3 additions)¶

tests/unit/training_harness/nodes/test_theta_sweep.py (additions to v3 plan)
  - test_pick_threshold_warn_only_mode_does_not_raise_above_fail_threshold  # NEW (CR r3 reco #10 — explicit mode flag coverage)
  - test_pick_threshold_fail_mode_raises_above_fail_threshold               # NEW (positive complement)
  - test_pick_threshold_invalid_overtrade_guard_mode_raises                 # NEW (enum guard regression)
  - test_pick_threshold_includes_eval_metrics_labels_zero_one               # NEW (CR r2 Bug 2 — moved from hardening to main per r3 reco #1)

tests/unit/training_harness/nodes/test_eval_metrics.py (moved from hardening)
  - test_evaluate_split_binary_returns_correct_f1_mono_class                # NEW (Bug 2 — moved here from hardening)
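
The Bug 2 patch itself is not reproduced in this section. The sketch below assumes the failure mode is a confusion matrix that collapses when only one class is present, and shows counts that keep all four cells regardless of input. The names `binary_counts` and `f1_buy` are hypothetical.

```python
from typing import Dict, Sequence


def binary_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Confusion counts over the fixed label set {0, 1}, even for mono-class input."""
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    cells = ("tn", "fp", "fn", "tp")
    for truth, pred in zip(y_true, y_pred):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError(f"labels must be 0 or 1, got truth={truth} pred={pred}")
        # (0,0)->tn, (0,1)->fp, (1,0)->fn, (1,1)->tp
        counts[cells[2 * int(truth) + int(pred)]] += 1
    return counts


def f1_buy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive (BUY) class; 0.0 when tp, fp and fn are all zero."""
    c = binary_counts(y_true, y_pred)
    denom = 2 * c["tp"] + c["fp"] + c["fn"]
    return 2 * c["tp"] / denom if denom else 0.0
```

With scikit-learn, the equivalent safeguard is passing `labels=[0, 1]` to `confusion_matrix`, which otherwise returns a 1×1 matrix when both inputs contain a single class.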

Downstream integration (CR r3 reco #2 — explicit test cases)¶

tests/integration/training_harness/test_lgb_downstream_integration.py
  - test_autonomous_trainer_consumes_lgb_artifact_with_new_theta_range
    # Run autonomous_orchestrator.train_one_crypto for AAVEUSDC fold=3 ;
    # assert TrainedArtifact.threshold_buy ∈ [0.30, 0.40] ; assert
    # autonomous_trained log event shows the new theta_picked field
  - test_regime_trainer_propagates_overtrade_guard_event
    # Run weighted_variant_trained on a synthetic over-trade case ;
    # assert event=theta_overtrade_warning bubbles up the regime trainer
    # log chain ; assert Loki-queryable
  - test_walk_forward_predictor_uses_picked_theta_from_artifact
    # Load a TrainedArtifact with threshold_buy=0.35 ; run
    # WalkForwardPredictor.predict on a synthetic feature window ;
    # assert the predictor gates buy signals at θ=0.35 (NOT the legacy 0.40)

These 3 tests gate the S19 main PR merge. They run as pytest -m integration and are added to the medium tier (per CLAUDE.md Pytest markers).
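
As an illustration of the third case, the gating assertion can be reduced to the sketch below. `TrainedArtifactStub` and `gate_buy_signals` are stand-ins invented for this example, and the inclusive `>=` comparison is an assumption; the real `TrainedArtifact` and `WalkForwardPredictor` interfaces are not shown in this dossier.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TrainedArtifactStub:
    """Stand-in carrying only the field the test cares about."""

    threshold_buy: float


def gate_buy_signals(proba: Sequence[float], artifact: TrainedArtifactStub) -> List[bool]:
    """Emit a BUY flag wherever the probability reaches the artifact's threshold."""
    return [p >= artifact.threshold_buy for p in proba]


# A probability of 0.38 separates the picked theta (0.35) from the legacy 0.40.
window = [0.34, 0.35, 0.38, 0.41]
picked = gate_buy_signals(window, TrainedArtifactStub(threshold_buy=0.35))
legacy = gate_buy_signals(window, TrainedArtifactStub(threshold_buy=0.40))
```

A synthetic window only discriminates between the two thresholds if it contains at least one probability in `[0.35, 0.40)`; otherwise the test passes under either threshold.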

Integration (cross-fold validation, manual via DAG)¶

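4 runs of diagnostic__s18_step1_4_chain (the 3 original cells plus the control cell added by plan_review reco #1, see §13.3) :
- crypto=AAVEUSDC, fold_id=3
- crypto=OPUSDC, fold_id=3
- crypto=LDOUSDC, fold_id=4
- crypto=ETHUSDC, fold_id=3 (control)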

Per cell, compare BEFORE/AFTER on the 7 metrics in §5 Phase 4.

Smoke (post-deploy)¶

1-crypto FTF mini-sweep ; verify that the new theta_overtrade_warning event is queryable in Loki and that a healthy run emits no actual warnings
Grafana FTF dashboard panels still render correctly (no LogQL parse errors with the new events)

Regression (post-merge for both PRs)¶

Existing parity tests in tests/unit/training_harness/parity/test_lgb_harness_vs_legacy.py still PASS (the bug fixes are defensive, no behavioural change on currently-tested healthy paths)
Existing tests/unit/training_harness/test_phase4_lgb_cutover.py still PASS

12. Committee questions (for `plan_review`)¶

Per-model θ range via ADR-90 keys vs hardcoded constants : the dossier proposes the ADR-90 path (CVN_HPO_LGB_5M_THETA_MIN/MAX) — is this the right level of indirection, or should the bounds live in code with the hardcoded LGB legacy values for clarity ? ADR-90 path adds Console UI complexity but unifies the pattern.
Over-trade guard threshold : 0.20 is the operator's recommendation from S18 Step 5 §9. Should the default warn threshold be derived from the training positive rate (n_pos/n_train ≈ 0.175, rounded up to 0.20), or be data-driven per crypto ?
PR merge order : hardening-first (defensive) then main (behavioural). Any objection ?
Bug 3 raise behaviour : the proposed fix RAISES on column mismatch. Should it be a WARN+reorder instead (graceful fix) ? The committee opinion matters because raising could break currently-running backtests if they happen to reorder columns silently.
Cross-fold validation gating : 3 cells before merge. Is this enough or should we add e.g. ETHUSDC fold=3 as a "control" with a healthier baseline ?
Verdict : PASS / PASS_WITH_REVISIONS / REJECTED with explicit AC for the next concrete step (S19 implementation kickoff).
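
The 0.175 figure in the over-trade question is consistent with the positive rate implied by `scale_pos_weight=4.71`, assuming the weight is computed as `n_neg / n_pos` (the usual convention, not confirmed in this section). A minimal sketch of that arithmetic, together with the fail-fast behaviour the Bug 1 tests describe, using hypothetical function names:

```python
def compute_scale_pos_weight(n_pos: int, n_neg: int) -> float:
    """Return n_neg / n_pos, refusing mono-class inputs instead of guessing."""
    if n_pos <= 0:
        raise ValueError(f"n_pos must be > 0, got {n_pos}")
    if n_neg <= 0:
        raise ValueError(f"n_neg must be > 0, got {n_neg}")
    return n_neg / n_pos


def positive_base_rate(scale_pos_weight: float) -> float:
    """Invert the weight: n_pos / (n_pos + n_neg) = 1 / (1 + n_neg / n_pos)."""
    if scale_pos_weight <= 0:
        raise ValueError(f"scale_pos_weight must be > 0, got {scale_pos_weight}")
    return 1.0 / (1.0 + scale_pos_weight)


base_rate = positive_base_rate(4.71)  # about 0.175
```

Under that assumption the positive rate alone gives about 0.175. Multiplying it by 0.4 would give about 0.07, so the 0.20 heuristic is best read as the base rate rounded up.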

13. Committee verdict (`plan_review`) — 2026-05-14¶

Status : PASSED_WITH_REVISIONS / OK / strong consensus / 0 blockers (session 1f4335a2, OP Meeting #135, 5 experts).

13.1 Areas of agreement (5)¶

2-PR split (hardening first, main second) excellent for risk management + clean revert envelope
Strong ADR-25 alignment via Bug 1 fail-fast + Bug 3 raise on column mismatch
ADR-90 extension for THETA_MIN/MAX + RATE_BUY_*_THRESHOLD consistent with existing pattern
Over-trade guard with structured logging is a crucial operational control plane
Comprehensive risk analysis + rollback procedure

13.2 Areas of dissent (4)¶

Topic	Pro / Against	Resolution
`rate_buy_val=0.20` warn threshold lacks data-driven justification	3 / 2 (data-scientist + ops)	reco #2 — document as initial heuristic, schedule follow-up calibration spike
Bug 3 raise vs graceful reorder	4 / 1 (ops)	reco #3 — `strict=True` default + opt-in `strict=False` via PG
Cross-fold size of 3 cells (ADR-14 robustness)	3 / 2 (data-scientist + ops)	reco #1 — add ETHUSDC fold=3 as control (4 cells total)
Downstream system integration not fully validated	3 / 1 (data-scientist)	reco #4 — add explicit downstream integration tests in Phase 4

13.3 8 recommendations integrated¶

#	Reco	Section updated
1	Add ETHUSDC fold=3 to cross-fold validation set	§5 Phase 4
2	Document `rate_buy_val=0.20` as initial heuristic (justification : `n_pos / n_train ≈ 0.175 ≈ 0.20`, follow-up data-driven calibration tracked separately)	§4.2.2 + §10
3	Bug 3 add `strict=False` PG flag for graceful WARN+reorder transition mode (default `strict=True`)	§4.3 (Bug 3 patch updated)
4	Validate downstream systems (autonomous trainer, regime trainer, walk-forward predictor) — add integration tests in Phase 4	§5 Phase 4 + §11
5	Set `RATE_BUY_FAIL_THRESHOLD=0.25` default in PG seeding (5 percentage points above the 0.20 warn threshold)	§4.2.2 / §5 Phase 4
6	Pre-deploy CI check for PG seeding + monitor `event=hpo_fallback_applied` Loki	§5 Phase 4 + §6 risk row
7	Add rollback test in Phase 4 (synthetic fail trigger)	§5 Phase 4
8	Document Bug 3 raise behaviour in a new ADR-91 (follow-up after S19-hardening merges)	§10 (out-of-scope follow-up)

13.4 Decision¶

The committee unanimously approves the 2-PR split and the H8 fix posture. No blockers. The 8 revisions integrated above tighten validation scope (1 control cell + downstream integration tests + rollback test) and operational ergonomics (graceful reorder mode + default fail threshold + pre-deploy gate).

Story transition : OP wp#165 → New → Specified after this dossier merges to main.

13.5 Round 2 verdict (2026-05-14, session `aa76ed46`, OP Meeting #136)¶

Status : PASSED_WITH_REVISIONS / EXECUTION_RISK / strong consensus / 1 BLOCKER + 10 recommendations (11 items in total ; the BLOCKER is reco #1 in §13.5.2).

Round 2 was triggered by 5 user-surfaced corrections to v3 that have all been addressed in v4 (the current revision of this dossier). The committee then ran a fresh review against v3 and surfaced :

13.5.1 BLOCKER (resolved in v4)¶

theta_sweep.py was modified by both PRs (Bug 2 in hardening, θ range in main), which contradicted the "independently mergeable" claim and would have produced a merge conflict regardless of merge order. Resolution : Bug 2 (both theta_sweep.py:59 and eval_metrics.py:69) moved to S19 main, so S19-hardening is now strictly orthogonal on the file surface (3 files : class_balance.py for Bug 1, adapters/lgb.py for Bug 3, hyperparams.py for the bool extension). See the updated table in §4.1 for the post-r3 file split.

13.5.2 11 recommendations integrated in v4¶

#	Recommendation	Section updated
1	Resolve `theta_sweep.py` PR conflict (BLOCKER) — Bug 2 moved to main	§4.1 (table) + §11 (test files moved)
2	Detail downstream integration tests (autonomous trainer + regime trainer + walk-forward predictor) — explicit test cases	§11 new "Downstream integration" subsection (3 explicit tests)
3	Live inference / execution kill switch — REFERENCE existing ADR-71 + Epic CVN-N001-EG, NOT in S19 scope	§10 (out-of-scope reference)
4	Continuous data / label / concept drift detection — NEW Story CVN-N001-EE-S22 filed	§10 (out-of-scope)
5	bool extension MUST land in S19-hardening — `os.environ.get` bridge REMOVED before merge	§4.3 (hard prerequisite)
6	Document the bool-parsing bridge as temporary with hard 30-day deadline + revert plan	§4.3 (PR body lock)
7	Replace `RATE_BUY_FAIL_THRESHOLD=1.0` magic-number escape with explicit `OVERTRADE_GUARD_MODE = warn_only | fail` flag	§4.2.2 (new explicit flag + enum guard)
8	Pre-deploy CI check extended to verify `FEATURE_ORDER_STRICT` is seeded (5 keys → 6 keys total)	§5 Phase 4 (key list updated)
9	Schedule CVN-N001-EE-S21 calibration spike (data-driven thresholds) — confirmed	§10 (out-of-scope)
10	Unit test for warn-only mode (verifies FAIL path is unreachable in `warn_only`)	§11 (3 tests added : warn_only / fail / invalid mode)
11	Pre-deploy LGB strict-mode dry-run (`strict=False` 1h-backtest, verify no `lgb_adapter_column_reorder` event)	§5 Phase 4 (added prerequisite step)
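
Recos #5 and #8 can be combined into one pre-deploy check, sketched below under stated assumptions. Only the two `CVN_HPO_LGB_5M_THETA_*` key names appear verbatim in this dossier; the other four entries are placeholder names, and the composition of the 6-key list is inferred. The accepted boolean spellings and the function names are also illustrative. How the seeded values are read from PG is left out.

```python
from typing import List, Mapping

# Placeholder list: only the two THETA keys are quoted verbatim in the dossier.
REQUIRED_KEYS = (
    "CVN_HPO_LGB_5M_THETA_MIN",
    "CVN_HPO_LGB_5M_THETA_MAX",
    "RATE_BUY_WARN_THRESHOLD",
    "RATE_BUY_FAIL_THRESHOLD",
    "OVERTRADE_GUARD_MODE",
    "FEATURE_ORDER_STRICT",
)

_TRUE_LITERALS = frozenset({"true", "1", "yes", "on"})
_FALSE_LITERALS = frozenset({"false", "0", "no", "off"})


def parse_bool_literal(raw: str) -> bool:
    """Parse a boolean literal strictly; bool("false") would wrongly be True."""
    token = raw.strip().lower()
    if token in _TRUE_LITERALS:
        return True
    if token in _FALSE_LITERALS:
        return False
    raise ValueError(f"unparseable bool literal: {raw!r}")


def check_seeding(seeded: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the deploy may proceed."""
    problems = [
        f"missing:{key}"
        for key in REQUIRED_KEYS
        if not str(seeded.get(key, "")).strip()
    ]
    raw_strict = str(seeded.get("FEATURE_ORDER_STRICT", "")).strip()
    if raw_strict:
        try:
            parse_bool_literal(raw_strict)
        except ValueError:
            problems.append("unparseable_bool:FEATURE_ORDER_STRICT")
    return problems
```

Returning a list of problems rather than raising on the first one lets the CI job report every missing key in a single run.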

13.5.3 Areas of dissent (4)¶

Topic	Resolution
`theta_sweep.py` cross-PR modification	reco #1 — file moved (BLOCKER resolved)
`RATE_BUY_FAIL_THRESHOLD=1.0` magic-number escape ergonomics	reco #7 — explicit `OVERTRADE_GUARD_MODE` flag
Downstream integration test specificity	reco #2 — 3 explicit test cases added in §11
Live inference kill switch + drift detection absence	recos #3 + #4 — referenced ADR-71 / filed S22
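
The explicit mode flag from reco #7 can be sketched as follows. The 0.20 / 0.25 defaults and the `theta_overtrade_warning` event name come from this dossier. The function name, the exception type, the return values and the strict `>` comparison are assumptions made for illustration.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class OvertradeGuardMode(str, Enum):
    WARN_ONLY = "warn_only"
    FAIL = "fail"


class OvertradeError(RuntimeError):
    """Raised when the validation BUY rate exceeds the fail threshold in fail mode."""


def check_overtrade(
    rate_buy_val: float,
    warn_threshold: float = 0.20,
    fail_threshold: float = 0.25,
    mode: str = "fail",
) -> str:
    """Return "ok" or "warn"; raise in fail mode above the fail threshold."""
    guard_mode = OvertradeGuardMode(mode)  # unknown values raise ValueError
    if guard_mode is OvertradeGuardMode.FAIL and rate_buy_val > fail_threshold:
        raise OvertradeError(
            f"rate_buy_val={rate_buy_val:.4f} exceeds fail threshold {fail_threshold}"
        )
    if rate_buy_val > warn_threshold:
        logger.warning(
            "event=theta_overtrade_warning rate_buy_val=%.4f warn_threshold=%.2f mode=%s",
            rate_buy_val,
            warn_threshold,
            guard_mode.value,
        )
        return "warn"
    return "ok"
```

With the Phase A value `rate_buy_val=0.4611`, `mode="warn_only"` returns `"warn"` while `mode="fail"` raises. In `warn_only` the fail path is unreachable by construction, so no magic threshold value is needed to disable it.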

13.5.4 Round 2 decision¶

The committee unanimously approves the v4 plan once the 11 recommendations are integrated (now done). The EXECUTION_RISK code reflected the v3 BLOCKER + the operational gaps ; both addressed in v4. The dossier is READY FOR IMPLEMENTATION KICKOFF post-merge of PR #941.

Story transition (post-merge) : OP wp#165 → New → Specified.

14. References¶

Predecessor Story : CVN-N001-EE-S18 (closed) OP wp#154
Step 5 re-scope dossier (the design source) : documentation/missions/cvn-n001-ee-s18-diagnostic/step5-rescope-dossier.md
Parent S18 plan dossier : 2026-05-13-cvn-n001-ee-s18-harness-shallow-training-diagnostic-plan.md
ADR-25 (no silent fallback) : documentation/adr/0025-pas-de-fallback-silencieux-dans-les-pipelines-ml.md
ADR-89 (training harness as plugin registry) : documentation/adr/0089-...md
ADR-90 (training hyperparams in PG / Console only) : documentation/adr/0090-...md
Trigger PR : #891 (the harness migration that introduced the regression)
S18 chained DAG run that produced the H8 evidence : Loki window 2026-05-14 14:44-15:08 UTC, search {event="s18_chain_verdict"}