CVN-N001-EI-S03 — Split / regime reconstruction (Block 3b)

Implementation summary for the S41 split-ablation diagnostic and the gated production split. Full rationale + review history: the plan dossier (committee plan_review 0854dc31 PASSED, OP Meeting #234) and the regime-signal decision.

What it answers

Is the weak signal (AUC ≈ 0.64, best_iter = 1) driven by how we split train/test (validation design, hypothesis C-d — the program's leading hypothesis) rather than the model? Two jobs:

  • Job A — resolve the S02 reserve (gate). S02 returned SYSTEMATIC_LEAK driven solely by the temporal-autocorrelation probe under degraded proxies; the leakage probes were clean. S41 re-measures the autocorrelation under a real purge + embargo split. A genuine temporal leak → HALT; the expected label-overlap autocorrelation neutralised by embargo → clear the reserve.
  • Job B — the split ablation. Re-fit under six split families and measure two deltas:
      • Δ_leak = AUC(purged-WF) − AUC(naive). Material negative ⇒ the naive AUC was leak-inflated (BASELINE_LEAK_INFLATED).
      • Δ_cd = AUC(regime family) − AUC(purged-WF). Material positive ⇒ validation design is the driver (SPLIT_MOVES_VALIDATION).

Verdicts are AUC-gated (ΔAUC ≥ 0.02, CI excludes 0); best_iter is corroborating, not vetoing. A power gate applies: SPLIT_STABLE is issued only at ≥ 0.80 power, otherwise INCONCLUSIVE_UNDERPOWERED (evidence of absence vs absence of evidence).
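
A minimal sketch of how this gate could be wired, assuming each delta arrives with a confidence interval and a precomputed power estimate. The Delta class, the cell_verdict function, the reading of "material" as a magnitude of at least 0.02 in the stated direction, and the precedence of the leak check over the validation-design check are all illustrative assumptions, not the S41 implementation. Only the 0.02 and 0.80 thresholds and the verdict names come from this document. best_iter is omitted because it corroborates without vetoing.

```python
from dataclasses import dataclass

MIN_DELTA = 0.02  # AUC gate from this document
MIN_POWER = 0.80  # power gate from this document


@dataclass(frozen=True)
class Delta:
    """Hypothetical container: a point estimate plus its confidence interval."""
    value: float
    ci_low: float
    ci_high: float

    def material(self, sign: int) -> bool:
        """True when |value| >= MIN_DELTA in the given direction and the CI excludes 0."""
        if sign > 0:
            return self.value >= MIN_DELTA and self.ci_low > 0.0
        return self.value <= -MIN_DELTA and self.ci_high < 0.0


def cell_verdict(delta_leak: Delta, delta_cd: Delta, power: float) -> str:
    # Assumed precedence: a leak-inflated baseline is reported first.
    if delta_leak.material(-1):
        return "BASELINE_LEAK_INFLATED"
    if delta_cd.material(+1):
        return "SPLIT_MOVES_VALIDATION"
    # No material delta: only claim stability when the test had enough power.
    if power >= MIN_POWER:
        return "SPLIT_STABLE"
    return "INCONCLUSIVE_UNDERPOWERED"
```

The last two branches carry the "evidence of absence vs absence of evidence" distinction: a null result only becomes SPLIT_STABLE when the experiment could have detected a material delta.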

Architecture (two-layer, Hamilton-native)

s41_io.load_cell_inputs (Airflow-owned I/O: run_s22a1 anchor → SHA-drift guard → captured-parquet load → PG-resolved LGB hp, ADR-90) feeds the Hamilton graph in hamilton/s41_nodes.py:

combined → regime_signals → family_splits → family_auc → {delta_leak, delta_cd, acf_reserve} → cell_verdict
                                                                              group_verdict (strict-majority)

The heavy LGB fit is a single family_auc node (one fit per family); the Δ probes derive from it. Split constructors live in src/training/cv/regime_split.py (pure, leak-guarded). Six families: naive (Δ_leak reference, unguarded), purged_wf (honest baseline), strict_temporal, market_period, volatility_bucket, regime_ab, plus the group-level crypto_loo.
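
To make the difference between the naive and purged_wf families concrete, here is a minimal sketch of an expanding-window walk-forward constructor over time-ordered row positions. The function names, parameters, and row-based gap are hypothetical and do not reflect the actual API of regime_split.py. Setting purge and embargo to 0 reproduces the unguarded layout where the test block starts right where training ends.

```python
from typing import Iterator, List, Tuple

Fold = Tuple[List[int], List[int]]


def purged_walk_forward(
    n_samples: int, n_folds: int, min_train: int, purge: int, embargo: int
) -> Iterator[Fold]:
    """Yield (train_idx, test_idx) pairs with purge + embargo rows dropped between them."""
    gap = purge + embargo
    test_size = (n_samples - min_train - gap) // n_folds
    if test_size <= 0:
        raise ValueError("not enough samples for the requested folds and gap")
    for k in range(n_folds):
        train_end = min_train + k * test_size  # exclusive; the window expands each fold
        test_start = train_end + gap           # skip the purged + embargoed rows
        test_end = test_start + test_size
        yield list(range(train_end)), list(range(test_start, test_end))


def has_gap(train_idx: List[int], test_idx: List[int], gap: int) -> bool:
    """True when train precedes test and at least `gap` rows lie strictly between them."""
    if not train_idx or not test_idx:
        return False
    return min(test_idx) - max(train_idx) - 1 >= gap
```

A usage example under the same assumptions:

```python
folds = list(purged_walk_forward(n_samples=100, n_folds=3, min_train=40, purge=2, embargo=3))
print([(tr[-1], te[0], te[-1]) for tr, te in folds])
```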

Decided design points (regime-signal decision)

  • Option B, unconditional — the regime signal comes from captured features (price_volatility_20 vol, distance_SMA_192 trend, trend_strength structure), never raw close. This keeps the instrument a pure function of the S07-pinned fold (no external data, no drift, no dual path).
  • Multivariate family-6 — the regime code is (vol_band × trend_sign × structure_band) with absolute pre-registered bands (stable identity across folds), so regime_ab tests a genuine regime, not a single seen-axis slice.
  • Q-feature label-correlation gate — a regime feature with |Spearman corr(feature, label)| > 0.10 is rejected / flagged DIFFICULTY_CONFOUND (else the split confounds difficulty with transfer).
  • Leak-safety — every guarded family passes assert_no_leak (no train/test overlap + embargo gap; violation → event=s41_embargo_violation severity=error, clean verdict, never a crash).
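
The regime code and the label-correlation gate described in the list above can be sketched as follows. The band edges, function names, and the "OK" return value are placeholders invented for illustration; the real pre-registered bands are not stated in this document. Spearman correlation is computed here in plain Python (Pearson correlation of average ranks) to keep the example self-contained.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

VOL_EDGES = (0.02, 0.05)      # hypothetical absolute bands for price_volatility_20
STRUCTURE_EDGES = (0.3, 0.7)  # hypothetical absolute bands for trend_strength
MAX_LABEL_CORR = 0.10         # gate threshold from this document


def regime_code(price_volatility_20: float, distance_sma_192: float,
                trend_strength: float) -> Tuple[int, int, int]:
    """Return (vol_band, trend_sign, structure_band) using fixed, fold-independent edges."""
    vol_band = bisect_right(VOL_EDGES, price_volatility_20)
    trend_sign = (distance_sma_192 > 0) - (distance_sma_192 < 0)
    structure_band = bisect_right(STRUCTURE_EDGES, trend_strength)
    return vol_band, trend_sign, structure_band


def _ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def label_correlation_gate(feature: Sequence[float], labels: Sequence[float]) -> str:
    """Flag a regime feature whose rank correlation with the label exceeds the gate."""
    return "DIFFICULTY_CONFOUND" if abs(spearman(feature, labels)) > MAX_LABEL_CORR else "OK"
```

Because the band edges are absolute rather than per-fold quantiles, the same feature values map to the same regime code in every fold, which is what gives the regime a stable identity.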

Gated production split (CVN_SPLIT_MODE)

The purged walk-forward split is implemented in ablation_runner._generate_folds as an off-by-default, A/B-testable capability (ADR-56). Default (off/unset) is byte-identical to the prior behaviour (train_end == test_start); CVN_SPLIT_MODE=purged_wf inserts a CVN_SPLIT_EMBARGO_DAYS gap (ftf_config, ADR-59). It is not promoted to the production default in S03 (ADR-2 — awaits the S03 + S04 verdicts); promotion would be a separate Story with its own staged rollout.
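
The off-by-default behaviour can be illustrated with a small sketch. It is not the code in ablation_runner._generate_folds: the fold_bounds helper is hypothetical, the embargo is read from an environment-style mapping rather than ftf_config, a fallback of 0 days is assumed, and the gap is assumed to be created by ending training earlier while the test window stays fixed.

```python
import os
from datetime import date, timedelta
from typing import Mapping, Optional, Tuple


def fold_bounds(test_start: date,
                env: Optional[Mapping[str, str]] = None) -> Tuple[date, date]:
    """Return (train_end, test_start) for one fold under the configured split mode."""
    env = os.environ if env is None else env
    if env.get("CVN_SPLIT_MODE", "") == "purged_wf":
        # Assumption: the embargo shortens the training window.
        embargo = timedelta(days=int(env.get("CVN_SPLIT_EMBARGO_DAYS", "0")))
    else:
        # Off or unset: prior behaviour, train_end == test_start.
        embargo = timedelta(0)
    return test_start - embargo, test_start
```

Passing the mapping explicitly keeps the two arms of an A/B run easy to compare in a test, for example `fold_bounds(d, {})` against `fold_bounds(d, {"CVN_SPLIT_MODE": "purged_wf", "CVN_SPLIT_EMBARGO_DAYS": "3"})`.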

Scope

Every verdict is specific to the defi_top5 control group — it informs the program but is not a program-wide claim. A positive S03 (SPLIT_MOVES_VALIDATION) triggers a generalisation Story, not a direct program-wide split ship; BASELINE_LEAK_INFLATED escalates to a program-level leak investigation.

Operator surface

  • DAG: diagnostic__s41 (two-layer, crypto_group=defi_top5 fan-out, reuses the S07 pinned fold via the run_s22a1 anchor + SHA-drift guard; schedule=None, paused).
  • Verdict catalogue: SPLIT_MOVES_VALIDATION / SPLIT_STABLE / BASELINE_LEAK_INFLATED / INCONCLUSIVE_UNDERPOWERED / INCONCLUSIVE_TOOLING; group: SYSTEMATIC_VALIDATION_DESIGN / PER_ASSET_SPLIT_EFFECT.
  • Observability: event=s41_* (Loki). See the MLOps readiness.
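
The group verdict is described as strict-majority. A generic helper for that rule might look like the sketch below; the function name is hypothetical, and how a majority (or its absence) maps onto SYSTEMATIC_VALIDATION_DESIGN or PER_ASSET_SPLIT_EFFECT is not specified in this document, so the sketch deliberately stops short of that mapping.

```python
from collections import Counter
from typing import Optional, Sequence


def strict_majority(cell_verdicts: Sequence[str]) -> Optional[str]:
    """Return the verdict held by more than half of the cells, or None if there is none."""
    if not cell_verdicts:
        return None
    verdict, count = Counter(cell_verdicts).most_common(1)[0]
    # Strict majority: exactly half is not enough.
    return verdict if count * 2 > len(cell_verdicts) else None
```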