
CVN-N001-EH — Post-S18 Predictivity Expansion (Epic descriptor)

🪧 Status — 2026-05-13 — PLANNING SCAFFOLD, not yet activated. This Epic is gated by the closure of S18 (wp#154 — harness shallow-training diagnostic) and S19 (remediation, scope TBD by S18 verdict). It activates the moment S19 closes with a restored baseline (f1_buy ≥ 0.40 on ≥ 4/5 cryptos for ALL 3 model types — the post-S18 gate per the F1 plan §0ter target ladder). Sprint planning update — 2026-05-13 : the 9 child Stories (S01-S09) and the Epic itself are now created in OpenProject (wp#155-164) + GitHub (#920, #922-#930) for sprint-planning visibility ; all are status=New and version=None. Pickup into a sprint version happens at activation time per ADR-69 + memory feedback_create_op_wp_for_stories.md. See §5.2 for the full reference table.

epic_id: CVN-N001-EH
parent need: CVN-N001 (F1 mission, #608) — sibling to CVN-N001-EE (F1_buy boost v1-v5)
OP work-package (visibility-only — does NOT mark the Epic active): wp#155 (created 2026-05-13 via openproject_import_gh.py, status New). The Epic transitions to active only after the §10 activation checklist completes (S18+S19 closure + post-S19 baseline FTF + first-Story plan dossier). Creating an OP wp is for project-visibility / sprint-planning ; activation is a separate workflow gate per ADR-69.
GitHub issue: #920
Status: planning — not yet active. OP wp + 9 child Stories + GH issues exist solely so the Epic appears on the OpenProject roadmap and the GH project board for sprint-planning visibility ; OP wp existence is NOT the activation criterion (see §10 for the actual activation gate).
Author: operator + Claude
Date: 2026-05-13 (planning scaffold)
Dependencies: CVN-N001-EE-S18 (diagnostic) → CVN-N001-EE-S19 (remediation, scope = S18 output)
Execution mode: STRICT SEQUENTIAL (one Story In progress at a time, single-WIP per ADR-81)


1. Objective

Restore and extend model predictivity beyond the pre-#891 baseline (f1_buy ~0.42) by opening new signal dimensions after S18 root cause resolution + S19 baseline restoration. The Epic is the planned successor to the F1_buy boost mission (CVN-N001-EE) v1-v5 which closed its Phase 1 (quick-win) with 4 ABANDONED tracks + 1 retracted INCONCLUSIVE + 1 F1_LOCK on Track 14 (5m timeframe), then froze pending S18/S19.

This Epic operationalises F1 plan v5 Tier 7 as a strictly-sequenced 9-Story program with explicit go/no-go gates between phases.


2. Hard constraints

These are not preferences — they are gating invariants. Any Story that violates one of these fails its plan_review committee gate :

  1. NO Story may start before the S18 verdict lands as LOCK or KEEP_AVAILABLE and S19 closes with the f1_buy ≥ 0.40 baseline restored. ABANDON on S18 escalates to architecture review (see §6 Failure protocol).
  2. NO Story may modify the training harness core (src/training/harness/) — Stories CONSUME the harness API (train_one, train_with_fixed_params, run_hpo) but do NOT change it. Harness changes go through a separate Epic (CVN-N001-EE-S16 / S17 surface).
  3. NO parallel execution — Stories are strictly sequential within the Epic. Single-WIP per ADR-81 ; the buffer Specified may stack multiple Stories, but only ONE is In progress at any time.
  4. Each Story MUST pass F1_LOCK gates (per F1 plan §4) before the next starts. F1_LOCK is sufficient at the Story level ; TRADING_LOCK is a separate filter-tuning Story.
  5. No architecture scaling (deep learning addition, new model family, sequence models = F1 plan Track 8) is allowed inside this Epic. Track 8 stays in the parent EE Epic with its own gating.
  6. All Stories MUST be reversible via env var / config (ADR-59) — every Story ships a FTF factor + Console-managed variant set + ADR-90 hyperparameter externalisation if relevant. Hot-rollback via Console toggle, no code-deploy rollback.

3. Execution strategy — 3 phases + optional Phase D

| Phase | Goal | Stories | Phase gate |
|---|---|---|---|
| A — Signal validation (diagnostic-first) | confirm an extractable edge exists | S01 (simple baseline) → S02 (feature block ablation) → S03 (horizon sweep) | ≥ 1 Story produces Δf1_buy ≥ +0.02 OR diagnostic surface convinces the operator the signal is real ; otherwise → §6 Failure protocol |
| B — Signal expansion | grow predictive power | S04 (edge regression) → S05 (regime-aware) → S06 (continuous labels) → S07 (class purity) | composite gain Δf1_buy ≥ +0.05 across A+B Stories ; otherwise → §6 Failure protocol |
| C — Selective amplification | reach high f1_buy via filtering | S08 (confidence gating) | f1_buy@top20% ≥ +0.10 vs baseline |
| D — Optional | exploit model diversity post-signal | S09 (ensemble re-eval) | only triggered if A+B+C plateau without reaching the Epic-level criteria below |
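
The phase gates above reduce to three threshold checks. The following is a minimal sketch under stated assumptions: the function name and argument names are hypothetical, the composite gain is taken as an already-measured input, and the operator's qualitative judgement in Phase A is modelled as a plain boolean override.

```python
def phase_gate(phase, *, story_deltas=(), composite_delta=0.0,
               top20_delta=0.0, operator_override=False):
    """Return True when the named phase may hand over to the next one.

    All deltas are f1_buy differences against the post-S19 baseline.
    """
    if phase == "A":
        # At least one Story lifts f1_buy by +0.02, or the operator is convinced.
        return operator_override or any(d >= 0.02 for d in story_deltas)
    if phase == "B":
        # Composite gain across Phase A+B Stories.
        return composite_delta >= 0.05
    if phase == "C":
        # f1_buy on the top-20% confidence subset vs baseline.
        return top20_delta >= 0.10
    raise ValueError(f"unknown phase: {phase!r}")
```

A False result routes to the §6 Failure protocol for Phases A and B; it is not a signal to extend the phase.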

4. Epic-level success criteria

The Epic closes successfully when all four conditions hold on a verifiable FTF run after the Epic's last Story closes :

  1. f1_buy ≥ 0.55 on ≥ 4/5 cryptos (cleared T1 of the target ladder)
  2. At least one configuration achieves f1_buy ≥ 0.65 (full or confidence-gated subset)
  3. Per-fold variance of f1_buy ≤ 0.05 (stability gate per F1 plan §6)
  4. ≥ 50 BUY trades / fold on the winning configuration (sample-size gate per F1 plan §6)
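
The four conditions can be evaluated in one pass over the closing FTF run. This is a sketch, not the FTF's own check: the function name and input shapes are hypothetical, and "variance" is read here as the population standard deviation of f1_buy across folds, which is an assumption (the F1 plan §6 definition governs).

```python
from statistics import pstdev

def epic_criteria_met(f1_by_crypto, best_config_f1, fold_f1, buy_trades_per_fold):
    """Evaluate the four Epic-level conditions and report each one separately."""
    checks = {
        # 1. f1_buy >= 0.55 on at least 4 of the 5 cryptos
        "f1_floor": sum(v >= 0.55 for v in f1_by_crypto.values()) >= 4,
        # 2. at least one configuration reaches f1_buy >= 0.65
        "peak_config": best_config_f1 >= 0.65,
        # 3. per-fold dispersion <= 0.05 (assumed: population std dev)
        "fold_stability": pstdev(fold_f1) <= 0.05,
        # 4. at least 50 BUY trades in every fold
        "sample_size": min(buy_trades_per_fold) >= 50,
    }
    return all(checks.values()), checks
```

Returning the per-condition dictionary alongside the overall verdict keeps the closure paragraph able to cite which condition failed.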

Closure ritual : Epic descriptor body appended with a final paragraph citing the closure verdict + the F1 mission plan §10 final track status snapshot SHA + the OP wp closure note. Sibling F1 plan §10 row updated to reference the Epic outcome.


5. Stories scaffold (9 Stories)

Sprint planning update — 2026-05-13 : the 9 Stories below were created in OpenProject + GitHub on 2026-05-13 for sprint-planning visibility. All Stories are status=New and remain so until the Epic activates (post-S19 baseline restoration per §10). The OP storyPoints + estimatedTime + priority fields are populated ; the version (sprint) field is intentionally NOT set per ADR-69 + STORY_WORKFLOW.md §1.1 (Stories get pulled into a sprint version on operator pickup, not at creation).

5.1 Stories — work / points / priority scaffold

| # | Story (CVN-N001-EH-…) | Tier 7 Track | Phase | Priority | Points | Wall | Goal |
|---|---|---|---|---|---|---|---|
| S01 | Simple model baseline (signal sanity check) | T19 | A | 🔴 critical | 3 | 0.5-1 d | LR + shallow tree → triangulate signal vs pipeline-bug |
| S02 | Feature block ablation | T20 | A | 🔴 critical | 5 | 2 d | identify ≥ 1 block whose removal lifts f1_buy ≥ +0.02 |
| S03 | Temporal horizon sweep | T21 | A | 🟠 high | 3 | 1-2 d | H ∈ {2, 4, 8, 12} → find best horizon at 5m timeframe |
| S04 | Edge regression (target reformulation) | T16 | B | 🔴 critical | 8 | 2-3 d | y = net_return regressor + threshold rule |
| S05 | Regime-aware modeling | T17 | B | 🟠 high | 5 | 2 d | regime features (Variant A) or per-regime models (Variant B) |
| S06 | Continuous / probabilistic labels | T18 | B | 🟡 medium | 8 | 2-3 d | continuous target + soft loss |
| S07 | Class purity refinement | T22 | B | 🟡 medium | 5 | 1-2 d | filter low-return BUY samples or weight by return |
| S08 | Confidence gating / selective prediction | T15 | C | 🔴 critical | 5 | 1-1.5 d | f1_buy@top20% ≥ f1_buy_global + 0.10 |
| S09 | Ensemble re-evaluation (post-signal) | re-use T11 surface | D | 🟡 low | 8 | 3-4 d | restricted to best 2 models post-signal validation |

5.2 Stories — OP / GH references (created 2026-05-13)

| Story | GH issue | OP work-package | OP status |
|---|---|---|---|
| S01 Simple model baseline | #922 | wp#156 | New |
| S02 Feature block ablation | #923 | wp#157 | New |
| S03 Horizon sweep | #924 | wp#158 | New |
| S04 Edge regression | #925 | wp#159 | New |
| S05 Regime-aware modeling | #926 | wp#160 | New |
| S06 Continuous labels | #927 | wp#161 | New |
| S07 Class purity | #928 | wp#162 | New |
| S08 Confidence gating | #929 | wp#163 | New |
| S09 Ensemble re-eval (opt) | #930 | wp#164 | New |

Cumulative scope : 50 story points (42 for S01-S08 + 8 for the optional S09). The §5.1 wall estimates sum to 11.5-16.5 wall days for Phases A+B+C ; the optional S09 adds 3-4 d. Excludes wait time between Stories for plan_review committees + FTF sweep wall.

Mandatory execution order :

S01 → S02 → S03 → S04 → S05 → S06 → S07 → S08 → (optional) S09

Cross-Story baseline rule : every Story compares against the same post-S19 restored baseline (a single canonical FTF run captured at S19 closure). No moving baseline allowed within the Epic.


6. Failure protocol

If any of the following triggers fires, the Epic execution STOPS and escalates :

| Trigger | Escalation |
|---|---|
| Phase A completes with no Story producing Δf1_buy ≥ +0.02 | Architecture review (operator + Claude pair-review) — re-assess whether predictivity expansion is the right next mission or whether the F1 mission should pivot to a different axis (PTE envelope re-sweep, filter funnel re-engineering, exec layer audit). |
| Simple model baseline (S01) shows simple ≈ complex | Signal-limited diagnosis confirmed. STOP Epic execution. Open a new Epic / Need scoped to data acquisition (new feature dimensions, alternative cryptos, alternative timeframes) before resuming predictivity work. |
| f1_buy plateau persists below 0.45 after all Phase A+B Stories | Open [gate-failure] issue + run a self-pr_review committee on the cumulative Epic state. The committee may decide to (a) extend to Phase D ensemble, (b) waive the Epic gate with a documented risk acceptance, or (c) close the Epic as KEEP_AVAILABLE (no LOCK, no ABANDON) and pivot. |

The failure protocol is NOT an extension — it converts the Epic from "deliver predictivity gain" to "deliver decisional clarity". Both outcomes are acceptable closures of the Epic ; a "silent extension" is not.


7. Operator constraints (for Claude execution)

These are operational rules for the Claude sessions that will implement the Stories. They restate constraints from project memory + CLAUDE.md + the F1 plan in a single block for easy reference :

  1. Do NOT optimise hyperparameters prematurely — the harness ships ADR-90 externalised HPO ranges via PG. Tune via Console only when the Story explicitly calls for an HPO range change.
  2. Do NOT introduce new models outside scope — each Story has a defined model surface ; do not add CatBoost / NN variants opportunistically.
  3. Do NOT mix multiple Stories in one FTF run — single Story = single FTF factor = single sweep. Cross-Story interactions are evaluated post-hoc on per-Story sweep outputs.
  4. Always compare against the SAME baseline run — captured at S19 closure, used as the canonical reference for all Stories in this Epic.
  5. Always use 5 cryptos (defi_top5) — no group extension during the Epic. Cross-asset extension is a separate mission scope.
  6. Always report bootstrap CI + paired-t BH-adjusted p-values + Cohen's d per F1 plan §6 reporting standard.
  7. Console-only config writes (ADR-59) — no direct PG UPDATE bypass, no in-code defaults that override PG.
  8. Airflow-only execution (memory feedback_no_shortcuts.md) — NEVER python scripts/... directly for ML training. Always via launcher DAG.
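
For the reporting standard in item 6, everything except the t-test itself can be sketched with the standard library. The functions below show Benjamini-Hochberg adjustment, Cohen's d on paired differences, and a percentile bootstrap CI. The function names, the percentile method, the resample count, and the seed are illustrative assumptions, and the paired-t p-values are taken as given inputs; the F1 plan §6 remains the authority on the exact procedure.

```python
import random
from statistics import mean, stdev

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def cohens_d_paired(treatment, baseline):
    """Cohen's d on paired differences (mean difference / std of differences)."""
    diffs = [t - b for t, b in zip(treatment, baseline)]
    return mean(diffs) / stdev(diffs)

def bootstrap_ci(diffs, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired differences."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

`cohens_d_paired` raises `ZeroDivisionError` when every paired difference is identical; that case needs explicit handling before it reaches a report.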

8. Per-Story descriptors (light)

The full per-Story plan dossiers (documentation/reviews/<date>-cvn-n001-eh-sNN-<slug>-plan.md) will be authored at Story activation time per ADR-81 + the standard plan-review committee ritual. The block below is a one-liner per Story to feed those dossiers.

S01 — Simple model baseline (T19)

Train logistic regression + decision tree (max_depth=3) on the same train/val/test splits the FTF uses ; no HPO, no CUSUM, no θ-sweep. Compare resulting f1_buy to the harness pipeline. Diagnostic verdict : simple ≈ complex → signal issue ; simple > complex → harness bug confirmed (feeds back into S18 / S19) ; simple < complex → harness adds value (expected).
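
The three-way diagnostic verdict can be written as a small classifier over the two f1_buy values. This is a sketch only: the function name is hypothetical and the tolerance that defines "≈" is an assumption, since the descriptor does not fix one.

```python
def s01_verdict(f1_simple, f1_complex, tolerance=0.01):
    """Map the simple-vs-harness f1_buy gap onto the S01 diagnostic verdict.

    `tolerance` (assumed value) is the band inside which the two scores
    are treated as equal.
    """
    gap = f1_simple - f1_complex
    if abs(gap) <= tolerance:
        return "signal issue"          # simple ≈ complex
    if gap > 0:
        return "harness bug"           # simple > complex, feeds S18 / S19
    return "harness adds value"        # simple < complex (expected)
```

Fixing the tolerance in the S01 plan dossier before the run keeps the verdict from being chosen after the numbers are seen.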

S02 — Feature block ablation (T20)

Tag each FE block with block="…" metadata. Add FTF factor feature_block_mask toggling each block off (1 variant per block-removed). Identify ≥ 1 block whose removal lifts f1_buy ≥ +0.02. Result is informative either way (knowing nothing is removable IS a signal).
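
A one-variant-per-removed-block layout for feature_block_mask might look like the sketch below. The variant naming (`drop_<block>`), the mask shape (block name mapped to an enabled flag), and the block names used in the usage note are all hypothetical; the real variant set lives in Console-managed config.

```python
def ablation_variants(blocks):
    """Build one variant per removed block: every block on except one."""
    return {
        f"drop_{removed}": {name: name != removed for name in blocks}
        for removed in blocks
    }

def removable_blocks(baseline_f1, f1_by_variant, min_lift=0.02):
    """Variants whose f1_buy beats the baseline by at least `min_lift`."""
    return sorted(
        variant
        for variant, f1 in f1_by_variant.items()
        if f1 - baseline_f1 >= min_lift
    )
```

With blocks `["momentum", "volatility", "volume"]`, `ablation_variants` yields three masks, each disabling exactly one block. An empty result from `removable_blocks` is still a finding, as the paragraph above notes.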

S03 — Horizon sweep (T21)

FTF factor horizon_hours sweeping H ∈ {2, 4, 8, 12}. Composes with Track 14 LOCK on 5m timeframe. Falsifiability : Δ f1_buy ≥ +0.02 on ≥ 4/5 cryptos at the best horizon.
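
The falsifiability check for the sweep can be sketched as follows. Picking the "best" horizon by mean f1_buy across cryptos is an assumption, as are the function name and the nested-dict input shape.

```python
def best_horizon(baseline, sweep, min_delta=0.02, min_cryptos=4):
    """Pick the best horizon and test the S03 falsifiability condition.

    baseline: {crypto: f1_buy} from the post-S19 reference run.
    sweep:    {H: {crypto: f1_buy}} for each swept horizon.
    """
    def mean_f1(h):
        scores = sweep[h].values()
        return sum(scores) / len(scores)

    best = max(sweep, key=mean_f1)  # assumed selection rule
    wins = sum(sweep[best][c] - baseline[c] >= min_delta for c in baseline)
    return best, wins >= min_cryptos
```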

S04 — Edge regression (T16)

Target = y = net_return_after_costs. Train XGBRegressor / LGBMRegressor plug-ins via the harness adapter surface. Decision rule : trade if y_pred > threshold (threshold calibrated OOS per ADR-15). Falsifiability : corr(y_pred, realized_pnl) > 0.2 AND derived f1_buy ≥ baseline.
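
The decision rule and the falsifiability condition can be illustrated without the regressors themselves. In this sketch the function names are hypothetical, the correlation is plain Pearson (the descriptor does not say which correlation is meant), and the threshold is passed in as an already-calibrated value.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def trade_signals(y_pred, threshold):
    """Trade only where the predicted net return exceeds the threshold."""
    return [p > threshold for p in y_pred]

def s04_gate(y_pred, realized_pnl, derived_f1, baseline_f1):
    """corr(y_pred, realized_pnl) > 0.2 AND derived f1_buy >= baseline."""
    return pearson(y_pred, realized_pnl) > 0.2 and derived_f1 >= baseline_f1
```

The threshold handed to `trade_signals` must come from out-of-sample calibration per ADR-15; computing it on the same rows it is applied to would defeat the check.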

S05 — Regime-aware modeling (T17)

Regime features (volatility bucket + trend state + BTC regime + sector index). Variant A : features added globally. Variant B : per-regime models (regime classifier routes inference). Falsifiability : Δ f1_buy ≥ +0.02 AND reduced fold variance. Absorbs the v4 "Regime Engine" + "Market Context" tracks.

S06 — Continuous labels (T18)

Continuous target options : (a) normalised return at horizon, (b) P(TP) proxy. Regression / soft-classification with KL / CE loss. Falsifiability : Δ f1_buy ≥ +0.015 AND ECE improves ≥ 20%. Overlaps with S04 — must clarify which lever drives the lift if both ship positively.
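
The ECE half of the falsifiability condition can be sketched as below. This uses the common equal-width binning of the predicted BUY probability, and reads "improves ≥ 20%" as a relative reduction against the baseline ECE; both the binning scheme and that reading are assumptions, as are the function names.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean of |observed BUY rate - mean predicted prob|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if members:
            conf = sum(p for p, _ in members) / len(members)
            rate = sum(y for _, y in members) / len(members)
            ece += len(members) / n * abs(rate - conf)
    return ece

def ece_improved(ece_baseline, ece_new, min_relative_gain=0.20):
    """True when ECE drops by at least `min_relative_gain` relative to baseline."""
    return (ece_baseline - ece_new) / ece_baseline >= min_relative_gain
```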

S07 — Class purity refinement (T22)

Filter low-return BUY samples (drop trades with realised return < ε) OR weight samples by return magnitude in the loss. Falsifiability : precision ↑, recall ↓, f1_buy ↑ ≥ +0.02. Risk : class-definition leakage (the trades that "would have been profitable" are easy to predict in-sample).
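
The two options (filter or weight) can be sketched as follows. The sample representation (dicts with `label` and `ret` keys), the function names, and the normalisation of weights to mean 1 are hypothetical choices for illustration.

```python
def filter_buy_samples(samples, epsilon):
    """Drop BUY samples whose realised return is below epsilon; keep the rest."""
    return [s for s in samples if s["label"] != "BUY" or s["ret"] >= epsilon]

def return_weights(samples, floor=1e-6):
    """Per-sample loss weights proportional to |return|, normalised to mean 1."""
    mags = [max(abs(s["ret"]), floor) for s in samples]
    mean_mag = sum(mags) / len(mags)
    return [m / mean_mag for m in mags]
```

Given the leakage risk named above, one cautious reading is to apply either function to training folds only and leave validation and test labels untouched; that is an interpretation, not something the descriptor specifies.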

S08 — Confidence gating (T15)

Add FTF metrics precision@k, f1_buy@k for k ∈ {10%, 20%, 30%}. Add factor confidence_threshold ∈ {none, p70, p80, p90}. Calibrated probabilities only (depends on calibration quality — Story S01 sanity-check feeds this). Falsifiability : f1_buy@top20% ≥ +0.10 vs global baseline AND ≥ 4/5 cryptos improve. Directly unlocks T4 of the target ladder.
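
One plausible definition of the @k metrics is sketched below: rank samples by predicted BUY probability, treat the top k fraction as predicted BUY and everything else as not BUY, then compute precision and F1 on that labelling. This definition, the function name, and the rounding of the cut-off are assumptions; the FTF's actual metric definition may differ.

```python
def metrics_at_k(probs, labels, k_frac):
    """Return (precision@k, f1_buy@k) for the top `k_frac` most confident rows.

    probs:  predicted BUY probabilities (assumed calibrated).
    labels: 1 for a true BUY, 0 otherwise.
    """
    n_top = max(1, round(len(probs) * k_frac))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    true_pos = sum(labels[i] for i in ranked[:n_top])
    positives = sum(labels)
    precision = true_pos / n_top
    recall = true_pos / positives if positives else 0.0
    if precision + recall == 0:
        return precision, 0.0
    return precision, 2 * precision * recall / (precision + recall)
```

Under this definition recall is capped by k, so f1_buy@k and the global f1_buy are not measured on the same footing; the plan dossier should state which comparison the +0.10 gate refers to.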

S09 — Ensemble re-eval (optional, post-signal)

Re-use the F1 plan Track 11 ensemble diversity surface, restricted to the best 2 models surfaced by S01-S08. Falsifiability : stacked f1_buy ≥ best_single + 0.02. Triggered only if Phases A+B+C close without reaching the Epic-level success criteria (§4).


9. References

ADR alignment

| ADR | Application |
|---|---|
| ADR-14 (multi-fold) | every Story uses ≥ 5-fold CV with purge+embargo |
| ADR-15 (theta calibrated OOS) | S04 + S08 calibrate decision thresholds OOS only |
| ADR-25 (no silent fallback) | all Stories fail-fast on missing config / data |
| ADR-59 (PG-only config) | every FTF factor + variant set lives in ftf_config |
| ADR-65 (Airflow DAG params run-level) | factor / variant_id / crypto_group on the trigger form ; config in PG |
| ADR-70 (MLOps readiness mandatory) | each Story files documentation/stories/<cvn_id>/mlops_readiness.md |
| ADR-79 (LOCK / KEEP_AVAILABLE / ABANDON) | every Story closes via the verdict tree |
| ADR-81 (8-state Story workflow) | each Story follows New → In specification → Specified → In progress → Developed → In testing → Tested → Closed |
| ADR-82 (committee → OP Meeting) | every plan_review + pr_review session is logged as an OP Meeting |
| ADR-89 (harness as plugin registry) | Stories CONSUME the harness, do NOT modify it (per §2 hard constraint 2) |
| ADR-90 (HP externalisation) | any HPO range tweak per Story lands as a Console-only change |
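
The purge+embargo split cited for ADR-14 can be illustrated with index arithmetic on a time-ordered sample. This is a simplified sketch, not the harness's splitter: the function name is hypothetical, purge and embargo are expressed as fixed row counts (an assumption), and contiguous equal-size test folds are assumed.

```python
def purged_kfold(n_samples, n_folds=5, purge=0, embargo=0):
    """Yield (train_idx, test_idx) for contiguous folds on time-ordered rows.

    purge:   rows dropped from training on both sides of the test fold.
    embargo: extra rows dropped from training right after the test fold.
    """
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        start = k * fold_size
        # The last fold absorbs any remainder rows.
        stop = n_samples if k == n_folds - 1 else start + fold_size
        test = list(range(start, stop))
        train = [
            i for i in range(n_samples)
            if i < start - purge or i >= stop + purge + embargo
        ]
        yield train, test
```

With 100 rows, 5 folds, `purge=2`, and `embargo=3`, the second fold tests rows 20-39 and trains on rows 0-17 and 45-99.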

10. Activation checklist (gate before this Epic moves to active)

  • OP wp created for the Epic (wp#155, 2026-05-13)
  • OP wps created for all 9 child Stories (wp#156-164) + GH issues (#922-930) for sprint-planning visibility — see §5.2
  • Epic descriptor body updated with the new OP wp URL + GH issue ref
  • S18 verdict closes (LOCK or KEEP_AVAILABLE) — see S18 plan §5.2 Step 5 (currently in flight via Step 0 DAG)
  • S19 closure ritual — baseline restored to f1_buy ≥ 0.40 on ≥ 4/5 cryptos for ALL 3 model types
  • Operator runs the canonical post-S19 baseline FTF + captures the run_id as the reference baseline for this Epic (the cross-Story comparison invariant per the §5 cross-Story baseline rule and §7 operator constraint 4)
  • First Story (S01 wp#156) plan dossier authored under documentation/reviews/<date>-cvn-n001-eh-s01-simple-model-baseline-plan.md + plan_review committee session scheduled per ADR-68
  • Next-sprint OP version created with S01-S03 (Phase A) pulled in as the Specified buffer (Stories transitioned from New → In specification → Specified per ADR-81)
  • S01 transition Specified → In progress triggered by operator pickup (single-WIP per ADR-81)

Until the bold items are checked, the Epic stays in planning state — the OP wp exists for visibility but no Story is In progress. The first 3 boxes are checked because they were addressed by this sprint-planning ritual (2026-05-13).


11. Closure check

When the Epic OP wp status flips to Closed, append a final paragraph here citing :

  • closure verdict (LOCK / KEEP_AVAILABLE / ABANDON at Epic level per §4 success criteria)
  • the closing FTF run_id and the canonical post-S19 baseline run_id against which all Stories were measured
  • per-Story verdict summary (which Stories landed F1_LOCK, which were ABANDON-on-gate, which were skipped)
  • the F1 plan §10 final track status snapshot SHA

Until then, this file stays in planning state.