Plan dossier: Track 1 leakage root-cause + purge_bars sensitivity sweep
Date : 2026-05-05
Story : CVN-N001-EE-S14 (OP wp#103)
GH issue : #806
Author : Dominique (operator) + Claude
Session type : plan_review (per ADR-68 — committee invocation gated on memory project_committee_keys_dead_2026-05-02 ; smoke-test before invoking, waiver path available)
Severity : P2 — investigation gating Track 1 LOCK candidacy + Track 12 launch
Sequencing : per F1_BUY_BOOST_PLAN.md §6 — Track 12 (frac diff) is NOT cleared until this investigation produces a verdict ; Track 11 (ensemble diversity) is unaffected (independent feature set).
Committee plan_review : ✅ PASSED (session dd248118, 2026-05-05, 5 experts strong consensus, no blockers, 14 recs). High-value amendments adopted in this dossier : (1) Phase A power n=25 → n=50 (rec #1), (2) Phase B densified with purge=2 + purge=15 (rec #2), (3) Phase D dossier MUST report Sortino alongside f1_buy on the verdict variant (rec #4), (4) explicit α=0.05 statement (rec #6). Deferred to follow-up Stories : 2:1 purge:embargo ratio ADR (rec #11), permutation test for Phase A (rec #8), live production timing validation (rec #9), post-deployment drift monitoring (rec #10), p-threshold sensitivity analysis (rec #13), high-vol stress-test variant (rec #14).
1. Context: why this investigation, why now
Track 1 (BTC cross-asset features, wp#43) was sweep-tested 2026-05-02 with the full 6-variant FTF matrix. The mandatory leakage check per parent plan dossier §4.6 failed :
| Test | Result |
|---|---|
| Paired t-test on f1_buy(btc_full_purge0) − f1_buy(btc_full) over 25 (crypto, fold) cells | p = 0.0401, t = 2.171, Cohen d = +0.434, Δ = +0.00727 |
| Plan §4.6 verdict rule | "If the paired difference is significantly positive (BH-corrected p < 0.05) → leakage suspected → ABANDON Track 1 pending root-cause investigation." |
But Sortino strongly contradicts : on the same 25 cells, canonical btc_full (purge=20) BEATS btc_full_purge0 on Sortino by +0.32 (1.710 vs 1.390, +23%). The leakage signal lives at the per-prediction level (ML metric f1_buy) but does NOT translate into trade-level economic lift. Diagnostic in Track 1 results dossier §5.1 + §5.2.
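The reported figures are internally consistent: for a paired test, t = d × √n, and 0.434 × √25 ≈ 2.17. The sketch below shows how the mean difference, paired Cohen's d and t statistic follow from per-cell differences. The function name `paired_stats` and the toy f1_buy values are illustrative assumptions, not the project's extraction code ; the p-value is left out because it needs a Student-t CDF that the standard library does not provide.

```python
import math
import statistics


def paired_stats(treated, control):
    """Return (mean_diff, cohen_d, t_stat) for two paired samples.

    cohen_d is the paired effect size: mean of the per-cell differences
    divided by their sample standard deviation (n - 1 denominator).
    """
    if len(treated) != len(control) or len(treated) < 2:
        raise ValueError("need two equally sized samples with n >= 2")
    diffs = [a - b for a, b in zip(treated, control)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)
    cohen_d = mean_diff / sd_diff
    # Paired t statistic: d * sqrt(n), equivalent to mean / (sd / sqrt(n)).
    t_stat = cohen_d * math.sqrt(len(diffs))
    return mean_diff, cohen_d, t_stat


# Toy f1_buy values for 4 (crypto, fold) cells: illustrative only.
purge0 = [0.52, 0.49, 0.55, 0.50]
purge20 = [0.51, 0.49, 0.53, 0.50]
mean_diff, d, t = paired_stats(purge0, purge20)
```

In the real analysis the two lists would hold the 25 per-cell f1_buy values of btc_full_purge0 and btc_full, in the same (crypto, fold) order.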
Three candidate interpretations live in the original results dossier and structure this investigation :
- (a) Real but small ADR-14 violation : one or more BTC features (likely btc_correlation_15m_lag5, btc_z_score_close, or btc_realized_vol_24h) include same-bar BTC info that overlaps with the altcoin's H4 label window. Canonical purge=20 already plugs this : gate is correct, signal is real, no remediation needed beyond confirming purge=20 is sufficient.
- (b) Production-exploitable signal mistakenly purged : BTC's bar-i close IS available at altcoin's bar-i decision time in production. The "leakage" detected by the check might be a real signal we're conservatively discarding via the 5h purge window. Adjusting purge_bars downward could give bigger lift.
- (c) Statistical noise : at p=0.0401 the gate just fails ; with a slightly different fold split or HPO seed it could pass. Distinguishing requires a deep re-sweep.
Track 12 (frac diff + interactions) gate stays NOT cleared until this investigation produces an answer. Track 1's own LOCK candidacy is on hold ; the FTF factor stays in MODEL_FACTORS per ADR-79 invariant 6 — the leakage investigation may produce a corrected variant set that re-opens LOCK candidacy.
2. Hypotheses (falsifiable)
Phase A hypothesis: per-feature attribution
H0_A (null) : the +0.0073 f1_buy lift on btc_full_purge0 vs btc_full is distributed uniformly across the 6 BTC features — no single feature dominates the leakage signal.
H1_A (alternative) : ≥ 1 BTC feature contributes statistically more leakage than others (paired t-test on f1_buy(feature_i_only_purge0) − f1_buy(none) produces a higher Cohen d than the average, with BH-corrected p < 0.10 for ≥ 1 feature).
Pre-registered prediction : the time-aligned features (btc_correlation_15m_lag5, btc_z_score_close, btc_realized_vol_24h — all derived from same-bar or recent-bar BTC state) carry more leakage than the directional return features (btc_return_1h/4h/24h). If predicted, this localises which features need the strongest purging.
Phase B hypothesis: purge_bars sensitivity
H0_B (null) : f1_buy is monotonically non-increasing in purge_bars ∈ {0, 2, 5, 10, 15, 20, 40}, the densified Phase B grid of §3.2 (more purging = more lost signal, no inflection point ; f1_buy decreases or stays constant as purge_bars increases) → canonical purge=20 is the right balance.
H1_B (alternative) : f1_buy shows an inflection in the [0, 20] range — there exists a purge_bars* < 20 where the lift over baseline peaks AND no leakage is detected (paired test vs purge=0 not significant). If purge_bars* ∈ [5, 15], canonical can be relaxed for bigger lift.
Pre-registered prediction : the curve will be roughly monotonic from purge=0 to purge=40 with most of the leakage absorbed in the [0, 5] range — i.e., purge=5 will look statistically similar to purge=20 on the leakage check while preserving more signal. If true, the canonical can move to purge=5 ; if false, canonical stays at purge=20.
Phase D hypothesis: outcome decision tree
The Phase A + Phase B results combine into one of three verdicts (decision tree in §7) :
| Verdict | Trigger | Action |
|---|---|---|
| CONFIRM canonical (purge=20) | Phase B shows monotonic curve OR purge=20 is the lowest level where leakage check passes | Re-sweep btc_full @ purge=20 at deep mode → corrected Track 1 dossier per ADR-79 |
| RELAX canonical (purge ∈ {5, 10}) | Phase B shows inflection at purge=5 or purge=10 with no leakage signal | Update FTF factor canonical → re-sweep at deep mode → corrected Track 1 dossier |
| TIGHTEN canonical (purge ≥ 40) | Phase B shows leakage still significant at purge=20 (i.e., the original gate failure was NOT marginal) | Update canonical to purge=40 → re-sweep ; expect Sortino to decay further |
3. Variant matrix
3.1 Phase A: per-feature leakage ablation (9 variants : 3 existing + 6 NEW)
Adds 6 single-feature variants to the existing btc_features factor matrix in src/commun/finetune/ablation_matrix.py. Each variant tests one BTC feature alone at purge=0 ; comparing each to the none baseline isolates which feature(s) drive the leakage-pattern lift.
| Variant | Features active | Purge | Notes |
|---|---|---|---|
| none (existing) | 0 | n/a | Reference baseline (BTC-blind) |
| btc_full (existing) | 6 (full set) | 20 | Canonical, leakage-clean reference |
| btc_full_purge0 (existing) | 6 (full set) | 0 | Leakage-permitted reference (the one that triggered the gate) |
| btc_return_1h_only_purge0 (NEW) | 1 (btc_return_1h) | 0 | Tests directional 1h return alone |
| btc_return_4h_only_purge0 (NEW) | 1 (btc_return_4h) | 0 | Tests directional 4h return alone |
| btc_return_24h_only_purge0 (NEW) | 1 (btc_return_24h) | 0 | Tests directional 24h return alone |
| btc_realized_vol_24h_only_purge0 (NEW) | 1 (btc_realized_vol_24h) | 0 | Tests volatility alone |
| btc_z_score_close_only_purge0 (NEW) | 1 (btc_z_score_close) | 0 | Tests close z-score alone |
| btc_correlation_15m_lag5_only_purge0 (NEW) | 1 (btc_correlation_15m_lag5) | 0 | Tests lagged correlation alone |
Statistical analysis : paired t-test on f1_buy(feature_i_only_purge0) − f1_buy(none) across 25 (crypto, fold) cells, BH-corrected across the 6 single-feature comparisons. Cohen's d for effect size. Output : ranking of features by leakage-pattern contribution.
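The BH correction across the 6 single-feature comparisons can be sketched in plain Python as follows. This is a minimal illustration of the standard Benjamini-Hochberg step-up adjustment ; the function name `bh_adjust` and the toy p-values are assumptions, not the project's analysis code.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy raw p-values: illustrative only, not sweep results.
toy_adjusted = bh_adjust([0.01, 0.04, 0.03, 0.20])
```

A feature would then count towards H1_A when its adjusted p-value falls below the 0.10 threshold stated in §2.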
3.2 Phase B: purge_bars sensitivity sweep (7 variants ; densified per committee dd248118 rec #2)
Extends the existing matrix. The btc_full @ purge=20 (= existing btc_full) and purge=10 (= existing btc_full_purge10) and purge=0 (= existing btc_full_purge0) are reused ; 4 new variants close + densify the curve.
| Variant | Features | Purge | Source |
|---|---|---|---|
| btc_full_purge0 (existing) | 6 (full set) | 0 | Reused, leakage-permitted |
| btc_full_purge2 (NEW) | 6 (full set) | 2 | Densifies near zero (committee dd248118 rec #2) |
| btc_full_purge5 (NEW) | 6 (full set) | 5 | Bridges 0 → 10 ; main candidate for relaxation |
| btc_full_purge10 (existing) | 6 (full set) | 10 | Reused |
| btc_full_purge15 (NEW) | 6 (full set) | 15 | Densifies near canonical (committee dd248118 rec #2) |
| btc_full (existing, = btc_full_purge20) | 6 (full set) | 20 | Reused, current canonical |
| btc_full_purge40 (NEW) | 6 (full set) | 40 | Tightening sanity ; expect signal decay |
Statistical analysis : pairwise paired t-test on f1_buy(purge=k) − f1_buy(none) for k ∈ {0, 2, 5, 10, 15, 20, 40}, BH-corrected at α = 0.05 (per committee dd248118 rec #6 — explicit alpha statement). Plot the curve f1_buy vs purge_bars with CI95 bands. Identify the first k* where the leakage check (paired t vs purge=0) is not significant — that's the minimum production-feasible purge.
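Taking the k* rule exactly as stated above, the scan over purge levels could look like the sketch below. The function name `first_clean_purge`, the input layout (purge level mapped to the BH-corrected p-value of the paired test vs purge=0) and the toy p-values are all assumptions made for illustration.

```python
def first_clean_purge(leak_p_by_purge, alpha=0.05):
    """Smallest purge_bars k > 0 whose leakage check is not significant.

    leak_p_by_purge maps k to the BH-corrected p-value of the paired test
    of purge=k against purge=0. Returns None if every level is significant.
    """
    for k in sorted(leak_p_by_purge):
        if k > 0 and leak_p_by_purge[k] >= alpha:
            return k
    return None


# Toy p-values: illustrative only, not sweep results.
toy_p = {2: 0.01, 5: 0.03, 10: 0.20, 15: 0.35, 20: 0.41, 40: 0.60}
k_star = first_clean_purge(toy_p)
```

With these toy inputs the scan stops at the first level whose p-value reaches α = 0.05 ; a `None` result corresponds to the TIGHTEN row of the verdict table in §2.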
3.3 Total cell count
Phase A is now run at power_mode=medium (n=50 per variant) per committee dd248118 rec #1 — improves Cohen-d ranking confidence at +1h compute cost vs the original n=25 plan.
- Phase A : 6 NEW variants × 5 cryptos (BTC, ETH, SOL, AAVE, UNI) × 5 folds × 2 (medium n=50) × 1 PTE = 300 cells + reuse 75 cells from Phase 1 sweep ftf_20260501_230526_2483f9 for none, btc_full, btc_full_purge0. Total 375 cells.
- Phase B : 4 NEW variants × 5 cryptos × 5 folds × 1 PTE = 100 cells + reuse 75 cells for btc_full_purge0, btc_full_purge10, btc_full. Total 175 cells.
Combined : 400 NEW cells, ~2h compute @ medium mode for Phase A, ~45 min Phase B. Total ~2.75h compute (vs ~1.5h with original plan ; +1.25h cost for tighter Phase A statistics + denser Phase B curve). Deep-mode rerun of the verdict variant (Phase D) is +3-4h.
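The cell arithmetic above can be re-derived with a few lines of Python. The constant and function names are illustrative ; the factors (5 cryptos, 5 folds, 1 PTE, the ×2 multiplier for medium mode, 75 reused cells) are the ones stated in this section.

```python
CRYPTOS = 5        # BTC, ETH, SOL, AAVE, UNI
FOLDS = 5
PTE = 1
REUSED_CELLS = 75  # 3 existing variants x 5 cryptos x 5 folds


def new_cells(n_variants, repeats):
    """Number of new sweep cells for n_variants at the given repeat factor."""
    return n_variants * CRYPTOS * FOLDS * repeats * PTE


phase_a_new = new_cells(6, repeats=2)   # medium mode, n=50 per variant
phase_b_new = new_cells(4, repeats=1)
phase_a_total = phase_a_new + REUSED_CELLS
phase_b_total = phase_b_new + REUSED_CELLS
combined_new = phase_a_new + phase_b_new
```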
4. Implementation path
4.1 Phase A code changes
File : src/commun/pipeline/btc_features.py
Extend _BTC_FEATURE_COLUMNS with 6 single-feature sets (one per feature) :
_BTC_FEATURE_COLUMNS = {
    "min": frozenset({"btc_return_1h", "btc_return_4h", "btc_return_24h"}),
    "full": frozenset({...}),  # existing
    "vol_only": frozenset({"btc_realized_vol_24h", "btc_z_score_close", "btc_correlation_15m_lag5"}),
    # NEW: Phase A per-feature ablation (CVN-N001-EE-S14)
    "return_1h_only": frozenset({"btc_return_1h"}),
    "return_4h_only": frozenset({"btc_return_4h"}),
    "return_24h_only": frozenset({"btc_return_24h"}),
    "realized_vol_24h_only": frozenset({"btc_realized_vol_24h"}),
    "z_score_close_only": frozenset({"btc_z_score_close"}),
    "correlation_15m_lag5_only": frozenset({"btc_correlation_15m_lag5"}),
}
The compute_btc_features() function already filters its output by the requested set — no change to the compute logic. The 6 single-feature sets are pure data-driven additions.
File : src/commun/finetune/ablation_matrix.py
Extend the btc_features factor's env_vars dict with 6 NEW Phase A entries :
"btc_return_1h_only_purge0": {
    "CVN_BTC_FEATURES_ENABLED": "1",
    "CVN_BTC_FEATURES_SET": "return_1h_only",
    "CVN_BTC_PURGE_BARS": "0",
    "CVN_BTC_EMBARGO_BARS": "0",
},
# ... 5 more single-feature variants (same structure, different SET)
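Since the six Phase A entries differ only in the feature-set name, they could be generated rather than hand-written. The sketch below is one possible way to do that, assuming the variant name is always `btc_<set>_purge0` as in the §3.1 table ; `SINGLE_FEATURE_SETS` and `phase_a_env_vars` are hypothetical names, and the actual ablation_matrix.py may well list the entries explicitly.

```python
# Feature-set names taken from the _BTC_FEATURE_COLUMNS additions above.
SINGLE_FEATURE_SETS = [
    "return_1h_only",
    "return_4h_only",
    "return_24h_only",
    "realized_vol_24h_only",
    "z_score_close_only",
    "correlation_15m_lag5_only",
]


def phase_a_env_vars():
    """Build the 6 Phase A variant entries (purge=0, embargo=0)."""
    return {
        f"btc_{set_name}_purge0": {
            "CVN_BTC_FEATURES_ENABLED": "1",
            "CVN_BTC_FEATURES_SET": set_name,
            "CVN_BTC_PURGE_BARS": "0",
            "CVN_BTC_EMBARGO_BARS": "0",
        }
        for set_name in SINGLE_FEATURE_SETS
    }


phase_a_entries = phase_a_env_vars()
```

Generating the entries keeps the variant name and the CVN_BTC_FEATURES_SET value from drifting apart ; explicit entries are easier to grep. Either choice satisfies the guardrail test in §4.3.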
4.2 Phase B code changes
File : src/commun/finetune/ablation_matrix.py
Extend btc_features.env_vars with 4 NEW Phase B entries (densified per committee dd248118 rec #2) :
"btc_full_purge2": {
    "CVN_BTC_FEATURES_ENABLED": "1",
    "CVN_BTC_FEATURES_SET": "full",
    "CVN_BTC_PURGE_BARS": "2",
    "CVN_BTC_EMBARGO_BARS": "1",
},
"btc_full_purge5": {
    "CVN_BTC_FEATURES_ENABLED": "1",
    "CVN_BTC_FEATURES_SET": "full",
    "CVN_BTC_PURGE_BARS": "5",
    "CVN_BTC_EMBARGO_BARS": "2",
},
"btc_full_purge15": {
    "CVN_BTC_FEATURES_ENABLED": "1",
    "CVN_BTC_FEATURES_SET": "full",
    "CVN_BTC_PURGE_BARS": "15",
    "CVN_BTC_EMBARGO_BARS": "7",
},
"btc_full_purge40": {
    "CVN_BTC_FEATURES_ENABLED": "1",
    "CVN_BTC_FEATURES_SET": "full",
    "CVN_BTC_PURGE_BARS": "40",
    "CVN_BTC_EMBARGO_BARS": "20",
},
embargo_bars scales with purge_bars per the existing canonical proportions (purge=20 → embargo=10, ratio 2:1) : embargo = floor(purge / 2), which keeps the embargo non-zero for every purge ≥ 2 (2 → 1, 5 → 2, 15 → 7, 40 → 20). Committee dd248118 rec #11 : the 2:1 ratio rationale should be lifted into a follow-up ADR ; deferred to a separate Story to keep this PR's scope tight.
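The 2:1 rule can be checked against the four Phase B entries with a short sketch. `embargo_for` and `phase_b_entry` are hypothetical helper names used only for this illustration ; integer floor division reproduces the embargo values listed in the entries above.

```python
def embargo_for(purge_bars):
    """Embargo at the 2:1 purge:embargo ratio, rounded down."""
    return purge_bars // 2


def phase_b_entry(purge_bars):
    """Env-var entry for a full-set variant at the given purge level."""
    return {
        "CVN_BTC_FEATURES_ENABLED": "1",
        "CVN_BTC_FEATURES_SET": "full",
        "CVN_BTC_PURGE_BARS": str(purge_bars),
        "CVN_BTC_EMBARGO_BARS": str(embargo_for(purge_bars)),
    }


# The 4 NEW Phase B purge levels from section 3.2.
PHASE_B_NEW = {f"btc_full_purge{p}": phase_b_entry(p) for p in (2, 5, 15, 40)}
```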
4.3 Phase A + B guardrail (per ADR-58)
ADR-58 requires every FTF factor to have a guardrail + integration test. The existing btc_features factor already has tests/integration/test_btc_features_ablation.py (or equivalent — verify path during impl). The guardrail extension :
Test 1 (unit) : test_btc_features_single_feature_sets — for each of the 6 NEW single-feature sets, assert feature_columns_for_set(name) returns exactly one column matching the canonical naming.
Test 2 (unit) : test_btc_features_purge_5_and_40 — assert compute_btc_features(..., purge_bars=5) produces columns shifted by exactly 5 rows (regression bar on the ADR-14 invariant).
Test 3 (integration) : test_ablation_matrix_btc_features_phase_a_b_variants — assert the FTF factor has the 10 NEW variants (6 Phase A *_only_purge0 + 4 Phase B btc_full_purgeN for N∈{2,5,15,40}) with the expected env_var values.
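A rough shape for Tests 1 and 2 is sketched below. It is self-contained on purpose : `feature_columns_for_set` and `shift_rows` are local stand-ins (their real signatures and the exact shift semantics of compute_btc_features() are assumptions), so the real tests would import the project functions instead.

```python
SINGLE_FEATURE_SETS = {
    "return_1h_only": frozenset({"btc_return_1h"}),
    "return_4h_only": frozenset({"btc_return_4h"}),
    "return_24h_only": frozenset({"btc_return_24h"}),
    "realized_vol_24h_only": frozenset({"btc_realized_vol_24h"}),
    "z_score_close_only": frozenset({"btc_z_score_close"}),
    "correlation_15m_lag5_only": frozenset({"btc_correlation_15m_lag5"}),
}


def feature_columns_for_set(name):
    """Stand-in for the project helper of the same name (signature assumed)."""
    return SINGLE_FEATURE_SETS[name]


def shift_rows(values, purge_bars):
    """Stand-in for the purge shift: row i receives the value of row i - purge_bars."""
    padded = [None] * purge_bars + list(values)
    return padded[: len(values)]


def check_single_feature_sets():
    """Test 1 shape: each set holds exactly one canonically named column."""
    for name in SINGLE_FEATURE_SETS:
        cols = feature_columns_for_set(name)
        assert len(cols) == 1
        (col,) = cols
        assert col.startswith("btc_")
        assert name == col[len("btc_"):] + "_only"


def check_purge_shift(purge_bars, n_rows=50):
    """Test 2 shape: output is the input shifted by exactly purge_bars rows."""
    series = list(range(n_rows))
    shifted = shift_rows(series, purge_bars)
    assert len(shifted) == n_rows
    assert shifted[:purge_bars] == [None] * purge_bars
    assert shifted[purge_bars:] == series[: n_rows - purge_bars]
```

Under pytest, `check_purge_shift` would typically be parametrized over purge_bars ∈ {5, 40} to match the test name given above.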
4.4 Phase C: temporal contract analysis (no code, dossier markdown)
Per-feature data-flow table in the verdict dossier (documentation/missions/ml-boost/2026-05-15-track1-leakage-investigation-results.md) :
| Feature | BTC bars read | Reads same-bar BTC ? | Min safe purge_bars (theoretical) |
|---|---|---|---|
| btc_return_1h | (t-4) close vs t close | ⚠ same-bar input | 1 (or 0 if BTC closes simultaneously with altcoin) |
| btc_return_4h | (t-16) close vs t close | ⚠ same-bar input | 1 |
| btc_return_24h | (t-96) close vs t close | ⚠ same-bar input | 1 |
| btc_realized_vol_24h | rolling std over [t-96, t] | ⚠ same-bar input | 1 |
| btc_z_score_close | (close[t] − μ) / σ over [t-96, t] | ⚠ same-bar input | 1 |
| btc_correlation_15m_lag5 | corr(target[t-5..t], BTC[t-5..t]) | ⚠ same-bar input | 1 |
The "min safe purge_bars (theoretical)" is the value at which the feature contains zero same-bar BTC information. The current canonical purge=20 (= 5 hours) is conservative vs the theoretical minimum of 1 (= 15 min) — Phase B tests whether the conservatism costs predictive signal.
Phase C is a thinking artefact, not new code.
4.5 Phase D: verdict dossier + ADR-79 closure
File : documentation/missions/ml-boost/2026-05-15-track1-leakage-investigation-results.md (NEW)
Structure per ADR-79 8-step workflow :
1. Context (link to this plan + parent dossier)
2. Sweep results — Phase A per-feature attribution table
3. Sweep results — Phase B f1_buy vs purge_bars curve
4. Verdict per §7 decision tree
5. Re-sweep results (deep mode, n=170 per variant) for the verdict variant — MUST report both f1_buy AND Sortino for the verdict variant (per committee dd248118 rec #4 — the 2026-05-02 dossier surfaced the f1_buy/Sortino divergence ; the verdict must explicitly close the loop on whether the new canonical preserves Sortino lift, not just f1_buy)
6. Acceptance criteria check
7. Lock decision (LOCK / KEEP_AVAILABLE / ABANDON)
8. Cross-track impact — Track 1 wp#43 reopen + Track 12 gate state
Then per ADR-80 : make ftf-extract for the deep-mode run → PDF report committed under documentation/missions/ml-boost/reports/.
5. Acceptance criteria
| # | Criterion | Phase | Evidence |
|---|---|---|---|
| 1 | 10 NEW variants live in btc_features factor matrix (6 Phase A *_only_purge0 + 4 Phase B btc_full_purgeN) | A+B | python -c "from commun.finetune.ablation_matrix import DATA_FACTORS; print(len([v for f in DATA_FACTORS if f.name=='btc_features' for v in f.env_vars]))" returns 16 (6 pre-existing + 10 NEW) |
| 2 | Unit tests for 6 NEW single-feature sets pass | A | pytest tests/unit/pipeline/test_btc_features.py -k "single_feature" |
| 3 | Unit tests for purge=5 + purge=40 ADR-14 invariant pass | B | pytest tests/unit/pipeline/test_btc_features.py -k "purge_5_and_40" |
| 4 | Phase A sweep complete with paired t-test results per feature | A | Run ID + paired-test table in verdict dossier §2 |
| 5 | Phase B sweep complete with f1_buy curve | B | Run ID + curve plot in verdict dossier §3 |
| 6 | Phase C temporal contract table in verdict dossier | C | Markdown table with min safe purge per feature |
| 7 | Phase D verdict recorded with decision tree branch picked | D | Verdict statement + corrected canonical purge_bars value |
| 8 | Deep-mode re-sweep of verdict variant complete | D | Run ID + ADR-79 8-step results dossier |
| 9 | Track 1 wp#43 transitioned (reopened or stays Closed depending on verdict) | D | OP wp#43 status update with audit-trail comment |
| 10 | Track 12 gate state updated (still NOT cleared OR cleared with new canonical) | D | F1 plan §6 update + Track 12 plan dossier note |
| 11 | MLOps readiness filled (process+ML hybrid Story) | D | documentation/stories/CVN-N001-EE-S14/mlops_readiness.md |
| 12 | Committee pr_review invoked OR waiver per memory project_committee_keys_dead_2026-05-02 | impl PR | Session JSON or waiver line in PR body |
6. Out of scope
- Re-sweeping Track 1 ALL variants — only the verdict variant gets the deep-mode rerun. The 6 single-feature variants are smoke-mode only (their purpose is attribution, not LOCK candidacy).
- Per-crypto BTC-feature gating (e.g., disabling BTC features for UNIUSDC where they under-perform) — separate Story if/when relevant ; out of scope here because per-asset gating is a different lever from the leakage investigation.
- Adaptive purge_bars (per-regime or volatility-adaptive purging) : premature ; constant purge_bars per ADR-14 standard is the contract.
- Reformulating ADR-14 : if Phase B finds purge=5 is sufficient, the FTF canonical changes but ADR-14's invariant ("training time t uses only data ≤ t − purge_bars") stays unchanged ; only the canonical numeric value updates.
- Online BTC feature computation in paper/live — already excluded by parent plan §6 ; backtest-only stays in scope here too.
- Track 12 (frac diff) sweep — gated on this investigation's verdict ; Track 12 plan dossier may be drafted in parallel but the sweep waits.
7. Falsifiability + verdict decision tree
7.1 Decision tree
Phase A result :
├─ Single feature i has Cohen d ≥ 0.3 + BH-p < 0.10
│     → leakage attributed to feature i
│     ├─ Phase B shows purge=k* (k* < 20) clears leakage check
│     │     → VERDICT : RELAX canonical to purge=k*
│     │       (re-sweep at purge=k* + corrected dossier)
│     └─ Phase B shows purge=20 is the lowest leakage-clean level
│           → VERDICT : CONFIRM canonical at purge=20
│             (re-sweep at deep mode + corrected dossier)
│
└─ No single feature dominates (all d < 0.3)
      → leakage is distributed / statistical noise
      ├─ Phase B shows monotonic decay → CONFIRM canonical purge=20
      │     (re-sweep at deep mode + corrected dossier)
      ├─ Phase B shows inflection at purge=5 with leakage clean
      │     → VERDICT : RELAX canonical to purge=5
      └─ Phase B shows leakage still at purge=20
            → VERDICT : TIGHTEN canonical to purge=40
              (expect Sortino to decay further → may close ABANDON)
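One way to make the tree mechanically checkable is to encode it as a small function. The sketch below is a reading of the tree, not an agreed implementation : `decide_verdict` and its parameters are hypothetical, the tree's "inflection at purge=5" branch is generalised to any leakage-clean inflection below canonical, and the TIGHTEN rule is applied to both Phase A branches (the tree only spells it out for the second one ; the verdict table in §2 states it unconditionally).

```python
def is_dominant(cohen_d, bh_p):
    """Phase A criterion: a single feature dominates the leakage signal."""
    return cohen_d >= 0.3 and bh_p < 0.10


def decide_verdict(dominant_feature, lowest_clean_purge, inflection_at=None,
                   canonical=20, tighten_to=40):
    """Return (verdict, purge_bars) from Phase A + Phase B summaries.

    lowest_clean_purge: smallest purge level passing the leakage check,
    or None if the check still fails at the canonical level.
    inflection_at: purge level where the f1_buy lift peaks, if any.
    """
    # Leakage still significant at canonical: TIGHTEN (assumed for both branches).
    if lowest_clean_purge is None or lowest_clean_purge > canonical:
        return ("TIGHTEN", tighten_to)
    if dominant_feature:
        if lowest_clean_purge < canonical:
            return ("RELAX", lowest_clean_purge)
        return ("CONFIRM", canonical)
    # No dominant feature: relax only on a leakage-clean inflection.
    if inflection_at is not None and lowest_clean_purge <= inflection_at < canonical:
        return ("RELAX", inflection_at)
    return ("CONFIRM", canonical)
```

Every input combination returns one of the three verdicts, which matches the "no escape hatch" requirement : anything that is neither TIGHTEN nor RELAX falls through to CONFIRM.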
7.2 Pre-registered falsifiability
The investigation terminates with one of three verdicts (CONFIRM, RELAX, TIGHTEN) — there is no "run more sweeps" escape hatch. If Phase A + Phase B results are statistically inconclusive (all p ≥ 0.10, all d < 0.2), the verdict defaults to CONFIRM canonical (purge=20) — interpretation (c) of §1, statistical noise, original gate failure was marginal at p=0.0401.
7.3 Rollback path
This Story is an investigation — no production behaviour change ships from the impl PR alone. The verdict's downstream re-sweep + ADR-79 closure ships in a follow-up PR (per ADR-80 single-PR closure mechanic) ; rollback paths apply there.
For the investigation impl PR :
- Rollback the new variants : revert PR ; the existing 6-variant btc_features matrix continues to work unchanged.
- Rollback the new feature sets : same — the _BTC_FEATURE_COLUMNS additions are pure dict additions ; reverting removes them without touching the existing entries.
- No runtime model deployed : the variants only run in FTF sweeps ; no model trained from these variants ever reaches inference until the verdict's re-sweep + Console LOCK action.
8. Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Phase A single-feature variants produce all-zero contribution (signal lives in interactions) | medium | medium | Phase A's null result still falsifies H1_A and points to Phase B as the discriminator ; not a blocker |
| Phase B curve is flat (no inflection) | medium | low | Confirms canonical purge=20 with no new information — clean CONFIRM verdict, fast Phase D |
| Compute budget overrun (~2.75h smoke + 3-4h deep, per §3.3) | low | medium | Smoke mode is bounded by power_mode=medium (n=50) in ftf_config per amendment dd248118 (committee rec #1 raised n from 25→50 for Cohen-d ranking confidence) ; deep mode is operator-triggered after smoke results, not back-to-back |
| Single-feature smoke n=50/variant is under-powered for d=0.2 detection | medium | low | Pre-registered : Phase A is for attribution / ranking, NOT for individual feature LOCK candidacy ; per §3.3 power_mode=medium (n=50) is sufficient to rank effect sizes by Cohen's d |
| Phase D verdict fails to converge (results genuinely contradict) | low | high | Pre-registered fallback to CONFIRM (interpretation (c) statistical noise) — investigation terminates with the original ABANDON standing |
| ADR-58 guardrail tests miss a regression in compute_btc_features() purge logic | low | high | Test 2 (purge=5 and purge=40 shift invariant) is the explicit regression bar ; failure → CI blocks merge |
9. Sequencing + cross-Epic impact
9.1 Within this Story
PR 1 (impl) : 10 NEW variants (6 Phase A + 4 Phase B) + 6 NEW feature sets + guardrail tests + this dossier
→ CR + committee → merge → wp#103 Specified → In progress → Developed
Operator-triggered FTF sweeps :
1. Phase A smoke run via Airflow launcher (factor=btc_features, variants=phase_a_set, power_mode=medium per dd248118 rec #1, n=50/variant)
2. Phase B smoke run via Airflow launcher (factor=btc_features, variants=phase_b_set, power_mode=medium per dd248118 rec #1, n=50/variant)
3. make ftf-extract on both runs → PDFs
Operator analysis :
4. Phase C temporal contract table (markdown only)
5. Phase D verdict per §7 decision tree
PR 2 (verdict + re-sweep) : verdict dossier + (if RELAX/TIGHTEN) updated canonical in matrix
→ CR + committee → merge → ADR-79 8-step closure → wp#103 Closed
Re-sweep at deep mode (operator-triggered after PR 2)
→ corrected Track 1 dossier per ADR-80 → wp#43 reopen or stays Closed
9.2 Cross-Epic impact
- Track 1 wp#43 : currently Closed ABANDONED ; this investigation may transition it to Reopened (if RELAX with measurable lift) or leave it Closed (if CONFIRM and no lift).
- Track 11 wp#?? (ensemble diversity) : independent ; proceeds unchanged with OHLCV-only feature set.
- Track 12 wp#?? (frac diff + interactions) : gate NOT cleared until this investigation produces a verdict ; plan dossier may be drafted in parallel but sweep waits.
- F1 plan §6 : update needed in PR 2 to reflect the verdict's impact on Track 1's slot (re-LOCK candidate vs definitive ABANDON) and Track 12's gate state.
- MODEL_FACTORS invariant 6 (ADR-79) : factor btc_features stays in MODEL_FACTORS ✓ (per the existing 2026-05-02 results dossier §11.2).
10. References
| Ref | Purpose |
|---|---|
| OP wp#103 | Story tracking |
| GH #806 | Issue tracking |
| OP wp#43 | Parent Track 1 Story (Closed ABANDONED 2026-05-02 ; may reopen post-verdict) |
| Parent plan dossier 2026-04-30-track1-btc-features-plan.md §4.6 | Mandatory leakage check spec |
| Track 1 results dossier 2026-05-02-track1-btc-features-results.md §5.1 + §5.2 | Failure diagnostic + 3 candidate interpretations |
| FTF sweep run ftf_20260501_230526_2483f9_ATR0.5_1.5_H4 | Source data for re-extraction (paired t-test cells) |
| F1 plan F1_BUY_BOOST_PLAN.md §5 + §6 | Track 1 hypothesis + Track 12 gating |
| ADR-14 | Purging invariant (training at time t uses only data ≤ t − purge_bars) |
| ADR-58 | FTF factor guardrail + integration test requirement |
| ADR-79 | FTF Story closure 8-step workflow |
| ADR-80 | FTF post-run extraction + dossier mechanics |
| ADR-81 | 8-state Story workflow |
| Memory project_committee_keys_dead_2026-05-02 | Committee invocation gating |
11. Plan-review questions for committee (or operator self-review under waiver)
- Phase A power : ~~is n=25/variant sufficient~~ ; resolved by committee dd248118 rec #1 : Phase A ships at power_mode=medium (n=50/variant) for tighter Cohen-d ranking. Resolved tradeoff : +1h compute accepted for higher confidence in feature attribution.
- Phase B granularity : should we add btc_full_purge2 and btc_full_purge15 to densify the curve in the [0, 20] inflection zone, or is the {0, 5, 10, 20, 40} grid sufficient? Tradeoff : 2 more variants × 50 cells = +20 min compute.
- Phase D verdict gating on per-asset 4/5 : the parent plan §4.4 requires ≥ 4/5 cryptos individually improve for LOCK ; should the corrected re-sweep apply this gate, or should the investigation Story relax it given UNIUSDC is the known under-performer? Default : keep the 4/5 gate (no relaxation, Story is investigation not LOCK chase).
- Pre-registered fallback to CONFIRM if all p ≥ 0.10 : is this the right default? Alternative : default to TIGHTEN (purge=40) on inconclusive results to err on the safe side. Tradeoff : safer but loses signal ; CONFIRM keeps the current ABANDON verdict standing.
- Sequencing PR 1 (impl) + PR 2 (verdict) : should we ship as one combined PR after the operator runs the sweeps, or two PRs (impl + verdict)? Two-PR shipping splits review surface and lets the impl land before the operator commits ~1.5h compute ; one-PR is leaner but requires the operator to run sweeps before opening any PR.