# Plan dossier: Track 1 leakage root-cause + purge_bars sensitivity sweep

- Date : 2026-05-05
- Story : CVN-N001-EE-S14 (OP wp#103)
- GH issue : #806
- Author : Dominique (operator) + Claude
- Session type : plan_review (per ADR-68 ; committee invocation gated on memory project_committee_keys_dead_2026-05-02 ; smoke-test before invoking, waiver path available)
- Severity : P2 (investigation gating Track 1 LOCK candidacy + Track 12 launch)
- Sequencing : per F1_BUY_BOOST_PLAN.md §6, Track 12 (frac diff) is NOT cleared until this investigation produces a verdict ; Track 11 (ensemble diversity) is unaffected (independent feature set).

Committee plan_review : ✅ PASSED (session dd248118, 2026-05-05, strong consensus across 5 experts, no blockers, 14 recs). High-value amendments adopted in this dossier : (1) Phase A power raised from n=25 to n=50 (rec #1), (2) Phase B densified with purge=2 + purge=15 (rec #2), (3) the Phase D dossier MUST report Sortino alongside f1_buy for the verdict variant (rec #4), (4) an explicit α=0.05 statement (rec #6). Deferred to follow-up Stories : 2:1 purge:embargo ratio ADR (rec #11), permutation test for Phase A (rec #8), live production timing validation (rec #9), post-deployment drift monitoring (rec #10), p-threshold sensitivity analysis (rec #13), high-vol stress-test variant (rec #14).

## 1. Context: why this investigation, why now

Track 1 (BTC cross-asset features, wp#43) was sweep-tested on 2026-05-02 with the full 6-variant FTF matrix. The mandatory leakage check from parent plan dossier §4.6 failed :

| Test | Result |
|---|---|
| Paired t-test on `f1_buy(btc_full_purge0) − f1_buy(btc_full)` over 25 (crypto, fold) cells | p = 0.0401, t = 2.171, Cohen's d = +0.434, Δ = +0.00727 |
| Plan §4.6 verdict rule | "If the paired difference is significantly positive (BH-corrected p < 0.05) → leakage suspected → ABANDON Track 1 pending root-cause investigation." |
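
Because the test is paired over the same 25 cells, the t statistic and the effect size above are two views of the same per-cell differences Δ_j. Assuming the reported Cohen's d is the paired variant d_z (mean difference over the standard deviation of the differences), the figures are mutually consistent:

$$
t = \frac{\bar{\Delta}}{s_{\Delta}/\sqrt{n}}, \qquad
d_z = \frac{\bar{\Delta}}{s_{\Delta}} = \frac{t}{\sqrt{n}} = \frac{2.171}{\sqrt{25}} \approx 0.434
$$

With Δ̄ = 0.00727, this also implies a per-cell spread of roughly s_Δ ≈ 0.00727 / 0.434 ≈ 0.017 in f1_buy.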

Sortino, however, strongly contradicts this : on the same 25 cells, canonical btc_full (purge=20) BEATS btc_full_purge0 on Sortino by +0.32 (1.710 vs 1.390, +23%). The leakage signal lives at the per-prediction level (the ML metric f1_buy) but does NOT translate into a trade-level economic lift. The diagnostic is in the Track 1 results dossier §5.1 + §5.2.

The original results dossier lists three candidate interpretations, which structure this investigation :

(a) Real but small ADR-14 violation : one or more BTC features (likely btc_correlation_15m_lag5, btc_z_score_close, or btc_realized_vol_24h) include same-bar BTC information that overlaps with the altcoin's H4 label window. Canonical purge=20 already plugs this : the gate is correct, the leakage signal is real, and no remediation is needed beyond confirming that purge=20 is sufficient.
(b) Production-exploitable signal mistakenly purged : BTC's bar-i close IS available at the altcoin's bar-i decision time in production, so the "leakage" detected by the check might be a real signal that the 5h purge window conservatively discards. Lowering purge_bars could then give a bigger lift.
(c) Statistical noise : at p=0.0401 the gate fails only narrowly ; with a slightly different fold split or HPO seed it could pass. Distinguishing (c) from (a) and (b) requires a deep re-sweep.

Track 12 (frac diff + interactions) gate stays NOT cleared until this investigation produces an answer. Track 1's own LOCK candidacy is on hold ; the FTF factor stays in MODEL_FACTORS per ADR-79 invariant 6 — the leakage investigation may produce a corrected variant set that re-opens LOCK candidacy.

## 2. Hypotheses (falsifiable)

### Phase A hypothesis: per-feature attribution

H0_A (null) : the +0.0073 f1_buy lift on btc_full_purge0 vs btc_full is distributed uniformly across the 6 BTC features — no single feature dominates the leakage signal.

H1_A (alternative) : ≥ 1 BTC feature contributes statistically more leakage than others (paired t-test on f1_buy(feature_i_only_purge0) − f1_buy(none) produces a higher Cohen d than the average, with BH-corrected p < 0.10 for ≥ 1 feature).

Pre-registered prediction : the time-aligned features (btc_correlation_15m_lag5, btc_z_score_close, btc_realized_vol_24h, all derived from same-bar or recent-bar BTC state) carry more leakage than the directional return features (btc_return_1h/4h/24h). If confirmed, this localises which features need the strongest purging.

### Phase B hypothesis: purge_bars sensitivity

H0_B (null) : f1_buy is monotonically non-increasing in purge_bars ∈ {0, 2, 5, 10, 15, 20, 40} (more purging = more lost signal, no inflection point : f1_buy decreases or stays constant as purge_bars increases) → canonical purge=20 is the right balance.

H1_B (alternative) : f1_buy shows an inflection in the [0, 20] range — there exists a purge_bars* < 20 where the lift over baseline peaks AND no leakage is detected (paired test vs purge=0 not significant). If purge_bars* ∈ [5, 15], canonical can be relaxed for bigger lift.

Pre-registered prediction : the curve will be roughly monotonic from purge=0 to purge=40 with most of the leakage absorbed in the [0, 5] range — i.e., purge=5 will look statistically similar to purge=20 on the leakage check while preserving more signal. If true, the canonical can move to purge=5 ; if false, canonical stays at purge=20.
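
Both Phase B hypotheses reduce to simple checks on the f1_buy vs purge_bars curve. A minimal sketch, assuming the curve is available as a dict from purge_bars to mean f1_buy and that the set of leakage-flagged levels is already known; the function names are hypothetical, and the `tol` knob (to absorb seed noise before calling a rise an inflection) is an assumption, not part of the pre-registration:

```python
from __future__ import annotations

from collections.abc import Mapping


def is_non_increasing(curve: Mapping[int, float], tol: float = 0.0) -> bool:
    """H0_B shape check: f1_buy never rises by more than `tol` as purge_bars grows."""
    values = [curve[k] for k in sorted(curve)]
    return all(nxt <= prev + tol for prev, nxt in zip(values, values[1:]))


def best_clean_purge_below(
    curve: Mapping[int, float], leaky: set[int], canonical: int = 20
) -> int | None:
    """H1_B candidate: the purge_bars < canonical with the highest f1_buy among
    levels whose leakage check passed (i.e. levels not in `leaky`)."""
    candidates = {k: v for k, v in curve.items() if k < canonical and k not in leaky}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

Since the `none` baseline is the same for every level, the purge level with the highest f1_buy is also the one with the highest lift over baseline.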

### Phase D hypothesis: outcome decision tree

The Phase A + Phase B results combine into one of three verdicts (decision tree in §7) :

| Verdict | Trigger | Action |
|---|---|---|
| CONFIRM canonical (purge=20) | Phase B shows a monotonic curve OR purge=20 is the lowest level where the leakage check passes | Re-sweep `btc_full @ purge=20` at deep mode → corrected Track 1 dossier per ADR-79 |
| RELAX canonical (purge ∈ {5, 10, 15}) | Phase B shows an inflection at purge=5, 10 or 15 with no leakage signal | Update the FTF factor canonical → re-sweep at deep mode → corrected Track 1 dossier |
| TIGHTEN canonical (purge ≥ 40) | Phase B shows leakage still significant at purge=20 (i.e., the original gate failure was NOT marginal) | Update canonical to purge=40 → re-sweep ; expect Sortino to decay further |

## 3. Variant matrix

### 3.1 Phase A: per-feature leakage ablation (9 variants: 3 reused + 6 new)

Adds 6 single-feature variants to the existing btc_features factor matrix in src/commun/finetune/ablation_matrix.py. Each variant tests one BTC feature alone at purge=0 ; comparing each to the none baseline isolates which feature(s) drive the leakage-pattern lift.

| Variant | Features active | Purge | Notes |
|---|---|---|---|
| `none` (existing) | 0 | n/a | Reference baseline (BTC-blind) |
| `btc_full` (existing) | 6 (full set) | 20 | Canonical, leakage-clean reference |
| `btc_full_purge0` (existing) | 6 (full set) | 0 | Leakage-permitted reference (the one that triggered the gate) |
| `btc_return_1h_only_purge0` (NEW) | 1 (`btc_return_1h`) | 0 | Tests directional 1h return alone |
| `btc_return_4h_only_purge0` (NEW) | 1 (`btc_return_4h`) | 0 | Tests directional 4h return alone |
| `btc_return_24h_only_purge0` (NEW) | 1 (`btc_return_24h`) | 0 | Tests directional 24h return alone |
| `btc_realized_vol_24h_only_purge0` (NEW) | 1 (`btc_realized_vol_24h`) | 0 | Tests volatility alone |
| `btc_z_score_close_only_purge0` (NEW) | 1 (`btc_z_score_close`) | 0 | Tests close z-score alone |
| `btc_correlation_15m_lag5_only_purge0` (NEW) | 1 (`btc_correlation_15m_lag5`) | 0 | Tests lagged correlation alone |

Statistical analysis : paired t-test on f1_buy(feature_i_only_purge0) − f1_buy(none) across the (crypto, fold) cells (n=50 per new variant at power_mode=medium, §3.3), BH-corrected across the 6 single-feature comparisons, with Cohen's d as the effect size. Output : a ranking of the features by leakage-pattern contribution.
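
The per-feature paired t-tests themselves can come from a standard routine such as `scipy.stats.ttest_rel` (statsmodels' `multipletests(..., method="fdr_bh")` also covers the correction). The BH step across the 6 comparisons is small enough to sketch directly. A minimal sketch of the Benjamini-Hochberg step-up adjustment (the function name is hypothetical), returning adjusted p-values that can be compared against the 0.10 threshold of H1_A:

```python
from __future__ import annotations


def benjamini_hochberg(pvalues: dict[str, float]) -> dict[str, float]:
    """Return BH-adjusted p-values (FDR step-up), keyed like the input."""
    m = len(pvalues)
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])  # ascending raw p
    adjusted: dict[str, float] = {}
    running_min = 1.0  # also caps adjusted p-values at 1
    # Walk from the largest raw p down to the smallest, so each adjusted value
    # is the minimum of p_(j) * m / j over all ranks j >= its own rank.
    for rank in range(m, 0, -1):
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted
```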

### 3.2 Phase B: purge_bars sensitivity sweep (7 variants ; densified per committee `dd248118` rec #2)

This sweep extends the existing matrix. Three existing variants are reused : btc_full (purge=20), btc_full_purge10 (purge=10) and btc_full_purge0 (purge=0) ; 4 new variants complete and densify the curve.

| Variant | Features | Purge | Source |
|---|---|---|---|
| `btc_full_purge0` (existing) | 6 (full set) | 0 | Reused; leakage-permitted |
| `btc_full_purge2` (NEW) | 6 (full set) | 2 | Densifies near zero (committee dd248118 rec #2) |
| `btc_full_purge5` (NEW) | 6 (full set) | 5 | Bridges 0 → 10 ; main candidate for relaxation |
| `btc_full_purge10` (existing) | 6 (full set) | 10 | Reused |
| `btc_full_purge15` (NEW) | 6 (full set) | 15 | Densifies near canonical (committee dd248118 rec #2) |
| `btc_full` (existing, = `btc_full_purge20`) | 6 (full set) | 20 | Reused; current canonical |
| `btc_full_purge40` (NEW) | 6 (full set) | 40 | Tightening sanity ; expect signal decay |

Statistical analysis : for each k ∈ {0, 2, 5, 10, 15, 20, 40}, a paired t-test on f1_buy(purge=k) − f1_buy(none), BH-corrected at α = 0.05 (per committee dd248118 rec #6, explicit alpha statement). Plot the f1_buy vs purge_bars curve with CI95 bands. Identify the smallest k* > 0 at which the leakage check (paired t-test vs purge=0) is not significant : that is the minimum production-feasible purge.
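
The k* rule reads directly as a scan over the grid. A minimal sketch, assuming the leakage-check p-values are already BH-adjusted and keyed by purge_bars; the function name is hypothetical, and the CI95 bands are not covered:

```python
from __future__ import annotations


def first_leakage_clean_purge(
    leak_p: dict[int, float], alpha: float = 0.05
) -> int | None:
    """Smallest purge_bars k > 0 whose leakage check is NOT significant.

    leak_p maps purge_bars -> BH-adjusted p-value of the paired test
    f1_buy(purge=k) - f1_buy(purge=0). Returns None if every level is significant.
    """
    for k in sorted(leak_p):
        if k > 0 and leak_p[k] >= alpha:
            return k
    return None
```

Taking the first clean level, as specified, means a non-monotone pattern (clean at 10 but significant again at 15) still yields k* = 10 ; requiring every larger level to be clean as well would be a stricter variant.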

### 3.3 Total cell count

Phase A is now run at power_mode=medium (n=50 per variant) per committee dd248118 rec #1 — improves Cohen-d ranking confidence at +1h compute cost vs the original n=25 plan.

Phase A : 6 NEW variants × 5 cryptos (BTC, ETH, SOL, AAVE, UNI) × 5 folds × 2 (medium n=50) × 1 PTE = 300 cells + reuse 75 cells from Phase 1 sweep ftf_20260501_230526_2483f9 for none, btc_full, btc_full_purge0. Total 375 cells.
Phase B : 4 NEW variants × 5 cryptos × 5 folds × 1 PTE = 100 cells + reuse 75 cells for btc_full_purge0, btc_full_purge10, btc_full. Total 175 cells.

Combined : 400 NEW cells, ~2h compute @ medium mode for Phase A, ~45 min Phase B. Total ~2.75h compute (vs ~1.5h with original plan ; +1.25h cost for tighter Phase A statistics + denser Phase B curve). Deep-mode rerun of the verdict variant (Phase D) is +3-4h.
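
The §3.3 totals can be reproduced with a few lines of arithmetic. A sketch, assuming the "× 2 (medium n=50)" factor is a per-variant replicate multiplier and that each reused variant contributes 25 cells; `cells` is a hypothetical helper:

```python
CRYPTOS = ("BTC", "ETH", "SOL", "AAVE", "UNI")
FOLDS = 5
PTE = 1  # the "× 1 PTE" factor from the text, kept as-is


def cells(n_variants: int, replicates: int = 1) -> int:
    """(crypto, fold) cells for a group of variants; replicates=2 models power_mode=medium."""
    return n_variants * len(CRYPTOS) * FOLDS * replicates * PTE


phase_a_new = cells(6, replicates=2)  # 6 single-feature variants at n=50 each
phase_b_new = cells(4)                # 4 new purge levels at n=25 each
reused = cells(3)                     # 3 reused variants per phase at n=25 each
```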

## 4. Implementation path

### 4.1 Phase A code changes

File : src/commun/pipeline/btc_features.py

Extend _BTC_FEATURE_COLUMNS with 6 single-feature sets (one per feature) :

```python
_BTC_FEATURE_COLUMNS = {
    "min": frozenset({"btc_return_1h", "btc_return_4h", "btc_return_24h"}),
    "full": frozenset({...}),  # existing
    "vol_only": frozenset({"btc_realized_vol_24h", "btc_z_score_close", "btc_correlation_15m_lag5"}),
    # NEW: Phase A per-feature ablation (CVN-N001-EE-S14)
    "return_1h_only": frozenset({"btc_return_1h"}),
    "return_4h_only": frozenset({"btc_return_4h"}),
    "return_24h_only": frozenset({"btc_return_24h"}),
    "realized_vol_24h_only": frozenset({"btc_realized_vol_24h"}),
    "z_score_close_only": frozenset({"btc_z_score_close"}),
    "correlation_15m_lag5_only": frozenset({"btc_correlation_15m_lag5"}),
}
```

The compute_btc_features() function already filters its output by the requested set — no change to the compute logic. The 6 single-feature sets are pure data-driven additions.
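
That the new sets are pure data rests on the filtering being driven entirely by the set contents. A minimal sketch of that pattern, assuming computed features are held as a dict from column name to values; `filter_to_feature_set` is a hypothetical stand-in, not the actual body of compute_btc_features():

```python
from __future__ import annotations


def filter_to_feature_set(
    computed: dict[str, list[float]], feature_set: frozenset[str]
) -> dict[str, list[float]]:
    """Keep only the requested BTC feature columns; fail loudly on unknown names."""
    missing = set(feature_set) - set(computed)
    if missing:
        raise KeyError(f"feature set requests unknown columns: {sorted(missing)}")
    return {name: col for name, col in computed.items() if name in feature_set}
```

With this shape, adding `"z_score_close_only": frozenset({"btc_z_score_close"})` to the mapping is enough to expose a new variant; the compute path never branches on the set name.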

File : src/commun/finetune/ablation_matrix.py

Extend the btc_features factor's env_vars dict with 6 NEW Phase A entries :

```python
"btc_return_1h_only_purge0": {
    "CVN_BTC_FEATURES_ENABLED": "1",
    "CVN_BTC_FEATURES_SET": "return_1h_only",
    "CVN_BTC_PURGE_BARS": "0",
    "CVN_BTC_EMBARGO_BARS": "0",
},
# ... 5 more single-feature variants (same structure, different SET)
```

### 4.2 Phase B code changes

File : src/commun/finetune/ablation_matrix.py

Extend btc_features.env_vars with 4 NEW Phase B entries (densified per committee dd248118 rec #2) :

```python
"btc_full_purge2": {
    "CVN_BTC_FEATURES_ENABLED": "1",
    "CVN_BTC_FEATURES_SET": "full",
    "CVN_BTC_PURGE_BARS": "2",
    "CVN_BTC_EMBARGO_BARS": "1",
},
"btc_full_purge5": {
    "CVN_BTC_FEATURES_ENABLED": "1",
    "CVN_BTC_FEATURES_SET": "full",
    "CVN_BTC_PURGE_BARS": "5",
    "CVN_BTC_EMBARGO_BARS": "2",
},
"btc_full_purge15": {
    "CVN_BTC_FEATURES_ENABLED": "1",
    "CVN_BTC_FEATURES_SET": "full",
    "CVN_BTC_PURGE_BARS": "15",
    "CVN_BTC_EMBARGO_BARS": "7",
},
"btc_full_purge40": {
    "CVN_BTC_FEATURES_ENABLED": "1",
    "CVN_BTC_FEATURES_SET": "full",
    "CVN_BTC_PURGE_BARS": "40",
    "CVN_BTC_EMBARGO_BARS": "20",
},
```

embargo_bars scales with purge_bars using the existing canonical 2:1 ratio (purge=20 → embargo=10), with odd purge values rounded down (5 → 2, 15 → 7) ; every purge ≥ 2 therefore keeps a non-zero embargo. Committee dd248118 rec #11 asks for the 2:1 ratio rationale to be lifted into a follow-up ADR ; this is deferred to a separate Story to keep this PR's scope tight.
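
The ratio rule can be pinned down in code so that the four literals above are checkable. A minimal sketch, assuming embargo = purge // 2 is the intended rule (it reproduces all four entries and the canonical 20 → 10) ; `embargo_for` and `phase_b_env_vars` are hypothetical names:

```python
from __future__ import annotations


def embargo_for(purge_bars: int) -> int:
    """2:1 purge:embargo ratio; integer division rounds odd purge values down."""
    return purge_bars // 2


def phase_b_env_vars(purges: tuple[int, ...] = (2, 5, 15, 40)) -> dict[str, dict[str, str]]:
    """Full-set BTC variants at the given purge levels, embargo derived from purge."""
    return {
        f"btc_full_purge{p}": {
            "CVN_BTC_FEATURES_ENABLED": "1",
            "CVN_BTC_FEATURES_SET": "full",
            "CVN_BTC_PURGE_BARS": str(p),
            "CVN_BTC_EMBARGO_BARS": str(embargo_for(p)),
        }
        for p in purges
    }
```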

### 4.3 Phase A + B guardrail (per ADR-58)

ADR-58 requires every FTF factor to have a guardrail + integration test. The existing btc_features factor already has tests/integration/test_btc_features_ablation.py (or equivalent — verify path during impl). The guardrail extension :

Test 1 (unit) : test_btc_features_single_feature_sets — for each of the 6 NEW single-feature sets, assert feature_columns_for_set(name) returns exactly one column matching the canonical naming.

Test 2 (unit) : test_btc_features_purge_5_and_40 — assert compute_btc_features(..., purge_bars=5) produces columns shifted by exactly 5 rows (regression bar on the ADR-14 invariant).
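
Test 2's invariant can be expressed without the real pipeline. A minimal sketch, assuming a feature column is a plain list indexed by bar ; `purge_shift` is a hypothetical stand-in for the shift inside compute_btc_features() (with pandas, `Series.shift(purge_bars)` has the same effect, padding with NaN instead of None):

```python
from __future__ import annotations


def purge_shift(values: list[float], purge_bars: int) -> list[float | None]:
    """Hypothetical stand-in: row t carries the raw value from row t - purge_bars."""
    if purge_bars < 0:
        raise ValueError("purge_bars must be non-negative")
    pad = min(purge_bars, len(values))
    return [None] * pad + values[: len(values) - pad]


def assert_shift_invariant(
    raw: list[float], shifted: list[float | None], purge_bars: int
) -> None:
    """Output equals input shifted down by exactly purge_bars rows (ADR-14 style)."""
    assert len(shifted) == len(raw)
    assert all(v is None for v in shifted[:purge_bars])
    assert shifted[purge_bars:] == raw[: max(len(raw) - purge_bars, 0)]
```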

Test 3 (integration) : test_ablation_matrix_btc_features_phase_a_b_variants — assert the FTF factor has the 10 NEW variants (6 Phase A *_only_purge0 + 4 Phase B btc_full_purgeN for N∈{2,5,15,40}) with the expected env_var values.

### 4.4 Phase C: temporal contract analysis (no code, dossier markdown)

Per-feature data-flow table in the verdict dossier (documentation/missions/ml-boost/2026-05-15-track1-leakage-investigation-results.md) :

| Feature | BTC bars read | Reads same-bar BTC ? | Min safe purge_bars (theoretical) |
|---|---|---|---|
| `btc_return_1h` | (t-4) close vs t close | ⚠ same-bar input | 1 (or 0 if BTC closes simultaneously with altcoin) |
| `btc_return_4h` | (t-16) close vs t close | ⚠ same-bar input | 1 |
| `btc_return_24h` | (t-96) close vs t close | ⚠ same-bar input | 1 |
| `btc_realized_vol_24h` | rolling std over [t-96, t] | ⚠ same-bar input | 1 |
| `btc_z_score_close` | (close[t] − μ) / σ over [t-96, t] | ⚠ same-bar input | 1 |
| `btc_correlation_15m_lag5` | corr(target[t-5..t], BTC[t-5..t]) | ⚠ same-bar input | 1 |

The "min safe purge_bars (theoretical)" is the value at which the feature contains zero same-bar BTC information. The current canonical purge=20 (= 5 hours) is conservative vs the theoretical minimum of 1 (= 15 min) — Phase B tests whether the conservatism costs predictive signal.

Phase C is a thinking artefact, not new code.
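
The durations quoted in this section follow from 15-minute bars (purge=20 = 5 hours, 96 bars = 24 hours). A small sketch of the conversion over the Phase B grid ; `BAR_MINUTES` is inferred from those figures rather than read from the pipeline config:

```python
BAR_MINUTES = 15  # inferred: 20 bars = 5 h, 96 bars = 24 h


def bars_to_hours(bars: int) -> float:
    """Wall-clock span of `bars` consecutive bars."""
    return bars * BAR_MINUTES / 60


PURGE_GRID = (0, 2, 5, 10, 15, 20, 40)
purge_hours = {k: bars_to_hours(k) for k in PURGE_GRID}
```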

### 4.5 Phase D: verdict dossier + ADR-79 closure

File : documentation/missions/ml-boost/2026-05-15-track1-leakage-investigation-results.md (NEW)

Structure per the ADR-79 8-step workflow :

1. Context (link to this plan + parent dossier)
2. Sweep results : Phase A per-feature attribution table
3. Sweep results : Phase B f1_buy vs purge_bars curve
4. Verdict per §7 decision tree
5. Re-sweep results (deep mode, n=170 per variant) for the verdict variant. MUST report both f1_buy AND Sortino for the verdict variant (per committee dd248118 rec #4 : the 2026-05-02 dossier surfaced the f1_buy/Sortino divergence, so the verdict must explicitly close the loop on whether the new canonical preserves the Sortino lift, not just f1_buy)
6. Acceptance criteria check
7. Lock decision (LOCK / KEEP_AVAILABLE / ABANDON)
8. Cross-track impact : Track 1 wp#43 reopen + Track 12 gate state

Then per ADR-80 : make ftf-extract for the deep-mode run → PDF report committed under documentation/missions/ml-boost/reports/.

## 5. Acceptance criteria

| # | Criterion | Phase | Evidence |
|---|---|---|---|
| 1 | 10 NEW variants live in `btc_features` factor matrix (6 Phase A `*_only_purge0` + 4 Phase B `btc_full_purgeN`) | A+B | `python -c "from commun.finetune.ablation_matrix import DATA_FACTORS; print(len([v for f in DATA_FACTORS if f.name=='btc_features' for v in f.env_vars]))"` returns 16 (6 pre-existing + 10 NEW) |
| 2 | Unit tests for 6 NEW single-feature sets pass | A | `pytest tests/unit/pipeline/test_btc_features.py -k "single_feature"` |
| 3 | Unit tests for purge=5 + purge=40 ADR-14 invariant pass | B | `pytest tests/unit/pipeline/test_btc_features.py -k "purge_5_and_40"` |
| 4 | Phase A sweep complete with paired t-test results per feature | A | Run ID + paired-test table in verdict dossier §2 |
| 5 | Phase B sweep complete with `f1_buy` curve | B | Run ID + curve plot in verdict dossier §3 |
| 6 | Phase C temporal contract table in verdict dossier | C | Markdown table with min safe purge per feature |
| 7 | Phase D verdict recorded with decision tree branch picked | D | Verdict statement + corrected canonical `purge_bars` value |
| 8 | Deep-mode re-sweep of verdict variant complete | D | Run ID + ADR-79 8-step results dossier |
| 9 | Track 1 wp#43 transitioned (reopened or stays Closed depending on verdict) | D | OP wp#43 status update with audit-trail comment |
| 10 | Track 12 gate state updated (still NOT cleared OR cleared with new canonical) | D | F1 plan §6 update + Track 12 plan dossier note |
| 11 | MLOps readiness filled (process+ML hybrid Story) | D | `documentation/stories/CVN-N001-EE-S14/mlops_readiness.md` |
| 12 | Committee `pr_review` invoked OR waiver per memory `project_committee_keys_dead_2026-05-02` | impl PR | Session JSON or waiver line in PR body |

## 6. Out of scope

- Re-sweeping ALL Track 1 variants : only the verdict variant gets the deep-mode rerun. The 6 single-feature variants are smoke-mode only (their purpose is attribution, not LOCK candidacy).
- Per-crypto BTC-feature gating (e.g., disabling BTC features for UNIUSDC where they under-perform) : separate Story if/when relevant ; out of scope here because per-asset gating is a different lever from the leakage investigation.
- Adaptive purge_bars (per-regime or volatility-adaptive purging) : premature ; a constant purge_bars per the ADR-14 standard is the contract.
- Reformulating ADR-14 : if Phase B finds purge=5 is sufficient, the FTF canonical changes but ADR-14's invariant ("training time t uses only data ≤ t − purge_bars") stays unchanged ; only the canonical numeric value updates.
- Online BTC feature computation in paper/live : already excluded by parent plan §6 ; backtest-only stays in scope here too.
- Track 12 (frac diff) sweep : gated on this investigation's verdict ; the Track 12 plan dossier may be drafted in parallel but the sweep waits.

## 7. Falsifiability + verdict decision tree

### 7.1 Decision tree

```
Phase A result :
├─ Single feature i has Cohen d ≥ 0.3 + BH-p < 0.10
│   → leakage attributed to feature i
│   ├─ Phase B shows purge=k* (k* < 20) clears leakage check
│   │   → VERDICT : RELAX canonical to purge=k*
│   │     (re-sweep at purge=k* + corrected dossier)
│   └─ Phase B shows purge=20 is the lowest leakage-clean level
│       → VERDICT : CONFIRM canonical at purge=20
│         (re-sweep at deep mode + corrected dossier)
│
└─ No single feature dominates (all d < 0.3)
    → leakage is distributed / statistical noise
    ├─ Phase B shows monotonic decay → CONFIRM canonical purge=20
    │   (re-sweep at deep mode + corrected dossier)
    ├─ Phase B shows inflection at purge=5 with leakage clean
    │   → VERDICT : RELAX canonical to purge=5
    └─ Phase B shows leakage still at purge=20
        → VERDICT : TIGHTEN canonical to purge=40
          (expect Sortino to decay further → may close ABANDON)
```
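
The tree and the §7.2 fallback can be written as a single function. A sketch under stated assumptions: `FeatureResult` and `decide_verdict` are hypothetical names ; `first_clean_k` is the smallest purge level whose leakage check passes (None if none does) ; the §7.2 fallback is evaluated on Phase A results only ; and the cases the tree leaves open (a dominant feature with leakage persisting at purge=20, or a non-monotone curve whose first clean level is 2, 10 or 15) are resolved by assumption, as flagged in the comments:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureResult:
    """Phase A outcome for one single-feature variant (hypothetical container)."""
    name: str
    cohen_d: float
    bh_p: float


def decide_verdict(
    phase_a: list[FeatureResult], first_clean_k: int | None, monotone_decay: bool
) -> tuple[str, int]:
    """Return (verdict, canonical purge_bars) per the §7.1 tree and §7.2 fallback."""
    # §7.2: fully inconclusive Phase A -> CONFIRM canonical (interpretation (c)).
    if all(f.bh_p >= 0.10 and f.cohen_d < 0.2 for f in phase_a):
        return ("CONFIRM", 20)

    dominant = any(f.cohen_d >= 0.3 and f.bh_p < 0.10 for f in phase_a)
    if dominant:
        if first_clean_k is not None and first_clean_k < 20:
            return ("RELAX", first_clean_k)
        if first_clean_k == 20:
            return ("CONFIRM", 20)
    else:
        if monotone_decay:
            return ("CONFIRM", 20)
        # Proxy for "inflection at purge=5 with leakage clean".
        if first_clean_k == 5:
            return ("RELAX", 5)

    # Leakage still present at purge=20: the tree states TIGHTEN for the
    # no-dominant-feature branch; applying it to both branches is an assumption.
    if first_clean_k is None or first_clean_k > 20:
        return ("TIGHTEN", 40)
    # Remaining uncovered combinations default to CONFIRM (assumption, mirroring
    # the §7.2 rule that there is no "run more sweeps" outcome).
    return ("CONFIRM", 20)
```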

### 7.2 Pre-registered falsifiability

The investigation terminates with one of three verdicts (CONFIRM, RELAX, TIGHTEN) ; there is no "run more sweeps" escape hatch. If the Phase A + Phase B results are statistically inconclusive (all p ≥ 0.10, all d < 0.2), the verdict defaults to CONFIRM canonical (purge=20), i.e. interpretation (c) of §1 : statistical noise, with the original gate failure marginal at p=0.0401.

### 7.3 Rollback path

This Story is an investigation — no production behaviour change ships from the impl PR alone. The verdict's downstream re-sweep + ADR-79 closure ships in a follow-up PR (per ADR-80 single-PR closure mechanic) ; rollback paths apply there.

For the investigation impl PR :

- Rollback the new variants : revert the PR ; the existing 6-variant btc_features matrix continues to work unchanged.
- Rollback the new feature sets : same ; the _BTC_FEATURE_COLUMNS additions are pure dict additions, so reverting removes them without touching the existing entries.
- No runtime model deployed : the variants only run in FTF sweeps ; no model trained from these variants ever reaches inference until the verdict's re-sweep + Console LOCK action.

## 8. Risks

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Phase A single-feature variants produce all-zero contribution (signal lives in interactions) | medium | medium | Phase A's null result still falsifies H1_A and points to Phase B as the discriminator ; not a blocker |
| Phase B curve is flat (no inflection) | medium | low | Confirms canonical purge=20 with no new information : a clean CONFIRM verdict and a fast Phase D |
| Compute budget overrun (~2.75h smoke + 3-4h deep, per §3.3) | low | medium | Smoke mode is bounded by `power_mode=medium` (n=50) in `ftf_config` per amendment dd248118 (committee rec #1 raised n from 25 to 50 for Cohen-d ranking confidence) ; deep mode is operator-triggered after smoke results, not back-to-back |
| Single-feature smoke n=50/variant is under-powered for d=0.2 detection | medium | low | Pre-registered : Phase A is for attribution / ranking, NOT for individual feature LOCK candidacy ; per §3.3, `power_mode=medium` (n=50) is sufficient to rank effect sizes by Cohen's d |
| Phase D verdict fails to converge (results genuinely contradict) | low | high | Pre-registered fallback to CONFIRM (interpretation (c), statistical noise) ; the investigation terminates with the original ABANDON standing |
| ADR-58 guardrail tests miss a regression in compute_btc_features() purge logic | low | high | Test 2 (purge=5 and purge=40 shift invariant) is the explicit regression bar ; failure → CI blocks merge |

## 9. Sequencing + cross-Epic impact

### 9.1 Within this Story

```
PR 1 (impl) : 10 NEW variants (6 Phase A + 4 Phase B) + 6 NEW feature sets + guardrail tests + this dossier
   → CR + committee → merge → wp#103 Specified → In progress → Developed

Operator-triggered FTF sweeps :
   1. Phase A smoke run via Airflow launcher (factor=btc_features, variants=phase_a_set, power_mode=medium per dd248118 rec #1, n=50/variant)
   2. Phase B smoke run via Airflow launcher (factor=btc_features, variants=phase_b_set, power_mode=medium per dd248118 rec #1, n=50/variant)
   3. make ftf-extract on both runs → PDFs

Operator analysis :
   4. Phase C temporal contract table (markdown only)
   5. Phase D verdict per §7 decision tree

PR 2 (verdict + re-sweep) : verdict dossier + (if RELAX/TIGHTEN) updated canonical in matrix
   → CR + committee → merge → ADR-79 8-step closure → wp#103 Closed

Re-sweep at deep mode (operator-triggered after PR 2)
   → corrected Track 1 dossier per ADR-80 → wp#43 reopen or stays Closed
```

### 9.2 Cross-Epic impact

- Track 1 wp#43 : currently Closed ABANDONED ; this investigation may transition it to Reopened (if RELAX with measurable lift) or leave it Closed (if CONFIRM and no lift).
- Track 11 wp#?? (ensemble diversity) : independent ; proceeds unchanged with the OHLCV-only feature set.
- Track 12 wp#?? (frac diff + interactions) : gate NOT cleared until this investigation produces a verdict ; the plan dossier may be drafted in parallel but the sweep waits.
- F1 plan §6 : update needed in PR 2 to reflect the verdict's impact on Track 1's slot (re-LOCK candidate vs definitive ABANDON) and Track 12's gate state.
- MODEL_FACTORS invariant 6 (ADR-79) : factor btc_features stays in MODEL_FACTORS ✓ (per the existing 2026-05-02 results dossier §11.2).

## 10. References

| Ref | Purpose |
|---|---|
| OP wp#103 | Story tracking |
| GH #806 | Issue tracking |
| OP wp#43 | Parent Track 1 Story (Closed ABANDONED 2026-05-02 ; may reopen post-verdict) |
| Parent plan dossier `2026-04-30-track1-btc-features-plan.md` §4.6 | Mandatory leakage check spec |
| Track 1 results dossier `2026-05-02-track1-btc-features-results.md` §5.1 + §5.2 | Failure diagnostic + 3 candidate interpretations |
| FTF sweep run `ftf_20260501_230526_2483f9_ATR0.5_1.5_H4` | Source data for re-extraction (paired t-test cells) |
| F1 plan `F1_BUY_BOOST_PLAN.md` §5 + §6 | Track 1 hypothesis + Track 12 gating |
| ADR-14 | Purging invariant (training time t uses only data ≤ t − purge_bars) |
| ADR-58 | FTF factor guardrail + integration test requirement |
| ADR-79 | FTF Story closure 8-step workflow |
| ADR-80 | FTF post-run extraction + dossier mechanics |
| ADR-81 | 8-state Story workflow |
| Memory `project_committee_keys_dead_2026-05-02` | Committee invocation gating |

## 11. Plan-review questions for committee (or operator self-review under waiver)

- Phase A power : ~~is n=25/variant sufficient?~~ Resolved by committee dd248118 rec #1 : Phase A ships at power_mode=medium (n=50/variant) for tighter Cohen-d ranking ; +1h compute accepted for higher confidence in feature attribution.
- Phase B granularity : ~~should we add btc_full_purge2 and btc_full_purge15 to densify the curve in the [0, 20] inflection zone, or is the {0, 5, 10, 20, 40} grid sufficient?~~ Resolved by committee dd248118 rec #2 : both variants are added (§3.2) ; the extra ~20 min of compute is accepted.
- Phase D verdict gating on per-asset 4/5 : the parent plan §4.4 requires ≥ 4/5 cryptos to improve individually for LOCK ; should the corrected re-sweep apply this gate, or should the investigation Story relax it given UNIUSDC is the known under-performer? Default : keep the 4/5 gate (no relaxation ; the Story is an investigation, not a LOCK chase).
- Pre-registered fallback to CONFIRM if all p ≥ 0.10 : is this the right default? Alternative : default to TIGHTEN (purge=40) on inconclusive results to err on the safe side. Tradeoff : safer but loses signal ; CONFIRM keeps the current ABANDON verdict standing.
- Sequencing PR 1 (impl) + PR 2 (verdict) : should we ship one combined PR after the operator runs the sweeps, or two PRs (impl + verdict)? Two-PR shipping splits the review surface and lets the impl land before the operator commits ~2.75h of smoke compute (§3.3) ; one PR is leaner but requires the operator to run the sweeps before opening any PR.