Planning Dossier - CVN-N001-EK-S02: D2 - Quantitative pre-study (Phase 1)

Story: CVN-N001-EK-S02 · parent Epic: CVN-N001-EK
OpenProject: wp#272 · GitHub issue: #1173
Type: implementation plan (ADR-68), for committee plan_review.
Date: 2026-06-09 · Revision: r4 (restructured to the ADR-0101 canonical plan: problématisation → user stories → hypotheses → state of the art → DoD → consolidation; technical contracts renumbered as §7–§18 "Approach")
Story exit gate: requires methodology committee review PASSED before transition to Specified. INFEASIBLE is an allowed, successful outcome.

Revision journal.
- r1 (PR #1175): right governance boundary (analysis-only).
- r2 (PR #1176), hardened engineering/statistical contracts (REWORK, 4.2/5): reference-capacity rule, cost evidence tiers, corrected power wording + simulation contract, exploratory anti-snooping boundary, null-gate selection rule, expanded budgets, INFEASIBLE next-action table, "signed derivation", Threats.
- r3 (PR #1177), sharp P1 edits (PASS-with-edits, 4.75/5): Tier-C completes S02 but blocks S03; conservative capacity tie-breaker; Tier-B bounded; mapping monotonicity; null-invalidity criteria.
- r4 (this revision): restructured to the ADR-0101 canonical plan shape, adding Problématisation (§1), User stories (§2), Hypotheses (§3), State of the art (§4), Definition of Done (§5), Consolidation (§6). The committee-reviewed technical contracts are unchanged in substance; they are renumbered as §7–§18 (Approach / design) with internal cross-references updated.

Executive summary

S02 derives the quantitative values required to lock the Phase-0 charter (ADR-0102). It is analysis-only and may legitimately end in INFEASIBLE. It cannot prove signal, choose a model, authorize trading, or launch runs. It separates durable rules (already in the ADR) from the instance values it now derives, each backed by a signed, versioned derivation or a typed INFEASIBLE record. INFEASIBLE is a successful S02 outcome, but it blocks the S03 charter lock unless and until the typed cause is remediated.

Part I: Framing (ADR-0101)

1. Problem framing (problématisation)

The question, in one sentence. Before trying to prove that our entry signal "works", can we state, defensibly and in writing ahead of time, how much it would have to be worth to be genuinely tradable, and whether our data can even measure that?

Why now. The previous pilot was inconclusive: the ranges were compatible with a loss as well as a gain, at every cost level. The natural temptation would be to rerun "one more trial" until a favourable result turns up. That is precisely the trap: multiplying trials without a rule fixed in advance always ends up producing a pleasing number that means nothing. This Story (S02) refuses that trap: it sets the threshold values and the measurement rules first, before any test of the signal.

The illusions to avoid: four traps that have already cost the project:

Mistaking a good exam grade for money. A model can have excellent statistical metrics without generating any net gain once fees are paid.
An "average" cost that makes everything profitable. If the assumed transaction cost is too optimistic or a "default", any signal looks tradable. The cost must be real, measured at a specific order size.
Choosing the rules after seeing the results. Deciding the order size, the comparison point (the "null"), or the metric after looking at what is convenient is lying to oneself.
Concluding "profitable" from too few observations. A few lucky trades are not proof.

What is actually measured. S02 derives, from already existing data (no new heavy computation, no launch): (a) the real round-trip cost at a reference order size that is explicitly "research, non-deployment"; (b) the economic expectancy threshold above which the strategy covers its costs (E_econ_min); (c) the corresponding minimum predictive effect (E_pred_min); (d) the available power: is there even enough data to detect such an effect?; (e) an honest comparison point (the null) that the signal will have to beat.

The honesty of the verdict. S02 is allowed to conclude "infeasible", and that is a success, not a failure. It is better to say "our data cannot settle the question" than to manufacture false certainty. Infeasibility is also typed (cost, capacity, power, mapping, null) to indicate exactly which action removes the blocker.

Why it matters. These figures are not academic: they are the precondition for locking the protocol (S03) and then testing the signal (S04). Without them, everything that follows is trial and error. S02 turns "we try and we'll see" into "we know what must be reached, and we know whether it is measurable".

2. Needs: user stories

US1: As scientific owner, I want the economic thresholds derived before any test of the signal, so as to avoid the garden of forking paths. → §7, §10
US2: As risk owner, I want a P90 cost traced to a reference capacity with its evidence tier, so that a fragile proxy is never confused with a lockable cost. → §9
US3: As methodologist, I want a reproducible power simulation, so as to know whether the substrate can detect the economically relevant effect. → §11
US4: As operator, I want a typed INFEASIBLE verdict, so as to know exactly which remediation to start (instrument the cost, widen the universe, change the metric…). → §15
US5: As reviewer, I want existing exploratory results kept from contaminating the calibration choices, so as to preserve the falsifiability of the protocol. → §8.1
US6: As risk owner, I want a Tier-C cost never to be able to lock S03, so as to prevent false tradability. → §9.2
US7: As scientific owner, I want a settled definition of the Sortino (R1), so that no "sub-floor" conclusion is invoked without a comparable basis. → §13

3. Hypotheses (EN)

S02 is calibration, not signal-proof, so its hypotheses concern feasibility / derivability, not the signal. Each carries the null tested and the test method; the verdict on a non-rejected null is a typed INFEASIBLE (§15).

H1 — capacity derivable. A conservative, defensible research-reference order size can be derived from existing defi_top5 liquidity without inspecting outcome performance. Null: no such rule exists without outcome inspection. Test: the pre-specified liquidity rule + tie-breaker (§9.1), rejected alternatives recorded; failure → INFEASIBLE-capacity.
H2 — cost lockable. Existing trade/cost logs support a Tier A/B (lockable) P90 round-trip cost at the reference size. Null: only Tier C/D is supported. Test: evidence tiering + Tier-B error bounding (§9.2); Tier C/D → non-lockable / INFEASIBLE-cost-data.
H3 — mapping monotonic/stable. A documented predictive→economic mapping exists and is monotonic/stable with gross/net expectancy over the action-policy range. Null: no monotonic/stable mapping. Test: §10; if unshown, the predictive metric (e.g. f1_buy) cannot be the primary gate → INFEASIBLE-mapping.
H4 — substrate powered. The current universe/folds are adequately powered: MDE_available ≤ E_pred_min. Null: MDE_available > E_pred_min (underpowered). Test: purged/embargoed block-bootstrap power simulation (§11); if N_min infeasible → INFEASIBLE-power.
H5 — valid null exists. A conservative, valid primary null-gate is constructible (preserving base rate, temporal dependence, purge/embargo, trade-count / action-policy). Null: no defensible null. Test: candidate comparison + invalidity criteria (§12); else INFEASIBLE-null.

Working assumptions surfaced (not tested in S02): defi_top5 as the control group · ATR-H4 triple-barrier labels · GBT model class. These are inherited from the charter / ADR-0102 thesis and are tested in S04, not here.

4. State of the art (EN, references)

Each invariant is grounded in established practice and mapped to the hypotheses:

Leakage-aware validation — purged & embargoed cross-validation for overlapping, horizon-dependent labels [1]; grounds H4's dependency model and H5's null (§11, §12).
Backtest overfitting under multiple trials — Probability of Backtest Overfitting / CSCV [2]; motivates the budget/FDR discipline and the anti-snooping boundary (§8.1, §14).
Multiple-testing control — False Discovery Rate [3]; grounds the FDR budget (§14).
Researcher degrees of freedom — the garden of forking paths [4]; grounds §8.1 + pre-registering all thresholds (H1–H5 are fixed before measurement).
Pre-specified experimental design — design before measurement [5]; grounds the "values before test" stance (§1).
Statistical power & MDE — power analysis and the minimum detectable effect [6]; grounds H4 (§11).
Resampling under dependence — moving-block bootstrap [7] and the stationary bootstrap [8]; ground the block-bootstrap power contract (§11.1).
Transaction cost & market impact — optimal execution / impact at size [9]; grounds the capacity-dependence of P90 cost (H2, §9).
Downside-risk performance — the Sortino ratio and downside deviation [10]; grounds the R1 note (§13).

References:
[1] López de Prado, Advances in Financial Machine Learning, 2018.
[2] Bailey, Borwein, López de Prado & Zhu, "The Probability of Backtest Overfitting", J. Computational Finance, 2014.
[3] Benjamini & Hochberg, "Controlling the False Discovery Rate", JRSS-B, 1995.
[4] Gelman & Loken, "The garden of forking paths", 2013.
[5] NIST/SEMATECH, e-Handbook of Statistical Methods (Design of Experiments).
[6] Cohen, Statistical Power Analysis for the Behavioral Sciences, 1988.
[7] Künsch, "The jackknife and the bootstrap for general stationary observations", Ann. Statist., 1989.
[8] Politis & Romano, "The Stationary Bootstrap", JASA, 1994.
[9] Almgren & Chriss, "Optimal execution of portfolio transactions", J. Risk, 2000.
[10] Sortino & Price, "Performance measurement in a downside risk framework", J. Investing, 1994.

5. Definition of Done

S02 is complete only if:

reference capacity derived by the §9.1 rule (non-deployment) or INFEASIBLE-capacity;
P90 cost measured at Tier A/B (lockable) or Tier-C provisional (non-lockable, labelled) or INFEASIBLE-cost-data;
E_econ_min and E_pred_min derived (distinct) or INFEASIBLE-mapping;
MDE_available, and N_min if underpowered, simulated under the §11.1 contract (block design, deps, purge/embargo, reps, seed, sensitivity, reproducibility metadata) or INFEASIBLE-power;
primary null-gate justified (candidates compared per §12) or INFEASIBLE-null;
Sortino R1 resolved (definition note) or explicitly non-actionable; not used as a gate;
budgets proposed with the full §14 content;
every value carries a signed derivation (§16);
no predictive run / training / Airflow launch performed (analysis-only attested).

Operational readiness (process/governance Story, ADR-0101 Inv 4): each derivation is reproducible from its signed-provenance record (§16); the verdict (proceed-to-S03 / typed INFEASIBLE) routes to a recorded next action (§15); rollback = revert the derivations (no runtime impact); owner handoff = the unlocked charter + this dossier.

6. Consolidation & traceability

No dangling thread: every problem (§1) maps to a hypothesis (§3), a user story (§2), an approach section (§7–§18), and the literature (§4):

| Problem (§1) | Hypothesis | US | Approach section | Literature |
|---|---|---|---|---|
| metric ≠ money | H3 | US1 | §10 mapping (monotonicity) | [1][2] |
| optimistic/placeholder cost | H2 | US2, US6 | §9.2 evidence tiers | [9] |
| rules chosen after results | H1 + all | US5 | §8.1 anti-snooping · §9.1 capacity rule | [4][5] |
| conclude on too few obs | H4 | US3 | §11 power / MDE | [6][7][8] |
| honesty of the verdict | all | US4 | §15 typed `INFEASIBLE` | — |
| Sortino comparability | — | US7 | §13 Sortino R1 | [10] |

Decision routing. Each hypothesis whose null is not rejected → the corresponding typed INFEASIBLE (§15) with its required artifact and next action; all nulls rejected (all values derived) → proceed to S03 (charter lock), subject to the Tier-A/B lockability gate (§9.2). Coherence check: the approach (§7–§18) delivers exactly the values the problématisation (§1) requires and tests exactly the hypotheses (§3) the user stories (§2) demand — no section without a thread, no thread without a section.

Part II: Approach / design (the how)

7. Decision boundary

S02 is analysis / calibration only. It derives values; it proves nothing about the signal.

Hard constraint (operator, 2026-06-09): no Airflow launcher, no training, no Phase-2 predictive run, no model selection. If any step requires a new training run, an Airflow launch, a cluster job, or a predictive Phase-2 run, S02 STOPs and returns to the operator — it does not launch autonomously. No charter lock (S03); no trading authority.

8. Inputs: allowed / forbidden

Allowed (read-only, existing): existing OHLCV cache · existing labels (ATR-H4 triple-barrier) · existing trade / cost logs · existing exploratory results (e.g. the cost-sensitivity report) · derivation notebooks / scripts (read-only over the above).

Forbidden: any new training · any predictive sweep / Phase-2 run · any cluster / Airflow launch · OOS threshold optimisation · meta-labeling · any model comparison that picks a winner.

8.1 Exploratory-results / anti-snooping boundary

Existing exploratory results may be used only to identify known failure modes and data-quality threats. They may not be used to choose the reference capacity, primary metric, null-gate, universe, label/horizon, action policy, FDR budget, or any value that would improve the chance of a future PROMOTE. If an exploratory finding influences a tuple coordinate, that influence must be recorded as prior rationale, and the corresponding tuple must be registered and budgeted (ADR-0102 Invariant 5).

9. Cost & capacity derivation (the first blocker)

E_econ_min is meaningless without a real P90 round-trip cost at a defined target capacity. History shows cost was never pinned (the cost-sensitivity report swept {30,40,50,70} bps for lack of a real round-trip cost; the §0bis economic probe found a keyed-but-wrong-cost placeholder trap). No abstract / default / placeholder cost is permitted.

9.1 Reference-capacity rule (pre-specified, before cost)

Target capacity = a research reference capacity, explicitly non-deployment — it fixes a target order size before any cost estimation, and is not a deployment authorization. The reference order size must be derived by a documented liquidity rule, before cost estimation, not chosen after seeing cost outcomes. The rule MUST specify:

sampling window;
venue(s);
asset-level liquidity metric (e.g. ADV, top-of-book depth, median hourly volume);
sizing mode: per-asset equal-notional · liquidity-scaled · worst-asset-constrained;
participation cap;
minimum tradable notional;
treatment of missing / stale liquidity;
whether P90 cost is computed per asset then aggregated, or directly at portfolio level.

Tie-breaker. If several defensible reference-capacity rules exist, choose the most conservative rule that still satisfies the minimum-tradable-notional constraint. The selected rule and the rejected alternatives must be recorded — the capacity must not be chosen to make the cost favourable.

If no rule can be defended without inspecting outcome performance → INFEASIBLE (reason: capacity).
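
The tie-breaker can be sketched as a small selection step. This is a minimal illustration, not the project's implementation: `CapacityRule`, `select_capacity_rule`, and the `conservatism_rank` field are hypothetical names, and the sketch assumes each candidate rule's conservatism ranking is registered before any cost is estimated, since the text does not define how conservatism is measured.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CapacityRule:
    name: str
    reference_notional: float  # order size implied by the rule (quote currency)
    conservatism_rank: int     # ASSUMPTION: registered before cost; lower = more conservative


def select_capacity_rule(
    candidates: List[CapacityRule], min_tradable_notional: float
) -> Tuple[Optional[CapacityRule], List[CapacityRule]]:
    """Return (chosen rule, rejected alternatives); chosen is None if nothing qualifies."""
    # Only rules that satisfy the minimum-tradable-notional constraint are eligible.
    eligible = [c for c in candidates if c.reference_notional >= min_tradable_notional]
    if not eligible:
        return None, list(candidates)
    # Most conservative eligible rule wins; no cost figure is consulted here.
    chosen = min(eligible, key=lambda c: c.conservatism_rank)
    rejected = [c for c in candidates if c is not chosen]
    return chosen, rejected
```

Returning the rejected alternatives alongside the choice keeps the record that the rule requires. A `None` choice would map to INFEASIBLE (reason: capacity).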

9.2 P90 cost: evidence tiers

P90 cost is not assumed measured; it is graded by the evidence the existing data actually supports:

| Tier | Basis | Lockable? |
|---|---|---|
| A | directly observed fills at comparable asset / venue / size / regime | Yes |
| B | observed fills adjusted by a pre-specified liquidity model | Yes |
| C | order-book / spread proxy with a conservative stress multiplier | No (provisional, analysis-only bound, labelled non-lockable) |
| D | unsupported placeholder | No → `INFEASIBLE` (reason: `cost-data`) |

Only Tier A or B may produce a lockable P90 cost (for S03). The derivation states which tier, with sample sizes, asset/venue/size/regime coverage, maker/taker mix, spread+impact+latency treatment, and tail slippage. Bad cost measurement is the fastest path from rigorous protocol to fake tradability — so the tier is explicit and auditable.

Tier B is bounded. Tier B requires a pre-specified adjustment model with documented inputs, calibration window, residual / error analysis, and a conservative stress factor. If the adjustment model cannot bound its error conservatively, Tier B is downgraded to Tier C (a sophisticated-but-fragile proxy is not lockable).

Tier C consequence for S03. A Tier-C provisional cost may complete S02 as an analysis artifact, but it cannot feed an S03 charter lock. If S02 ends with Tier C, the S03 path is blocked until Tier A/B evidence is produced, or the risk owner explicitly approves a non-lockable exploratory charter state (recorded). "S02 done, Tier C labelled" never authorises a lock.
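
The tier semantics above (lockability, the Tier-B downgrade, and the Tier-C consequence for S03) can be encoded as a few small functions. This is a sketch under stated assumptions: the names `Tier`, `effective_tier`, `s03_lock_allowed`, and `s02_outcome` are hypothetical, and the returned strings are illustrative labels rather than registered states.

```python
from enum import Enum


class Tier(Enum):
    A = "directly observed fills at comparable asset/venue/size/regime"
    B = "observed fills adjusted by a pre-specified liquidity model"
    C = "order-book / spread proxy with a conservative stress multiplier"
    D = "unsupported placeholder"


def effective_tier(claimed: Tier, error_bounded: bool = True) -> Tier:
    """Downgrade Tier B to Tier C when the adjustment model cannot bound its error."""
    if claimed is Tier.B and not error_bounded:
        return Tier.C
    return claimed


def s03_lock_allowed(tier: Tier) -> bool:
    """Only Tier A or B may produce a lockable P90 cost."""
    return tier in (Tier.A, Tier.B)


def s02_outcome(tier: Tier) -> str:
    """Illustrative outcome label for S02 given the effective cost tier."""
    if tier is Tier.D:
        return "INFEASIBLE(cost-data)"
    if tier is Tier.C:
        return "S02 complete, S03 blocked (provisional, non-lockable)"
    return "S02 complete, cost lockable for S03"
```

The recorded risk-owner approval of a non-lockable exploratory charter state is deliberately left out: it is a governance decision, not a property of the evidence.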

10. `E_econ_min` / `E_pred_min` mapping

E_econ_min = economic break-even from the §9 (Tier A/B) conservative P90 cost + reference capacity.
A documented predictive→economic mapping turns a predictive effect into per-trade economic expectancy → E_pred_min = the minimum predictive effect that meets E_econ_min. The two stay distinct (ADR-0102 Invariant 8); a predictive lift is not an economic edge.
Stability / monotonicity. The mapping MUST document whether the predictive metric is monotonic with gross/net expectancy over the observed or simulated action-policy range. If monotonicity or stability cannot be shown, that predictive metric (e.g. f1_buy) cannot be the primary gate metric — the project history shows exactly this AUC/f1→tradability non-transfer.
If no defensible mapping can be constructed from existing data → INFEASIBLE (reason: mapping).
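
One way to picture the mapping step is to invert an empirical table of (predictive effect, net expectancy) pairs. The sketch below is hypothetical: the function names, the tabular form of the mapping, and the rule "take the smallest tabulated effect that reaches `E_econ_min`" are assumptions for illustration. It checks monotonicity only; stability across windows or regimes would need a separate check.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (predictive effect, net expectancy per trade)


def is_monotonic(points: List[Point]) -> bool:
    """True if net expectancy never decreases as the predictive effect increases."""
    ys = [y for _, y in sorted(points)]
    return all(later >= earlier for earlier, later in zip(ys, ys[1:]))


def derive_e_pred_min(points: List[Point], e_econ_min: float) -> Optional[float]:
    """Smallest tabulated predictive effect whose expectancy meets e_econ_min.

    Returns None when the mapping is empty, non-monotonic, or never reaches the
    threshold; a caller could treat that as INFEASIBLE (reason: mapping).
    """
    if not points or not is_monotonic(points):
        return None
    for effect, expectancy in sorted(points):
        if expectancy >= e_econ_min:
            return effect
    return None
```

Keeping `e_econ_min` as an input, not something computed here, mirrors the requirement that the economic and predictive thresholds stay distinct.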

11. Power simulation → MDE / N_min

Power feasibility rule (corrected):

Compute MDE_available (the minimum detectable effect at the current universe/folds) under the purged/embargoed dependency model.
Derive E_pred_min from E_econ_min (§10).
If MDE_available > E_pred_min, the current substrate is underpowered for the economically relevant effect.
Estimate N_min = the smallest feasible universe / fold / sample configuration for which MDE(N_min) ≤ E_pred_min.
If N_min is not operationally feasible → INFEASIBLE (reason: power) → widen universe / folds.

N_trades_min ≥ 30 is a minimum sanity floor, not a power guarantee — the MDE simulation decides feasibility (MDE wins over the heuristic).
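
The feasibility rule reduces to a short decision function once `MDE_available`, `E_pred_min`, and `N_min` are known. The sketch below is illustrative only: `power_verdict` and `n_feasible_max` (the largest configuration judged operationally feasible) are hypothetical names, and the return strings are labels, not registered states. The `N_trades_min` sanity floor is not modelled.

```python
from typing import Optional


def power_verdict(
    mde_available: float,
    e_pred_min: float,
    n_min: Optional[int] = None,
    n_feasible_max: Optional[int] = None,
) -> str:
    """Apply the power feasibility rule to already-derived quantities."""
    # Powered: the substrate can detect the economically relevant effect.
    if mde_available <= e_pred_min:
        return "powered"
    # Underpowered, but a feasible configuration N_min exists.
    if n_min is not None and n_feasible_max is not None and n_min <= n_feasible_max:
        return "underpowered: N_min feasible"
    # Underpowered and no operationally feasible N_min.
    return "INFEASIBLE(power)"
```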

11.1 Power-simulation contract (reproducible)

The simulation MUST specify: block-length selection · whether blocks are by time / asset / fold / cluster · whether cross-asset dependence is preserved · whether resampling preserves label base rate · whether fold boundaries are respected · how purge/embargo are applied inside bootstrap samples · the statistic powered · how CIs / rejection criteria are computed · number of replications · seed policy · sensitivity of the result to block length. The power report carries full reproducibility metadata (§16).
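
As an illustration of what a contract-conforming simulation involves, the sketch below estimates power with a moving-block bootstrap [7] and reads an MDE off an effect grid. It is deliberately simplified and rests on assumptions not stated in the text: the powered statistic is the mean of a single series, the test is one-sided, blocks are by time only, and purge/embargo, fold boundaries, cross-asset dependence, and base-rate preservation are not modelled. All function names are hypothetical.

```python
import random
from statistics import fmean
from typing import List, Optional, Sequence


def block_resample(series: Sequence[float], block_len: int, rng: random.Random) -> List[float]:
    """Moving-block bootstrap: concatenate random contiguous blocks, trim to len(series)."""
    n = len(series)
    out: List[float] = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)  # requires block_len <= n
        out.extend(series[start:start + block_len])
    return out[:n]


def estimate_power(series: Sequence[float], effect: float, block_len: int,
                   reps: int = 500, alpha: float = 0.05, seed: int = 0) -> float:
    """Share of bootstrap replications in which a mean shift of `effect` is detected."""
    rng = random.Random(seed)  # fixed seed: one possible seed policy
    mu = fmean(series)
    centred = [x - mu for x in series]  # impose the null (mean zero)
    # Critical value from the bootstrap distribution of the mean under the null.
    null_means = sorted(fmean(block_resample(centred, block_len, rng)) for _ in range(reps))
    crit = null_means[min(reps - 1, int((1 - alpha) * reps))]
    # Power: how often the shifted mean exceeds the critical value.
    hits = sum(fmean(block_resample(centred, block_len, rng)) + effect > crit
               for _ in range(reps))
    return hits / reps


def mde_on_grid(series: Sequence[float], grid: Sequence[float], block_len: int,
                target_power: float = 0.8, **kwargs) -> Optional[float]:
    """Smallest effect on the grid reaching target_power, or None if none does."""
    for effect in sorted(grid):
        if estimate_power(series, effect, block_len, **kwargs) >= target_power:
            return effect
    return None
```

Re-running `mde_on_grid` over several `block_len` values gives the block-length sensitivity that the contract asks for.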

12. Null-gate candidate selection

Do not lock "random entry" as the default null-gate — a random-entry baseline is often too weak as the primary null. Random entry is a diagnostic null candidate, not the primary.

Primary null-gate selection rule. Among implementable candidate nulls, choose the most conservative null that: preserves label base rate by asset/fold · preserves temporal dependence at the horizon scale · respects purge/embargo · preserves trade-count / action-policy constraints · is reproducible · does not leak future information. If candidates disagree, conservatism wins unless the more conservative candidate is demonstrably invalid. A more conservative null may be rejected only if it violates the registered action-policy constraints, destroys the label base-rate structure, breaks purge/embargo comparability, or produces an economically non-comparable trade-count / payoff distribution — and the rejection rationale must be recorded (so "this null is too hard, therefore invalid" is not a valid rejection). If no defensible null can be constructed (split/label issue) → INFEASIBLE (reason: null).

A tuple cannot PROMOTE by beating only diagnostic nulls (restated from ADR-0102 because S02 selects the candidate).
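
The selection rule can be sketched as "filter by validity, then take the most conservative survivor, and record why the others were set aside". The code below is a hypothetical illustration: `NullCandidate`, the property keys, and the pre-registered `conservatism_rank` are assumed names, and how conservatism is ranked is left to the analyst because the text does not define a measure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Validity properties taken from the selection rule above (key names are assumptions).
REQUIRED = (
    "preserves_base_rate",
    "preserves_temporal_dependence",
    "respects_purge_embargo",
    "preserves_trade_count_policy",
    "reproducible",
    "no_future_leak",
)


@dataclass
class NullCandidate:
    name: str
    conservatism_rank: int  # ASSUMPTION: lower = more conservative
    properties: Dict[str, bool] = field(default_factory=dict)
    diagnostic_only: bool = False  # e.g. a random-entry baseline


def select_primary_null(
    candidates: List[NullCandidate],
) -> Tuple[Optional[NullCandidate], Dict[str, List[str]]]:
    """Return (primary null or None, recorded reasons for each excluded candidate)."""
    excluded: Dict[str, List[str]] = {}
    valid: List[NullCandidate] = []
    for c in candidates:
        failed = [p for p in REQUIRED if not c.properties.get(p, False)]
        if c.diagnostic_only:
            failed.append("diagnostic_only")
        if failed:
            excluded[c.name] = failed  # the rationale must be recorded
        else:
            valid.append(c)
    if not valid:
        return None, excluded  # would map to INFEASIBLE (reason: null)
    return min(valid, key=lambda c: c.conservatism_rank), excluded
```

Note that a candidate can be excluded only for a listed validity failure; "too hard to beat" is not an available reason in this structure.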

13. Sortino R1 definition

R1 (Sortino) is an S02 deliverable, resolved before any actionable "sub-floor" statement.

Research convention (default): Sortino on per-period strategy returns · MAR = 0 unless a risk policy defines otherwise · annualisation only if the return periodicity is fixed and documented. Do not freeze the convention without first checking how existing reports compute Sortino (comparability of the ~1.0 floor).
Role boundary: Sortino R1 resolves metric comparability only. It is not an S02 gate and confers no sub-floor or deployment claim. Output = a Sortino definition note (input series · periodicity · MAR · annualisation factor · treatment of zero/downside observations · comparability of the ~1.0 floor) + a decision on whether prior Sortino observations are comparable / non-comparable / non-actionable.
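
To make the comparability question concrete, here is one common way to compute a Sortino ratio under the default research convention (per-period returns, MAR = 0, optional annualisation). It is a sketch, not the definition S02 must adopt: the function name is hypothetical, and the downside deviation here averages squared shortfalls over all observations, whereas other conventions average over downside observations only. That difference is exactly the kind of detail the definition note has to pin down.

```python
import math
from typing import Optional, Sequence


def sortino(returns: Sequence[float], mar: float = 0.0,
            periods_per_year: Optional[float] = None) -> Optional[float]:
    """Sortino ratio of per-period returns; None when there is no downside observation."""
    if not returns:
        raise ValueError("returns must be non-empty")
    excess = [r - mar for r in returns]
    mean_excess = sum(excess) / len(excess)
    # ASSUMPTION: downside deviation over ALL observations (zeros for non-shortfalls).
    downside_dev = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside_dev == 0.0:
        return None  # ratio undefined; treat as non-actionable, not as infinite
    ratio = mean_excess / downside_dev
    if periods_per_year is not None:
        # Annualise only when the return periodicity is fixed and documented.
        ratio *= math.sqrt(periods_per_year)
    return ratio
```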

14. Budgets (proposed, not locked)

A budget proposal (ADR-0102 Invariant 6) MUST include: family definition · max number of registered tuples · max ONE-ITERATION attempts · FDR method · alpha / FDR level · allocation rule across tuples/stages · what consumes budget · what does not · final-holdout access policy · stop rule when budget is exhausted. These are proposals for the S03 lock, not locked values.
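
For the "FDR method" and "alpha / FDR level" items, the Benjamini-Hochberg step-up procedure [3] is one candidate. The sketch below shows that procedure on a list of p-values; it is an illustration under assumptions, since the text does not state which FDR method or level will be proposed, and the function name and default level `q = 0.10` are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.10) -> List[bool]:
    """Benjamini-Hochberg step-up: True marks a hypothesis rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank k whose p-value sits under the line k * q / m.
        if p_values[i] <= rank * q / m:
            k_max = rank
    rejected = set(order[:k_max])  # reject every hypothesis up to that rank
    return [i in rejected for i in range(m)]
```

Here `m` is the size of the registered family, which is why the family definition and the maximum number of registered tuples have to be fixed in the same proposal.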

15. `INFEASIBLE` taxonomy: typed reasons on the single verdict

ADR-0102 reconciliation. ADR-0102 Invariant 4 defines a single INFEASIBLE state (no DEFERRED-INFEASIBLE, no PARK). The types below are reason annotations on that one verdict, not new verdict states — so this plan does not contradict the ADR.

| reason | Trigger | Blocks S03 | Substrate work | Charter revision | Kills thesis | Required artifact |
|---|---|---|---|---|---|---|
| `cost-data` | no defensible (Tier A/B) P90 cost | yes | — | — | no | cost-evidence note (tiers attempted) |
| `capacity` | no defensible reference order size | yes | — | re-define/justify capacity | no | capacity-rule note |
| `power` | `MDE_available > E_pred_min` & `N_min` infeasible | yes | widen universe / folds | — | no | power report + `N_min` estimate |
| `mapping` | no defensible predictive→economic mapping | yes | — | change primary metric / mapping basis | no | mapping-attempt note |
| `null` | no defensible null (split/label) | yes | — | fix split / labels | no | null-candidate comparison |

Not allowed for any reason: weaken E_pred_min, change the metric, or use proxy/Tier-C/Tier-D cost to proceed to an S03 lock. A typed INFEASIBLE is a successful S02 outcome.
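
The "one verdict, typed reasons" structure can be made explicit in code: a single `INFEASIBLE` state carrying reason annotations, each routed to its required artifact. The sketch below transcribes the table above; the class and function names and the `"PROCEED"` label are hypothetical, and proceeding remains subject to the Tier-A/B lockability gate (§9.2), which is not modelled here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional


class Reason(Enum):
    COST_DATA = "cost-data"
    CAPACITY = "capacity"
    POWER = "power"
    MAPPING = "mapping"
    NULL = "null"


@dataclass(frozen=True)
class Route:
    substrate_work: Optional[str]
    charter_revision: Optional[str]
    required_artifact: str
    blocks_s03: bool = True     # every typed reason blocks S03
    kills_thesis: bool = False  # no typed reason kills the thesis


ROUTES: Dict[Reason, Route] = {
    Reason.COST_DATA: Route(None, None, "cost-evidence note (tiers attempted)"),
    Reason.CAPACITY: Route(None, "re-define/justify capacity", "capacity-rule note"),
    Reason.POWER: Route("widen universe / folds", None, "power report + N_min estimate"),
    Reason.MAPPING: Route(None, "change primary metric / mapping basis", "mapping-attempt note"),
    Reason.NULL: Route(None, "fix split / labels", "null-candidate comparison"),
}


def s02_verdict(reasons: Iterable[Reason]) -> dict:
    """Single verdict state; reasons are annotations, not additional states."""
    found = list(reasons)
    if not found:
        return {"verdict": "PROCEED", "reasons": [], "artifacts": []}
    return {
        "verdict": "INFEASIBLE",
        "reasons": [r.value for r in found],
        "artifacts": [ROUTES[r].required_artifact for r in found],
    }
```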

16. Deliverables

The filled charter values (still UNLOCKED): P90 cost (with tier) · E_econ_min · mapping · E_pred_min · MDE / N_min · primary null-gate (justified) · proposed budgets · Sortino R1 note — each with a signed derivation — or the corresponding typed INFEASIBLE record + its required artifact.

"Signed derivation" means: immutable artifact path or MLflow run id · git commit SHA · input dataset versions / hashes · code version · parameters · author · review status · generated timestamp · a reproducible command or notebook execution record. "Signed" without these is just prose.

MLflow is provenance-only here: an artifact / provenance tracker for analysis notebooks — not evidence of a training or predictive run. An "MLflow run id" in S02 never denotes a model-training run.
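
The "signed derivation" checklist lends itself to a mechanical completeness check. The sketch below assumes a derivation record is a plain dictionary; the key names and both function names are hypothetical, chosen to mirror the listed items one for one.

```python
from typing import Dict, List

# One key per item in the "signed derivation" definition (key names are assumptions).
REQUIRED_FIELDS = (
    "artifact_ref",          # immutable artifact path or MLflow run id
    "git_commit_sha",
    "input_dataset_hashes",  # input dataset versions / hashes
    "code_version",
    "parameters",
    "author",
    "review_status",
    "generated_at",          # generated timestamp
    "repro_command",         # reproducible command or notebook execution record
)


def missing_fields(record: Dict[str, object]) -> List[str]:
    """Fields that are absent or empty; an empty value counts as missing."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]


def is_signed(record: Dict[str, object]) -> bool:
    """True only when every required provenance field is present and non-empty."""
    return not missing_fields(record)
```

Because empty values count as missing, a derivation with no tunable parameters would need to say so explicitly (for example with a short note) so that it does not leave the field blank.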

17. Threats to validity

cost logs not representative of the target order size;
P90 unstable due to small sample;
liquidity-regime shift;
block bootstrap misspecified / cross-asset dependence underestimated;
null too weak (diagnostic) or too strong (invalid);
E_pred_min mapping fragile;
prior exploratory results contaminate calibration choices (§8.1);
Sortino comparability unresolved;
the non-deployment reference capacity misread as deployment capacity;
Tier-C provisional cost accidentally treated as lockable;
reference-capacity rule selected to minimise estimated cost rather than conservatively represent deployable liquidity.

Each is addressed by the corresponding contract above; residual ones are recorded with the derivation.

18. Review gates

S02 exit: committee review PASSED (this dossier first, then the derivations). INFEASIBLE is an allowed, successful outcome. No charter lock (S03); no run (S04).

Committee questions

Is a conservative research-reference order size from defi_top5 liquidity (non-deployment, §9.1 rule) an acceptable capacity basis, or must S02 block on a real AUM/target before any cost?
What evidence tier is required for P90 cost to be lockable in S03 (§9.2) — is Tier B the floor, or only Tier A?
Does the plan adequately prevent existing exploratory results from influencing S02 calibration choices (§8.1)?
Is the purged/embargoed block-bootstrap MDE (§11.1 contract) the right power method given analysis-only over existing data?
Is the primary-null selection rule (§12, not defaulting to random entry) correctly the conservative choice, and is "cannot PROMOTE on diagnostic nulls" correctly restated here?
Does encoding the INFEASIBLE types as reasons on the single verdict (§15) correctly honour ADR-0102 Invariant 4?
Tier-C outcome semantics (§9.2): if S02 produces only Tier-C cost evidence, should the Story be complete but S03 blocked (plan's position, when Tier C is clearly labelled non-lockable), or must S02 itself return INFEASIBLE-cost-data?