PTE envelope sweep: test plan

Issue: #608 (F1 mission, phase "lock the envelope")
Factor: pte_envelope (11 variants, added in PR #630)
PR branch: feat/608-pte-envelope-factor
Date: 2026-04-23
Operator: dco

1. Context

Two FTF runs on 2026-04-22 triangulated the real lever for the mission:

Run	PTE	Factor	Sortino	f1_buy	action_rate
`ftf_20260422_174216`	`ATR1.5_3.0_H5`	threshold_method	0.68	0.06	0.008
`ftf_20260422_220929`	`ATR0.5_1.5_H4`	horizon	1.69	0.40	0.161

Switching the envelope from ATR1.5_3.0 to ATR0.5_1.5 multiplied Sortino ×2.5 (0.68 → 1.69), action_rate ×20 (0.008 → 0.161) and return ×3.5. The two runs also differ in horizon (H5 vs H4), but the horizon sweep (H3..H12) on the new envelope came back flat: 0/4 metrics significant, all Sortinos in the 1.5-1.7 range. Horizon is not the knob.

The mission depends on finding the (SL_mult, TP_mult) pair that simultaneously:

Keeps action_rate ≥ 0.05 (operator criterion: "enough trades to learn")
Maximizes f1_buy − const_F1 (the only F1 advantage that means anything — raw F1 moves mechanically with pos_rate)
Maximizes Sortino at a drawdown we can live with

No envelope sweep has ever been run as a coupled factor. Existing tp_multiplier and sl_multiplier factors sweep one axis at a time, which mixes volume effect and edge effect in the cross-variant variance.

2. Hypothesis

H1 — There is a measurable (SL, TP) pair at which the model's advantage = f1_buy − const_F1 is strictly > 0 and Sortino > 1.5, with BH-corrected paired-t significance on at least 2 of {Sortino, expectancy, total_return, win_rate} — the lock criterion from ADR-14 / Issue #595 Phase A.

H2 — The best pair is not sl0.5_tp1.5 (current γ anchor). The γ anchor was inherited from the label-quality scan as a reasonable starting point, not from an empirical sweep. If H2 holds, γ's base_env should be updated post-run.

H3 — The RR ratio (TP/SL) matters more than the absolute scale. Specifically, tight RR (1:1 to 1:2) and wide RR (1:4 to 1:5) should produce different regimes:

Tight RR → high pos_rate, high trade volume, low per-trade edge, fragile to costs.
Wide RR → low pos_rate, fewer trades, higher per-trade edge, more variance.

The sweep will tell which regime wins on this market.

Null hypothesis — Every variant lands within ±0.15 Sortino of the anchor, no BH significance, no advantage above +0.02 anywhere. In that case the PTE is not the lever and we need to revisit the feature/model axes before any other envelope work.

3. Grid: 11 coupled (SL, TP) variants

Variant	SL × ATR	TP × ATR	RR	Hypothesis role
`sl0.3_tp1.0`	0.3	1.0	1:3.3	Tight floor — max trade density
`sl0.3_tp1.5`	0.3	1.5	1:5	Very tight SL + ambitious TP
`sl0.5_tp1.0`	0.5	1.0	1:2	Current SL, compact TP
`sl0.5_tp1.5`	0.5	1.5	1:3	γ anchor (baseline)
`sl0.5_tp2.0`	0.5	2.0	1:4	Current SL, wider TP
`sl0.5_tp2.5`	0.5	2.5	1:5	Ambitious RR, same SL
`sl1.0_tp1.0`	1.0	1.0	1:1	Symmetric — test if edge exists without RR advantage
`sl1.0_tp1.5`	1.0	1.5	1:1.5	Modest RR, medium scale
`sl1.0_tp2.0`	1.0	2.0	1:2	Balanced medium
`sl1.0_tp3.0`	1.0	3.0	1:3	Medium SL, wide TP
`sl1.5_tp3.0`	1.5	3.0	1:2	Pre-mission envelope (reference)

Horizon held constant at base_env default (H=4h). Binary mode + f1_binary HPO + ThresholdCalibrator.f1_binary already pinned in base_env via γ (2026-04-22). No other factor varied in this run.
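
As a cross-check on the grid table, the variant labels and RR column can be regenerated from the (SL, TP) pairs. This is a minimal sketch: the helper names `variant_name` and `reward_risk` are hypothetical and are not taken from the `pte_envelope` factor implementation.

```python
from fractions import Fraction

# (SL multiplier, TP multiplier) pairs copied from the grid table, in ATR units.
GRID = [
    ("0.3", "1.0"), ("0.3", "1.5"),
    ("0.5", "1.0"), ("0.5", "1.5"), ("0.5", "2.0"), ("0.5", "2.5"),
    ("1.0", "1.0"), ("1.0", "1.5"), ("1.0", "2.0"), ("1.0", "3.0"),
    ("1.5", "3.0"),
]


def variant_name(sl: str, tp: str) -> str:
    """Build the variant label used in the table, e.g. 'sl0.5_tp1.5'."""
    return f"sl{sl}_tp{tp}"


def reward_risk(sl: str, tp: str) -> float:
    """Reward/risk ratio TP/SL, computed exactly, then rounded to 1 decimal."""
    return round(float(Fraction(tp) / Fraction(sl)), 1)


# Maps each variant label to x, where the table's RR column reads "1:x".
variants = {variant_name(sl, tp): reward_risk(sl, tp) for sl, tp in GRID}

for name, rr in variants.items():
    print(f"{name}: 1:{rr:g}")
```

Using `Fraction` on the decimal strings avoids float noise in ratios such as 1.0 / 0.3.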

4. Resource estimate

Quantity	Value
Variants	11
Cryptos (defi_top5)	5 (AAVEUSDC kept per operator decision)
Folds	5
HPO trials / variant / fold	50
Total pods (factor × crypto)	55
Model-fits total	11 × 5 × 5 = 275
Profile	`standard` (4 CPU / 8Gi per pod)
Airflow `max_active_tasks`	24
Wall-time estimate	~8h (extrapolated from run `ftf_20260422_220929` which did 25 pods in 3h19m at same config)

confirm_long_run=true required because forecast > 3h.
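
One way to sanity-check the ~8h figure is to scale the reference run under two simple models. Both models are assumptions of this sketch, not the method used to fill in the table: a linear model (wall time proportional to pod count) and a wave model (pods run in batches of at most `max_active_tasks`, each batch taking the same time).

```python
import math

REF_PODS = 25                  # pods in run ftf_20260422_220929
REF_MINUTES = 3 * 60 + 19      # its wall time, 3h19m
NEW_PODS = 11 * 5              # 11 variants x 5 cryptos
MAX_ACTIVE_TASKS = 24          # Airflow concurrency cap from the table

# Assumption 1: wall time scales linearly with the number of pods.
linear_minutes = REF_MINUTES * NEW_PODS / REF_PODS

# Assumption 2: pods run in equal-length waves of at most MAX_ACTIVE_TASKS.
ref_waves = math.ceil(REF_PODS / MAX_ACTIVE_TASKS)
new_waves = math.ceil(NEW_PODS / MAX_ACTIVE_TASKS)
wave_minutes = REF_MINUTES * new_waves / ref_waves


def as_hm(minutes: float) -> str:
    """Format a duration in minutes as 'XhYYm'."""
    total = round(minutes)
    return f"{total // 60}h{total % 60:02d}m"


print("linear:", as_hm(linear_minutes))
print("waves: ", as_hm(wave_minutes))
```

Under these assumptions the linear model gives about 7h18m and the wave model about 4h59m, so ~8h would include some headroom. Either way the forecast exceeds 3h, which is why `confirm_long_run=true` is needed.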

5. Success criteria (decision rule)

LOCK: promote the winner to `base_env`

A variant V is locked iff all three conditions hold:

1. V ranks first on Sortino AND shows BH-adjusted p < 0.05 vs the anchor sl0.5_tp1.5 on at least 2 of the 4 primary metrics (Sortino / expectancy / total_return / win_rate), with Cohen's d ≥ 0.3 in the winner's direction. (ADR-14 lock rule.)
2. advantage (f1_buy − const_F1) > +0.02, averaged across cryptos. This captures that the model adds value above a naive constant-BUY classifier.
3. AAVEUSDC Sortino > -1.0 under V. Tolerance: AAVE can stay negative but must not degrade beyond what it showed in run B (-0.6 to -0.8). The hard floor of -1.0 catches regimes where the envelope makes AAVE structurally broken.

If all three pass → update ftf_config.base_env via Console:

CVN_SL_MULT = "<winner SL>"
CVN_TP_MULT = "<winner TP>"

NOT-LOCK: keep γ anchor

Any variant tied or better than anchor on Sortino WITHOUT BH significance → no promotion, anchor stays. Move to next lever (Δf1 reduction, feature count, or model ensemble).

PIVOT: if null hypothesis wins

All 11 variants within ±0.15 Sortino of anchor, 0/11 with advantage > +0.02 → envelope is not the lever. Open separate audit issue on features/label quality before any further envelope work. Document this as a dead-end in the mission plan v2.
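
The three outcomes of this section can be sketched as a single function. This is an illustration only, not the logic of scripts/analyze_pte_envelope_run.py: the `VariantStats` fields, the `decide` name and the sample numbers are hypothetical, and the sketch assumes the BH-adjusted p-values and Cohen's d have already been reduced to a count of qualifying primary metrics per variant.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VariantStats:
    name: str
    sortino: float        # Sortino aggregated over cryptos and folds
    n_significant: int    # primary metrics with BH p < 0.05 and d >= 0.3 vs anchor
    advantage: float      # f1_buy - const_F1, aggregated across cryptos
    aave_sortino: float   # Sortino on AAVEUSDC only


def decide(stats: List[VariantStats],
           anchor: str = "sl0.5_tp1.5") -> Tuple[str, Optional[str]]:
    """Return (decision, winner) where decision is LOCK, NOT_LOCK or PIVOT."""
    base = next(s for s in stats if s.name == anchor)
    best = max(stats, key=lambda s: s.sortino)

    # LOCK: all three conditions hold for the top-Sortino variant.
    if (best.name != anchor
            and best.n_significant >= 2
            and best.advantage > 0.02
            and best.aave_sortino > -1.0):
        return "LOCK", best.name

    # PIVOT: every variant within +/-0.15 Sortino of the anchor
    # and none with advantage above +0.02.
    flat = all(abs(s.sortino - base.sortino) <= 0.15 for s in stats)
    no_edge = all(s.advantage <= 0.02 for s in stats)
    if flat and no_edge:
        return "PIVOT", None

    # Anything else: no promotion, the anchor stays.
    return "NOT_LOCK", None


# Made-up numbers, for illustration only.
example = [
    VariantStats("sl0.5_tp1.5", 1.60, 0, 0.01, -0.7),
    VariantStats("sl0.5_tp2.0", 2.10, 3, 0.05, -0.5),
]
print(decide(example))
```

Checking LOCK before PIVOT keeps the two rules from overlapping: a variant that clears the significance bar cannot also count as evidence for the null hypothesis.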

6. What we measure

Primary (decision-driving)

f1_buy, precision_buy, recall_buy, action_rate, Δf1 (overfit gap)
advantage = f1_buy − const_F1 computed post-hoc, added to the analysis notebook — not currently in the PDF report (follow-up to add to report_pdf.py)
Sortino, expectancy_per_trade, total_return_pct, win_rate, max_dd, n_trades
BH-corrected paired-t p-values per (variant × metric)
Cohen's d per (variant × metric)
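
The last two items can be sketched with the standard library alone. The sketch below assumes the paired-t p-values are already available (for instance from a paired t-test such as `scipy.stats.ttest_rel`, which is not imported here) and only applies the Benjamini-Hochberg adjustment. It also assumes the paired form of Cohen's d (mean difference divided by the standard deviation of the differences); the plan does not state which variant of d is used, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def cohens_d_paired(variant: Sequence[float], anchor: Sequence[float]) -> float:
    """Paired Cohen's d: mean of the differences over their sample std."""
    diffs = [v - a for v, a in zip(variant, anchor)]
    return mean(diffs) / stdev(diffs)


# Made-up inputs, for illustration only.
raw_p = [0.01, 0.04, 0.03, 0.20]          # one p-value per primary metric
adj_p = bh_adjust(raw_p)
n_significant = sum(p < 0.05 for p in adj_p)

variant_sortino = [1.8, 1.9, 1.7, 2.0, 1.6]   # one value per fold
anchor_sortino = [1.5, 1.7, 1.6, 1.6, 1.5]
d = cohens_d_paired(variant_sortino, anchor_sortino)
```

Note how the adjustment changes the count: three of the four raw p-values are below 0.05, but only one survives the correction, which would fall short of the "at least 2 of 4" lock requirement.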

Secondary (diagnostics)

Per-fold stability (variance across folds 3-7 — fold 4 has been a consistent outlier)
Per-crypto breakdown (especially AAVEUSDC trajectory)
Signal funnel (CUSUM block rate, concurrency block rate, total survival)
Calibration quality (Brier, ECE) — will deteriorate at high pos_rate, expected

Red flags (abort criteria during run)

Any variant's median action_rate < 0.01 → label too sparse, fold will crash on threshold calibration. Known fragility of the f1_binary path.
Per-pod OOM → bump the profile from standard (8Gi) to heavy (24Gi) via power_mode=deep. Only do this if a pod actually OOMs: deep also pulls in defi_full (17 cryptos), which we don't want here.
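
The first red flag is easy to check mechanically while results come in. A minimal sketch, assuming per-variant action_rate values have been collected into a dict; the `sparse_variants` helper and the sample values are hypothetical.

```python
from statistics import median
from typing import Dict, List

ACTION_RATE_FLOOR = 0.01  # abort threshold from the red-flag list


def sparse_variants(action_rates: Dict[str, List[float]]) -> List[str]:
    """Return the variants whose median action_rate is below the floor.

    action_rates maps a variant name to its observed rates so far
    (one value per completed pod or fold).
    """
    return sorted(
        name
        for name, rates in action_rates.items()
        if rates and median(rates) < ACTION_RATE_FLOOR
    )


# Made-up values, for illustration only.
observed = {
    "sl1.5_tp3.0": [0.008, 0.006, 0.012],
    "sl0.5_tp1.5": [0.16, 0.15, 0.17],
}
flagged = sparse_variants(observed)
```

Variants with no completed pods yet are skipped rather than flagged, so the check can run on partial results.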

7. Analysis plan (post-run)

Local SQL after the run completes:

SELECT
  variant,
  crypto,
  fold_id,
  f1_buy, precision_buy, recall_buy, action_rate,
  n_trades_val, sortino_val, expectancy_val,
  sortino, total_return_pct, win_rate, max_dd_pct,
  n_trades_backtest
FROM finetune_results
WHERE run_id = '<new_run_id>'
ORDER BY variant, crypto, fold_id;

One-shot analyzer: scripts/analyze_pte_envelope_run.py <path_to_pdf> parses the FTF report and applies the full 3-condition lock rule (§5) automatically. Emits a markdown summary + JSON, prints LOCK / NOT_LOCK / PIVOT to stdout. Runs in seconds, no DB / MLflow needed. Tested against the horizon run (2026-04-22).

Manual pipeline (if the analyzer breaks on a PDF format change):

1. Download PDF from Console → Runs → extract summary table.
2. Join f1_buy (Layer A, "Couche A") + Sortino (Layer C, "Couche C") per (variant, crypto, fold).
3. Compute pos_rate, which is not yet in finetune_results. Two sources:

   a. Preferred (exact): re-run scripts/label_quality_scan.py with the winning (SL_mult, TP_mult) from the sweep. That script replays the triple-barrier on raw OHLCV and returns pos_rate per (crypto, variant). Cost: ~2 min / config on defi_top5.

   b. Heuristic (fast, noisy): use the model's observed action_rate as a proxy for pos_rate. At an F1-maximizing threshold the two land within about 0.1 of each other in absolute terms on balanced-ish labels (we saw action_rate ≈ 0.16 against the label scan's pos_rate ≈ 0.25 on the H4 anchor), so the proxy can understate pos_rate and therefore overstate advantage. Good enough for ranking variants when the true rate isn't yet plumbed through.

   Follow-up gated on this run: extend report_pdf.py (or the finetune_results ETL upstream) to persist pos_rate per (variant, crypto, fold) so subsequent runs skip step 3. Currently none of the factors produce this column.

4. Compute const_F1 = 2 × pos_rate / (1 + pos_rate) per (variant, crypto).
5. Compute advantage = f1_buy − const_F1 for each cell; aggregate (median across folds × cryptos).
6. Lock / not-lock / pivot decision per §5.
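
The const_F1 and advantage steps can be sketched as follows. The formula comes from treating const_F1 as the F1 of an always-BUY classifier: its recall is 1 and its precision equals pos_rate, so F1 = 2 × pos_rate / (1 + pos_rate). The function names and the sample cells are hypothetical.

```python
from statistics import median
from typing import Iterable, Tuple


def const_f1(pos_rate: float) -> float:
    """F1 of an always-BUY classifier: precision = pos_rate, recall = 1."""
    return 2 * pos_rate / (1 + pos_rate)


def advantage(f1_buy: float, pos_rate: float) -> float:
    """Model F1 minus the constant-BUY baseline at the same pos_rate."""
    return f1_buy - const_f1(pos_rate)


def aggregate_advantage(cells: Iterable[Tuple[float, float]]) -> float:
    """Median advantage over the (f1_buy, pos_rate) cells of one variant."""
    return median(advantage(f1, p) for f1, p in cells)


# Made-up cells (one per crypto x fold), for illustration only.
cells = [(0.45, 0.25), (0.40, 0.25), (0.50, 0.25)]
variant_advantage = aggregate_advantage(cells)
```

With the approximate figures quoted in this plan for the H4 anchor (f1_buy 0.40, label-scan pos_rate ≈ 0.25), const_F1 is 0.40 and the advantage is roughly zero, which illustrates why raw f1_buy is not read on its own.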

8. Follow-ups gated on this run

If lock → update base_env, re-run a short validation FTF with a secondary factor (candidate next: overfit-reduction factor hpo_regularization_band {loose/tight}) to confirm the locked PTE is robust to model complexity changes.
If not-lock or pivot → AAVEUSDC audit, then overfit-reduction on anchor.
Either way → extend report_pdf.py to compute and display advantage alongside f1_buy by default. Raw f1_buy without the pos_rate reference is misleading and we relearned this twice already.

9. Timeline

Step	ETA
PR #630 merged	today, post-CR
Console flip if needed (operator)	today
Trigger FTF w/ `factor=pte_envelope`	today
Run completes	+8h wall
Analysis + decision	+9h total

10. Stories (retro-registered in OP, 2026-06-09)

This Epic (experiment of 2026-04-23/24) had never been tracked in OpenProject. It was registered after the fact: Epic wp#258 (GH #1146), parent Need CVN-N001.

Story	Title	GH · OP	Status
CVN-N001-EC-S01	`pte_envelope` factor: coupled (SL,TP) sweep impl	#1147 · wp#259	Closed (PR #630 merged)
CVN-N001-EC-S02	sweep run, 11 variants + LOCK/NOT-LOCK/PIVOT decision	#1148 · wp#260	Closed (PTE decision `ATR0.5_1.5_H4`, ~2026-04-24)

§8 follow-ups not pursued (never started; program pivoted to the ML_USELESS freeze) and not created as Stories: advantage display in report_pdf.py · FTF validation with hpo_regularization_band / AAVEUSDC audit. Reopen via a new Story if PTE work resumes.