Track 6 - Focal loss : results dossier (ABANDON)

Date : 2026-04-29
Story : CVN-N001-EE-S02 (OP wp#41)
Track : 6 of F1_BUY_BOOST_PLAN.md, focal loss for XGBoost binary classification
Plan dossier : 2026-04-28-track6-focal-loss-plan.md (committee 4ef337af PASSED OK after v1+v2 REJECTED)
PR review dossier : 2026-04-28-track6-focal-loss-pr-review.md (committee 13fd89c9 PASSED OK)
MLOps readiness : mlops_readiness.md, 6 sections complete per ADR-70
Implementation PR : #767 (squash db33e7dd, 2026-04-29)
Sister hotfix PR : #775 (sympy missing, blocked first sweep ; see OPERATIONS.md §17.4)
FTF run : ftf_20260429_121011_282ec7_ATR0.5_1.5_H4 (125 rows, 0 errors, status=completed)

TL;DR : verdict

ABANDON for all 4 focal variants (mild_focus, standard, aggressive_focus, class_balanced). The hypothesis "focal loss concentrates training on the minority class and lifts f1_buy" is rejected on 5/5 cryptos, with a large effect size in the wrong direction (Cohen's d ∈ [-1.6, -1.2]).

none (γ=0, equivalent to standard binary cross-entropy) stays as the production champion. Console state unchanged ; no flip in ftf_config.base_env.

1. Sweep configuration

Param	Value
Factor	`focal_loss` (Track 6)
Variants (5)	`none` (baseline, γ=0, α=0.5) ; `mild_focus` (γ=1, α=0.25) ; `standard` (γ=2, α=0.25) ; `aggressive_focus` (γ=4, α=0.25) ; `class_balanced` (γ=2, α=0.75)
Cryptos (5)	AAVEUSDC, ARBUSDC, LDOUSDC, OPUSDC, UNIUSDC (defi_top5)
Folds	5 per crypto (purged k-fold per ADR-14)
Total useful rows	5 × 5 × 5 = 125 ✅ (full coverage)
Power mode	`standard` (50 HPO trials per fold)
Strategy	`ATR0.5_1.5_H4`
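
As a rough illustration of where the 125 rows come from, the grid can be enumerated as below. The names `VARIANTS`, `CRYPTOS` and `sweep_rows` are hypothetical and are not the FTF harness API ; only the γ/α values, the crypto list and the fold count are taken from the table above.

```python
from itertools import product

# gamma / alpha per variant, copied from the sweep table above.
VARIANTS = {
    "none": {"gamma": 0.0, "alpha": 0.5},
    "mild_focus": {"gamma": 1.0, "alpha": 0.25},
    "standard": {"gamma": 2.0, "alpha": 0.25},
    "aggressive_focus": {"gamma": 4.0, "alpha": 0.25},
    "class_balanced": {"gamma": 2.0, "alpha": 0.75},
}
CRYPTOS = ["AAVEUSDC", "ARBUSDC", "LDOUSDC", "OPUSDC", "UNIUSDC"]
N_FOLDS = 5


def sweep_rows():
    """Return one dict per (variant, crypto, fold) cell of the sweep."""
    return [
        {"variant": v, "crypto": c, "fold": f, **VARIANTS[v]}
        for v, c, f in product(VARIANTS, CRYPTOS, range(N_FOLDS))
    ]


rows = sweep_rows()
print(len(rows))  # 5 variants x 5 cryptos x 5 folds
```

Each variant contributes 25 rows, which is the n=25 used for the paired comparison against `none`.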

2. Per-crypto × variant `f1_buy` (mean across 5 folds)

Crypto	none (baseline)	mild_focus	standard	aggressive_focus	class_balanced
AAVEUSDC	0.4560	0.3951	0.3541	0.3658	0.3845
ARBUSDC	0.4417	0.3911	0.3592	0.3569	0.3620
LDOUSDC	0.3892	0.3165	0.2910	0.3238	0.3053
OPUSDC	0.3679	0.3095	0.2986	0.2947	0.2772
UNIUSDC	0.4330	0.3270	0.3517	0.3487	0.3311

`none` has the highest f1_buy on every crypto. No focal variant wins on any of the five.
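
The table can be cross-checked against the paired deltas in Section 3. Because every crypto has the same number of folds, the mean of the per-crypto deltas equals the mean over all 25 paired samples, up to rounding of the table entries. The helper names below are hypothetical ; the numbers are copied from the table above.

```python
# Per-crypto mean f1_buy, copied from the table above.
F1_BUY = {
    "AAVEUSDC": {"none": 0.4560, "mild_focus": 0.3951, "standard": 0.3541,
                 "aggressive_focus": 0.3658, "class_balanced": 0.3845},
    "ARBUSDC": {"none": 0.4417, "mild_focus": 0.3911, "standard": 0.3592,
                "aggressive_focus": 0.3569, "class_balanced": 0.3620},
    "LDOUSDC": {"none": 0.3892, "mild_focus": 0.3165, "standard": 0.2910,
                "aggressive_focus": 0.3238, "class_balanced": 0.3053},
    "OPUSDC": {"none": 0.3679, "mild_focus": 0.3095, "standard": 0.2986,
               "aggressive_focus": 0.2947, "class_balanced": 0.2772},
    "UNIUSDC": {"none": 0.4330, "mild_focus": 0.3270, "standard": 0.3517,
                "aggressive_focus": 0.3487, "class_balanced": 0.3311},
}


def mean_delta_vs_baseline(table, baseline="none"):
    """Mean (variant - baseline) across cryptos, for each non-baseline variant."""
    variants = [v for v in next(iter(table.values())) if v != baseline]
    return {
        v: sum(row[v] - row[baseline] for row in table.values()) / len(table)
        for v in variants
    }


def baseline_wins_everywhere(table, baseline="none"):
    """True if the baseline has the highest f1_buy on every crypto."""
    return all(max(row, key=row.get) == baseline for row in table.values())


deltas = mean_delta_vs_baseline(F1_BUY)
for variant, delta in deltas.items():
    print(f"{variant:<17} {delta:+.4f}")
print(baseline_wins_everywhere(F1_BUY))
```

The four means come out at -0.0697, -0.0866, -0.0796 and -0.0855, matching the mean Δf1_buy column of Section 3.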

3. Paired delta vs baseline (variant - none, paired by crypto × fold)

n=25 paired samples per variant. Bootstrap CI95 (10,000 resamples, seed=42).

Variant	mean Δf1_buy	CI95 low	CI95 high	Cohen's d	raw p (paired t-test)	BH p (m=4)
`mild_focus`	-0.0697	-0.0874	-0.0526	-1.541	6.10e-08	1.22e-07
`standard`	-0.0866	-0.1092	-0.0655	-1.510	8.63e-08	1.15e-07
`aggressive_focus`	-0.0796	-0.0988	-0.0611	-1.598	3.24e-08	1.30e-07
`class_balanced`	-0.0855	-0.1146	-0.0616	-1.240	2.09e-06	2.09e-06
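
The BH column can be reproduced from the raw p-values. The sketch below assumes the column was computed as p × m / rank (rank 1 = smallest p), which matches the four reported values ; the function names are hypothetical. It also computes the conventional step-up adjusted p-values, which add a running-minimum step to keep the adjusted values monotone. Under that convention the three smallest p-values would all adjust to about 1.15e-07. The conclusion is the same under either convention.

```python
# Raw paired t-test p-values, copied from the table above.
RAW_P = {
    "mild_focus": 6.10e-08,
    "standard": 8.63e-08,
    "aggressive_focus": 3.24e-08,
    "class_balanced": 2.09e-06,
}


def bh_scaled(p_values):
    """p * m / rank with rank 1 = smallest p (no monotonicity step)."""
    m = len(p_values)
    ascending = sorted(p_values, key=p_values.get)
    return {
        name: p_values[name] * m / rank
        for rank, name in enumerate(ascending, start=1)
    }


def bh_adjusted(p_values):
    """Step-up BH adjusted p-values: running minimum from the largest p down."""
    scaled = bh_scaled(p_values)
    descending = sorted(p_values, key=p_values.get, reverse=True)
    adjusted, running_min = {}, 1.0
    for name in descending:
        running_min = min(running_min, scaled[name])
        adjusted[name] = running_min
    return adjusted


scaled = bh_scaled(RAW_P)
adjusted = bh_adjusted(RAW_P)
for name in RAW_P:
    print(f"{name:<17} scaled={scaled[name]:.2e} adjusted={adjusted[name]:.2e}")
```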

Every variant's CI95 excludes 0 in the wrong direction. Cohen's d ≤ -1.2 is a very large effect. All BH-corrected p-values are ≤ 2.1e-06 (three of the four at or below 1.3e-07) : overwhelming statistical evidence that focal loss regresses f1_buy under our dataset and labelling regime.
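
A minimal sketch of how such paired statistics are commonly computed is given below. It assumes a percentile bootstrap that resamples the 25 paired differences, and a paired Cohen's d defined as mean difference over the sample SD of the differences ; the dossier does not state which bootstrap flavour or d definition the pipeline uses, so both are assumptions. The function name and the input data are hypothetical : the data is synthetic, not the sweep results.

```python
import math
import random
import statistics


def paired_summary(variant, baseline, n_resamples=10_000, seed=42):
    """Mean paired delta, percentile-bootstrap CI95, paired Cohen's d and t."""
    diffs = [v - b for v, b in zip(variant, baseline)]
    n = len(diffs)
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    rng = random.Random(seed)
    # Resample the differences with replacement and keep each resample's mean.
    boot_means = [
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    ]
    # quantiles(n=40) returns cut points every 2.5 % : first = 2.5 %, last = 97.5 %.
    cuts = statistics.quantiles(boot_means, n=40)
    return {
        "mean_delta": mean,
        "ci95": (cuts[0], cuts[-1]),
        "cohens_d": mean / sd,
        "t_stat": mean / (sd / math.sqrt(n)),
    }


# Synthetic stand-in for 25 (crypto x fold) pairs ; NOT the sweep data.
data_rng = random.Random(0)
baseline = [data_rng.gauss(0.42, 0.04) for _ in range(25)]
variant = [b + data_rng.gauss(-0.07, 0.045) for b in baseline]

summary = paired_summary(variant, baseline)
print(summary)
```

With these definitions the paired t statistic is d × √n, so with n=25 a d near -1.5 corresponds to t near -7.5 on 24 degrees of freedom.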

4. Per-track gate verdict (F1_BUY_BOOST_PLAN.md §6)

The 6 official gates (every gate must clear for lock) :

#	Gate (threshold)	Observed	Verdict
1	F1_buy lift ≥ +0.015 with CI95 excluding 0	mean Δf1_buy between -0.0866 and -0.0697 on every variant, with CI95 excluding 0 in the WRONG direction	FAIL × 4
2	Joint metric (Δexpectancy ≥ 0 AND Δsortino ≥ 0 AND Δmax_drawdown ≤ +1 %)	Δsortino negative on all 4 variants ; Δmax_dd > 1 % on 3/4	FAIL × 4
3	Stability — per-fold f1_buy variance ≤ 0.05	max variance 0.021 (class_balanced on OPUSDC)	PASS × 4
4	Per-asset — f1_buy improves on ≥ 4/5 cryptos	0/5 improve on every variant	FAIL × 4
5	Sample size — ≥ 50 BUY trades / fold	mean n_trades ≈ 34 (vs 47.4 baseline)	FAIL × 4
6	MLOps readiness	`mlops_readiness.md` complete (PR #767)	PASS × 4

Verdict per variant : FAIL on 4 of 6 gates (1, 2, 4 and 5) → all 4 focal variants ABANDON.
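
The gate logic can be sketched as a small checker. This is an illustration under assumptions, not the project's gate implementation : `VariantMetrics` and `evaluate_gates` are hypothetical names, gate 5 is evaluated on the mean trade count, and the stability and trade-count inputs are the aggregate figures quoted in the gate table (0.021 is the maximum over all variants, 34 the approximate mean), not per-variant measurements. The remaining inputs come from the Section 3 and Section 5 tables.

```python
from dataclasses import dataclass


@dataclass
class VariantMetrics:
    f1_lift: float
    ci95_low: float
    d_expectancy: float
    d_sortino: float
    d_max_drawdown_pct: float
    max_fold_variance: float
    cryptos_improved: int
    mean_buy_trades: float
    mlops_ready: bool


def evaluate_gates(m):
    """Return {gate: passed} for the six gates listed above."""
    return {
        # A positive lift with CI95 excluding 0 means the lower bound is > 0.
        "1_f1_lift": m.f1_lift >= 0.015 and m.ci95_low > 0,
        "2_joint": (m.d_expectancy >= 0 and m.d_sortino >= 0
                    and m.d_max_drawdown_pct <= 1.0),
        "3_stability": m.max_fold_variance <= 0.05,
        "4_per_asset": m.cryptos_improved >= 4,
        "5_sample_size": m.mean_buy_trades >= 50,
        "6_mlops": m.mlops_ready,
    }


# Shared aggregate inputs (see the assumptions in the lead-in).
SHARED = dict(max_fold_variance=0.021, cryptos_improved=0,
              mean_buy_trades=34.0, mlops_ready=True)
METRICS = {
    "mild_focus": VariantMetrics(-0.0697, -0.0874, 10.7, -0.13, 0.68, **SHARED),
    "standard": VariantMetrics(-0.0866, -0.1092, 18.6, -0.05, 1.28, **SHARED),
    "aggressive_focus": VariantMetrics(-0.0796, -0.0988, 23.9, -0.13, 1.21, **SHARED),
    "class_balanced": VariantMetrics(-0.0855, -0.1146, 17.3, -0.28, 1.49, **SHARED),
}

verdicts = {}
for name, metrics in METRICS.items():
    failed = [g for g, ok in evaluate_gates(metrics).items() if not ok]
    # Every gate must clear for a lock ; any failure means ABANDON.
    verdicts[name] = "LOCK" if not failed else "ABANDON"
    print(name, verdicts[name], failed)
```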

5. Supporting metrics (context, not gating)

Joint metric breakdown (mean across cryptos)

Variant	Δexpectancy	Δsortino	Δmax_drawdown
`mild_focus`	+10.7 ✓	-0.13 ✗	+0.68 ✓
`standard`	+18.6 ✓	-0.05 ✗	+1.28 ✗
`aggressive_focus`	+23.9 ✓	-0.13 ✗	+1.21 ✗
`class_balanced`	+17.3 ✓	-0.28 ✗	+1.49 ✗

Δexpectancy > 0 on all variants is misleading : the modest expectancy lift is outweighed by the large f1_buy regression and the negative sortino delta, so risk-adjusted trade quality is worse, not better.

Why focal loss regresses on this data

Three plausible explanations (consistent with the Track 5 label_smoothing pattern documented in 2026-04-29-track5-label-smoothing-results.md §6) :

Class balance is already addressed upstream : the CVN_CLASS_BALANCING=1 setting (sklearn compute_class_weight) already applies α-weighting at the loss level. Adding focal's (1-p_t)^γ modulator on top compounds two corrections that the data does not need.
Rare-event vs calibration tension : f1_buy requires both recall (catch the BUY signal) and precision (don't fire on noise). Focal pushes the gradient onto hard examples, which on noisy crypto data are the noisy labels themselves — so the model down-weights the cleanest signal in favour of training noise.
HPO drift across runs : the HPO objective for the focal variants was f1_binary (per CVN_HPO_OBJECTIVE), but each variant landed on a different operating point per fold ; `none` benefits from the most stable HPO landscape.

These are hypotheses ; the gate decision does not depend on which is correct.
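
To make the (1-p_t)^γ modulator concrete, here is a minimal sketch of the per-example loss. It assumes the standard α-balanced focal loss form, FL = -α_t (1-p_t)^γ log(p_t) ; the project's actual XGBoost objective (gradient and hessian) is not shown in this dossier, and the function names are hypothetical. Under this form the `none` variant (γ=0, α=0.5) is binary cross-entropy scaled by a constant 0.5.

```python
import math


def focal_loss(p, y, gamma, alpha):
    """Alpha-balanced focal loss for one example.

    p is the predicted P(BUY), y is the label in {0, 1}.
    """
    p_t = p if y == 1 else 1.0 - p          # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def bce(p, y):
    """Plain binary cross-entropy for one example."""
    return -math.log(p if y == 1 else 1.0 - p)


# gamma=0, alpha=0.5 reduces to 0.5 * BCE for both classes.
for p, y in [(0.9, 1), (0.3, 1), (0.2, 0)]:
    assert math.isclose(focal_loss(p, y, 0.0, 0.5), 0.5 * bce(p, y))

# Modulator weight of a hard example (p_t=0.3) relative to an easy one (p_t=0.9).
ratios = {}
for gamma in (0, 1, 2, 4):
    ratios[gamma] = (1 - 0.3) ** gamma / (1 - 0.9) ** gamma
    print(f"gamma={gamma}: hard/easy weight ratio = {ratios[gamma]:.0f}")
```

The ratio grows as 7^γ (about 7, 49 and 2401 for γ = 1, 2, 4). If the hard examples are mostly mislabelled bars, as the second hypothesis suggests, this is the factor by which they gain influence over the clean ones.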

6. Decisions

6.1 Lock variant

No lock. Keep CVN_LOSS_FUNCTION=binary:logistic (baseline) as production. Console state unchanged. No flip required in ftf_config.base_env.

6.2 OP Story closure

wp#41 (CVN-N001-EE-S02) → status Closed with verdict ABANDONED. Comment links this dossier + the implementation PR #767.

6.3 Joint variant (Track 5 × Track 6)

Skipped. A joint mild_smoothing × mild_focus variant would compound two abandoned interventions ; the prior probability that it clears the gates is essentially zero, so the additional 25-row joint sweep would be wasted compute.

6.4 Track 6 Story Phase

Phase 4 (FTF sweep) and Phase 5 (gate decision) of TEMPLATE_story_phases_ml_ftf.md are now complete with verdict ABANDON. This closes the Story.

6.5 Follow-ups

CVN-N011-EA-S10 : gRPC fork deadlock fix landed (PR #777 squash 0f6220fc). The Cleanlab branch of Track 5 is unblocked ; the operator can re-trigger it once the deploy is validated.
CVN-N011-EA-S11 — post-mortem of the missing-dep regression that delayed this Track (sympy hotfix PR #775). Open, P2.
Tracks 7-13 : sequencing per F1 plan §6 risk #4. Two consecutive ABANDON results (Track 5 label smoothing, Track 6 focal loss) indicate that loss-function manipulation is not a productive lever on this data ; the next viable track families are data (Tracks 1 and 12) or architecture (Track 11). Operator decision pending.

7. Linked context

Plan dossier : 2026-04-28-track6-focal-loss-plan.md
PR review dossier : 2026-04-28-track6-focal-loss-pr-review.md
Track 5 (sister ABANDON) : 2026-04-29-track5-label-smoothing-results.md
Implementation PR : #767
Sister hotfix : #775 (sympy)
Operations incident log : OPERATIONS.md §17.4 (sympy regression that blocked the first focal sweep attempt)
Mission overview : ML Boost — F1_buy 75 mission