Track 6 — Focal loss : results dossier (ABANDON)
Date : 2026-04-29
Story : CVN-N001-EE-S02 (OP wp#41)
Track : 6 of F1_BUY_BOOST_PLAN.md — focal loss for XGBoost binary classification
Plan dossier : 2026-04-28-track6-focal-loss-plan.md (committee 4ef337af PASSED OK after v1+v2 REJECTED)
PR review dossier : 2026-04-28-track6-focal-loss-pr-review.md (committee 13fd89c9 PASSED OK)
MLOps readiness : mlops_readiness.md — 6 sections complete per ADR-70
Implementation PR : #767 (squash db33e7dd, 2026-04-29)
Sister hotfix PR : #775 (sympy missing, blocked first sweep — see OPERATIONS.md §17.4)
FTF run : ftf_20260429_121011_282ec7_ATR0.5_1.5_H4 — 125 rows, 0 errors, status=completed
TL;DR — verdict
ABANDON for all 4 focal variants (mild_focus, standard, aggressive_focus, class_balanced). The hypothesis "focal loss concentrates training on the minority class and lifts f1_buy" is rejected on 5/5 cryptos, with a large effect in the wrong direction (Cohen's d ∈ [-1.6, -1.2]).
none (γ=0, equivalent to standard binary cross-entropy) stays as the production champion. Console state unchanged ; no flip in ftf_config.base_env.
1. Sweep configuration
| Param | Value |
|---|---|
| Factor | focal_loss (Track 6) |
| Variants (5) | none (baseline, γ=0, α=0.5) ; mild_focus (γ=1, α=0.25) ; standard (γ=2, α=0.25) ; aggressive_focus (γ=4, α=0.25) ; class_balanced (γ=2, α=0.75) |
| Cryptos (5) | AAVEUSDC, ARBUSDC, LDOUSDC, OPUSDC, UNIUSDC (defi_top5) |
| Folds | 5 per crypto (purged k-fold per ADR-14) |
| Total useful rows | 5 × 5 × 5 = 125 ✅ (full coverage) |
| Power mode | standard (50 HPO trials per fold) |
| Strategy | ATR0.5_1.5_H4 |
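To make the (γ, α) settings in the table concrete, here is a minimal sketch of the per-example binary focal loss and its derivative with respect to the raw margin. It is an illustration under stated assumptions, not the code in PR #767: `VARIANTS`, `focal_loss` and `focal_grad_margin` are hypothetical names, α is assumed to weight the positive (BUY) class as in the original focal-loss paper, and an XGBoost custom objective would also need the second derivative, which is omitted here.

```python
import math

# (gamma, alpha) per variant, copied from the sweep configuration table.
VARIANTS = {
    "none": (0.0, 0.5),
    "mild_focus": (1.0, 0.25),
    "standard": (2.0, 0.25),
    "aggressive_focus": (4.0, 0.25),
    "class_balanced": (2.0, 0.75),
}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def focal_loss(y, p, gamma, alpha):
    """Binary focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    ASSUMPTION: alpha weights the positive (BUY, y=1) class and 1 - alpha the
    negative class, as in the original focal-loss paper.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def focal_grad_margin(y, z, gamma, alpha):
    """Derivative of focal_loss with respect to the raw margin z, where p = sigmoid(z)."""
    p = sigmoid(z)
    if y == 1:
        return alpha * (1.0 - p) ** gamma * (gamma * p * math.log(p) - (1.0 - p))
    return (1.0 - alpha) * p ** gamma * (p - gamma * (1.0 - p) * math.log(1.0 - p))


# Easy vs hard positive under "standard": the (1 - p_t)**gamma modulator.
gamma, alpha = VARIANTS["standard"]
for p in (0.9, 0.3):
    print(f"p_t={p}: modulator={(1.0 - p) ** gamma:.2f}, loss={focal_loss(1, p, gamma, alpha):.4f}")
```

With γ = 0 and α = 0.5 the loss is exactly half the binary cross-entropy, and the gradient reduces to 0.5·(p - y). For standard (γ = 2), a confidently correct positive at p_t = 0.9 keeps only (1 - 0.9)² = 0.01 of its cross-entropy term while a hard one at p_t = 0.3 keeps (0.7)² = 0.49, so the hard example's relative weight is 49 times larger.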
2. Per-crypto × variant f1_buy (mean across 5 folds)
| Crypto | none (baseline) | mild_focus | standard | aggressive_focus | class_balanced |
|---|---|---|---|---|---|
| AAVEUSDC | 0.4560 | 0.3951 | 0.3541 | 0.3658 | 0.3845 |
| ARBUSDC | 0.4417 | 0.3911 | 0.3592 | 0.3569 | 0.3620 |
| LDOUSDC | 0.3892 | 0.3165 | 0.2910 | 0.3238 | 0.3053 |
| OPUSDC | 0.3679 | 0.3095 | 0.2986 | 0.2947 | 0.2772 |
| UNIUSDC | 0.4330 | 0.3270 | 0.3517 | 0.3487 | 0.3311 |
none has the highest mean f1_buy on every crypto ; no focal variant beats it on any of the 5.
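Because every crypto contributes exactly 5 folds, the mean of the 25 paired deltas equals the mean over cryptos of (variant fold-mean minus baseline fold-mean), so the table above is enough to cross-check both the per-crypto winners and the mean Δf1_buy of the next section. A minimal sketch; the dictionary layout and function names are illustrative, not the pipeline's.

```python
# Mean f1_buy over 5 folds, copied from the per-crypto x variant table.
F1_BUY = {
    "AAVEUSDC": {"none": 0.4560, "mild_focus": 0.3951, "standard": 0.3541, "aggressive_focus": 0.3658, "class_balanced": 0.3845},
    "ARBUSDC": {"none": 0.4417, "mild_focus": 0.3911, "standard": 0.3592, "aggressive_focus": 0.3569, "class_balanced": 0.3620},
    "LDOUSDC": {"none": 0.3892, "mild_focus": 0.3165, "standard": 0.2910, "aggressive_focus": 0.3238, "class_balanced": 0.3053},
    "OPUSDC": {"none": 0.3679, "mild_focus": 0.3095, "standard": 0.2986, "aggressive_focus": 0.2947, "class_balanced": 0.2772},
    "UNIUSDC": {"none": 0.4330, "mild_focus": 0.3270, "standard": 0.3517, "aggressive_focus": 0.3487, "class_balanced": 0.3311},
}
FOCAL = ["mild_focus", "standard", "aggressive_focus", "class_balanced"]


def best_variant_per_crypto(table):
    """Variant with the highest mean f1_buy on each crypto."""
    return {crypto: max(row, key=row.get) for crypto, row in table.items()}


def n_cryptos_improved(table, variant, baseline="none"):
    """Number of cryptos where the variant beats the baseline (gate 4 counts these)."""
    return sum(row[variant] > row[baseline] for row in table.values())


def mean_delta(table, variant, baseline="none"):
    """Mean over cryptos of (variant - baseline); equals the 25-pair mean when folds are balanced."""
    deltas = [row[variant] - row[baseline] for row in table.values()]
    return sum(deltas) / len(deltas)


for v in FOCAL:
    print(v, round(mean_delta(F1_BUY, v), 4), n_cryptos_improved(F1_BUY, v))
```

This reproduces the mean Δf1_buy column of section 3 to four decimals (-0.0697, -0.0866, -0.0796, -0.0855) and confirms 0/5 improving cryptos for every variant.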
3. Paired delta vs baseline (variant - none, paired by crypto × fold)
n=25 paired samples per variant. Bootstrap CI95 (10,000 resamples, seed=42).
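The paired statistics described above can be sketched with the standard library alone. This is an illustration, not the pipeline's code: the function names are hypothetical, resampling whole (crypto, fold) pairs is an assumption about how the CI was built, and Python's `random` seeded with 42 will not reproduce the dossier's exact bounds.

```python
import math
import random
import statistics


def paired_bootstrap_ci(deltas, n_resamples=10_000, seed=42, level=0.95):
    """Percentile bootstrap CI for the mean of paired deltas.

    Each resample draws len(deltas) pairs with replacement (ASSUMPTION: the
    pipeline resamples whole (crypto, fold) pairs).
    """
    rng = random.Random(seed)
    n = len(deltas)
    means = sorted(sum(rng.choices(deltas, k=n)) / n for _ in range(n_resamples))
    lo = means[round((1 - level) / 2 * n_resamples)]
    hi = means[round((1 + level) / 2 * n_resamples) - 1]
    return lo, hi


def cohens_dz(deltas):
    """Paired effect size: mean of the differences over their sample standard deviation."""
    return statistics.mean(deltas) / statistics.stdev(deltas)


def paired_t(deltas):
    """Paired t statistic (n - 1 degrees of freedom); equals d_z * sqrt(n)."""
    return cohens_dz(deltas) * math.sqrt(len(deltas))


# Hypothetical deltas standing in for one variant's 25 (crypto, fold) pairs.
deltas = [-0.07 + 0.02 * math.sin(i) for i in range(25)]
print(paired_bootstrap_ci(deltas), round(cohens_dz(deltas), 2), round(paired_t(deltas), 2))
```

With n = 25, t = d·√25 = 5d. For the reported rows, mild_focus gives t ≈ -7.7 on 24 degrees of freedom, consistent with its raw p of 6.10e-08, and class_balanced gives t = -6.2, consistent with 2.09e-06. This suggests the reported Cohen's d is the paired d_z computed this way.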
| Variant | mean Δf1_buy | CI95 low | CI95 high | Cohen's d | raw paired-t p | BH p (m=4) |
|---|---|---|---|---|---|---|
| mild_focus | -0.0697 | -0.0874 | -0.0526 | -1.541 | 6.10e-08 | 1.15e-07 |
| standard | -0.0866 | -0.1092 | -0.0655 | -1.510 | 8.63e-08 | 1.15e-07 |
| aggressive_focus | -0.0796 | -0.0988 | -0.0611 | -1.598 | 3.24e-08 | 1.15e-07 |
| class_balanced | -0.0855 | -0.1146 | -0.0616 | -1.240 | 2.09e-06 | 2.09e-06 |
Every variant's CI95 excludes 0 on the negative side, and Cohen's d ≤ -1.2 on every variant, a very large effect. All BH-corrected p-values are ≤ 2.1e-06 : overwhelming statistical evidence that focal loss regresses f1_buy under our dataset and labelling regime.
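The BH column can be recomputed from the raw p-values in a few lines. The sketch below implements the standard Benjamini-Hochberg step-up adjustment (the same procedure as statsmodels' `multipletests(method="fdr_bh")`): sort the raw p-values, scale each by m / rank, then take a running minimum from the largest p downwards so adjusted values never decrease as raw p increases. The function name is illustrative.

```python
def benjamini_hochberg(pvals):
    """BH-adjusted p-values, returned in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending raw p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest raw p down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Raw paired-t p-values copied from the table above (m = 4).
raw = {
    "mild_focus": 6.10e-08,
    "standard": 8.63e-08,
    "aggressive_focus": 3.24e-08,
    "class_balanced": 2.09e-06,
}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for name, p in adj.items():
    print(f"{name:>16}: {p:.3g}")
```

Applied to the four raw p-values, this gives ≈1.15e-07 for mild_focus, standard and aggressive_focus, and 2.09e-06 for class_balanced.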
4. Per-track gate verdict (F1_BUY_BOOST_PLAN.md §6)
The 6 official gates (a variant locks only if it clears every gate) :
| # | Gate and threshold | Observed | Verdict |
|---|---|---|---|
| 1 | F1_buy lift ≥ +0.015 with CI95 excluding 0 | every variant ∈ [-0.10, -0.05] with CI95 excluding 0 in the WRONG direction | FAIL × 4 |
| 2 | Joint metric (Δexpectancy ≥ 0 AND Δsortino ≥ 0 AND Δmax_drawdown ≤ +1 %) | Δsortino negative on all 4 variants ; Δmax_dd > 1 % on 3/4 | FAIL × 4 |
| 3 | Stability — per-fold f1_buy variance ≤ 0.05 | max variance 0.021 (class_balanced on OPUSDC) | PASS × 4 |
| 4 | Per-asset — f1_buy improves on ≥ 4/5 cryptos | 0/5 improve on every variant | FAIL × 4 |
| 5 | Sample size — ≥ 50 BUY trades / fold | mean n_trades ≈ 34 (vs 47.4 baseline) | FAIL × 4 |
| 6 | MLOps readiness | mlops_readiness.md complete (PR #767) | PASS × 4 |
Verdict per variant : FAIL on 4 of 6 gates (1, 2, 4, 5) → all 4 focal variants ABANDON.
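The gate table reduces to one pass/fail check per gate and an all-gates-pass rule for locking. This is a minimal sketch: the `VariantSummary` schema, the unit of Δmax_drawdown (percentage points) and the use of a per-fold minimum for gate 5 are assumptions, not the pipeline's actual data model. The mild_focus inputs come from sections 3 and 5; the variance (0.021) and trade count (≈34) are the aggregate figures quoted in the gate table, used as stand-ins.

```python
from dataclasses import dataclass


@dataclass
class VariantSummary:  # hypothetical schema, one instance per focal variant
    mean_delta_f1: float           # mean paired delta f1_buy vs none
    ci95_low: float                # lower bootstrap CI95 bound of that delta
    d_expectancy: float
    d_sortino: float
    d_max_drawdown_pct: float      # ASSUMPTION: percentage points
    max_fold_f1_variance: float
    n_cryptos_improved: int        # out of 5
    min_buy_trades_per_fold: float
    mlops_ready: bool


def joint_ok(d_expectancy, d_sortino, d_max_drawdown_pct):
    """Gate 2: no loss of expectancy or sortino, drawdown worsens by at most 1 point."""
    return d_expectancy >= 0 and d_sortino >= 0 and d_max_drawdown_pct <= 1.0


def gate_results(s):
    return {
        "1_f1_lift": s.mean_delta_f1 >= 0.015 and s.ci95_low > 0,
        "2_joint": joint_ok(s.d_expectancy, s.d_sortino, s.d_max_drawdown_pct),
        "3_stability": s.max_fold_f1_variance <= 0.05,
        "4_per_asset": s.n_cryptos_improved >= 4,
        "5_sample_size": s.min_buy_trades_per_fold >= 50,
        "6_mlops": s.mlops_ready,
    }


def locks(s):
    """A variant locks only if it clears all six gates."""
    return all(gate_results(s).values())


MILD_FOCUS = VariantSummary(
    mean_delta_f1=-0.0697, ci95_low=-0.0874,
    d_expectancy=10.7, d_sortino=-0.13, d_max_drawdown_pct=0.68,
    max_fold_f1_variance=0.021,    # stand-in: max variance over all variants
    n_cryptos_improved=0,
    min_buy_trades_per_fold=34,    # stand-in: aggregate mean n_trades
    mlops_ready=True,
)
print([g for g, ok in gate_results(MILD_FOCUS).items() if not ok])
```

For mild_focus this flags gates 1, 2, 4 and 5, matching the 4-of-6 FAIL verdict.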
5. Supporting metrics (context, not gating)
Joint metric breakdown (mean across cryptos)
| Variant | Δexpectancy | Δsortino | Δmax_drawdown (%) |
|---|---|---|---|
| mild_focus | +10.7 ✓ | -0.13 ✗ | +0.68 ✓ |
| standard | +18.6 ✓ | -0.05 ✗ | +1.28 ✗ |
| aggressive_focus | +23.9 ✓ | -0.13 ✗ | +1.21 ✗ |
| class_balanced | +17.3 ✓ | -0.28 ✗ | +1.49 ✗ |
The positive Δexpectancy on every variant is misleading : the modest expectancy lift is outweighed by the large f1_buy regression and the negative Δsortino, so trade quality is worse, not better.
Why focal loss regresses on this data
Three plausible explanations (consistent with the Track 5 label_smoothing pattern documented in 2026-04-29-track5-label-smoothing-results.md §6) :
- Class balance is already addressed upstream : the `CVN_CLASS_BALANCING=1` setting (sklearn `compute_class_weight`) already applies α-weighting at the loss level. Adding focal's `(1-p_t)^γ` modulator on top compounds two corrections that the data does not need.
- Rare-event vs calibration tension : `f1_buy` requires both recall (catch the BUY signal) and precision (don't fire on noise). Focal pushes the gradient onto hard examples, which on noisy crypto data are often the noisy labels themselves, so the model down-weights the cleanest signal in favour of training noise.
- HPO drift across runs : the focal HPO objective was `f1_binary` (per `CVN_HPO_OBJECTIVE`) but the variant produced a different operating point per fold ; `none` benefits from the most stable HPO landscape.
These are hypotheses ; the gate decision does not depend on which is correct.
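The first hypothesis (two stacked class corrections) can be illustrated by multiplying a "balanced" class weight by focal's α_t. This is a sketch under loud assumptions: the 30 % BUY rate is hypothetical, α is assumed to apply to the positive class, `CVN_CLASS_BALANCING=1` is assumed to use the `"balanced"` mode of `compute_class_weight` (whose formula is n_samples / (n_classes · count_of_class)), and the dynamic `(1-p_t)^γ` term is ignored.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Same formula as scikit-learn's compute_class_weight("balanced"):
    n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


def pos_neg_ratio(class_weights, alpha):
    """Static weight of a BUY (1) example relative to a non-BUY (0) example once
    focal alpha_t is multiplied on top of the class weight.
    ASSUMPTION: alpha applies to the positive class; the dynamic
    (1 - p_t)**gamma modulator is ignored here."""
    return (class_weights[1] * alpha) / (class_weights[0] * (1.0 - alpha))


# Hypothetical label mix: 30 % BUY, 70 % not-BUY.
labels = [1] * 300 + [0] * 700
cw = balanced_class_weights(labels)
for name, alpha in [("none", 0.5), ("standard", 0.25), ("class_balanced", 0.75)]:
    print(f"{name:>15}: BUY/not-BUY weight ratio = {pos_neg_ratio(cw, alpha):.2f}")
```

Class balancing alone (the none row) gives a 2.33 ratio. With α = 0.25 the focal weight partly cancels it (0.78), while with α = 0.75 the two corrections stack to 7.0, three times what class balancing alone targets.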
6. Decisions
6.1 Lock variant
No lock. Keep CVN_LOSS_FUNCTION=binary:logistic (baseline) as production. Console state unchanged. No flip required in ftf_config.base_env.
6.2 OP Story closure
wp#41 (CVN-N001-EE-S02) → status Closed with verdict ABANDONED. The closing comment links this dossier and the implementation PR #767.
6.3 Joint variant (Track 5 × Track 6)
Skipped. A joint mild_smoothing × mild_focus variant would compound two abandoned interventions ; the prior probability that it clears the gates is essentially zero. The extra 25-row joint sweep (1 variant × 5 cryptos × 5 folds) would be wasted compute.
6.4 Track 6 Story Phase
Phase 4 (FTF sweep) and Phase 5 (gate decision) of TEMPLATE_story_phases_ml_ftf.md are now complete with verdict ABANDON, which closes the Story.
6.5 Follow-ups
- `CVN-N011-EA-S10` : gRPC fork deadlock fix landed (PR #777, squash `0f6220fc`). The Cleanlab branch of Track 5 is unblocked ; the operator can re-trigger it once the deploy validates.
- `CVN-N011-EA-S11` : post-mortem of the missing-dependency regression that delayed this Track (sympy hotfix PR #775). Open, P2.
- Tracks 7-13 : sequencing per F1 plan §6 risk #4. Two consecutive ABANDON results (Track 5 mild + Track 6 focal) indicate that loss-function manipulation is not the productive lever on this data ; the next viable track families are data (Track 1, 12) or architecture (Track 11). Operator decision pending.
7. Linked context
- Plan dossier : 2026-04-28-track6-focal-loss-plan.md
- PR review dossier : 2026-04-28-track6-focal-loss-pr-review.md
- Track 5 (sister ABANDON) : 2026-04-29-track5-label-smoothing-results.md
- Implementation PR : #767
- Sister hotfix : #775 (sympy)
- Operations incident log : OPERATIONS.md §17.4 (sympy regression that blocked the first focal sweep attempt)
- Mission overview : ML Boost — F1_buy 75 mission