ADR-63: FTF Mission Mode is Binary BUY/NOT_BUY; 3-class is Legacy
Status: Active as of 2026-04-22 (issue #608 F1=0.80 mission).
Context: The FTF framework has historically trained 3-class classifiers (SELL / HOLD / BUY). The operator directive of 2026-04-22 mandates a focused push for f1_buy ≥ 0.80 on binary BUY/NOT_BUY. Running 3-class models in parallel dilutes the compute budget and makes variant comparisons ambiguous: a factor that improves the HOLD boundary doesn't translate to better BUY decisions. The binary path now has end-to-end infrastructure (label remap, HPO branch, ThresholdCalibrator), so the default should match the mission.
Decision: The FTF baseline (ftf_config.base_env in PostgreSQL, ADR-59) pins the mission defaults:
CVN_BINARY_CLASSIFICATION=1
CVN_HPO_OBJECTIVE=f1_binary
CVN_HPO_N_TRIALS=50
CVN_THRESHOLD_METHOD=f1_binary
3-class mode is legacy — kept available via the classification_mode.3class ablation factor for audit comparisons, but never the default.
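As a rough illustration of how the pinned defaults and a factor override could interact, here is a minimal sketch. The helper names (`resolve_run_env`, `check_mode`), the `BINARY_ONLY` set, and the override values `logloss` and `argmax` are assumptions made for this example; they are not taken from the FTF codebase, and the real compatibility rules may differ.

```python
# Mission defaults pinned by this ADR (values copied from the list above).
MISSION_DEFAULTS = {
    "CVN_BINARY_CLASSIFICATION": "1",
    "CVN_HPO_OBJECTIVE": "f1_binary",
    "CVN_HPO_N_TRIALS": "50",
    "CVN_THRESHOLD_METHOD": "f1_binary",
}

# ASSUMPTION: settings that only make sense in binary mode.
BINARY_ONLY = {"f1_binary"}


def resolve_run_env(factor_overrides=None):
    """Hypothetical helper: layer a factor variant's overrides on the defaults."""
    env = dict(MISSION_DEFAULTS)
    env.update(factor_overrides or {})
    return env


def check_mode(env):
    """Fail fast on an incompatible combination; return True for a legacy run."""
    binary = env["CVN_BINARY_CLASSIFICATION"] == "1"
    if not binary:
        for key in ("CVN_HPO_OBJECTIVE", "CVN_THRESHOLD_METHOD"):
            if env[key] in BINARY_ONLY:
                raise ValueError(
                    f"{key}={env[key]} requires CVN_BINARY_CLASSIFICATION=1"
                )
    # 3-class runs are audit comparisons only, never promotion candidates.
    return not binary


# A run with no overrides is binary and not legacy.
default_env = resolve_run_env()
is_legacy = check_mode(default_env)
```

In this sketch, a variant that sets only `CVN_BINARY_CLASSIFICATION=0` while leaving the binary objective in place raises immediately instead of training a mislabeled model.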
Invariants:
- Binary is the default: new FTF runs without explicit factor overrides use CVN_BINARY_CLASSIFICATION=1.
- HPO objective matches mode: when binary mode is on, CVN_HPO_OBJECTIVE=f1_binary is the default; the hyperoptimizer fails fast (ADR-25) if an incompatible combination is selected (PR β, commit bc523dc3).
- Threshold method matches mode: post-HPO threshold picking uses CVN_THRESHOLD_METHOD=f1_binary (ThresholdCalibrator, PR δ-2), aligned with the optimization metric.
- n_trials is bumped to 50 (from the previous 15): the f1 landscape has more local optima than logloss (committee rec 11).
- 3-class runs are marked legacy=true: any factor variant that forces CVN_BINARY_CLASSIFICATION=0 is treated as an audit comparison, not a contender for promotion.
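To make the "threshold method matches mode" invariant concrete, the following is a minimal sketch of picking a decision threshold that maximizes F1 on the BUY class. It is not the ThresholdCalibrator implementation, whose internals are not described here. The function names, the 1 = BUY / 0 = NOT_BUY label encoding, and the choice of candidate thresholds (the observed scores) are all assumptions for illustration.

```python
def f1_at_threshold(y_true, p_buy, theta):
    """F1 of the BUY class when predicting BUY iff p_buy >= theta."""
    tp = fp = fn = 0
    for y, p in zip(y_true, p_buy):
        predicted_buy = p >= theta
        if predicted_buy and y == 1:
            tp += 1
        elif predicted_buy and y == 0:
            fp += 1
        elif not predicted_buy and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pick_threshold(y_true, p_buy):
    """Return (theta, f1) for the candidate threshold with the highest BUY F1.

    Candidates are the distinct observed scores; ties keep the lowest theta.
    """
    best_theta, best_f1 = None, -1.0
    for theta in sorted(set(p_buy)):
        score = f1_at_threshold(y_true, p_buy, theta)
        if score > best_f1:
            best_theta, best_f1 = theta, score
    return best_theta, best_f1


# Toy validation fold: 1 = BUY, 0 = NOT_BUY (assumed encoding).
y_true = [0, 0, 1, 1, 0, 1]
p_buy = [0.1, 0.4, 0.35, 0.8, 0.6, 0.7]
theta, f1 = pick_threshold(y_true, p_buy)
```

Because the threshold is chosen with the same metric the HPO optimizes, the selected operating point and the tuned hyperparameters pull in the same direction.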
Rollback: to revert to 3-class as default, flip the base_env values via Console UI (no code change). The pipeline supports both modes — only the default moves.
Alternatives rejected:
- Keep 3-class (SELL/HOLD/BUY) as the default: the mission metric is f1_buy, not multi-class accuracy. A factor that improves the HOLD boundary doesn't translate to better BUY decisions, so 3-class compute dilutes the budget without moving the needle on the target.
- Hybrid routing (run both binary and 3-class per factor variant): doubles compute per FTF run (~6h instead of 3h) with no added signal for the mission — the 3-class results never drive a decision. Cheaper to run 3-class as an explicit audit variant on demand.
- Keep HPO n_trials at 30: the f1 landscape has more local optima than logloss (committee rec 11). 30 trials often converge to a flat plateau around mid-range theta; 50 trials surface the f1-maximizing region consistently. The added cost is ~10 minutes per fold, acceptable vs. the risk of under-sampling.
- Keep fbeta_buy as the HPO objective (mathematically equivalent to f1_binary when CVN_BUY_BETA=1.0): same metric, different name. f1_binary is the explicit, mission-aligned name, which reduces ambiguity in reports and PDFs (ADR-57: ADRs and labels in English, matching the metric, not aliased).
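The equivalence claimed in the last bullet follows from the standard F-beta definition, F_beta = (1 + beta²)·P·R / (beta²·P + R), which reduces to 2·P·R / (P + R) at beta = 1. A small sketch, using hypothetical helper names and made-up precision and recall values, checks this numerically:

```python
def fbeta(precision, recall, beta=1.0):
    """Standard F-beta score from precision and recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


# Illustrative values only; not measured results.
p, r = 0.75, 0.6
same_at_beta_1 = abs(fbeta(p, r, beta=1.0) - f1(p, r)) < 1e-12
differs_at_beta_2 = abs(fbeta(p, r, beta=2.0) - f1(p, r)) > 1e-3
```

The two metrics diverge as soon as beta moves away from 1.0, which is why the explicit f1_binary name is less ambiguous than an alias that depends on a second setting.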
Files: ftf_config.base_env (PostgreSQL row id=1), managed via Console UI. No git change required (ADR-59).