ADR-0079: FTF sweep results analysis, dossier, and Story closure
Status: active
Date: 2026-05-01
Introduced by: CVN-N001-EE-S03 (Track 9 closure) — operator request 2026-05-01 after 3 consecutive ABANDON outcomes (Tracks 5 / 6 / 9) demonstrated a recurring multi-step workflow that was undocumented and at risk of drift.
Supersedes: none (extends ADR-58 + ADR-69 + ADR-77 ; reuses ADR-68 + ADR-70 contracts)
Context
Per ADR-58, every pipeline change is gated by an FTF factor + variant matrix. A factor sweep runs the matrix on the standard 5-cryptos × 5-folds protocol per F1 plan §6, persists results in PostgreSQL finetune_results, and produces a PDF report (CVNTrade FTF Report Engine, commun.finetune.report_pdf).
Today's pattern (3 sweeps closed : Tracks 5, 6, 9 — all ABANDON) shows a recurring multi-step workflow :
- Sweep completes (Airflow DAG `dag_finetune__factor_sweep` + Console UI shows "completed")
- Operator pulls PDF report
- Operator analyses PDF against F1 plan §6 gates + the report's lock rule (≥ 2 metrics with BH p<0.05 AND |Cohen's d| ≥ 0.3)
- Operator decides : LOCK (Console flip on `ftf_config.base_env`) / KEEP_AVAILABLE (no flip, factor stays) / ABANDON (no flip, factor stays for re-evaluation)
- Story is closed in OpenProject (per ADR-69)
- F1 plan outcomes table is updated
Failure modes observed in practice :
- Drift between dossier and PDF : Tracks 5 + 6 dossiers each pulled the data from PG manually, with mild variance in column selection ; the tables were inconsistent across dossiers and future operators couldn't reproduce them.
- PDF disposability : the FTF report PDF lives in `results/` (gitignored) on the operator's machine ; if the operator's machine is wiped OR the run is purged from MLflow, the analysis can't be cross-checked.
- Verdict ambiguity : Track 5 + Track 6 used "ABANDONED" with strong negative effect sizes ; Track 9 had weaker, insignificant results ; the difference between ABANDON / KEEP_AVAILABLE wasn't documented.
- OP closure lag : without an explicit checklist, the Story status transition `In testing` → `Closed` with a structured comment was inconsistent across closures (drift from ADR-76 SSoT contract).
- Cross-Track lessons table : F1 plan §6 outcomes table updates were ad-hoc ; closing the Story without updating the table broke the project-state surface that future Story-pickers rely on.
This ADR codifies the workflow as invariants the operator + AI both follow so the next closure is mechanical.
Decision
Every FTF-sweep-driven Story closure follows the canonical 8-step workflow below, with mandatory artefacts at each step. The workflow is enforced by the sign-off checklist at the bottom of every results dossier (template + invariant I7).
Canonical 8-step workflow

| Step | Output | Owner | Artefact |
|---|---|---|---|
| 1. Sweep launch | run_id (`ftf_<YYYYMMDD>_<HHMMSS>_<short_hash>_<strategy>`) | operator | Airflow DAG run ; Console UI entry ; PG row in `finetune_runs` |
| 2. Sweep completion | 125 rows in `finetune_results` (5 × 5 × 5 standard protocol) ; 0 errors | autonomous | Console UI status `completed` ; PDF report generated |
| 3. PDF retrieval | Local PDF copy | operator | Console "Download PDF" button → `documentation/missions/<mission>/reports/ftf_report_<run_id>.pdf` (committed per I3) |
| 4. Analysis dossier | Results markdown filled in from PDF tables | operator + AI | `documentation/missions/<mission>/<YYYY-MM-DD>-track<N>-<slug>-results.md` (template per I2) |
| 5. Verdict | LOCK / KEEP_AVAILABLE / ABANDON per the decision tree (I5) | operator | dossier §11 (or equivalent) ; rationale referenced per gate |
| 6. F1 plan update | Outcomes table row + cross-track lesson (if applicable) | operator + AI | `documentation/F1_BUY_BOOST_PLAN.md` §6 outcomes block + tracking table §10 |
| 7. OP Story closure | Story status `In testing` → `Closed` ; structured comment with run_id + dossier link + verdict | operator | OP wp UI (per ADR-69 + ADR-76) |
| 8. Memory entry (optional) | `feedback_*.md` only if non-obvious lesson | operator + AI | `~/.claude/projects/.../memory/feedback_*.md` |
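Steps 1 and 2 fix two things that are easy to check mechanically: the shape of the run_id and the expected row count. The sketch below is illustrative only. The regular expression assumes a lowercase hex `short_hash` and allows underscores in `<strategy>`, and it reads the third factor of 5 × 5 × 5 as the variant count ; none of these details are confirmed by the codebase, and `parse_run_id` / `expected_rows` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed shape of ftf_<YYYYMMDD>_<HHMMSS>_<short_hash>_<strategy>.
# Hex short_hash and free-form strategy are assumptions for illustration.
RUN_ID_RE = re.compile(
    r"^ftf_(?P<date>\d{8})_(?P<time>\d{6})_(?P<short_hash>[0-9a-f]+)_(?P<strategy>.+)$"
)


def parse_run_id(run_id: str) -> dict:
    """Split a run_id into its parts and validate the embedded timestamp."""
    match = RUN_ID_RE.match(run_id)
    if match is None:
        raise ValueError(f"run_id does not follow the ftf_ convention: {run_id!r}")
    parts = dict(match.groupdict())
    # strptime raises ValueError on impossible dates such as 20261340.
    parts["started_at"] = datetime.strptime(
        parts["date"] + parts["time"], "%Y%m%d%H%M%S"
    )
    return parts


def expected_rows(n_cryptos: int = 5, n_folds: int = 5, n_variants: int = 5) -> int:
    """Row count a complete sweep should leave in finetune_results."""
    return n_cryptos * n_folds * n_variants


if __name__ == "__main__":
    # Hypothetical run_id, not a real sweep.
    print(parse_run_id("ftf_20260501_101500_ab12cd_atr_sl"))
    print(expected_rows())
```

A check like this could run before step 3 so that a malformed run_id or a short row count is caught before the dossier is started.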
Verdict decision tree (I5)
                       Lock rule cleared?
    (≥ 2 metrics BH p<0.05 AND |d|≥0.3, in winner direction)
                                │
                    ┌───────────┴───────────┐
                    ▼                       ▼
                   YES                     NO
                    │                       │
                    ▼                       ▼
            6 official gates    Effect sizes meaningful?
               all PASSED?   (any |d| ≥ 0.3, any direction)
                    │                       │
            ┌───────┴───────┐      ┌────────┴────────┐
            ▼               ▼      ▼                 ▼
           YES              NO    YES                NO
            │               │      │                 │
            ▼               ▼      ▼                 ▼
          LOCK       KEEP_AVAIL  KEEP_AVAIL       ABANDON
          (flip)     (factor     (factor stays,   (factor stays
                     stays,      re-eval gated    in matrix per
                     no flip)    on data growth)  Track 5/6/9
                                                  pattern,
                                                  implementation
                                                  stays in tree)
The key distinction :
- LOCK : statistical significance + practical relevance + all 6 gates pass → Console flip on `ftf_config.base_env` (atomic per-crypto promotion per ADR-42 ; the locked theta itself is OOS-calibrated per ADR-15)
- KEEP_AVAILABLE : effect exists but doesn't clear the lock rule today ; factor stays in `MODEL_FACTORS` for re-evaluation if the dataset grows OR a paired Story shifts the underlying signal
- ABANDON : no statistically meaningful effect (lock rule fail AND no |d| ≥ 0.3) → factor stays in `MODEL_FACTORS` per Track 5/6/9 precedent (implementation tree-resident, FTF-discoverable for future operators)
Critical : ABANDON does NOT mean "remove the factor from the FTF matrix". The implementation stays in tree, the FTF factor stays in MODEL_FACTORS, the runbook (if any) stays in mkdocs nav. A future operator can re-trigger the sweep without re-implementation cost. Removing factors from MODEL_FACTORS is a separate decision (cleanup PR, not Story-closure scope).
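The decision tree can be read as a small pure function. The following is a minimal sketch under stated assumptions, not the report engine's implementation : `MetricResult`, `lock_rule_cleared` and `verdict` are hypothetical names, and the `favours_winner` flag stands in for "in winner direction", which the report computes in a way this ADR does not specify.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MetricResult:
    """One row of the multi-metric significance table (hypothetical shape)."""
    name: str
    bh_p: float           # Benjamini-Hochberg corrected p-value
    cohen_d: float        # signed effect size
    favours_winner: bool  # True if the effect points in the winner direction


def lock_rule_cleared(metrics: Sequence[MetricResult]) -> bool:
    """Lock rule: at least 2 metrics with BH p < 0.05 AND |d| >= 0.3, winner direction."""
    hits = [
        m for m in metrics
        if m.bh_p < 0.05 and abs(m.cohen_d) >= 0.3 and m.favours_winner
    ]
    return len(hits) >= 2


def verdict(metrics: Sequence[MetricResult], gates_passed: Sequence[bool]) -> str:
    """Render LOCK / KEEP_AVAILABLE / ABANDON following the tree above."""
    if lock_rule_cleared(metrics):
        all_six = len(gates_passed) == 6 and all(gates_passed)
        return "LOCK" if all_six else "KEEP_AVAILABLE"
    # Lock rule failed: any meaningful effect size, in any direction?
    if any(abs(m.cohen_d) >= 0.3 for m in metrics):
        return "KEEP_AVAILABLE"
    return "ABANDON"
```

Note that the function never touches `MODEL_FACTORS` : in all three branches the factor stays in the matrix, and only LOCK implies a Console flip.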
Dossier template structure (I2)
Every results dossier follows this structure (mirrors Tracks 5, 6, 9) :
# <Track N> — <Factor name> results & gate decision
**Story**: <cvn_id> ([wp#NN](OP link)) — Track N of <mission>
**Date**: <YYYY-MM-DD>
**Plan dossier**: [<reviews/...>] — committee `<id>` PASSED/REJECTED
**Implementation PR**: [#NNN](GH link) (squash <sha>, merged <date>)
**Sweep run_id**: `ftf_<YYYYMMDD>_<HHMMSS>_<short_hash>_<strategy>`
**Sweep status**: `completed` | `partial` | `failed` ; <N> results ; <K> errors ; <duration>
**Cryptos**: <list>
**Strategy / PTE**: `<ATR_SL_TP_H>`
**Folds**: <N> ; **Trials**: <N> ; **Cost**: <bps> bps
**FTF report PDF**: [`reports/ftf_report_<run_id>.pdf`](reports/<filename>)
**Verdict**: <LOCK | KEEP_AVAILABLE | ABANDON> — <one-line rationale>
**Lock decision**: <Console flip details OR "no flip">
## 1. Sweep state (variant × row count × coverage)
## 2. Performance summary (PDF "Couche C" table — Sortino, Std, Trades, Win Rate, Max DD, Return %)
## 3. Pairwise BH-corrected comparisons (PDF "Pairwise" table)
## 4. Multi-metric significance matrix (PDF "Multi-metric" table — 4 metrics × all pairs)
## 5. ML metrics — Couche A (PDF "ML Metrics" table — f1_buy + precision + recall + AUC + Brier + ECE)
## 6. Signal funnel — Couche B (PDF "Funnel" table — raw_buy + filter survival)
## 7. Per-crypto performance (PDF "Per-crypto" table — Sortino × 5 cryptos × variants)
## 8. Stability by fold (PDF heatmap — Sortino per fold)
## 9. Gate evaluation (F1 plan §6 — 6 official gates + Track-specific gate)
## 10. Why <factor> worked / didn't (post-hoc hypotheses ; pick the ones the data supports)
## 11. Decisions (LOCK / KEEP_AVAILABLE / ABANDON + lock decision)
## 12. Sprint version + OP closure (OP comment template + sprint-version closure check)
## Sign-off checklist (gate before OP wp closure)
- [x] §1-§9 populated from PDF
- [x] §10 hypothesis pick
- [x] §11 verdict + lock decision
- [x] §12 OP comment drafted
- [ ] OP Story closed (operator action)
- [ ] Sprint version closure evaluated
- [ ] F1 plan §6 outcomes table updated
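Because the sign-off checklist gates the OP transition (Invariant 7), it lends itself to a mechanical check. The sketch below lists the unchecked boxes under the "Sign-off checklist" heading of a dossier ; `unchecked_items` is a hypothetical helper, and nothing here is wired into the existing tooling.

```python
import re

# Matches markdown task-list items such as "- [x] done" or "- [ ] pending".
CHECKBOX_RE = re.compile(r"^- \[( |x|X)\] (.+)$")


def unchecked_items(dossier_markdown: str) -> list[str]:
    """Return the labels of unchecked boxes in the sign-off checklist section."""
    pending = []
    in_checklist = False
    for raw_line in dossier_markdown.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            # Enter the section on its heading, leave it on any other H2.
            in_checklist = line.startswith("## Sign-off checklist")
            continue
        match = CHECKBOX_RE.match(line)
        if in_checklist and match and match.group(1) == " ":
            pending.append(match.group(2))
    return pending


if __name__ == "__main__":
    sample = "\n".join([
        "## 12. Sprint version + OP closure",
        "## Sign-off checklist (gate before OP wp closure)",
        "- [x] §1-§9 populated from PDF",
        "- [ ] OP Story closed (operator action)",
    ])
    print(unchecked_items(sample))
```

An empty list would mean the checklist part of Invariant 7 is satisfied ; the OP comment and the F1 plan update still need their own verification.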
PDF location convention (I3)
Every results-dossier PDF is committed under documentation/missions/<mission>/reports/ftf_report_<run_id>.pdf.
- Why committed : the PDF is the immutable run artefact ; without it, the dossier's tables can't be cross-checked if PG / MLflow runs are purged.
- Why this path : keeps the PDF adjacent to the dossier that analyses it. Predictable for future readers.
- Why not S3 : these are small (~200KB), human-readable artefacts that benefit from being in-repo (PR diff visible, mkdocs serveable, version-pinned with the dossier).
- gitignore exception : `.gitignore` carries an explicit `!documentation/missions/**/reports/*.pdf` rule (overrides the generic `*.pdf` + `reports/` filters).
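The path convention and the relative link used in the dossier header can be derived from the mission name and the run_id. A minimal sketch, with `report_pdf_path` and `dossier_link` as hypothetical helper names :

```python
from pathlib import PurePosixPath

MISSIONS_ROOT = PurePosixPath("documentation/missions")


def report_pdf_path(mission: str, run_id: str) -> PurePosixPath:
    """Repo-relative location of the committed FTF report PDF (I3)."""
    return MISSIONS_ROOT / mission / "reports" / f"ftf_report_{run_id}.pdf"


def dossier_link(mission: str, run_id: str) -> str:
    """Markdown link from a dossier in the mission folder to its PDF."""
    relative = report_pdf_path(mission, run_id).relative_to(MISSIONS_ROOT / mission)
    return f"[`{relative}`]({relative})"


if __name__ == "__main__":
    # Placeholder mission and run_id, for illustration only.
    print(report_pdf_path("example_mission", "ftf_20260501_101500_ab12cd_atr"))
    print(dossier_link("example_mission", "ftf_20260501_101500_ab12cd_atr"))
```

Since the dossier sits in the mission folder, the link stays relative (`reports/...`), which keeps it valid both in the repo view and in the mkdocs render.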
Invariants
- Invariant 1 (workflow completeness) : Every FTF-sweep-driven Story closure executes all 8 steps in the canonical workflow above. Skipping a step requires an explicit waiver in the OP wp comment with rationale.
- Invariant 2 (dossier template) : Every results dossier under `documentation/missions/<mission>/` follows the §1-§12 + sign-off checklist structure above. Sections may be empty (e.g., §3.5 per-regime fallback may be N/A) but MUST be present.
- Invariant 3 (PDF location) : Every FTF report PDF analysed in a dossier is committed under `documentation/missions/<mission>/reports/ftf_report_<run_id>.pdf`. The dossier links to it via relative markdown link.
- Invariant 4 (numerical fidelity) : Numerical tables in the dossier (§2-§9) are populated FROM the PDF report (not from independent PG queries) so the dossier is a faithful summary, not a parallel analysis. PG queries are used only to verify the row count (125) and to surface fields the PDF omits (e.g., per-regime fallback events from Loki).
- Invariant 5 (verdict decision tree) : The verdict (LOCK / KEEP_AVAILABLE / ABANDON) is rendered per the decision tree above. Specifically : ABANDON requires both lock-rule fail AND no |d| ≥ 0.3 anywhere ; KEEP_AVAILABLE applies when an effect exists but doesn't clear the lock rule ; LOCK requires lock rule cleared AND all 6 official gates pass.
- Invariant 6 (factor permanence post-ABANDON) : An ABANDON verdict does NOT remove the factor from `MODEL_FACTORS` in `commun/finetune/ablation_matrix.py`. The implementation stays in tree, the FTF factor remains discoverable for future operators. Removing a factor is a separate decision, gated by a separate PR + ADR-58 review.
- Invariant 7 (sign-off gate) : OP wp Story status MUST NOT transition `In testing` → `Closed` until the sign-off checklist in the dossier is fully checked, OP comment is posted with run_id + dossier link + verdict, AND F1 plan §6 outcomes table is updated.
- Invariant 8 (cross-Track lesson) : When 2+ consecutive ABANDONs happen in the same tier (label/loss/calibration/data/architecture), the F1 plan §6 cross-Track lesson block is updated with the combined finding. This is what makes the F1 plan a living artefact, not a snapshot.
- Invariant 9 (split-PR Stories : `In testing` gate) : When a Story's implementation is deliberately split across multiple PRs (e.g., contract-surface PR first + follow-up wiring PR per the Track 1 + Track 11 pattern documented in their respective `mlops_readiness.md` §7), the OP Story status MUST stay `In progress` until all contract-surface PRs have merged AND the FTF sweep is launchable today (= setting the factor's env vars produces different model behaviours, not identical baselines). Specifically : merging only the `runtime contract surface` PR (factor + guardrail + standalone module + tests) WITHOUT the wiring PR (callsite from EnrichmentAPI / autotrainer / inference loader) does NOT advance the Story to `In testing`. The mlops_readiness §7 "Pre-LOCK gate" sub-checklist is the canonical anchor : when it lists pending follow-up PRs, the Story stays `In progress`. Discovered 2026-05-01 : wp#43 (Track 1 BTC features) was prematurely advanced to `In testing` after PR #792 merged, but the sweep was not launchable because `compute_btc_features` had no callsite anywhere in the EnrichmentAPI / autotrainer / ETL paths. Operator rolled back to `In progress` ; this invariant codifies the rule for future split-PR Stories (Track 11 wp#45 is the next candidate post-PR-#793 merge ; wp#45 stays `In progress` until its block A follow-up PR ships).
- Invariant 10 (auto-enforced SSoT : F1 plan §10 ↔ OpenProject) : the OP Story status MUST mirror `documentation/F1_BUY_BOOST_PLAN.md` §10 tracking table within 5 min of any main-branch merge that updates that table, AND within 1 hour for any out-of-band drift (e.g., operator UI edit, import script side-effect). This is automated, not operator-discretionary : `scripts/op_story_sync.py` parses the §10 table as source-of-truth and patches OP via its REST API ; the workflow `.github/workflows/op-story-sync.yml` runs the syncer on every push to main + hourly cron (safety net). Discovered 2026-05-01 : after Track 9 closure (#794 merge), 6 Stories (wp#44/46/47/48/49/50) were stale `In progress` despite F1 plan §10 marking them `New` ; the operator surfaced this as a recurring discipline failure (`feedback_no_discipline_workflows.md` memory : "automate or don't ship"). This invariant + the syncer eliminate the recurring drift. Pre-condition : `OPENPROJECT_API_KEY` GitHub secret set ; the workflow degrades gracefully (warning, not failure) if missing.
- Invariant 11 (supersession exception conditions, added 2026-05-02 post Track 11 anti-precedent) : Invariant 9 may be waived for a partial-coverage sweep ONLY when the aggregating variant mathematically dominates the missing component variants, i.e., the aggregating variant's null result rules out the existence of signal in the components by construction. Mathematical dominance means : if any component had positive signal contribution `δ`, the aggregating variant's metric would shift by `f(δ) > 0` for known `f`, so a null shift forces `δ = 0`. Examples that DO satisfy dominance : a `with_X_and_Y` variant that subsumes `with_X` and `with_Y` when adding features can only increase OR keep the model class capacity (not strictly true in practice : even a feature superset can hurt via overfitting, so even this needs care). Examples that do NOT satisfy dominance : uniform averages of orthogonal model predictions (the average dilutes a strong individual signal toward the mean of the components ; a null average is compatible with one strong + two weak components in ANY direction). The Track 11 closure of 2026-05-01 cited a uniform-averaging supersession ; that argument is invalid and the closure was retracted 2026-05-02 (CVN-N001-EE-S06 wp#45 reopen). Cited as the anti-precedent for this invariant. Operators considering a supersession waiver MUST write the dominance proof inline in the dossier ; "the aggregator's result is null therefore the components must be null" is NOT a proof for averaging / pooling / mean-style aggregators.
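The averaging counter-example in Invariant 11 is easy to reproduce with plain arithmetic. The numbers below are invented solely to illustrate the point (they are not Track 11 results) : one strong component and two weaker opposing ones produce a null uniform average, so a null aggregate says nothing about the individual components.

```python
from statistics import mean

# Hypothetical per-component metric shifts versus baseline (invented values).
component_effects = {
    "component_a": 0.6,   # strong positive signal
    "component_b": -0.3,  # weaker negative signal
    "component_c": -0.3,  # weaker negative signal
}


def uniform_average(effects: dict[str, float]) -> float:
    """Shift observed on a uniform-averaging aggregate variant."""
    return mean(effects.values())


def strongest_component(effects: dict[str, float]) -> tuple[str, float]:
    """Component with the largest absolute shift."""
    return max(effects.items(), key=lambda item: abs(item[1]))


if __name__ == "__main__":
    aggregate = uniform_average(component_effects)
    name, effect = strongest_component(component_effects)
    print(f"aggregate shift: {aggregate:+.2f}")
    print(f"strongest component: {name} ({effect:+.2f})")
```

Here the aggregate shift is null while `component_a` alone carries a large effect, which is why a dominance proof cannot rest on a mean-style aggregator.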
Consequences
Positive
- Mechanical Story closure : Future Track closures take ~30 min instead of ~2h (Tracks 5/6 took longer because the workflow was being invented in flight).
- Auditable verdicts : The decision tree (I5) makes ABANDON / KEEP_AVAILABLE / LOCK reproducible across operators ; future re-eval can compare apples to apples.
- Data permanence : Keeping the PDFs in-repo means the analysis stays verifiable years later even if MLflow runs are purged.
- Discoverable lessons : F1 plan §6 outcomes table is the index ; cross-Track lessons surface naturally as the table fills.
- OP / docs / code SSoT alignment : ADR-69 (OP = orchestrator), ADR-77 (mkdocs SSoT), ADR-58 (FTF + guardrail), ADR-68 (committee), ADR-70 (MLOps readiness) all compose cleanly through the workflow ; no contradictions.
Negative
- Repo bloat : Each FTF PDF is ~200KB. At a sweep-per-week cadence over a year, ~10 MB. Acceptable (the docs base is < 50 MB excluding `site/`), but worth flagging for future cleanup if mission count grows materially.
- Workflow rigidity : The 8-step workflow is mandatory ; ad-hoc "quick closures" that skip steps (e.g., closing without a dossier because "the result is obviously trivial") are NOT permitted. The waiver mechanism (Invariant 1) handles edge cases.
- Operator-AI handoff steps : Steps 1, 3, 5 and 7 require the operator (sweep launch, PDF retrieval, decision authority, OP write access). Step 2 is autonomous. Steps 4, 6 and 8 are operator + AI collaboration, per the Owner column of the workflow table. The split is fine but the handoffs need the operator to explicitly trigger the next step (no auto-promotion).
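The repo-bloat estimate follows from simple arithmetic. A small sketch, assuming the ~200KB size and weekly cadence quoted above and 52 sweeps per year (`yearly_growth_mb` is a hypothetical helper) :

```python
def yearly_growth_mb(pdf_size_kb: float = 200, sweeps_per_week: float = 1) -> float:
    """Approximate yearly repo growth from committed FTF report PDFs, in MB."""
    weeks_per_year = 52
    return pdf_size_kb * sweeps_per_week * weeks_per_year / 1000


if __name__ == "__main__":
    print(yearly_growth_mb())                   # weekly cadence
    print(yearly_growth_mb(sweeps_per_week=3))  # heavier cadence
```

At one sweep per week this gives roughly 10 MB per year, consistent with the estimate in the Repo bloat bullet ; tripling the cadence would reach about 31 MB.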
Cross-references
- ADR-15 : OOS calibration of theta (the locked theta is OOS-calibrated per this ADR)
- ADR-42 : Atomic per-crypto promotion (the LOCK path's mechanism)
- ADR-58 : Every FTF factor must have a guardrail + integration test (governs which factors enter the sweep)
- ADR-59 : All params in PostgreSQL `ftf_config` (the LOCK target ; the Console flip writes here)
- ADR-68 : Expert Committee = default review channel (gates sweep launch via `plan_review`)
- ADR-69 : OpenProject = project orchestrator (Story `In testing` → `Closed` semantics)
- ADR-70 : MLOps readiness template mandatory (every Story must complete the template before sweep launch)
- ADR-71 : Trading kill-switch (cross-cutting prerequisite, not specific to FTF)
- ADR-76 : OpenProject = SSoT for project memory (OP closure is the canonical state)
- ADR-77 : MkDocs + Structurizr = SSoT for documentation (results dossier renders in mkdocs)
Implementation evidence (canonical examples)
The 3 results dossiers below served as the empirical basis for this ADR :
- Track 5 — label smoothing (ABANDON) — first dossier, established the §1-§7 structure
- Track 6 — focal loss (ABANDON) — replicated structure ; cross-Track lesson #1 (training-signal manipulation tier ABANDONED)
- Track 9 — per-regime threshold (ABANDON) — first dossier following the canonical 8-step workflow ; PDF committed per I3 ; cross-Track lesson #2 (calibration tier joins the abandoned cluster)
Future LOCK or KEEP_AVAILABLE outcomes will validate the decision tree (I5) — the 3 ABANDON precedents only exercise one branch.
Open questions / forward debt
- What about non-FTF Stories ? This ADR scopes to Stories whose closure depends on an FTF sweep. Non-FTF Stories (e.g., docs-only, infra-only) follow the lighter workflow described in `STORY_WORKFLOW.md`. A future ADR may unify if the patterns converge.
- Sweep retries : if a sweep partially fails (some folds error), the dossier convention is to call it "partial" in the metadata block. The decision-tree treatment of partial sweeps is currently operator-judgement ; a future revision may codify a minimum coverage threshold (e.g., ≥ 90 % of variant × crypto × fold cells must succeed).
- Multi-factor sweeps : the workflow assumes single-factor sweeps. Multi-factor sweeps (interaction studies) need a separate dossier convention — out of scope here, deferred to when first multi-factor sweep ships.
- PDF version pinning : the FTF Report Engine itself can change format. The dossier currently doesn't pin the report engine version. If a future report change breaks back-references in old dossiers, we'd need a migration. For now, the report format has been stable since Track 5.
- Split-PR Story discoverability : Invariant 9 requires the operator to know the Story is split before transitioning OP status ; it relies on the mlops_readiness §7 "Pre-LOCK gate" sub-checklist as the canonical anchor. Forward debt : a CI check that scans `documentation/stories/*/mlops_readiness.md` for unchecked "Pre-LOCK gate" items and fails if the corresponding OP Story is `In testing` would automate the discipline. Current pattern relies on operator memory ; per `feedback_no_discipline_workflows.md` memory, this is the kind of cross-system sync that drifts and should be automated. Tracked as a follow-up Story (CVN-N012-EC-S03 OP Board integration is the natural home).
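If a minimum coverage threshold for partial sweeps is ever codified (the "Sweep retries" item above), the computation itself is small. The sketch below is one possible reading, not a decided rule : `sweep_coverage` is a hypothetical helper, a cell is taken to be one (variant, crypto, fold) triple, and the 90 % figure is only the example value floated in that item.

```python
from itertools import product
from typing import Iterable, Sequence, Tuple

Cell = Tuple[str, str, int]  # (variant, crypto, fold)


def sweep_coverage(
    succeeded: Iterable[Cell],
    variants: Sequence[str],
    cryptos: Sequence[str],
    folds: Sequence[int],
) -> float:
    """Fraction of expected variant x crypto x fold cells that succeeded."""
    expected = set(product(variants, cryptos, folds))
    if not expected:
        raise ValueError("the sweep matrix is empty")
    # Cells outside the expected matrix are ignored rather than counted.
    return len(expected & set(succeeded)) / len(expected)


if __name__ == "__main__":
    # Placeholder labels ; the real variant and crypto names are not assumed here.
    variants = [f"v{i}" for i in range(5)]
    cryptos = [f"c{i}" for i in range(5)]
    folds = list(range(5))
    all_cells = list(product(variants, cryptos, folds))
    ok_cells = all_cells[:113]  # pretend 12 of the 125 cells errored
    ratio = sweep_coverage(ok_cells, variants, cryptos, folds)
    print(f"coverage = {ratio:.3f}, meets 90 % example threshold: {ratio >= 0.9}")
```

Whether coverage should also be required per variant or per crypto (so that one variant cannot lose all its folds) is left open, as in the item itself.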