
ADR-0079 — FTF sweep : results analysis, dossier, and Story closure

Status: active
Date: 2026-05-01
Introduced by: CVN-N001-EE-S03 (Track 9 closure), following an operator request on 2026-05-01 after 3 consecutive ABANDON outcomes (Tracks 5 / 6 / 9) demonstrated a recurring multi-step workflow that was undocumented and at risk of drift.
Supersedes: none (extends ADR-58 + ADR-69 + ADR-77 ; reuses ADR-68 + ADR-70 contracts)


Context

Per ADR-58, every pipeline change is gated by an FTF factor + variant matrix. A factor sweep runs the matrix on the standard 5-cryptos × 5-folds protocol per F1 plan §6, persists results in PostgreSQL finetune_results, and produces a PDF report (CVNTrade FTF Report Engine, commun.finetune.report_pdf).

Today's pattern (3 sweeps closed : Tracks 5, 6, 9 — all ABANDON) shows a recurring multi-step workflow :

  1. Sweep completes (Airflow DAG dag_finetune__factor_sweep + Console UI shows "completed")
  2. Operator pulls PDF report
  3. Operator analyses PDF against F1 plan §6 gates + the report's lock rule (≥ 2 metrics with BH p<0.05 AND |Cohen's d| ≥ 0.3)
  4. Operator decides : LOCK (Console flip on ftf_config.base_env) / KEEP_AVAILABLE (no flip, factor stays) / ABANDON (no flip, factor stays for re-evaluation)
  5. Story is closed in OpenProject (per ADR-69)
  6. F1 plan outcomes table is updated

Failure modes observed in practice :

  • Drift between dossier and PDF : the Tracks 5 + 6 dossiers each pulled their data from PG manually, with mild variance in column selection ; the resulting tables were inconsistent across dossiers and future operators could not reproduce them.
  • PDF disposability : the FTF report PDF lives in results/ (gitignored) on the operator's machine ; if the operator's machine is wiped OR the run is purged from MLflow, the analysis can't be cross-checked.
  • Verdict ambiguity : Track 5 + Track 6 used "ABANDONED" with strong negative effect sizes ; Track 9 had weaker insignificant results — the difference between ABANDON / KEEP_AVAILABLE wasn't documented.
  • OP closure lag : without an explicit checklist, the Story status transition In testing → Closed and its structured comment were applied inconsistently across closures (drift from the ADR-76 SSoT contract).
  • Cross-Track lessons table : F1 plan §6 outcomes table updates were ad-hoc ; closing the Story without updating the table broke the project-state surface that future Story-pickers rely on.

This ADR codifies the workflow as invariants the operator + AI both follow so the next closure is mechanical.

Decision

Every FTF-sweep-driven Story closure follows the canonical 8-step workflow below, with mandatory artefacts at each step. The workflow is enforced by the sign-off checklist at the bottom of every results dossier (template + invariant I7).

Canonical 8-step workflow

| Step | Output | Owner | Artefact |
|---|---|---|---|
| 1. Sweep launch | run_id (ftf_<YYYYMMDD>_<HHMMSS>_<short_hash>_<strategy>) | operator | Airflow DAG run ; Console UI entry ; PG row in finetune_runs |
| 2. Sweep completion | 125 rows in finetune_results (5 × 5 × 5 standard protocol) ; 0 errors | autonomous | Console UI status completed ; PDF report generated |
| 3. PDF retrieval | Local PDF copy | operator | Console "Download PDF" button → documentation/missions/<mission>/reports/ftf_report_<run_id>.pdf (committed per I3) |
| 4. Analysis dossier | Results markdown filled in from PDF tables | operator + AI | documentation/missions/<mission>/<YYYY-MM-DD>-track<N>-<slug>-results.md (template per I2) |
| 5. Verdict | LOCK / KEEP_AVAILABLE / ABANDON per the decision tree (I5) | operator | dossier §11 (or equivalent) ; rationale referenced per gate |
| 6. F1 plan update | Outcomes table row + cross-track lesson (if applicable) | operator + AI | documentation/F1_BUY_BOOST_PLAN.md §6 outcomes block + tracking table §10 |
| 7. OP Story closure | Story status In testing → Closed ; structured comment with run_id + dossier link + verdict | operator | OP wp UI (per ADR-69 + ADR-76) |
| 8. Memory entry (optional) | feedback_*.md only if non-obvious lesson | operator + AI | ~/.claude/projects/.../memory/feedback_*.md |
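
The run_id emitted at step 1 has a fixed shape, so later steps can validate it before it is reused in file names or OP comments. The following is a minimal Python sketch, not existing project code: `parse_run_id` and `RunId` are hypothetical names, and it assumes the short hash is lowercase hex and that the strategy segment is everything after it (the ADR specifies neither).

```python
import re
from datetime import datetime
from typing import NamedTuple

# ASSUMPTION: <short_hash> is lowercase hex and <strategy> is the remainder of
# the string (it may contain underscores). The ADR does not pin either detail.
RUN_ID_RE = re.compile(
    r"^ftf_(?P<date>\d{8})_(?P<time>\d{6})_(?P<short_hash>[0-9a-f]+)_(?P<strategy>.+)$"
)


class RunId(NamedTuple):
    launched_at: datetime
    short_hash: str
    strategy: str


def parse_run_id(run_id: str) -> RunId:
    """Split a run_id into its parts, raising ValueError on a malformed id."""
    match = RUN_ID_RE.match(run_id)
    if match is None:
        raise ValueError(f"not an FTF run_id: {run_id!r}")
    # strptime also rejects impossible dates/times such as month 13.
    launched_at = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    return RunId(launched_at, match["short_hash"], match["strategy"])
```

Rejecting malformed ids early keeps the dossier metadata block, the PDF file name, and the OP comment pointing at the same run.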

Verdict decision tree (I5)

                    Lock rule cleared?
                    (≥ 2 metrics BH p<0.05 AND |d|≥0.3, in winner direction)
                ┌───────────┴───────────┐
                ▼                       ▼
              YES                      NO
               │                        │
               ▼                        ▼
       6 official gates         Effect sizes meaningful?
       all PASSED?              (any |d| ≥ 0.3, any direction)
               │                        │
       ┌───────┴───────┐       ┌────────┴────────┐
       ▼               ▼       ▼                 ▼
      YES             NO     YES                NO
       │               │      │                  │
       ▼               ▼      ▼                  ▼
    LOCK         KEEP_AVAIL  KEEP_AVAIL       ABANDON
    (flip)        (factor    (factor stays,   (factor stays
                  stays,     re-eval gated     in matrix per
                  no flip)   on data growth)   Track 5/6/9
                                                 pattern,
                                                 implementation
                                                 stays in tree)

The key distinction :

  • LOCK : statistical significance + practical relevance + all 6 gates pass → Console flip on ftf_config.base_env (atomic per-crypto promotion per ADR-42 ; the locked theta itself is OOS-calibrated per ADR-15)
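  • KEEP_AVAILABLE : an effect exists but the factor cannot be locked today, either because the lock rule is not cleared despite some |d| ≥ 0.3 or because the lock rule is cleared but not all 6 gates pass ; factor stays in MODEL_FACTORS for re-evaluation if the dataset grows OR a paired Story shifts the underlying signal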
  • ABANDON : no statistically meaningful effect (lock rule fail AND no |d| ≥ 0.3) → factor stays in MODEL_FACTORS per Track 5/6/9 precedent (implementation tree-resident, FTF-discoverable for future operators)

Critical : ABANDON does NOT mean "remove the factor from the FTF matrix". The implementation stays in tree, the FTF factor stays in MODEL_FACTORS, the runbook (if any) stays in mkdocs nav. A future operator can re-trigger the sweep without re-implementation cost. Removing factors from MODEL_FACTORS is a separate decision (cleanup PR, not Story-closure scope).
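
The decision tree can be written down as a small pure function, which makes the three verdicts easy to reproduce across operators. This is a sketch under stated assumptions rather than the project's implementation: `MetricResult`, `lock_rule_cleared` and `decide` are hypothetical names, and a positive Cohen's d is assumed to mean "in the winner direction".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    LOCK = "LOCK"
    KEEP_AVAILABLE = "KEEP_AVAILABLE"
    ABANDON = "ABANDON"


@dataclass(frozen=True)
class MetricResult:
    name: str
    bh_p: float      # Benjamini-Hochberg adjusted p-value
    cohens_d: float  # ASSUMPTION: positive d means "in the winner direction"


ALPHA = 0.05     # BH p-value threshold from the lock rule
MIN_ABS_D = 0.3  # practical-relevance threshold on |Cohen's d|
MIN_METRICS = 2  # the lock rule needs at least this many qualifying metrics


def lock_rule_cleared(metrics: Sequence[MetricResult]) -> bool:
    """At least 2 metrics with BH p < 0.05 AND d >= 0.3 in the winner direction."""
    qualifying = [m for m in metrics if m.bh_p < ALPHA and m.cohens_d >= MIN_ABS_D]
    return len(qualifying) >= MIN_METRICS


def decide(metrics: Sequence[MetricResult], all_gates_passed: bool) -> Verdict:
    """Walk the I5 tree: lock rule first, then gates or effect-size fallback."""
    if lock_rule_cleared(metrics):
        return Verdict.LOCK if all_gates_passed else Verdict.KEEP_AVAILABLE
    # Lock rule failed: any meaningful effect, in any direction, keeps the factor available.
    if any(abs(m.cohens_d) >= MIN_ABS_D for m in metrics):
        return Verdict.KEEP_AVAILABLE
    return Verdict.ABANDON
```

The function returns only the verdict; removing a factor from MODEL_FACTORS is deliberately out of its scope, in line with the note above.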

Dossier template structure (I2)

Every results dossier follows this structure (mirrors Tracks 5, 6, 9) :

# <Track N> — <Factor name> results & gate decision

**Story**: <cvn_id> ([wp#NN](OP link)) — Track N of <mission>
**Date**: <YYYY-MM-DD>
**Plan dossier**: [<reviews/...>] — committee `<id>` PASSED/REJECTED
**Implementation PR**: [#NNN](GH link) (squash <sha>, merged <date>)
**Sweep run_id**: `ftf_<YYYYMMDD>_<HHMMSS>_<short_hash>_<strategy>`
**Sweep status**: `completed` | `partial` | `failed` ; <N> results ; <K> errors ; <duration>
**Cryptos**: <list>
**Strategy / PTE**: `<ATR_SL_TP_H>`
**Folds**: <N> ; **Trials**: <N> ; **Cost**: <bps> bps
**FTF report PDF**: [`reports/ftf_report_<run_id>.pdf`](reports/<filename>)
**Verdict**: <LOCK | KEEP_AVAILABLE | ABANDON> — <one-line rationale>
**Lock decision**: <Console flip details OR "no flip">

## 1. Sweep state          (variant × row count × coverage)
## 2. Performance summary  (PDF "Couche C" table — Sortino, Std, Trades, Win Rate, Max DD, Return %)
## 3. Pairwise BH-corrected comparisons   (PDF "Pairwise" table)
## 4. Multi-metric significance matrix    (PDF "Multi-metric" table — 4 metrics × all pairs)
## 5. ML metrics — Couche A               (PDF "ML Metrics" table — f1_buy + precision + recall + AUC + Brier + ECE)
## 6. Signal funnel — Couche B            (PDF "Funnel" table — raw_buy + filter survival)
## 7. Per-crypto performance              (PDF "Per-crypto" table — Sortino × 5 cryptos × variants)
## 8. Stability by fold                   (PDF heatmap — Sortino per fold)
## 9. Gate evaluation                     (F1 plan §6 — 6 official gates + Track-specific gate)
## 10. Why <factor> worked / didn't       (post-hoc hypotheses ; pick the ones the data supports)
## 11. Decisions                          (LOCK / KEEP_AVAILABLE / ABANDON + lock decision)
## 12. Sprint version + OP closure        (OP comment template + sprint-version closure check)

## Sign-off checklist (gate before OP wp closure)
- [x] §1-§9 populated from PDF
- [x] §10 hypothesis pick
- [x] §11 verdict + lock decision
- [x] §12 OP comment drafted
- [ ] OP Story closed (operator action)
- [ ] Sprint version closure evaluated
- [ ] F1 plan §6 outcomes table updated
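
The sign-off checklist in the template above is machine-readable, so the I7 gate can be checked before the OP transition. A minimal sketch, assuming the checklist sits under a level-2 heading starting with "Sign-off checklist" and uses standard markdown task boxes; `unchecked_signoff_items` is a hypothetical helper, not an existing script.

```python
import re

# Matches markdown task items such as "- [x] label" or "- [ ] label".
CHECKBOX_RE = re.compile(r"^\s*[-*]\s+\[(?P<mark>[ xX])\]\s+(?P<label>.+?)\s*$")
SIGNOFF_HEADING = "## sign-off checklist"  # ASSUMPTION: heading text as in the template


def unchecked_signoff_items(dossier_md: str) -> list[str]:
    """Return the labels of unchecked boxes inside the sign-off checklist section."""
    in_checklist = False
    seen = 0
    pending: list[str] = []
    for line in dossier_md.splitlines():
        if line.startswith("## "):
            # Any level-2 heading opens or closes the checklist section.
            in_checklist = line.lower().startswith(SIGNOFF_HEADING)
            continue
        if not in_checklist:
            continue
        match = CHECKBOX_RE.match(line)
        if match is None:
            continue
        seen += 1
        if match["mark"] == " ":
            pending.append(match["label"])
    if seen == 0:
        # A dossier without a checklist must not look "ready to close".
        raise ValueError("no sign-off checklist found in dossier")
    return pending
```

An empty return value means every box is ticked; a missing checklist raises instead of passing silently.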

PDF location convention (I3)

Every results-dossier PDF is committed under documentation/missions/<mission>/reports/ftf_report_<run_id>.pdf.

  • Why committed : the PDF is the immutable run artefact ; without it, the dossier's tables can't be cross-checked if PG / MLflow runs are purged.
  • Why this path : keeps the PDF adjacent to the dossier that analyses it. Predictable for future readers.
  • Why not S3 : these are small (~200KB), human-readable artefacts that benefit from being in-repo (PR diff visible, mkdocs serveable, version-pinned with the dossier).
  • gitignore exception : .gitignore carries an explicit !documentation/missions/**/reports/*.pdf rule (overrides the generic *.pdf + reports/ filters).
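
The path convention is simple enough to derive from the mission name and run_id rather than typing it by hand. The sketch below is illustrative only: `report_pdf_path` and `dossier_relative_link` are hypothetical helpers, and the mission name used in the usage example is made up.

```python
from pathlib import PurePosixPath

MISSIONS_ROOT = PurePosixPath("documentation/missions")


def report_pdf_path(mission: str, run_id: str) -> PurePosixPath:
    """Repo-relative path where the FTF report PDF is committed (I3)."""
    if not run_id.startswith("ftf_"):
        raise ValueError(f"not an FTF run_id: {run_id!r}")
    return MISSIONS_ROOT / mission / "reports" / f"ftf_report_{run_id}.pdf"


def dossier_relative_link(mission: str, run_id: str) -> str:
    """Relative markdown link target, as seen from a dossier in the mission folder."""
    pdf = report_pdf_path(mission, run_id)
    return str(pdf.relative_to(MISSIONS_ROOT / mission))


# Usage with a hypothetical mission name:
# report_pdf_path("example_mission", "ftf_20260501_093000_a1b2c3d_atr")
```

Because run_id already starts with `ftf_`, the file name literally reads `ftf_report_ftf_...pdf` under this convention.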

Invariants

  • Invariant 1 (workflow completeness) : Every FTF-sweep-driven Story closure executes all 8 steps in the canonical workflow above. Skipping a step requires an explicit waiver in the OP wp comment with rationale.
  • Invariant 2 (dossier template) : Every results dossier under documentation/missions/<mission>/ follows the §1-§12 + sign-off checklist structure above. Sections may be empty (e.g., §3.5 per-regime fallback may be N/A) but MUST be present.
  • Invariant 3 (PDF location) : Every FTF report PDF analysed in a dossier is committed under documentation/missions/<mission>/reports/ftf_report_<run_id>.pdf. The dossier links to it via relative markdown link.
  • Invariant 4 (numerical fidelity) : Numerical tables in the dossier (§2-§9) are populated FROM the PDF report (not from independent PG queries) so the dossier is a faithful summary, not a parallel analysis. PG queries are used only to verify the row count (125) and to surface fields the PDF omits (e.g., per-regime fallback events from Loki).
  • Invariant 5 (verdict decision tree) : The verdict (LOCK / KEEP_AVAILABLE / ABANDON) is rendered per the decision tree above. Specifically : ABANDON requires both lock-rule fail AND no |d| ≥ 0.3 anywhere ; KEEP_AVAILABLE applies when an effect exists (some |d| ≥ 0.3) but the lock rule is not cleared, or when the lock rule is cleared but not all 6 official gates pass ; LOCK requires lock rule cleared AND all 6 official gates pass.
  • Invariant 6 (factor permanence post-ABANDON) : An ABANDON verdict does NOT remove the factor from MODEL_FACTORS in commun/finetune/ablation_matrix.py. The implementation stays in tree, the FTF factor remains discoverable for future operators. Removing a factor is a separate decision, gated by a separate PR + ADR-58 review.
  • Invariant 7 (sign-off gate) : OP wp Story status MUST NOT transition In testing → Closed until the sign-off checklist in the dossier is fully checked, OP comment is posted with run_id + dossier link + verdict, AND F1 plan §6 outcomes table is updated.
  • Invariant 8 (cross-Track lesson) : When 2+ consecutive ABANDONs happen in the same tier (label/loss/calibration/data/architecture), the F1 plan §6 cross-Track lesson block is updated with the combined finding. This is what makes the F1 plan a living artefact, not a snapshot.
  • Invariant 9 (split-PR Stories — In testing gate) : When a Story's implementation is deliberately split across multiple PRs (e.g., contract-surface PR first + follow-up wiring PR per the Track 1 + Track 11 pattern documented in their respective mlops_readiness.md §7), the OP Story status MUST stay In progress until all contract-surface PRs have merged AND the FTF sweep is launchable today (= setting the factor's env vars produces different model behaviours, not identical baselines). Specifically : merging only the runtime contract surface PR (factor + guardrail + standalone module + tests) WITHOUT the wiring PR (callsite from EnrichmentAPI / autotrainer / inference loader) does NOT advance the Story to In testing. The mlops_readiness §7 "Pre-LOCK gate" sub-checklist is the canonical anchor — when it lists pending follow-up PRs, the Story stays In progress. Discovered 2026-05-01 : wp#43 (Track 1 BTC features) was prematurely advanced to In testing after PR #792 merged, but the sweep was not launchable because compute_btc_features had no callsite anywhere in the EnrichmentAPI / autotrainer / ETL paths. Operator rolled back to In progress ; this invariant codifies the rule for future split-PR Stories (Track 11 wp#45 is the next candidate post-PR-#793 merge — wp#45 stays In progress until its block A follow-up PR ships).
  • Invariant 10 (auto-enforced SSoT — F1 plan §10 ↔ OpenProject) : the OP Story status MUST mirror documentation/F1_BUY_BOOST_PLAN.md §10 tracking table within 5 min of any main-branch merge that updates that table, AND within 1 hour for any out-of-band drift (e.g., operator UI edit, import script side-effect). This is automated, not operator-discretionary : scripts/op_story_sync.py parses the §10 table as source-of-truth and patches OP via its REST API ; the workflow .github/workflows/op-story-sync.yml runs the syncer on every push to main + hourly cron (safety net). Discovered 2026-05-01 : after Track 9 closure (#794 merge), 6 Stories (wp#44/46/47/48/49/50) were stale In progress despite F1 plan §10 marking them New ; the operator surfaced this as a recurring discipline failure (feedback_no_discipline_workflows.md memory : "automate or don't ship"). This invariant + the syncer eliminate the recurring drift. Pre-condition : OPENPROJECT_API_KEY GitHub secret set ; the workflow degrades gracefully (warning, not failure) if missing.
  • Invariant 11 (supersession exception conditions — added 2026-05-02 post Track 11 anti-precedent) : Invariant 9 may be waived for a partial-coverage sweep ONLY when the aggregating variant mathematically dominates the missing component variants — i.e., the aggregating variant's null result rules out the existence of signal in the components by construction. Mathematical dominance means : if any component had positive signal contribution δ, the aggregating variant's metric would shift by f(δ) > 0 for known f, so a null shift forces δ = 0. Examples that DO satisfy dominance : a with_X_and_Y variant that subsumes with_X and with_Y when adding features can only increase OR keep the model class capacity (not strictly true in practice — even feature-superset can hurt via overfitting, so even this needs care). Examples that do NOT satisfy dominance : uniform averages of orthogonal model predictions (the average dilutes a strong individual signal toward the mean of the components ; a null average is compatible with one strong + two weak components in ANY direction). The Track 11 closure of 2026-05-01 cited a uniform-averaging supersession ; that argument is invalid and the closure was retracted 2026-05-02 (CVN-N001-EE-S06 wp#45 reopen). Cited as the anti-precedent for this invariant. Operators considering a supersession waiver MUST write the dominance proof inline in the dossier ; "the aggregator's result is null therefore the components must be null" is NOT a proof for averaging / pooling / mean-style aggregators.
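
The averaging counter-example in Invariant 11 can be checked with a few lines of arithmetic. The effect values below are hypothetical and chosen only to illustrate the point: a uniform average of one strong positive and two weaker negative components is exactly as null as an average of three null components.

```python
from statistics import fmean

# Hypothetical per-component signal contributions (delta), NOT measured values.
null_components = [0.0, 0.0, 0.0]       # no component carries signal
mixed_components = [0.5, -0.25, -0.25]  # one strong positive, two weak negatives


def uniform_average(effects: list[float]) -> float:
    """Effect seen by an aggregator that averages its components uniformly."""
    return fmean(effects)


# Both configurations yield the same null aggregate, so observing a null
# average cannot prove that every component is null.
null_avg = uniform_average(null_components)
mixed_avg = uniform_average(mixed_components)
strongest_component = max(abs(delta) for delta in mixed_components)
```

Here `null_avg` and `mixed_avg` are both 0.0 while `strongest_component` is 0.5, which is why a null result from an averaging aggregator does not amount to a dominance proof.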

Consequences

Positive

  • Mechanical Story closure : Future Track closures take ~30 min instead of ~2h (Tracks 5/6 took longer because the workflow was being invented in flight).
  • Auditable verdicts : The decision tree (I5) makes ABANDON / KEEP_AVAILABLE / LOCK reproducible across operators ; future re-eval can compare apples to apples.
  • Data permanence : PDFs in-repo means the analysis is verifiable years later even if MLflow runs are purged.
  • Discoverable lessons : F1 plan §6 outcomes table is the index ; cross-Track lessons surface naturally as the table fills.
  • OP / docs / code SSoT alignment : ADR-69 (OP = orchestrator), ADR-77 (mkdocs SSoT), ADR-58 (FTF + guardrail), ADR-68 (committee), ADR-70 (MLOps readiness) all compose cleanly through the workflow ; no contradictions.

Negative

  • Repo bloat : Each FTF PDF is ~200 KB. At one sweep per week, that is about 52 × 200 KB ≈ 10 MB per year. Acceptable (the docs base is < 50 MB excluding site/), but worth flagging for future cleanup if mission count grows materially.
  • Workflow rigidity : The 8-step workflow is mandatory ; ad-hoc "quick closures" that skip steps (e.g., closing without a dossier because "the result is obviously trivial") are NOT permitted. The waiver mechanism (Invariant 1) handles edge cases.
  • Operator-AI handoff steps : Per the workflow table, Steps 1, 3, 5 + 7 require the operator (sweep launch, PDF retrieval, decision authority, OP write access). Step 2 is autonomous. Steps 4, 6, 8 are operator + AI collaboration. The split is fine, but each handoff needs the operator to explicitly trigger the next step (no auto-promotion).

Cross-references

  • ADR-15 : OOS calibration of the locked theta (applies on the LOCK path)
  • ADR-42 : Atomic per-crypto promotion (the LOCK path's mechanism)
  • ADR-58 : Every FTF factor must have a guardrail + integration test (governs which factors enter the sweep)
  • ADR-59 : All params in PostgreSQL ftf_config (the LOCK target — Console flip writes here)
  • ADR-68 : Expert Committee = default review channel (gates sweep launch via plan_review)
  • ADR-69 : OpenProject = project orchestrator (Story In testing → Closed semantics)
  • ADR-70 : MLOps readiness template mandatory (every Story must complete the template before sweep launch)
  • ADR-71 : Trading kill-switch (cross-cutting prerequisite, not specific to FTF)
  • ADR-76 : OpenProject = SSoT for project memory (OP closure is the canonical state)
  • ADR-77 : MkDocs + Structurizr = SSoT for documentation (results dossier renders in mkdocs)

Implementation evidence (canonical examples)

The 3 results dossiers for Tracks 5, 6 and 9 served as the empirical basis for this ADR.

Future LOCK or KEEP_AVAILABLE outcomes will validate the decision tree (I5) — the 3 ABANDON precedents only exercise one branch.

Open questions / forward debt

  1. What about non-FTF Stories ? This ADR scopes to Stories whose closure depends on an FTF sweep. Non-FTF Stories (e.g., docs-only, infra-only) follow the lighter workflow described in STORY_WORKFLOW.md. A future ADR may unify if the patterns converge.
  2. Sweep retries : if a sweep partially fails (some folds error), the dossier convention is to call it "partial" in the metadata block. The decision-tree treatment of partial sweeps is currently operator-judgement ; a future revision may codify a minimum coverage threshold (e.g., ≥ 90 % of variant × crypto × fold cells must succeed).
  3. Multi-factor sweeps : the workflow assumes single-factor sweeps. Multi-factor sweeps (interaction studies) need a separate dossier convention — out of scope here, deferred to when first multi-factor sweep ships.
  4. PDF version pinning : the FTF Report Engine itself can change format. The dossier currently doesn't pin the report engine version. If a future report change breaks back-references in old dossiers, we'd need a migration. For now, the report format has been stable since Track 5.
  5. Split-PR Story discoverability : Invariant 9 requires the operator to know the Story is split before transitioning OP status — relies on the mlops_readiness §7 "Pre-LOCK gate" sub-checklist as the canonical anchor. Forward debt : a CI check that scans documentation/stories/*/mlops_readiness.md for unchecked "Pre-LOCK gate" items and fails if the corresponding OP Story is In testing would automate the discipline. Current pattern relies on operator memory ; per feedback_no_discipline_workflows.md memory, this is the kind of cross-system sync that drifts and should be automated. Tracked as a follow-up Story (CVN-N012-EC-S03 OP Board integration is the natural home).
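
The minimum-coverage idea floated in item 2 could be computed directly from the succeeded variant × crypto × fold cells. This is a sketch of one possible check, not an agreed rule: `sweep_coverage` is a hypothetical helper, and the 90 % threshold in the usage comment is only the example figure quoted in item 2.

```python
from itertools import product
from typing import Iterable, Sequence

Cell = tuple[str, str, int]  # (variant, crypto, fold)


def sweep_coverage(
    variants: Sequence[str],
    cryptos: Sequence[str],
    folds: Sequence[int],
    succeeded: Iterable[Cell],
) -> tuple[float, list[Cell]]:
    """Return (fraction of expected cells that succeeded, sorted missing cells)."""
    expected = set(product(variants, cryptos, folds))
    missing = sorted(expected - set(succeeded))
    covered = len(expected) - len(missing)
    return covered / len(expected), missing


# Usage: on the standard 5 x 5 x 5 protocol a complete sweep has 125 cells.
# coverage, missing = sweep_coverage(variants, cryptos, range(5), succeeded_cells)
# is_partial = coverage < 0.90  # example threshold only, not yet codified
```

Listing the missing cells alongside the ratio lets the dossier's "partial" status name exactly which folds errored.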