ADR-64 — FTF Preflight is a First-Class Phase (No External Data-Prep Scripts)
Status: Decided (2026-04-24) · Amended 2026-04-25 by ADR-67
Amendment summary (2026-04-25, ADR-67): preflight is no longer the only place FI is computed. The pluggable feature-selection framework (commun/finetune/feature_selection/) hosts the ranking DAGs (variance, covariance, fi, shap, future selectors). Preflight remains one of two scopes (global); the per-fold scope is invoked from inside the variants loop by regime_trainer via the same dispatcher. The principles below (preflight is first-class, hot-path stays untouched, idempotent + cache-aware, fail-fast on missing artefacts) still hold; the FI implementation surface moved from steps/fi_reference.py (now a thin wrapper) to feature_selection/dispatch.get_selection(method='fi', scope='global', ...). See ADR-67 for the framework design.
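To make the "one dispatcher, two scopes" idea concrete, here is a minimal sketch of a method-keyed dispatcher. It is not the ADR-67 implementation: the names get_selection_sketch, selector, and rank_by_variance, the "per_fold" scope string, and the columns argument are all assumptions made for illustration, and only a toy variance ranker is registered.

```python
from statistics import pvariance
from typing import Callable, Dict, List, Mapping, Sequence

Columns = Mapping[str, Sequence[float]]

# Hypothetical registry of ranking functions, keyed by method name.
_SELECTORS: Dict[str, Callable[[Columns], List[str]]] = {}
_SCOPES = ("global", "per_fold")  # "per_fold" spelling is an assumption


def selector(method: str):
    """Decorator that registers a ranking function under a method name."""
    def wrap(fn: Callable[[Columns], List[str]]):
        _SELECTORS[method] = fn
        return fn
    return wrap


@selector("variance")
def rank_by_variance(columns: Columns) -> List[str]:
    """Rank feature names by population variance, highest first."""
    return sorted(columns, key=lambda name: pvariance(columns[name]), reverse=True)


def get_selection_sketch(method: str, scope: str, columns: Columns) -> List[str]:
    """Single entry point shared by the global and per-fold callers."""
    if scope not in _SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    if method not in _SELECTORS:
        raise KeyError(f"no selector registered for method {method!r}")
    # The caller decides which rows to pass: the full window for the
    # global scope, one fold's training rows for the per-fold scope.
    return _SELECTORS[method](columns)
```

In this sketch the scope only gates validation; the preflight caller and the per-fold caller differ in the data they hand over, not in the ranking code they reach.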
Context: Before this ADR, the FTF (fine-tune framework) runner expected certain derived artefacts to exist in the cache before any fi_* variant could run — notably the OOF feature-importance reference. The write path was scripts/compute_fi_reference.py, a standalone CLI the operator had to remember to invoke manually before every FTF sweep that needed it. The consequences piled up:
- Operational footgun: forget the script, or run it on the wrong (symbol, strategy, timeframe), and the FTF crashes mid-sweep with an ADR-25 fail-fast pointing at the CLI — hours of wall-clock wasted.
- Stale artefacts: no staleness policy. A 6-month-old FI reference would silently be used against a fresh training window.
- Orphan MLflow runs: the CLI had no concept of the active FTF run; lineage was disconnected. FI decisions couldn't be traced back to the run that consumed them.
- Not FTF-toggleable (violates ADR-56): the CLI was not an ablation factor, had no env-var switch, no A/B path.
- No guardrail (violates ADR-58): no integration test for the cache-write → cache-read contract; the CLI was tested in isolation only.
- Duplicated computation logic: whoever maintains load_fi_reference also needs to keep the CLI in sync. Two sources of truth for the FI contract.
- ADR-59 violation: operator toggles (staleness, enable/disable, force-recompute) lived only in CLI flags, not in ftf_config.
Decision: Data-prep steps whose output is a prerequisite for one or more FTF variants MUST be implemented as PreflightStep subclasses under src/commun/finetune/preflight/steps/, registered via @register, and invoked by AblationRunner before the variants loop. External CLIs for one-shot data prep are forbidden — if a step runs in response to a variant, it belongs in the preflight.
Invariants:
- Pluggable contract: every step implements the 5-method PreflightStep ABC (is_required, cache_key, is_cached, compute_and_persist, anti_leakage_invariant).
- ADR-61 compliance: the computational body of compute_and_persist SHOULD be expressed as a Hamilton DAG when the step has more than a trivial one-shot transform. The hamilton_exec adapter in src/commun/finetune/preflight/hamilton_exec.py emits the execution lineage as an MLflow artefact (preflight_lineage).
- Cache-aware + staleness policy: every step declares a deterministic cache_key and an is_cached predicate that respects the operator's CVN_PREFLIGHT_STALENESS_POLICY (ignore / warn / reject; default reject, 2026-04-24 decision).
- Operator toggles in ftf_config (ADR-59): the framework master switch, per-step switch, staleness policy, and force-recompute toggle are all keys of ftf_config.base_env (surfaced as Console dropdowns, PR #667). No CLI flags.
- MLflow observability: the runner opens a dedicated MLflow run per preflight step invocation, tagged purpose=preflight, step_type=<name>, crypto=<symbol>. Steps log params, metrics, and lineage under that run.
- Anti-leakage invariant documented: each step returns a one-sentence description of how it avoids label / future leakage; logged at run time under event=preflight_start so the guarantee is auditable.
- Guardrail + integration test (ADR-58): every step ships with a test that exercises both cache hit and miss paths and asserts the ADR-25 fail-fast on missing dependencies.
- No silent fallback (ADR-25): a step that cannot complete raises. The runner does not hide preflight failures — the variants loop never runs against a degraded precondition.
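As a rough illustration of the pluggable contract and the cache invariants, the sketch below declares an abstract base with the five named methods, a deterministic cache key, and an age check driven by the ignore / warn / reject policy. The method signatures, make_cache_key, artefact_is_usable, and the max_age threshold are assumptions; the ADR names the methods and the policy values but not these details. In particular, treating "reject" as a cache miss that triggers a recompute is an assumed reading.

```python
import hashlib
import json
import warnings
from abc import ABC, abstractmethod
from datetime import datetime, timedelta
from typing import List, Optional


class PreflightStep(ABC):
    """The five methods named in the invariants; signatures are assumed."""

    @abstractmethod
    def is_required(self, variants: List[str]) -> bool:
        """True when at least one variant in the sweep needs this step."""

    @abstractmethod
    def cache_key(self) -> str:
        """Deterministic key identifying the artefact."""

    @abstractmethod
    def is_cached(self) -> bool:
        """True when a usable artefact exists under the staleness policy."""

    @abstractmethod
    def compute_and_persist(self) -> None:
        """Build the artefact and write it to the cache; raise on failure."""

    @abstractmethod
    def anti_leakage_invariant(self) -> str:
        """One sentence on how the step avoids label / future leakage."""


def make_cache_key(step: str, symbol: str, strategy: str, timeframe: str) -> str:
    """Hash a canonical JSON payload so equal inputs always give equal keys."""
    payload = json.dumps(
        {"step": step, "symbol": symbol, "strategy": strategy, "timeframe": timeframe},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def artefact_is_usable(
    written_at: Optional[datetime],
    now: datetime,
    max_age: timedelta,  # hypothetical threshold, not specified by the ADR
    policy: str = "reject",
) -> bool:
    """Apply an ignore / warn / reject policy to an artefact's age."""
    if policy not in ("ignore", "warn", "reject"):
        raise ValueError(f"unknown staleness policy: {policy!r}")
    if written_at is None:
        return False  # nothing cached at all
    if now - written_at <= max_age:
        return True  # fresh under every policy
    if policy == "reject":
        return False  # assumed meaning: treat as a cache miss and recompute
    if policy == "warn":
        warnings.warn(f"stale preflight artefact, age {now - written_at}")
    return True  # "ignore" and "warn" both reuse the stale artefact
```

A concrete step's is_cached could delegate to a helper like artefact_is_usable, so the policy is evaluated in one place rather than re-implemented per step.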
Operator-facing contract:
- The runner invokes preflight before the variants loop. Nothing to run by hand.
- The FTF is idempotent by design: re-running a factor re-enters the preflight phase; the per-step staleness policy decides whether the artefact is reused or recomputed.
- To force a fresh preflight pass: set CVN_PREFLIGHT_FORCE_RECOMPUTE=1 in ftf_config (Console → Baseline Config).
- To disable the preflight phase entirely (not recommended: the FTF will crash on any variant that needs its output): set CVN_PREFLIGHT_ENABLED=0.
- To disable a single step (e.g. opt out of FI preflight while keeping others): set CVN_PREFLIGHT_STEP_<NAME>_ENABLED=0.
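The toggles above could be resolved roughly as follows. This is a sketch under stated assumptions: base_env is modelled as a plain string-to-string mapping, the helper names are hypothetical, and the defaults (phase and steps enabled, force-recompute off) are inferred from the wording of the contract rather than stated by it.

```python
from typing import Mapping


def _flag(env: Mapping[str, str], key: str, default: str) -> bool:
    """Read a 0/1 toggle; anything other than "1" counts as off."""
    return env.get(key, default).strip() == "1"


def step_should_run(env: Mapping[str, str], step_name: str) -> bool:
    """Master switch first, then the per-step switch."""
    if not _flag(env, "CVN_PREFLIGHT_ENABLED", "1"):
        return False
    per_step_key = f"CVN_PREFLIGHT_STEP_{step_name.upper()}_ENABLED"
    return _flag(env, per_step_key, "1")


def must_recompute(env: Mapping[str, str]) -> bool:
    """True when the operator asked for a fresh preflight pass."""
    return _flag(env, "CVN_PREFLIGHT_FORCE_RECOMPUTE", "0")
```

For example, a base_env containing only CVN_PREFLIGHT_STEP_FI_REFERENCE_ENABLED=0 would skip the FI step while leaving every other step active.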
Scope — what IS and is NOT a preflight step:
| Activity | Preflight? | Where it lives |
|---|---|---|
| OOF feature-importance reference | Yes | steps/fi_reference.py (PR #663) |
| Label-quality validation (planned) | Yes | steps/label_quality.py (planned) |
| SHAP reference table (planned) | Yes | steps/shap_reference.py (planned) |
| Regime stratification table (planned) | Yes | steps/regime_split.py (planned) |
| OHLCV ingestion | No — Airflow ETL | dags/dag_ingest__*.py |
| Per-variant training | No — the variants loop | ablation_runner.py |
| Per-variant inference | No — the variants loop | ablation_runner.py |
| Report generation | No — post-variants | report_pdf.py, report_stats.py |
Alternatives rejected:
- Keep external CLIs: perpetuates operational burden, duplicates logic, violates ADR-56 / ADR-58 / ADR-59. Non-starter.
- Airflow DAG per prep step: overkill for in-process preparation that depends on the exact ftf_config + factor being run. Adds Airflow overhead (pod startup, scheduling) for code that lives in the same process as the variants loop. Also loses the direct MLflow run context the runner already owns.
- Inline prep inside run_factor: bloats the runner, couples the variants loop to every prep step, cannot be disabled from ftf_config without touching code. Also breaks the is_cached separation (prep would run even on warm-cache reruns).
- External notebook workflow: zero audit trail, can't be CI-checked, doesn't compose with the FTF MLflow run.
Consequences:
- scripts/compute_fi_reference.py is removed. The error message in load_fi_reference now points at the preflight step and the two Console toggles (CVN_PREFLIGHT_ENABLED, CVN_PREFLIGHT_STEP_FI_REFERENCE_ENABLED).
- Any future data-prep requirement for a new variant must land as a PreflightStep, with no shortcuts.
- documentation/epics/CVN-N001-EF-preflight-framework.md holds the full design; this ADR is the enforceable shortcut.
Files: src/commun/finetune/preflight/base.py, src/commun/finetune/preflight/registry.py, src/commun/finetune/preflight/hamilton_exec.py, src/commun/finetune/preflight/steps/fi_reference.py, src/commun/finetune/preflight/steps/fi_reference_dag.py, scripts/ftf_config_ui.py (toggles).