ADR-64 — FTF Preflight is a First-Class Phase (No External Data-Prep Scripts)

Status: Decided (2026-04-24) · Amended 2026-04-25 by ADR-67

Amendment summary (2026-04-25, ADR-67): preflight is no longer the only place where FI is computed. The pluggable feature-selection framework (commun/finetune/feature_selection/) hosts the ranking DAGs (variance, covariance, fi, shap, future selectors). Preflight remains one of two scopes, the global one; the per-fold scope is invoked from inside the variants loop by regime_trainer via the same dispatcher. The principles below (preflight is first-class, hot-path stays untouched, idempotent + cache-aware, fail-fast on missing artefacts) still hold; the FI implementation surface moved from steps/fi_reference.py (now a thin wrapper) to feature_selection/dispatch.get_selection(method='fi', scope='global', ...). See ADR-67 for the framework design.

Context: Before this ADR, the FTF (fine-tune framework) runner expected certain derived artefacts to exist in the cache before any fi_* variant could run — notably the OOF feature-importance reference. The write path was scripts/compute_fi_reference.py, a standalone CLI the operator had to remember to invoke manually before every FTF sweep that needed it. The consequences piled up:

  • Operational footgun: forget the script, or run it on the wrong (symbol, strategy, timeframe), and the FTF crashes mid-sweep with an ADR-25 fail-fast pointing at the CLI — hours of wall-clock wasted.
  • Stale artefacts: no staleness policy. A 6-month-old FI reference would silently be used against a fresh training window.
  • Orphan MLflow runs: the CLI had no concept of the active FTF run; lineage was disconnected. FI decisions couldn't be traced back to the run that consumed them.
  • Not FTF-toggleable (violates ADR-56): the CLI was not an ablation factor, had no env-var switch, no A/B path.
  • No guardrail (violates ADR-58): no integration test for the cache-write → cache-read contract; the CLI was tested in isolation only.
  • Duplicated computation logic: whoever maintains load_fi_reference also needs to keep the CLI in sync. Two sources of truth for the FI contract.
  • ADR-59 violation: operator toggles (staleness, enable/disable, force-recompute) lived only in CLI flags, not in ftf_config.

Decision: Data-prep steps whose output is a prerequisite for one or more FTF variants MUST be implemented as PreflightStep subclasses under src/commun/finetune/preflight/steps/, registered via @register, and invoked by AblationRunner before the variants loop. External CLIs for one-shot data prep are forbidden: if a variant depends on a step's output, that step belongs in the preflight phase.

Invariants:

  • Pluggable contract: every step implements the 5-method PreflightStep ABC (is_required, cache_key, is_cached, compute_and_persist, anti_leakage_invariant).
  • ADR-61 compliance: the computational body of compute_and_persist SHOULD be expressed as a Hamilton DAG whenever the step does more than a trivial one-shot transform. The hamilton_exec adapter in src/commun/finetune/preflight/hamilton_exec.py emits the execution lineage as an MLflow artefact (preflight_lineage).
  • Cache-aware + staleness policy: every step declares a deterministic cache_key and an is_cached predicate that respects the operator's CVN_PREFLIGHT_STALENESS_POLICY (ignore / warn / reject — default reject, 2026-04-24 decision).
  • Operator toggles in ftf_config (ADR-59): the framework master switch, per-step switch, staleness policy, and force-recompute toggle are all keys of ftf_config.base_env (surfaced as Console dropdowns — PR #667). No CLI flags.
  • MLflow observability: the runner opens a dedicated MLflow run per preflight step invocation, tagged purpose=preflight, step_type=<name>, crypto=<symbol>. Steps log params, metrics, and lineage under that run.
  • Anti-leakage invariant documented: each step returns a one-sentence description of how it avoids label / future leakage; logged at run time under event=preflight_start so the guarantee is auditable.
  • Guardrail + integration test (ADR-58): every step ships with a test that exercises both cache hit and miss paths and asserts the ADR-25 fail-fast on missing dependencies.
  • No silent fallback (ADR-25): a step that cannot complete raises. The runner does not hide preflight failures — the variants loop never runs against a degraded precondition.
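To make the pluggable contract and the staleness policy concrete, here is a hedged sketch of a step that implements the five method names listed above. Only the method names and the three policy values come from this ADR. The argument lists, the ctx dictionary, the file naming, the mtime-based age check, the max_age_s window and the invariant wording are assumptions for illustration, and "reject" is interpreted as "treat a stale artefact as a cache miss so it gets recomputed".

```python
import hashlib
import json
import os
import tempfile
import time
import warnings
from abc import ABC, abstractmethod
from pathlib import Path


class PreflightStep(ABC):
    """Five-method contract; the argument lists are assumptions."""

    @abstractmethod
    def is_required(self, variants): ...

    @abstractmethod
    def cache_key(self, ctx): ...

    @abstractmethod
    def is_cached(self, ctx): ...

    @abstractmethod
    def compute_and_persist(self, ctx): ...

    @abstractmethod
    def anti_leakage_invariant(self): ...


class ToyFiReference(PreflightStep):
    def __init__(self, cache_dir, policy="reject", max_age_s=86_400.0):
        self.cache_dir = Path(cache_dir)
        self.policy = policy  # ignore / warn / reject
        self.max_age_s = max_age_s  # hypothetical freshness window

    def is_required(self, variants):
        return any(v.startswith("fi_") for v in variants)

    def cache_key(self, ctx):
        # Sorted JSON keeps the key independent of dict ordering.
        blob = json.dumps(ctx, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()[:16]

    def _path(self, ctx):
        return self.cache_dir / f"fi_reference_{self.cache_key(ctx)}.json"

    def is_cached(self, ctx):
        path = self._path(ctx)
        if not path.exists():
            return False
        stale = time.time() - path.stat().st_mtime > self.max_age_s
        if not stale or self.policy == "ignore":
            return True
        if self.policy == "warn":
            warnings.warn(f"stale preflight artefact reused: {path.name}")
            return True
        return False  # "reject": a stale artefact counts as a miss

    def compute_and_persist(self, ctx):
        if not self.cache_dir.is_dir():
            # No silent fallback: a missing dependency raises.
            raise FileNotFoundError(f"cache dir missing: {self.cache_dir}")
        self._path(ctx).write_text(json.dumps({"ctx": ctx, "ranking": []}))

    def anti_leakage_invariant(self):
        # Placeholder wording, not the real step's statement.
        return "Importances are computed from out-of-fold predictions only."


with tempfile.TemporaryDirectory() as tmp:
    ctx = {"symbol": "BTC", "strategy": "demo", "timeframe": "1h"}
    step = ToyFiReference(tmp)
    print(step.is_cached(ctx))  # cold cache
    step.compute_and_persist(ctx)
    print(step.is_cached(ctx))  # warm cache
    ten_days_ago = time.time() - 10 * 86_400
    os.utime(step._path(ctx), (ten_days_ago, ten_days_ago))
    print(step.is_cached(ctx))  # aged artefact under "reject"
```

Keeping the policy check inside is_cached means the runner only ever asks one question ("is there a usable artefact?") and never needs to know which policy is active.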

Operator-facing contract:

  • The runner invokes preflight before the variants loop. Nothing to run by hand.
  • The FTF is idempotent by design: re-running a factor re-enters the preflight phase; the per-step staleness policy decides whether the artefact is reused or recomputed.
  • To force a fresh preflight pass: set CVN_PREFLIGHT_FORCE_RECOMPUTE=1 in ftf_config (Console → Baseline Config).
  • To disable the preflight phase entirely (not recommended — the FTF will crash on any variant that needs its output): set CVN_PREFLIGHT_ENABLED=0.
  • To disable a single step (e.g. opt out of FI preflight while keeping others): set CVN_PREFLIGHT_STEP_<NAME>_ENABLED=0.
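The toggles above could be resolved from ftf_config.base_env along these lines. This is a sketch under stated assumptions: base_env is treated as a plain string-to-string mapping, the PreflightToggles dataclass and both helper functions are hypothetical names, the set of accepted truthy spellings is a guess (this ADR only shows 0 and 1), and the master and per-step switches are assumed to default to enabled. The default staleness policy of reject is taken from the invariants.

```python
from dataclasses import dataclass
from typing import Mapping

_TRUTHY = {"1", "true", "yes", "on"}  # assumed; the ADR only shows 0 and 1
_POLICIES = ("ignore", "warn", "reject")


@dataclass(frozen=True)
class PreflightToggles:
    enabled: bool
    force_recompute: bool
    staleness_policy: str


def _flag(env: Mapping[str, str], key: str, default: bool) -> bool:
    """Read a boolean switch, falling back to the default when unset."""
    raw = env.get(key)
    if raw is None:
        return default
    return str(raw).strip().lower() in _TRUTHY


def read_toggles(base_env: Mapping[str, str]) -> PreflightToggles:
    raw_policy = base_env.get("CVN_PREFLIGHT_STALENESS_POLICY", "reject")
    policy = str(raw_policy).strip().lower()
    if policy not in _POLICIES:
        # Fail fast instead of silently picking a policy.
        raise ValueError(f"unknown staleness policy: {raw_policy!r}")
    return PreflightToggles(
        enabled=_flag(base_env, "CVN_PREFLIGHT_ENABLED", True),
        force_recompute=_flag(base_env, "CVN_PREFLIGHT_FORCE_RECOMPUTE", False),
        staleness_policy=policy,
    )


def step_enabled(base_env: Mapping[str, str], step_name: str) -> bool:
    """A step runs only if both the master and its own switch are on."""
    if not _flag(base_env, "CVN_PREFLIGHT_ENABLED", True):
        return False
    key = f"CVN_PREFLIGHT_STEP_{step_name.upper()}_ENABLED"
    return _flag(base_env, key, True)


if __name__ == "__main__":
    env = {"CVN_PREFLIGHT_STEP_FI_REFERENCE_ENABLED": "0"}
    print(read_toggles(env))
    print(step_enabled(env, "fi_reference"))
```

Rejecting an unknown policy value with an exception, rather than falling back to the default, keeps the toggle parsing consistent with the no-silent-fallback invariant.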

Scope — what IS and is NOT a preflight step:

| Activity | Preflight? | Where it lives |
| --- | --- | --- |
| OOF feature-importance reference | Yes | steps/fi_reference.py (PR #663) |
| Label-quality validation (planned) | Yes | steps/label_quality.py (planned) |
| SHAP reference table (planned) | Yes | steps/shap_reference.py (planned) |
| Regime stratification table (planned) | Yes | steps/regime_split.py (planned) |
| OHLCV ingestion | No (Airflow ETL) | dags/dag_ingest__*.py |
| Per-variant training | No (the variants loop) | ablation_runner.py |
| Per-variant inference | No (the variants loop) | ablation_runner.py |
| Report generation | No (post-variants) | report_pdf.py, report_stats.py |

Alternatives rejected:

  • Keep external CLIs: perpetuates operational burden, duplicates logic, violates ADR-56 / ADR-58 / ADR-59. Non-starter.
  • Airflow DAG per prep step: overkill for in-process preparation that depends on the exact ftf_config + factor being run. Adds Airflow overhead (pod startup, scheduling) for code that lives in the same process as the variants loop. Also loses the direct MLflow run context the runner already owns.
  • Inline prep inside run_factor: bloats the runner, couples the variants loop to every prep step, cannot be disabled from ftf_config without touching code. Also breaks the is_cached separation (prep would run even on warm-cache reruns).
  • External notebook workflow: zero audit trail, can't be CI-checked, doesn't compose with the FTF MLflow run.

Consequences:

  • scripts/compute_fi_reference.py is removed. The error message in load_fi_reference now points at the preflight step and the two Console toggles (CVN_PREFLIGHT_ENABLED, CVN_PREFLIGHT_STEP_FI_REFERENCE_ENABLED).
  • Any future data-prep requirement for a new variant must land as a PreflightStep — no shortcuts.
  • documentation/epics/CVN-N001-EF-preflight-framework.md holds the full design; this ADR is the enforceable shortcut.

Files: src/commun/finetune/preflight/base.py, src/commun/finetune/preflight/registry.py, src/commun/finetune/preflight/hamilton_exec.py, src/commun/finetune/preflight/steps/fi_reference.py, src/commun/finetune/preflight/steps/fi_reference_dag.py, scripts/ftf_config_ui.py (toggles).