ADR-65 — Airflow DAG Params Are Run-Level Context Only; Config Lives in ftf_config

Status: Decided (2026-04-24)

Context: On 2026-04-24 a 3-hour FTF ablation on defi_top5 was wasted because it ran with the deprecated PTE envelope ATR1.5_3.0_H5 instead of the operator's current policy ATR0.5_1.5_H4. Root cause: dags/dag_finetune__pte.py declared its own pte Param with a hardcoded default of "ATR1.5_3.0_H5", and the downstream AblationRunner.__init__ carried the same hardcoded default one layer deeper. The operator's intended PTE was live in the Console (ftf_config.base_env), but the DAG never read ftf_config for the PTE — the Python default silently won at every trigger that didn't manually override the field.

This is a textbook dual-source-of-truth bug: the same decision (which PTE to train on) was representable in both Python code and the PostgreSQL configuration row, with no rule governing precedence. ADR-59 already declared ftf_config the source of truth for pipeline parameters, but it did not specify the boundary between run-level context (legitimate DAG params) and versioned configuration (what must live in ftf_config). That gap let DAG authors keep adding "convenience" defaults that accrue into a shadow configuration layer — effectively a second Console, written in Python, edited by patches, versioned by git, reviewed as code. The moment the two drift, the Python one wins and the operator's intent is silently ignored.

The no-discipline rule (see memory feedback_no_discipline_workflows.md) forbids any workflow that depends on the operator remembering to override a Python default. An ablation run is a costly action; its configuration surface must be enforced, not conventional.

Decision: Airflow DAG Params are limited to run-level context — the subset of inputs that describe which instance of the experiment is being triggered. Anything that does not change between two re-runs of the same experiment belongs in ftf_config, never in a DAG Param default nor in a Python __init__ default of any class the DAG instantiates.

Allowed DAG Params (the conventional run-level set):

  • factor — which factor / phase to ablate
  • crypto_group — universe for this run
  • phase — protocol label (manual, 0, 1a, 1b…)
  • power_mode — standard / deep
  • confirm_long_run — safety flag for long jobs

DAGs that are not FTF-shaped may add their own run-level equivalents (e.g. a discovery DAG might legitimately expose group + crypto). The test is operational, not syntactic: if X keeps the same value across every re-run of the same logical experiment, whatever the run-level knobs, then X is configuration and belongs in ftf_config.

Invariants:

  • Parse-time load — every DAG that consumes configuration MUST load base_env at module parse time (via commun.finetune.ablation_matrix._load_base_env() or equivalent) and materialise the required keys as module-level constants. The DAG body uses those constants; it does not re-read base_env per task unless the DAG is explicitly about configuration editing.
  • Fail-loud on missing keys (ADR-25) — if a required config key is absent from base_env, the DAG raises ValueError at parse time with the key name. No silent fallback to a Python default, no "default of defaults" in ftf_baseline.json that would re-introduce the dual-source-of-truth problem. The JSON baseline is a local-dev convenience only; production pods must hit PostgreSQL.
  • No hardcoded defaults anywhere downstream — runner classes (e.g. AblationRunner, future analogues) MUST declare configuration parameters as required (strategy: str, not strategy: str = "ATR..."). The DAG passes the value explicitly, sourced from base_env; the runner validates and uses it.
  • Deprecation guardrails are hard-fail — when a specific configuration value becomes deprecated (e.g. the PTE ATR1.5_3.0_H5), the runner raises on that value. The guardrail lives in code next to the usage site (not in a wiki), and a unit test pins the behaviour so accidental reintroduction fails CI.
  • Console as the only write path — the UI the operator uses to change a config value is the Streamlit Console (scripts/ftf_config_ui.py). Every config key backed by a dropdown in PARAM_OPTIONS gets Console support automatically. Adding a new config key = add a dropdown entry; the free-text fallback is a convenience, not a long-term home for structured enums.
  • Migration compatibility — when moving a former Param to ftf_config, the PR must populate the new key in PostgreSQL (via a Console update documented in the PR) before or at the time the DAG change merges. No PR may merge that leaves a production DAG failing to parse because its newly-required key is absent.
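
The invariants above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: base_env is modelled as a plain dict rather than the result of commun.finetune.ablation_matrix._load_base_env(), and the names load_required, REQUIRED_KEYS and DEPRECATED_PTES, the fold count, and this AblationRunner signature are hypothetical.

```python
from typing import Any, Mapping, Sequence

# Hypothetical: the keys this DAG needs from ftf_config.base_env.
REQUIRED_KEYS = ("CVN_DEFAULT_PTE", "CVN_DEFAULT_N_FOLDS")

# Hypothetical: values retired by policy; the runner refuses them outright.
DEPRECATED_PTES = frozenset({"ATR1.5_3.0_H5"})


def load_required(base_env: Mapping[str, Any], keys: Sequence[str]) -> dict:
    """Return the requested keys, raising ValueError that names any missing key."""
    missing = [k for k in keys if k not in base_env]
    if missing:
        raise ValueError(f"missing required ftf_config key(s): {', '.join(missing)}")
    return {k: base_env[k] for k in keys}


class AblationRunner:
    """Illustrative runner: configuration is keyword-only and has no default."""

    def __init__(self, *, strategy: str, n_folds: int) -> None:
        # Hard-fail guardrail, placed next to the usage site.
        if strategy in DEPRECATED_PTES:
            raise ValueError(f"PTE {strategy!r} is deprecated; update ftf_config")
        self.strategy = strategy
        self.n_folds = n_folds


# In a DAG module these statements would run at parse time.
# The dict below stands in for the loaded base_env; the fold count is illustrative.
base_env = {"CVN_DEFAULT_PTE": "ATR0.5_1.5_H4", "CVN_DEFAULT_N_FOLDS": 5}
cfg = load_required(base_env, REQUIRED_KEYS)
PTE = cfg["CVN_DEFAULT_PTE"]
N_FOLDS = cfg["CVN_DEFAULT_N_FOLDS"]

runner = AblationRunner(strategy=PTE, n_folds=N_FOLDS)
```

Because strategy has no default, calling AblationRunner(n_folds=5) raises TypeError instead of silently picking a stale envelope, and a unit test asserting that "ATR1.5_3.0_H5" raises ValueError would pin the guardrail in CI.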

Operator-facing contract:

  • Triggering a DAG never requires the operator to re-enter configuration values that live in ftf_config. The trigger form shows only run-level knobs.
  • Changing a configuration value is done in one place: Console → Baseline Config. The next DAG parse picks it up.
  • If the operator needs to run the same experiment with a different PTE / fold count / history window, they edit the Console, not the DAG trigger form. One-off overrides require an ADR exception documented in the PR body.

Scope — what IS and is NOT a run-level param:

| Value | Run-level? | Where it lives |
|---|---|---|
| Which factor to ablate | Yes | DAG Param factor |
| Crypto group / universe for this run | Yes | DAG Param crypto_group |
| Protocol phase label | Yes | DAG Param phase |
| Power mode (standard/deep) | Yes | DAG Param power_mode |
| Safety flag for long jobs | Yes | DAG Param confirm_long_run |
| Which PTE envelope to train on | No | ftf_config.base_env.CVN_DEFAULT_PTE |
| Number of CV folds | No | ftf_config.base_env.CVN_DEFAULT_N_FOLDS |
| HPO trials per variant | No | ftf_config.base_env.CVN_DEFAULT_N_TRIALS |
| Total history in months | No | ftf_config.base_env.CVN_DEFAULT_HISTORY_MONTHS |
| Horizon grid, SL/TP grid (discovery DAG) | No | ftf_config.base_env.CVN_DEFAULT_* |
| Timeframe | No | ftf_config.base_env.CVN_TIMEFRAME (existing) |

Alternatives rejected:

  • Hybrid defaults (Python default plus ftf_config lookup, latest-wins): keeps the dual source of truth; the moment the two differ, a bug is born. This is the exact failure mode that triggered this ADR.
  • ftf_baseline.json as the authoritative default for production: JSON file is edited via PRs, not the Console; every config change becomes a code deploy. The whole point of ADR-59 is to make configuration a same-day, Console-driven change. The JSON file survives only as a local-dev fallback when PostgreSQL is unreachable.
  • Python defaults marked deprecated but kept: the code would look clean, but every newcomer would be one slip-up away from running on the stale default. ADR-25 forbids silent fallbacks; a default with a deprecation comment is a silent fallback by another name.

Implementation scope and follow-up:

This ADR is declared in PR #674 alongside the P0 enforcement for dag_finetune__pte.py + AblationRunner (issue #673). Remaining DAGs that violate the rule (most visibly launch__discovery.py with horizons / sl_range / tp_range / timeframe as params) are tracked as P1 follow-up. The ADR is written narrow — the boundary between run-level and config — so that P1 enforcement is mechanical: for each DAG, audit every Param against the boundary test, move the violators, ship. No further design work required.
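
As a first pass, the per-DAG audit could be a simple allowlist check. The sketch below is hypothetical: audit_params is an illustrative helper, and the assumption that launch__discovery.py exposes group and crypto alongside the four violating params is made only for the example.

```python
# Run-level Params this ADR lists as allowed for FTF-shaped DAGs.
RUN_LEVEL_PARAMS = frozenset(
    {"factor", "crypto_group", "phase", "power_mode", "confirm_long_run"}
)


def audit_params(param_names, allowed):
    """Return the sorted Param names that fall outside the allowed run-level set."""
    return sorted(set(param_names) - set(allowed))


# Assumed Param names for the discovery DAG; group and crypto are the
# run-level equivalents the ADR permits a discovery DAG to expose.
discovery_params = ["group", "crypto", "horizons", "sl_range", "tp_range", "timeframe"]
discovery_allowed = RUN_LEVEL_PARAMS | {"group", "crypto"}

violators = audit_params(discovery_params, discovery_allowed)
print(violators)  # ['horizons', 'sl_range', 'timeframe', 'tp_range']
```

A name-based check only shortlists candidates. The boundary test itself is operational, so each flagged Param still needs a human decision on whether it changes between re-runs of the same experiment.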

Consequences:

  • Adding a config option becomes slightly heavier: Console dropdown + baseline JSON fallback + DAG loader key + a sentence in the PR body explaining the key's purpose. This is the intended tax — shipping a silent Python default is now impossible.
  • Operator workflow becomes predictable: if you want to change how future runs behave, you edit the Console. If you want to change what this run targets, you fill the DAG trigger form. No intermediate state where "the default I set isn't really the default".
  • Local development requires config/ftf_baseline.json to carry a superset of the keys the DAGs consume, so tests and local runs don't hit PostgreSQL. CI enforces consistency between the baseline JSON and the runtime-required keys.
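
The CI consistency check mentioned in the last point might look like the sketch below. It assumes config/ftf_baseline.json is a flat JSON object keyed by config name, which this ADR does not specify; missing_baseline_keys, the required-key list, and the sample values are hypothetical.

```python
import json


def missing_baseline_keys(baseline_text, required_keys):
    """Return required keys absent from the baseline JSON, sorted by name."""
    baseline = json.loads(baseline_text)
    return sorted(set(required_keys) - set(baseline))


# Stand-in for the contents of config/ftf_baseline.json (values are illustrative).
baseline_text = json.dumps(
    {"CVN_DEFAULT_PTE": "ATR0.5_1.5_H4", "CVN_DEFAULT_N_FOLDS": 5}
)

# Hypothetical list of keys the DAGs require at parse time.
required = ["CVN_DEFAULT_PTE", "CVN_DEFAULT_N_FOLDS", "CVN_DEFAULT_N_TRIALS"]

missing = missing_baseline_keys(baseline_text, required)
print(missing)  # ['CVN_DEFAULT_N_TRIALS']
```

In CI the equivalent test would read the real file and assert that the returned list is empty, so a DAG cannot start requiring a key that the local-dev baseline lacks.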

References:

  • Parent decision: ADR-59 (all pipeline parameters in PostgreSQL, editable via Console only).
  • No-discipline principle: memory feedback_no_discipline_workflows.md (rejected: "discipline" / conventional fixes).
  • Incident report: issue #673 body (3h of compute wasted on deprecated PTE via silent Python default).
  • Related: ADR-25 (no silent fallbacks), ADR-56 (every pipeline change FTF-testable), ADR-58 (every FTF factor must have a guardrail).