ADR-56: Every Pipeline Change Must Be FTF-Testable (A/B Testable by Design)

Status: Decided (2026-04-14)

Context: The training pipeline has multiple structural changes planned (CUSUM decoupling, class balancing, HPO objective, feature cap, binary classification). Without A/B testing infrastructure, it is impossible to measure the isolated impact of each change. Historically, changes were applied monolithically, making it impossible to attribute performance improvements to specific fixes.

Decision: Every pipeline behavior change MUST be controlled by an environment variable and have a corresponding FTF ablation factor, enabling rigorous A/B testing of each change in isolation (ceteris paribus).

Invariants:

- No hardcoded behavior changes: Every pipeline modification that affects model training, feature engineering, labeling, filtering, or HPO MUST be gated by a CVN_* environment variable with the current behavior as default.
- FTF factor required: Every new env var MUST have a corresponding AblationFactor in ablation_matrix.py with at least 2 variants (current baseline + proposed change).
- Baseline preservation: The BASE_ENV in ablation_matrix.py MUST always represent the current production configuration. Changes to BASE_ENV require committee approval (ADR-52).
- One variable at a time: FTF ablation MUST test ONE factor at a time (ceteris paribus). Combined changes are tested only after individual factors are validated.
- Statistical validation: A change is locked (via lock_winner()) only if it shows statistically significant improvement (BH-corrected p < 0.05) on Sortino or the primary trading metric.
- Rollback by env var: Any locked change can be reverted by changing the env var back to baseline in Helm values, without code changes.
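The statistical validation invariant can be illustrated with a minimal sketch of the Benjamini-Hochberg (BH) adjustment. The ADR does not say how the raw p-values are computed or what `lock_winner()` accepts, so the function names, the input shape (one raw p-value per variant, each from a comparison against baseline) and the example numbers below are all assumptions.

```python
from typing import Dict, List


def bh_adjust(p_values: Dict[str, float]) -> Dict[str, float]:
    """Return Benjamini-Hochberg adjusted p-values, keyed by variant name."""
    m = len(p_values)
    # Sort variants by raw p-value, ascending (rank 1 = smallest).
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        name, p = ordered[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


def variants_eligible_for_lock(
    p_values: Dict[str, float], alpha: float = 0.05
) -> List[str]:
    """Names of variants whose BH-adjusted p-value is below alpha."""
    adjusted = bh_adjust(p_values)
    return sorted(name for name, q in adjusted.items() if q < alpha)


# Hypothetical raw p-values for three variants of one factor.
raw = {"a": 0.001, "b": 0.02, "c": 0.2}
eligible = variants_eligible_for_lock(raw)
```

This sketch only tests significance. Whether the variant actually improves Sortino (rather than degrading it significantly) would need a separate check on the direction of the effect before locking.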

Pattern:

# In pipeline code:
import os

# Read the gate; an unset variable falls back to the baseline value.
if os.environ.get("CVN_MY_NEW_PARAM", "baseline_value") == "new_value":
    ...  # New behavior
else:
    ...  # Current behavior (baseline)

# In ablation_matrix.py:
AblationFactor(
    name="my_new_param",
    factor_type="training",
    category="model",
    description="Description of what changes.",
    env_vars={
        "baseline": {"CVN_MY_NEW_PARAM": "baseline_value"},
        "new_value": {"CVN_MY_NEW_PARAM": "new_value"},
    },
)
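
As a rough illustration of the ceteris paribus rule, the sketch below expands one factor into per-variant environments by overlaying each variant's overrides on the baseline. The `AblationFactor` dataclass here is a stand-in that only mirrors the fields shown in the pattern, and the `BASE_ENV` contents, `CVN_OTHER_PARAM`, and both helper functions are hypothetical. The real definitions live in ablation_matrix.py and may differ.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class AblationFactor:
    """Stand-in with the fields used in the pattern (assumed shape)."""
    name: str
    factor_type: str
    category: str
    description: str
    env_vars: Dict[str, Dict[str, str]]


# Hypothetical baseline; the real BASE_ENV is defined in ablation_matrix.py.
BASE_ENV = {"CVN_MY_NEW_PARAM": "baseline_value", "CVN_OTHER_PARAM": "x"}

factor = AblationFactor(
    name="my_new_param",
    factor_type="training",
    category="model",
    description="Description of what changes.",
    env_vars={
        "baseline": {"CVN_MY_NEW_PARAM": "baseline_value"},
        "new_value": {"CVN_MY_NEW_PARAM": "new_value"},
    },
)


def single_factor_runs(
    factor: AblationFactor, base_env: Dict[str, str]
) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (variant, env) pairs; only this factor's variables are overridden."""
    for variant, overrides in factor.env_vars.items():
        env = dict(base_env)   # copy so the baseline is never mutated
        env.update(overrides)  # apply exactly one factor's overrides
        yield variant, env


def changed_keys(env: Dict[str, str], base_env: Dict[str, str]) -> List[str]:
    """Env vars whose value differs from the baseline."""
    return sorted(k for k in env if env[k] != base_env.get(k))


runs = dict(single_factor_runs(factor, BASE_ENV))
```

Comparing each generated environment against the baseline with `changed_keys` gives a cheap guard: a run that differs from `BASE_ENV` in variables belonging to more than one factor is no longer a single-factor test.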

Current factors following this pattern:

| Env var | FTF Factor | Variants |
|---|---|---|
| `CVN_CALIBRATION_METHOD` | `calibration` | none, isotonic, platt |
| `CVN_TIMEFRAME` | `timeframe` | 5m, 15m, 30m, 1h |
| `CVN_TRAIN_WINDOW_MONTHS` | `fold_size` | 6m, 9m, 12m, 18m |
| `CVN_MAX_FEATURES` | `n_features` | top_30, top_50, top_100, full |
| `CVN_BINARY_CLASSIFICATION` | `classification_mode` | 3class, binary_balanced, binary_precision |
| `CVN_CLASS_BALANCING` | `class_balancing` | OFF, ON |
| `CVN_HPO_OBJECTIVE` | `hpo_objective` | fbeta_buy, precision_recall_auc, f1_macro |
| `CVN_CUSUM_THRESHOLD_H` | `cusum_threshold` | 2.0, 3.0, 5.0 |
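
The "FTF factor required" invariant lends itself to an automated audit. The following is a minimal sketch that hard-codes the table above as a dictionary and flags any CVN_* variable lacking a factor or having fewer than two variants. The `CURRENT_FACTORS` structure and the `audit` helper are illustrative assumptions, not part of ablation_matrix.py.

```python
from typing import Dict, Iterable, List, Tuple

# Env var -> (FTF factor name, variants), copied from the table above.
CURRENT_FACTORS: Dict[str, Tuple[str, List[str]]] = {
    "CVN_CALIBRATION_METHOD": ("calibration", ["none", "isotonic", "platt"]),
    "CVN_TIMEFRAME": ("timeframe", ["5m", "15m", "30m", "1h"]),
    "CVN_TRAIN_WINDOW_MONTHS": ("fold_size", ["6m", "9m", "12m", "18m"]),
    "CVN_MAX_FEATURES": ("n_features", ["top_30", "top_50", "top_100", "full"]),
    "CVN_BINARY_CLASSIFICATION": (
        "classification_mode",
        ["3class", "binary_balanced", "binary_precision"],
    ),
    "CVN_CLASS_BALANCING": ("class_balancing", ["OFF", "ON"]),
    "CVN_HPO_OBJECTIVE": (
        "hpo_objective",
        ["fbeta_buy", "precision_recall_auc", "f1_macro"],
    ),
    "CVN_CUSUM_THRESHOLD_H": ("cusum_threshold", ["2.0", "3.0", "5.0"]),
}


def audit(
    used_env_vars: Iterable[str],
    factors: Dict[str, Tuple[str, List[str]]],
    min_variants: int = 2,
) -> List[str]:
    """Return one message per CVN_* variable that violates the invariant."""
    problems: List[str] = []
    for var in sorted(set(used_env_vars)):
        if not var.startswith("CVN_"):
            continue  # only CVN_* gates are covered by this ADR
        if var not in factors:
            problems.append(f"{var}: no ablation factor")
        elif len(factors[var][1]) < min_variants:
            problems.append(f"{var}: fewer than {min_variants} variants")
    return problems


# A planned variable that has no factor yet is reported.
report = audit(["CVN_TIMEFRAME", "CVN_CUSUM_TRAINING_MODE"], CURRENT_FACTORS)
```

Run in CI, a check along these lines would fail the build whenever a new gate is introduced without its factor.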

Planned (Sprint 1):

| Env var | FTF Factor | Variants |
|---|---|---|
| `CVN_CUSUM_TRAINING_MODE` | `cusum_training_mode` | disabled, relaxed_1_5, legacy_3_0 |
| `CVN_HPO_OBJECTIVE=sortino_net` | (add to `hpo_objective`) | + sortino_net variant |

Alternatives rejected:

- Feature flags in code without env vars: Not testable via FTF, not controllable via Helm.
- Config files instead of env vars: Breaks the single-source-of-truth (Helm ConfigMap).
- Direct code changes without A/B testing: Impossible to attribute performance changes.

Files: src/commun/finetune/ablation_matrix.py, infra/helm/airflow/values-prod.yaml, all pipeline files using os.environ.get()
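
Since the affected pipeline files are identified only as those calling `os.environ.get()`, one possible way to enumerate the gates they use is a simple source scan. This is a sketch under the assumption that gates are always read with a string literal as the first argument. The regex and the helper name are hypothetical, and reads via `os.getenv` or `os.environ[...]` would be missed.

```python
import re
from typing import Set

# Matches os.environ.get("CVN_...") or os.environ.get('CVN_...').
_ENV_GET = re.compile(r"""os\.environ\.get\(\s*["'](CVN_[A-Z0-9_]+)["']""")


def cvn_vars_in_source(source: str) -> Set[str]:
    """Return the CVN_* names read via os.environ.get in a source string."""
    return set(_ENV_GET.findall(source))


# Illustrative input; real usage would read each pipeline file's text.
sample = (
    'tf = os.environ.get("CVN_TIMEFRAME", "5m")\n'
    "cap = os.environ.get('CVN_MAX_FEATURES')\n"
    'home = os.environ.get("HOME")\n'
)
found = cvn_vars_in_source(sample)
```

The resulting set can then be compared against the env vars declared in ablation_matrix.py to spot gates that have no factor.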