ADR-56 — Every Pipeline Change Must Be FTF-Testable (A/B Testable by Design)

Status: Decided (2026-04-14)

Context: The training pipeline has multiple structural changes planned (CUSUM decoupling, class balancing, HPO objective, feature cap, binary classification). Without A/B testing infrastructure, the isolated impact of each change cannot be measured. Historically, changes were applied monolithically, so performance improvements could not be attributed to specific fixes.

Decision: Every pipeline behavior change MUST be controlled by an environment variable and have a corresponding FTF ablation factor, enabling rigorous A/B testing of each change in isolation (ceteris paribus).

Invariants:

- No hardcoded behavior changes: Every pipeline modification that affects model training, feature engineering, labeling, filtering, or HPO MUST be gated by a CVN_* environment variable with the current behavior as default.
- FTF factor required: Every new env var MUST have a corresponding AblationFactor in ablation_matrix.py with at least 2 variants (current baseline + proposed change).
- Baseline preservation: The BASE_ENV in ablation_matrix.py MUST always represent the current production configuration. Changes to BASE_ENV require committee approval (ADR-52).
- One variable at a time: FTF ablation MUST test ONE factor at a time (ceteris paribus). Combined changes are tested only after individual factors are validated.
- Statistical validation: A change is locked (via lock_winner()) only if it shows statistically significant improvement (BH-corrected p < 0.05) on Sortino or the primary trading metric.
- Rollback by env var: Any locked change can be reverted by changing the env var back to baseline in Helm values, without code changes.
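The statistical-validation invariant relies on Benjamini-Hochberg (BH) corrected p-values. The following is a minimal sketch of that correction, not the project's implementation: the ADR does not say how raw p-values are produced or what lock_winner() does internally, and the function name bh_adjust and the factor names and p-values below are hypothetical.

```python
from typing import Dict


def bh_adjust(p_values: Dict[str, float]) -> Dict[str, float]:
    """Return Benjamini-Hochberg adjusted p-values, keyed like the input."""
    m = len(p_values)
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])  # ascending p
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


# Hypothetical raw p-values, one per factor tested in isolation.
raw = {"factor_a": 0.01, "factor_b": 0.04, "factor_c": 0.03, "factor_d": 0.20}
adjusted = bh_adjust(raw)

# Only factors whose adjusted p-value is below 0.05 would be candidates for locking.
lockable = sorted(name for name, p in adjusted.items() if p < 0.05)
print(lockable)
```

In this made-up example, factor_b and factor_c each have a raw p-value below 0.05 but no longer pass after correction, which is the situation the invariant is meant to guard against when several factors are tested in the same campaign.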

Pattern:

# In pipeline code:
import os

if os.environ.get("CVN_MY_NEW_PARAM", "baseline_value") == "new_value":
    ...  # New behavior
else:
    ...  # Current behavior (baseline)

# In ablation_matrix.py:
AblationFactor(
    name="my_new_param",
    factor_type="training",
    category="model",
    description="Description of what changes.",
    env_vars={
        "baseline": {"CVN_MY_NEW_PARAM": "baseline_value"},
        "new_value": {"CVN_MY_NEW_PARAM": "new_value"},
    },
)
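To illustrate how a factor declared this way supports one-factor-at-a-time runs, here is a hedged sketch. The AblationFactor dataclass below is a stand-in that mirrors only the fields shown in the pattern; the real class in ablation_matrix.py may differ. The BASE_ENV contents, CVN_OTHER_PARAM, and the helpers validate_factor and one_factor_runs are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple


@dataclass
class AblationFactor:
    """Stand-in with the fields used in the pattern above (assumed shape)."""
    name: str
    factor_type: str
    category: str
    description: str
    env_vars: Dict[str, Dict[str, str]]


# Illustrative baseline; the real BASE_ENV lives in ablation_matrix.py.
BASE_ENV = {"CVN_MY_NEW_PARAM": "baseline_value", "CVN_OTHER_PARAM": "x"}


def validate_factor(factor: AblationFactor, base_env: Dict[str, str]) -> None:
    """Check the 'at least 2 variants, one of them the baseline' invariant."""
    if len(factor.env_vars) < 2:
        raise ValueError(f"{factor.name}: needs at least 2 variants")
    has_baseline = any(
        all(base_env.get(key) == value for key, value in overrides.items())
        for overrides in factor.env_vars.values()
    )
    if not has_baseline:
        raise ValueError(f"{factor.name}: no variant matches BASE_ENV")


def one_factor_runs(
    base_env: Dict[str, str], factor: AblationFactor
) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (variant, env) pairs that differ from base_env only in this factor."""
    for variant, overrides in factor.env_vars.items():
        yield variant, {**base_env, **overrides}


factor = AblationFactor(
    name="my_new_param",
    factor_type="training",
    category="model",
    description="Description of what changes.",
    env_vars={
        "baseline": {"CVN_MY_NEW_PARAM": "baseline_value"},
        "new_value": {"CVN_MY_NEW_PARAM": "new_value"},
    },
)
validate_factor(factor, BASE_ENV)
runs = dict(one_factor_runs(BASE_ENV, factor))
```

Each generated environment copies BASE_ENV and overrides only the variables owned by the factor under test, so any difference between runs can be attributed to that factor alone.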

Current factors following this pattern:

| Env var | FTF Factor | Variants |
|---|---|---|
| CVN_CALIBRATION_METHOD | calibration | none, isotonic, platt |
| CVN_TIMEFRAME | timeframe | 5m, 15m, 30m, 1h |
| CVN_TRAIN_WINDOW_MONTHS | fold_size | 6m, 9m, 12m, 18m |
| CVN_MAX_FEATURES | n_features | top_30, top_50, top_100, full |
| CVN_BINARY_CLASSIFICATION | classification_mode | 3class, binary_balanced, binary_precision |
| CVN_CLASS_BALANCING | class_balancing | OFF, ON |
| CVN_HPO_OBJECTIVE | hpo_objective | fbeta_buy, precision_recall_auc, f1_macro |
| CVN_CUSUM_THRESHOLD_H | cusum_threshold | 2.0, 3.0, 5.0 |

Planned (Sprint 1):

| Env var | FTF Factor | Variants |
|---|---|---|
| CVN_CUSUM_TRAINING_MODE | cusum_training_mode | disabled, relaxed_1_5, legacy_3_0 |
| CVN_HPO_OBJECTIVE=sortino_net | (add to hpo_objective) | + sortino_net variant |

Alternatives rejected:

- Feature flags in code without env vars: Not testable via FTF, not controllable via Helm.
- Config files instead of env vars: Breaks the single-source-of-truth (Helm ConfigMap).
- Direct code changes without A/B testing: Impossible to attribute performance changes.

Files: src/commun/finetune/ablation_matrix.py, infra/helm/airflow/values-prod.yaml, all pipeline files using os.environ.get()