ADR-56: Every Pipeline Change Must Be FTF-Testable (A/B Testable by Design)
Status: Decided (2026-04-14)
Context: The training pipeline has multiple structural changes planned (CUSUM decoupling, class balancing, HPO objective, feature cap, binary classification). Without A/B testing infrastructure, the isolated impact of each change cannot be measured. Historically, changes were applied monolithically, so performance improvements could not be attributed to specific fixes.
Decision: Every pipeline behavior change MUST be controlled by an environment variable and have a corresponding FTF ablation factor, enabling rigorous A/B testing of each change in isolation (ceteris paribus).
Invariants:
- No hardcoded behavior changes: Every pipeline modification that affects model training, feature engineering, labeling, filtering, or HPO MUST be gated by a CVN_* environment variable with the current behavior as default
- FTF factor required: Every new env var MUST have a corresponding AblationFactor in ablation_matrix.py with at least 2 variants (current baseline + proposed change)
- Baseline preservation: The BASE_ENV in ablation_matrix.py MUST always represent the current production configuration. Changes to BASE_ENV require committee approval (ADR-52)
- One variable at a time: FTF ablation MUST test ONE factor at a time (ceteris paribus). Combined changes are tested only after individual factors are validated
- Statistical validation: A change is locked (via lock_winner()) only if it shows a statistically significant improvement (Benjamini-Hochberg-corrected p < 0.05) on the Sortino ratio or the primary trading metric
- Rollback by env var: Any locked change can be reverted by changing the env var back to baseline in Helm values, without code changes
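
To make the statistical-validation invariant concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. It is not the project's `lock_winner()` implementation. The function names `bh_adjust` and `significant_factors` are hypothetical, and it assumes one raw p-value per tested factor forms the correction family, which this ADR does not specify.

```python
from typing import Sequence


def bh_adjust(p_values: Sequence[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_factors(p_by_factor: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Names of factors whose adjusted p-value is below alpha (hypothetical helper)."""
    names = list(p_by_factor)
    adjusted = bh_adjust([p_by_factor[name] for name in names])
    return [name for name, q in zip(names, adjusted) if q < alpha]
```

With raw p-values of 0.01, 0.2 and 0.04 for three factors, the adjusted values are 0.03, 0.2 and 0.06, so only the first factor would qualify for locking even though the third is below 0.05 before correction.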
Pattern:

    # In pipeline code:
    import os

    if os.environ.get("CVN_MY_NEW_PARAM", "baseline_value") == "new_value":
        ...  # New behavior
    else:
        ...  # Current behavior (baseline)

    # In ablation_matrix.py:
    AblationFactor(
        name="my_new_param",
        factor_type="training",
        category="model",
        description="Description of what changes.",
        env_vars={
            "baseline": {"CVN_MY_NEW_PARAM": "baseline_value"},
            "new_value": {"CVN_MY_NEW_PARAM": "new_value"},
        },
    )

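
One caveat with a plain equality check is that a mistyped value in Helm silently falls through to the baseline branch. A possible way to guard against that is a small reader that rejects unknown values. This is a sketch only: `read_variant` and its `env` parameter are hypothetical and not part of the existing pipeline.

```python
import os
from typing import Mapping, Optional


def read_variant(
    name: str,
    baseline: str,
    allowed: set[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Read a CVN_* variable, defaulting to the baseline and rejecting unknown values."""
    # `env` allows injecting a mapping in tests; os.environ is used otherwise.
    source = os.environ if env is None else env
    value = source.get(name, baseline)
    if value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return value
```

Pipeline code could then branch on `read_variant("CVN_MY_NEW_PARAM", "baseline_value", {"baseline_value", "new_value"})`, keeping the current behavior as the default while failing loudly on typos.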
Current factors following this pattern:
| Env var | FTF Factor | Variants |
|---|---|---|
| CVN_CALIBRATION_METHOD | calibration | none, isotonic, platt |
| CVN_TIMEFRAME | timeframe | 5m, 15m, 30m, 1h |
| CVN_TRAIN_WINDOW_MONTHS | fold_size | 6m, 9m, 12m, 18m |
| CVN_MAX_FEATURES | n_features | top_30, top_50, top_100, full |
| CVN_BINARY_CLASSIFICATION | classification_mode | 3class, binary_balanced, binary_precision |
| CVN_CLASS_BALANCING | class_balancing | OFF, ON |
| CVN_HPO_OBJECTIVE | hpo_objective | fbeta_buy, precision_recall_auc, f1_macro |
| CVN_CUSUM_THRESHOLD_H | cusum_threshold | 2.0, 3.0, 5.0 |
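
The one-variable-at-a-time rule can be illustrated by expanding factors into run environments that each differ from the baseline in a single factor. The sketch below is an assumption-laden illustration, not the contents of ablation_matrix.py: `FactorSketch` is a hypothetical stand-in for `AblationFactor`, the two-entry baseline is a toy, and the env values are assumed to equal the variant names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactorSketch:
    """Hypothetical stand-in for AblationFactor; only the fields used here."""
    name: str
    env_vars: dict[str, dict[str, str]]  # variant name -> env overrides


def one_factor_runs(base_env, factors):
    """Build (factor, variant, env) runs, each changing one factor from base_env."""
    runs = []
    for factor in factors:
        if len(factor.env_vars) < 2:
            raise ValueError(f"factor {factor.name!r} needs at least 2 variants")
        for variant, overrides in factor.env_vars.items():
            # Start from the baseline and apply only this variant's overrides.
            runs.append((factor.name, variant, {**base_env, **overrides}))
    return runs


# Toy baseline and factors; the values are assumptions for illustration.
BASE_ENV_SKETCH = {"CVN_CLASS_BALANCING": "OFF", "CVN_CALIBRATION_METHOD": "none"}
FACTORS_SKETCH = [
    FactorSketch("class_balancing", {
        "OFF": {"CVN_CLASS_BALANCING": "OFF"},
        "ON": {"CVN_CLASS_BALANCING": "ON"},
    }),
    FactorSketch("calibration", {
        "none": {"CVN_CALIBRATION_METHOD": "none"},
        "isotonic": {"CVN_CALIBRATION_METHOD": "isotonic"},
        "platt": {"CVN_CALIBRATION_METHOD": "platt"},
    }),
]
```

Calling `one_factor_runs(BASE_ENV_SKETCH, FACTORS_SKETCH)` yields five runs here, and no run changes more than one variable relative to the baseline.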
Planned (Sprint 1):
| Env var | FTF Factor | Variants |
|---|---|---|
| CVN_CUSUM_TRAINING_MODE | cusum_training_mode | disabled, relaxed_1_5, legacy_3_0 |
| CVN_HPO_OBJECTIVE=sortino_net | (add to hpo_objective) | + sortino_net variant |
Alternatives rejected:
- Feature flags in code without env vars: Not testable via FTF, not controllable via Helm
- Config files instead of env vars: Breaks the single-source-of-truth (Helm ConfigMap)
- Direct code changes without A/B testing: Impossible to attribute performance changes
Files: src/commun/finetune/ablation_matrix.py, infra/helm/airflow/values-prod.yaml, all pipeline files using os.environ.get()