ADR-58: Every FTF Factor Must Have a Guardrail
Status: Decided (2026-04-15)
Context: Two FTF sessions were lost (21h compute, ~€8.50) because a new factor (cusum_training_mode) was added without guardrails. The misconfiguration ran undetected for hours. Every factor introduces a parameter space that can produce unexpected behavior — without guardrails, the only detection mechanism is a human noticing something is wrong hours later.
Decision: Every FTF ablation factor MUST have an associated guardrail in the pre-flight check. When existing factors are modified (new variants, changed thresholds, renamed env vars), their guardrails MUST be reviewed and updated.
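One way to make the rule that every factor has a guardrail mechanically checkable is a registry that the pre-flight check compares against the factor list. The sketch below is illustrative only: `ABLATION_FACTORS`, `GUARDRAILS`, the `guardrail` decorator, and `factors_without_guardrail` are hypothetical names, and the variant names are placeholders rather than the real contents of ablation_matrix.py.

```python
from typing import Callable, Dict, Iterable, List, Mapping

# Placeholder factor table; the real one lives in ablation_matrix.py.
# Variant names here are invented for illustration.
ABLATION_FACTORS: Dict[str, List[str]] = {
    "cusum_training_mode": ["variant_a", "variant_b"],
    "hypothetical_new_factor": ["variant_a", "variant_b"],
}

# A guardrail receives the run's environment and raises if the run must be blocked.
Guardrail = Callable[[Mapping[str, str]], None]
GUARDRAILS: Dict[str, Guardrail] = {}


def guardrail(factor: str) -> Callable[[Guardrail], Guardrail]:
    """Register a guardrail function for one factor (hypothetical helper)."""
    def register(fn: Guardrail) -> Guardrail:
        GUARDRAILS[factor] = fn
        return fn
    return register


@guardrail("cusum_training_mode")
def check_cusum_training_mode(env: Mapping[str, str]) -> None:
    """Placeholder body; a real check would validate the factor's env vars."""


def factors_without_guardrail(
    factors: Iterable[str], guardrails: Iterable[str]
) -> List[str]:
    """Return the factors that have no registered guardrail, sorted by name."""
    return sorted(set(factors) - set(guardrails))


missing = factors_without_guardrail(ABLATION_FACTORS, GUARDRAILS)
print(missing)  # ['hypothetical_new_factor']
```

A pre-flight check or CI test could fail whenever this list is non-empty, so that a PR adding a factor without its guardrail cannot pass.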
Invariants:
- New factor: Adding a factor to ablation_matrix.py MUST include a corresponding check in src/commun/finetune/guardrails.py. The PR adding the factor MUST include the guardrail. No factor without guardrail.
- Modified factor: Changing variants, env vars, or thresholds of an existing factor MUST trigger a review of its guardrail. The PR MUST update the guardrail if the change affects sample count, feature count, memory, or runtime.
- Guardrail behavior: Each guardrail MUST follow the "block defaults, allow explicit" principle (ADR-56). Accidental misconfiguration is blocked. Explicit FTF choices are allowed with logging.
- Guardrail test: Each guardrail MUST have at least one integration test verifying it blocks the accidental case AND allows the explicit case.
- Guardrail thresholds: Thresholds (sample count, memory, time) MUST be documented in documentation/architecture/FTF_GUARDRAILS.md and reviewed quarterly or after any incident.
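The "block defaults, allow explicit" invariant can be read in more than one way. The sketch below shows one possible reading, not the actual contents of guardrails.py: a value above a threshold is blocked unless the run carries an explicit opt-in, and an allowed override is logged. The env var names `FTF_SAMPLE_COUNT` and `FTF_ALLOW_LARGE_SAMPLE_COUNT`, the threshold value, and `GuardrailError` are all assumptions made for illustration.

```python
import logging
from typing import Mapping

logger = logging.getLogger("ftf.guardrails")

# Hypothetical threshold; real values belong in FTF_GUARDRAILS.md.
MAX_SAMPLES_WITHOUT_OPT_IN = 100_000


class GuardrailError(RuntimeError):
    """Raised when the pre-flight check blocks a run."""


def check_sample_count(env: Mapping[str, str]) -> int:
    """Block large sample counts unless the run opts in explicitly."""
    raw = env.get("FTF_SAMPLE_COUNT")  # hypothetical env var
    if raw is None:
        raise GuardrailError(
            "FTF_SAMPLE_COUNT is not set; refusing to rely on an implicit default"
        )
    count = int(raw)
    if count <= MAX_SAMPLES_WITHOUT_OPT_IN:
        return count
    # Above the threshold: only an explicit opt-in lets the run proceed.
    if env.get("FTF_ALLOW_LARGE_SAMPLE_COUNT") != "1":  # hypothetical env var
        raise GuardrailError(
            f"sample count {count} exceeds {MAX_SAMPLES_WITHOUT_OPT_IN}; "
            "set FTF_ALLOW_LARGE_SAMPLE_COUNT=1 to confirm this is intended"
        )
    logger.warning(
        "Explicit FTF override: sample count %d exceeds %d",
        count,
        MAX_SAMPLES_WITHOUT_OPT_IN,
    )
    return count
```

Raising rather than warning matches the decision to block accidental misconfigurations, while the logged override leaves a trace of every deliberate exception.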
Checklist for adding a new factor:
1. [ ] Factor defined in ablation_matrix.py (ADR-56)
2. [ ] Env var(s) with clear semantics (one meaning per var)
3. [ ] Guardrail in guardrails.py:
   - [ ] Check for accidental high/low values
   - [ ] Block on default, allow on explicit
   - [ ] Memory estimate if factor affects data size
4. [ ] Integration test:
   - [ ] Test: guardrail blocks accidental case
   - [ ] Test: guardrail allows explicit FTF case
5. [ ] Documentation in documentation/architecture/FTF_GUARDRAILS.md
6. [ ] Committee review if factor introduces new risk category
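Checklist items 3 and 4 can be sketched together as a memory-estimate guardrail with the two required tests, one for the accidental case and one for the explicit case. This is a minimal illustration under stated assumptions: the 8 GiB budget, the dense float64 size estimate, the `FTF_ALLOW_HIGH_MEMORY` env var, and all function names are hypothetical and are not taken from guardrails.py.

```python
import unittest
from typing import Mapping

MEMORY_LIMIT_BYTES = 8 * 1024**3  # hypothetical 8 GiB budget


class GuardrailError(RuntimeError):
    """Raised when the pre-flight check blocks a run."""


def estimate_memory_bytes(
    n_samples: int, n_features: int, bytes_per_value: int = 8
) -> int:
    """Rough size of a dense float64 matrix; ignores copies and overhead."""
    return n_samples * n_features * bytes_per_value


def check_memory(n_samples: int, n_features: int, env: Mapping[str, str]) -> int:
    """Block runs whose estimate exceeds the budget unless explicitly allowed."""
    estimate = estimate_memory_bytes(n_samples, n_features)
    allowed = env.get("FTF_ALLOW_HIGH_MEMORY") == "1"  # hypothetical env var
    if estimate > MEMORY_LIMIT_BYTES and not allowed:
        raise GuardrailError(
            f"estimated {estimate} bytes exceeds limit {MEMORY_LIMIT_BYTES}"
        )
    return estimate


class MemoryGuardrailTest(unittest.TestCase):
    def test_blocks_accidental_case(self) -> None:
        # No opt-in present: the oversized run must be rejected.
        with self.assertRaises(GuardrailError):
            check_memory(50_000_000, 100, env={})

    def test_allows_explicit_case(self) -> None:
        # Explicit opt-in: the same run passes and reports its estimate.
        estimate = check_memory(
            50_000_000, 100, env={"FTF_ALLOW_HIGH_MEMORY": "1"}
        )
        self.assertEqual(estimate, 40_000_000_000)
```

Saved as a test module, this can be run with `python -m unittest`. Pairing the two tests guards against both failure directions: a guardrail that never blocks and one that blocks legitimate FTF runs.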
Checklist for modifying an existing factor:
1. [ ] Review guardrail thresholds — still appropriate?
2. [ ] Review guardrail logic — does the check still match the factor behavior?
3. [ ] Update integration test if variants or env vars changed
4. [ ] Update documentation/architecture/FTF_GUARDRAILS.md if thresholds changed
Alternatives rejected:
- Guardrails optional per factor: Leads to gaps. The incident proved that "we'll add it later" means "never".
- Guardrails as warnings only: Warnings are ignored under time pressure. Blocking is the only reliable mechanism for accidental misconfigurations.
- Central guardrail without per-factor checks: Too coarse. Each factor has different risk profiles (memory, time, sample count).
Files: src/commun/finetune/guardrails.py, src/commun/finetune/ablation_matrix.py, documentation/architecture/FTF_GUARDRAILS.md, tests/unit/test_guardrails.py