ADR-0086 - CI tier promotion gate : any-failure-blocks for foundation Epic
Status: accepted (committee plan_review session 94aa2881 PASSED 2026-05-06 ; ratified by the same gate as the S03 architecture dossier ; operator decision C on wp#118)
Date: 2026-05-06
Introduced by: CVN-N015-EA-S03 / GH issue #838 / OP wp#118
Companion document: documentation/reviews/2026-05-06-cvn-n015-ea-s03-architecture-plan.md §1 row C + §4
Context
The CVN-N015 strategy (S01 §3 cadence matrix) defines 3 CI tiers :
- fast : unit + property + contract, p95 ≤ 2 min
- medium / integration : cache + integration + dag_smoke, p95 ≤ 10 min
- nightly : data_quality + performance + system_e2e, p95 ≤ 30 min (populated by Epics EB-EI)
Each tier promotes to the next. The semantics of that promotion gate matter :
- strict (any-failure-blocks) : a single failing test blocks merge / deploy / LOCK / Story closure ; no thresholds, no exceptions
- threshold-based : the tier passes if N % of tests pass (e.g., 95 %) ; flakes don't block, but the threshold is a soft signal that quietly drifts
Threshold-based gates are tempting for stochastic suites (ML behaviour, system-E2E with real-world non-determinism). But for the foundation Epic (unit / property / contract / cache / integration / DAG smoke) the suite is deterministic by construction. A failing test means a real bug ; a flake under -n auto means a fixture isolation bug (per ADR-0085's discipline rule). Threshold-based gates introduce an "is it a flake or a regression" ambiguity that quietly drifts into "the suite is yellow but we ship anyway", which is exactly the discipline drift the strategy doc was created to prevent.
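The difference between the two semantics can be sketched in a few lines. This is a minimal illustration, not the actual CI implementation : the function names `strict_gate` and `threshold_gate` and the `min_pass_ratio` parameter are hypothetical, and the 95 % value is only the example figure quoted above.

```python
from typing import Sequence


def strict_gate(outcomes: Sequence[bool]) -> bool:
    """any-failure-blocks : the tier passes only if every test passed."""
    return all(outcomes)


def threshold_gate(outcomes: Sequence[bool], min_pass_ratio: float = 0.95) -> bool:
    """Threshold-based : the tier passes if enough tests passed.

    min_pass_ratio is a hypothetical parameter ; 0.95 mirrors the
    "e.g., 95 %" example in the text.
    """
    if not outcomes:
        return True  # an empty tier has nothing that can fail
    return sum(outcomes) / len(outcomes) >= min_pass_ratio


# 100 deterministic tests, 3 of them failing : a 97 % pass rate.
outcomes = [True] * 97 + [False] * 3

print(strict_gate(outcomes))     # False : three real bugs block the tier
print(threshold_gate(outcomes))  # True  : the same three bugs ship
```

On a deterministic suite the three failures are real bugs by definition, so the threshold gate's green result carries no information the strict gate's red result does not already contradict.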
Decision
Foundation Epic CI tiers use any-failure-blocks semantics. A failure in any test in the fast tier blocks merge and blocks the medium tier from running on the same SHA. A failure in the medium tier blocks the nightly tier. The gate is strict : no thresholds and no "5 % failure tolerated" exceptions in the foundation Epic.
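A minimal sketch of that promotion chain, assuming each tier can be modelled as a callable that returns one boolean per test. The names `TIER_ORDER`, `run_promotion` and `fake_runner` are hypothetical and do not come from the workflow definitions.

```python
from typing import Callable, Dict, List

# Promotion order for a single SHA (tier names taken from the cadence matrix).
TIER_ORDER = ("fast", "medium", "nightly")


def run_promotion(runners: Dict[str, Callable[[], List[bool]]]) -> Dict[str, str]:
    """Run tiers in order ; the first failing tier blocks every later tier."""
    status: Dict[str, str] = {}
    blocked = False
    for tier in TIER_ORDER:
        if blocked:
            status[tier] = "blocked"  # never executed on this SHA
            continue
        outcomes = runners[tier]()
        if all(outcomes):
            status[tier] = "passed"
        else:
            status[tier] = "failed"
            blocked = True
    return status


executed: List[str] = []


def fake_runner(tier: str, outcomes: List[bool]) -> Callable[[], List[bool]]:
    """Stand-in for a real tier run ; records that the tier was executed."""
    def run() -> List[bool]:
        executed.append(tier)
        return outcomes
    return run


status = run_promotion({
    "fast": fake_runner("fast", [True, True, False]),
    "medium": fake_runner("medium", [True]),
    "nightly": fake_runner("nightly", [True]),
})
print(status)    # {'fast': 'failed', 'medium': 'blocked', 'nightly': 'blocked'}
print(executed)  # ['fast']
```

Passing runners as callables rather than precomputed results makes the "blocked from running" part explicit : a blocked tier is never invoked.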
Real flakes are handled out-of-band by the F5 flaky-test detector (S02 F5) :
- A test that flips PASSED/FAILED on the same SHA on rerun is auto-flagged
- A test with > 5 % flake rate over the last 7 days gets an automatic GH issue
- Infra-flakes (image pull failure, network timeout) are caught by the pytest-rerunfailures config (1 retry at the workflow level) and are handled separately from test-code flakes
- Flakes get a dedicated bug fix path, NOT a graceful gate
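The first two rules can be sketched as follows. This is an illustration only and not the F5 implementation : `RunRecord`, `flipping_tests`, `flake_rate` and `needs_issue` are hypothetical names, and the flake rate is assumed here to mean "share of SHAs in the window on which the test flipped", since the exact definition lives in S02 F5.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple

FLAKE_RATE_LIMIT = 0.05     # "> 5 % flake rate"
WINDOW = timedelta(days=7)  # "over the last 7 days"


@dataclass(frozen=True)
class RunRecord:
    """One recorded outcome of one test on one SHA (hypothetical record shape)."""
    test_id: str
    sha: str
    passed: bool
    finished_at: datetime


def _outcomes_by_test_and_sha(runs: List[RunRecord]) -> Dict[Tuple[str, str], Set[bool]]:
    grouped: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for run in runs:
        grouped[(run.test_id, run.sha)].add(run.passed)
    return grouped


def flipping_tests(runs: List[RunRecord]) -> Set[str]:
    """Tests that both PASSED and FAILED on the same SHA."""
    grouped = _outcomes_by_test_and_sha(runs)
    return {test_id for (test_id, _sha), seen in grouped.items() if len(seen) == 2}


def flake_rate(runs: List[RunRecord], test_id: str, now: datetime) -> float:
    """Assumed definition : flipped SHAs / observed SHAs within the window."""
    recent = [r for r in runs if r.test_id == test_id and now - r.finished_at <= WINDOW]
    grouped = _outcomes_by_test_and_sha(recent)
    if not grouped:
        return 0.0
    flipped = sum(1 for seen in grouped.values() if len(seen) == 2)
    return flipped / len(grouped)


def needs_issue(runs: List[RunRecord], test_id: str, now: datetime) -> bool:
    return flake_rate(runs, test_id, now) > FLAKE_RATE_LIMIT


now = datetime(2026, 5, 6, 12, 0)
yesterday = now - timedelta(days=1)
TTL = "tests/test_cache.py::test_ttl"      # hypothetical test ids
SMOKE = "tests/test_dag.py::test_smoke"
runs = [
    RunRecord(TTL, "a1", True, yesterday),
    RunRecord(TTL, "a1", False, yesterday),  # same SHA, different outcome
    RunRecord(TTL, "b2", True, yesterday),
    RunRecord(SMOKE, "a1", True, yesterday),
]
print(flipping_tests(runs))         # {'tests/test_cache.py::test_ttl'}
print(flake_rate(runs, TTL, now))   # 0.5
print(needs_issue(runs, SMOKE, now))  # False
```

Nothing in this sketch touches the gate : the detector only reports, which is what keeps flake handling out-of-band.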
Threshold-based gates have their place in OTHER Epics :
- ML behaviour (Epic EE) : drift detection is inherently noisy, so a threshold-based gate is appropriate
- system-E2E (Epic EG) : real-world non-determinism (network jitter, broker spread variance) makes a threshold-based gate with an explicit @pytest.mark.allow_threshold marker appropriate
Each Epic picks its own promotion semantics in its own architecture Story. This ADR locks only the foundation Epic at strict.
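One way to keep the allow_threshold marker out of the foundation suites is a collection-time guard in a conftest.py. The sketch below is an assumption, not something this ADR mandates : the guard itself, the helper `find_threshold_markers` and the error wording are hypothetical. It relies only on standard pytest hooks (`pytest_configure`, `pytest_collection_modifyitems`) and on `item.get_closest_marker`.

```python
# conftest.py for the foundation Epic test directories (hypothetical placement)

MARKER = "allow_threshold"


def pytest_configure(config):
    # Register the marker so that --strict-markers recognises it.
    config.addinivalue_line(
        "markers",
        f"{MARKER}: test is judged by a threshold-based gate (Epics EE / EG only)",
    )


def find_threshold_markers(items):
    """Return the node ids of collected tests carrying the marker."""
    return [item.nodeid for item in items if item.get_closest_marker(MARKER) is not None]


def pytest_collection_modifyitems(config, items):
    offenders = find_threshold_markers(items)
    if offenders:
        # Imported lazily so the helper above stays importable without pytest.
        import pytest

        # Abort the run : the foundation Epic is locked at strict.
        raise pytest.UsageError(
            f"{MARKER} is not allowed in the foundation Epic : " + ", ".join(offenders)
        )
```

Failing at collection time, rather than at gate evaluation, means a stray marker is reported before any test runs.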
xdist parallelism contract : every workflow runs pytest -n auto. A failure under -n auto that doesn't reproduce under -n 1 is a fixture isolation bug per ADR-0085 ; it is reported AND blocks the merge until fixed.
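The classification rule in the xdist contract can be written down as a small decision function. This is a sketch under stated assumptions : the `Verdict` enum and the `classify` and `blocks_merge` helpers are hypothetical names, and the two booleans stand for the outcome of the same test under `pytest -n auto` and under a `pytest -n 1` rerun.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    REAL_FAILURE = "real failure"
    FIXTURE_ISOLATION_BUG = "fixture isolation bug"


def classify(passed_parallel: bool, passed_serial: bool) -> Verdict:
    """Classify one test from its -n auto and -n 1 outcomes."""
    if passed_parallel:
        return Verdict.PASS  # green under -n auto : nothing to classify
    if passed_serial:
        # Fails only under parallel execution : isolation bug per ADR-0085.
        return Verdict.FIXTURE_ISOLATION_BUG
    return Verdict.REAL_FAILURE


def blocks_merge(verdict: Verdict) -> bool:
    """Strict gate : every verdict other than PASS blocks the merge."""
    return verdict is not Verdict.PASS


verdict = classify(passed_parallel=False, passed_serial=True)
print(verdict.value)          # fixture isolation bug
print(blocks_merge(verdict))  # True
```

Both failure verdicts block the merge ; the classification only changes which fix path the bug report follows.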
Consequences
Positive :
- Strict gates make every failure load-bearing — no "yellow suite, ship anyway" drift
- The flaky-test detector F5 has a clear single responsibility (catch real flakes, file issues) without competing with the gate semantics
- Sets the right pattern for Epic EE / EG to adopt threshold-based intentionally, with explicit allow_threshold markers — no sneaking in
- Operator decisions remain interpretable : a green CI = real signal, a red CI = real bug
Negative / risks :
- Strict + transient infra failure (e.g., GitHub Actions cache corruption, ephemeral DNS issue) could block a merge unnecessarily ; mitigated by pytest-rerunfailures 1-retry at the workflow level for infra-flakes only
- With a single operator there is no second pair of eyes if a transient failure is misclassified as a real bug ; mitigated by the F5 auto-issue + the operator's ability to override with an explicit [skip-strict] PR label (the override is NOT in scope of this ADR ; it would be an amendment Story if the mechanism becomes necessary)
Cross-references :
- S01 strategy doc §3 cadence matrix + §4 gate hierarchy table
- S03 architecture dossier §1 row C + §4 CI tier mapping
- S02 F5 flaky-test detector (the out-of-band flake handling)
- ADR-0084 foundation test stack pick (pytest-rerunfailures for infra-flakes)
- ADR-0085 fixture scope discipline (the -n auto failure rule that catches isolation bugs as bugs, not flakes)
- ADR-77 (MkDocs SSoT)