ADR-0083 — Test taxonomy + gate hierarchy
Status: accepted (committee plan_review session 53d76f0f PASSED 2026-05-05 ; ratified together with the companion strategy doc per the same gate ; operator decisions on wp#116 : A=b, B=b, C=b, D=b, E=c, F=a)
Date: 2026-05-05
Introduced by: CVN-N015-EA-S01 / GH issue #836 / OP wp#116
Supersedes: none — first formal ADR on test taxonomy. Companion document : documentation/strategy/CVN-N015-test-strategy.md.
Context
CVNTrade ships ML-based crypto trading on a single-operator setup. Test discipline today is informal :
- pytest runs the unit + integration suites, with no formal taxonomy
- A few smoke checks live ad-hoc inside DAG launchers (#592 architectural review surfaced this)
- Data quality is mostly visual via Grafana ; no Great Expectations suite codifies the contract
- ML behaviour (drift / fairness / perturbation robustness) has no automated coverage outside the FTF sweep
- "Integration" and "system-E2E" are used interchangeably — Story scoping arguments routinely re-derive them
CVN-N015 (#592 the umbrella Need ; 9 Epics EA-EI) formalises the validation pyramid. The forcing function for THIS ADR : without a canonical taxonomy at the head of the umbrella, every Epic risks re-deriving conflicting definitions ("contract" in ED meaning schema, "contract" in EH meaning Pact-style consumer-driven). Operator-driven decisions on cadence + gate hierarchy + ownership lock the scope BEFORE downstream Stories spend cycles re-debating it.
The 2026-05-05 operator decision session (wp#116 comment 701) locked 6 axes : 16 → 12 types (drop chaos / exploratory / load / regression) ; PR-only fast tier ; tiered gates ; explicit canonical performance budgets ; hybrid UAT (Markdown backend + Playwright Console) ; single DRI honest re : single-operator org.
Decision
Adopt the 12-test-type taxonomy + tiered gate hierarchy + canonical performance budgets defined in documentation/strategy/CVN-N015-test-strategy.md as the canonical reference for testing in CVNTrade.
The 12 types (in scope) :
- unit — pure-function logic, no I/O
- property — Hypothesis-style invariants
- contract — schema + boundary contracts
- cache — L1/L2/L3 cache key + invalidation correctness
- integration — multi-component flows in-process
- DAG smoke — Airflow dag.test() per DAG
- data quality — Great Expectations OSS suites on production-like data
- ML behaviour — Evidently OSS + Giskard on candidate models
- performance — p95/p99 budgets per code path
- system-E2E — paper-kernel + kill-switch + risk gates with Testcontainers
- UAT — operator-driven, hybrid Markdown + Playwright
- post-deploy smoke — k8s liveness + 1-prediction call + Grafana panel populated
Out-of-scope (explicit, with rationale) : chaos (no SRE team), exploratory (informal already), load (throughput / saturation under sustained concurrent traffic — distinct from performance p95/p99 per-call latency ; deferred in v1, re-introduced when workload regime warrants ; see strategy doc §9 row load for the conditions), regression (= unit with regression marker). See strategy doc §9.
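The closed 12-type list and the four dropped types can be encoded so that a mislabelled or out-of-scope suite fails loudly. The sketch below is illustrative only: the names SuiteType, OUT_OF_SCOPE and classify are hypothetical and do not come from the strategy doc, and the snake_case identifiers are an assumed normalisation of the labels above.

```python
from enum import Enum


class SuiteType(str, Enum):
    """The 12 in-scope test types, in the order this ADR lists them."""

    UNIT = "unit"
    PROPERTY = "property"
    CONTRACT = "contract"
    CACHE = "cache"
    INTEGRATION = "integration"
    DAG_SMOKE = "dag_smoke"
    DATA_QUALITY = "data_quality"
    ML_BEHAVIOUR = "ml_behaviour"
    PERFORMANCE = "performance"
    SYSTEM_E2E = "system_e2e"
    UAT = "uat"
    POST_DEPLOY_SMOKE = "post_deploy_smoke"


# Types explicitly dropped by this ADR (hypothetical constant name).
OUT_OF_SCOPE = frozenset({"chaos", "exploratory", "load", "regression"})


def classify(label: str) -> SuiteType:
    """Map a human-readable label to a canonical type, or raise ValueError."""
    # Assumed normalisation: "system-E2E" -> "system_e2e", "DAG smoke" -> "dag_smoke".
    key = label.strip().lower().replace("-", "_").replace(" ", "_")
    if key in OUT_OF_SCOPE:
        raise ValueError(
            f"{label!r} is out of scope per ADR-0083; amend the ADR to reintroduce it"
        )
    try:
        return SuiteType(key)
    except ValueError:
        raise ValueError(
            f"{label!r} is not one of the 12 canonical test types"
        ) from None
```

A check of this kind could run in CR or at test collection time. Where it would actually live is not decided by this ADR.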
Cadence (tiered) : fast tier on PRs touching code only ; medium tier on PRs that touch the relevant subsystem ; slow tier nightly OR per Story closure OR pre-deploy. No nightly safety net for fast tier (main protected by PR gates).
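One possible reading of the cadence rule, as a minimal sketch. Trigger and tiers_to_run are hypothetical names. The sketch assumes that the fast tier runs on any PR that touches code and on no other trigger, and that the slow-tier triggers run the slow tier alone. The mapping of test types to tiers is left to the strategy doc and is not reproduced here.

```python
from enum import Enum


class Trigger(Enum):
    """Events that can start a test run (hypothetical names)."""

    PULL_REQUEST = "pull_request"
    NIGHTLY = "nightly"
    STORY_CLOSURE = "story_closure"
    PRE_DEPLOY = "pre_deploy"


def tiers_to_run(
    trigger: Trigger,
    *,
    touches_code: bool = False,
    touches_relevant_subsystem: bool = False,
) -> set[str]:
    """Return the tiers a given trigger should execute."""
    if trigger is Trigger.PULL_REQUEST:
        tiers: set[str] = set()
        if touches_code:
            tiers.add("fast")
        if touches_relevant_subsystem:
            tiers.add("medium")
        return tiers
    # Nightly, Story closure and pre-deploy run the slow tier.
    # The fast tier is deliberately absent: no nightly safety net for it.
    return {"slow"}
```

Under these assumptions a docs-only PR runs no tier at all, and a nightly run never re-executes the fast tier.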
Gate hierarchy (tiered, NOT strict) : critical (unit + property + contract + integration + DAG smoke) blocks merge ; data quality blocks deploy ; ML behaviour + performance block LOCK ; UAT blocks Story closure ; post-deploy smoke triggers rollback. See strategy doc §4 matrix.
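The tiered (non-strict) hierarchy can be illustrated as a lookup from failing test types to the transitions they block. This is a sketch, not the §4 matrix: GATES, ROLLBACK_TRIGGERS and both function names are hypothetical, and only the types named in the paragraph above are mapped (cache and system-E2E are left to the strategy doc).

```python
# Transition -> test types whose failure blocks it, as listed in this ADR.
GATES = {
    "merge": frozenset(
        {"unit", "property", "contract", "integration", "dag_smoke"}
    ),
    "deploy": frozenset({"data_quality"}),
    "lock": frozenset({"ml_behaviour", "performance"}),
    "story_closure": frozenset({"uat"}),
}

# Post-deploy smoke does not block a transition; it triggers a rollback.
ROLLBACK_TRIGGERS = frozenset({"post_deploy_smoke"})


def blocked_transitions(failed):
    """Return the transitions blocked by the given failing test types."""
    failed = set(failed)
    return {name for name, gate in GATES.items() if gate & failed}


def should_roll_back(failed):
    """Return True when a failing type is a rollback trigger."""
    return bool(ROLLBACK_TRIGGERS & set(failed))
```

For example, blocked_transitions({"performance"}) yields only {"lock"}: a slow performance failure holds back LOCK but leaves merge open, which is the point of tiered gating.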
Performance budgets (canonical) : inference_p99 < 200ms, enrichment_p95 < 50ms, FTF cell p95 < 60s, train DAG run < 4h. See strategy doc §5 full table.
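A budget check of this kind reduces to computing a percentile over measured durations and comparing it with the canonical limit. The sketch below assumes the nearest-rank percentile definition, which the strategy doc may or may not use. BUDGETS_SECONDS, percentile and within_budget are hypothetical names, as are the keys ftf_cell_p95 and train_dag_run.

```python
import math

# Canonical budgets from this ADR, converted to seconds.
BUDGETS_SECONDS = {
    "inference_p99": 0.200,       # < 200ms
    "enrichment_p95": 0.050,      # < 50ms
    "ftf_cell_p95": 60.0,         # < 60s (hypothetical key name)
    "train_dag_run": 4 * 3600.0,  # < 4h (hypothetical key name)
}


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sequence of durations."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Smallest rank such that at least pct% of the samples are <= that value.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def within_budget(name, observed_seconds):
    """Budgets are strict upper bounds: the observed value must be below them."""
    return observed_seconds < BUDGETS_SECONDS[name]
```

A performance test could then assert within_budget("inference_p99", percentile(latencies, 99)) over a batch of measured inference calls.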
UAT format (hybrid) : Markdown scenarios for backend flows + Playwright recordings for Console UI flows. Operator-driven, NOT CI-automated.
Test-type ownership : single DRI @dococeven for all 12 types (honest re : single-operator org). See strategy doc §7 — table is the natural place to introduce per-type ownership when team grows.
Invariants
- Single source of truth : documentation/strategy/CVN-N015-test-strategy.md is the SSoT (per ADR-77 docs SSoT). Discrepancies between this ADR's invariant list and the strategy doc → strategy doc wins ; this ADR is the binding contract on the rules, the strategy doc is the operational reference.
- No silent re-derivation : downstream Stories (S02-S09 and beyond) MUST reference the strategy doc to scope their own deliverables. A Story that re-derives a contradictory taxonomy MUST either amend this ADR (separate Story) OR scope down. CR catches drift via the docs site strict-mode check.
- 12 types, no growth without ADR amendment : adding a 13th type requires amending this ADR (raise an amendment Story). Removing a type requires the same. The 12-list is closed by design.
- Tiered gates, not strict : the gate hierarchy table (strategy doc §4) is the canonical bind. Strict-mode gating (every test blocks every transition) is rejected as anti-pattern.
- Canonical performance budgets : the budget table (strategy doc §5) is the bind. New code paths get a budget entry before they merge ; budget changes require a budget Story (NOT a one-off PR adjustment).
- UAT is operator-driven, not CI : tests/uat/scenarios/*.md + tests/uat/playwright/ are run by the operator before Story closure. CI does NOT execute them.
- Test-type ownership row updated when team grows : when the second engineer joins, the strategy doc §7 ownership table is updated in the same PR that onboards them. Single-DRI is the honest current state, not a long-term invariant.
- Out-of-scope types stay out : chaos / exploratory / load / regression do NOT get reintroduced informally. Reintroducing requires amending this ADR.
Alternatives rejected
- All 16 types in scope (status quo industry-default) — would re-introduce theatre (chaos with no SRE team to operate, UAT-as-CI-attempt for a single operator, load testing as a duplicate of performance). Operator decision A=b explicitly trims to the realistic 12.
- Strict gate hierarchy (every test blocks its corresponding state transition) — creates the test-induced traffic jam : a 4h FTF sweep cannot block every PR merge. Operator decision C=b explicitly chose tiered.
- No canonical budgets (defer to per-Story ad-hoc) — leads to the "performance budget evaporates as soon as it's measured" anti-pattern. Operator decision D=b explicitly locks them now even if approximate ; refine via dedicated budget Story when nightly drift fires.
- Drop UAT entirely (consistent with single-operator scope) — operator decision E=c explicitly KEEPS UAT, hybrid format. The Markdown scenarios serve as the operational documentation of what was validated ; Playwright catches Console UI regressions that no other type covers.
- Per-type DRI stubs (backup person + escalation path stubs even if all stubs point at @dococeven) — creates ownership debt that doesn't reflect reality and adds noise to OP / runbooks. Operator decision F=a explicitly chose honest single-DRI.
- Define test types only in the strategy doc, no ADR — would mean future amendments are doc-PR-only with no ADR-58-style invariant gating ; a Story could redefine a type silently. The ADR pairing locks the invariants ; the strategy doc holds the operational matrix.
Consequences
Positive :
- Locks scoping for CVN-N015 Stories EA-EI — every downstream Story knows its test-type bucket without re-debating
- Closes the "system vs integration" debate — strategy doc §10 glossary is canonical
- Honest single-operator scope — drops 4 types of theatre + accepts single-DRI ; reduces process debt
- Performance budgets prevent regression-in-budget — every new code path gets a budget entry, drift visible in nightly check
- Tiered gates prevent test-traffic-jam — costly tests don't block cheap transitions
Negative :
- One more ADR to track (now 83 total, +1 to CLAUDE.md) — minor
- Performance budgets are first-pass approximations — early nightly drift may produce false-positive churn until the canonical numbers settle. Mitigated : refine via budget Story on signal, not pre-emptively.
- Hybrid UAT (Markdown + Playwright) requires both formats maintained — Markdown for backend, Playwright for Console. Maintenance cost when Console UI redesigns. Mitigated : Playwright recordings re-recorded on Console UI changes (1-time per redesign).
- Out-of-scope types may need re-introduction later — chaos / load especially when team grows. Re-introducing = amending this ADR (1 Story).
Neutral :
- CR and committee enforce ADR-58 + ADR-77 + ADR-82 to keep this strategy alive — the strategy doc lives under MkDocs strict mode (per ADR-77), so broken cross-references surface at build time. Invariant 1 of this ADR ("strategy doc is SSoT") locks the bind.
- pytest -m regression marker convention codified in EA-S02 — operational, NOT an invariant of this ADR. Belongs to the fixture/factory Story.
Rollback
Soft rollback : drop the "MUST" requirement on downstream Stories referencing the strategy doc → "SHOULD" with operator approval. Strategy doc + ADR stay as documentation but do not gate Story scoping.
Hard rollback : delete this ADR + strategy doc. CVN-N015 Stories revert to ad-hoc taxonomy per Epic. The 12 types stay implicit (every test currently exists or doesn't ; no code change), only the formalism evaporates.
The performance budget table can be rolled back independently : delete the §5 row + the corresponding performance test markers. Existing tests stay green ; budgets become advisory.
References
- Parent need : CVN-N015 (industrial test infrastructure, GH #592)
- Story that introduced this : CVN-N015-EA-S01 (test strategy definition, GH #836, OP wp#116)
- Strategy doc (companion) : documentation/strategy/CVN-N015-test-strategy.md
- Operator decisions : OP wp#116 comment 701 (2026-05-05)
- Related ADRs : ADR-04 (cache contracts) ; ADR-14 (ML behaviour scope) ; ADR-23 (contract test scope) ; ADR-58 (FTF unit-test contract) ; ADR-77 (docs SSoT, this strategy lives under it) ; ADR-79 (FTF gate semantics referenced by ML behaviour row) ; ADR-81 (8 state transitions aligned with gate hierarchy) ; ADR-82 (committee log for this strategy_review)
- Downstream Stories that reference this : CVN-N015-EA-S02 onwards (S02 is the next pickup ; documented in EA Epic dossier)