ADR-0083 — Test taxonomy + gate hierarchy

Status: accepted (committee plan_review session 53d76f0f PASSED 2026-05-05 ; ratified together with the companion strategy doc per the same gate ; operator decisions on wp#116 : A=b, B=b, C=b, D=b, E=c, F=a)
Date: 2026-05-05
Introduced by: CVN-N015-EA-S01 / GH issue #836 / OP wp#116
Supersedes: none (first formal ADR on test taxonomy)
Companion document: documentation/strategy/CVN-N015-test-strategy.md


Context

CVNTrade ships ML-based crypto trading on a single-operator setup. Test discipline today is informal :

  • pytest runs the unit + integration suites, with no formal taxonomy
  • A few smoke checks live ad-hoc inside DAG launchers (#592 architectural review surfaced this)
  • Data quality is mostly visual via Grafana ; no Great Expectations suite codifies the contract
  • ML behaviour (drift / fairness / perturbation robustness) has no automated coverage outside the FTF sweep
  • "Integration" and "system-E2E" are used interchangeably — Story scoping arguments routinely re-derive them

CVN-N015 (GH #592, the umbrella Need ; 9 Epics, EA to EI) formalises the validation pyramid. The forcing function for this ADR : without a canonical taxonomy at the head of the umbrella, every Epic risks re-deriving conflicting definitions (for example, "contract" meaning schema in ED but Pact-style consumer-driven in EH). The operator decisions on cadence, gate hierarchy and ownership lock the scope before downstream Stories spend cycles re-debating it.

The 2026-05-05 operator decision session (wp#116 comment 701) locked 6 axes : the taxonomy shrinks from 16 to 12 types (dropping chaos / exploratory / load / regression) ; the fast tier runs on PRs only ; gates are tiered ; performance budgets are explicit and canonical ; UAT is hybrid (Markdown for backend flows + Playwright for the Console) ; ownership is a single DRI, which honestly reflects the single-operator org.


Decision

Adopt the 12-test-type taxonomy + tiered gate hierarchy + canonical performance budgets defined in documentation/strategy/CVN-N015-test-strategy.md as the canonical reference for testing in CVNTrade.

The 12 types (in scope) :

  1. unit — pure-function logic, no I/O
  2. property — Hypothesis-style invariants
  3. contract — schema + boundary contracts
  4. cache — L1/L2/L3 cache key + invalidation correctness
  5. integration — multi-component flows in-process
  6. DAG smoke — Airflow dag.test() per DAG
  7. data quality — Great Expectations OSS suites on production-like data
  8. ML behaviour — Evidently OSS + Giskard on candidate models
  9. performance — p95/p99 budgets per code path
  10. system-E2E — paper-kernel + kill-switch + risk gates with Testcontainers
  11. UAT — operator-driven, hybrid Markdown + Playwright
  12. post-deploy smoke — k8s liveness + 1-prediction call + Grafana panel populated
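
One way to keep the 12-type list closed in code is to encode it as an enum and reject every other name. This is a minimal sketch, not the project's implementation : the names CvnTestType, OUT_OF_SCOPE and parse_test_type are hypothetical, and so is the snake_case normalisation of the type names.

```python
from enum import Enum


class CvnTestType(Enum):
    """The 12 in-scope test types, in the order the ADR lists them."""

    UNIT = "unit"
    PROPERTY = "property"
    CONTRACT = "contract"
    CACHE = "cache"
    INTEGRATION = "integration"
    DAG_SMOKE = "dag_smoke"
    DATA_QUALITY = "data_quality"
    ML_BEHAVIOUR = "ml_behaviour"
    PERFORMANCE = "performance"
    SYSTEM_E2E = "system_e2e"
    UAT = "uat"
    POST_DEPLOY_SMOKE = "post_deploy_smoke"


# Types the ADR explicitly excludes; kept separate so the error can say why.
OUT_OF_SCOPE = frozenset({"chaos", "exploratory", "load", "regression"})


def parse_test_type(name: str) -> CvnTestType:
    """Map a human-written type name to the closed taxonomy, or raise ValueError."""
    # Hypothetical normalisation: "system-E2E" -> "system_e2e", "DAG smoke" -> "dag_smoke".
    key = name.strip().lower().replace("-", "_").replace(" ", "_")
    if key in OUT_OF_SCOPE:
        raise ValueError(
            f"{name!r} is explicitly out of scope; reintroducing it requires an ADR amendment"
        )
    try:
        return CvnTestType(key)
    except ValueError:
        raise ValueError(
            f"{name!r} is not one of the {len(CvnTestType)} canonical test types"
        ) from None
```

A Story that tags a test with an unknown or out-of-scope type would then fail early with a message pointing back at the amendment rule, rather than silently growing the list.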

Out-of-scope (explicit, with rationale ; see strategy doc §9) :

  • chaos : no SRE team
  • exploratory : informal already
  • load : throughput / saturation under sustained concurrent traffic, distinct from the per-call p95/p99 latency covered by performance ; deferred in v1, re-introduced when the workload regime warrants (see strategy doc §9 row load for the conditions)
  • regression : equivalent to unit with the regression marker

Cadence (tiered) : the fast tier runs only on PRs that touch code ; the medium tier runs on PRs that touch the relevant subsystem ; the slow tier runs nightly OR per Story closure OR pre-deploy. There is no nightly safety net for the fast tier (main is protected by PR gates).
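
The cadence rule can be read as a small function from a trigger to the tiers that run. The sketch below is illustrative only and rests on several assumptions : the trigger names, the `.py` suffix used to detect "code", the path prefixes that identify a subsystem, and the function name tiers_for are all hypothetical. Which test types sit in which tier is defined in the strategy doc and is not reproduced here.

```python
from typing import Iterable

# Assumption: a PR "touches code" when it changes at least one Python file.
CODE_SUFFIXES = (".py",)

# Triggers that run the slow tier, per the cadence rule.
SLOW_TRIGGERS = frozenset({"nightly", "story_closure", "pre_deploy"})


def tiers_for(
    trigger: str,
    changed_paths: Iterable[str] = (),
    subsystem_prefixes: Iterable[str] = (),
) -> set[str]:
    """Return the tiers that should run for a given trigger."""
    if trigger in SLOW_TRIGGERS:
        # No fast tier here: the ADR gives the fast tier no nightly safety net.
        return {"slow"}
    if trigger != "pull_request":
        raise ValueError(f"unknown trigger: {trigger!r}")

    paths = list(changed_paths)
    prefixes = tuple(subsystem_prefixes)
    tiers: set[str] = set()
    if any(p.endswith(CODE_SUFFIXES) for p in paths):
        tiers.add("fast")
    if prefixes and any(p.startswith(prefixes) for p in paths):
        tiers.add("medium")
    return tiers
```

Under these assumptions a docs-only PR runs nothing, while a PR that changes a file under a watched subsystem prefix runs both the fast and the medium tier.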

Gate hierarchy (tiered, NOT strict) : critical (unit + property + contract + integration + DAG smoke) blocks merge ; data quality blocks deploy ; ML behaviour + performance block LOCK ; UAT blocks Story closure ; post-deploy smoke triggers rollback. See strategy doc §4 matrix.
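
The tiered hierarchy amounts to a lookup from a state transition to the test types that can block it. The following sketch encodes only the pairs stated in the paragraph above ; cache and system-E2E are not placed by this summary, so they are left unmapped here and the strategy doc §4 matrix remains the reference. The names BLOCKS, blocking_failures and should_rollback are hypothetical, and the sketch assumes gates are not cumulative (a failing performance test does not block a merge).

```python
from typing import Iterable

# Transition -> test types whose failure blocks it (from the ADR summary only).
BLOCKS = {
    "merge": frozenset({"unit", "property", "contract", "integration", "dag_smoke"}),
    "deploy": frozenset({"data_quality"}),
    "lock": frozenset({"ml_behaviour", "performance"}),
    "story_closure": frozenset({"uat"}),
}

# Post-deploy smoke does not block a transition; a failure triggers rollback.
ROLLBACK_TRIGGERS = frozenset({"post_deploy_smoke"})


def blocking_failures(transition: str, failed: Iterable[str]) -> list[str]:
    """Return the failed test types that block the given transition, sorted."""
    if transition not in BLOCKS:
        raise KeyError(f"unknown transition: {transition!r}")
    return sorted(BLOCKS[transition] & set(failed))


def should_rollback(failed: Iterable[str]) -> bool:
    """True when a failed type is one that triggers rollback."""
    return bool(ROLLBACK_TRIGGERS & set(failed))
```

With this shape, a strict hierarchy would be the degenerate case where every transition maps to every type, which is the configuration the ADR rejects.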

Performance budgets (canonical) : inference_p99 < 200ms, enrichment_p95 < 50ms, FTF cell p95 < 60s, train DAG run < 4h. See strategy doc §5 full table.
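
A budget check of this kind reduces to computing a percentile over measured durations and comparing it with the limit. The sketch below uses the four numbers quoted above ; everything else is an assumption, including the nearest-rank percentile method, the dictionary keys, and the function names. The ADR does not say how percentiles are computed or where samples come from.

```python
from typing import Sequence

# Code path -> (percentile, limit in seconds), taken from the budgets above.
BUDGETS = {
    "inference": (99, 0.200),
    "enrichment": (95, 0.050),
    "ftf_cell": (95, 60.0),
}

# The train DAG budget is a single run duration, not a percentile.
TRAIN_DAG_MAX_SECONDS = 4 * 3600


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile (an assumed method, chosen for simplicity)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def within_budget(path: str, samples: Sequence[float]) -> bool:
    """True when the measured percentile is strictly below the budget."""
    pct, limit = BUDGETS[path]
    return percentile(samples, pct) < limit


def train_dag_within_budget(duration_seconds: float) -> bool:
    """True when a single train DAG run finished in under 4 hours."""
    return duration_seconds < TRAIN_DAG_MAX_SECONDS
```

For example, 100 inference samples with a single 500 ms outlier still pass the p99 budget under nearest-rank, while two such outliers fail it. A different percentile method (such as linear interpolation) could move that boundary, which is one reason the first-pass numbers may need refinement.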

UAT format (hybrid) : Markdown scenarios for backend flows + Playwright recordings for Console UI flows. Operator-driven, NOT CI-automated.

Test-type ownership : single DRI @dococeven for all 12 types, which honestly reflects the single-operator org. See strategy doc §7 ; its table is the natural place to introduce per-type ownership when the team grows.


Invariants

  • Single source of truth : documentation/strategy/CVN-N015-test-strategy.md is the SSoT (per ADR-77 docs SSoT). Discrepancies between this ADR's invariant list and the strategy doc → strategy doc wins ; this ADR is the binding contract on the rules, the strategy doc is the operational reference.
  • No silent re-derivation : downstream Stories (S02-S09 and beyond) MUST reference the strategy doc to scope their own deliverables. A Story that re-derives a contradictory taxonomy MUST either amend this ADR (separate Story) OR scope down. CR catches drift via the docs site strict-mode check.
  • 12 types, no growth without ADR amendment : adding a 13th type requires amending this ADR (raise an amendment Story). Removing a type requires the same. The 12-list is closed by design.
  • Tiered gates, not strict : the gate hierarchy table (strategy doc §4) is the canonical bind. Strict-mode gating (every test blocks every transition) is rejected as anti-pattern.
  • Canonical performance budgets : the budget table (strategy doc §5) is the bind. New code paths get a budget entry before they merge ; budget changes require a budget Story (NOT a one-off PR adjustment).
  • UAT is operator-driven, not CI : tests/uat/scenarios/*.md + tests/uat/playwright/ are run by the operator before Story closure. CI does NOT execute them.
  • Test-type ownership row updated when the team grows : when a second engineer joins, the strategy doc §7 ownership table is updated in the same PR that onboards them. Single-DRI is the honest current state, not a long-term invariant.
  • Out-of-scope types stay out : chaos / exploratory / load / regression do NOT get reintroduced informally. Reintroducing requires amending this ADR.

Alternatives rejected

  • All 16 types in scope (status quo industry-default) — would re-introduce theatre (chaos with no SRE team to operate, UAT-as-CI-attempt for a single operator, load testing as a duplicate of performance). Operator decision A=b explicitly trims to the realistic 12.
  • Strict gate hierarchy (every test blocks its corresponding state transition) — creates the test-induced traffic jam : a 4h FTF sweep cannot block every PR merge. Operator decision C=b explicitly chose tiered.
  • No canonical budgets (defer to per-Story ad-hoc) — leads to the "performance budget evaporates as soon as it's measured" anti-pattern. Operator decision D=b explicitly locks them now even if approximate ; refine via dedicated budget Story when nightly drift fires.
  • Drop UAT entirely (consistent with single-operator scope) — operator decision E=c explicitly KEEPS UAT, hybrid format. The Markdown scenarios serve as the operational documentation of what was validated ; Playwright catches Console UI regressions that no other type covers.
  • Per-type DRI stubs (backup person + escalation path stubs even if all stubs point at @dococeven) — creates ownership debt that doesn't reflect reality and adds noise to OP / runbooks. Operator decision F=a explicitly chose honest single-DRI.
  • Define test types only in the strategy doc, no ADR — would mean future amendments are doc-PR-only with no ADR-58-style invariant gating ; a Story could redefine a type silently. The ADR pairing locks the invariants ; the strategy doc holds the operational matrix.

Consequences

Positive :

  • Locks scoping for CVN-N015 Stories EA-EI — every downstream Story knows its test-type bucket without re-debating
  • Closes the "system vs integration" debate — strategy doc §10 glossary is canonical
  • Honest single-operator scope — drops 4 types of theatre + accepts single-DRI ; reduces process debt
  • Performance budgets prevent regression-in-budget — every new code path gets a budget entry, drift visible in nightly check
  • Tiered gates prevent test-traffic-jam — costly tests don't block cheap transitions

Negative :

  • One more ADR to track (now 83 total, +1 to CLAUDE.md) — minor
  • Performance budgets are first-pass approximations — early nightly drift may produce false-positive churn until the canonical numbers settle. Mitigated : refine via budget Story on signal, not pre-emptively.
  • Hybrid UAT (Markdown + Playwright) requires maintaining both formats : Markdown for backend, Playwright for Console. This adds maintenance cost whenever the Console UI is redesigned. Mitigated : Playwright recordings are re-recorded on Console UI changes (once per redesign).
  • Out-of-scope types may need re-introduction later — chaos / load especially when team grows. Re-introducing = amending this ADR (1 Story).

Neutral :

  • CR and committee enforce ADR-58 + ADR-77 + ADR-82 to keep this strategy alive : the strategy doc lives under MkDocs strict mode (per ADR-77), so broken cross-references surface at build time. This ADR's invariant 1 ("strategy doc is SSoT") locks the bind.
  • pytest -m regression marker convention codified in EA-S02 — operational, NOT an invariant of this ADR. Belongs to the fixture/factory Story.

Rollback

Soft rollback : drop the "MUST" requirement on downstream Stories referencing the strategy doc → "SHOULD" with operator approval. Strategy doc + ADR stay as documentation but do not gate Story scoping.

Hard rollback : delete this ADR + strategy doc. CVN-N015 Stories revert to ad-hoc taxonomy per Epic. The 12 types stay implicit (every test that exists today keeps existing ; no code change) ; only the formalism evaporates.

The performance budget table can be rolled back independently : delete the §5 row + the corresponding performance test markers. Existing tests stay green ; budgets become advisory.

References

  • Parent need : CVN-N015 (industrial test infrastructure, GH #592)
  • Story that introduced this : CVN-N015-EA-S01 (test strategy definition, GH #836, OP wp#116)
  • Strategy doc (companion) : documentation/strategy/CVN-N015-test-strategy.md
  • Operator decisions : OP wp#116 comment 701 (2026-05-05)
  • Related ADRs : ADR-04 (cache contracts) ; ADR-14 (ML behaviour scope) ; ADR-23 (contract test scope) ; ADR-58 (FTF unit-test contract) ; ADR-77 (docs SSoT, this strategy lives under it) ; ADR-79 (FTF gate semantics referenced by ML behaviour row) ; ADR-81 (8 state transitions aligned with gate hierarchy) ; ADR-82 (committee log for this strategy_review)
  • Downstream Stories that reference this : CVN-N015-EA-S02 onwards (S02 is the next pickup ; documented in EA Epic dossier)