ADR-0087 — Story-phase test integration : tests are first-class artefacts of every ADR-81 transition + explicit committee verdict at every gate
Status: accepted (committee plan_review session 94aa2881 PASSED 2026-05-06 ; ratified together with the S03 architecture dossier per the same gate ; structurally extends ADR-68 committee scope and ADR-81 8-state Story workflow)
Date: 2026-05-06
Introduced by: CVN-N015-EA-S03 / GH issue #838 / OP wp#118
Ratifies: S01 strategy doc §11 strategic invariant + S02 requirements F7 + U7 + S03 architecture §6 mechanism
Extends: ADR-68 (committee = default review channel) ; ADR-81 (8-state Story workflow)
Context
CVNTrade has historically treated tests as "the phase that follows code" — a backlog of tickets the operator chases after each Story merges. Test debt accumulates silently ; closed Stories ship without a record of WHAT was tested ; an audit reviewer 6 months later cannot reconstruct whether a code path was tested at the time of an incident (S02 U6).
S01's test strategy (PR #852 §11) names the fix : tests are first-class artefacts of every ADR-81 transition, NOT a separate phase. Every gate the operator passes through carries a test deliverable ; every committee session validates that deliverable's presence + quality ; every closure immutably pins the (test_sha, dataset_sha, run_id) triples that proved each acceptance criterion.
This ADR ratifies that invariant. Without it, the strategy doc is just narrative ; with it, the test factory becomes the memory of correctness for every Story the project ships.
Decision
Invariant 1 — gate-by-gate test artefacts
Every ADR-81 Story transition has a defined test artefact requirement :
| ADR-81 transition | Test artefact required | Stewarding tool | Gate enforcement |
|---|---|---|---|
| New → In specification | none | – | – |
| In specification → Specified | "Test strategy" subsection in plan dossier (per type, with numerical bars + planned dataset families) | committee plan_review validates | committee verdict (see Invariant 3) |
| Specified → In progress | none (start coding) | – | – |
| In progress → Developed | All test code written + green locally + discoverable via `@pytest.mark.story("<cvn_id>")` + datasets versioned (DVC for > 10 MB, content-addressed in-repo for ≤ 10 MB per ADR-0088) | `pytest --collect-only --story <cvn_id>` returns ≥ 1 test per acceptance criterion | guardrail CI G5 |
| Developed → In testing | Full CI pipeline green (fast + medium tiers + relevant nightly suites) on PR head SHA + test artefacts ship with the PR (test code + cases + datasets + draft manifest) | committee pr_review issues 4-dimension verdict (see Invariant 3) | CI status check + PR template box + committee verdict |
| In testing → Tested | UAT operator-validated (per S01 §6) + test-run report committed | – | OP audit comment + commit |
| Tested → Closed | Test manifest `documentation/stories/<cvn_id>/tests/manifest.yaml` committed pinning (test_sha, dataset_sha, run_id) per acceptance criterion ; immutable thereafter | – | CI guardrail G6 (manifest immutability) |
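The Tested → Closed row pins one (test_sha, dataset_sha, run_id) triple per acceptance criterion. A minimal sketch of what that record could look like in memory before it is serialised to manifest.yaml follows. The `ProofTriple` and `build_manifest` names, the `story` / `criteria` keys and the criterion ids are assumptions made for illustration; the ADR does not fix the manifest schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProofTriple:
    """One (test_sha, dataset_sha, run_id) pin for an acceptance criterion."""
    test_sha: str
    dataset_sha: str
    run_id: str


def build_manifest(cvn_id: str, proofs: dict[str, ProofTriple]) -> dict:
    """Return a plain dict ready to be serialised as manifest.yaml.

    `proofs` maps a (hypothetical) acceptance-criterion id to its pin.
    """
    if not proofs:
        raise ValueError("a manifest needs at least one acceptance criterion")
    for criterion, triple in proofs.items():
        # Refuse to pin a criterion with an empty field in its triple.
        missing = [name for name, value in asdict(triple).items() if not value]
        if missing:
            raise ValueError(f"{criterion}: empty fields {missing}")
    return {
        "story": cvn_id,
        "criteria": {c: asdict(t) for c, t in sorted(proofs.items())},
    }
```

The frozen dataclass mirrors the "immutable thereafter" wording at the object level only; the repository-level guarantee still comes from the G6 guardrail.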
Invariant 2 — per-Story directory layout
Every Story MUST own a folder at documentation/stories/<cvn_id>/tests/ with this structure :
```
documentation/stories/<cvn_id>/tests/
├── strategy.md          # Story's test strategy, extracted from plan dossier at Specified
├── test_run_<sha>.md    # one file per CI run that gated a transition
├── manifest.yaml        # committed at Tested → Closed ; immutable thereafter
└── datasets/            # local manifest of (dataset_name, dataset_sha) pairs
```
The folder is created at In specification. The manifest.yaml is created at Tested → Closed and is immutable from that point (G6 CI guardrail rejects any commit modifying a Closed Story's manifest ; amendments require an explicit amendment Story per S01's invariant).
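The core of the G6 check described above can be sketched as a pure function over a list of changed paths. This is an illustration under stated assumptions, not the S04 implementation: the function name is hypothetical, the changed paths are assumed to come from something like `git diff --name-only`, and the set of Closed Story ids is assumed to be supplied by the caller (for example from the OP Story state).

```python
import re

# Path layout taken from Invariant 2.
_MANIFEST_RE = re.compile(
    r"^documentation/stories/(?P<cvn_id>[^/]+)/tests/manifest\.yaml$"
)


def manifest_violations(changed_paths, closed_story_ids):
    """Return touched manifest paths that belong to Closed Stories.

    changed_paths: file paths reported by a diff.
    closed_story_ids: Story ids currently in the Closed state.
    """
    closed = set(closed_story_ids)
    violations = []
    for path in changed_paths:
        match = _MANIFEST_RE.match(path)
        if match and match.group("cvn_id") in closed:
            violations.append(path)
    return violations
```

Because the Story is not yet Closed when the manifest is first committed, that initial commit passes this check; only later modifications are flagged.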
Invariant 3 — explicit committee verdict on tests (extends ADR-68)
Both plan_review and pr_review sessions MUST issue an explicit verdict on tests as a first-class review item, not implicitly absorbed into the broader code/dossier verdict.
plan_review verdict body MUST contain a tests_strategy field. A value of INSUFFICIENT blocks In specification → Specified.
pr_review verdict body MUST contain a tests: object with 4 sub-verdicts :
```yaml
tests:
  coverage_per_acceptance_criterion: PASSED | INSUFFICIENT
  adversarial_edge_coverage: PASSED | INSUFFICIENT
  datasets_versioned_reproducible: PASSED | INSUFFICIENT
  manifest_maps_tests_to_criteria: PASSED | INSUFFICIENT
```
Any INSUFFICIENT blocks merge regardless of the rest of the verdict. This is not a soft signal — it's a hard gate.
The expert-test role within the committee owns these verdicts once the team grows ; until then, the existing 5-expert panel covers them as a mandatory checklist item. The committee CLI (scripts/expert_committee.py) is extended in S04 to surface these sub-verdicts as a mandatory section in the prompt.
Invariant 4 — @pytest.mark.story discoverability via S04 plugin
Every test carries a @pytest.mark.story("<cvn_id>") decorator. Filtering uses a pytest plugin (S04 deliverable) that adds a --story <cvn_id> CLI flag :
```python
# tests/conftest.py (S04 will ship this)
def pytest_addoption(parser):
    parser.addoption("--story", action="store", default=None)


def pytest_collection_modifyitems(config, items):
    story = config.getoption("--story")
    if not story:
        return
    items[:] = [item for item in items
                if any(m.name == "story" and story in m.args
                       for m in item.iter_markers())]
```
Then pytest --collect-only --story CVN-N015-EA-S04 returns the full test surface for the Story. Used by G5 CI guardrail to verify ≥ 1 test per acceptance criterion.
(Note : pytest's native -m "story_<cvn_id>" does NOT work — -m filters on marker names, not arguments. The plugin is the canonical mechanism.)
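The G5 rule of at least one test per acceptance criterion reduces to a set-coverage check once the collected tests are grouped by criterion. The sketch below assumes such a grouping already exists; how a test is tagged with its criterion is not specified in this ADR, so the `tests_by_criterion` mapping and the criterion ids are hypothetical.

```python
def uncovered_criteria(acceptance_criteria, tests_by_criterion):
    """Return acceptance criteria that have no collected test, in input order.

    acceptance_criteria: iterable of criterion ids for the Story.
    tests_by_criterion: mapping of criterion id -> list of pytest node ids
        collected with `--story <cvn_id>` (the tagging convention is an
        assumption of this sketch).
    """
    return [c for c in acceptance_criteria if not tests_by_criterion.get(c)]
```

A guardrail built on this would fail the In progress → Developed transition whenever the returned list is non-empty, and print the uncovered ids as its failure message.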
Consequences
Positive :
- Tests stop being a "phase that follows code" — they're the contract every Story commits to
- An audit reviewer 6 months from now can run `make reproduce CVN_ID=<cvn_id>` and re-execute every assertion the Story claimed (S02 U6 + NF10)
- Committee verdicts on tests are explicit + structured — no implicit "looks fine" approval that quietly drifts
- Per-Story manifest pins the proof of every acceptance criterion at closure time — closed Stories become auditable artefacts, not just merged code
- Sets the "memory of correctness" property that makes the test factory worth building (S03 §6.6 audit story)
Negative / risks :
- Per-Story folder + manifest add operator overhead at every gate transition ; mitigated by S04 make targets (make test-fast, make test-story, make reproduce, make test-manifest) that automate the steps
- Committee CLI extension (S04 deliverable) creates a dependency on the JSON-schema for verdict bodies ; mitigated by the schema living in committee/sessions/_verdict_schema.yaml with amendments via amendment Story
- G5 + G6 CI guardrails add CI complexity ; mitigated by their narrow scope (single git diff against the OP Story state) + clear failure messages
Cross-references :
- S01 strategy doc §11 (the strategic invariant this ADR ratifies)
- S02 requirements F7 + U7 + the 4 sub-verdicts contract
- S03 architecture dossier §6 (the mechanism specification)
- ADR-68 committee scope (extended by Invariant 3)
- ADR-81 8-state Story workflow (extended by Invariant 1)
- ADR-0088 test cases + datasets versioned + provenance-tracked (the data-side contract)
- ADR-23 features version-pinned (the precedent ADR for "version-pinning is a hard contract")
- ADR-77 (MkDocs SSoT)