
ADR-0087 — Story-phase test integration : tests are first-class artefacts of every ADR-81 transition + explicit committee verdict at every gate

Status: accepted (committee plan_review session 94aa2881 PASSED 2026-05-06 ; ratified together with the S03 architecture dossier per the same gate ; structurally extends ADR-68 committee scope and ADR-81 8-state Story workflow)
Date: 2026-05-06
Introduced by: CVN-N015-EA-S03 / GH issue #838 / OP wp#118
Ratifies: S01 strategy doc §11 strategic invariant + S02 requirements F7 + U7 + S03 architecture §6 mechanism
Extends: ADR-68 (committee = default review channel) ; ADR-81 (8-state Story workflow)


Context

CVNTrade has historically treated tests as "the phase that follows code" — a backlog of tickets the operator chases after each Story merges. Test debt accumulates silently ; closed Stories ship without a record of WHAT was tested ; an audit reviewer 6 months later cannot reconstruct whether a code path was tested at the time of an incident (S02 U6).

S01's test strategy (PR #852 §11) names the fix : tests are first-class artefacts of every ADR-81 transition, NOT a separate phase. Every gate the operator passes through carries a test deliverable ; every committee session validates that deliverable's presence + quality ; every closure immutably pins the (test_sha, dataset_sha, run_id) triples that proved each acceptance criterion.

This ADR ratifies that invariant. Without it, the strategy doc is just narrative ; with it, the test factory becomes the memory of correctness for every Story the project ships.

Decision

Invariant 1 — gate-by-gate test artefacts

Every ADR-81 Story transition has a defined test artefact requirement :

| ADR-81 transition | Test artefact required | Stewarding tool | Gate enforcement |
| --- | --- | --- | --- |
| New → In specification | none | | |
| In specification → Specified | "Test strategy" subsection in plan dossier (per type, with numerical bars + planned dataset families) | committee plan_review validates | committee verdict (see Invariant 3) |
| Specified → In progress | none (start coding) | | |
| In progress → Developed | All test code written + green locally + discoverable via @pytest.mark.story("<cvn_id>") + datasets versioned (DVC for > 10 MB, content-addressed in-repo for ≤ 10 MB per ADR-0088) | pytest --collect-only --story <cvn_id> returns ≥ 1 test per acceptance criterion | guardrail CI G5 |
| Developed → In testing | Full CI pipeline green (fast + medium tiers + relevant nightly suites) on PR head SHA + test artefacts ship with the PR (test code + cases + datasets + draft manifest) | committee pr_review issues 4-dimension verdict (see Invariant 3) | CI status check + PR template box + committee verdict |
| In testing → Tested | UAT operator-validated (per S01 §6) + test-run report committed | | OP audit comment + commit |
| Tested → Closed | Test manifest documentation/stories/<cvn_id>/tests/manifest.yaml committed pinning (test_sha, dataset_sha, run_id) per acceptance criterion ; immutable thereafter | | CI guardrail G6 (manifest immutability) |
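
As an illustration of the Tested → Closed requirement, the sketch below checks that every acceptance criterion has at least one complete (test_sha, dataset_sha, run_id) pin. The three field names come from this ADR ; the `Pin` class, the `unpinned_criteria` helper, the criterion ids (`AC1`, `AC2`) and the in-memory shape of the manifest are assumptions made for the example, not the schema S04 will ship.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    """One (test_sha, dataset_sha, run_id) triple (hypothetical container)."""
    test_sha: str
    dataset_sha: str
    run_id: str


def unpinned_criteria(criteria, manifest):
    """Return the acceptance criteria that lack a complete pin.

    `criteria` is a list of criterion ids ; `manifest` maps a criterion id
    to a list of Pin objects. Both shapes are assumptions for illustration.
    """
    missing = []
    for criterion in criteria:
        pins = manifest.get(criterion, [])
        # A pin only counts if all three components are non-empty.
        complete = [p for p in pins if p.test_sha and p.dataset_sha and p.run_id]
        if not complete:
            missing.append(criterion)
    return missing


# Hypothetical Story with two acceptance criteria, only one of them pinned.
manifest = {"AC1": [Pin("a1b2c3", "d4e5f6", "run-001")]}
print(unpinned_criteria(["AC1", "AC2"], manifest))  # ['AC2']
```

A closure check built this way would refuse the transition while the returned list is non-empty.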

Invariant 2 — per-Story directory layout

Every Story MUST own a folder at documentation/stories/<cvn_id>/tests/ with this structure :

documentation/stories/<cvn_id>/tests/
├── strategy.md          # Story's test strategy — extracted from plan dossier at Specified
├── test_run_<sha>.md    # one file per CI run that gated a transition
├── manifest.yaml        # committed at Tested → Closed ; immutable thereafter
└── datasets/            # local manifest of (dataset_name, dataset_sha) pairs

The folder is created at In specification. The manifest.yaml is created at Tested → Closed and is immutable from that point (G6 CI guardrail rejects any commit modifying a Closed Story's manifest ; amendments require an explicit amendment Story per S01's invariant).
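
The immutability rule can be pictured as a pure check over the paths a commit touches. This is a minimal sketch, not the G6 implementation : the function name `g6_violations` is hypothetical, and it assumes the caller already has the list of changed paths (for instance from `git diff --name-only`) and the set of Story ids that were Closed before the commit.

```python
import re

# Matches documentation/stories/<cvn_id>/tests/manifest.yaml and captures <cvn_id>.
MANIFEST_PATH = re.compile(r"^documentation/stories/([^/]+)/tests/manifest\.yaml$")


def g6_violations(changed_paths, closed_stories):
    """Return the changed manifest paths that belong to an already Closed Story."""
    violations = []
    for path in changed_paths:
        match = MANIFEST_PATH.match(path)
        if match and match.group(1) in closed_stories:
            violations.append(path)
    return violations


# Hypothetical commit touching one Closed Story and one Story still in flight.
changed = [
    "documentation/stories/CVN-N015-EA-S03/tests/manifest.yaml",
    "documentation/stories/CVN-N015-EA-S04/tests/manifest.yaml",
    "documentation/stories/CVN-N015-EA-S03/tests/strategy.md",
]
closed = {"CVN-N015-EA-S03"}
print(g6_violations(changed, closed))
```

Because the Story state is read as it was before the commit, the commit that first creates the manifest at Tested → Closed is not flagged ; only later modifications are.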

Invariant 3 — explicit committee verdict on tests (extends ADR-68)

Both plan_review and pr_review sessions MUST issue an explicit verdict on tests as a first-class review item, not implicitly absorbed into the broader code/dossier verdict.

plan_review verdict body MUST contain a tests_strategy field :

tests_strategy: PASSED | INSUFFICIENT

INSUFFICIENT blocks In specification → Specified.

pr_review verdict body MUST contain a tests: object with 4 sub-verdicts :

tests:
  coverage_per_acceptance_criterion: PASSED | INSUFFICIENT
  adversarial_edge_coverage: PASSED | INSUFFICIENT
  datasets_versioned_reproducible: PASSED | INSUFFICIENT
  manifest_maps_tests_to_criteria: PASSED | INSUFFICIENT

Any INSUFFICIENT blocks merge regardless of the rest of the verdict. This is not a soft signal — it's a hard gate.
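
The hard gate can be sketched as a small function over a parsed pr_review verdict body. The four field names are the ones listed above ; the function name `blocking_subverdicts`, the dict representation of the verdict, and the choice to treat a missing or unrecognised value as blocking (fail closed) are assumptions for illustration.

```python
TEST_SUBVERDICTS = (
    "coverage_per_acceptance_criterion",
    "adversarial_edge_coverage",
    "datasets_versioned_reproducible",
    "manifest_maps_tests_to_criteria",
)


def blocking_subverdicts(verdict):
    """Return the sub-verdicts that block merge ; an empty list means the gate is open.

    Anything other than the literal "PASSED" blocks, including a missing field
    (assumption : fail closed).
    """
    tests = verdict.get("tests") or {}
    return [name for name in TEST_SUBVERDICTS if tests.get(name) != "PASSED"]


# Hypothetical verdict body with a single INSUFFICIENT dimension.
verdict = {
    "tests": {
        "coverage_per_acceptance_criterion": "PASSED",
        "adversarial_edge_coverage": "INSUFFICIENT",
        "datasets_versioned_reproducible": "PASSED",
        "manifest_maps_tests_to_criteria": "PASSED",
    }
}
print(blocking_subverdicts(verdict))  # ['adversarial_edge_coverage']
```

Returning the offending names, rather than a bare boolean, lets the failure message say which dimension the committee found insufficient.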

The expert-test role within the committee owns these verdicts once the team grows ; until then, the existing 5-expert panel covers them as a mandatory checklist item. The committee CLI (scripts/expert_committee.py) will be extended in S04 to surface these sub-verdicts as a mandatory section in the prompt.

Invariant 4 — @pytest.mark.story discoverability via S04 plugin

Every test carries a @pytest.mark.story("<cvn_id>") decorator. Filtering uses a pytest plugin (S04 deliverable) that adds a --story <cvn_id> CLI flag :

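# tests/conftest.py (S04 will ship this)
def pytest_addoption(parser):
    # Register the --story <cvn_id> command-line flag.
    parser.addoption("--story", action="store", default=None,
                     help="only collect tests marked with this Story id")

def pytest_configure(config):
    # Declare the marker so pytest does not warn (or fail under --strict-markers).
    config.addinivalue_line("markers", "story(cvn_id): Story the test belongs to")

def pytest_collection_modifyitems(config, items):
    story = config.getoption("--story")
    if not story:
        return  # no filter requested : keep the full collection
    # Keep only items carrying a story marker whose arguments include the id.
    items[:] = [item for item in items
                if any(m.name == "story" and story in m.args
                       for m in item.iter_markers())]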

Then pytest --collect-only --story CVN-N015-EA-S04 returns the full test surface for the Story. Used by G5 CI guardrail to verify ≥ 1 test per acceptance criterion.

(Note : pytest's native -m "story_<cvn_id>" does NOT work — -m filters on marker names, not arguments. The plugin is the canonical mechanism.)
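
To see which tests the filter keeps, the same predicate can be exercised without pytest. In this sketch `FakeMark` and `FakeItem` are hypothetical stand-ins that expose only the attributes the plugin reads (`name`, `args`, `iter_markers()`), and `select_story` is an illustrative wrapper around the plugin's list comprehension.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMark:
    """Stand-in for a pytest mark : just a name and positional args."""
    name: str
    args: tuple = ()


@dataclass
class FakeItem:
    """Stand-in for a collected test item."""
    nodeid: str
    marks: list = field(default_factory=list)

    def iter_markers(self):
        return iter(self.marks)


def select_story(items, story):
    """Same predicate as the plugin : keep items with a story marker naming `story`."""
    return [item for item in items
            if any(m.name == "story" and story in m.args
                   for m in item.iter_markers())]


items = [
    FakeItem("test_a", [FakeMark("story", ("CVN-N015-EA-S04",))]),
    FakeItem("test_b", [FakeMark("story", ("CVN-N015-EA-S03",))]),
    FakeItem("test_c", [FakeMark("slow")]),
    FakeItem("test_d", [FakeMark("story", ("CVN-N015-EA-S03",)),
                        FakeMark("story", ("CVN-N015-EA-S04",))]),
]
print([item.nodeid for item in select_story(items, "CVN-N015-EA-S04")])
# ['test_a', 'test_d']
```

Two properties follow from `story in m.args` being tuple membership : the id must match exactly (a prefix such as `CVN-N015-EA-S0` selects nothing), and a test shared by several Stories can carry one marker per Story.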

Consequences

Positive :
- Tests stop being a "phase that follows code" ; they become the contract every Story commits to
- An audit reviewer 6 months from now can run make reproduce CVN_ID=<cvn_id> and re-execute every assertion the Story claimed (S02 U6 + NF10)
- Committee verdicts on tests are explicit + structured ; no implicit "looks fine" approval that quietly drifts
- The per-Story manifest pins the proof of every acceptance criterion at closure time ; closed Stories become auditable artefacts, not just merged code
- Sets the "memory of correctness" property that makes the test factory worth building (S03 §6.6 audit story)

Negative / risks :
- Per-Story folder + manifest add operator overhead at every gate transition ; mitigated by S04 make targets (make test-fast, make test-story, make reproduce, make test-manifest) that automate the steps
- Committee CLI extension (S04 deliverable) creates a dependency on the JSON-schema for verdict bodies ; mitigated by the schema living in committee/sessions/_verdict_schema.yaml with amendments via amendment Story
- G5 + G6 CI guardrails add CI complexity ; mitigated by their narrow scope (single git diff against the OP Story state) + clear failure messages

Cross-references :
- S01 strategy doc §11 (the strategic invariant this ADR ratifies)
- S02 requirements F7 + U7 + the 4 sub-verdicts contract
- S03 architecture dossier §6 (the mechanism specification)
- ADR-68 committee scope (extended by Invariant 3)
- ADR-81 8-state Story workflow (extended by Invariant 1)
- ADR-0088 test cases + datasets versioned + provenance-tracked (the data-side contract)
- ADR-23 features version-pinned (the precedent ADR for "version-pinning is a hard contract")
- ADR-77 (MkDocs SSoT)