CVN-N015-EA-S03 — Architecture + stack pick + integration design

Date : 2026-05-06
Story : CVN-N015-EA-S03 (OP wp#118)
GH issue : #838
Parent Epic : CVN-N015-EA — Test stack foundation (OP wp#107 / GH #827)
Depends on : S01 strategy (wp#116, PR #852) + S02 requirements (wp#117, PR #856) — both in flight, dossiers referenceable via PR branches
Blocks : S04-S09 implementation Stories
Operator decisions locked : 2026-05-06 — A=stack pinned to current stable (pytest 8.x + pytest-xdist 3.x + testcontainers-python 4.x), B=function scope default + session opt-in for Testcontainers, C=any-failure-blocks
Status : Specified (committee plan_review session 94aa2881 PASSED 2026-05-06 ; OP wp#118 transitioned to Specified 2026-05-06, then In progress on PR open ; awaiting merge)

0. Intent + scope

S03 is the architecture decision Story for the foundation Epic. It picks the concrete stack, freezes the directory layout, sets the fixture-scope discipline, decides the Testcontainers conventions, and locks the CI tier promotion semantics. Output : a single dossier + 5 new ADRs (0084-0088) ratified at accepted ; ADR-0083 is already at accepted per S01 §11.5 (S03 cross-refs the §11.4 test-verdict scope extension). See §5 for the full table.

S04-S09 implementation Stories execute against S03's choices without further architectural deliberation. If an S04+ Story discovers a wrong choice, it raises an amendment Story rather than re-deciding locally (the same invariant S01 set for the strategy doc).

Out of scope (covered elsewhere) :
- Test taxonomy + cadence matrix → S01 strategy doc
- Functional + non-functional requirements → S02 dossier
- Specific fixture / factory implementations → S04
- Per-service Testcontainers helpers → S05+
- Flaky-test detector implementation → S06 (uses #756 Story)

1. Operator decisions — locked 2026-05-06

#	Decision	Locked value	Rationale
A	Library versions	pytest 8.x (latest stable minor) ; pytest-xdist 3.x ; testcontainers-python 4.x ; time_machine (per S02 decision C) ; pytest-randomly + pytest-rerunfailures	All OSS, all support Python 3.12 (S02 NF8). Pin policy : freeze to the latest stable minor on merge of this Story ; bump only on a deliberate amendment Story (no automatic dependabot float — predictability beats freshness for a test foundation).
B	Default fixture scope	`function` scope by default ; `session` scope opt-in explicitly for Testcontainers + heavy MLflow/PG fixtures	Pattern « safe-by-default, perf opt-in ». Function scope eliminates an entire class of cross-test isolation bugs ; the S02 wall-clock budgets (2 / 10 / 30 min) are met via `pytest-xdist -n auto` parallelism (S02 NF4), NOT via shared session state. Session-scoped warm-up is paid once per xdist worker (each worker runs its own pytest session) and amortised over the tests that worker executes. Testcontainers need the explicit opt-in because spinning up Postgres / MLflow / etc. per test would blow the integration budget.
C	Promotion gate semantics between tiers	any-failure-blocks for the foundation Epic (unit / property / contract / cache / integration / DAG smoke)	Foundation = deterministic by construction. Threshold-based gates introduce an "is it a flake or a regression ?" ambiguity that quietly drifts into "the suite is yellow but we ship anyway", which is exactly what the strategy is designed to prevent. Real flakes are handled out of band by the F5 flaky-test detector (auto-issue + 5 % flake rate threshold per F5) ; they don't need a graceful gate. Threshold-based gates have their place in other Epics (ML behaviour, system-E2E) where stochasticity is intrinsic ; those Epics will pick their own promotion semantics.
D	Test dataset versioning tooling (S02 F8 implementation pick)	DVC (Data Version Control) for datasets > 10 MB ; content-addressed in-repo for datasets ≤ 10 MB (file hashed via `sha256` and named `<name>.<sha8>.parquet`, `<sha8>` being the first 8 hex characters of the digest)	DVC is the de facto OSS standard, integrates natively with git (each dataset has a `.dvc` pointer file in git that pins its content hash, while the storage backend is configured as a DVC remote), and supports any S3-compatible backend : we point it at our existing MinIO Testcontainers helper for tests and at the prod MinIO instance for production-grade golden datasets. The 10 MB threshold is a gut-feel starting point for git-friendliness (the expectation is that `git status` slows visibly on macOS above it ; §12 asks whether to measure before pinning). For ≤ 10 MB, in-repo storage with content-addressed naming gives us atomic git-tracked provenance without external infra. Roll-our-own ruled out : DVC handles the cache + reproduce + push/pull semantics that we'd otherwise rebuild. git-lfs ruled out as primary : bandwidth costs become non-trivial at scale and LFS pointers don't carry the lineage metadata DVC tracks.
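
The in-repo half of decision D can be expressed as a small naming and routing rule. A minimal sketch, assuming "10 MB" means 10 MiB and that `content_addressed_name` / `dataset_destination` are hypothetical helper names (not an S04 deliverable); large datasets would still go through `dvc add`:

```python
import hashlib
from pathlib import Path

# Assumption: "10 MB" read as 10 MiB; decision D sends anything strictly larger to DVC.
SMALL_DATASET_MAX_BYTES = 10 * 1024 * 1024


def content_addressed_name(name: str, payload: bytes, suffix: str = ".parquet") -> str:
    """Return `<name>.<sha8><suffix>`: sha8 = first 8 hex chars of SHA-256(payload)."""
    sha8 = hashlib.sha256(payload).hexdigest()[:8]
    return f"{name}.{sha8}{suffix}"


def dataset_destination(name: str, payload: bytes, root: Path = Path("tests/datasets")) -> Path:
    """Route a dataset per decision D: small/ (content-addressed, in-repo) or large/ (DVC-tracked)."""
    if len(payload) <= SMALL_DATASET_MAX_BYTES:
        return root / "small" / content_addressed_name(name, payload)
    # Large files keep a stable name; the git-tracked .dvc pointer file carries the content hash.
    return root / "large" / f"{name}.parquet"
```

Because the hash is part of the filename, any content change produces a new path, so a test that references `ohlcv.<sha8>.parquet` can never silently read different bytes.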

2. Directory layout — frozen

```text
tests/
├── conftest.py                    # project-root fixtures (pg_engine, mlflow_server, time_freeze, seed_all)
├── factories/                     # per-shape pure data builders (S02 F3)
│   ├── __init__.py
│   ├── ohlcv.py                   # make_ohlcv_window(seed, n_bars, freq, crypto)
│   ├── model_artefacts.py         # make_xgb_model(seed) / make_lgb_model / make_cb_model
│   ├── ftf_results.py             # make_finetune_run / make_finetune_results
│   └── signals.py                 # make_signal(side, confidence, filter_trace)
├── fixtures/                      # cross-cutting fixtures wired against factories
│   ├── __init__.py
│   ├── containers.py              # Testcontainers helpers : pg / redis / minio / mlflow / airflow_scheduler
│   ├── time.py                    # time_machine pytest fixture (frozen_time(when))
│   └── seed.py                    # seed_all fixture (random / numpy / xgboost / pytorch)
├── cases/                         # data-driven test cases per Story (S02 F8)
│   └── <cvn_id>/                  # one folder per Story
│       └── *.yaml                 # YAML test-case files consumed via pytest.parametrize
├── datasets/                      # versioned test datasets (S02 F8 + decision D)
│   ├── small/                     # ≤ 10 MB ; content-addressed in-repo (<name>.<sha8>.parquet)
│   └── large/                     # > 10 MB ; DVC-tracked (.dvc pointer files in git, content in MinIO)
├── unit/                          # pytest -m unit
├── property/                      # pytest -m property (hypothesis-based)
├── contract/                      # pytest -m contract (boundary schema validation)
├── cache/                         # pytest -m cache (key correctness + invalidation)
├── integration/                   # pytest -m integration (multi-component, in-process)
│   └── services/                  # touches Testcontainers
├── dag_smoke/                     # pytest -m dag_smoke (per-DAG dag.test())
└── e2e/                           # pytest -m e2e (will populate via system-E2E Epic — placeholder dir)
```

```text
documentation/stories/
└── <cvn_id>/                      # one folder per Story (created at In specification)
    └── tests/                     # Story-phase test artefacts (per S01 §11.2)
        ├── strategy.md            # Story's "test strategy" — extracted from plan dossier at Specified
        ├── test_run_<sha>.md      # one file per CI run that gated a transition
        ├── manifest.yaml          # committed at Tested → Closed ; immutable thereafter
        └── datasets/              # local manifest of (dataset_name, dataset_sha) pairs used by this Story
```

Discipline rules (enforced by lint OR a pytest --collect-only smoke check in the CI fast tier) :
- Every test in a file under tests/<type>/ MUST carry the matching pytest marker (@pytest.mark.<type>).
- Every test MUST also carry @pytest.mark.story("<cvn_id>") (S02 F7), enforced by a fast-tier CI lint rule.
- A factory MUST be a pure function or a factory_boy-style class : no I/O, no fixtures.
- A fixture MUST live under tests/fixtures/ OR in tests/conftest.py ; a per-package conftest.py is allowed only for narrow fixtures shared by the test files of that one package.
- No fixture > 50 LoC (S02 F4 acceptance signal).
- No fixture has > 3 parameters (S02 F4 acceptance signal).
- A test reading a dataset MUST read it from tests/datasets/ (S02 F8 acceptance signal : CI fails if a test opens a file outside this tree as test input).
- A test-case YAML under tests/cases/<cvn_id>/ MUST validate against the test-case schema tests/cases/_schema.yaml (S04 ships the schema).
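
The first two marker rules reduce to a pure check over a test's path and marker names. A minimal sketch, assuming paths are POSIX and relative to the repo root; `marker_violations` is a hypothetical name, not the S04 lint rule itself:

```python
from pathlib import PurePosixPath

# tests/<type>/ directories from the §2 tree; each requires the marker of the same name.
TYPE_DIRS = frozenset({"unit", "property", "contract", "cache", "integration", "dag_smoke", "e2e"})


def marker_violations(test_path: str, marker_names: set[str]) -> list[str]:
    """Discipline-rule violations for one collected test; an empty list means compliant.

    test_path: POSIX path relative to the repo root, e.g. "tests/unit/test_x.py".
    marker_names: names of all markers on the test, e.g. {"unit", "story"}.
    """
    problems = []
    parts = PurePosixPath(test_path).parts
    # Only files under tests/<type>/ need the type marker; factories/, fixtures/ etc. are exempt here.
    if len(parts) >= 3 and parts[0] == "tests" and parts[1] in TYPE_DIRS:
        if parts[1] not in marker_names:
            problems.append(f"missing @pytest.mark.{parts[1]}")
    if "story" not in marker_names:
        problems.append('missing @pytest.mark.story("<cvn_id>")')
    return problems
```

Inside a `pytest_collection_modifyitems` hook, this could be fed `item.path.relative_to(config.rootpath).as_posix()` and `{m.name for m in item.iter_markers()}`, assuming `rootpath` is the repo root.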

3. Service-virtualization conventions — Testcontainers

Service	Container helper	Default scope	Warm-cache start budget
Postgres	`tests/fixtures/containers.py::pg_container`	`session`	< 2 s (image pre-pulled in CI cache)
Redis	`tests/fixtures/containers.py::redis_container`	`session`	< 1 s
MinIO	`tests/fixtures/containers.py::minio_container`	`session`	< 2 s
MLflow	`tests/fixtures/containers.py::mlflow_server`	`session`	< 5 s (cold-start dominated by gunicorn boot)
Airflow scheduler	`tests/fixtures/containers.py::airflow_scheduler`	`session`	< 8 s (heaviest — entire DAG bag parse)

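Total start time under warm cache : < 18 s for the 5 services even if they start sequentially (18 s is the sum of the five budgets ; a fully parallel start is bounded by the slowest container, the Airflow scheduler at < 8 s). Session-scoped containers start once per xdist worker, not once per test. This fits inside the S02 NF2 budget (≤ 10 min integration tier) with ~ 9 min of headroom for actual test execution.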
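
The budget arithmetic can be checked directly from the §3 table. A minimal sketch: the 18 s figure is the sum of the five budgets (a sequential start), while a fully parallel start is bounded by the slowest container:

```python
# Warm-cache start budgets from the §3 table, in seconds.
BUDGETS_S = {"postgres": 2, "redis": 1, "minio": 2, "mlflow": 5, "airflow_scheduler": 8}
INTEGRATION_BUDGET_S = 10 * 60  # S02 NF2: integration tier <= 10 min

sequential_start_s = sum(BUDGETS_S.values())  # containers started one after another
parallel_start_s = max(BUDGETS_S.values())    # all containers started concurrently
# Each xdist worker starts its own session-scoped containers, so this overhead is paid
# per worker (workers start concurrently, so it is not multiplied in wall-clock terms).
headroom_s = INTEGRATION_BUDGET_S - sequential_start_s
```

With these values the worst case leaves 582 s (about 9.7 min) for test execution, consistent with the "~ 9 min of headroom" above.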

Conventions :
- Each helper returns a tuple (container, client), where client is the canonical Python client for the service (e.g., SQLAlchemy engine, redis-py client, MLflow Client).
- Each helper accepts a reuse: bool = False kwarg. When True, it uses Testcontainers' reuse mode to keep the container alive across pytest invocations on dev laptops (the NF6 fresh-clone path doesn't use reuse).
- Each helper has a docstring linking to the matching ADR (per ADR-77 SSoT discipline).
- Container images are pinned to specific tags (no latest).
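
The "no latest" convention is easy to lint mechanically. A minimal sketch, assuming image references follow the usual `[registry[:port]/]repo[:tag][@digest]` shape; `is_pinned_image` is a hypothetical helper name:

```python
def is_pinned_image(ref: str) -> bool:
    """True if a container image reference names an explicit tag (not `latest`) or a digest."""
    if "@sha256:" in ref:
        return True  # a digest pin is the strongest form of pinning
    # A ':' before the last '/' belongs to a registry port, not a tag.
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag means Docker resolves the image to `latest`
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")
```

A fast-tier test could run this over every image string declared in `tests/fixtures/containers.py` and fail on the first unpinned reference.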

4. CI tier mapping → workflow files

Tier (per S01 strategy §3)	When	Marker selector	Wall-clock budget (S02 NF1-NF3)	Workflow file
fast	every PR touching code	`pytest -m "unit or property or contract"`	p95 ≤ 2 min in CI ; p95 ≤ 30 s on laptop	`.github/workflows/ci-fast.yml` (already exists ; this Story specifies the marker contract)
medium / integration	every PR touching subsystem code	`pytest -m "cache or integration or dag_smoke"`	p95 ≤ 10 min	`.github/workflows/ci-integration.yml` (new in S04)
nightly	scheduled @ 02:00 UTC + every push to `main`	`pytest -m "data_quality or performance or system_e2e"` (added by EB-EI Epics)	p95 ≤ 30 min	`.github/workflows/ci-nightly.yml` (new in S04)

Promotion gate (decision C, any-failure-blocks) : a failure in the fast tier blocks the medium tier from running on the same SHA, and a failure in the medium tier blocks the nightly tier. This is strict : no thresholds, no "5 % failure tolerated" exceptions in the foundation Epic.
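
The any-failure-blocks rule amounts to "a tier may run only if every earlier tier ran and passed". A minimal sketch, assuming per-tier results are recorded as True / False / None (not run yet); `runnable_tiers` is a hypothetical name:

```python
from typing import Optional

TIER_ORDER = ("fast", "medium", "nightly")


def runnable_tiers(results: dict[str, Optional[bool]]) -> list[str]:
    """Tiers allowed to run on one SHA under any-failure-blocks.

    results maps tier -> True (all green), False (any failure) or None / absent (not run yet).
    """
    allowed = []
    for tier in TIER_ORDER:
        allowed.append(tier)
        if results.get(tier) is not True:
            break  # a failed or not-yet-run tier gates every tier after it
    return allowed
```

There is deliberately no threshold parameter : a single failure is enough to stop promotion.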

xdist parallelism contract (S02 NF4) : every workflow runs pytest -n auto. A failure under -n auto that doesn't reproduce under -n 1 is a fixture isolation bug, not a flake : it is reported via the F5 flaky-test detector AND blocks the merge until fixed.
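
The contract distinguishes two failure kinds by rerunning serially. A minimal sketch of that classification, assuming the serial rerun is only triggered after a parallel failure; `classify_parallel_failure` is a hypothetical name:

```python
def classify_parallel_failure(failed_with_n_auto: bool, failed_with_n_1: bool) -> str:
    """Classify a test outcome under the xdist contract (S02 NF4).

    The serial (-n 1) result only matters when the -n auto run failed.
    """
    if not failed_with_n_auto:
        return "pass"
    if failed_with_n_1:
        return "failure"        # reproduces without parallelism: an ordinary failure
    return "isolation_bug"      # parallel-only: shared-state fixture bug, reported via F5
```

Both non-pass outcomes block the merge under decision C ; the classification only decides where the report is routed.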

5. ADR ratification — 5 new ADRs (0084-0088) land with this Story ; ADR-0083 already accepted per S01

This dossier ships 5 new ADRs at status accepted. ADR-0083 was already ratified at accepted by the same gate that ratified the S01 strategy doc (committee plan_review 53d76f0f PASSED) — S03 cross-refs but does NOT re-ratify it.

ADR #	Title	Source	Status after merge
0083	Test taxonomy + gate hierarchy	S01 strategy doc (PR #852)	`accepted` already (per S01 §11.5 ratification ; this Story extends with the §11.4 test-verdict scope reference)
0084	Foundation Epic test stack pick (pytest 8.x + xdist 3.x + Testcontainers 4.x + DVC 3.x for datasets > 10 MB)	this Story	`accepted`
0085	Fixture scope discipline (function default, session opt-in for containers)	this Story	`accepted`
0086	CI tier promotion gate — any-failure-blocks for foundation Epic	this Story	`accepted`
0087	Story-phase test integration : tests are first-class artefacts of every ADR-81 transition + explicit committee verdict at every gate (ratifies S01 §11 strategic invariant + S02 U7 + F7 + extends ADR-68 committee scope)	this Story	`accepted`
0088	Test cases + datasets versioned + provenance-tracked (test-as-code with reproducibility guarantee) (ratifies S02 F8 + NF10 + NF11 ; pins decision D = DVC + content-addressed in-repo)	this Story	`accepted`

Each new ADR is a single-page document keyed on the locked decisions (A/B/C/D above) and the strategic invariants from S01/S02. The 5 new ADRs ship in the same PR as this dossier (per ADR-77 SSoT — strategy / requirements / architecture artefacts ship as a coherent unit, not split across multiple PRs).

Why ADR-0087 + ADR-0088 are structuring : they bind the process (ADR-81 8-state Story workflow + ADR-68 committee channel) to the artefacts (tests + datasets + manifests) at every gate. Without them, the test stack is just plumbing ; with them, the test stack is the memory of correctness for every Story the project ever ships. They make the test factory auditable, reproducible, and impressive to anyone who reads the manifest of a Closed Story 6 months later — every assertion the Story made about "we tested for X" can be re-executed bit-for-bit from (git_sha, dataset_sha) alone.

6. Story-phase test integration mechanism — how the matrix is enforced

The S01 §11 strategic invariant ("tests are first-class artefacts of every ADR-81 transition") becomes operational via the following programmatic surface :

6.1 `@pytest.mark.story` discoverability

Every test carries a @pytest.mark.story("<cvn_id>") decorator. The <cvn_id> is the OP Story id (e.g., CVN-N015-EA-S04). Mechanism :

```python
import pytest

@pytest.mark.story("CVN-N015-EA-S04")
@pytest.mark.unit
def test_seed_all_resets_random_consistently():
    ...
```

Filtering mechanism (pytest-native correction) : pytest -m "story_<cvn_id>" does NOT work here, because pytest's -m filter selects on marker names and cannot match a marker's positional arguments. We ship a tiny pytest plugin (in S04's tests/conftest.py) that adds a --story <cvn_id> CLI flag and filters the collection accordingly :

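```python
# tests/conftest.py (S04 will ship this)
# pytest only honours pytest_addoption in initial conftests: keep this file at the
# tests root (e.g. with testpaths = ["tests"] configured) so --story is recognised.

def pytest_addoption(parser):
    parser.addoption("--story", action="store", default=None,
                     help="Filter tests to a single Story id (e.g., CVN-N015-EA-S04)")

def pytest_configure(config):
    # Register the single `story` marker (avoids PytestUnknownMarkWarning).
    config.addinivalue_line("markers", "story(cvn_id): OP Story id this test provides evidence for")

def pytest_collection_modifyitems(config, items):
    story = config.getoption("--story")
    if not story:
        return
    selected, deselected = [], []
    for item in items:
        # Keep the test if any `story` marker carries the requested id as a positional arg.
        if any(story in m.args for m in item.iter_markers(name="story")):
            selected.append(item)
        else:
            deselected.append(item)
    # Report deselected tests so pytest's summary line stays accurate.
    config.hook.pytest_deselected(items=deselected)
    items[:] = selected
```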

Then the operator-facing surface becomes :

```shell
# All tests for a Story (used by G5 guardrail)
pytest --collect-only -q --story CVN-N015-EA-S04

# Run a Story's full test surface
pytest --story CVN-N015-EA-S04
```

Why this design (not the alternative @pytest.mark.story_CVN_N015_EA_S04 per-Story marker) : per-Story markers would require declaring each marker in pyproject.toml (or a registration hook) to avoid PytestUnknownMarkWarning, which means every Story creation triggers a marker-registration commit. The plugin pattern keeps @pytest.mark.story("<cvn_id>") as the single registered marker, with the Story id carried as a marker arg — no per-Story registration needed. CR run on commit de3b2e57 flagged the original -m "story_<cvn_id>" phrasing as broken pytest semantics — this §6.1 v2 corrects it.

6.2 Per-Story manifest schema

Manifest at documentation/stories/<cvn_id>/tests/manifest.yaml has a fixed schema (S04 ships the JSON schema for validation) :

```yaml
cvn_id: CVN-N015-EA-S04
closed_at: 2026-08-15T14:30:00Z
git_sha_at_close: "9f8e7d6c"   # SHAs quoted so YAML never parses an all-digit hash (e.g. 12345678) as an integer
acceptance_criteria:
  - id: 1
    description: "Every test under tests/<type>/ carries @pytest.mark.<type>"
    proven_by:
      - test_id: "tests/unit/test_marker_discipline.py::test_unit_dir_carries_unit_marker"
        test_sha: "9f8e7d6c"
        dataset_sha: null   # no dataset needed
        run_id: "github-actions-12345"
        outcome: PASSED
  - id: 2
    description: "Per-Story test count ≥ 1 per acceptance criterion"
    proven_by:
      - test_id: "tests/unit/test_g5_guardrail.py::test_g5_blocks_pr_without_story_tests"
        test_sha: "9f8e7d6c"
        dataset_sha: "abc123de"
        run_id: "github-actions-12347"
        outcome: PASSED
```

The manifest is committed at Tested → Closed and immutable thereafter (NF11 + G6 CI guardrail).
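
Until S04 ships the real JSON schema, the structural part of the G5 Closed check can be sketched over the parsed manifest (e.g. the output of a YAML loader such as PyYAML's `safe_load`, an assumption here). `manifest_errors` is a hypothetical name and covers presence checks only:

```python
REQUIRED_TOP_LEVEL = ("cvn_id", "closed_at", "git_sha_at_close", "acceptance_criteria")
REQUIRED_ROW_KEYS = ("test_id", "test_sha", "dataset_sha", "run_id", "outcome")


def manifest_errors(manifest: dict) -> list[str]:
    """Structural checks on a parsed manifest.yaml; an empty list means it passes."""
    errors = [f"missing top-level key: {key}" for key in REQUIRED_TOP_LEVEL if key not in manifest]
    for criterion in manifest.get("acceptance_criteria") or []:
        cid = criterion.get("id", "?")
        rows = criterion.get("proven_by") or []
        if not rows:
            errors.append(f"criterion {cid}: no proven_by row")
        for row in rows:
            # dataset_sha may be null, but the key itself must be present.
            missing = [key for key in REQUIRED_ROW_KEYS if key not in row]
            if missing:
                errors.append(f"criterion {cid}: proven_by row missing {', '.join(missing)}")
    return errors
```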

6.3 G5 + G6 CI guardrails

G5 (Story-phase test enforcement) : at PR merge time, G5 parses the PR's linked OP Story (from the GH issue → OP wp link in the body), looks up the Story's current state, and asserts that the test artefacts required for the next transition are present. Examples :
- PR transitions the Story In progress → Developed : G5 asserts that pytest --collect-only -q --story <cvn_id> returns ≥ 1 test per acceptance criterion and that the datasets exist under tests/datasets/ (this uses the S04 plugin's --story CLI flag, since pytest's native -m filter cannot match marker arguments ; see §6.1 for the plugin design)
- PR transitions the Story Tested → Closed : G5 asserts that documentation/stories/<cvn_id>/tests/manifest.yaml exists, validates against the schema, and gives every acceptance criterion ≥ 1 proven_by row

G6 (manifest immutability after Closed) : at PR merge time, for any modified manifest.yaml, G6 fetches the OP Story status. If the Story is Closed, G6 fails the merge unless the PR belongs to an explicit amendment Story (per S01's invariant : amendments require their own Story).

Both guardrails ship in S04 alongside the existing G1-G4. Hard gates, no thresholds.
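
G6's decision logic is small once the Story-state lookup is abstracted away. A minimal sketch, assuming the lookup is injected as a callable (whether it hits the OP API or reads a PR-template field is the §12 open question) and that the amendment flag is known; `g6_violations` is a hypothetical name:

```python
from typing import Callable, Optional


def g6_violations(modified_paths: list[str],
                  story_status: Callable[[str], Optional[str]],
                  is_amendment_pr: bool) -> list[str]:
    """Manifests of Closed Stories touched by a non-amendment PR (empty list = merge allowed)."""
    violations = []
    for path in modified_paths:
        parts = path.split("/")
        # Expected layout: documentation/stories/<cvn_id>/tests/manifest.yaml
        if (len(parts) == 5 and parts[:2] == ["documentation", "stories"]
                and parts[3:] == ["tests", "manifest.yaml"]):
            cvn_id = parts[2]
            if story_status(cvn_id) == "Closed" and not is_amendment_pr:
                violations.append(path)
    return violations
```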

6.4 Committee verdict body schema (extends ADR-68 per S01 §11.4)

The pr_review session JSON's verdict object MUST contain a tests: field with 4 sub-verdicts ; the committee CLI (scripts/expert_committee.py) is extended in S04 to surface these sub-verdicts as a mandatory section of the prompt. Example session JSON :

```json
{
  "verdict": {
    "session_type": "pr_review",
    "status": "PASSED",
    "tests": {
      "coverage_per_acceptance_criterion": "PASSED",
      "adversarial_edge_coverage": "PASSED",
      "datasets_versioned_reproducible": "PASSED",
      "manifest_maps_tests_to_criteria": "PASSED"
    }
  }
}
```

Any INSUFFICIENT blocks merge regardless of the rest of the verdict.

The plan_review session JSON's verdict object MUST contain a tests_strategy field (PASSED or INSUFFICIENT). INSUFFICIENT blocks In specification → Specified.
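
The pr_review blocking rule can be sketched as a post-session check over the session JSON. A minimal sketch, assuming a missing `tests` field or a missing sub-verdict is treated like INSUFFICIENT (this document only states that INSUFFICIENT blocks); `pr_review_blocks_merge` is a hypothetical name:

```python
import json

TEST_SUB_VERDICTS = (
    "coverage_per_acceptance_criterion",
    "adversarial_edge_coverage",
    "datasets_versioned_reproducible",
    "manifest_maps_tests_to_criteria",
)


def pr_review_blocks_merge(session_json: str) -> bool:
    """True if a pr_review session must block merge on its `tests` sub-verdicts."""
    verdict = json.loads(session_json)["verdict"]
    tests = verdict.get("tests")
    if not isinstance(tests, dict):
        return True  # the tests field is mandatory
    # Assumption: anything other than PASSED (INSUFFICIENT, missing, typo) blocks.
    return any(tests.get(key) != "PASSED" for key in TEST_SUB_VERDICTS)
```

The check ignores the top-level `status` on purpose, mirroring "regardless of the rest of the verdict".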

6.5 `make` targets — operator-facing surface

S04 ships the following targets :

make test-fast — runs the fast tier locally (unit + property + contract)
make test-story CVN_ID=<cvn_id> — runs the full test surface for one Story
make reproduce CVN_ID=<cvn_id> — checks out the manifest's recorded SHAs, materialises datasets via DVC, runs the recorded tests, asserts outcomes match the manifest bit-for-bit (NF10 audit canary)
make test-manifest CVN_ID=<cvn_id> — drafts the manifest from the latest CI run for the Story (operator reviews + commits)

These 4 targets are the whole operator surface for the test factory ; everything else is CI-driven.
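
The final assertion step of make reproduce, comparing rerun outcomes with the manifest, can be sketched independently of the checkout and DVC materialisation steps. A minimal sketch, assuming the rerun yields a test_id → outcome mapping; `reproduce_mismatches` is a hypothetical name:

```python
def reproduce_mismatches(manifest: dict, observed: dict[str, str]) -> list[str]:
    """Compare a rerun's outcomes with the manifest's recorded ones (NF10 canary check).

    observed maps test_id -> outcome from the re-executed run.
    """
    mismatches = []
    for criterion in manifest["acceptance_criteria"]:
        for row in criterion["proven_by"]:
            got = observed.get(row["test_id"], "NOT_RUN")  # a vanished test counts as a mismatch
            if got != row["outcome"]:
                mismatches.append(f"{row['test_id']}: manifest={row['outcome']} rerun={got}")
    return mismatches
```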

6.6 The audit story — why this matters

6 months from now, an audit reviewer asks : "this trade lost money on 2026-09-12 — what did we test for that code path at that time ?"

With this mechanism :
1. git log -1 --format=%H --until=2026-09-12 -- src/commun/pipeline/inference_api.py → finds the SHA of the last commit that touched that path as of that date
2. From the commit, walk the GH PR → linked OP Story → documentation/stories/<cvn_id>/tests/manifest.yaml
3. The manifest pins the (test_sha, dataset_sha, run_id) for every acceptance criterion the Story claimed
4. make reproduce CVN_ID=<cvn_id> runs the exact same tests against the exact same datasets the Story used to validate itself

No "we think it was tested" — provable, deterministic, bit-for-bit reproducible from any historic point. This is the property that makes the test factory worth building.

7. Acceptance criteria

#	Criterion	Evidence
1	Plan dossier merged at `documentation/reviews/2026-05-06-cvn-n015-ea-s03-architecture-plan.md`	this file
2	5 new ADRs (0084 / 0085 / 0086 / 0087 / 0088) merged at `accepted`	`documentation/adr/0084-.md` … `0088-.md`
3	ADR-0083 already at `accepted` per S01 §11.5 ; S03 cross-refs the §11.4 test-verdict scope extension	cross-link in §5 row 0083
4	Committee `plan_review` PASSED — verdict body includes `tests_strategy: PASSED` field per §6.4. (No `operator waiver` bypass : ADR-68 makes the committee gate mandatory for plan_review on architecture Stories ; an exception would require an amendment to ADR-68 itself.)	session JSON link
5	Every S04-S09 Story explicitly references the architecture decisions for its scope	cross-refs in S04+ dossier headers (validated when those Stories open, NOT a blocker for this PR)
6	Each library version pinned to a specific minor (no float)	§1 row A + ADR-0084
7	Each Testcontainers helper has a warm-cache start budget documented	§3 table
8	wp#118 OP transition `New → In specification → Specified`	OP audit comment trail
9	Per-Story manifest schema documented + a JSON schema file ships in S04	§6.2 schema example
10	G5 + G6 CI guardrails documented (Story-phase enforcement + manifest immutability) ; G5+G6 implementation in S04	§6.3
11	Operator-facing `make` targets enumerated (`test-fast`, `test-story`, `reproduce`, `test-manifest`)	§6.5
12	DVC adopted for datasets > 10 MB ; in-repo content-addressed for ≤ 10 MB	§1 row D + ADR-0088
13	Audit story (§6.6) is the explicit measure of the test factory's success	§6.6 narrative

8. Out of scope (explicit)

Implementation of any fixture / factory / container helper — that's S04+ work.
CI workflow YAML files — touched by S04 (workflow scaffolding) ; this Story only defines the contract they implement.
Per-service conftest.py content — S05+ ; this Story sets the convention, doesn't fill it.
Migration of existing tests under tests/ — separate cleanup Story (S07?) ; this Story freezes the layout for new tests, doesn't refactor the existing suite.
Performance budget refinement — S02 NF1-NF3 stay as-is ; refined in a dedicated budget Story when nightly drift signals fire (per S01 §11 follow-up).
Cross-Epic concerns (ML behaviour / data quality / system-E2E choices) — those Epics' own architecture Stories.

9. Risks

Risk	Likelihood	Impact	Mitigation
S01 or S02 committee `plan_review` mandates a change that invalidates an architecture decision here	medium	medium	Each §1 decision has an explicit upstream trace ("driven by S02 NF4" etc.) ; if the upstream changes, the affected decision row gets re-locked in a v2 amendment. ~30 min per decision rework cost ; not a structural issue.
Testcontainers warm-cache start budgets (§3) prove too aggressive in CI (cold start dominates first run)	medium	low	Pre-pull container images in a dedicated CI cache layer (handled by S04 workflow file) ; budgets in §3 are steady-state, NF6 fresh-clone bound (S02) is separate. Add a budget alert in Grafana if any container exceeds its budget by >20 % over a 7-day rolling window.
`function`-scope default (decision B) measurably hurts the fast-tier 2-min budget	low	medium	xdist parallelism (S02 NF4) absorbs the per-test setup cost — measured on a representative slice before merging this dossier. If the budget gets stressed, the mitigation is to expand the session-opt-in list (not flip the default), which is a 1-line ADR-0085 amendment.
`any-failure-blocks` (decision C) proves too strict if a third-party CI runner goes flaky (e.g., GitHub Actions cache corruption)	low	low	The flaky-test detector F5 auto-categorises external-infra flakes (e.g., "image pull failed") separately from test-code flakes. Infra flakes auto-retry once at the pytest level (via `pytest-rerunfailures`, e.g. `--reruns 1` scoped to infra error signatures with `--only-rerun`) ; only test-code flakes block the merge.
Library version pin (decision A) goes stale before next dependency update Story	medium	low	Quarterly cadence : a dedicated S0X "test stack version refresh" Story bumps minors + runs the full pyramid. Acceptance bar : zero new test failures vs the prior version. Out-of-scope here.
Single-operator means S04+ implementation Stories may diverge from this architecture under time pressure	medium	high	The strategy doc's invariant "downstream Stories that disagree MUST raise an amendment Story" is enforced by committee `pr_review` on each S04+ PR — same gate that catches this kind of drift in PR #685's REGISTRY pattern.
DVC bootstrapping cost (S3-compatible backend, dvc init, dvc add for existing datasets) blocks S04 timeline	medium	medium	DVC ships in S04 with exactly 1 backend (the existing MinIO Testcontainers helper) ; bootstrap is 3 lines : `dvc init`, `dvc remote add -d minio s3://test-datasets`, and `dvc remote modify minio endpointurl <minio-url>` (points DVC's S3 client at MinIO). Existing test datasets in `data/` migrate one-by-one as Stories that touch them open ; no big-bang migration.
Manifest schema (§6.2) churns over the first 2-3 Stories that adopt it	high	low	The schema lives at `tests/manifests/_schema.yaml` ; schema changes are amendment Stories per S01 invariant. First 3 Stories' manifests can be migrated to the new schema by a one-shot script ; after that the schema is `accepted` and changes require explicit ratification.
`make reproduce` audit canary (NF10) is too slow to run nightly across all closed Stories	low	low	Random sampling : 1 closed Story per nightly cycle, weighted by recency (closed within last 30 days has 5× the probability of older closures). Full sweep monthly via a dedicated workflow. Auto-files GH issue if any reproducer fails.
Committee verdict body schema (§6.4) creates a JSON-schema dependency that ossifies the committee CLI	low	medium	Schema lives in `committee/sessions/_verdict_schema.yaml` ; CLI validates against it post-session. New verdict fields require an amendment Story to ADR-0087 (not a hot patch).
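
The recency-weighted canary sampling from the `make reproduce` risk row can be sketched with the standard library. A minimal sketch, assuming closure timestamps are available per Story id; `canary_weight` and `pick_canary` are hypothetical names:

```python
import random
from datetime import datetime, timedelta


def canary_weight(closed_at: datetime, now: datetime) -> float:
    """Stories closed within the last 30 days are 5x as likely to be picked."""
    return 5.0 if now - closed_at <= timedelta(days=30) else 1.0


def pick_canary(closed: dict[str, datetime], now: datetime, rng: random.Random) -> str:
    """Pick one closed Story id for tonight's `make reproduce` run."""
    ids = sorted(closed)  # stable order, so a seeded rng gives a reproducible pick
    weights = [canary_weight(closed[i], now) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

Seeding `rng` from the nightly run date would make each night's pick itself reproducible.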

10. Sequencing + dependencies

```text
S01 (strategy, wp#116, PR #852) ──┐
                                   ├──► S02 (requirements, wp#117, PR #856) ──┐
                                   │                                            ├──► S03 (this Story, wp#118)
                                   └────────────────────────────────────────────┘     plan_review committee
                                                                                       PR open + CR + merge
                                                                                       wp#118 New → In spec → Specified → In progress → Developed → Closed
                                                                                       │
                                                                                       └──► S04 (impl : conftest + factories + fast-tier workflow)
                                                                                       └──► S05 (impl : Testcontainers helpers per service)
                                                                                       └──► S06 (impl : flaky-test detector — uses #756)
                                                                                       └──► S07 (impl : migrate existing suite to new layout)
                                                                                       └──► S08 (impl : nightly tier workflow)
                                                                                       └──► S09 (impl : per-tier Grafana dashboard wiring)
```

S03 PR can land BEFORE S01 + S02 PRs merge (S03 references S01/S02 dossiers via the PR branch paths — same pattern S02 used for S01).
S03 closure requires S01 + S02 to have merged (the trace requires the upstream dossiers to exist on main).

Single-WIP rule (per ADR-69) : S03 sits in the `Specified` buffer once committee `plan_review` PASSES, behind S02 in priority. Currently `In progress` Stories : wp#103 (S14 Track 1 leakage), wp#46 (S07 Track 2), wp#45 (S06 Track 11). S03 doesn't compete for capacity until one of those + S02 close.

11. References

OP wp#118 / GH #838 : this Story's tracking
S01 deliverables (PR #852 merged squash 8b7a4d5b ; wp#116 Closed) :
Strategy doc : ../strategy/CVN-N015-test-strategy.md (status accepted)
S01 ADR-0083 : ../adr/0083-test-taxonomy-and-gate-hierarchy.md (merged via PR #852 ; status accepted — same gate as the S01 strategy doc, confirmed in §5 of this dossier)
S02 requirements dossier : 2026-05-06-cvn-n015-ea-s02-requirements-plan.md (merged on main via PR #856 squash 47d13d48 ; wp#117 Closed)
Testcontainers integration Story : #757 — §3 conventions inherit its scope
Pytest factories Story : #586 — §2 directory layout inherits its scope
Flaky-test detector Story : #756 — §4 promotion gate's flake handling delegates to it
ADR-58 (FTF guardrails) — drives the unit + cache marker discipline
ADR-68 (committee = default review channel)
ADR-77 (MkDocs SSoT — this dossier + 5 new ADRs (0084-0088) are the SSoT for foundation Epic architecture)
ADR-81 (8-state Story workflow)

12. Plan-review questions for committee

Decision A pin policy : we pin to a stable minor (e.g., pytest 8.3.x) and bump only via amendment Story. Is the quarterly cadence sustainable for a single operator, or should we adopt an automated dependabot-with-test-suite-gate that auto-merges minor bumps when the full pyramid passes ?
Decision B opt-in list : we explicitly opt into session scope for the 5 Testcontainers + heavy MLflow/PG fixtures. Is the list complete, or are there other expensive fixtures (e.g., a pre-trained xgboost model loaded from disk) that should also be session-scoped from day 1 ?
Decision C strictness across Epics : we say "any-failure-blocks for foundation Epic, threshold-based for ML behaviour / system-E2E (other Epics)". Does the committee endorse this split, or do we want a single semantic across all Epics (foundation strict + Epics flagged with @pytest.mark.allow_threshold for stochastic tests) ?
§3 warm-cache budgets realism : the Airflow scheduler at < 8 s warm is the riskiest budget — DAG bag parse on macOS with our 30+ DAGs can blow this. Should we measure on a representative dag-bag snapshot before merging, or accept the budget with a 1-month "calibrate and refine" follow-up ?
§5 ADR ship-as-bundle vs split-PRs : we land 5 new ADRs (0084/0085/0086/0087/0088) in the same PR as this dossier. Is the bundle clearer (one merge = one architectural moment), or do reviewers prefer 1 PR per ADR for finer-grained CR + revert ?
Layout migration scope (§8 out-of-scope) : we explicitly defer migrating the existing tests/ suite to a separate Story. Is "new tests follow new layout, old tests stay where they are until S07 cleanup" the right operational story, or should we mandate migrate-on-touch (whoever modifies an old test relocates it) ?
ADR-0087 committee verdict scope (§6.4) : we extend ADR-68 to mandate a tests: sub-verdict object on every pr_review. Does the committee endorse the 4 sub-verdicts (coverage / adversarial+edge / datasets versioned / manifest mapping) as the right partition, or are there other dimensions worth pinning (e.g., flakiness budget, mutation score, performance regression check) ?
ADR-0088 DVC vs git-lfs vs roll-our-own : decision D picks DVC for > 10 MB datasets. The 10 MB threshold is gut-feel — should we measure git's behaviour on a representative dataset mix before pinning, or accept 10 MB as a starting point with a 1-month calibration follow-up ?
G5 Story-state lookup mechanism (§6.3) : G5 needs to know the OP Story's current state at PR merge time to decide which transition is being attempted. Options : (a) parse the OP wp via the OP API at CI time (adds an external dependency to the merge gate), (b) require the PR template to declare the intended next state explicitly in a machine-readable section. Operator preference ?
NF10 audit canary scope (§6.6) : we propose a nightly random sampling of 1 closed Story for make reproduce. Is that the right cadence, or should we instead trigger reproductions on events (e.g., a Story closed > 60 days ago that hasn't been reproduced gets promoted to the canary) ?
Manifest immutability vs amendment Stories (§6.3 G6 + S02 NF11) : we say a Closed Story's manifest is immutable, amendments require a new Story. Is this too strict for typo fixes / link rot updates ? Should we have an "errata" mechanism that allows minor corrections without a full Story (with an explicit errata log committed alongside the manifest) ?