CVN-N015-EA-S03 — Architecture + stack pick + integration design
Date : 2026-05-06
Story : CVN-N015-EA-S03 (OP wp#118)
GH issue : #838
Parent Epic : CVN-N015-EA — Test stack foundation (OP wp#107 / GH #827)
Depends on : S01 strategy (wp#116, PR #852) + S02 requirements (wp#117, PR #856) — both in flight, dossiers referenceable via PR branches
Blocks : S04-S09 implementation Stories
Operator decisions locked : 2026-05-06 — A=stack pinned to current stable (pytest 8.x + pytest-xdist 3.x + testcontainers-python 4.x), B=function scope default + session opt-in for Testcontainers, C=any-failure-blocks
Status : Specified (committee plan_review session 94aa2881 PASSED 2026-05-06 ; OP wp#118 transitioned to Specified 2026-05-06 + In progress on PR open + awaiting merge)
0. Intent + scope
S03 is the architecture decision Story for the foundation Epic. It picks the concrete stack, freezes the directory layout, sets the fixture-scope discipline, decides the Testcontainers conventions, and locks the CI tier promotion semantics. Output : a single dossier + 5 new ADRs (0084-0088) ratified at accepted ; ADR-0083 is already at accepted per S01 §11.5 (S03 cross-refs the §11.4 test-verdict scope extension). See §5 for the full table.
S04-S09 implementation Stories execute against S03's choices without further architectural deliberation — if a S04+ Story discovers a wrong choice, it raises an amendment Story rather than locally re-deciding (per the same invariant S01 set for the strategy doc).
Out of scope (covered elsewhere) :
- Test taxonomy + cadence matrix → S01 strategy doc
- Functional + non-functional requirements → S02 dossier
- Specific fixture / factory implementations → S04
- Per-service Testcontainers helpers → S05+
- Flaky-test detector implementation → S06 (uses #756 Story)
1. Operator decisions — locked 2026-05-06
| # | Decision | Locked value | Rationale |
|---|---|---|---|
| A | Library versions | pytest 8.x (latest stable minor) ; pytest-xdist 3.x ; testcontainers-python 4.x ; time_machine (per S02 decision C) ; pytest-randomly + pytest-rerunfailures | All OSS, all support Python 3.12 (S02 NF8). Pin policy : freeze to the latest stable minor on merge of this Story ; bump only on a deliberate amendment Story (no automatic dependabot float — predictability beats freshness for a test foundation). |
| B | Default fixture scope | function scope by default ; session scope opt-in explicitly for Testcontainers + heavy MLflow/PG fixtures | Pattern « safe-by-default, perf opt-in ». Function scope eliminates an entire class of cross-test isolation bugs ; the wall-clock budgets B from S02 (2 / 10 / 30 min) are met via pytest-xdist -n auto parallelism (S02 NF4), NOT via shared session state. Session-scoped fixtures are set up once per xdist worker, so their warm-up cost is amortised over every test that worker runs. The opt-in is explicit for Testcontainers because spinning up Postgres / MLflow / etc. per test would blow the integration budget. |
| C | Promotion gate semantics between tiers | any-failure-blocks for the foundation Epic (unit / property / contract / cache / integration / DAG smoke) | Foundation = deterministic by construction. Threshold-based gates introduce a "is it flake or regression" ambiguity that quietly drifts into "the suite is yellow but we ship anyway", which is exactly what the strategy is designed to prevent. Real flakes are out-of-band-handled by the F5 flaky-test detector (auto-issue + 5 % flake rate threshold per F5) — they don't need a graceful gate. Threshold-based gates have their place in other Epics (ML behaviour, system-E2E) where stochasticity is intrinsic ; those Epics will pick their own promotion semantics. |
| D | Test dataset versioning tooling (S02 F8 implementation pick) | DVC (Data Version Control) for datasets > 10 MB ; content-addressed in-repo for datasets ≤ 10 MB (file hashed via sha256 and named <name>.<sha8>.parquet) | DVC is the de facto OSS standard, integrates natively with git (each dataset has a .dvc pointer file in git that pins a content hash + storage backend), and supports any S3-compatible backend — we point it at our existing MinIO Testcontainers helper for tests + at the prod MinIO instance for production-grade golden datasets. The 10 MB threshold matches the practical git-friendliness limit (above which git status slows visibly on macOS). For ≤ 10 MB, in-repo with content-addressed naming gives us atomic git-tracked provenance without external infra. Roll-our-own ruled out — DVC handles the cache + reproduce + push/pull semantics that we'd otherwise re-build. git-lfs ruled out as primary — bandwidth costs become non-trivial at scale and the LFS pointers don't carry the lineage metadata DVC tracks. |
2. Directory layout — frozen
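The content-addressed naming rule in decision D can be sketched in a few lines. This is only an illustration, not the S04 implementation : it assumes <sha8> means the first 8 hex characters of the file's sha256 digest and reads "10 MB" as 10 MiB ; the helper names content_addressed_name and storage_tier are hypothetical.

```python
import hashlib
from pathlib import Path

# Assumption: the "10 MB" threshold is read as 10 MiB.
SIZE_THRESHOLD_BYTES = 10 * 1024 * 1024


def content_addressed_name(path: Path, stem: str) -> str:
    """Return '<stem>.<sha8>.parquet', sha8 = first 8 hex chars of the sha256 (assumed)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in 1 MiB chunks so larger files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return f"{stem}.{digest.hexdigest()[:8]}.parquet"


def storage_tier(path: Path) -> str:
    """'small' -> in-repo under tests/datasets/small/ ; 'large' -> DVC-tracked."""
    return "small" if path.stat().st_size <= SIZE_THRESHOLD_BYTES else "large"
```

Because the name is derived from the bytes, any change to a dataset produces a new file name, which is what makes the (dataset_name, dataset_sha) pair usable as a provenance key.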
tests/
├── conftest.py # project-root fixtures (pg_engine, mlflow_server, time_freeze, seed_all)
├── factories/ # per-shape pure data builders (S02 F3)
│ ├── __init__.py
│ ├── ohlcv.py # make_ohlcv_window(seed, n_bars, freq, crypto)
│ ├── model_artefacts.py # make_xgb_model(seed) / make_lgb_model / make_cb_model
│ ├── ftf_results.py # make_finetune_run / make_finetune_results
│ └── signals.py # make_signal(side, confidence, filter_trace)
├── fixtures/ # cross-cutting fixtures wired against factories
│ ├── __init__.py
│ ├── containers.py # Testcontainers helpers : pg / redis / minio / mlflow / airflow_scheduler
│ ├── time.py # time_machine pytest fixture (frozen_time(when))
│ └── seed.py # seed_all fixture (random / numpy / xgboost / pytorch)
├── cases/ # data-driven test cases per Story (S02 F8)
│ └── <cvn_id>/ # one folder per Story
│ └── *.yaml # YAML test-case files consumed via pytest.parametrize
├── datasets/ # versioned test datasets (S02 F8 + decision D)
│ ├── small/ # ≤ 10 MB ; content-addressed in-repo (<name>.<sha8>.parquet)
│ └── large/ # > 10 MB ; DVC-tracked (.dvc pointer files in git, content in MinIO)
├── unit/ # pytest -m unit
├── property/ # pytest -m property (hypothesis-based)
├── contract/ # pytest -m contract (boundary schema validation)
├── cache/ # pytest -m cache (key correctness + invalidation)
├── integration/ # pytest -m integration (multi-component, in-process)
│ └── services/ # touches Testcontainers
├── dag_smoke/ # pytest -m dag_smoke (per-DAG dag.test())
└── e2e/ # pytest -m e2e (will populate via system-E2E Epic — placeholder dir)
documentation/stories/
└── <cvn_id>/ # one folder per Story (created at In specification)
└── tests/ # Story-phase test artefacts (per S01 §11.2)
├── strategy.md # Story's "test strategy" — extracted from plan dossier at Specified
├── test_run_<sha>.md # one file per CI run that gated a transition
├── manifest.yaml # committed at Tested → Closed ; immutable thereafter
└── datasets/ # local manifest of (dataset_name, dataset_sha) pairs used by this Story
Discipline rules (enforced by lint OR pytest --collect-only smoke at CI fast-tier) :
- A test file under tests/<type>/ MUST carry the matching pytest marker on every test (@pytest.mark.<type>).
- Every test MUST also carry @pytest.mark.story("<cvn_id>") (S02 F7) — enforced by a fast-tier CI lint rule.
- A factory MUST be a pure function or factory_boy-style class — no I/O, no fixtures.
- A fixture MUST live under tests/fixtures/ OR tests/conftest.py ; a per-package conftest.py is allowed only for narrow fixtures used solely by the test files in that package.
- No fixture > 50 LoC (S02 F4 acceptance signal).
- No fixture has > 3 parameters (S02 F4 acceptance signal).
- A test reading a dataset MUST read from tests/datasets/ (S02 F8 acceptance signal — CI fails if a test opens a file outside this tree as test input).
- A test case YAML under tests/cases/<cvn_id>/ MUST validate against the test-case schema tests/cases/_schema.yaml (S04 ships the schema).
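The first two discipline rules (matching type marker + story marker) are mechanical enough to sketch as a pure check. The snippet below is an assumption about how a fast-tier lint could be shaped, not the rule S04 will ship ; the function names are hypothetical and the marker names are passed in as a plain set so the logic stays independent of pytest internals.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Set

# Test-type directories from the frozen layout in section 2.
TEST_TYPES = {"unit", "property", "contract", "cache", "integration", "dag_smoke", "e2e"}


def expected_type_marker(test_path: str) -> Optional[str]:
    """Return the marker implied by tests/<type>/..., or None outside those trees."""
    parts = PurePosixPath(test_path).parts
    if len(parts) >= 2 and parts[0] == "tests" and parts[1] in TEST_TYPES:
        return parts[1]
    return None


def discipline_violations(test_path: str, marker_names: Set[str]) -> List[str]:
    """List the marker rules a single test breaks (empty list = compliant)."""
    problems = []
    expected = expected_type_marker(test_path)
    if expected is not None and expected not in marker_names:
        problems.append(f"missing @pytest.mark.{expected}")
    if "story" not in marker_names:
        problems.append('missing @pytest.mark.story("<cvn_id>")')
    return problems
```

In a real collection hook the marker set would come from each collected item ; keeping the rule itself pure makes it trivially unit-testable.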
3. Service-virtualization conventions — Testcontainers
| Service | Container helper | Default scope | Warm-cache start budget |
|---|---|---|---|
| Postgres | tests/fixtures/containers.py::pg_container | session | < 2 s (image pre-pulled in CI cache) |
| Redis | tests/fixtures/containers.py::redis_container | session | < 1 s |
| MinIO | tests/fixtures/containers.py::minio_container | session | < 2 s |
| MLflow | tests/fixtures/containers.py::mlflow_server | session | < 5 s (cold-start dominated by gunicorn boot) |
| Airflow scheduler | tests/fixtures/containers.py::airflow_scheduler | session | < 8 s (heaviest — entire DAG bag parse) |
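Total start-up under warm cache : < 18 s even if the 5 services start one after another (2 + 1 + 2 + 5 + 8 s) ; started in parallel, the bound is the slowest service (< 8 s for the Airflow scheduler). Session-scoped containers start once per xdist worker, not once per test. Either way this fits inside the S02 NF2 budget (≤ 10 min integration tier) with more than 9 min of headroom for actual test execution.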
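The budget arithmetic can be checked with a short sketch. The per-service figures are the warm-cache budgets from the table above ; the variable names are illustrative only.

```python
# Warm-cache start budgets from the section 3 table, in seconds.
WARM_START_BUDGET_S = {
    "postgres": 2,
    "redis": 1,
    "minio": 2,
    "mlflow": 5,
    "airflow_scheduler": 8,
}
INTEGRATION_TIER_BUDGET_S = 10 * 60  # S02 NF2 : 10 min integration tier

# Worst case if the helpers start one after another.
sequential_bound_s = sum(WARM_START_BUDGET_S.values())
# Bound if all helpers start concurrently : the slowest one dominates.
parallel_bound_s = max(WARM_START_BUDGET_S.values())
# Time left for actual tests even in the sequential worst case.
headroom_s = INTEGRATION_TIER_BUDGET_S - sequential_bound_s

print(sequential_bound_s, parallel_bound_s, headroom_s)  # 18 8 582
```

The 18 s figure is therefore the sequential upper bound ; 582 s of headroom is roughly 9 min 42 s.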
Conventions :
- Each helper returns a tuple (container, client) where client is the canonical Python client for the service (e.g., SQLAlchemy engine, redis-py client, MLflow Client, etc.).
- Each helper accepts a reuse: bool = False kwarg. When True, the helper relies on Testcontainers container reuse to keep the container alive across pytest invocations on dev laptops (the NF6 fresh-clone path doesn't use reuse).
- Each helper has a docstring linking to the matching ADR (per ADR-77 SSoT discipline).
- Container images are pinned to specific tags (no latest).
4. CI tier mapping → workflow files
| Tier (per S01 strategy §3) | When | Marker selector | Wall-clock budget (S02 NF1-NF3) | Workflow file |
|---|---|---|---|---|
| fast | every PR touching code | pytest -m "unit or property or contract" | p95 ≤ 2 min in CI ; p95 ≤ 30 s on laptop | .github/workflows/ci-fast.yml (already exists ; this Story specifies the marker contract) |
| medium / integration | every PR touching subsystem code | pytest -m "cache or integration or dag_smoke" | p95 ≤ 10 min | .github/workflows/ci-integration.yml (new in S04) |
| nightly | scheduled @ 02:00 UTC + every push to main | pytest -m "data_quality or performance or system_e2e" (added by EB-EI Epics) | p95 ≤ 30 min | .github/workflows/ci-nightly.yml (new in S04) |
Promotion gate (decision C any-failure-blocks) : a failure in the fast tier blocks the medium tier from running on the same SHA. A failure in the medium tier blocks the nightly tier. This is strict — no thresholds, no "5 % failure tolerated" exceptions in the foundation Epic.
xdist parallelism contract (S02 NF4) : every workflow runs pytest -n auto. A failure under -n auto that doesn't reproduce under -n 1 is a fixture isolation bug, not a flake — gets reported via the F5 flaky-test detector AND blocks the merge until fixed.
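The any-failure-blocks promotion rule reduces to a very small decision function. The sketch below is illustrative only (the real gate lives in the workflow files S04 ships) ; tiers_that_run and its input shape are hypothetical.

```python
from typing import Dict, List

# Tier order from the table above : each tier gates the next on the same SHA.
TIER_ORDER = ("fast", "medium", "nightly")


def tiers_that_run(failed_tests: Dict[str, int]) -> List[str]:
    """Return the tiers that execute for one SHA, given failure counts per tier.

    A tier runs only if every earlier tier finished with zero failures ;
    there is no tolerance threshold (decision C).
    """
    ran = []
    for tier in TIER_ORDER:
        ran.append(tier)
        if failed_tests.get(tier, 0) > 0:
            break  # any failure blocks every later tier
    return ran
```

For example, a single failing unit test means only the fast tier runs for that SHA : tiers_that_run({"fast": 1}) returns ["fast"].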
5. ADR ratification — 5 new ADRs (0084-0088) land with this Story ; ADR-0083 already accepted per S01
This dossier ships 5 new ADRs at status accepted. ADR-0083 was already ratified at accepted by the same gate that ratified the S01 strategy doc (committee plan_review 53d76f0f PASSED) — S03 cross-refs but does NOT re-ratify it.
| ADR # | Title | Source | Status after merge |
|---|---|---|---|
| 0083 | Test taxonomy + gate hierarchy | S01 strategy doc (PR #852) | accepted already (per S01 §11.5 ratification ; this Story extends with the §11.4 test-verdict scope reference) |
| 0084 | Foundation Epic test stack pick (pytest 8.x + xdist 3.x + Testcontainers 4.x + DVC 3.x for datasets > 10 MB) | this Story | accepted |
| 0085 | Fixture scope discipline (function default, session opt-in for containers) | this Story | accepted |
| 0086 | CI tier promotion gate — any-failure-blocks for foundation Epic | this Story | accepted |
| 0087 | Story-phase test integration : tests are first-class artefacts of every ADR-81 transition + explicit committee verdict at every gate (ratifies S01 §11 strategic invariant + S02 U7 + F7 + extends ADR-68 committee scope) | this Story | accepted |
| 0088 | Test cases + datasets versioned + provenance-tracked (test-as-code with reproducibility guarantee) (ratifies S02 F8 + NF10 + NF11 ; pins decision D = DVC + content-addressed in-repo) | this Story | accepted |
Each new ADR is a single-page document keyed on the locked decisions (A/B/C/D above) and the strategic invariants from S01/S02. The 5 new ADRs ship in the same PR as this dossier (per ADR-77 SSoT — strategy / requirements / architecture artefacts ship as a coherent unit, not split across multiple PRs).
Why ADR-0087 + ADR-0088 are structuring : they bind the process (ADR-81 8-state Story workflow + ADR-68 committee channel) to the artefacts (tests + datasets + manifests) at every gate. Without them, the test stack is just plumbing ; with them, the test stack is the memory of correctness for every Story the project ever ships. They make the test factory auditable, reproducible, and impressive to anyone who reads the manifest of a Closed Story 6 months later — every assertion the Story made about "we tested for X" can be re-executed bit-for-bit from (git_sha, dataset_sha) alone.
6. Story-phase test integration mechanism — how the matrix is enforced
The S01 §11 strategic invariant ("tests are first-class artefacts of every ADR-81 transition") becomes operational via the following programmatic surface :
6.1 @pytest.mark.story discoverability
Every test carries a @pytest.mark.story("<cvn_id>") decorator. The <cvn_id> is the OP Story id (e.g., CVN-N015-EA-S04). Mechanism :
import pytest

@pytest.mark.story("CVN-N015-EA-S04")
@pytest.mark.unit
def test_seed_all_resets_random_consistently():
    ...
Filtering mechanism (pytest-native correction) : pytest -m "story_<cvn_id>" does NOT work for marker arguments — pytest's -m filter operates on marker NAMES, not arguments. We ship a tiny pytest plugin (in S04 tests/conftest.py) that adds a --story <cvn_id> CLI flag and filters the collection accordingly :
# tests/conftest.py (S04 will ship this)
def pytest_addoption(parser):
    # Register the --story CLI flag used to select one Story's tests.
    parser.addoption("--story", action="store", default=None,
                     help="Filter tests to a single Story id (e.g., CVN-N015-EA-S04)")

def pytest_collection_modifyitems(config, items):
    story = config.getoption("--story")
    if not story:
        return  # no filter requested : keep the full collection
    # Keep only items carrying @pytest.mark.story("<cvn_id>") with a matching id.
    items[:] = [
        item for item in items
        if any(story in m.args for m in item.iter_markers(name="story"))
    ]
Then the operator-facing surface becomes :
# All tests for a Story (used by G5 guardrail)
pytest --collect-only -q --story CVN-N015-EA-S04
# Run a Story's full test surface
pytest --story CVN-N015-EA-S04
Why this design (not the alternative @pytest.mark.story_CVN_N015_EA_S04 per-Story marker) : per-Story markers would require declaring each marker in pyproject.toml (or a registration hook) to avoid PytestUnknownMarkWarning, which means every Story creation triggers a marker-registration commit. The plugin pattern keeps @pytest.mark.story("<cvn_id>") as the single registered marker, with the Story id carried as a marker arg — no per-Story registration needed. CR run on commit de3b2e57 flagged the original -m "story_<cvn_id>" phrasing as broken pytest semantics — this §6.1 v2 corrects it.
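The single-registration point made above can be sketched as a conftest hook. This assumes the marker is registered programmatically rather than in pyproject.toml (the text allows either) ; the description string is illustrative.

```python
# tests/conftest.py (sketch ; S04 owns the real file)
def pytest_configure(config):
    # Register the one shared marker once, so no per-Story registration
    # commit is needed and PytestUnknownMarkWarning is not raised.
    config.addinivalue_line(
        "markers",
        "story(cvn_id): OP Story id that this test provides evidence for",
    )
```

With this in place, adding a new Story only means writing @pytest.mark.story("<cvn_id>") on its tests ; the marker list itself never changes.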
6.2 Per-Story manifest schema
Manifest at documentation/stories/<cvn_id>/tests/manifest.yaml has a fixed schema (S04 ships the JSON schema for validation) :
cvn_id: CVN-N015-EA-S04
closed_at: 2026-08-15T14:30:00Z
git_sha_at_close: 9f8e7d6c
acceptance_criteria:
  - id: 1
    description: "Every test under tests/<type>/ carries @pytest.mark.<type>"
    proven_by:
      - test_id: "tests/unit/test_marker_discipline.py::test_unit_dir_carries_unit_marker"
        test_sha: 9f8e7d6c
        dataset_sha: null # no dataset needed
        run_id: "github-actions-12345"
        outcome: PASSED
  - id: 2
    description: "Per-Story test count ≥ 1 per acceptance criterion"
    proven_by:
      - test_id: "tests/unit/test_g5_guardrail.py::test_g5_blocks_pr_without_story_tests"
        test_sha: 9f8e7d6c
        dataset_sha: "abc123de"
        run_id: "github-actions-12347"
        outcome: PASSED
The manifest is committed at Tested → Closed and immutable thereafter (NF11 + G6 CI guardrail).
6.3 G5 + G6 CI guardrails
- G5 (Story-phase test enforcement) : at PR merge time, parses the PR's linked OP Story (from the GH issue → OP wp link in the body), looks up the Story's current state, and asserts the test artefacts required for the next transition are present. Examples :
  - PR transitions Story `In progress → Developed` → G5 asserts `pytest --collect-only -q --story <cvn_id>` returns ≥ 1 test per acceptance criterion + datasets exist under `tests/datasets/` (uses the S04 pytest plugin's `--story` CLI flag — pytest's native `-m` filter operates on marker names not arguments ; see §6.1 for the plugin design)
  - PR transitions Story `Tested → Closed` → G5 asserts `documentation/stories/<cvn_id>/tests/manifest.yaml` exists + validates against the schema + every acceptance criterion has ≥ 1 `proven_by` row
- G6 (manifest immutability after Closed) : at PR merge time, for any modified `manifest.yaml`, fetches the OP Story status. If the Story is `Closed`, fails the merge unless the PR is an explicit `amendment Story` (per S01's invariant — amendments require their own Story).
Both guardrails ship in S04 alongside the existing G1-G4. Hard gates, no thresholds.
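The two manifest-level checks (G5 at Tested → Closed, and G6) can be sketched as pure functions over an already-parsed manifest. This is a hedged illustration, not the S04 guardrail code : the function names are hypothetical, and the manifest is assumed to be loaded into a dict with the §6.2 keys.

```python
from typing import List


def criteria_without_proof(manifest: dict) -> List[int]:
    """G5 sketch : ids of acceptance criteria that have no proven_by row."""
    return [
        criterion["id"]
        for criterion in manifest.get("acceptance_criteria", [])
        if not criterion.get("proven_by")
    ]


def manifest_edit_allowed(story_status: str, is_amendment_story: bool) -> bool:
    """G6 sketch : a Closed Story's manifest changes only via an amendment Story."""
    return story_status != "Closed" or is_amendment_story


# Illustrative manifest fragment : criterion 2 has no evidence yet.
example = {
    "cvn_id": "CVN-N015-EA-S04",
    "acceptance_criteria": [
        {"id": 1, "proven_by": [{"test_id": "tests/unit/test_a.py::test_a", "outcome": "PASSED"}]},
        {"id": 2, "proven_by": []},
    ],
}
print(criteria_without_proof(example))  # [2]
```

A non-empty result from criteria_without_proof, or False from manifest_edit_allowed, would fail the merge ; both are hard gates with no threshold.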
6.4 Committee verdict body schema (extends ADR-68 per S01 §11.4)
The pr_review session JSON's verdict object MUST contain a tests: field with 4 sub-verdicts — committee CLI (scripts/expert_committee.py) is extended in S04 to surface these sub-verdicts as a mandatory section in the prompt. Example session JSON :
{
  "verdict": {
    "session_type": "pr_review",
    "status": "PASSED",
    "tests": {
      "coverage_per_acceptance_criterion": "PASSED",
      "adversarial_edge_coverage": "PASSED",
      "datasets_versioned_reproducible": "PASSED",
      "manifest_maps_tests_to_criteria": "PASSED"
    }
  }
}
Any INSUFFICIENT blocks merge regardless of the rest of the verdict.
The plan_review session JSON's verdict object MUST contain a tests_strategy field (PASSED or INSUFFICIENT). INSUFFICIENT blocks In specification → Specified.
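The blocking rule for pr_review can be sketched as follows. The four keys come from the example session JSON above ; treating a missing sub-verdict as blocking is an assumption drawn from the "MUST contain" wording, and the function name is hypothetical.

```python
# The 4 sub-verdict keys shown in the example session JSON.
REQUIRED_TEST_SUBVERDICTS = (
    "coverage_per_acceptance_criterion",
    "adversarial_edge_coverage",
    "datasets_versioned_reproducible",
    "manifest_maps_tests_to_criteria",
)


def pr_review_blocks_merge(session: dict) -> bool:
    """Return True if the tests sub-verdicts block the merge.

    Any value other than "PASSED" blocks, including a missing key
    (assumption : absent sub-verdicts are treated like INSUFFICIENT).
    """
    tests = session.get("verdict", {}).get("tests", {})
    return any(tests.get(key) != "PASSED" for key in REQUIRED_TEST_SUBVERDICTS)
```

Note that the overall "status" field is deliberately ignored here : a single INSUFFICIENT sub-verdict blocks regardless of the rest of the verdict.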
6.5 make targets — operator-facing surface
S04 ships the following targets :
- `make test-fast` — runs the fast tier locally (unit + property + contract)
- `make test-story CVN_ID=<cvn_id>` — runs the full test surface for one Story
- `make reproduce CVN_ID=<cvn_id>` — checks out the manifest's recorded SHAs, materialises datasets via DVC, runs the recorded tests, asserts outcomes match the manifest bit-for-bit (NF10 audit canary)
- `make test-manifest CVN_ID=<cvn_id>` — drafts the manifest from the latest CI run for the Story (operator reviews + commits)
These 4 targets are the whole operator surface for the test factory ; everything else is CI-driven.
6.6 The audit story — why this matters
6 months from now, an audit reviewer asks : "this trade lost money on 2026-09-12 — what did we test for that code path at that time ?"
With this mechanism :
1. git log --until=2026-09-12 src/commun/pipeline/inference_api.py → finds commit SHA
2. From the commit, walk the GH PR → linked OP Story → documentation/stories/<cvn_id>/tests/manifest.yaml
3. Manifest pins the (test_sha, dataset_sha, run_id) for every acceptance criterion the Story claimed
4. make reproduce CVN_ID=<cvn_id> runs the exact same tests against the exact same datasets the Story used to validate itself
No "we think it was tested" — provable, deterministic, bit-for-bit reproducible from any historic point. This is the property that makes the test factory worth building.
7. Acceptance criteria
| # | Criterion | Evidence |
|---|---|---|
| 1 | Plan dossier merged at documentation/reviews/2026-05-06-cvn-n015-ea-s03-architecture-plan.md | this file |
| 2 | 5 new ADRs (0084 / 0085 / 0086 / 0087 / 0088) merged at accepted | documentation/adr/0084-*.md … 0088-*.md |
| 3 | ADR-0083 already at accepted per S01 §11.5 ; S03 cross-refs the §11.4 test-verdict scope extension | cross-link in §5 row 0083 |
| 4 | Committee plan_review PASSED — verdict body includes tests_strategy: PASSED field per §6.4. (No operator waiver bypass : ADR-68 makes the committee gate mandatory for plan_review on architecture Stories ; an exception would require an amendment to ADR-68 itself.) | session JSON link |
| 5 | Every S04-S09 Story explicitly references the architecture decisions for its scope | cross-refs in S04+ dossier headers (validated when those Stories open, NOT a blocker for this PR) |
| 6 | Each library version pinned to a specific minor (no float) | §1 row A + ADR-0084 |
| 7 | Each Testcontainers helper has a warm-cache start budget documented | §3 table |
| 8 | wp#118 OP transition New → In specification → Specified | OP audit comment trail |
| 9 | Per-Story manifest schema documented + a JSON schema file ships in S04 | §6.2 schema example |
| 10 | G5 + G6 CI guardrails documented (Story-phase enforcement + manifest immutability) ; G5+G6 implementation in S04 | §6.3 |
| 11 | Operator-facing make targets enumerated (test-fast, test-story, reproduce, test-manifest) | §6.5 |
| 12 | DVC adopted for datasets > 10 MB ; in-repo content-addressed for ≤ 10 MB | §1 row D + ADR-0088 |
| 13 | Audit story (§6.6) is the explicit measure of the test factory's success | §6.6 narrative |
8. Out of scope (explicit)
- Implementation of any fixture / factory / container helper — that's S04+ work.
- CI workflow YAML files — touched by S04 (workflow scaffolding) ; this Story only defines the contract they implement.
- Per-service `conftest.py` content — S05+ ; this Story sets the convention, doesn't fill it.
- Migration of existing tests under `tests/` — separate cleanup Story (S07?) ; this Story freezes the layout for new tests, doesn't refactor the existing suite.
- Performance budget refinement — S02 NF1-NF3 stay as-is ; refined in a dedicated budget Story when nightly drift signals fire (per S01 §11 follow-up).
- Cross-Epic concerns (ML behaviour / data quality / system-E2E choices) — those Epics' own architecture Stories.
9. Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| S01 or S02 committee plan_review mandates a change that invalidates an architecture decision here | medium | medium | Each §1 decision has an explicit upstream trace ("driven by S02 NF4" etc.) ; if the upstream changes, the affected decision row gets re-locked in a v2 amendment. ~30 min per decision rework cost ; not a structural issue. |
| Testcontainers warm-cache start budgets (§3) prove too aggressive in CI (cold start dominates first run) | medium | low | Pre-pull container images in a dedicated CI cache layer (handled by S04 workflow file) ; budgets in §3 are steady-state, NF6 fresh-clone bound (S02) is separate. Add a budget alert in Grafana if any container exceeds its budget by >20 % over a 7-day rolling window. |
| function-scope default (decision B) measurably hurts the fast-tier 2-min budget | low | medium | xdist parallelism (S02 NF4) absorbs the per-test setup cost — measured on a representative slice before merging this dossier. If the budget gets stressed, the mitigation is to expand the session-opt-in list (not flip the default), which is a 1-line ADR-0085 amendment. |
| any-failure-blocks (decision C) proves too strict if a third-party CI runner goes flaky (e.g., GitHub Actions cache corruption) | low | low | The flaky-test detector F5 auto-categorises external-infra flakes (e.g., "image pull failed") separately from test-code flakes. Infra flakes auto-retry once at the workflow level (per pytest-rerunfailures config) ; only test-code flakes block the merge. |
| Library version pin (decision A) goes stale before next dependency update Story | medium | low | Quarterly cadence : a dedicated S0X "test stack version refresh" Story bumps minors + runs the full pyramid. Acceptance bar : zero new test failures vs the prior version. Out-of-scope here. |
| Single-operator means S04+ implementation Stories may diverge from this architecture under time pressure | medium | high | The strategy doc's invariant "downstream Stories that disagree MUST raise an amendment Story" is enforced by committee pr_review on each S04+ PR — same gate that catches this kind of drift in PR #685's REGISTRY pattern. |
| DVC bootstrapping cost (S3-compatible backend, dvc init, dvc add for existing datasets) blocks S04 timeline | medium | medium | DVC ships in S04 with exactly 1 backend (the existing MinIO Testcontainers helper) ; bootstrap is dvc init, dvc remote add -d minio s3://test-datasets, then dvc remote modify minio endpointurl <MinIO URL> to point the S3 remote at MinIO (3 lines). Existing test datasets in data/ migrate one-by-one as Stories that touch them open — no big-bang migration. |
| Manifest schema (§6.2) churns over the first 2-3 Stories that adopt it | high | low | The schema lives at tests/manifests/_schema.yaml ; schema changes are amendment Stories per S01 invariant. First 3 Stories' manifests can be migrated to the new schema by a one-shot script ; after that the schema is accepted and changes require explicit ratification. |
| make reproduce audit canary (NF10) is too slow to run nightly across all closed Stories | low | low | Random sampling : 1 closed Story per nightly cycle, weighted by recency (closed within last 30 days has 5× the probability of older closures). Full sweep monthly via a dedicated workflow. Auto-files GH issue if any reproducer fails. |
| Committee verdict body schema (§6.4) creates a JSON-schema dependency that ossifies the committee CLI | low | medium | Schema lives in committee/sessions/_verdict_schema.yaml ; CLI validates against it post-session. New verdict fields require an amendment Story to ADR-0087 (not a hot patch). |
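The recency-weighted sampling described in the make reproduce risk row can be sketched with the standard library. This is one possible reading (a flat 5× weight for Stories closed within the last 30 days, 1× otherwise) ; the function names and the Story ids in the usage lines are hypothetical.

```python
import random
from datetime import date
from typing import Dict


def sampling_weight(closed_on: date, today: date) -> int:
    """5x weight for Stories closed within the last 30 days, 1x otherwise."""
    return 5 if (today - closed_on).days <= 30 else 1


def pick_story_for_canary(closed_stories: Dict[str, date], today: date,
                          rng: random.Random) -> str:
    """Pick one closed Story id for tonight's reproduce run."""
    ids = sorted(closed_stories)  # stable order so a seeded rng is repeatable
    weights = [sampling_weight(closed_stories[i], today) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


# Hypothetical closure dates, for illustration only.
closed = {
    "CVN-EXAMPLE-S01": date(2026, 5, 1),
    "CVN-EXAMPLE-S02": date(2026, 8, 10),
}
picked = pick_story_for_canary(closed, today=date(2026, 8, 20), rng=random.Random(0))
```

Passing the random.Random instance in explicitly keeps the nightly pick reproducible from a logged seed, in line with the seed discipline the rest of the stack follows.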
10. Sequencing + dependencies
S01 (strategy, wp#116, PR #852) ──┐
├──► S02 (requirements, wp#117, PR #856) ──┐
│ ├──► S03 (this Story, wp#118)
└────────────────────────────────────────────┘ plan_review committee
PR open + CR + merge
wp#118 New → In spec → Specified → In progress → Developed → Closed
│
└──► S04 (impl : conftest + factories + fast-tier workflow)
└──► S05 (impl : Testcontainers helpers per service)
└──► S06 (impl : flaky-test detector — uses #756)
└──► S07 (impl : migrate existing suite to new layout)
└──► S08 (impl : nightly tier workflow)
└──► S09 (impl : per-tier Grafana dashboard wiring)
S03 PR can land BEFORE S01 + S02 PRs merge (S03 references S01/S02 dossiers via the PR branch paths — same pattern S02 used for S01).
S03 closure requires S01 + S02 to have merged (the trace requires the upstream dossiers to exist on main).
Single-WIP rule (per ADR-69) : S03 sits in the `Specified` buffer once committee `plan_review` PASSES, behind S02 in priority. Currently `In progress` Stories : wp#103 (S14 Track 1 leakage), wp#46 (S07 Track 2), wp#45 (S06 Track 11). S03 doesn't compete for capacity until one of those + S02 close.
11. References
- OP wp#118 / GH #838 : this Story's tracking
- S01 deliverables (PR #852 merged squash 8b7a4d5b ; wp#116 `Closed`) :
  - Strategy doc : `../strategy/CVN-N015-test-strategy.md` (status `accepted`)
  - S01 ADR-0083 : `../adr/0083-test-taxonomy-and-gate-hierarchy.md` (merged via PR #852 ; status `accepted` — same gate as the S01 strategy doc, confirmed in §5 of this dossier)
- S02 requirements dossier : `2026-05-06-cvn-n015-ea-s02-requirements-plan.md` (merged on main via PR #856 squash 47d13d48 ; wp#117 `Closed`)
- Testcontainers integration Story : #757 — §3 conventions inherit its scope
- Pytest factories Story : #586 — §2 directory layout inherits its scope
- Flaky-test detector Story : #756 — §4 promotion gate's flake handling delegates to it
- ADR-58 (FTF guardrails) — drives the unit + cache marker discipline
- ADR-68 (committee = default review channel)
- ADR-77 (MkDocs SSoT — this dossier + 5 new ADRs (0084-0088) are the SSoT for foundation Epic architecture)
- ADR-81 (8-state Story workflow)
12. Plan-review questions for committee
- Decision A pin policy : we pin to a stable minor (e.g., pytest 8.3.x) and bump only via amendment Story. Is the quarterly cadence sustainable for a single operator, or should we adopt an automated dependabot-with-test-suite-gate that auto-merges minor bumps when the full pyramid passes ?
- Decision B opt-in list : we explicitly opt session-scope for the 5 Testcontainers + heavy MLflow/PG fixtures. Is the list complete, or are there other expensive fixtures (e.g., a pre-trained xgboost model loaded from disk) that should also be session-scoped from day 1 ?
- Decision C strictness across Epics : we say "any-failure-blocks for foundation Epic, threshold-based for ML behaviour / system-E2E (other Epics)". Does committee endorse this split, or do we want a single semantic across all Epics (foundation strict + Epics flagged with `@pytest.mark.allow_threshold` for stochastic tests) ?
- §3 warm-cache budgets realism : the Airflow scheduler at < 8 s warm is the riskiest budget — DAG bag parse on macOS with our 30+ DAGs can blow this. Should we measure on a representative `dag-bag` snapshot before merging, or accept the budget with a 1-month "calibrate and refine" follow-up ?
- §5 ADR ship-as-bundle vs split-PRs : we land 5 new ADRs (0084/0085/0086/0087/0088) in the same PR as this dossier. Is the bundle clearer (one merge = one architectural moment), or do reviewers prefer 1 PR per ADR for finer-grained CR + revert ?
- Layout migration scope (§8 out-of-scope) : we explicitly defer migrating the existing `tests/` suite to a separate Story. Is "new tests follow new layout, old tests stay where they are until S07 cleanup" the right operational story, or should we mandate migrate-on-touch (whoever modifies an old test relocates it) ?
- ADR-0087 committee verdict scope (§6.4) : we extend ADR-68 to mandate a `tests:` sub-verdict object on every `pr_review`. Does the committee endorse the 4 sub-verdicts (coverage / adversarial+edge / datasets versioned / manifest mapping) as the right partition, or are there other dimensions worth pinning (e.g., flakiness budget, mutation score, performance regression check) ?
- ADR-0088 DVC vs git-lfs vs roll-our-own : decision D picks DVC for > 10 MB datasets. The 10 MB threshold is gut-feel — should we measure git's behaviour on a representative dataset mix before pinning, or accept 10 MB as a starting point with a 1-month calibration follow-up ?
- G5 Story-state lookup mechanism (§6.3) : G5 needs to know the OP Story's current state at PR merge time to decide which transition is being attempted. Options : (a) parse the OP wp via the OP API at CI time (adds an external dependency to the merge gate), (b) require the PR template to declare the intended next state explicitly in a machine-readable section. Operator preference ?
- NF10 audit canary scope (§6.6) : we propose a nightly random sampling of 1 closed Story for `make reproduce`. Is that the right cadence, or should we instead trigger reproductions on events (e.g., a Story closed > 60 days ago hasn't been reproduced — promote to canary) ?
- Manifest immutability vs amendment Stories (§6.3 G6 + S02 NF11) : we say a `Closed` Story's manifest is immutable, amendments require a new Story. Is this too strict for typo fixes / link rot updates ? Should we have an "errata" mechanism that allows minor corrections without a full Story (with an explicit `errata` log committed alongside the manifest) ?