ADR-0088: Test cases + datasets versioned + provenance-tracked (test-as-code with reproducibility guarantee)

- Status: accepted (committee plan_review session 94aa2881 PASSED 2026-05-06; ratified together with the S03 architecture dossier per the same gate)
- Date: 2026-05-06
- Introduced by: CVN-N015-EA-S03 / GH issue #838 / OP wp#118
- Ratifies: S02 requirements F8 + NF10 + NF11 + S03 architecture decision D
- Companion: ADR-0087 (Story-phase test integration; this ADR is the data-side complement)

Context

ADR-0087 makes tests first-class artefacts of every ADR-81 transition. But a test without its data is half a test: if the dataset the test ran against is not pinned and reproducible, the manifest's proven_by claim degrades from "we can prove it passed by re-running it now" to "we believe the test passed".

Test datasets in CVNTrade range from < 1 KB synthetic OHLCV windows (factory-generated, deterministic by seed), through multi-MB pre-trained model artefacts (xgboost / lightgbm / catboost pickles), to occasional > 10 MB FTF result snapshots. Without a versioning + provenance contract, every test that uses a non-trivial dataset becomes a hidden time bomb: the dataset on disk drifts, the test silently passes against the new state, and an audit reviewer 6 months later cannot reconstruct what the dataset looked like at proof time.

S02 NF10 requires reproducibility within ε-tolerance from (git_sha, dataset_sha, deps_lock_sha). This ADR specifies the dataset versioning + provenance half of that triple.

Decision

Invariant 1: every test that uses non-factory data MUST cite a versioned dataset

Tests fall into 2 buckets:

- Factory-generated data: produced by a tests/factories/<shape>.py builder with a seed parameter. Reproducibility = re-run the factory with the same seed. No dataset versioning needed (the factory IS the version).
- Fixture data: OHLCV snapshots, model artefacts, FTF results, golden outputs from prior runs. Reproducibility = check out the recorded dataset_sha from tests/datasets/. Dataset versioning is mandatory.
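
The factory bucket can be illustrated with a minimal sketch. The builder name `make_ohlcv_window`, its parameters, and the random-walk price model are assumptions made for illustration, not the project's actual factory; the only point is that the same seed yields the same rows, so no dataset file needs to be versioned.

```python
from __future__ import annotations

import random
from typing import NamedTuple


class Bar(NamedTuple):
    open: float
    high: float
    low: float
    close: float
    volume: float


def make_ohlcv_window(seed: int, n_bars: int = 5, start_price: float = 100.0) -> list[Bar]:
    """Hypothetical factory: build a synthetic OHLCV window from a seed."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    bars: list[Bar] = []
    price = start_price
    for _ in range(n_bars):
        open_ = price
        close = open_ * (1.0 + rng.uniform(-0.01, 0.01))
        # high/low are forced to bracket open and close so each bar is well-formed
        high = max(open_, close) * (1.0 + rng.uniform(0.0, 0.005))
        low = min(open_, close) * (1.0 - rng.uniform(0.0, 0.005))
        volume = rng.uniform(1_000.0, 5_000.0)
        bars.append(Bar(open_, high, low, close, volume))
        price = close  # next bar opens where this one closed
    return bars
```

Calling `make_ohlcv_window(42)` twice returns equal lists, which is the property that lets a manifest entry record a null dataset_sha for factory-backed tests.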

A test that reads a file from outside tests/datasets/ (or tests/cases/ for YAML test cases) is forbidden; CI fails the PR if it detects such a read (S04 deliverable: pytest plugin or strace-based check).
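
The detection mechanism itself is an S04 deliverable and is not specified here. As a hedged sketch, whichever mechanism is chosen would need a path check along these lines; the function name `is_allowed_test_read` and the `ALLOWED_ROOTS` constant are hypothetical.

```python
from pathlib import Path

# Roots named in this ADR; everything else is treated as off-limits for test reads.
ALLOWED_ROOTS = (Path("tests/datasets"), Path("tests/cases"))


def is_allowed_test_read(path: str, repo_root: str) -> bool:
    """Return True if `path` resolves to a location under an allowed root."""
    root = Path(repo_root).resolve()
    target = Path(path)
    if not target.is_absolute():
        target = root / target
    target = target.resolve()  # collapses ".." so traversal cannot escape a root
    for allowed in ALLOWED_ROOTS:
        try:
            target.relative_to(root / allowed)
            return True
        except ValueError:
            continue  # not under this root, try the next one
    return False
```

Resolving before comparing matters: a path such as tests/datasets/../../data/raw.parquet starts with an allowed prefix but ends up outside it.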

Invariant 2: split by size (DVC for > 10 MB, content-addressed in-repo for ≤ 10 MB)

| Size threshold | Storage | Naming convention | Pointer in git |
| --- | --- | --- | --- |
| ≤ 10 MB | in-repo under `tests/datasets/small/<cvn_id>/` | `<name>.<sha8>.parquet` (content hash in filename) | the file itself, hash baked into name |
| > 10 MB | DVC-tracked, MinIO S3-compatible backend | DVC default | `tests/datasets/large/<cvn_id>/<name>.dvc` pointer file |

The 10 MB threshold matches the practical git-friendliness limit: above it, git status slows visibly on macOS, the operator's primary dev box. For ≤ 10 MB, in-repo storage with content-addressed naming gives atomic git-tracked provenance without external infra. For > 10 MB, DVC handles the cache + reproduce + push/pull semantics that we would otherwise have to rebuild (per ADR-0084 decision D rationale).
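
The size split and the content-addressed filename can be sketched as below. The helper names are hypothetical, and two details are assumptions this ADR does not state: SHA-256 as the content hash behind `<sha8>`, and "10 MB" read as 10 MiB.

```python
import hashlib
from pathlib import Path

SIZE_THRESHOLD_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def sha8(payload: bytes) -> str:
    """First 8 hex characters of the SHA-256 digest (hash choice is an assumption)."""
    return hashlib.sha256(payload).hexdigest()[:8]


def storage_target(cvn_id: str, name: str, payload: bytes) -> Path:
    """Pick the in-repo or DVC location for a dataset according to its size."""
    if len(payload) <= SIZE_THRESHOLD_BYTES:
        # Small datasets live in git; the hash in the filename is the version.
        return Path("tests/datasets/small") / cvn_id / f"{name}.{sha8(payload)}.parquet"
    # Above the threshold only the DVC pointer file is committed to git.
    return Path("tests/datasets/large") / cvn_id / f"{name}.dvc"
```

Because the hash is derived from the bytes, any change to a small dataset produces a new filename, so a test that still cites the old name fails loudly instead of silently reading drifted data.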

git-lfs is ruled out as the primary store: bandwidth costs become non-trivial at scale, and LFS pointers don't carry the lineage metadata DVC tracks. Roll-our-own is ruled out as well: DVC's existing tooling is mature and standard.

Invariant 3: test cases as data, not as Python

Test-case YAML files live under tests/cases/<cvn_id>/ and are consumed via pytest.mark.parametrize (or an equivalent fixture). The schema lives at tests/cases/_schema.yaml (S04 deliverable); CI validates each YAML file against it.

Why YAML cases (not just Python parametrize lists):

- A test case is data (input × expected output × maybe ε-tolerance); keeping it in YAML separates the data-driven scenarios from the test code's assertion logic.
- An audit reviewer reading the YAML can answer "what was the test asserting at run time" without parsing Python AST.
- The YAML files' git history becomes the test-case audit trail (committee pr_review reviews YAML diffs in the same way it reviews code diffs).

Hypothesis-style property tests stay in Python (their value IS the property declaration, not the cases). This invariant covers data-driven tests only.
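
To make the "cases as data" idea concrete, here is a minimal sketch of turning already-parsed YAML cases into parametrize rows. The case keys (`id`, `input`, `expected`, `epsilon`) and the function name are hypothetical, since the real schema is an S04 deliverable; the sketch works on plain dicts so it stays independent of the YAML loader.

```python
from __future__ import annotations

from typing import Any

# Hypothetical required keys; the real schema is an S04 deliverable.
REQUIRED_KEYS = {"id", "input", "expected"}


def to_param_rows(cases: list[dict[str, Any]]) -> list[tuple[str, Any, Any, float | None]]:
    """Turn parsed YAML cases into (id, input, expected, epsilon) rows."""
    rows = []
    seen: set[str] = set()
    for case in cases:
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            raise ValueError(f"case {case.get('id', '?')!r} is missing keys: {sorted(missing)}")
        if case["id"] in seen:
            raise ValueError(f"duplicate case id: {case['id']!r}")
        seen.add(case["id"])
        # epsilon is optional: None means the case expects an exact match
        rows.append((case["id"], case["input"], case["expected"], case.get("epsilon")))
    return rows
```

In a test module, the dicts would typically come from `yaml.safe_load` on each file under tests/cases/<cvn_id>/, and the rows would be handed to `pytest.mark.parametrize("case_id, given, expected, epsilon", rows, ids=[r[0] for r in rows])` so that each YAML case shows up as its own test ID.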

Invariant 4: manifest pins the data triple

Per ADR-0087 Invariant 1, the per-Story manifest at documentation/stories/<cvn_id>/tests/manifest.yaml pins, for each acceptance criterion:

acceptance_criteria:
  - id: 1
    description: "..."
    proven_by:
      - test_id: "tests/unit/test_x.py::test_y"
        test_sha: <git_sha_of_test_file>
        dataset_sha: <hash_of_dataset_OR_null_if_factory>
        run_id: <CI_run_id_that_proved_it>
        outcome: PASSED

ε-tolerance bands (per S02 NF10) are declared in the manifest schema:

- exact for categoricals (PASSED/FAILED, predicted class)
- ε ≤ 1e-6 for float metrics (f1_buy, sortino, return %)
- ε ≤ 0.1 % for cumulative wealth simulations
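
The three bands above could be checked roughly as follows. The band labels and the function name are hypothetical, and the sketch makes two interpretive assumptions that the manifest schema would have to confirm: the 1e-6 band is treated as an absolute tolerance, and the 0.1 % band as relative to the recorded value.

```python
import math

FLOAT_METRIC_EPS = 1e-6     # read here as an absolute tolerance (assumption)
CUMULATIVE_REL_EPS = 0.001  # 0.1 %, read here as relative to the recorded value (assumption)


def within_band(band: str, recorded, observed) -> bool:
    """Check one reproduced value against its recorded value."""
    if band == "exact":
        # Categoricals: outcome strings, predicted classes.
        return recorded == observed
    if band == "float_metric":
        return math.isclose(recorded, observed, rel_tol=0.0, abs_tol=FLOAT_METRIC_EPS)
    if band == "cumulative":
        return abs(observed - recorded) <= CUMULATIVE_REL_EPS * abs(recorded)
    raise ValueError(f"unknown band: {band!r}")
```

For example, a recorded cumulative wealth of 1000.0 accepts a reproduced 1000.5 (0.05 % off) and rejects 1002.0 (0.2 % off).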

make reproduce CVN_ID=<cvn_id> checks out the recorded SHAs, materialises the datasets (via DVC pull, or from the in-repo file at that SHA), restores the deps lock, runs pytest --story <cvn_id> (per ADR-0087 Invariant 4), and asserts that the recorded outcomes match within their declared ε bands.
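
The order of operations behind make reproduce can be sketched as a command plan. This is an illustration only: the function name is hypothetical, a single `git_sha` stands in for "the recorded SHAs", and the deps-lock restore command is passed in by the caller because this ADR does not name the tool. The final ε comparison is not part of the plan.

```python
from __future__ import annotations


def reproduce_plan(
    cvn_id: str,
    git_sha: str,
    restore_deps_cmd: list[str],
    has_dvc_datasets: bool,
) -> list[list[str]]:
    """List the commands for one reproduce run, in execution order."""
    plan = [["git", "checkout", git_sha]]
    if has_dvc_datasets:
        # Large datasets are materialised from the DVC remote; small ones
        # already arrived with the checkout above.
        plan.append(["dvc", "pull"])
    plan.append(list(restore_deps_cmd))  # tool is not named in this ADR
    plan.append(["pytest", "--story", cvn_id])  # option defined by ADR-0087
    return plan
```

Keeping the plan as data (a list of argv lists) makes it easy to log exactly what a reproduce run did before handing each entry to `subprocess.run`.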

Invariant 5: manifest immutability after Closed (G6 CI guardrail)

Once a Story transitions to Closed, its manifest.yaml is immutable. Any future commit modifying a Closed Story's manifest is rejected by a CI guardrail (G6, an S04 deliverable). The mechanism is self-contained and needs no OP custom field: the guardrail pulls the Story closure timestamp from the OP REST /api/v3/work_packages/<wp_id>/activities endpoint (public API), then walks git log to find the last commit that modified documentation/stories/<cvn_id>/tests/manifest.yaml before that timestamp. Any later commit modifying that file fails CI unless its commit message contains an explicit amendment-story: <new-cvn-id> trailer pointing to a fresh Story. Per S01's invariant, amendments require an explicit amendment Story; in-place edits are not allowed.
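
The decision rule of the G6 guardrail can be sketched as a pure function. It assumes the caller has already collected the commits that touched the manifest (for instance from git log on that path) and the closure timestamp from the OP activities endpoint; the `ManifestCommit` type, the function name, and the exact trailer-parsing regex are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime

# Trailer named in this ADR; the exact parsing rule is an assumption.
AMENDMENT_TRAILER = re.compile(r"^amendment-story:[ \t]*\S+", re.MULTILINE)


@dataclass(frozen=True)
class ManifestCommit:
    sha: str
    committed_at: datetime
    message: str


def g6_violations(commits: list[ManifestCommit], closed_at: datetime) -> list[str]:
    """Return SHAs of post-closure manifest commits that lack an amendment trailer."""
    return [
        c.sha
        for c in commits
        if c.committed_at > closed_at and not AMENDMENT_TRAILER.search(c.message)
    ]
```

A non-empty result would fail CI. Commits made before closure are never flagged, and a post-closure commit passes only when its message carries the trailer on a line of its own.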

(Errata mechanism for typo fixes / link-rot updates: open question in S03 §12, plan-review question #11. It is not in scope of this ADR and would be an amendment if the operator decides errata is worth the complexity.)

Consequences

Positive:

- Every closed Story's claim of "we tested for X" is provable, not asserted: make reproduce reproduces the recorded outcomes within the declared ε bands.
- An audit reviewer 6 months from now can reconstruct the dataset state at proof time without forensic git archaeology.
- Test cases as YAML separate data-driven scenarios from assertion logic, which makes review easier and the audit clearer.
- Manifest immutability after Closed prevents silent retroactive editing of "what we tested".

Negative / risks:

- DVC bootstrap cost in S04 (one-time dvc init && dvc remote add); mitigated by an additive migration policy (existing test datasets in data/ migrate one by one as Stories that touch them open, with no big-bang migration).
- The manifest schema (S04 deliverable) will churn over the first 2-3 Stories that adopt it; mitigated by keeping the schema at tests/manifests/_schema.yaml under amendment-Story discipline; the first 3 Stories' manifests can be migrated by a one-shot script.
- The make reproduce audit canary (NF10) is too slow to run nightly across all closed Stories; mitigated by random sampling weighted by recency (1 closed Story per nightly cycle, 5× probability for Stories closed within the last 30 days) plus a monthly full sweep.
- ε-tolerance bands are first-pass approximations; mitigated by an amendment Story when the nightly canary surfaces a band that is too tight (the reproducer auto-files a GH issue tagged audit-regression, and the operator decides whether to widen the band or fix the non-determinism).
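
The recency-weighted nightly sampling mentioned above could look like the sketch below. The function name and the input shape (Story ID mapped to closure date) are hypothetical, and "5× probability" is read as a 5:1 sampling weight, which is an assumption.

```python
from __future__ import annotations

import random
from datetime import date, timedelta

RECENT_WINDOW = timedelta(days=30)
RECENT_WEIGHT = 5.0  # "5× probability" read as a 5:1 sampling weight (assumption)


def pick_canary_story(closed_on: dict[str, date], today: date, rng: random.Random) -> str:
    """Pick one closed Story for tonight's reproduce run, favouring recent closures."""
    story_ids = sorted(closed_on)  # stable order so a seeded rng gives a stable pick
    weights = [
        RECENT_WEIGHT if today - closed_on[s] <= RECENT_WINDOW else 1.0
        for s in story_ids
    ]
    return rng.choices(story_ids, weights=weights, k=1)[0]
```

Passing the `random.Random` instance in, rather than using the module-level functions, lets the nightly job log its seed so that the pick itself can be reproduced.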

Cross-references:

- S02 requirements F8 + NF10 + NF11 (the contracts this ADR ratifies)
- S03 architecture dossier §1 row D + §2 directory layout + §6 mechanism
- ADR-0087 Story-phase test integration (the process-side complement)
- ADR-0084 foundation test stack pick (DVC 3.x is in the locked stack)
- ADR-23 features version-pinned, fail-fast (precedent for "data must carry version")
- ADR-79 FTF Story closure 8-step (the prior pattern for "Story closure produces a verdict artefact"; this ADR generalises that to all Stories, not just FTF ones)
- ADR-81 (8-state Story workflow; the Invariant 5 G6 immutability gate triggers on the Closed state transition, with the closure timestamp pulled from the OP REST activities endpoint)
- ADR-77 (MkDocs SSoT; this ADR's artefacts under documentation/stories/<cvn_id>/tests/ and tests/cases/_schema.yaml are subject to SSoT discipline)