ADR-0088 — Test cases + datasets versioned + provenance-tracked (test-as-code with reproducibility guarantee)
Status: accepted (committee plan_review session 94aa2881 PASSED 2026-05-06 ; ratified together with the S03 architecture dossier per the same gate)
Date: 2026-05-06
Introduced by: CVN-N015-EA-S03 / GH issue #838 / OP wp#118
Ratifies: S02 requirements F8 + NF10 + NF11 + S03 architecture decision D
Companion: ADR-0087 (Story-phase test integration — this ADR is the data-side complement)
Context
ADR-0087 says tests are first-class artefacts of every ADR-81 transition. But a test without its data is half a test : if the dataset that the test ran against isn't pinned + reproducible, the manifest's proven_by claim becomes "we believe the test passed" instead of "we can prove it passed by re-running it now".
Test datasets in CVNTrade range from <1 KB synthetic OHLCV windows (factory-generated, deterministic by seed) to multi-MB pre-trained model artefacts (xgboost / lightgbm / catboost pickles) to occasional > 10 MB FTF result snapshots. Without a versioning + provenance contract, every test that uses a non-trivial dataset becomes a hidden time bomb : the dataset on disk drifts, the test silently passes against the new state, and an audit reviewer 6 months later cannot reconstruct what the dataset looked like at proof time.
S02 NF10 requires reproducibility within ε-tolerance from (git_sha, dataset_sha, deps_lock_sha). This ADR specifies the dataset versioning + provenance half of that triple.
Decision
Invariant 1 — every test that uses non-factory data MUST cite a versioned dataset
Tests fall into 2 buckets :
- Factory-generated data : produced by a tests/factories/<shape>.py builder with a seed parameter. Reproducibility = re-run the factory with the same seed. No dataset versioning needed (the factory IS the version).
- Fixture data : OHLCV snapshots, model artefacts, FTF results, golden outputs from prior runs. Reproducibility = check out the recorded dataset_sha from tests/datasets/. Dataset versioning is mandatory.
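As an illustration of the factory bucket, here is a minimal sketch of a seeded builder. The names `Bar` and `make_ohlcv_window` and the price model are hypothetical and not taken from tests/factories/ ; the only point is that the output depends on nothing but the arguments, so the seed acts as the version.

```python
import random
from typing import NamedTuple


class Bar(NamedTuple):
    open: float
    high: float
    low: float
    close: float
    volume: int


def make_ohlcv_window(n_bars: int, seed: int, start_price: float = 100.0) -> list[Bar]:
    """Build a synthetic OHLCV window that depends only on its arguments."""
    rng = random.Random(seed)  # private RNG, independent of global random state
    bars = []
    price = start_price
    for _ in range(n_bars):
        open_ = price
        close = open_ * (1.0 + rng.uniform(-0.01, 0.01))
        # high/low are pushed outward so they always bracket open and close
        high = max(open_, close) * (1.0 + rng.uniform(0.0, 0.005))
        low = min(open_, close) * (1.0 - rng.uniform(0.0, 0.005))
        volume = rng.randint(100, 10_000)
        bars.append(Bar(open_, high, low, close, volume))
        price = close  # next bar opens at the previous close
    return bars
```

Calling the builder twice with the same seed yields equal windows, which is what lets a factory-backed test record a null dataset_sha.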
A test reading a file from outside tests/datasets/ (or tests/cases/ for YAML test cases) is forbidden ; CI fails the PR if it detects such a read (S04 deliverable : pytest plugin or strace-based check).
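Whatever detection mechanism S04 picks, it needs a predicate that decides whether a given read is allowed. The sketch below shows only that predicate, under the assumption that paths are checked relative to the repository root ; `ALLOWED_ROOTS` and `is_allowed_read` are hypothetical names, and the hook that would feed it the paths a test actually opens is out of scope here.

```python
from pathlib import Path

# The two roots named by Invariant 1, relative to the repository root.
ALLOWED_ROOTS = ("tests/datasets", "tests/cases")


def is_allowed_read(path: str, repo_root: str) -> bool:
    """Return True if `path` resolves to a location under an allowed root."""
    root = Path(repo_root).resolve()
    # resolve() collapses ".." segments, so traversal out of the root is caught;
    # an absolute `path` replaces the root entirely and is judged on its own.
    target = (root / path).resolve()
    return any(target.is_relative_to(root / allowed) for allowed in ALLOWED_ROOTS)
```

A real check would also have to decide how to treat reads of source files, temporary directories, and installed packages, which this sketch does not attempt.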
Invariant 2 — split by size : DVC for > 10 MB, content-addressed in-repo for ≤ 10 MB
| Size threshold | Storage | Naming convention | Pointer in git |
|---|---|---|---|
| ≤ 10 MB | in-repo under tests/datasets/small/<cvn_id>/ | <name>.<sha8>.parquet (content hash in filename) | the file itself, hash baked into name |
| > 10 MB | DVC-tracked, MinIO S3-compatible backend | DVC default | tests/datasets/large/<cvn_id>/<name>.dvc pointer file |
The 10 MB threshold matches the practical git-friendliness limit (above which git status slows visibly on macOS — the operator's primary dev box). For ≤ 10 MB, in-repo with content-addressed naming gives atomic git-tracked provenance without external infra. For > 10 MB, DVC handles cache + reproduce + push/pull semantics that we'd otherwise re-build (per ADR-0084 decision D rationale).
git-lfs ruled out as primary — bandwidth costs become non-trivial at scale and LFS pointers don't carry the lineage metadata DVC tracks. Roll-our-own ruled out — DVC's existing tooling is mature and standard.
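The in-repo naming rule and the size split can be sketched as follows. Two assumptions are made that the ADR does not state : `<sha8>` is taken to be the first 8 hex characters of a SHA-256 digest of the file bytes, and "10 MB" is read as 10 MiB. All function names are hypothetical.

```python
import hashlib

SIZE_THRESHOLD_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def sha8(payload: bytes) -> str:
    """First 8 hex chars of SHA-256 (assumed meaning of <sha8>)."""
    return hashlib.sha256(payload).hexdigest()[:8]


def content_addressed_name(name: str, payload: bytes, suffix: str = "parquet") -> str:
    """Build <name>.<sha8>.<suffix> from the file's own bytes."""
    return f"{name}.{sha8(payload)}.{suffix}"


def storage_for(size_bytes: int) -> str:
    """Route a dataset to in-repo storage or DVC by size."""
    return "in-repo" if size_bytes <= SIZE_THRESHOLD_BYTES else "dvc"


def verify_name(filename: str, payload: bytes) -> bool:
    """Check that the hash embedded in the filename matches the content."""
    parts = filename.split(".")
    return len(parts) >= 3 and parts[-2] == sha8(payload)
```

Because the hash is part of the filename, any change to the bytes either renames the file (a visible git diff) or makes `verify_name` fail, which is what gives the in-repo path its provenance without external infrastructure.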
Invariant 3 — test cases as data, not as Python
Test-case YAML files live under tests/cases/<cvn_id>/ and are consumed via pytest.parametrize (or equivalent fixture). Schema lives at tests/cases/_schema.yaml (S04 deliverable) ; CI validates each YAML against the schema.
Why YAML cases (not just Python parametrize lists) :
- A test case is data (input × expected output × maybe ε-tolerance) — keeping it in YAML separates the data-driven scenarios from the test code's assertion logic
- An audit reviewer reading the YAML can answer "what was the test asserting at run time" without parsing Python AST
- The YAML files' git history becomes the test-case audit trail (committee pr_review reviews YAML diffs in the same way it reviews code diffs)
Hypothesis-style property tests stay in Python (their value IS the property declaration, not the cases). This invariant covers data-driven tests only.
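To make the "cases as data" idea concrete, the sketch below turns already-parsed YAML (a list of mappings) into typed case objects ready for parametrisation. The key set `id` / `input` / `expected` / `epsilon` is a hypothetical stand-in for the real schema at tests/cases/_schema.yaml, which is an S04 deliverable and not defined in this ADR.

```python
from dataclasses import dataclass
from typing import Any, Optional

REQUIRED_KEYS = {"id", "input", "expected"}  # hypothetical minimal schema


@dataclass(frozen=True)
class Case:
    id: str
    input: Any
    expected: Any
    epsilon: Optional[float] = None  # None means an exact match is expected


def build_cases(parsed: list[dict[str, Any]]) -> list[Case]:
    """Validate parsed YAML mappings and convert them to Case objects."""
    cases = []
    seen = set()
    for index, raw in enumerate(parsed):
        missing = REQUIRED_KEYS - raw.keys()
        if missing:
            raise ValueError(f"case #{index} is missing keys: {sorted(missing)}")
        if raw["id"] in seen:
            raise ValueError(f"duplicate case id: {raw['id']}")
        seen.add(raw["id"])
        cases.append(Case(raw["id"], raw["input"], raw["expected"], raw.get("epsilon")))
    return cases
```

In a test module, the parsed input would typically come from `yaml.safe_load` (PyYAML) and the result would be passed to `pytest.mark.parametrize("case", cases, ids=lambda c: c.id)`, so the YAML ids show up as test ids in the report.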
Invariant 4 — manifest pins the data triple
Per ADR-0087 Invariant 1, the per-Story manifest at documentation/stories/<cvn_id>/tests/manifest.yaml pins for each acceptance criterion :
    acceptance_criteria:
      - id: 1
        description: "..."
        proven_by:
          - test_id: "tests/unit/test_x.py::test_y"
            test_sha: <git_sha_of_test_file>
            dataset_sha: <hash_of_dataset_OR_null_if_factory>
            run_id: <CI_run_id_that_proved_it>
            outcome: PASSED
ε-tolerance bands (per S02 NF10) are declared in the manifest schema :
- exact for categoricals (PASSED/FAILED, predicted class)
- ε ≤ 1e-6 for float metrics (f1_buy, sortino, return %)
- ε ≤ 0.1 % for cumulative wealth simulations
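A comparison routine for the three bands might look like the sketch below. The ADR does not say whether the bands are absolute or relative, so this assumes an absolute difference for float metrics and a difference relative to the recorded value for cumulative wealth ; the band names and `within_band` are hypothetical.

```python
def within_band(kind: str, recorded, reproduced) -> bool:
    """Compare a reproduced value to the recorded one under its declared band."""
    if kind == "categorical":
        # exact match: PASSED/FAILED, predicted class
        return recorded == reproduced
    if kind == "float_metric":
        # assumption: epsilon is an absolute difference of at most 1e-6
        return abs(reproduced - recorded) <= 1e-6
    if kind == "cumulative_wealth":
        # assumption: 0.1 % is measured relative to the recorded value
        return abs(reproduced - recorded) <= 1e-3 * abs(recorded)
    raise ValueError(f"unknown band kind: {kind}")
```

For example, a recorded wealth of 10000.0 tolerates a reproduced 10005.0 (0.05 %) but not 10020.0 (0.2 %).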
make reproduce CVN_ID=<cvn_id> checks out the recorded SHAs, materialises the datasets (via DVC pull or in-repo file at that SHA), restores the deps lock, runs pytest --story <cvn_id> (per ADR-0087 Invariant 4), and asserts the recorded outcomes match within their declared ε bands.
Invariant 5 — manifest immutability after Closed (G6 CI guardrail)
Once a Story transitions to Closed, its manifest.yaml is immutable. Any later commit that modifies a Closed Story's manifest is rejected by a CI guardrail (G6, an S04 deliverable).
Mechanism (self-contained, no OP custom field required) : the guardrail reads the Story's closure timestamp from the OP REST /api/v3/work_packages/<wp_id>/activities endpoint (public API, no custom field) and walks git log for documentation/stories/<cvn_id>/tests/manifest.yaml. The last commit that modified the file before the closure timestamp is the frozen version. Any later commit that modifies the file fails CI unless its commit message contains an explicit amendment-story: <new-cvn-id> trailer pointing to a fresh Story.
Per S01's invariant, amendments require an explicit amendment Story ; in-place edits are not allowed.
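The decision at the core of G6 can be sketched as a pure function, assuming the caller has already collected the commits that touch the manifest (for instance from git log restricted to that path) and the closure timestamp from the OP activities endpoint. `Commit` and `violations` are hypothetical names, and the Story id used in the usage example is made up.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Matches a trailer line such as "amendment-story: <new-cvn-id>".
TRAILER_RE = re.compile(r"^amendment-story:\s*(\S+)\s*$", re.MULTILINE)


@dataclass(frozen=True)
class Commit:
    sha: str
    committed_at: datetime
    message: str


def violations(commits: list[Commit], closed_at: datetime) -> list[str]:
    """Return SHAs of post-closure manifest commits that lack the trailer."""
    return [
        c.sha
        for c in commits
        if c.committed_at > closed_at and not TRAILER_RE.search(c.message)
    ]


# Usage with made-up data: only the untagged post-closure commit is flagged.
closed = datetime(2026, 5, 6, 12, 0, tzinfo=timezone.utc)
history = [
    Commit("aaa", datetime(2026, 5, 5, 9, 0, tzinfo=timezone.utc), "add manifest"),
    Commit("bbb", datetime(2026, 5, 7, 9, 0, tzinfo=timezone.utc), "fix typo"),
    Commit("ccc", datetime(2026, 5, 8, 9, 0, tzinfo=timezone.utc),
           "amend manifest\n\namendment-story: CVN-EXAMPLE-1"),
]
flagged = violations(history, closed)
```

The sketch does not check that the Story named in the trailer actually exists or is fresh ; a real guardrail would need that lookup as well.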
(Errata mechanism for typos / link rot updates : open question in S03 §12 plan-review question #11 ; not in scope of this ADR — would be an amendment if the operator decides errata is worth the complexity.)
Consequences
Positive :
- Every closed Story's claim of "we tested for X" is provable, not asserted : make reproduce re-runs the recorded tests and checks that the outcomes match within their declared ε bands (exact for categoricals)
- Audit reviewer 6 months from now can reconstruct the dataset state at proof time without forensic git archaeology
- Test cases as YAML separate data-driven scenarios from assertion logic — easier review + clearer audit
- Manifest immutability after Closed prevents silent retroactive editing of "what we tested"
Negative / risks :
- DVC bootstrap cost in S04 (one-time dvc init && dvc remote add) ; mitigated by additive migration policy (existing test datasets in data/ migrate one-by-one as Stories that touch them open — no big-bang)
- Manifest schema (S04 deliverable) churns over the first 2-3 Stories that adopt it ; mitigated by keeping the schema at tests/manifests/_schema.yaml with amendment-Story discipline ; first 3 Stories' manifests can be migrated by a one-shot script
- make reproduce audit canary (NF10) is too slow to run nightly across all closed Stories ; mitigated by random sampling weighted by recency (1 closed Story per nightly cycle, 5× probability for closed within last 30 days) + monthly full sweep
- ε-tolerance bands are first-pass approximations ; mitigated by amendment Story when nightly canary surfaces a band that's too tight (reproducer auto-files GH issue tagged audit-regression, operator decides whether to widen the band or fix the non-determinism)
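The recency-weighted sampling used by the nightly canary could be sketched as below. It reads "5× probability" as a sampling weight of 5 against 1 for older Stories, which is an interpretation rather than something the ADR spells out ; `pick_story` and the Story ids in the usage example are hypothetical.

```python
import random
from datetime import date, timedelta

RECENT_WINDOW = timedelta(days=30)
RECENT_WEIGHT = 5.0  # assumption: "5x probability" means weight 5 vs 1


def pick_story(closed_on: dict[str, date], today: date, rng: random.Random) -> str:
    """Pick one closed Story id, favouring Stories closed in the last 30 days."""
    ids = sorted(closed_on)  # stable order so a seeded rng is reproducible
    weights = [
        RECENT_WEIGHT if today - closed_on[story_id] <= RECENT_WINDOW else 1.0
        for story_id in ids
    ]
    return rng.choices(ids, weights=weights, k=1)[0]


# Usage with made-up Stories: one closed months ago, one closed recently.
stories = {"old": date(2026, 1, 1), "recent": date(2026, 5, 20)}
rng = random.Random(0)
picks = [pick_story(stories, date(2026, 6, 1), rng) for _ in range(6000)]
```

Passing the random generator in explicitly keeps the canary's own choice reproducible from a logged seed, which fits the audit intent of the rest of this ADR.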
Cross-references :
- S02 requirements F8 + NF10 + NF11 (the contracts this ADR ratifies)
- S03 architecture dossier §1 row D + §2 directory layout + §6 mechanism
- ADR-0087 Story-phase test integration (the process-side complement)
- ADR-0084 foundation test stack pick (DVC 3.x is in the locked stack)
- ADR-23 features version-pinned, fail-fast (precedent for "data must carry version")
- ADR-79 FTF Story closure 8-step (the prior pattern for "Story closure produces a verdict artefact" — this ADR generalises that to all Stories, not just FTF ones)
- ADR-81 (8-state Story workflow — Invariant 5 G6 immutability gates trigger on the Closed state transition, with closure timestamp pulled from the OP REST activities endpoint)
- ADR-77 (MkDocs SSoT — this ADR's artefacts under documentation/stories/<cvn_id>/tests/ and tests/cases/_schema.yaml are subject to SSoT discipline)