CVN-N001-EK-S02 — Test strategy

Story artifact required by ADR-0101. S02's deliverable is a set of derivations (or a typed INFEASIBLE verdict), so "test" here means three things: validating the derivations (reproducibility, evidence grading, contract completeness, absence of snooping); confirming through negative tests that a bypassed guardrail is detected and blocked; and specifying the enforcement tests that S03/S04 must implement.

1. What is under test in S02

S02 has no runtime code. What is under test is the derived charter values, their provenance, and the discipline that produced them. Validation means that (a) each value satisfies the plan DoD, (b) each value is reproducible from its signed provenance, (c) no analysis-only or anti-snooping boundary was breached, and (d) a violation of any of these is detected and blocked (§4 negative tests).

2. Plan-review baseline (not re-validated here)

The D2 plan has already passed committee plan_review in Meeting #273 (strong consensus, 5/5). This test strategy does not re-validate the plan decision. It validates the S02 implementation outputs: the derived charter values, typed INFEASIBLE records, provenance, reproducibility, and boundary compliance. V1–V16 (§3) are the S02 implementation-acceptance checks; the negative checks N1–N8 (§4) are mandatory alongside them.

3. S02 validation table (implementation acceptance — runnable on the outputs)

| # | Check | How | Pass criterion |
|---|---|---|---|
| V1 | Reference capacity by the §9.1 rule | review derivation | rule pre-specified before cost; rejected alternatives recorded; non-deployment labelled |
| V2 | P90 cost tier + downstream wiring | review | tier ∈ {A,B,C,D}; only A/B lockable; Tier-C completes S02 only as labelled non-lockable + marks S03 blocked; Tier-D → INFEASIBLE-cost-data |
| V3 | Tier-B bounded | review adjustment model | error analysis + stress factor present; else downgraded to C |
| V4 | E_econ_min ≠ E_pred_min | inspect values | two distinct stored values + documented mapping |
| V5 | Mapping monotonicity | review | monotonicity/stability documented; else metric not used as primary gate |
| V6 | Power feasibility | review power report | MDE_available computed; compared to E_pred_min; N_min if underpowered |
| V7 | Power-sim contract complete | checklist vs §11.1 | block design · deps · purge/embargo · stat · reps · seed · sensitivity present |
| V8 | Null-gate justified | review §12 | candidates compared; primary = most-conservative-valid; invalidity rationale recorded |
| V9 | INFEASIBLE typed | review | single verdict + reason + required artifact + next action (§15) |
| V10 | Signed derivations | provenance check | every value has the §16 fields (path/SHA/hashes/code/params/author/timestamp/repro) |
| V11 | Reproducibility (tolerance pre-declared) | re-run from provenance | tolerance declared in the derivation record before rerun; deterministic → exact reproduction (numerical tolerance ≤ ±0.1% only where float non-determinism is documented); stochastic → declared seed + replications + CI tolerance + acceptable numerical drift, re-run within it |
| V12 | Anti-snooping (choice provenance) | audit choices vs §8.1 | each calibration choice (capacity · primary metric · null-gate · budget · universe · label/horizon) has a non-performance rationale, or any exploratory influence is recorded as prior rationale + tuple-budgeted. "Exploratory influence" includes any post-hoc adjustment of a calibration parameter based on an observed outcome, not only the initial choice |
| V13 | No run performed | audit | no training / sweep / Airflow launch / Phase-2 run executed (analysis-only attested) |
| V14 | Docs build | mkdocs build --strict | green, no orphan, tables render |
| V15 | Risk-owner Tier-C boundary | inspect handoff / risk note | risk-owner approval (if any) carries Tier-C only as non-lockable context; it does not convert Tier-C into lockable P90 evidence |
| V16 | INFEASIBLE semantics | review verdict record | typed INFEASIBLE recorded as a successful S02 outcome, S03 blocked, allowed remediation listed; no placeholder written |
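
As an illustration of how V10 and V11 could be checked mechanically, the following is a minimal sketch, not the project's tooling. The field keys, the `ReproDeclaration` type, and the use of a single relative tolerance for the stochastic case are all assumptions made for the example; the real §16 schema and CI-tolerance rules may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical key names standing in for the §16 fields; the real schema may differ.
REQUIRED_FIELDS = ("path", "sha", "hashes", "code", "params", "author", "timestamp", "repro")
MAX_FLOAT_TOLERANCE = 0.001  # the ±0.1% ceiling stated in V11


def missing_provenance_fields(record: dict) -> list[str]:
    """V10: return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


@dataclass(frozen=True)
class ReproDeclaration:
    """Tolerance declared in the derivation record before the rerun (V11)."""
    deterministic: bool
    rel_tolerance: float = 0.0                    # 0.0 means exact reproduction
    float_nondeterminism_documented: bool = False
    seed: Optional[int] = None                    # must be declared if stochastic
    replications: Optional[int] = None            # must be declared if stochastic


def reproduces(decl: ReproDeclaration, original: float, rerun: float) -> bool:
    """V11: compare a rerun against the original under the declared tolerance."""
    if decl.deterministic:
        if decl.rel_tolerance == 0.0:
            return rerun == original
        # A non-zero tolerance is only admissible when float non-determinism
        # is documented, and never above the ±0.1% ceiling.
        if not decl.float_nondeterminism_documented:
            return False
        if decl.rel_tolerance > MAX_FLOAT_TOLERANCE:
            return False
    elif decl.seed is None or decl.replications is None:
        # Stochastic derivations need a declared seed and replication count.
        return False
    return abs(rerun - original) <= decl.rel_tolerance * abs(original)
```

The declaration is frozen so that the tolerance cannot be adjusted after the rerun result is known, which mirrors the "pre-declared" requirement.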

4. Negative / violation checks (the guardrails must break correctly)

| # | Violation | Expected result |
|---|---|---|
| N1 | A charter value has no signed derivation | value rejected; cannot feed S03 |
| N2 | Tier-C cost is marked lockable | validation fails; S03 blocked |
| N3 | Tier-D / placeholder cost is used | typed INFEASIBLE-cost-data required; Tier-D cannot be carried even as non-lockable context (unlike Tier-C) |
| N4 | Exploratory outcomes influenced capacity / metric / null-gate / universe / action policy / budget without prior-rationale registration | anti-snooping violation; affected derivation invalid |
| N5 | MLflow run id cited as training / predictive-run evidence | validation fails (MLflow is provenance-only) |
| N6 | Power sim lacks seed / replications / block design / purge-embargo mechanics / sensitivity | V7 fails |
| N7 | Primary null relies only on a diagnostic / random-entry null | V8 fails |
| N8 | Any Airflow launch / training job / cluster job / Phase-2 predictive run occurred | analysis-only attestation fails; S02 invalidated + escalated |
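
The tier rules in V2 and the violations N2 and N3 can be expressed as a small decision function with matching negative tests. This is a hedged sketch only: `evaluate_cost_tier`, `CostTierViolation`, and the `OK` / `OK-NON-LOCKABLE` verdict strings are hypothetical names introduced for illustration; only the `INFEASIBLE-cost-data` verdict and the tier semantics come from the tables above.

```python
class CostTierViolation(Exception):
    """Raised when a cost value breaks the tier rules (hypothetical type)."""


def evaluate_cost_tier(tier: str, marked_lockable: bool) -> dict:
    """Map a P90 cost tier to its S02 outcome, following V2 / N2 / N3."""
    if tier not in {"A", "B", "C", "D"}:
        raise CostTierViolation(f"unknown or missing tier: {tier!r}")
    if tier in {"A", "B"}:
        # Only Tier A/B evidence is lockable.
        return {"verdict": "OK", "lockable": True, "s03_blocked": False}
    if tier == "C":
        if marked_lockable:
            # N2: Tier-C must never be marked lockable.
            raise CostTierViolation("Tier-C cost marked lockable")
        # Tier-C completes S02 only as labelled non-lockable context.
        return {"verdict": "OK-NON-LOCKABLE", "lockable": False, "s03_blocked": True}
    # N3: Tier-D cannot be carried at all; a typed INFEASIBLE is required.
    return {"verdict": "INFEASIBLE-cost-data", "lockable": False, "s03_blocked": True}


def expect_violation(tier: str, marked_lockable: bool) -> bool:
    """Negative-test helper: True if the guardrail rejected the input."""
    try:
        evaluate_cost_tier(tier, marked_lockable)
    except CostTierViolation:
        return True
    return False
```

A negative test in this style asserts `expect_violation("C", marked_lockable=True)`; the test passes only when the guardrail refuses the input.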

5. Downstream enforcement tests (contractual; built in S03/S04)

| Future test | Owner | Trigger | Blocking condition | Evidence required |
|---|---|---|---|---|
| Tier-A/B lockability gate | S03 | charter-lock attempt | any cost value Tier-C/D or missing tier | signed cost derivation at Tier A/B |
| Charter immutability | S03 | post-lock edit | edit without a recorded re-lock | re-lock record + joint sign-off |
| Reference-capacity non-deployment guard | S03 | capacity referenced | capacity read as deployment/AUM | non-deployment label on the value |
| Power-contract presence | S04 | Phase-2 run start | missing locked E_pred_min / null-gate | locked S02/S03 power values |
| Provenance integrity | S03/S04 | value used | value not resolvable to immutable signed derivation | provenance record |
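
To make the last two rows concrete, here is one possible shape for an S04 pre-run gate. It is a sketch under stated assumptions: the record key `derivation_sha256`, the in-memory `store` mapping, and both function names are hypothetical, and a content-hash match stands in for "immutable"; verifying the signature itself is out of scope here.

```python
import hashlib


def resolves_to_derivation(record: dict, store: dict) -> bool:
    """Provenance integrity: the recorded SHA-256 must match stored content.

    `store` maps a hex digest to the derivation bytes (hypothetical layout).
    """
    sha = record.get("derivation_sha256")
    blob = store.get(sha)
    return blob is not None and hashlib.sha256(blob).hexdigest() == sha


def phase2_start_blockers(locked: dict, store: dict) -> list:
    """Power-contract presence: list the reasons a Phase-2 run must not start."""
    blockers = []
    for name in ("E_pred_min", "null_gate"):
        record = locked.get(name)
        if record is None:
            blockers.append(f"{name}: missing locked value")
        elif not resolves_to_derivation(record, store):
            blockers.append(f"{name}: provenance not resolvable")
    return blockers
```

An empty list means neither blocking condition fired. Returning reasons instead of a bare boolean keeps the evidence for the blocking decision readable in a log or review.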

6. Validation evidence matrix (required in the implementation PR)

The implementation PR MUST include one evidence row per V- and N-check:

| Check ID | Evidence artifact / link | Reviewer | Result | Notes |
|---|---|---|---|---|
| V1 | | | pass/fail/n/a | |
| N1 | | | pass/fail/n/a | |
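
The "one evidence row per V- and N-check" rule lends itself to an automated completeness check on the PR. The sketch below assumes the matrix has been parsed into dictionaries with hypothetical `check_id` and `result` keys; how the matrix is parsed from the PR is not specified by this document.

```python
from collections import Counter

# V1–V16 and N1–N8, as listed in §3 and §4.
EXPECTED_CHECKS = [f"V{i}" for i in range(1, 17)] + [f"N{i}" for i in range(1, 9)]
ALLOWED_RESULTS = {"pass", "fail", "n/a"}


def matrix_problems(rows: list) -> list:
    """Return a list of problems; an empty list means the matrix is complete."""
    problems = []
    counts = Counter(row["check_id"] for row in rows)
    for check_id in EXPECTED_CHECKS:
        if counts[check_id] == 0:
            problems.append(f"{check_id}: missing row")
        elif counts[check_id] > 1:
            problems.append(f"{check_id}: duplicate rows")
    for row in rows:
        if row["check_id"] not in EXPECTED_CHECKS:
            problems.append(f"{row['check_id']}: unknown check id")
        if row.get("result") not in ALLOWED_RESULTS:
            problems.append(f"{row['check_id']}: invalid result {row.get('result')!r}")
    return problems
```

Note that this only checks that the matrix is complete and well-formed; whether each linked artifact actually supports its result remains a reviewer judgment.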

7. Test data / fixtures

Validation uses existing, read-only inputs only: existing OHLCV cache · existing ATR-H4 labels · existing trade/cost logs · signed derivation artifacts · plan / architecture / runbook references. No synthetic predictive outputs and no fabricated run-like fixtures — any artifact resembling a model run violates the analysis-only boundary (N8).

8. Non-applicable

Performance / load / integration testing is N/A for S02 (no runtime code) — rationale recorded per ADR-0101 Invariant 3; reviewer to accept. N/A does not mean untested: S02 validation is documentary / provenance / control validation (§3–§4). Any runtime/integration test discovered as necessary belongs to S04+ and must not be executed under S02.

9. Definition of test-done (S02)

V1–V16 and N1–N8 pass (or the corresponding typed INFEASIBLE is recorded with its artifact); the implementation PR includes the validation evidence matrix (§6) with one row per check; the downstream enforcement table (§5) is carried into the S03/S04 plans so each contract has an owner.