CVN-N001-EK-S02: Test strategy
Story artifact required by ADR-0101. S02's deliverable is a set of derivations (or a typed
INFEASIBLE), so "test" here means three things: validating the derivations (reproducibility, evidence grading, contract completeness, absence of snooping); confirming that the guardrails fail correctly when bypassed (negative tests); and specifying the enforcement tests that S03/S04 must implement.
1. What is under test in S02
S02 has no runtime code. What is under test is the derived charter values, their provenance, and the discipline that produced them. Validation means that (a) each value satisfies the plan DoD, (b) it is reproducible from its signed provenance, (c) no analysis-only or anti-snooping boundary was breached, and (d) a violation of any of these is detected and blocked (§4 negative tests).
2. Plan-review baseline (not re-validated here)
The D2 plan has already passed committee plan_review in Meeting #273
(strong consensus, 5/5). This test strategy does not re-validate the plan decision. It validates the S02
implementation outputs: the derived charter values, typed INFEASIBLE records, provenance, reproducibility,
and boundary compliance. Only V1–V16 (§3) are S02 implementation-acceptance checks; the negative checks
(§4) are mandatory alongside them.
3. S02 validation table (implementation acceptance, runnable on the outputs)
| # | Check | How | Pass criterion |
|---|---|---|---|
| V1 | Reference capacity by the §9.1 rule | review derivation | rule pre-specified before cost; rejected alternatives recorded; non-deployment labelled |
| V2 | P90 cost tier + downstream wiring | review | tier ∈ {A,B,C,D}; only A/B lockable; Tier-C completes S02 only as labelled non-lockable + marks S03 blocked; Tier-D → INFEASIBLE-cost-data |
| V3 | Tier-B bounded | review adjustment model | error analysis + stress factor present; else downgraded to C |
| V4 | E_econ_min ≠ E_pred_min | inspect values | two distinct stored values + documented mapping |
| V5 | Mapping monotonicity | review | monotonicity/stability documented; else metric not used as primary gate |
| V6 | Power feasibility | review power report | MDE_available computed; compared to E_pred_min; N_min if underpowered |
| V7 | Power-sim contract complete | checklist vs §11.1 | block design · deps · purge/embargo · stat · reps · seed · sensitivity present |
| V8 | Null-gate justified | review §12 | candidates compared; primary = most-conservative-valid; invalidity rationale recorded |
| V9 | INFEASIBLE typed | review | single verdict + reason + required artifact + next action (§15) |
| V10 | Signed derivations | provenance check | every value has the §16 fields (path/SHA/hashes/code/params/author/timestamp/repro) |
| V11 | Reproducibility (tolerance pre-declared) | re-run from provenance | tolerance declared in the derivation record before rerun; deterministic → exact reproduction (numerical tolerance ≤ ±0.1% only where float non-determinism is documented); stochastic → declared seed + replications + CI tolerance + acceptable numerical drift, re-run within it |
| V12 | Anti-snooping (choice provenance) | audit choices vs §8.1 | each calibration choice (capacity · primary metric · null-gate · budget · universe · label/horizon) has a non-performance rationale, or any exploratory influence is recorded as prior rationale + tuple-budgeted. "Exploratory influence" includes any post-hoc adjustment of a calibration parameter based on an observed outcome, not only the initial choice |
| V13 | No run performed | audit | no training / sweep / Airflow launch / Phase-2 run executed (analysis-only attested) |
| V14 | Docs build | `mkdocs build --strict` | green, no orphan, tables render |
| V15 | Risk-owner Tier-C boundary | inspect handoff / risk note | risk-owner approval (if any) carries Tier-C only as non-lockable context; it does not convert Tier-C into lockable P90 evidence |
| V16 | INFEASIBLE semantics | review verdict record | typed INFEASIBLE recorded as a successful S02 outcome, S03 blocked, allowed remediation listed; no placeholder written |
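
The deterministic half of V11 can be expressed as a small comparison. The following is a minimal sketch, not the project's implementation: the names `reproduces`, `declared_rel_tol`, and `MAX_REL_TOL` are hypothetical, and the sketch assumes the tolerance is stored as a relative fraction in the derivation record before the re-run. The stochastic case (seed, replications, CI tolerance) is not covered here.

```python
import math

# Hypothetical constant: the V11 ceiling of 0.1% expressed as a fraction.
MAX_REL_TOL = 0.001


def reproduces(recorded: float, rerun: float, declared_rel_tol: float = 0.0) -> bool:
    """Check a deterministic re-run against the recorded value.

    declared_rel_tol is assumed to be read from the derivation record,
    written before the re-run. 0.0 means exact reproduction is required.
    """
    if not 0.0 <= declared_rel_tol <= MAX_REL_TOL:
        raise ValueError("declared tolerance is outside the V11 range [0, 0.1%]")
    if declared_rel_tol == 0.0:
        # No documented float non-determinism: the values must match exactly.
        return rerun == recorded
    # math.isclose measures the difference relative to the larger magnitude.
    return math.isclose(rerun, recorded, rel_tol=declared_rel_tol, abs_tol=0.0)
```

Rejecting an out-of-range tolerance up front keeps a reviewer from widening the bound after seeing the re-run result, which is the point of declaring it first.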
4. Negative / violation checks (the guardrails must break correctly)
| # | Violation | Expected result |
|---|---|---|
| N1 | A charter value has no signed derivation | value rejected; cannot feed S03 |
| N2 | Tier-C cost is marked lockable | validation fails; S03 blocked |
| N3 | Tier-D / placeholder cost is used | typed INFEASIBLE-cost-data required; Tier-D cannot be carried even as non-lockable context (unlike Tier-C) |
| N4 | Exploratory outcomes influenced capacity / metric / null-gate / universe / action policy / budget without prior-rationale registration | anti-snooping violation; affected derivation invalid |
| N5 | MLflow run id cited as training / predictive-run evidence | validation fails (MLflow is provenance-only) |
| N6 | Power sim lacks seed / replications / block design / purge-embargo mechanics / sensitivity | V7 fails |
| N7 | Primary null relies only on a diagnostic / random-entry null | V8 fails |
| N8 | Any Airflow launch / training job / cluster job / Phase-2 predictive run occurred | analysis-only attestation fails; S02 invalidated + escalated |
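
As an illustration of how a negative check such as N6 could be made mechanical, the sketch below lists which power-simulation contract items are absent. It is an assumption-laden example: the key names simply mirror the V7 checklist wording (`deps`, `stat`, `reps`, and so on) and are hypothetical, as is the function `missing_contract_fields`. The actual §11.1 schema is not shown in this document.

```python
from typing import Any, List, Mapping

# Hypothetical keys mirroring the V7 checklist; the real section 11.1 schema may differ.
REQUIRED_FIELDS = (
    "block_design",
    "deps",
    "purge_embargo",
    "stat",
    "reps",
    "seed",
    "sensitivity",
)

# Values treated as "not provided". A seed of 0 is a valid seed and is not in this list.
EMPTY_VALUES = (None, "", [], {})


def missing_contract_fields(contract: Mapping[str, Any]) -> List[str]:
    """Return the required contract fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if contract.get(name) in EMPTY_VALUES]
```

A non-empty result corresponds to N6: the contract is incomplete and V7 fails. An empty list only shows that the fields are present, not that their contents are sound, which still needs review.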
5. Downstream enforcement tests (contractual; built in S03/S04)
| Future test | Owner | Trigger | Blocking condition | Evidence required |
|---|---|---|---|---|
| Tier-A/B lockability gate | S03 | charter-lock attempt | any cost value Tier-C/D or missing tier | signed cost derivation at Tier A/B |
| Charter immutability | S03 | post-lock edit | edit without a recorded re-lock | re-lock record + joint sign-off |
| Reference-capacity non-deployment guard | S03 | capacity referenced | capacity read as deployment/AUM | non-deployment label on the value |
| Power-contract presence | S04 | Phase-2 run start | missing locked E_pred_min / null-gate | locked S02/S03 power values |
| Provenance integrity | S03/S04 | value used | value not resolvable to immutable signed derivation | provenance record |
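
To make the first row of the table concrete, here is one possible shape for the Tier-A/B lockability gate. It is only a sketch of the blocking condition as stated above; S03 owns the real design, and the names `Tier`, `blocking_values`, and `can_lock` are hypothetical.

```python
from enum import Enum
from typing import List, Mapping, Optional


class Tier(Enum):
    A = "A"
    B = "B"
    C = "C"
    D = "D"


# Only Tier A and Tier B cost evidence may be locked into the charter.
LOCKABLE = frozenset({Tier.A, Tier.B})


def blocking_values(cost_tiers: Mapping[str, Optional[Tier]]) -> List[str]:
    """Names of cost values that block a lock: Tier C, Tier D, or no tier at all."""
    return sorted(name for name, tier in cost_tiers.items() if tier not in LOCKABLE)


def can_lock(cost_tiers: Mapping[str, Optional[Tier]]) -> bool:
    """A charter-lock attempt may proceed only if no cost value blocks it."""
    return not blocking_values(cost_tiers)
```

Returning the blocking names rather than a bare boolean gives the lock attempt something to report. An empty mapping passes this sketch; whether a charter with no cost values should also block is a decision left to S03.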
6. Validation evidence matrix (required in the implementation PR)
The implementation PR MUST include one evidence row per V- and N-check:
| Check ID | Evidence artifact / link | Reviewer | Result | Notes |
|---|---|---|---|---|
| V1 | | | pass/fail/n/a | |
| … | | | | |
| N1 | | | pass/fail/n/a | |
| … | | | | |
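
The "one row per check" requirement lends itself to an automated completeness check before review. The sketch below assumes the matrix has already been parsed into dictionaries with hypothetical keys `check_id`, `evidence`, and `result`. It also assumes an evidence link is expected on every row, including n/a rows, which the text above does not state explicitly.

```python
from collections import Counter
from typing import Dict, List

# The check IDs defined in sections 3 and 4: V1..V16 and N1..N8.
EXPECTED_IDS = [f"V{i}" for i in range(1, 17)] + [f"N{i}" for i in range(1, 9)]
ALLOWED_RESULTS = {"pass", "fail", "n/a"}


def matrix_problems(rows: List[Dict[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means the matrix is complete."""
    problems = []
    counts = Counter(row.get("check_id", "") for row in rows)
    for check_id in EXPECTED_IDS:
        if counts[check_id] != 1:
            problems.append(f"{check_id}: expected 1 row, found {counts[check_id]}")
    for row in rows:
        check_id = row.get("check_id", "")
        if check_id not in EXPECTED_IDS:
            problems.append(f"{check_id or '<blank>'}: unknown check id")
        if row.get("result") not in ALLOWED_RESULTS:
            problems.append(f"{check_id}: result must be pass, fail, or n/a")
        if not row.get("evidence"):
            problems.append(f"{check_id}: evidence link missing")
    return problems
```

This only confirms that the matrix is structurally complete. Whether each linked artifact actually supports the recorded result remains the reviewer's call.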
7. Test data / fixtures
Validation uses existing, read-only inputs only: existing OHLCV cache · existing ATR-H4 labels · existing trade/cost logs · signed derivation artifacts · plan / architecture / runbook references. No synthetic predictive outputs and no fabricated run-like fixtures — any artifact resembling a model run violates the analysis-only boundary (N8).
8. Non-applicable
Performance / load / integration testing is N/A for S02 (no runtime code) — rationale recorded per
ADR-0101 Invariant 3; reviewer to accept. N/A does not mean untested: S02 validation is
documentary / provenance / control validation (§3–§4). Any runtime/integration test discovered as necessary
belongs to S04+ and must not be executed under S02.
9. Definition of test-done (S02)
V1–V16 and N1–N8 pass (or the corresponding typed INFEASIBLE is recorded with its artifact); the
implementation PR includes the validation evidence matrix (§6) with one row per check; the downstream
enforcement table (§5) is carried into the S03/S04 plans so each contract has an owner.