CVN-N001-EK-S02: Runbook (operator manual)
Story artifact required by ADR-0101. For an analysis Story the runbook is the operator manual: pre-flight → execution → artifact → error → rollback → handoff. It operationalises the control invariants of architecture §5; the test strategy verifies these guardrails break correctly.
1. Roles
| Role | Authority in S02 |
|---|---|
| Analyst / operator | runs read-only derivations; produces signed artifacts; launches nothing (no Airflow / training / cluster) |
| Methodology reviewer | reviews derivation validity, reproducibility, anti-snooping compliance, power/null/mapping method, and whether any INFEASIBLE reason is correctly assigned (the plan_review is already PASSED, Meeting #273 — not re-done here) |
| Risk owner | reviews cost tier + reference-capacity non-deployment framing. May approve carrying a Tier-C bound forward as non-lockable context only. Cannot validate scientific signal, cannot authorize S04, and cannot convert non-lockable (Tier-C) evidence into lockable (Tier-A/B) evidence |
2. Operational invariants & violation handling
| Invariant | If violated → operator action |
|---|---|
| Analysis-only — no training / Airflow / cluster / Phase-2 run / model selection | STOP, record nothing as a value, escalate (§4), mark the derivation invalidated |
| Read-only inputs | discard; re-run from approved read-only inputs |
| Exploratory results are context-only (architecture §8.1) | record the influenced choice as prior rationale and tuple-budget it; otherwise invalidate the derivation |
| Every value carries a signed derivation | do not write the unsigned value into the charter; re-derive with provenance |
| Tier-C cost is non-lockable | record provisional + S03-blocked; never mark lockable (§3 Tier-C rule) |
| E_econ_min ≠ E_pred_min | reject mapping derivation; redo distinct |
| MLflow = provenance-only | never cite an MLflow run id as training/predictive evidence |
| Typed INFEASIBLE over placeholder | record the typed INFEASIBLE, never a placeholder |
3. Pre-flight checklist (before any derivation)
- OpenProject wp#272 is the active Story record; the current owner is assigned.
- The D2 plan_review has PASSED (Meeting #273) and the runbook/architecture revisions match the plan revision.
- The working-tree commit SHA is known and will be recorded.
- Inputs are existing & read-only: OHLCV cache · ATR-H4 labels · trade/cost logs · exploratory results as context only.
- No Airflow DAG, training job, cluster launcher, Phase-2 predictive run, model comparison, or threshold selection will be invoked.
- The output artifact directory + naming convention (§5) are known.
- The reviewer for the derivation is identified; cost/capacity derivations have the risk-owner review path identified.
- The calibration-choice rationale is recorded before looking at outcomes; the reproducibility tolerance is pre-declared (test strategy V11/V12).
Tier-C handling rule
Tier-C cost evidence may be carried forward only as an explicitly non-lockable analysis artifact.
Risk-owner approval may allow Tier-C to be kept as non-lockable context, but cannot convert Tier-C
into a lockable P90 cost. S03 charter lock remains blocked until Tier A/B evidence exists, unless the
committee explicitly records a non-lockable exploratory charter state. A Tier-C bound used for any
downstream lockable decision is a violation. Tier-D evidence is unsupported → typed
INFEASIBLE-cost-data, and — unlike Tier-C — cannot be carried even as non-lockable context.
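The tier rule above can be read as a small mapping from cost tier to lockability. The following is a minimal sketch only: the function name and the returned tuple shape are hypothetical, and it assumes the cost tier is the sole input to the decision (review status and committee decisions are out of scope here).

```python
from typing import Optional, Tuple


def lockability_for_tier(cost_tier: str) -> Tuple[str, Optional[str]]:
    """Map a cost tier to (lockability, infeasible_reason).

    Hypothetical helper: the name and return shape are illustrative,
    not part of the Story tooling.
    """
    tier = cost_tier.strip().upper()
    if tier in ("A", "B"):
        # Tier A/B evidence is eligible for the S03 lock.
        return ("lockable", None)
    if tier == "C":
        # Risk-owner approval may keep this as context; it never promotes it.
        return ("non-lockable", None)
    if tier == "D":
        # Unsupported evidence becomes a typed INFEASIBLE (cost-data).
        return ("infeasible", "cost-data")
    raise ValueError(f"unknown cost tier: {cost_tier!r}")
```

The function deliberately has no "approved by risk owner" parameter, so approval cannot change the result for Tier-C.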
4. STOP-and-ask: escalation procedure (hard boundary)
STOP if a step would require a new training run · Airflow launch · cluster job · Phase-2 predictive run · model/threshold selection — or any of the subtler cases:
- using exploratory outcomes to choose capacity / metric / null-gate / action policy / budget / universe — "exploratory influence" includes any post-hoc adjustment of a calibration parameter based on an observed outcome, not only the initial choice;
- using Tier-C/D cost to avoid an INFEASIBLE;
- inserting an unsigned value into the charter;
- a derivation needing a new data pull;
- deriving a value after looking at predictive outcomes.
Procedure: STOP → record the blocker in the derivation log → escalate to the operator (requirement + affected value) → do not proceed until the operator decides → if a forbidden action already happened, the analysis-only attestation fails and S02 is invalidated + escalated (test strategy N8).
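As an illustration of the procedure, a read-only script could route every step through a guard that logs the blocker and halts. This is a sketch under assumptions: the action labels, the `StopAndAsk` exception, and the log entry keys are all hypothetical; the runbook names the forbidden categories but not these identifiers.

```python
from datetime import datetime, timezone

# Hypothetical action labels covering the categories listed in this section.
FORBIDDEN_ACTIONS = frozenset({
    "training_run",
    "airflow_launch",
    "cluster_job",
    "phase2_predictive_run",
    "model_or_threshold_selection",
    "new_data_pull",
    "unsigned_charter_value",
})


class StopAndAsk(RuntimeError):
    """Raised when a step crosses the hard boundary; the caller must escalate."""


def guard(action: str, affected_value: str, derivation_log: list) -> None:
    """Record the blocker and stop if the action is forbidden; otherwise do nothing."""
    if action not in FORBIDDEN_ACTIONS:
        return
    derivation_log.append({
        "blocker": action,
        "affected_value": affected_value,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": "escalated-awaiting-decision",
    })
    raise StopAndAsk(f"{action} is outside S02 authority (value: {affected_value})")
```

The guard only covers the "before it happens" path. If a forbidden action has already occurred, the attestation failure and S02 invalidation described above still apply.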
5. Artifact locations & naming
Resolved by the derivations workspace (scaffolding):
| Artifact | Location |
|---|---|
| Derivation record (per value) | documentation/stories/CVN-N001-EK-S02/derivations/<value>.yaml (from _TEMPLATE_derivation.yaml) |
| INFEASIBLE records | documentation/stories/CVN-N001-EK-S02/derivations/INFEASIBLE-<reason>.yaml (from _TEMPLATE_infeasible.yaml) |
| Charter values (unlocked) | the charter draft §2 placeholder table |
| Validation evidence matrix | documentation/stories/CVN-N001-EK-S02/derivations/validation_evidence_matrix.md |
| Review notes | the derivation record (reviewer field) + PR |
| MLflow (provenance-only) | experiment cvn-n001-ek-s02, run name <value>-<YYYYMMDD> |
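The naming convention in the table can be expressed as a few pure functions. This is a sketch, not the workspace scaffolding itself: the function names are hypothetical, and only the path and run-name patterns come from the table.

```python
from datetime import date
from pathlib import PurePosixPath

STORY = "CVN-N001-EK-S02"
DERIVATIONS_DIR = PurePosixPath("documentation/stories") / STORY / "derivations"


def derivation_path(value_name: str) -> PurePosixPath:
    """Path of the per-value derivation record."""
    return DERIVATIONS_DIR / f"{value_name}.yaml"


def infeasible_path(reason: str) -> PurePosixPath:
    """Path of a typed INFEASIBLE record for the given reason."""
    return DERIVATIONS_DIR / f"INFEASIBLE-{reason}.yaml"


def mlflow_run_name(value_name: str, day: date) -> str:
    """Run name in the <value>-<YYYYMMDD> form (provenance-only)."""
    return f"{value_name}-{day:%Y%m%d}"
```

Keeping the functions free of filesystem access means they can be checked without touching the repository.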
6. How to produce a derivation (read-only, no launcher)
- Work from existing inputs only (architecture §2 / §8.1).
- Derive in a notebook / read-only script — never a training job, sweep, or Airflow DAG.
- Per-value contract: capacity rule → P90 cost (assign tier) → mapping → power → null → budgets → Sortino note.
- Record a signed derivation (§7 template) for every value.
- If the value cannot be defensibly derived → record a typed INFEASIBLE (§7), not a placeholder.
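Because each derivation record carries input hashes, a read-only script might fingerprint its inputs before deriving anything. A minimal sketch follows; the choice of SHA-256 and the helper name are assumptions, since the runbook does not name a hash algorithm.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, opened read-only.

    Assumption: SHA-256 is used for illustration only.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large cache files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting digests would go into the `input_hashes` field of the derivation record, one entry per input file.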
7. Record templates
Derivation record (one per value):
derivation_id:
story: CVN-N001-EK-S02
value_name:
value:
units:
lockability: lockable | non-lockable | infeasible
cost_tier: A | B | C | D | n/a
artifact_path:
mlflow_run_id:
git_sha:
input_hashes:
code_version:
parameters:
author:
reviewer:
timestamp:
repro_command:
notes:
Typed INFEASIBLE record (one per blocked value):
infeasible_id:
story: CVN-N001-EK-S02
reason: cost-data | capacity | power | mapping | null
trigger:
failed_derivation:
evidence_attempted:
allowed_next_action:
blocked_downstream: S03 (charter lock)
required_remediation_artifact:
author:
reviewer:
timestamp:
8. Signing a derivation
A signed derivation = immutable artifact path or MLflow run id · git commit SHA · input dataset versions / hashes · code version · parameters · author · reviewer · generated timestamp · reproducible command (the §7 template fields). MLflow is provenance-only.
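The signing definition can be checked mechanically against a loaded record. The sketch below assumes the record has already been parsed into a dict keyed by the §7 field names; the function name and the "blank" rule (missing, `None`, or whitespace-only string) are hypothetical choices.

```python
# Fields from the §7 template that every signed derivation must populate.
REQUIRED_SIGNING_FIELDS = (
    "git_sha",
    "input_hashes",
    "code_version",
    "parameters",
    "author",
    "reviewer",
    "timestamp",
    "repro_command",
)


def _is_blank(value) -> bool:
    """Treat missing values and empty strings as blank; an empty mapping is allowed."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_signing_fields(record: dict) -> list:
    """Return the signing fields that are absent or blank in a derivation record."""
    missing = [name for name in REQUIRED_SIGNING_FIELDS if _is_blank(record.get(name))]
    # The definition accepts either an immutable artifact path or an MLflow run id.
    if _is_blank(record.get("artifact_path")) and _is_blank(record.get("mlflow_run_id")):
        missing.append("artifact_path or mlflow_run_id")
    return missing
```

An empty result means the record has every signing field filled in; it says nothing about whether the reviewer has actually approved the derivation.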
9. Recording a typed INFEASIBLE
Per plan §15, a typed INFEASIBLE record carries a single verdict + reason + required artifact +
blocked downstream (S03) + allowed next action. A typed INFEASIBLE is a successful S02 outcome.
Never weaken E_pred_min, change the metric, or use Tier-C/D cost to avoid one.
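A typed INFEASIBLE record can be sanity-checked before it is committed. This sketch assumes the record is a dict keyed by the §7 template fields; the function name and message wording are hypothetical.

```python
# Reason types from the §7 template.
ALLOWED_REASONS = frozenset({"cost-data", "capacity", "power", "mapping", "null"})

REQUIRED_FIELDS = (
    "reason",
    "required_remediation_artifact",
    "blocked_downstream",
    "allowed_next_action",
)


def infeasible_problems(record: dict) -> list:
    """Return human-readable problems; an empty list means the record is well-formed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    reason = record.get("reason")
    if reason and reason not in ALLOWED_REASONS:
        # A free-text reason is effectively a placeholder, not a typed verdict.
        problems.append(f"untyped reason: {reason!r}")
    return problems
```

One caveat if the record is loaded with a common YAML parser such as PyYAML: an unquoted `null` resolves to the YAML null value rather than the string, so the `null` reason probably needs quoting in the file (`reason: "null"`), otherwise it would show up here as a missing reason.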
10. Change / re-derivation
A value is re-derived, not edited in place:
- the prior derivation is marked superseded (not deleted);
- downstream entries referencing the old derivation are invalidated;
- the reviewer must re-review the new derivation;
- if the old value was already handed to S03, S03 must be notified;
- if the cost tier changes, lockability must be re-evaluated.
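The first three rules above can be sketched as a pure function over an in-memory index of records. The `status`, `superseded_by`, `depends_on`, and `reviewer_status` keys are hypothetical bookkeeping fields, not part of the §7 template; S03 notification and lockability re-evaluation are left to the operator.

```python
import copy


def supersede(records: dict, old_id: str, new_record: dict) -> dict:
    """Return a new index in which old_id is superseded by new_record.

    The input index is not modified, and nothing is deleted.
    """
    updated = copy.deepcopy(records)

    # Mark the prior derivation superseded rather than editing or removing it.
    old = updated[old_id]
    old["status"] = "superseded"
    old["superseded_by"] = new_record["derivation_id"]

    # Invalidate downstream entries that reference the old derivation.
    for rec in updated.values():
        if old_id in rec.get("depends_on", []):
            rec["status"] = "invalidated"

    # The replacement always starts without reviewer sign-off.
    fresh = copy.deepcopy(new_record)
    fresh["reviewer_status"] = "pending"
    updated[fresh["derivation_id"]] = fresh
    return updated
```

Returning a new index instead of mutating the old one mirrors the "re-derived, not edited in place" rule at the data-structure level.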
11. Rollback
Do not delete signed derivation artifacts. Mark them invalidated / superseded, remove their values from the unlocked charter, and restore placeholders. Record: rollback reason · operator · timestamp · affected values · downstream notifications. No runtime impact (analysis-only). Downstream S03/S04 stay blocked until the values are re-established.
12. Handoff to S03: package checklist
The S03 handoff package must include:
- unlocked charter values;
- signed derivation record for each value (§7);
- cost tier + lockability status for cost-derived values;
- typed INFEASIBLE records, if any;
- unresolved threats to validity (plan §17);
- reviewer status for each derivation;
- risk-owner note for cost/capacity items;
- confirmation no forbidden run / launcher / training / model-selection / threshold-selection occurred;
- list of values eligible for S03 lock (Tier A/B) vs values blocking S03 (Tier-C / INFEASIBLE).
Role separation: S02 analyst ≠ S03 locker ≠ S04 launcher.
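The last checklist item, the eligible-versus-blocking list, could be produced from the derivation records as sketched below. It assumes each record's `lockability` field has already been set consistently with its cost tier; the function name is hypothetical.

```python
def partition_for_s03(records: list) -> tuple:
    """Split value names into (eligible_for_lock, blocking_s03).

    Anything that is not explicitly lockable is treated as blocking,
    including records with a missing or unrecognised lockability.
    """
    eligible, blocking = [], []
    for rec in records:
        if rec.get("lockability") == "lockable":
            eligible.append(rec["value_name"])
        else:
            blocking.append(rec["value_name"])
    return eligible, blocking
```

Defaulting unknown states to "blocking" is a conservative choice, so an incomplete record cannot slip into the lockable list.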
13. Common mistakes (avoid)
- treating an MLflow run id as training/predictive evidence;
- using Tier-C as if lockable (or letting risk-owner approval "promote" it);
- editing a derivation record in place (instead of superseding);
- using exploratory cost-sensitivity outcomes to choose capacity / metric / null;
- launching a small helper training job "just to check";
- writing a placeholder when INFEASIBLE should be recorded.
14. Success criteria
S02 runbook execution is complete when every required charter value has either:
- a signed derivation + lockability status; or
- a typed INFEASIBLE record with required artifact + next action;
and the S03 handoff package (§12) is complete.
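The either/or criterion can be checked per required value. This is a sketch under assumptions: the `signed` flag is a hypothetical precomputed result of the signing check, and `failed_derivation` is assumed to hold the name of the blocked value. It does not check that the handoff package itself is complete.

```python
def unresolved_values(required_values: list, derivations: list, infeasibles: list) -> list:
    """Return required values that have neither a signed derivation nor a typed INFEASIBLE."""
    derived = {
        d["value_name"]
        for d in derivations
        if d.get("signed") and d.get("lockability")
    }
    blocked = {
        i["failed_derivation"]
        for i in infeasibles
        if i.get("required_remediation_artifact") and i.get("allowed_next_action")
    }
    # Preserve the order of the required list for readable reporting.
    return [v for v in required_values if v not in derived and v not in blocked]
```

An empty result covers only the first half of the criterion; the handoff package still has to be verified separately.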
15. Owner handoff
Source of truth: OpenProject wp#272 (status) · this Story
hub (artifacts) · the signed derivations. A new owner resumes from the last signed derivations + the open /
typed-INFEASIBLE items. Never infer trading/deployment authority from S02 — the reference capacity is
non-deployment.