CVN-N001-EK-S02 — Runbook (operator manual)

Story artifact required by ADR-0101. For an analysis Story the runbook is the operator manual: pre-flight → execution → artifact → error → rollback → handoff. It operationalises the control invariants of architecture §5; the test strategy verifies that these guardrails trip correctly when an invariant is violated.

1. Roles

| Role | Authority in S02 |
| --- | --- |
| Analyst / operator | runs read-only derivations; produces signed artifacts; launches nothing (no Airflow / training / cluster) |
| Methodology reviewer | reviews derivation validity, reproducibility, anti-snooping compliance, power/null/mapping method, and whether any INFEASIBLE reason is correctly assigned (the plan_review has already PASSED, Meeting #273, and is not re-done here) |
| Risk owner | reviews cost tier + reference-capacity non-deployment framing. May approve carrying a Tier-C bound forward as non-lockable context only. Cannot validate scientific signal, cannot authorize S04, and cannot convert non-lockable (Tier-C) evidence into lockable (Tier-A/B) evidence |

2. Operational invariants & violation handling

| Invariant | If violated → operator action |
| --- | --- |
| Analysis-only: no training / Airflow / cluster / Phase-2 run / model selection | STOP, record nothing as a value, escalate (§4), mark the derivation invalidated |
| Read-only inputs | discard; re-run from approved read-only inputs |
| Exploratory results are context-only (§8.1) | influenced choice recorded as prior rationale + tuple-budgeted, or derivation invalidated |
| Every value carries a signed derivation | unsigned value not written into the charter; re-derive with provenance |
| Tier-C cost is non-lockable | record provisional + S03-blocked; never mark lockable (§3 Tier-C rule) |
| E_econ_min ≠ E_pred_min | reject mapping derivation; redo with the two values derived distinctly |
| MLflow = provenance-only | never cite an MLflow run id as training/predictive evidence |
| Typed INFEASIBLE over placeholder | record the typed INFEASIBLE, never a placeholder |

3. Pre-flight checklist (before any derivation)

  • OpenProject wp#272 is the active Story record; the current owner is assigned.
  • The D2 plan_review has PASSED (Meeting #273) and the runbook/architecture revisions match the plan revision.
  • The working-tree commit SHA is known and will be recorded.
  • Inputs are existing & read-only: OHLCV cache · ATR-H4 labels · trade/cost logs · exploratory results as context only.
  • No Airflow DAG, training job, cluster launcher, Phase-2 predictive run, model comparison, or threshold selection will be invoked.
  • The output artifact directory + naming convention (§5) are known.
  • The reviewer for the derivation is identified; cost/capacity derivations have the risk-owner review path identified.
  • The calibration-choice rationale is recorded before looking at outcomes; the reproducibility tolerance is pre-declared (test strategy V11/V12).

Tier-C handling rule

Tier-C cost evidence may be carried forward only as an explicitly non-lockable analysis artifact. Risk-owner approval may allow Tier-C to be kept as non-lockable context, but cannot convert Tier-C into a lockable P90 cost. S03 charter lock remains blocked until Tier A/B evidence exists, unless the committee explicitly records a non-lockable exploratory charter state. A Tier-C bound used for any downstream lockable decision is a violation. Tier-D evidence is unsupported → typed INFEASIBLE-cost-data, and — unlike Tier-C — cannot be carried even as non-lockable context.
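The tier rule above can be sketched as a small lookup. This is an illustrative sketch only: the function name, the `Lockability` enum, and the `risk_owner_approved` parameter are assumptions made for the example, not part of the Story tooling. The parameter is accepted and deliberately ignored to show that approval never promotes Tier-C.

```python
from enum import Enum


class Lockability(str, Enum):
    # Mirrors the `lockability` field of the derivation record template.
    LOCKABLE = "lockable"
    NON_LOCKABLE = "non-lockable"
    INFEASIBLE = "infeasible"


def lockability_for_cost_tier(cost_tier: str, risk_owner_approved: bool = False) -> Lockability:
    """Return the strongest lockability a cost tier can support (hypothetical helper)."""
    tier = cost_tier.strip().upper()
    if tier in ("A", "B"):
        # Tier A/B evidence is eligible for S03 lock.
        return Lockability.LOCKABLE
    if tier == "C":
        # Non-lockable context only. risk_owner_approved is intentionally unused:
        # approval allows carrying the bound forward, never converting it.
        return Lockability.NON_LOCKABLE
    if tier == "D":
        # Unsupported evidence: record a typed INFEASIBLE-cost-data instead.
        return Lockability.INFEASIBLE
    raise ValueError(f"unknown cost tier: {cost_tier!r}")
```

Passing `risk_owner_approved=True` with tier `"C"` still yields `non-lockable`, which is the point of the rule.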

4. STOP-and-ask — escalation procedure (hard boundary)

STOP if a step would require a new training run · Airflow launch · cluster job · Phase-2 predictive run · model/threshold selection — or any of the subtler cases:

  • using exploratory outcomes to choose capacity / metric / null-gate / action policy / budget / universe ("exploratory influence" includes any post-hoc adjustment of a calibration parameter based on an observed outcome, not only the initial choice);
  • using Tier-C/D cost to avoid an INFEASIBLE;
  • inserting an unsigned value into the charter;
  • a derivation needing a new data pull;
  • deriving a value after looking at predictive outcomes.

Procedure: STOP → record the blocker in the derivation log → escalate to the operator (requirement + affected value) → do not proceed until the operator decides. If a forbidden action has already happened, the analysis-only attestation fails and S02 is invalidated and escalated (test strategy N8).
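A minimal sketch of the STOP-and-ask guard, assuming each planned step can declare the actions it would require. The action labels, `StopAndAsk`, and `guard_step` are hypothetical names chosen for illustration; the runbook does not prescribe an implementation.

```python
# Hypothetical labels for the forbidden actions listed in this section.
FORBIDDEN_ACTIONS = frozenset({
    "training_run", "airflow_launch", "cluster_job", "phase2_predictive_run",
    "model_selection", "threshold_selection",
    "exploratory_outcome_driven_choice", "tier_cd_cost_to_avoid_infeasible",
    "unsigned_charter_value", "new_data_pull", "derive_after_predictive_outcomes",
})


class StopAndAsk(Exception):
    """Raised when a step hits the hard boundary and must be escalated."""


def guard_step(step_name: str, required_actions, derivation_log: list) -> None:
    """Record the blocker in the log and stop if the step needs a forbidden action."""
    blockers = sorted(set(required_actions) & FORBIDDEN_ACTIONS)
    if blockers:
        # Record first, then stop; the operator decides how to proceed.
        derivation_log.append({"step": step_name, "status": "STOP", "blockers": blockers})
        raise StopAndAsk(f"{step_name} requires forbidden action(s): {', '.join(blockers)}")
```

A step that needs only read-only inputs passes through silently; anything else leaves a log entry before the exception propagates.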

5. Artifact locations & naming

Resolved by the derivations workspace (scaffolding):

| Artifact | Location |
| --- | --- |
| Derivation record (per value) | documentation/stories/CVN-N001-EK-S02/derivations/<value>.yaml (from _TEMPLATE_derivation.yaml) |
| INFEASIBLE records | documentation/stories/CVN-N001-EK-S02/derivations/INFEASIBLE-<reason>.yaml (from _TEMPLATE_infeasible.yaml) |
| Charter values (unlocked) | the charter draft §2 placeholder table |
| Validation evidence matrix | documentation/stories/CVN-N001-EK-S02/derivations/validation_evidence_matrix.md |
| Review notes | the derivation record (reviewer field) + PR |
| MLflow (provenance-only) | experiment cvn-n001-ek-s02, run name <value>-<YYYYMMDD> |
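The naming convention above could be generated rather than typed by hand. The sketch below assumes the paths and run-name pattern exactly as listed; the helper names and the example value name `p90_cost` are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath

STORY = "CVN-N001-EK-S02"
DERIVATIONS_DIR = PurePosixPath("documentation/stories") / STORY / "derivations"


def derivation_path(value_name: str) -> PurePosixPath:
    """Path of the per-value derivation record."""
    return DERIVATIONS_DIR / f"{value_name}.yaml"


def infeasible_path(reason: str) -> PurePosixPath:
    """Path of a typed INFEASIBLE record for the given reason."""
    return DERIVATIONS_DIR / f"INFEASIBLE-{reason}.yaml"


def mlflow_run_name(value_name: str, day: date) -> str:
    """Provenance-only run name in the <value>-<YYYYMMDD> pattern."""
    return f"{value_name}-{day:%Y%m%d}"


# Example with a hypothetical value name:
print(derivation_path("p90_cost"))
print(mlflow_run_name("p90_cost", date(2024, 1, 5)))
```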

6. How to produce a derivation (read-only, no launcher)

  1. Work from existing inputs only (architecture §2 / §8.1).
  2. Derive in a notebook / read-only script — never a training job, sweep, or Airflow DAG.
  3. Per-value contract: capacity rule → P90 cost (assign tier) → mapping → power → null → budgets → Sortino note.
  4. Record a signed derivation (§7 template) for every value.
  5. If the value cannot be defensibly derived → record a typed INFEASIBLE (§7), not a placeholder.
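For the signed record, the `input_hashes` field needs a stable fingerprint of each read-only input. One way to obtain it, sketched here with SHA-256 as an assumed hash choice (the runbook does not name an algorithm) and hypothetical helper names:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks; the file is opened read-only and never modified."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def input_hashes(paths) -> dict:
    """Map each input path to its SHA-256 hex digest, in sorted path order."""
    return {str(p): sha256_of_file(p) for p in sorted(Path(p) for p in paths)}
```

Recomputing the hashes before a re-derivation is a cheap way to confirm the inputs have not changed since the previous record was signed.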

7. Record templates

Derivation record (one per value):

derivation_id:
story: CVN-N001-EK-S02
value_name:
value:
units:
lockability: lockable | non-lockable | infeasible
cost_tier: A | B | C | D | n/a
artifact_path:
mlflow_run_id:
git_sha:
input_hashes:
code_version:
parameters:
author:
reviewer:
timestamp:
repro_command:
notes:

Typed INFEASIBLE record (one per blocked value):

infeasible_id:
story: CVN-N001-EK-S02
reason: cost-data | capacity | power | mapping | null
trigger:
failed_derivation:
evidence_attempted:
allowed_next_action:
blocked_downstream: S03 (charter lock)
required_remediation_artifact:
author:
reviewer:
timestamp:

8. Signing a derivation

A signed derivation = immutable artifact path or MLflow run id · git commit SHA · input dataset versions / hashes · code version · parameters · author · reviewer · generated timestamp · reproducible command (the §7 template fields). MLflow is provenance-only.
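A reviewer-side check for this definition might look like the sketch below. It assumes the record has been loaded into a plain dict keyed by the §7 template field names; `missing_signature_fields` is a hypothetical helper, and treating an empty string as missing is an assumption.

```python
REQUIRED_SIGNATURE_FIELDS = (
    "git_sha", "input_hashes", "code_version", "parameters",
    "author", "reviewer", "timestamp", "repro_command",
)


def missing_signature_fields(record: dict) -> list:
    """Return the signature fields that are absent or blank; empty list means signed."""
    missing = [f for f in REQUIRED_SIGNATURE_FIELDS if record.get(f) in (None, "")]
    # Either an immutable artifact path or an MLflow run id must be present.
    if not (record.get("artifact_path") or record.get("mlflow_run_id")):
        missing.append("artifact_path|mlflow_run_id")
    return missing
```

A value whose record returns a non-empty list would not be written into the charter.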

9. Recording a typed INFEASIBLE

Per plan §15: single verdict + reason + required artifact + blocked downstream (S03) + allowed next action. A typed INFEASIBLE is a successful S02 outcome. Never weaken E_pred_min, change the metric, or use Tier-C/D cost to avoid one.
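The required parts of a typed INFEASIBLE can be checked mechanically. This sketch assumes the record is a dict using the §7 template field names and the five reasons listed there; `validate_infeasible` is a hypothetical helper.

```python
ALLOWED_REASONS = frozenset({"cost-data", "capacity", "power", "mapping", "null"})
REQUIRED_INFEASIBLE_FIELDS = (
    "required_remediation_artifact", "blocked_downstream", "allowed_next_action",
)


def validate_infeasible(record: dict) -> list:
    """Return a list of problems; an empty list means the record is well-formed."""
    problems = []
    if record.get("reason") not in ALLOWED_REASONS:
        problems.append(f"reason must be one of {sorted(ALLOWED_REASONS)}")
    for field in REQUIRED_INFEASIBLE_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    return problems
```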

10. Change / re-derivation

A value is re-derived, not edited in place:

  • the prior derivation is marked superseded (not deleted);
  • downstream entries referencing the old derivation are invalidated;
  • the reviewer must re-review the new derivation;
  • if the old value was already handed to S03, S03 must be notified;
  • if the cost tier changes, lockability must be re-evaluated.
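The supersede-not-edit rule can be illustrated with immutable records. This is a sketch under stated assumptions: the `Derivation` dataclass, its `status`, `superseded_by`, `review_status`, and `handed_to_s03` fields, and the `supersede` function are hypothetical and only model the five bullets above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Derivation:
    derivation_id: str
    value_name: str
    cost_tier: str
    status: str = "active"            # active | superseded
    superseded_by: Optional[str] = None
    review_status: str = "pending"
    handed_to_s03: bool = False


def supersede(old: Derivation, new: Derivation):
    """Return (retired old record, fresh new record, follow-up actions)."""
    if old.value_name != new.value_name:
        raise ValueError("a derivation can only supersede one for the same value")
    # The old record is kept and marked, never deleted or edited in place.
    retired = replace(old, status="superseded", superseded_by=new.derivation_id)
    # The new record always needs its own review.
    fresh = replace(new, status="active", review_status="pending")
    followups = [
        f"invalidate downstream entries referencing {old.derivation_id}",
        f"re-review {new.derivation_id}",
    ]
    if old.handed_to_s03:
        followups.append("notify S03")
    if old.cost_tier != new.cost_tier:
        followups.append("re-evaluate lockability")
    return retired, fresh, followups
```

Because the dataclass is frozen, any attempt to assign to a field of an existing record raises an error, which matches the intent of the rule.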

11. Rollback

Do not delete signed derivation artifacts. Mark them invalidated / superseded, remove their values from the unlocked charter, and restore placeholders. Record: rollback reason · operator · timestamp · affected values · downstream notifications. No runtime impact (analysis-only). Downstream S03/S04 stay blocked until the values are re-established.

12. Handoff to S03 — package checklist

The S03 handoff package must include:

  • unlocked charter values;
  • signed derivation record for each value (§7);
  • cost tier + lockability status for cost-derived values;
  • typed INFEASIBLE records, if any;
  • unresolved threats to validity (plan §17);
  • reviewer status for each derivation;
  • risk-owner note for cost/capacity items;
  • confirmation no forbidden run / launcher / training / model-selection / threshold-selection occurred;
  • list of values eligible for S03 lock (Tier A/B) vs values blocking S03 (Tier-C / INFEASIBLE).
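The last item of the package, the eligible-versus-blocking list, could be derived from the `lockability` field of each record. A minimal sketch, assuming records are dicts with `value_name` and `lockability` keys; `split_for_s03` is a hypothetical helper.

```python
def split_for_s03(records):
    """Partition value names into (eligible for S03 lock, blocking S03)."""
    eligible, blocking = [], []
    for record in records:
        status = record["lockability"]
        if status == "lockable":
            eligible.append(record["value_name"])
        elif status in ("non-lockable", "infeasible"):
            # Tier-C and INFEASIBLE values keep S03 blocked.
            blocking.append(record["value_name"])
        else:
            raise ValueError(f"unknown lockability: {status!r}")
    return eligible, blocking
```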

Role separation: S02 analyst ≠ S03 locker ≠ S04 launcher.

13. Common mistakes (avoid)

  • treating an MLflow run id as training/predictive evidence;
  • using Tier-C as if lockable (or letting risk-owner approval "promote" it);
  • editing a derivation record in place (instead of superseding);
  • using exploratory cost-sensitivity outcomes to choose capacity / metric / null;
  • launching a small helper training job "just to check";
  • writing a placeholder when INFEASIBLE should be recorded.

14. Success criteria

S02 runbook execution is complete when every required charter value has either:

  • a signed derivation + lockability status; or
  • a typed INFEASIBLE record with required artifact + next action;

and the S03 handoff package (§12) is complete.
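The completion criterion can be expressed as a check over the required values. The sketch assumes two dicts keyed by value name, one for signed derivations and one for typed INFEASIBLE records; the function names and dict shapes are hypothetical.

```python
def incomplete_values(required_values, signed_derivations, infeasible_records):
    """Return required values that have neither a valid derivation nor a typed INFEASIBLE."""
    missing = []
    for name in required_values:
        derivation = signed_derivations.get(name)
        infeasible = infeasible_records.get(name)
        has_derivation = bool(derivation and derivation.get("lockability"))
        has_infeasible = bool(
            infeasible
            and infeasible.get("required_remediation_artifact")
            and infeasible.get("allowed_next_action")
        )
        if not (has_derivation or has_infeasible):
            missing.append(name)
    return missing


def s02_complete(required_values, signed_derivations, infeasible_records, handoff_complete):
    """True only if every value is accounted for and the §12 package is complete."""
    return handoff_complete and not incomplete_values(
        required_values, signed_derivations, infeasible_records
    )
```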

15. Owner handoff

Source of truth: OpenProject wp#272 (status) · this Story hub (artifacts) · the signed derivations. A new owner resumes from the last signed derivations + the open / typed-INFEASIBLE items. Never infer trading/deployment authority from S02 — the reference capacity is non-deployment.