Plan dossier: CVN-N001-EI-S12 · Economic observability: keyed-config + cost-validated ground-truth substrate (Tier-2 gate)

Story: CVN-N001-EI-S12 · parent Epic CVN-N001-EI · Type: implementation plan (ADR-68) · Date: 2026-06-05 · Decision basis: documentation/stories/CVN-N001-EI-S11/fiche_decision_economic_groundtruth.md §8 (Option C, operator-validated) · committee Meeting OP #250

Problem

The PG probe (2026-06-05, finetune_results, 9216 rows) found that the program computes economic outcomes jointly with f1_buy, but the substrate is unusable as Tier-2 ground truth:

1. Decoupled from the config: model_hyperparams and optuna_trial_id are populated in 0/9216 rows (all NULL), and feature_hash is the constant "unknown". No row joins back to the config that produced it (ADR-25 dead join-key).
2. Cost-unvalidated: sortino/expectancy are backtested under an assumed cost_bps; the 3-value sweep is sensitivity analysis, not validation (criterion 4, fiche §2).
3. No baseline: finetune_baselines has 0 rows (ADR-29 naive baseline absent).

Root cause (located): persist_result in src/commun/finetune/persistence.py omits model_hyperparams and optuna_trial_id from the INSERT entirely (→ NULL), and feature_hash receives the literal "unknown". The caller ablation_runner.py never propagates them, though VariantResult.hpo_params carries the winning hyperparameters at the write point.
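
The caller-side gap described above can be illustrated with a minimal sketch. It assumes a simplified `VariantResult` (only `hpo_params` is named in this plan; the other fields, the `feature_set_hash` attribute name and the `to_base_result` helper are hypothetical) and shows the join keys being copied into the dict handed to the writer, with absent values left as `None` rather than a placeholder.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class VariantResult:
    """Hypothetical minimal shape; the real class carries more fields."""
    f1_buy: float
    hpo_params: Optional[Dict[str, Any]] = None
    feature_set_hash: Optional[str] = None


def to_base_result(variant: VariantResult) -> Dict[str, Any]:
    """Copy the join keys into the dict passed to the writer.

    Empty or missing values become None (SQL NULL downstream),
    never "{}" or "unknown".
    """
    return {
        "f1_buy": variant.f1_buy,
        "model_hyperparams": variant.hpo_params or None,
        "feature_hash": variant.feature_set_hash or None,
    }
```

With this shape, a variant that carries winning hyperparameters yields a populated `model_hyperparams`, and a variant without them yields `None` instead of a silent default.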

Approach: incremental, each increment independently shippable

| # | Increment | Surface | Status |
|---|---|---|---|
| 1a | Persist `model_hyperparams` (the winning HPO params = config identity) | `ablation_runner.py` base_result → `persist_result` INSERT + column | ✅ done (PR #1115) |
| 1c | Real `feature_hash` (replace `"unknown"`) | `regime_trainer` fingerprint of fitted feature set → base_result → writer | ✅ done (PR #1115) |
| 2 | Populate `finetune_baselines` (ADR-29 naive baseline) | compute buy_and_hold/random/naive per crypto×fold in the FTF DAG report task (load OHLCV in-pod) → `generate_report(baselines=…)` (today `={}`) | next: own PR, CLUSTER-gated |
| 1b | Persist `optuna_trial_id` | 5-hop thread `study.best_trial.number` → `HPOResult` → orchestrator `train_result` → `VariantResult` → writer | deferred indefinitely (YAGNI) |
| 3 | Val/OOS paired metrics (criterion 3) | persist the selection-side metric alongside OOS, not a single scalar | design |
| 4 | Cost-validity (criterion 4, constitutive) | anchor `cost_bps` on realized fills (reconciliation) or stamp every row `cost_basis='backtested_assumed'` as an honest label; never certify the assumption (fiche §8) | design |

The deliverable is a substrate that is usably keyed and cost-validated, not "all N columns populated for completeness". 1a + 1c already deliver the join (a row → its config + feature set); 2 makes the economics interpretable (vs baseline). That is the substrate.
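
As an illustration of what a real `feature_hash` could look like, here is a minimal sketch of a deterministic fingerprint over the fitted feature names. The function name, the choice of SHA-256 and the order-insensitive normalization are assumptions made for this example; the plan only states that `regime_trainer` fingerprints the fitted feature set.

```python
import hashlib
from typing import Iterable, Optional


def feature_set_hash(feature_names: Iterable[str]) -> Optional[str]:
    """Return a stable hex digest for a set of feature names.

    Names are de-duplicated and sorted so the same feature set always
    produces the same hash regardless of column order (an assumption;
    an order-sensitive hash may be preferable if order affects the model).
    Returns None for an empty set so the column stays NULL instead of
    receiving a placeholder.
    """
    names = sorted(set(feature_names))
    if not names:
        return None
    payload = "\n".join(names).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Two runs fitted on the same features then share a hash, and an ablation that drops a feature produces a different one, which is what makes the column usable as a join key.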

1b is YAGNI, not "next". Its join target is the MLflow CVNTrade_HPO run, which the probe already found carries no f1_buy metric, only best_params (which 1a already persists) and data identity. It joins to a place that lacks the metric the proxy↔economy question needs, so its value for S12 is close to zero. It is also not co-located with the DAG work (different files, 5 hops, opaque orchestrator), so it does not ride along with increment 2. Revisit only if a concrete provenance need surfaces (and even then, it is low value given the target).
1a/1c are unit-mergeable: the mocked-cursor contract tests prove the code builds the INSERT correctly given an input. That is enough to merge #1115 (low-risk plumbing, no reason to block). It is NOT enough to declare the keyed substrate done. A mock does not prove (a) that the real run feeds non-NULL model_hyperparams/feature_hash at the call site (the value can be empty exactly where it matters, which is why the column existed yet was populated in 0/9216 rows), nor (b) that the prod path is the one the test exercises.
2 is NOT unit-mergeable the same way: its load-bearing path is OHLCV-load-in-pod, which mocked-candle unit tests do not exercise (the S03 trap — tests touch a different path than the one that runs). Gate 2 on a real cluster dry-run that loads OHLCV in-pod and computes the baselines, not unit-only. Own PR, distinct test profile from #1115 — do not bundle.
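
For orientation, the arithmetic of the buy_and_hold baseline can be sketched as a pure function over a fold's closing prices. The `(crypto, fold)` key shape and the function names are hypothetical, and the random and naive baselines are left out because this plan does not define them. In line with the point above, this sketch covers only the computation, not the OHLCV-in-pod loading path that the cluster dry-run must prove.

```python
from typing import Dict, Sequence, Tuple

FoldKey = Tuple[str, int]  # (crypto, fold index); hypothetical key shape


def buy_and_hold_return(closes: Sequence[float]) -> float:
    """Simple return from holding over the fold: last close / first close - 1."""
    if len(closes) < 2:
        raise ValueError("need at least two closes to compute a return")
    if closes[0] <= 0:
        raise ValueError("first close must be positive")
    return closes[-1] / closes[0] - 1.0


def compute_baselines(
    closes_by_fold: Dict[FoldKey, Sequence[float]],
) -> Dict[FoldKey, Dict[str, float]]:
    """Compute the buy_and_hold baseline for every crypto x fold."""
    return {
        key: {"buy_and_hold": buy_and_hold_return(closes)}
        for key, closes in closes_by_fold.items()
    }
```

A fold that opens at 100 and closes at 110 gives a baseline return of about 0.10; a strategy's economics on that fold are only interpretable relative to that number.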

Deliverable closure = a real run, not a green CI. The two cases are symmetric; only the timing differs:

- 2: cluster dry-run before merge (the risky data-loading path must be proven up front).
- 1a/1c: re-probe after merge. A real post-merge FTF run must write non-NULL, distinct model_hyperparams/feature_hash, confirmed by the same read-only PG probe that found 0/9216. Until then the "keyed" deliverable is open: #1115 going green does NOT close it. Without the re-probe the key could stay dead in prod for the original reason, and nobody would see it until the next probe.
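
The re-probe can be pictured as a single read-only aggregate query plus a pass/fail rule. The sketch below runs against an in-memory SQLite table purely so that it is self-contained; the real probe targets PostgreSQL, where the placeholder style depends on the driver. The `run_id` filter column and the reading of "distinct" as "more than one distinct value" are assumptions made for this example.

```python
import sqlite3

# COUNT(col) and COUNT(DISTINCT col) both ignore NULLs, so a dead key shows up as 0.
PROBE_SQL = """
SELECT COUNT(*),
       COUNT(model_hyperparams),
       COUNT(DISTINCT model_hyperparams),
       COUNT(feature_hash),
       COUNT(DISTINCT feature_hash)
FROM finetune_results
WHERE run_id = ?
"""


def probe(conn: sqlite3.Connection, run_id: str) -> tuple:
    """Return (total, hp_non_null, hp_distinct, fh_non_null, fh_distinct)."""
    return conn.execute(PROBE_SQL, (run_id,)).fetchone()


def keyed_ok(stats: tuple) -> bool:
    """Keys count as live when every row is non-NULL and values actually vary."""
    total, hp_non_null, hp_distinct, fh_non_null, fh_distinct = stats
    return (
        total > 0
        and hp_non_null == total
        and fh_non_null == total
        and hp_distinct > 1
        and fh_distinct > 1
    )


# Toy data: "old" mimics the 0/9216 finding, "new" a healthy post-merge run.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE finetune_results "
    "(run_id TEXT, model_hyperparams TEXT, feature_hash TEXT)"
)
conn.executemany(
    "INSERT INTO finetune_results VALUES (?, ?, ?)",
    [
        ("old", None, "unknown"),
        ("old", None, "unknown"),
        ("new", '{"lr": 0.1}', "a1"),
        ("new", '{"lr": 0.2}', "b2"),
    ],
)
```

On the toy data, the "old" rows fail the rule (no hyperparameters, a single constant hash) and the "new" rows pass, which is the distinction the post-merge re-probe is meant to draw.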

Increments 3–4 carry design choices (see Questions).

Files

src/commun/finetune/persistence.py — INSERT columns/values (1a/1c done; 1b deferred, 2 next).
src/commun/finetune/ablation_runner.py — propagate join-key fields into base_result (1a + 1c done).
src/commun/regime/regime_trainer.py — feature_hash fingerprint + _feature_set_hash on VariantResult (1c done); optuna_trial_id deferred (1b).
src/commun/finetune/baselines.py — baseline population (2, own PR).
tests/unit/test_finetune_persistence.py + tests/unit/regime/test_feature_set_hash.py — contract tests (1a + 1c done: parity + JSON + NULL-when-absent + hash determinism).

Risks & mitigations

Writing what the caller doesn't have (ADR-25): adding a column without a real source just re-creates NULL. Mitigation — each increment threads the source end-to-end, with a contract test asserting the value reaches the bind param (not just the column exists). 1a done this way.
Backfill vs forward-fill: the fix keys/labels future rows; the 9189 pre-S19 rows stay dead. Acceptable — the clean substrate is regenerated by the post-S19 re-run (decision B), not backfilled.
No silent placeholder (ADR-25): absent values persist as NULL, never "{}"/"unknown" — 1a enforces this; 1c removes the existing "unknown" default.
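
A contract test of the kind described above, asserting on the bind parameters rather than on the existence of a column, might look like the following sketch. The `persist_result` here is a hypothetical stand-in with a reduced column list, not the real writer in persistence.py, and the `%s` placeholders assume a DB-API driver with that parameter style.

```python
import json
from unittest.mock import MagicMock


def persist_result(cursor, result: dict) -> None:
    """Hypothetical stand-in for the real writer (reduced column list)."""
    hpo = result.get("model_hyperparams")
    cursor.execute(
        "INSERT INTO finetune_results (f1_buy, model_hyperparams, feature_hash) "
        "VALUES (%s, %s, %s)",
        (
            result["f1_buy"],
            # Sorted keys keep identical configs byte-identical in the column.
            json.dumps(hpo, sort_keys=True) if hpo else None,
            result.get("feature_hash") or None,
        ),
    )


def bound_params(cursor) -> tuple:
    """Extract the parameter tuple passed to the last cursor.execute call."""
    args, _kwargs = cursor.execute.call_args
    return args[1]


def test_values_reach_bind_params():
    cursor = MagicMock()
    persist_result(
        cursor,
        {"f1_buy": 0.5, "model_hyperparams": {"depth": 3}, "feature_hash": "abc"},
    )
    assert bound_params(cursor) == (0.5, '{"depth": 3}', "abc")


def test_absent_values_bind_null():
    cursor = MagicMock()
    persist_result(cursor, {"f1_buy": 0.5})
    # None maps to SQL NULL; no "{}" or "unknown" placeholder is written.
    assert bound_params(cursor) == (0.5, None, None)
```

Such a test pins the writer's behaviour given an input. It says nothing about whether the production caller supplies that input, which is why the plan closes the deliverable with a re-probe rather than with green CI.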

Success criteria: closed by a real run, not by green CI

1a/1c (keyed): a real post-merge FTF run writes model_hyperparams + real feature_hash non-NULL and distinct, confirmed by re-running the read-only PG probe that found 0/9216. ⚠️ This is the closure step — #1115 merging green does NOT close the "keyed" deliverable; the re-probe does. (optuna_trial_id is out of scope — 1b deferred YAGNI.)
2: finetune_baselines populated for each run (ADR-29) — proven by the pre-merge cluster dry-run (OHLCV-in-pod), then confirmed on the run.
4: every economic row carries an explicit cost-basis label or a realized-fill-anchored cost.
Contract tests green; the §0bis four criteria pass on a fresh post-fix run.
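
The labelling half of criterion 4 can be sketched as a small stamping step applied before persistence. This is only an illustration of the "honest label" option, since the design is still open: `backtested_assumed` is the label named in this plan, while the `realized_fills` label, the function name and the boolean flag are hypothetical.

```python
from typing import Any, Dict

ASSUMED = "backtested_assumed"  # label named in the plan
REALIZED = "realized_fills"     # hypothetical label for fill-anchored costs


def stamp_cost_basis(
    row: Dict[str, Any], cost_from_realized_fills: bool = False
) -> Dict[str, Any]:
    """Return a copy of the row with an explicit cost_basis label.

    The default is the assumed label, so a row can never be persisted
    without stating where its cost_bps came from.
    """
    stamped = dict(row)
    stamped["cost_basis"] = REALIZED if cost_from_realized_fills else ASSUMED
    return stamped
```

Defaulting to the assumed label means a row only claims a realized basis when the caller asserts it explicitly, which keeps the label from certifying the assumption by accident.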

Questions for the committee (plan_review)

The strategic plan_review already ran (Meeting #250) on the fiche; these are the residual implementation design questions for increments 3–4.

1. Cost-validity (4): is realized-fill reconciliation in scope for S12, or does S12 ship the honest label (cost_basis='backtested_assumed') and open a separate realized-eco-capture Story (overlaps s43 §2bis)?
2. Val/OOS pairing (3): persist both sides per row, or use a separate finetune_results_validation table keyed identically?

~~optuna_trial_id source~~ — dropped: 1b is deferred indefinitely (YAGNI — joins to a metric-less MLflow target; see the increment table). No longer an open design question.