Plan dossier - CVN-N001-EI-S12: Economic observability - keyed-config + cost-validated ground-truth substrate (Tier-2 gate)
Story: CVN-N001-EI-S12 · parent Epic CVN-N001-EI
Type: implementation plan (ADR-68)
Date: 2026-06-05
Decision basis: documentation/stories/CVN-N001-EI-S11/fiche_decision_economic_groundtruth.md §8 (Option C, operator-validated) · committee Meeting OP #250
Problem
The PG probe (2026-06-05, finetune_results, 9216 rows) found that the program computes economic outcomes jointly with f1_buy, but the substrate is unusable as Tier-2 ground-truth:
1. Decoupled from the config: model_hyperparams and optuna_trial_id are populated in 0/9216 rows (all NULL), and feature_hash is the constant "unknown". No row joins back to the config that produced it (ADR-25 dead join-key).
2. Cost-unvalidated — sortino/expectancy are backtested under assumed cost_bps; the 3-value sweep is sensitivity, not validation (criterion 4 — fiche §2).
3. finetune_baselines = 0 rows (ADR-29 naive baseline absent).
Root cause (located): persist_result in src/commun/finetune/persistence.py omits model_hyperparams and optuna_trial_id from the INSERT entirely (→ NULL), and feature_hash receives the literal "unknown". The caller ablation_runner.py never propagates them, though VariantResult.hpo_params carries the winning hyperparameters at the write point.
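A minimal sketch of the shape of the 1a/1c fix, assuming a dict-like result and psycopg-style `%s` placeholders. The function `build_insert`, the reduced column list, and the `crypto`/`f1_buy` fields are illustrative assumptions, not the actual `persist_result` signature or the full table schema.

```python
import json
from typing import Any, Optional

# Hypothetical column subset; the real finetune_results table has more columns.
COLUMNS = ("crypto", "f1_buy", "model_hyperparams", "feature_hash")

# Placeholder strings that must never be stored as if they were real values.
_PLACEHOLDERS = ("", "unknown", "{}")


def _nullable_json(params: Optional[dict]) -> Optional[str]:
    """Serialize hyperparameters deterministically, or return None (SQL NULL)."""
    if not params:
        return None
    return json.dumps(params, sort_keys=True)


def _nullable_hash(value: Optional[str]) -> Optional[str]:
    """Pass a real hash through; map missing or placeholder values to None."""
    if value is None or value in _PLACEHOLDERS:
        return None
    return value


def build_insert(result: dict[str, Any]) -> tuple[str, tuple]:
    """Return (sql, bind_params) for one finetune_results row."""
    sql = (
        f"INSERT INTO finetune_results ({', '.join(COLUMNS)}) "
        f"VALUES ({', '.join(['%s'] * len(COLUMNS))})"
    )
    params = (
        result.get("crypto"),
        result.get("f1_buy"),
        _nullable_json(result.get("model_hyperparams")),
        _nullable_hash(result.get("feature_hash")),
    )
    return sql, params
```

The point of the sketch is that the join-key fields appear both in the column list and in the bind tuple, and that an absent source yields NULL rather than a placeholder string.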
Approach: incremental, each increment independently shippable
| # | Increment | Surface | Status |
|---|---|---|---|
| 1a | Persist model_hyperparams (the winning HPO params = config identity) | ablation_runner.py base_result → persist_result INSERT + column | ✅ done (PR #1115) |
| 1c | Real feature_hash (replace "unknown") | regime_trainer fingerprint of fitted feature set → base_result → writer | ✅ done (PR #1115) |
| 2 | Populate finetune_baselines (ADR-29 naive baseline) | compute buy_and_hold/random/naive per crypto×fold in the FTF DAG report task (load OHLCV in-pod) → generate_report(baselines=…) (today ={}) | next (own PR, cluster-gated) |
| 1b | Persist optuna_trial_id | 5-hop thread study.best_trial.number → HPOResult → orchestrator train_result → VariantResult → writer | deferred indefinitely (YAGNI) |
| 3 | Val/OOS paired metrics (criterion 3) | persist the selection-side metric alongside OOS, not a single scalar | design |
| 4 | Cost-validity (criterion 4, constitutive) | anchor cost_bps on realized fills (reconciliation) or stamp every row cost_basis='backtested_assumed' as an honest label; never certify the assumption (fiche §8) | design |
The deliverable is usably keyed + cost-validated — not "all N columns populated for completeness". 1a + 1c already deliver the join (a row → its config + feature-set); 2 makes the economics interpretable (vs baseline). That is the substrate.
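To make the "row → feature-set" half of the join concrete, here is one way a feature-set fingerprint could be computed. This is a sketch under assumptions: the function name `feature_set_hash`, the choice of SHA-256, and treating the feature set as order-independent are all illustrative; the actual fingerprint in regime_trainer may differ.

```python
import hashlib
from typing import Iterable, Optional


def feature_set_hash(feature_names: Iterable[str]) -> Optional[str]:
    """Fingerprint a fitted feature set.

    Assumption: identity is the set of names, so duplicates and ordering
    are ignored. An empty set returns None so it persists as NULL instead
    of a constant hash that would look like a real key.
    """
    names = sorted(set(feature_names))
    if not names:
        return None
    canonical = "\n".join(names)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two runs fitted on the same features then share a hash regardless of column order, while any added or removed feature changes it.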
- 1b is YAGNI, not "next". Its join target is the MLflow CVNTrade_HPO run, which the probe already found carries no f1_buy metric, only best_params (which 1a already persists) plus data identity. It joins to a place that lacks the metric the proxy↔economy question needs, so its value for S12 is close to zero. It is not co-located with the DAG work (different files, 5 hops, opaque orchestrator), so it does not ride along with increment 2. Revisit only if a concrete provenance need surfaces (and even then, low value given the target).
- 1a/1c are unit-mergeable: the mocked-cursor contract tests prove the code builds the INSERT correctly given an input. That is enough to merge #1115 (low-risk plumbing, no reason to block). It is NOT enough to declare the keyed substrate done: a mock does not prove (a) that the real run feeds non-NULL model_hyperparams/feature_hash at the call site (the value can be empty exactly where it matters, which is why the column existed yet was populated in 0/9216 rows), nor (b) that the prod path is the one the test exercises.
- 2 is NOT unit-mergeable in the same way: its load-bearing path is OHLCV-load-in-pod, which mocked-candle unit tests do not exercise (the S03 trap: tests touch a different path than the one that runs). Gate 2 on a real cluster dry-run that loads OHLCV in-pod and computes the baselines, not on unit tests alone. Own PR, distinct test profile from #1115; do not bundle.
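The mocked-cursor contract test mentioned above can be sketched as follows. `persist_row` is a hypothetical stand-in for the real writer, reduced to two columns so the example is self-contained; only `unittest.mock` from the standard library is used.

```python
import json
from unittest.mock import MagicMock


def persist_row(cursor, row: dict) -> None:
    """Hypothetical stand-in for persist_result: one INSERT with bound params."""
    hp = row.get("model_hyperparams")
    cursor.execute(
        "INSERT INTO finetune_results (model_hyperparams, feature_hash) "
        "VALUES (%s, %s)",
        (json.dumps(hp, sort_keys=True) if hp else None, row.get("feature_hash")),
    )


def bound_params(row: dict) -> tuple:
    """Run the writer against a mocked cursor and return what it bound."""
    cursor = MagicMock()
    persist_row(cursor, row)
    cursor.execute.assert_called_once()
    _sql, params = cursor.execute.call_args.args
    return params


def test_value_reaches_bind_param():
    # The assertion is on the bind tuple, not merely on the column name in the SQL.
    params = bound_params({"model_hyperparams": {"depth": 4}, "feature_hash": "abc"})
    assert params == ('{"depth": 4}', "abc")


def test_absent_value_binds_null():
    # Absent inputs must bind None (SQL NULL), never "{}" or "unknown".
    assert bound_params({}) == (None, None)
```

Both tests pass for whatever input the test itself supplies. Neither says anything about what the real caller passes at runtime, which is the gap a real run has to close.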
Deliverable closure = a real run, not a green CI. The rule is symmetric across the two deliverables; only the timing differs (before merge for 2, after merge for 1a/1c):
- 2 — cluster dry-run before merge (the risky data-loading path must be proven up front).
- 1a/1c — re-probe after merge: a real post-merge FTF run must write non-NULL, distinct model_hyperparams/feature_hash, confirmed by the same read-only PG probe that found 0/9216. Until then the "keyed" deliverable is open — #1115 going green does NOT close it; without the re-probe the key could stay dead in prod for the original reason and nobody sees it until the next probe.
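A read-only re-probe of this kind could look roughly like the sketch below. It runs against an in-memory SQLite table so it is self-contained; the real probe targets PostgreSQL (where the query would go through a cursor) and would presumably be restricted to rows from the post-merge run, a filter omitted here because the source does not name the column to filter on. The closure rule in `is_keyed` ("more than one distinct value") is an assumed reading of "distinct".

```python
import sqlite3

# Aggregate-only, read-only probe. COUNT(col) and COUNT(DISTINCT col) skip NULLs.
PROBE_SQL = """
SELECT COUNT(*)                          AS total_rows,
       COUNT(model_hyperparams)          AS hp_non_null,
       COUNT(DISTINCT model_hyperparams) AS hp_distinct,
       COUNT(feature_hash)               AS fh_non_null,
       COUNT(DISTINCT feature_hash)      AS fh_distinct
FROM finetune_results
"""


def probe(conn: sqlite3.Connection) -> dict:
    """Run the probe and return the counts keyed by column alias."""
    cur = conn.execute(PROBE_SQL)
    names = [d[0] for d in cur.description]
    return dict(zip(names, cur.fetchone()))


def is_keyed(stats: dict) -> bool:
    """Assumed closure rule: both fields non-NULL on every row and not constant."""
    total = stats["total_rows"]
    return (
        total > 0
        and stats["hp_non_null"] == total
        and stats["fh_non_null"] == total
        and stats["hp_distinct"] > 1
        and stats["fh_distinct"] > 1
    )


def demo_connection(rows) -> sqlite3.Connection:
    """Build a throwaway two-column table for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE finetune_results (model_hyperparams TEXT, feature_hash TEXT)"
    )
    conn.executemany("INSERT INTO finetune_results VALUES (?, ?)", rows)
    return conn
```

Fed rows shaped like the original finding (NULL hyperparameters, constant "unknown" hash), `is_keyed` returns False, so a dead key fails the probe loudly instead of passing unnoticed.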
Increments 3–4 carry design choices (see Questions).
Files
- src/commun/finetune/persistence.py: INSERT columns/values (1a/1c done; 1b deferred, 2 next).
- src/commun/finetune/ablation_runner.py: propagate join-key fields into base_result (1a + 1c done).
- src/commun/regime/regime_trainer.py: feature_hash fingerprint + _feature_set_hash on VariantResult (1c done); optuna_trial_id deferred (1b).
- src/commun/finetune/baselines.py: baseline population (2, own PR).
- tests/unit/test_finetune_persistence.py + tests/unit/regime/test_feature_set_hash.py: contract tests (1a + 1c done: parity + JSON + NULL-when-absent + hash determinism).
Risks & mitigations
- Writing what the caller doesn't have (ADR-25): adding a column without a real source just re-creates NULL. Mitigation — each increment threads the source end-to-end, with a contract test asserting the value reaches the bind param (not just the column exists). 1a done this way.
- Backfill vs forward-fill: the fix keys/labels future rows; the 9189 pre-S19 rows stay dead. Acceptable — the clean substrate is regenerated by the post-S19 re-run (decision B), not backfilled.
- No silent placeholder (ADR-25): absent values persist as NULL, never "{}"/"unknown". 1a enforces this; 1c removes the existing "unknown" default.
Success criteria: closed by a real run, not by green CI
- 1a/1c (keyed): a real post-merge FTF run writes model_hyperparams + real feature_hash non-NULL and distinct, confirmed by re-running the read-only PG probe that found 0/9216. ⚠️ This is the closure step: #1115 merging green does NOT close the "keyed" deliverable; the re-probe does. (optuna_trial_id is out of scope; 1b deferred YAGNI.)
- 2: finetune_baselines populated for each run (ADR-29), proven by the pre-merge cluster dry-run (OHLCV-in-pod), then confirmed on the run.
- 4: every economic row carries an explicit cost-basis label or a realized-fill-anchored cost.
- Contract tests green; the §0bis four criteria pass on a fresh post-fix run.
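For orientation, the per-crypto×fold baselines named in increment 2 might be computed along these lines. This is a sketch under assumptions: closes are plain sequences of floats, no transaction costs are applied, the "random" baseline is a seeded coin-flip exposure per bar, and all function names are illustrative. The "naive" baseline is left out because the source does not define it.

```python
import random
from typing import Sequence


def buy_and_hold_return(closes: Sequence[float]) -> float:
    """Total return from holding over the fold: last close / first close - 1."""
    if len(closes) < 2:
        raise ValueError("need at least two closes")
    return closes[-1] / closes[0] - 1.0


def random_exposure_return(
    closes: Sequence[float], seed: int, p_long: float = 0.5
) -> float:
    """Compound bar returns only on bars where a seeded coin flip says 'long'."""
    rng = random.Random(seed)
    equity = 1.0
    for prev, cur in zip(closes, closes[1:]):
        if rng.random() < p_long:
            equity *= cur / prev
    return equity - 1.0


def fold_baselines(
    closes_by_key: dict[tuple[str, int], Sequence[float]], seed: int = 0
) -> dict[tuple[str, int], dict[str, float]]:
    """Map each (crypto, fold) key to its baseline returns."""
    return {
        key: {
            "buy_and_hold": buy_and_hold_return(closes),
            "random": random_exposure_return(closes, seed),
        }
        for key, closes in closes_by_key.items()
    }
```

The arithmetic is the easy part, which is consistent with the plan's point: the risk sits in loading OHLCV in-pod, and a unit test on in-memory lists like these does not exercise that path.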
Questions for the committee (plan_review)
The strategic plan_review already ran (Meeting #250) on the fiche; these are the residual implementation design questions for increments 3–4.
1. Cost-validity (4): is realized-fill reconciliation in scope for S12, or does S12 ship the honest label (cost_basis='backtested_assumed') + open a separate realized-eco-capture Story (overlaps s43 §2bis)?
2. Val/OOS pairing (3): persist both sides per row, or a separate finetune_results_validation table keyed identically?
3. ~~optuna_trial_id source~~ dropped: 1b is deferred indefinitely (YAGNI; it joins to a metric-less MLflow target; see the increment table). No longer an open design question.