Architecture (r1 — for review)

How it's built. This is the architecture artefact (ADR-0095) for S14. It realizes the pre-registered decision rule of plan r4 (plan_review PASSED, Meeting 272) and does not re-decide it. The design is Hamilton-native, with a pure decision core and isolated I/O. It has two slices (per the plan re-scope): A = committee-invariant decision core (pure, tested), B = the in-pod controlled fit that feeds it.

Changelog — r1 (2026-06-10): initial architecture, traces plan r4 + the implemented slices A/B (s14_lgb_output_validity.py, s14_q1_fit_inpod.py, test_s14_lgb_output_validity.py).

§0 Provenance & the §0bis pivot

Realizes plan r4. The plan was re-scoped after a verified §0bis: prod LightGBM HPO draws log metrics={}, artifacts=[], and only best_n_estimators (the searched ceiling), with no best_iteration, no val-metric, and no per-candle predictions. A read-only rank/calibration verdict is therefore only possible on the s43 replay (A6, S09-gated), so the architecture splits into two questions:

Q1 — config-validity (PRE-S09, replay-independent): a small controlled instrumented fit on a regime-matched trusted fold via the canonical-producing FE path (the path that makes the A6 canonical — the reference, not the victim; _load_captured_parquet count = 0 in the normal trainer).
Q2 — fold-3 rank/calibration (POST-S09, gated): the read-only fork on the cleared s43 replay, run only if Q1 = CONFIG_OK_LGB.

1. Two-slice decomposition

| Slice | File | Role | Purity |
|---|---|---|---|
| A — decision core | `commun/finetune/diagnostic/s14_lgb_output_validity.py` | pure probes + `decide_q1` (total over the truth table) + `run_q1` (injects a `fit_fn`) | pure — zero data-prep coupling; committee-invariant; fully unit-tested |
| B — controlled fit | `commun/finetune/diagnostic/s14_q1_fit_inpod.py` | builds the regime-matched `Datasets` (FE path) + the `fit_fn` (harness `train_with_fixed_params` → `best_iteration` + holdout `p_buy`) + `run_inpod` | side-effecting (cache, harness fit) — operator-triggered in-pod |

Slice A depends on nothing but the injected fit_fn contract → it is testable in full isolation (12/12 green). Slice B is the only part coupled to data/cluster, and is necessarily in-pod (ADR-90 gates the LGB training HPs to ftf_config.base_env, injected by the FTF DAG).

2. Architectural style — Hamilton-native, pure decision

The decision (decide_q1) is the auditable core — the frozen rule + the truth table (plan §1 Fig 1). It is a pure function over already-collected results (FitResult), so the exhaustiveness test (every truth-table cell → exactly one verdict) is trivially checkable and immune to I/O flakiness. The fit (slice B) runs out-of-graph and feeds the decision via the FitResult contract.
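
The exhaustiveness check described above can be sketched by enumerating every combination of guard outcomes and asserting that each one maps to a verdict from the closed set. This is a minimal illustration only: `toy_decide`, the four guard inputs, and the three lift-CI states are hypothetical stand-ins for the real truth table (plan §1 Fig 1), and reporting a regime mismatch as `INCONCLUSIVE_TOOLING` is an assumption, not something the plan states.

```python
from itertools import product

VERDICTS = frozenset(
    {"INCONCLUSIVE_TOOLING", "CONFIG_DEGENERATE_LGB", "INCONCLUSIVE_POWER", "CONFIG_OK_LGB"}
)
# Hypothetical discretisation of the lift CI relative to AUPRC_LIFT_FLOOR.
LIFT_CI_STATES = ("below_floor", "straddles_floor", "above_floor")


def toy_decide(tooling_ok, regime_ok, early_stop_degenerate, lift_ci_state):
    """Pure stand-in for decide_q1: guards are checked in a fixed order."""
    if not tooling_ok:
        return "INCONCLUSIVE_TOOLING"
    if not regime_ok:
        # Assumption: a regime mismatch is reported as a tooling inconclusive.
        return "INCONCLUSIVE_TOOLING"
    if early_stop_degenerate:
        return "CONFIG_DEGENERATE_LGB"
    if lift_ci_state == "below_floor":
        return "CONFIG_DEGENERATE_LGB"
    if lift_ci_state == "straddles_floor":
        return "INCONCLUSIVE_POWER"
    return "CONFIG_OK_LGB"


def truth_table():
    """Map every cell of the (hypothetical) truth table to its verdict."""
    flags = (True, False)
    cells = product(flags, flags, flags, LIFT_CI_STATES)
    return {cell: toy_decide(*cell) for cell in cells}


table = truth_table()
# Totality: every cell yields a verdict from the closed set.
assert all(verdict in VERDICTS for verdict in table.values())
```

Because the function takes only already-collected values, the whole table can be enumerated in a unit test without touching the cache or the cluster.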

3. Component view

flowchart TD
  subgraph B["Slice B (in-pod, operator-triggered) — side-effecting"]
    FE["build_regime_matched_datasets:
get_feature_store -> generate_labels_standalone (ATR0.5_1.5_H4)
-> purged+embargoed split -> Datasets"]
    FIT["make_harness_fit_fn -> fit_fn:
train_with_fixed_params(prod best_params)
-> best_iteration + predict_proba(test)[BUY]
per-draw over prod draws; es_metric ablation"]
  end
  subgraph A["Slice A (pure) — committee-invariant decision core"]
    PB["probes: base_rate, auprc_lift, ece_equal_mass, _ci_over_draws"]
    DEC["decide_q1 (truth table -> exactly one verdict)"]
  end
  FE --> FIT
  FIT -->|FitResult| RUN[run_q1]
  RUN --> PB --> DEC
  DEC -->|Q1Verdict_| EMIT["emit s14_q1_config_verdict (key=value) + (post-S09) s14_lgb_output_verdict"]

4. Decision core (slice A, pure, total)

base_rate(y) — positive prevalence; every metric reported as lift over it.
auprc_lift(y, p) = AUPRC / no-skill(=base_rate) → tail rank (1.0 = chance).
precision_at_top_rate(y, p, rate) — tail precision (scale-invariant).
ece_equal_mass(y, p) — equal-mass (quantile) bins ECE (the high-p tail is populated; equal-width would be bulk-dominated and blind there).
_ci_over_draws(...) — selection distribution over draws: median + max + 95 % CI (decide on the distribution, never recency).
decide_q1(fr, s43_base_rate, s43_label, ablation) — applies the guards in a frozen order: tooling/labels/leakage → regime guard (label + base_rate ±20 %) → structural early-stopping (best_iteration ≤ 3 while the ceiling is far larger) → capacity (lift-CI below AUPRC_LIFT_FLOOR) → CONFIG_OK_LGB. A lift-CI that straddles the floor yields INCONCLUSIVE_POWER. _localise_early_stopping uses the multi-metric ablation to name the cause. No I/O, no raise, no print (ADR-25/31).
run_q1(fit_fn, datasets, prod_best_params, …) — orchestrator; a failed fit_fn → INCONCLUSIVE_TOOLING(reason=fit:…).
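
The probes listed above can be sketched in a few lines of dependency-free Python. This is not the code in `s14_lgb_output_validity.py`: it assumes AUPRC is estimated as average precision, that ties in `p` are broken by input order, and that `ece_equal_mass` takes an `n_bins` argument (hypothetical; the source shows only `y` and `p`). Empty input returns NaN instead of raising, in line with the no-raise rule.

```python
import math


def base_rate(y):
    """Positive prevalence; NaN on empty input rather than raising."""
    return sum(y) / len(y) if len(y) else math.nan


def average_precision(y, p):
    """AUPRC estimated as average precision (mean precision at each positive).

    Ties in p are broken by input order here; a production version may differ.
    """
    order = sorted(range(len(y)), key=lambda i: -p[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else math.nan


def auprc_lift(y, p):
    """AUPRC divided by the no-skill baseline (the base rate); 1.0 is chance."""
    rate = base_rate(y)
    if not rate or math.isnan(rate):
        return math.nan
    return average_precision(y, p) / rate


def precision_at_top_rate(y, p, rate):
    """Precision among the top `rate` fraction of scores (at least one row)."""
    if not len(y):
        return math.nan
    k = max(1, int(rate * len(y)))
    top = sorted(range(len(y)), key=lambda i: -p[i])[:k]
    return sum(y[i] for i in top) / k


def ece_equal_mass(y, p, n_bins=10):
    """Expected calibration error over equal-mass (quantile) bins."""
    n = len(y)
    if n == 0:
        return math.nan
    order = sorted(range(n), key=lambda i: p[i])
    ece = 0.0
    for b in range(n_bins):
        chunk = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_p = sum(p[i] for i in chunk) / len(chunk)
        mean_y = sum(y[i] for i in chunk) / len(chunk)
        ece += abs(mean_y - mean_p) * len(chunk) / n
    return ece
```

Splitting the rows by rank rather than by score value is what keeps each bin equally populated, so the sparse high-p tail gets its own bin instead of being averaged into the bulk.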

5. Interface contracts

FitResult (slice B → slice A): y_holdout, p_buy_holdout, best_iteration, n_estimators_ceiling, per_draw_p_buy/per_draw_y (the draw distribution), label_aligned, fe_fitted_on_train_only, no_look_ahead, label_name. Missing/None → INCONCLUSIVE_TOOLING(reason).
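
One way to picture this contract is a frozen dataclass whose fields all default to `None`, plus a helper that turns any unset field into a reason string. The field names come from the contract above; the types, the `missing_fields` and `tooling_reason` helpers, and the `missing:` reason prefix are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class FitResult:
    """Field names follow the contract; the types are assumptions."""

    y_holdout: Optional[Sequence[int]] = None
    p_buy_holdout: Optional[Sequence[float]] = None
    best_iteration: Optional[int] = None
    n_estimators_ceiling: Optional[int] = None
    per_draw_p_buy: Optional[Sequence[Sequence[float]]] = None
    per_draw_y: Optional[Sequence[Sequence[int]]] = None
    label_aligned: Optional[bool] = None
    fe_fitted_on_train_only: Optional[bool] = None
    no_look_ahead: Optional[bool] = None
    label_name: Optional[str] = None


def missing_fields(fr: FitResult) -> List[str]:
    """Names of contract fields that were left unset (None)."""
    return [f.name for f in fields(fr) if getattr(fr, f.name) is None]


def tooling_reason(fr: FitResult) -> Optional[str]:
    """Return a reason string if the contract is violated, else None."""
    missing = missing_fields(fr)
    return "missing:" + ",".join(missing) if missing else None
```

A non-`None` reason would map to `INCONCLUSIVE_TOOLING(reason)` before any probe runs, so a partially filled result can never reach the rank or calibration checks.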

Q1Verdict_ (artefact + event=s14_q1_config_verdict key=value): verdict ∈ {CONFIG_DEGENERATE_LGB, CONFIG_OK_LGB, INCONCLUSIVE_POWER, INCONCLUSIVE_TOOLING}, cause, reason, base_rate, best_iteration, ceiling, auprc_lift (CI), ece, ablation, note.
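
As a rough sketch of how such a verdict could be serialised, the closed verdict set fits an `Enum` and the log line a small formatter. The exact log layout required by ADR-31 is not shown in this document, so the space-separated `key=value` layout, the four-decimal float format, and the `format_event` and `q2_may_run` helpers are all assumptions.

```python
from enum import Enum


class Verdict(str, Enum):
    CONFIG_DEGENERATE_LGB = "CONFIG_DEGENERATE_LGB"
    CONFIG_OK_LGB = "CONFIG_OK_LGB"
    INCONCLUSIVE_POWER = "INCONCLUSIVE_POWER"
    INCONCLUSIVE_TOOLING = "INCONCLUSIVE_TOOLING"


def format_event(event, **fields):
    """Render one log line as space-separated key=value pairs."""
    parts = ["event=" + event]
    for key, value in fields.items():
        if value is None:
            continue  # omit unset fields instead of logging "None"
        if isinstance(value, Enum):
            value = value.value
        elif isinstance(value, float):
            value = format(value, ".4f")
        parts.append(f"{key}={value}")
    return " ".join(parts)


def q2_may_run(verdict):
    """Q2 is gated on Q1 = CONFIG_OK_LGB."""
    return verdict is Verdict.CONFIG_OK_LGB


line = format_event(
    "s14_q1_config_verdict",
    verdict=Verdict.CONFIG_OK_LGB,
    cause=None,
    best_iteration=57,
    auprc_lift=1.25,
)
```

Keeping the verdict a closed enumeration means a typo in a verdict name fails at attribute lookup rather than producing a fifth, unhandled verdict downstream.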

Frozen thresholds (plan r4): BEST_ITER_FLOOR=3 (anchored), RANK_MATERIAL=0.10 / ECE_BAD=0.10 (conventional, flagged), BASE_RATE_TOL=0.20 (#4 regime match), AUPRC_LIFT_FLOOR=1.10.
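
To make the thresholds concrete, the sketch below expresses three of the guards as pure predicates. Several points are assumptions rather than facts from the plan: the base-rate tolerance is read as relative (±20 % of the reference), the 95 % CI is taken as a plain percentile interval over the draws, and `CEILING_RATIO` is a hypothetical stand-in for how much larger than `best_iteration` the ceiling must be.

```python
from statistics import median

# Frozen thresholds quoted from plan r4.
BEST_ITER_FLOOR = 3
BASE_RATE_TOL = 0.20
AUPRC_LIFT_FLOOR = 1.10
# Hypothetical: how much larger the ceiling must be to count as "far larger".
CEILING_RATIO = 10


def regime_matches(base_rate, ref_base_rate, label, ref_label):
    """Same label and base rate within the tolerance (assumed relative)."""
    return label == ref_label and abs(base_rate - ref_base_rate) <= BASE_RATE_TOL * ref_base_rate


def structural_early_stop(best_iteration, ceiling):
    """True when training stopped almost immediately despite a large ceiling."""
    return best_iteration <= BEST_ITER_FLOOR and ceiling >= CEILING_RATIO * BEST_ITER_FLOOR


def percentile(sorted_vals, q):
    """Linearly interpolated percentile of pre-sorted values, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def ci_over_draws(values):
    """Median, max, and 95 % percentile interval; expects at least one draw."""
    vals = sorted(values)
    return {
        "median": median(vals),
        "max": vals[-1],
        "lo": percentile(vals, 0.025),
        "hi": percentile(vals, 0.975),
    }


def classify_lift_ci(ci):
    """Place the lift interval relative to AUPRC_LIFT_FLOOR."""
    if ci["hi"] < AUPRC_LIFT_FLOOR:
        return "below_floor"
    if ci["lo"] < AUPRC_LIFT_FLOOR:
        return "straddles_floor"
    return "above_floor"
```

For example, draws with lifts `[1.0, 1.2, 1.4]` give an interval that crosses 1.10, which is the case the decision rule reports as `INCONCLUSIVE_POWER` rather than guessing.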

6. Integration points (slice B)

cache.get_feature_store(crypto, tf) — the normal FE output (canonical-producing; A6-independent).
ETL.cvntrade_label.generate_labels_standalone(strategy, tf) — triple-barrier labels on demand (Feast cache carries no label_*).
commun.cache.fe_split.compute_embargo_bars — purged embargo gap before the test window (no look-ahead).
training.harness.train_with_fixed_params("lightgbm", Datasets, HPOParams) → model with .best_iteration (the real early-stop point prod does not log) + .predict_proba.
ADR-90 gate (load-bearing): the harness reads the LGB training HPs from ftf_config.base_env (PG/Console), injected in-pod by the FTF DAG → slice B's real fit is necessarily in-pod.
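
The purged and embargoed split used by slice B can be illustrated with index ranges alone. This sketch does not call `compute_embargo_bars` (its signature is not shown here); it takes the embargo as a plain integer, and `purged_embargoed_split` with its parameters is a hypothetical helper. It assumes a chronological layout with the test window at the end.

```python
def purged_embargoed_split(n_bars, test_size, label_horizon, embargo_bars):
    """Chronological split with a gap between train and test.

    The last `label_horizon` bars before the test window are purged because
    their labels look into the test window; `embargo_bars` widens the gap.
    """
    test_start = n_bars - test_size
    train_end = test_start - label_horizon - embargo_bars  # exclusive bound
    if test_size <= 0 or train_end <= 0:
        raise ValueError("not enough bars for the requested split")
    return range(train_end), range(test_start, n_bars)


train_idx, test_idx = purged_embargoed_split(
    n_bars=100, test_size=20, label_horizon=4, embargo_bars=6
)
# Bars 70..79 are dropped: 4 purged for the label horizon plus 6 embargoed.
gap = test_idx[0] - train_idx[-1] - 1
```

With these inputs the training window is bars 0 to 69 and the test window is bars 80 to 99, so no training label can be resolved using a test-window price.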

7. Observability & failure architecture

No-crash / fail-loud (ADR-25/31): every error path → structured INCONCLUSIVE_TOOLING(reason=…); never a UI raise, never print; NaN/empty guarded. A dataprep failure → INCONCLUSIVE_TOOLING(reason=dataprep:…).
Events: s14_q1_config_verdict (Q1) · s14_q1_datasets_built · s14_q1_draw_skipped (a bad draw doesn't sink the distribution) · s14_lgb_output_verdict (Q2, post-S09).
Local smoke: s14_q1_fit_inpod --smoke exercises the wiring on synthetic data (no cache/cluster) → proves no-crash; the real fit (best_iteration capture) is the in-pod dry-run.
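
The no-crash behaviour around the injected `fit_fn` might look like the sketch below: each draw is fitted inside a guard, a failing draw emits a skip event, and only the loss of every draw turns into a tooling verdict. Where the real code catches per-draw failures is not stated in this document, so the `fit_all_draws` helper, its return shape, the event fields, and the `fit:no_usable_draw` reason text are assumptions.

```python
def fit_all_draws(fit_fn, draws):
    """Fit each draw without ever raising.

    Returns (results, events, verdict), where verdict is None unless no draw
    could be fitted at all.
    """
    results, events = [], []
    for index, params in enumerate(draws):
        try:
            results.append(fit_fn(params))
        except Exception as exc:  # broad on purpose: no error may escape
            events.append(
                f"event=s14_q1_draw_skipped draw={index} error={type(exc).__name__}"
            )
    if not results:
        verdict = {"verdict": "INCONCLUSIVE_TOOLING", "reason": "fit:no_usable_draw"}
        return results, events, verdict
    return results, events, None


def flaky_fit(params):
    """Stub fit_fn for illustration: the draw with seed 1 fails."""
    if params["seed"] == 1:
        raise RuntimeError("boom")
    return params["seed"]


results, events, verdict = fit_all_draws(flaky_fit, [{"seed": 0}, {"seed": 1}, {"seed": 2}])
```

In a real module the event strings would go to the structured logger rather than a list; returning them here keeps the sketch testable without any logging setup.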

8. ADR conformance

| ADR | How |
|---|---|
| ADR-0095 | diagnostic-story template (artefact 2/5) |
| ADR-90 | training HPs in PG/Console only — why slice B is in-pod (the §0bis pivot's root cause) |
| ADR-25 | no silent fallback; all error paths → `INCONCLUSIVE_TOOLING(reason)` |
| ADR-31 | `event=key=value` logs, no `print` |
| ADR-23 | FE/feature provenance carried (`FeatureVersion`) |
| ADR-92 | DAG build SHA surfaced (when the in-pod DAG lands) |

Files

| File | State |
|---|---|
| `commun/finetune/diagnostic/s14_lgb_output_validity.py` | slice A — implemented, 12/12 tests green |
| `commun/finetune/diagnostic/s14_q1_fit_inpod.py` | slice B — implemented, `--smoke` no-crash; real fit in-pod |
| `tests/unit/finetune/diagnostic/test_s14_lgb_output_validity.py` | implemented (12 tests) |
| `dags/dag_diagnostic__s14_q1.py` | TODO — the in-pod Q1 DAG wrapping `run_inpod` (operator-triggered) |

Open items

In-pod Q1 DAG (dag_diagnostic__s14_q1) wrapping run_inpod — operator-triggered, ADR-92 build SHA.
Regime-matched trusted fold selection — the operator-arbitrated config/draws + the fold windows.
Q2 module (_decide_q2, post-S09) — the r2 read-only fork on the cleared replay.