Architecture (r1 — for review)
How it's built. Architecture artefact (ADR-0095) for S14. Realizes the pre-registered decision rule of the plan r4 (plan_review PASSED, Meeting 272) — it does not re-decide it. Hamilton-native, pure decision core, I/O isolated. Two slices (per the plan re-scope): A = committee-invariant decision core (pure, tested), B = the in-pod controlled fit that feeds it.
Changelog: r1 (2026-06-10), initial architecture; traces plan r4 + the implemented slices A/B (`s14_lgb_output_validity.py`, `s14_q1_fit_inpod.py`, `test_s14_lgb_output_validity.py`).
§0 Provenance & the §0bis pivot
Realizes plan r4. The plan was re-scoped after a verified §0bis: prod LightGBM HPO draws log `metrics={}`, `artifacts=[]`, and only `best_n_estimators` (the searched ceiling), with no `best_iteration`, no val-metric, and no per-candle predictions. So a read-only rank/calibration verdict is inescapably on the s18 replay (A6, S09-gated). The architecture therefore splits:
- Q1, config-validity (PRE-S09, replay-independent): a small controlled instrumented fit on a regime-matched trusted fold via the canonical-producing FE path (the path that makes the A6 canonical: the reference, not the victim; `_load_captured_parquet` count = 0 in the normal trainer).
- Q2, fold-3 rank/calibration (POST-S09, gated): the read-only fork on the cleared s43 replay, run only if Q1 = `CONFIG_OK_LGB`.
1. Two-slice decomposition
| Slice | File | Role | Purity |
|---|---|---|---|
| A: decision core | `commun/finetune/diagnostic/s14_lgb_output_validity.py` | pure probes + `decide_q1` (total over the truth table) + `run_q1` (injects a `fit_fn`) | pure: zero data-prep coupling; committee-invariant; fully unit-tested |
| B: controlled fit | `commun/finetune/diagnostic/s14_q1_fit_inpod.py` | builds the regime-matched Datasets (FE path) + the `fit_fn` (harness `train_with_fixed_params` → `best_iteration` + holdout `p_buy`) + `run_inpod` | side-effecting (cache, harness fit); operator-triggered in-pod |
Slice A depends on nothing but the injected fit_fn contract → it is testable in full isolation (12/12
green). Slice B is the only part coupled to data/cluster, and is necessarily in-pod (ADR-90 gates the
LGB training HPs to ftf_config.base_env, injected by the FTF DAG).
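The isolation claim can be illustrated with a stub `fit_fn`. The sketch below is a reduced stand-in, not the project's `run_q1`: the names `run_q1_sketch`, `Verdict`, the dict-shaped fit result, and the `FIT_COLLECTED` placeholder are assumptions made for illustration. Only the rule that a failing `fit_fn` maps to `INCONCLUSIVE_TOOLING(reason=fit:…)` is taken from this document.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class Verdict:
    """Hypothetical, reduced stand-in for the Q1 verdict artefact."""
    verdict: str
    reason: str = ""


def run_q1_sketch(
    fit_fn: Callable[[Any, Mapping[str, Any]], Mapping[str, Any]],
    datasets: Any,
    prod_best_params: Mapping[str, Any],
) -> Verdict:
    """Call the injected fit_fn and turn any failure into a structured verdict."""
    try:
        fit_result = fit_fn(datasets, prod_best_params)
    except Exception as exc:  # a fit failure must not escape as an exception
        return Verdict("INCONCLUSIVE_TOOLING", reason=f"fit:{type(exc).__name__}")
    if fit_result.get("best_iteration") is None:
        return Verdict("INCONCLUSIVE_TOOLING", reason="fit:missing_best_iteration")
    # The real decision rule would run here; FIT_COLLECTED is a placeholder
    # that only exists in this sketch.
    return Verdict("FIT_COLLECTED")


def stub_fit_ok(datasets: Any, params: Mapping[str, Any]) -> Mapping[str, Any]:
    """Stub that behaves like a successful fit (values are made up)."""
    return {"best_iteration": 120, "n_estimators_ceiling": 500}


def stub_fit_broken(datasets: Any, params: Mapping[str, Any]) -> Mapping[str, Any]:
    """Stub that behaves like a fit that cannot reach the cluster."""
    raise RuntimeError("cluster unavailable")


ok = run_q1_sketch(stub_fit_ok, datasets=None, prod_best_params={})
broken = run_q1_sketch(stub_fit_broken, datasets=None, prod_best_params={})
```

Because the orchestrator only sees a callable, the unit tests need no cache, harness, or cluster; swapping the stub for the in-pod fit changes nothing on the slice A side.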
2. Architectural style: Hamilton-native, pure decision
The decision (decide_q1) is the auditable core — the frozen rule + the truth table (plan §1 Fig 1). It is
a pure function over already-collected results (FitResult), so the exhaustiveness test (every truth-table
cell → exactly one verdict) is trivially checkable and immune to I/O flakiness. The fit (slice B) runs
out-of-graph and feeds the decision via the FitResult contract.
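A minimal sketch of what such an exhaustiveness check can look like. The toy rule `decide_toy` and its three input signals are assumptions, not the frozen rule of plan r4: the regime guard is left out because its verdict is not stated here, and mapping both the early-stopping and the capacity findings to `CONFIG_DEGENERATE_LGB` is an inference.

```python
from itertools import product

VERDICTS = (
    "CONFIG_DEGENERATE_LGB",
    "CONFIG_OK_LGB",
    "INCONCLUSIVE_POWER",
    "INCONCLUSIVE_TOOLING",
)
LIFT_STATES = ("below", "straddles", "above")


def decide_toy(tooling_ok: bool, early_stop_degenerate: bool, lift_state: str) -> str:
    """Toy decision in a fixed order; pure, so it can be enumerated exhaustively."""
    if not tooling_ok:
        return "INCONCLUSIVE_TOOLING"
    if early_stop_degenerate:
        return "CONFIG_DEGENERATE_LGB"  # assumed mapping
    if lift_state == "below":
        return "CONFIG_DEGENERATE_LGB"  # assumed mapping
    if lift_state == "straddles":
        return "INCONCLUSIVE_POWER"
    return "CONFIG_OK_LGB"


def truth_table() -> dict:
    """Map every combination of input signals to the verdict it produces."""
    cells = product((True, False), (True, False), LIFT_STATES)
    return {cell: decide_toy(*cell) for cell in cells}


table = truth_table()
# Exhaustiveness: every cell yields a known verdict, and every verdict is reachable.
unknown = [cell for cell, verdict in table.items() if verdict not in VERDICTS]
unreached = set(VERDICTS) - set(table.values())
```

Since the function takes only already-collected values, the check is a plain enumeration with no fixtures or mocks.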
3. Component view
flowchart TD
subgraph B["Slice B (in-pod, operator-triggered) — side-effecting"]
FE["build_regime_matched_datasets:
get_feature_store -> generate_labels_standalone (ATR0.5_1.5_H4)
-> purged+embargoed split -> Datasets"]
FIT["make_harness_fit_fn -> fit_fn:
train_with_fixed_params(prod best_params)
-> best_iteration + predict_proba(test)[BUY]
per-draw over prod draws; es_metric ablation"]
end
subgraph A["Slice A (pure) — committee-invariant decision core"]
PB["probes: base_rate, auprc_lift, ece_equal_mass, _ci_over_draws"]
DEC["decide_q1 (truth table -> exactly one verdict)"]
end
FE --> FIT
FIT -->|FitResult| RUN[run_q1]
RUN --> PB --> DEC
DEC -->|Q1Verdict_| EMIT["emit s14_q1_config_verdict (key=value) + (post-S09) s14_lgb_output_verdict"]
4. Decision core (slice A, pure, total)
- `base_rate(y)`: positive prevalence; every metric is reported as lift over it.
- `auprc_lift(y, p)` = AUPRC / no-skill (= base_rate) → tail rank (1.0 = chance).
- `precision_at_top_rate(y, p, rate)`: tail precision (scale-invariant).
- `ece_equal_mass(y, p)`: ECE over equal-mass (quantile) bins (the high-p tail is populated; equal-width would be bulk-dominated and blind there).
- `_ci_over_draws(...)`: selection distribution over draws: median + max + 95 % CI (decide on the distribution, never recency).
- `decide_q1(fr, s43_base_rate, s43_label, ablation)`: frozen order: tooling/labels/leakage → regime guard (label + base_rate ±20 %) → structural early-stopping (`best_iteration ≤ 3` while ceiling ≫) → capacity (lift-CI below `AUPRC_LIFT_FLOOR`) → `CONFIG_OK_LGB`; a lift-CI that straddles the boundary → `INCONCLUSIVE_POWER`. `_localise_early_stopping` uses the multi-metric ablation to name the cause. No I/O, no raise, no print (ADR-25/31).
- `run_q1(fit_fn, datasets, prod_best_params, …)`: orchestrator; a failed `fit_fn` → `INCONCLUSIVE_TOOLING(reason=fit:…)`.
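The probes above can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not the code in `s14_lgb_output_validity.py`: AUPRC is computed as rank-based average precision with ties broken by sort order, the default `n_bins=10` is a guess, and the NaN-returning guards are one possible reading of "NaN/empty guarded".

```python
import math
from typing import Sequence


def base_rate(y: Sequence[int]) -> float:
    """Positive prevalence; NaN on empty input instead of raising."""
    return sum(y) / len(y) if len(y) else math.nan


def average_precision(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean of precision@rank over the positives, ranked by descending score."""
    n_pos = sum(y)
    if not len(y) or n_pos == 0:
        return math.nan
    order = sorted(range(len(y)), key=lambda i: p[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def auprc_lift(y: Sequence[int], p: Sequence[float]) -> float:
    """AUPRC divided by the no-skill level (the base rate); 1.0 means chance."""
    ap, br = average_precision(y, p), base_rate(y)
    if math.isnan(ap) or not br > 0:
        return math.nan
    return ap / br


def precision_at_top_rate(y: Sequence[int], p: Sequence[float], rate: float) -> float:
    """Precision among the top `rate` fraction of samples by score."""
    n = len(y)
    if n == 0 or not 0 < rate <= 1:
        return math.nan
    k = max(1, int(rate * n))
    top = sorted(range(n), key=lambda i: p[i], reverse=True)[:k]
    return sum(y[i] for i in top) / k


def ece_equal_mass(y: Sequence[int], p: Sequence[float], n_bins: int = 10) -> float:
    """ECE over quantile bins, so each bin holds roughly the same sample count."""
    n = len(y)
    if n == 0 or n_bins < 1:
        return math.nan
    order = sorted(range(n), key=lambda i: p[i])
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        if hi == lo:
            continue  # more bins than samples: skip the empty bin
        idx = order[lo:hi]
        mean_p = sum(p[i] for i in idx) / len(idx)
        mean_y = sum(y[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(mean_y - mean_p)
    return ece


# Made-up holdout, for illustration only.
y_demo = [1, 0, 1, 0, 0, 0, 1, 0]
p_demo = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
lift_demo = auprc_lift(y_demo, p_demo)
```

Reporting the lift rather than the raw AUPRC keeps folds with different prevalence comparable, which is what the regime guard relies on.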
5. Interface contracts
FitResult (slice B → slice A): y_holdout, p_buy_holdout, best_iteration, n_estimators_ceiling,
per_draw_p_buy/per_draw_y (the draw distribution), label_aligned, fe_fitted_on_train_only,
no_look_ahead, label_name. Missing/None → INCONCLUSIVE_TOOLING(reason).
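The contract can be pictured as a frozen dataclass plus a completeness check. The field names below are the ones listed above; the class name `FitResultSketch`, the field types, the `tooling_reason` helper, and the format of the reason string are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Optional, Sequence


@dataclass(frozen=True)
class FitResultSketch:
    """Hypothetical shape of the slice B -> slice A payload (types are guesses)."""
    y_holdout: Optional[Sequence[int]] = None
    p_buy_holdout: Optional[Sequence[float]] = None
    best_iteration: Optional[int] = None
    n_estimators_ceiling: Optional[int] = None
    per_draw_p_buy: Optional[Sequence[Sequence[float]]] = None
    per_draw_y: Optional[Sequence[Sequence[int]]] = None
    label_aligned: Optional[bool] = None
    fe_fitted_on_train_only: Optional[bool] = None
    no_look_ahead: Optional[bool] = None
    label_name: Optional[str] = None


def tooling_reason(fr: FitResultSketch) -> Optional[str]:
    """Return a reason string if the payload is unusable, else None."""
    missing = [f.name for f in fields(fr) if getattr(fr, f.name) is None]
    if missing:
        return "missing:" + ",".join(missing)
    if len(fr.y_holdout) != len(fr.p_buy_holdout):
        return "length_mismatch:holdout"
    return None


complete = FitResultSketch(
    y_holdout=[1, 0], p_buy_holdout=[0.7, 0.2],
    best_iteration=42, n_estimators_ceiling=500,
    per_draw_p_buy=[[0.7, 0.2]], per_draw_y=[[1, 0]],
    label_aligned=True, fe_fitted_on_train_only=True,
    no_look_ahead=True, label_name="ATR0.5_1.5_H4",
)
partial = FitResultSketch(y_holdout=[1, 0], p_buy_holdout=[0.7, 0.2])
```

A non-`None` reason would be carried into `INCONCLUSIVE_TOOLING(reason)` before any metric is computed, so a half-filled payload never reaches the probes.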
Q1Verdict_ (artefact + event=s14_q1_config_verdict key=value): verdict ∈ {CONFIG_DEGENERATE_LGB,
CONFIG_OK_LGB, INCONCLUSIVE_POWER, INCONCLUSIVE_TOOLING}, cause, reason, base_rate, best_iteration,
ceiling, auprc_lift (CI), ece, ablation, note.
Frozen thresholds (plan r4): BEST_ITER_FLOOR=3 (anchored), RANK_MATERIAL=0.10 / ECE_BAD=0.10
(conventional, flagged), BASE_RATE_TOL=0.20 (#4 regime match), AUPRC_LIFT_FLOOR=1.10.
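How two of these thresholds might be applied is sketched below. Several choices are assumptions rather than the frozen rule: the 95 % interval is taken as an empirical percentile interval over the per-draw lifts, `BASE_RATE_TOL` is read as a relative tolerance, and the boundary conventions (`<` versus `<=`) are arbitrary. The helper names are hypothetical.

```python
import statistics
from typing import Dict, Optional, Sequence

AUPRC_LIFT_FLOOR = 1.10  # value from the frozen thresholds
BASE_RATE_TOL = 0.20     # value from the frozen thresholds


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile of pre-sorted values, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def ci_over_draws(lifts: Sequence[float]) -> Optional[Dict[str, float]]:
    """Median, max and an empirical 95 % interval over per-draw lifts."""
    vals = sorted(v for v in lifts if v == v)  # v == v drops NaN draws
    if not vals:
        return None
    return {
        "median": statistics.median(vals),
        "max": vals[-1],
        "ci_lo": percentile(vals, 0.025),
        "ci_hi": percentile(vals, 0.975),
    }


def classify_lift(ci_lo: float, ci_hi: float, floor: float = AUPRC_LIFT_FLOOR) -> str:
    """Place the whole interval relative to the floor."""
    if ci_hi < floor:
        return "below"       # capacity finding
    if ci_lo >= floor:
        return "above"       # candidate for CONFIG_OK_LGB
    return "straddles"       # INCONCLUSIVE_POWER


def regime_matches(base_rate: float, s43_base_rate: float,
                   tol: float = BASE_RATE_TOL) -> bool:
    """Relative base-rate tolerance check (relative reading is an assumption)."""
    if not s43_base_rate > 0:
        return False
    return abs(base_rate - s43_base_rate) <= tol * s43_base_rate


# Made-up per-draw lifts, for illustration only.
strong = ci_over_draws([1.3, 1.4, 1.5, 1.6, 1.7])
weak = ci_over_draws([0.9, 1.0, 1.05])
mixed = ci_over_draws([1.0, 1.2, 1.3])
```

Classifying the interval rather than a single draw is what keeps one lucky or unlucky draw from deciding the verdict.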
6. Integration points (slice B)
- `cache.get_feature_store(crypto, tf)`: the normal FE output (canonical-producing; A6-independent).
- `ETL.cvntrade_label.generate_labels_standalone(strategy, tf)`: triple-barrier labels on demand (Feast cache carries no `label_*`).
- `commun.cache.fe_split.compute_embargo_bars`: purged embargo gap before the test window (no look-ahead).
- `training.harness.train_with_fixed_params("lightgbm", Datasets, HPOParams)` → model with `.best_iteration` (the real early-stop point prod does not log) + `.predict_proba`.
- ADR-90 gate (load-bearing): the harness reads the LGB training HPs from `ftf_config.base_env` (PG/Console), injected in-pod by the FTF DAG → slice B's real fit is necessarily in-pod.
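The embargo idea can be shown on bare bar indices. This is a generic sketch and does not reproduce `compute_embargo_bars`: the helper names, the index-based interface, and the assumption that a label looks `label_horizon` bars ahead are all hypothetical.

```python
from typing import Tuple


def split_with_embargo(n_bars: int, test_start: int,
                       embargo_bars: int) -> Tuple[range, range]:
    """Train on bars before the gap, test on bars from test_start onward."""
    if not 0 < test_start <= n_bars:
        raise ValueError("test_start must lie inside the series")
    if embargo_bars < 0:
        raise ValueError("embargo_bars must be non-negative")
    train_end = max(0, test_start - embargo_bars)
    return range(0, train_end), range(test_start, n_bars)


def label_leaks_into_test(train: range, test: range, label_horizon: int) -> bool:
    """True if the last training label would be built from test-window bars."""
    if len(train) == 0 or len(test) == 0:
        return False
    last_bar_seen = train[-1] + label_horizon
    return last_bar_seen >= test[0]


# Example values are made up: 100 bars, test window from bar 80, 4-bar gap.
train, test = split_with_embargo(n_bars=100, test_start=80, embargo_bars=4)
leaks = label_leaks_into_test(train, test, label_horizon=4)
```

With forward-looking labels, the gap has to be at least as long as the label horizon; a zero gap in the same example would let the last training labels read bars 80 to 83.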
7. Observability & failure architecture
- No-crash / fail-loud (ADR-25/31): every error path → structured `INCONCLUSIVE_TOOLING(reason=…)`; never a UI raise, never `print`; NaN/empty guarded. A dataprep failure → `INCONCLUSIVE_TOOLING(reason=dataprep:…)`.
- Events: `s14_q1_config_verdict` (Q1) · `s14_q1_datasets_built` · `s14_q1_draw_skipped` (a bad draw doesn't sink the distribution) · `s14_lgb_output_verdict` (Q2, post-S09).
- Local smoke: `s14_q1_fit_inpod --smoke` exercises the wiring on synthetic data (no cache/cluster) → proves no-crash; the real fit (`best_iteration` capture) is the in-pod dry-run.
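A key=value event line can be produced with the standard `logging` module and no `print`. The sketch below is an assumption about the shape of such a line: the logger name, the sorted key order, the float formatting, and the `nan`/`none` spellings are illustrative, not the project's actual log format.

```python
import logging
import math

logger = logging.getLogger("s14_q1_sketch")  # hypothetical logger name


def format_event(event: str, **fields: object) -> str:
    """Render one event as a single space-separated key=value line."""
    parts = [f"event={event}"]
    for key in sorted(fields):  # stable key order keeps lines easy to diff
        value = fields[key]
        if isinstance(value, float):
            value = "nan" if math.isnan(value) else f"{value:.4f}"
        elif value is None:
            value = "none"
        parts.append(f"{key}={value}")
    return " ".join(parts)


def emit(event: str, **fields: object) -> str:
    """Log the event through logging (never print) and return the line."""
    line = format_event(event, **fields)
    logger.info(line)
    return line


# Field values are made up, for illustration only.
line = emit(
    "s14_q1_config_verdict",
    verdict="CONFIG_OK_LGB",
    best_iteration=120,
    auprc_lift=1.25,
)
```

Values containing spaces would need quoting or escaping before such a line can be parsed reliably; the sketch does not handle that.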
8. ADR conformance
| ADR | How |
|---|---|
| ADR-0095 | diagnostic-story template (artefact 2/5) |
| ADR-90 | training HPs in PG/Console only — why slice B is in-pod (the §0bis pivot's root cause) |
| ADR-25 | no silent fallback; all error paths → INCONCLUSIVE_TOOLING(reason) |
| ADR-31 | event=key=value logs, no print |
| ADR-23 | FE/feature provenance carried (FeatureVersion) |
| ADR-92 | DAG build SHA surfaced (when the in-pod DAG lands) |
Files
| File | State |
|---|---|
| `commun/finetune/diagnostic/s14_lgb_output_validity.py` | slice A: implemented, 12/12 tests green |
| `commun/finetune/diagnostic/s14_q1_fit_inpod.py` | slice B: implemented, `--smoke` no-crash; real fit in-pod |
| `tests/unit/finetune/diagnostic/test_s14_lgb_output_validity.py` | implemented (12 tests) |
| `dags/dag_diagnostic__s14_q1.py` | TODO: the in-pod Q1 DAG wrapping `run_inpod` (operator-triggered) |
Open items
- In-pod Q1 DAG (`dag_diagnostic__s14_q1`) wrapping `run_inpod`: operator-triggered, ADR-92 build SHA.
- Regime-matched trusted fold selection: the operator-arbitrated config/draws + the fold windows.
- Q2 module (`_decide_q2`, post-S09): the r2 read-only fork on the cleared replay.