Architecture (r1 — for review)

How it's built. Architecture artefact (ADR-0095) for S14. Realizes the pre-registered decision rule of the plan r4 (plan_review PASSED, Meeting 272) — it does not re-decide it. Hamilton-native, pure decision core, I/O isolated. Two slices (per the plan re-scope): A = committee-invariant decision core (pure, tested), B = the in-pod controlled fit that feeds it.

Changelog — r1 (2026-06-10): initial architecture, traces plan r4 + the implemented slices A/B (s14_lgb_output_validity.py, s14_q1_fit_inpod.py, test_s14_lgb_output_validity.py).

§0 Provenance & the §0bis pivot

Realizes plan r4. The plan was re-scoped after a verified §0bis: prod LightGBM HPO draws log metrics={}, artifacts=[], and only best_n_estimators (the searched ceiling), with no best_iteration, no val-metric, and no per-candle predictions. A read-only rank/calibration verdict is thus only obtainable from the s43 replay (A6, S09-gated), so the architecture splits:

  • Q1 — config-validity (PRE-S09, replay-independent): a small controlled instrumented fit on a regime-matched trusted fold via the canonical-producing FE path (the path that makes the A6 canonical — the reference, not the victim; _load_captured_parquet count = 0 in the normal trainer).
  • Q2 — fold-3 rank/calibration (POST-S09, gated): the read-only fork on the cleared s43 replay, run only if Q1 = CONFIG_OK_LGB.

1. Two-slice decomposition

| Slice | File | Role | Purity |
|---|---|---|---|
| A: decision core | commun/finetune/diagnostic/s14_lgb_output_validity.py | pure probes + decide_q1 (total over the truth table) + run_q1 (injects a fit_fn) | pure: zero data-prep coupling; committee-invariant; fully unit-tested |
| B: controlled fit | commun/finetune/diagnostic/s14_q1_fit_inpod.py | builds the regime-matched Datasets (FE path) + the fit_fn (harness train_with_fixed_params → best_iteration + holdout p_buy) + run_inpod | side-effecting (cache, harness fit); operator-triggered in-pod |

Slice A depends on nothing but the injected fit_fn contract → it is testable in full isolation (12/12 green). Slice B is the only part coupled to data/cluster, and is necessarily in-pod (ADR-90 gates the LGB training HPs to ftf_config.base_env, injected by the FTF DAG).
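The injection boundary described above can be sketched as follows. This is a minimal illustration, not the code in s14_lgb_output_validity.py: `run_q1_sketch`, `Verdict`, the `decide` parameter, and the stub functions are hypothetical names, and the real run_q1 signature may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Verdict:
    """Hypothetical stand-in for the verdict artefact."""
    verdict: str
    reason: Optional[str] = None


def run_q1_sketch(
    fit_fn: Callable[[Any, dict], Any],
    datasets: Any,
    prod_best_params: dict,
    decide: Callable[[Any], Verdict],
) -> Verdict:
    """Call the injected fit_fn, then hand its result to the pure decision."""
    try:
        fit_result = fit_fn(datasets, prod_best_params)
    except Exception as exc:
        # A failed fit never propagates: it becomes a structured verdict.
        return Verdict("INCONCLUSIVE_TOOLING", reason=f"fit:{type(exc).__name__}")
    return decide(fit_result)


# Stubs that stand in for slice B, so slice A is testable without data or cluster.
def failing_fit(datasets: Any, params: dict) -> Any:
    raise RuntimeError("cluster unavailable")


def stub_fit(datasets: Any, params: dict) -> Any:
    return {"best_iteration": 120}


def stub_decide(fit_result: Any) -> Verdict:
    return Verdict("CONFIG_OK_LGB")
```

Because the fit is passed in as a plain callable, a unit test can swap in `stub_fit` or `failing_fit` and exercise every path of the orchestrator without touching the cache or the harness.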

2. Architectural style — Hamilton-native, pure decision

The decision (decide_q1) is the auditable core — the frozen rule + the truth table (plan §1 Fig 1). It is a pure function over already-collected results (FitResult), so the exhaustiveness test (every truth-table cell → exactly one verdict) is trivially checkable and immune to I/O flakiness. The fit (slice B) runs out-of-graph and feeds the decision via the FitResult contract.

3. Component view

flowchart TD
  subgraph B["Slice B (in-pod, operator-triggered): side-effecting"]
    FE["build_regime_matched_datasets:<br/>get_feature_store -> generate_labels_standalone (ATR0.5_1.5_H4)<br/>-> purged+embargoed split -> Datasets"]
    FIT["make_harness_fit_fn -> fit_fn:<br/>train_with_fixed_params(prod best_params)<br/>-> best_iteration + predict_proba(test)[BUY]<br/>per-draw over prod draws; es_metric ablation"]
  end
  subgraph A["Slice A (pure): committee-invariant decision core"]
    PB["probes: base_rate, auprc_lift, ece_equal_mass, _ci_over_draws"]
    DEC["decide_q1 (truth table -> exactly one verdict)"]
  end
  FE --> FIT
  FIT -->|FitResult| RUN[run_q1]
  RUN --> PB --> DEC
  DEC -->|Q1Verdict_| EMIT["emit s14_q1_config_verdict (key=value) + (post-S09) s14_lgb_output_verdict"]

4. Decision core (slice A, pure, total)

  • base_rate(y) — positive prevalence; every metric reported as lift over it.
  • auprc_lift(y, p) = AUPRC / no-skill(=base_rate) → tail rank (1.0 = chance).
  • precision_at_top_rate(y, p, rate) — tail precision (scale-invariant).
  • ece_equal_mass(y, p): ECE over equal-mass (quantile) bins, so the high-p tail is populated; equal-width bins would be bulk-dominated and blind there.
  • _ci_over_draws(...) — selection distribution over draws: median + max + 95 % CI (decide on the distribution, never recency).
  • decide_q1(fr, s43_base_rate, s43_label, ablation) — frozen order: tooling/labels/leakage → regime guard (label + base_rate ±20 %) → structural early-stopping (best_iteration ≤ 3 while ceiling ≫) → capacity (lift-CI below AUPRC_LIFT_FLOOR) → CONFIG_OK_LGB; a lift-CI that straddles the boundary → INCONCLUSIVE_POWER. _localise_early_stopping uses the multi-metric ablation to name the cause. No I/O, no raise, no print (ADR-25/31).
  • run_q1(fit_fn, datasets, prod_best_params, …): orchestrator; a failed fit_fn → INCONCLUSIVE_TOOLING(reason=fit:…).
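The probes listed above can be sketched in dependency-free Python. This is an illustration under stated assumptions, not the module's code: AUPRC is computed as step-wise average precision (ties broken by input order), the 95 % CI is taken as a percentile interval over the per-draw values (the document does not say how the CI is computed), and `average_precision`, `ci_over_draws`, and `n_bins` are hypothetical names.

```python
from statistics import median
from typing import Dict, Sequence

NAN = float("nan")


def base_rate(y: Sequence[int]) -> float:
    """Positive prevalence; NaN (not an exception) on empty input."""
    return sum(y) / len(y) if len(y) else NAN


def average_precision(y: Sequence[int], p: Sequence[float]) -> float:
    """Step-wise AUPRC: mean precision@k over the ranks k that hold a positive."""
    order = sorted(range(len(y)), key=lambda i: p[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else NAN


def auprc_lift(y: Sequence[int], p: Sequence[float]) -> float:
    """AUPRC divided by the no-skill baseline (the base rate); 1.0 = chance."""
    br = base_rate(y)
    return average_precision(y, p) / br if br > 0 else NAN


def ece_equal_mass(y: Sequence[int], p: Sequence[float], n_bins: int = 10) -> float:
    """ECE over quantile bins: each bin holds (almost) the same number of samples."""
    n = len(y)
    if n == 0:
        return NAN
    order = sorted(range(n), key=lambda i: p[i])
    n_bins = min(n_bins, n)  # guarantees every bin is non-empty
    ece = 0.0
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        conf = sum(p[i] for i in idx) / len(idx)  # mean predicted probability
        acc = sum(y[i] for i in idx) / len(idx)   # observed positive rate
        ece += len(idx) / n * abs(conf - acc)
    return ece


def ci_over_draws(values: Sequence[float], level: float = 0.95) -> Dict[str, float]:
    """Median, max, and a percentile interval over per-draw values (NaN draws dropped)."""
    vals = sorted(v for v in values if v == v)
    if not vals:
        return {"median": NAN, "max": NAN, "ci_low": NAN, "ci_high": NAN}

    def pct(q: float) -> float:
        # Linear interpolation between the two nearest order statistics.
        pos = q * (len(vals) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(vals) - 1)
        return vals[lo] + (pos - lo) * (vals[hi] - vals[lo])

    tail = (1.0 - level) / 2.0
    return {"median": median(vals), "max": vals[-1],
            "ci_low": pct(tail), "ci_high": pct(1.0 - tail)}
```

Every probe returns NaN on degenerate input instead of raising, which keeps the no-raise property of the decision core intact; the caller is expected to map NaN to a structured verdict.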

5. Interface contracts

FitResult (slice B → slice A): y_holdout, p_buy_holdout, best_iteration, n_estimators_ceiling, per_draw_p_buy/per_draw_y (the draw distribution), label_aligned, fe_fitted_on_train_only, no_look_ahead, label_name. Missing/None → INCONCLUSIVE_TOOLING(reason).
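One way to picture this contract is a frozen dataclass whose fields all default to None, with a pure check that names whatever is missing. The field names come from the list above; the types, the `missing_fields` and `tooling_reason` helpers, and the `missing:` reason format are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class FitResult:
    """Sketch of the slice B -> slice A contract; field types are assumed."""
    y_holdout: Optional[Sequence[int]] = None
    p_buy_holdout: Optional[Sequence[float]] = None
    best_iteration: Optional[int] = None
    n_estimators_ceiling: Optional[int] = None
    per_draw_p_buy: Optional[Sequence[Sequence[float]]] = None
    per_draw_y: Optional[Sequence[Sequence[int]]] = None
    label_aligned: Optional[bool] = None
    fe_fitted_on_train_only: Optional[bool] = None
    no_look_ahead: Optional[bool] = None
    label_name: Optional[str] = None


def missing_fields(fr: FitResult) -> List[str]:
    """Names of contract fields that were left as None."""
    return [f.name for f in fields(fr) if getattr(fr, f.name) is None]


def tooling_reason(fr: FitResult) -> Optional[str]:
    """Reason string for INCONCLUSIVE_TOOLING, or None when the contract is complete."""
    missing = missing_fields(fr)
    return "missing:" + ",".join(missing) if missing else None
```

Checking for None rather than falsiness matters here: `label_aligned=False` is a present (and failing) flag, not a missing one, and the two cases deserve different reasons.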

Q1Verdict_ (artefact + event=s14_q1_config_verdict key=value): verdict ∈ {CONFIG_DEGENERATE_LGB, CONFIG_OK_LGB, INCONCLUSIVE_POWER, INCONCLUSIVE_TOOLING}, cause, reason, base_rate, best_iteration, ceiling, auprc_lift (CI), ece, ablation, note.

Frozen thresholds (plan r4): BEST_ITER_FLOOR=3 (anchored), RANK_MATERIAL=0.10 / ECE_BAD=0.10 (conventional, flagged), BASE_RATE_TOL=0.20 (#4 regime match), AUPRC_LIFT_FLOOR=1.10.
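To make the frozen order concrete, the sketch below applies these thresholds to a simplified input. Several choices are assumptions, not taken from the plan: the regime guard is read as a relative ±20 % tolerance and maps to INCONCLUSIVE_TOOLING, "ceiling ≫" is approximated by a hypothetical `CEILING_FACTOR`, a lift CI entirely below the floor maps to CONFIG_DEGENERATE_LGB with cause `capacity`, and `Q1Inputs` and `decide_q1_sketch` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

BEST_ITER_FLOOR = 3
BASE_RATE_TOL = 0.20
AUPRC_LIFT_FLOOR = 1.10
CEILING_FACTOR = 10  # hypothetical stand-in for "ceiling >> best_iteration"


@dataclass(frozen=True)
class Q1Inputs:
    """Simplified, already-collected inputs (a reduced view of FitResult)."""
    checks_ok: bool  # label_aligned and fe_fitted_on_train_only and no_look_ahead
    label_name: str
    base_rate: float
    best_iteration: int
    n_estimators_ceiling: int
    lift_ci: Tuple[float, float]  # (low, high) of the AUPRC-lift CI over draws


def decide_q1_sketch(x: Q1Inputs, s43_base_rate: float,
                     s43_label: str) -> Tuple[str, Optional[str]]:
    """Return (verdict, cause_or_reason); total and side-effect free."""
    # 1. Tooling / labels / leakage, including NaN guards.
    numbers = (x.base_rate, s43_base_rate, x.lift_ci[0], x.lift_ci[1])
    if not x.checks_ok or not all(math.isfinite(v) for v in numbers):
        return "INCONCLUSIVE_TOOLING", "labels_leakage_or_nan"
    # 2. Regime guard: same label, base rate within the relative tolerance.
    if x.label_name != s43_label:
        return "INCONCLUSIVE_TOOLING", "regime:label"
    if abs(x.base_rate - s43_base_rate) > BASE_RATE_TOL * s43_base_rate:
        return "INCONCLUSIVE_TOOLING", "regime:base_rate"
    # 3. Structural early stopping: stopped almost immediately under a high ceiling.
    if (x.best_iteration <= BEST_ITER_FLOOR
            and x.n_estimators_ceiling > CEILING_FACTOR * BEST_ITER_FLOOR):
        return "CONFIG_DEGENERATE_LGB", "early_stopping"
    # 4. Capacity: compare the lift CI with the floor.
    low, high = x.lift_ci
    if high < AUPRC_LIFT_FLOOR:
        return "CONFIG_DEGENERATE_LGB", "capacity"
    if low < AUPRC_LIFT_FLOOR:  # low < floor <= high: the CI straddles the boundary
        return "INCONCLUSIVE_POWER", "lift_ci_straddles_floor"
    return "CONFIG_OK_LGB", None
```

Each branch returns immediately, so every input falls into exactly one verdict and the order of the checks is the order of precedence.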

6. Integration points (slice B)

  • cache.get_feature_store(crypto, tf) — the normal FE output (canonical-producing; A6-independent).
  • ETL.cvntrade_label.generate_labels_standalone(strategy, tf) — triple-barrier labels on demand (Feast cache carries no label_*).
  • commun.cache.fe_split.compute_embargo_bars — purged embargo gap before the test window (no look-ahead).
  • training.harness.train_with_fixed_params("lightgbm", Datasets, HPOParams) → model with .best_iteration (the real early-stop point prod does not log) + .predict_proba.
  • ADR-90 gate (load-bearing): the harness reads the LGB training HPs from ftf_config.base_env (PG/Console), injected in-pod by the FTF DAG → slice B's real fit is necessarily in-pod.
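The purged embargo gap can be illustrated with plain index arithmetic. This is not the real compute_embargo_bars (its signature is not shown here); `purged_embargo_split`, `n_test`, and `embargo_bars` are hypothetical, and the sketch assumes a chronological split where the embargo is at least as long as the label horizon.

```python
from typing import Tuple


def purged_embargo_split(n_bars: int, n_test: int,
                         embargo_bars: int) -> Tuple[range, range]:
    """Chronological split with a gap of embargo_bars before the test window.

    Bars in the gap are dropped from training, so a label computed on the last
    training bar cannot look into the test window (assuming the label horizon
    is at most embargo_bars).
    """
    test_start = max(0, n_bars - n_test)
    train_end = max(0, test_start - embargo_bars)
    return range(0, train_end), range(test_start, n_bars)
```

For example, with 100 bars, a 20-bar test window, and a 4-bar embargo, training covers bars 0 to 75, bars 76 to 79 are discarded, and the test window covers bars 80 to 99.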

7. Observability & failure architecture

  • No-crash / fail-loud (ADR-25/31): every error path → structured INCONCLUSIVE_TOOLING(reason=…); never a UI raise, never print; NaN/empty guarded. A dataprep failure → INCONCLUSIVE_TOOLING(reason=dataprep:…).
  • Events: s14_q1_config_verdict (Q1) · s14_q1_datasets_built · s14_q1_draw_skipped (a bad draw doesn't sink the distribution) · s14_lgb_output_verdict (Q2, post-S09).
  • Local smoke: s14_q1_fit_inpod --smoke exercises the wiring on synthetic data (no cache/cluster) → proves no-crash; the real fit (best_iteration capture) is the in-pod dry-run.
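The event=key=value convention can be sketched with the standard logging module. The logger name `s14`, the helpers `format_event` and `emit`, and the sorted-key layout are assumptions for illustration; the actual field order and formatting used by the module are not specified here.

```python
import logging
from typing import Any

logger = logging.getLogger("s14")  # hypothetical logger name


def format_event(event: str, **fields: Any) -> str:
    """Render one structured log line: event=<name> key=value ... (keys sorted)."""
    parts = [f"event={event}"]
    for key in sorted(fields):
        value = fields[key]
        parts.append(f"{key}={'' if value is None else value}")
    return " ".join(parts)


def emit(event: str, **fields: Any) -> str:
    """Log the event through logging (never print) and return the rendered line."""
    line = format_event(event, **fields)
    logger.info(line)
    return line
```

A call such as `emit("s14_q1_draw_skipped", draw=7, reason="fit:ValueError")` yields a single greppable line, so a skipped draw is visible in the logs without interrupting the run.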

8. ADR conformance

| ADR | How |
|---|---|
| ADR-0095 | diagnostic-story template (artefact 2/5) |
| ADR-90 | training HPs in PG/Console only; why slice B is in-pod (the §0bis pivot's root cause) |
| ADR-25 | no silent fallback; all error paths → INCONCLUSIVE_TOOLING(reason) |
| ADR-31 | event=key=value logs, no print |
| ADR-23 | FE/feature provenance carried (FeatureVersion) |
| ADR-92 | DAG build SHA surfaced (when the in-pod DAG lands) |

Files

| File | State |
|---|---|
| commun/finetune/diagnostic/s14_lgb_output_validity.py | slice A: implemented, 12/12 tests green |
| commun/finetune/diagnostic/s14_q1_fit_inpod.py | slice B: implemented, --smoke no-crash; real fit in-pod |
| tests/unit/finetune/diagnostic/test_s14_lgb_output_validity.py | implemented (12 tests) |
| dags/dag_diagnostic__s14_q1.py | TODO: the in-pod Q1 DAG wrapping run_inpod (operator-triggered) |

Open items

  1. In-pod Q1 DAG (dag_diagnostic__s14_q1) wrapping run_inpod — operator-triggered, ADR-92 build SHA.
  2. Regime-matched trusted fold selection — the operator-arbitrated config/draws + the fold windows.
  3. Q2 module (_decide_q2, post-S09) — the r2 read-only fork on the cleared replay.