Plan dossier, CVN-N001-EE-S17: Full externalization of training hyperparameters

Date : 2026-05-11
Story : CVN-N001-EE-S17 (to be created post-committee ; OP wp to be allocated)
Author : Dominique (operator) + Claude
Session type : plan_review (implementation plan for the full externalization mandated by ADR-90)
Related ADR : ADR-90 (written today ; this dossier is its companion implementation plan)
Status : awaiting committee verdict

1. Why this Story exists

Two empirical events converged on 2026-05-11 :

The harness migration regression — f1_buy dropped from 0.42 to 0.22 on defi_top5 5m post-PRs #891/#896/#899/#901. Diagnosed root cause : 18 hyperparameter divergences silently introduced. Documented in documentation/reviews/2026-05-11-cvn-n001-ee-s16-harness-baseline-validation-experiment.md.
The validation experiment itself failed — patching defaults in code + helm upgrade (Option Z) was overridden by HPO Optuna's suggest_* calls. Live Loki showed learning_rate=0.016149 even after the patched-defaults image was deployed. The patches were dead code.

Together these motivate ADR-90 : every hyperparameter (defaults + HPO ranges) MUST live in PG ftf_config (Console-editable), no in-code defaults, fail-fast or WARN-fallback only.

This Story implements ADR-90's first PR (the LGB+XGB+CB scope identified by the diagnostic).

2. Scope: what gets externalized

2.1 Defaults (per-model × per-timeframe)

model	timeframes	per-TF default count	total defaults
XGB	1m, 5m, 15m, 30m, 1h	9 (max_depth, learning_rate, n_estimators, subsample, colsample_bytree, min_child_weight, gamma, reg_alpha, reg_lambda)	45
LGB	1m, 5m, 15m, 30m, 1h	9 (num_leaves, max_depth, learning_rate, n_estimators, min_child_samples, subsample, colsample_bytree, reg_alpha, reg_lambda)	45
CB	1m, 5m, 15m, 30m, 1h	4 (depth, learning_rate, iterations, l2_leaf_reg)	20
Subtotal defaults			110

2.2 HPO ranges (per-model × per-timeframe)

Each HPO-suggestable param has 3 keys : _RANGE_MIN, _RANGE_MAX, _RANGE_SCALE (linear or log).

model	timeframes	HPO range count per TF	total HPO range keys
XGB	1m, 5m, 15m, 30m, 1h	9 params × 3 keys = 27	135
LGB	1m, 5m, 15m, 30m, 1h	9 params × 3 keys = 27	135
CB	1m, 5m, 15m, 30m, 1h	4 params × 3 keys = 12	60
Subtotal HPO ranges			330

2.3 Cross-cutting hyperparams (model-agnostic, TF-agnostic)

Already in Console (verified 2026-05-11) :
- CVN_EARLY_STOPPING_ROUNDS : 1 key (kept as-is, just reading-side fix)
- CVN_HPO_OBJECTIVE : 1 key
- CVN_THRESHOLD_METHOD : 1 key

No new keys here ; the resolver helper just centralizes the existing reads.

2.4 TOTAL : ~440 env vars to seed in Console

This is a cost the operator has explicitly accepted (per 2026-05-11 statement "this will make its UI heavier, I know").

3. Naming convention (per ADR-90 Clause 1)

CVN_HPO_<MODEL>_<TF>_<PARAM>                  # default
CVN_HPO_<MODEL>_<TF>_<PARAM>_RANGE_MIN        # HPO suggest min
CVN_HPO_<MODEL>_<TF>_<PARAM>_RANGE_MAX        # HPO suggest max
CVN_HPO_<MODEL>_<TF>_<PARAM>_RANGE_SCALE      # "linear" or "log"

MODEL ∈ {XGB, LGB, CB} (uppercase, no separator)
TF ∈ {1M, 5M, 15M, 30M, 1H} (uppercase, no separator)
PARAM : uppercase, underscores (e.g. LEARNING_RATE, MAX_DEPTH, REG_LAMBDA)

Examples :
- CVN_HPO_XGB_5M_LEARNING_RATE = "0.07"
- CVN_HPO_XGB_5M_LEARNING_RATE_RANGE_MIN = "0.05"
- CVN_HPO_XGB_5M_LEARNING_RATE_RANGE_MAX = "0.15"
- CVN_HPO_XGB_5M_LEARNING_RATE_RANGE_SCALE = "linear"
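As a cross-check of the totals in §2, the naming convention can be enumerated mechanically. The sketch below assumes the parameter lists from the §2.1 table and that every listed parameter is HPO-suggestable ; `PARAMS`, `TIMEFRAMES` and `enumerate_keys` are illustrative names, not part of the planned module.

```python
from itertools import product

# Parameter lists copied from the §2.1 table.
PARAMS = {
    "XGB": ["max_depth", "learning_rate", "n_estimators", "subsample",
            "colsample_bytree", "min_child_weight", "gamma", "reg_alpha", "reg_lambda"],
    "LGB": ["num_leaves", "max_depth", "learning_rate", "n_estimators",
            "min_child_samples", "subsample", "colsample_bytree", "reg_alpha", "reg_lambda"],
    "CB": ["depth", "learning_rate", "iterations", "l2_leaf_reg"],
}
TIMEFRAMES = ["1M", "5M", "15M", "30M", "1H"]
RANGE_SUFFIXES = ["_RANGE_MIN", "_RANGE_MAX", "_RANGE_SCALE"]


def enumerate_keys() -> tuple[list[str], list[str]]:
    """Return (default_keys, range_keys) following CVN_HPO_<MODEL>_<TF>_<PARAM>."""
    defaults, ranges = [], []
    for model, params in PARAMS.items():
        for tf, param in product(TIMEFRAMES, params):
            base = f"CVN_HPO_{model}_{tf}_{param.upper()}"
            defaults.append(base)
            ranges.extend(base + suffix for suffix in RANGE_SUFFIXES)
    return defaults, ranges


default_keys, range_keys = enumerate_keys()
print(len(default_keys), len(range_keys), len(default_keys) + len(range_keys))
# prints: 110 330 440
```

The same enumeration could feed both the seeding script and a coverage audit, so that the key list is defined in one place.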

Type enforcement : the resolver parses to int, float, or str based on the param name (LEARNING_RATE → float, MAX_DEPTH → int, etc.). Naming convention table is the source of truth, kept in commun/finetune/hyperparams.py.

4. Implementation deliverables

4.1 Helper module `commun/finetune/hyperparams.py`

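# Pseudo-code, finalized in implementation
import os
from typing import Any

# log_event, _parse and _expected_type are module-level helpers, not shown here.


def resolve(model_type: str, timeframe: str, param_name: str, fallback: Any | None = None) -> Any:
    """ADR-90 canonical resolver.

    Returns the Console value for the key, parsed to the type expected for
    param_name. If the key is absent, returns `fallback` and logs a WARN, or
    raises RuntimeError when no fallback is given (fail-fast).
    """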
    key = f"CVN_HPO_{model_type.upper()}_{timeframe.upper()}_{param_name.upper()}"
    raw = os.environ.get(key)
    if raw is not None:
        return _parse(raw, _expected_type(param_name))
    if fallback is None:
        raise RuntimeError(
            f"HP {key} not in Console — set via ftf_config.base_env per ADR-90. "
            f"See documentation/adr/0090-training-hyperparameters-in-pg-console-only.md"
        )
    log_event(
        level="WARN",
        event="hpo_fallback_applied",
        model=model_type, timeframe=timeframe, param=param_name,
        fallback=fallback, key=key, reason="env_key_missing",
    )
    return fallback


def resolve_hpo_range(model_type: str, timeframe: str, param_name: str) -> tuple[Any, Any, str]:
    """Returns (min, max, scale). All three keys must be present or RuntimeError."""
    ...
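`resolve_hpo_range` is left as `...` in the pseudo-code. One possible body is sketched here under stated assumptions : the optional `env` argument (added so the function can be exercised without touching the process environment), the `_INT_PARAMS` set, and the sanity checks on scale and bounds are not specified by ADR-90 or this dossier.

```python
import os
from typing import Mapping, Optional, Union

Number = Union[int, float]
# Assumed set of integer-valued parameters, for illustration only.
_INT_PARAMS = {"MAX_DEPTH", "N_ESTIMATORS", "NUM_LEAVES",
               "MIN_CHILD_SAMPLES", "DEPTH", "ITERATIONS"}


def resolve_hpo_range(model_type: str, timeframe: str, param_name: str,
                      env: Optional[Mapping[str, str]] = None) -> tuple[Number, Number, str]:
    """Return (min, max, scale); raise RuntimeError unless all three keys are usable."""
    env = os.environ if env is None else env
    base = f"CVN_HPO_{model_type.upper()}_{timeframe.upper()}_{param_name.upper()}"
    keys = [f"{base}_RANGE_{suffix}" for suffix in ("MIN", "MAX", "SCALE")]
    missing = [k for k in keys if k not in env]
    if missing:
        raise RuntimeError(f"HPO range incomplete, missing in Console: {', '.join(missing)}")

    cast = int if param_name.upper() in _INT_PARAMS else float
    low, high = cast(env[keys[0]]), cast(env[keys[1]])
    scale = env[keys[2]].strip().lower()
    if scale not in ("linear", "log"):
        raise RuntimeError(f"{keys[2]} must be 'linear' or 'log', got {scale!r}")
    if low > high:
        raise RuntimeError(f"{base}: RANGE_MIN {low} exceeds RANGE_MAX {high}")
    if scale == "log" and low <= 0:
        raise RuntimeError(f"{base}: log scale requires RANGE_MIN > 0, got {low}")
    return low, high, scale


# Usage with the example values from §3.
example_env = {
    "CVN_HPO_XGB_5M_LEARNING_RATE_RANGE_MIN": "0.05",
    "CVN_HPO_XGB_5M_LEARNING_RATE_RANGE_MAX": "0.15",
    "CVN_HPO_XGB_5M_LEARNING_RATE_RANGE_SCALE": "linear",
}
print(resolve_hpo_range("XGB", "5m", "LEARNING_RATE", example_env))
```

Validating the bounds at resolve time surfaces a bad Console edit before Optuna starts a study, instead of as a sampler error mid-run.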

4.2 Patch `lightgbm_dag.py` / `xgboost_dag.py` / `catboost_dag.py`

Each *_native_params function and each _hpo_space function reads via resolve() / resolve_hpo_range() instead of hpo_params.get(..., default) / trial.suggest_*(min, max).

Example XGB (illustrative, full diff in PR) :

# BEFORE (ADR-90 violation)
def xgb_native_params(hpo_params, xgb_binary):
    return {
        "max_depth": hpo_params.get("max_depth", 6),  # ← in-code default
        "learning_rate": hpo_params.get("learning_rate", 0.1),  # ← in-code default
        # ... remaining params elided
    }

def _hpo_space(trial):
    return {
        "max_depth": trial.suggest_int("max_depth", 5, 12),  # ← in-code range
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),  # ← in-code range
        # ... remaining params elided
    }

# AFTER (ADR-90 compliant)
def xgb_native_params(hpo_params, xgb_binary, timeframe):
    def _hp(name, fallback):
        # Consult the Console only when HPO did not supply the value, so a tuned
        # trial never triggers a spurious hpo_fallback_applied WARN.
        if name in hpo_params:
            return hpo_params[name]
        return resolve("XGB", timeframe, name.upper(), fallback=fallback)

    return {
        "max_depth": _hp("max_depth", 6),
        "learning_rate": _hp("learning_rate", 0.1),
        # ... remaining params elided
    }

def _hpo_space(trial, timeframe):
    mn, mx, _ = resolve_hpo_range("XGB", timeframe, "MAX_DEPTH")
    max_depth = trial.suggest_int("max_depth", mn, mx)
    mn, mx, sc = resolve_hpo_range("XGB", timeframe, "LEARNING_RATE")
    lr = trial.suggest_float("learning_rate", mn, mx, log=(sc == "log"))
    # ... remaining params elided
    return {"max_depth": max_depth, "learning_rate": lr}

The timeframe parameter is threaded down from the FTF runner via the existing Datasets dataclass or HPOParams (decision made during implementation).
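Once ranges come from the Console, `_hpo_space` can reduce to a loop over parameter names. The sketch below is one way to do that, under assumptions : `build_hpo_space`, `range_resolver`, `INT_PARAMS` and `_fake_ranges` are hypothetical names, and `_StubTrial` is a stand-in that mimics the `suggest_int` / `suggest_float` calls of an Optuna trial so the example runs without Optuna installed. The range values are illustrative, not the seeded Console values.

```python
from typing import Callable

INT_PARAMS = {"max_depth", "n_estimators"}  # assumed subset, for illustration


def build_hpo_space(trial, timeframe: str, params: list[str],
                    range_resolver: Callable[[str, str, str], tuple]) -> dict:
    """Ask the trial for one value per param, using ranges from the resolver only."""
    space = {}
    for name in params:
        low, high, scale = range_resolver("XGB", timeframe, name.upper())
        use_log = scale == "log"
        if name in INT_PARAMS:
            space[name] = trial.suggest_int(name, int(low), int(high), log=use_log)
        else:
            space[name] = trial.suggest_float(name, float(low), float(high), log=use_log)
    return space


class _StubTrial:
    """Stand-in for an Optuna trial: records calls and returns the lower bound."""

    def __init__(self):
        self.calls = []

    def suggest_int(self, name, low, high, log=False):
        self.calls.append((name, low, high, log))
        return low

    def suggest_float(self, name, low, high, log=False):
        self.calls.append((name, low, high, log))
        return low


def _fake_ranges(model, timeframe, param):
    # Illustrative ranges only.
    return {"MAX_DEPTH": (5, 12, "linear"),
            "LEARNING_RATE": (0.05, 0.15, "linear")}[param]


trial = _StubTrial()
space = build_hpo_space(trial, "5m", ["max_depth", "learning_rate"], _fake_ranges)
print(space)
```

Passing the resolver in as an argument also makes the "no in-code numeric literals reach trial.suggest_*" property straightforward to assert in a unit test.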

4.3 Console seeding script `scripts/seed_hyperparams_console.py`

Idempotent CLI :

python scripts/seed_hyperparams_console.py \
  --console-host console.cvntrade.eu \
  --dry-run        # prints the 440 keys + values without writing

python scripts/seed_hyperparams_console.py \
  --console-host console.cvntrade.eu \
  --apply          # writes the keys ; existing keys with same value = no-op ;
                   # existing keys with different value = SKIP + WARN log line
                   # to allow operator to inspect before overwriting

Source values are bundled in the script as Python dicts, derived from :
- XGB defaults + HPO ranges : legacy cvntrade_XGBoost_config.py::GRID_DEFAULT_HP[<TF>] (per-TF defaults) and XGBoostHyperConfig._apply_timeframe_specific_params (per-TF HPO ranges)
- LGB defaults : legacy cvntrade_LightGBM_config.py::GRID_DEFAULT_HP_LGB (TF-agnostic, replicated × 5 TFs)
- CB defaults : legacy cvntrade_CatBoost_config.py::CatBoostConfig dataclass
- LGB + CB HPO ranges : the existing harness _hpo_space ranges, kept verbatim because no legacy HPO ranges exist for LGB/CB (pre-harness they used GridSearch, not Optuna)

Per committee reco #2 (§10), the script logs `[TODO_PER_TF]` next to each LGB/CB key during --dry-run and --apply, flagging the replicated TF-agnostic values for later per-TF tuning.
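The write / no-op / skip semantics of `--apply` can be expressed as a pure planning step that `--dry-run` prints and `--apply` executes. This is a sketch only : `plan_seed` is a hypothetical helper, and the keys and values in the usage example are made up for illustration.

```python
from typing import Mapping


def plan_seed(existing: Mapping[str, str], desired: Mapping[str, str]) -> dict[str, list[str]]:
    """Classify each desired key: write if absent, noop if equal, skip on conflict."""
    plan = {"write": [], "noop": [], "skip_conflict": []}
    for key in sorted(desired):
        if key not in existing:
            plan["write"].append(key)
        elif existing[key] == desired[key]:
            plan["noop"].append(key)
        else:
            plan["skip_conflict"].append(key)  # never overwrite an operator edit
    return plan


# Illustrative Console state and seed values.
existing = {"CVN_HPO_XGB_5M_LEARNING_RATE": "0.07", "CVN_HPO_XGB_5M_MAX_DEPTH": "8"}
desired = {"CVN_HPO_XGB_5M_LEARNING_RATE": "0.07",
           "CVN_HPO_XGB_5M_MAX_DEPTH": "6",
           "CVN_HPO_XGB_5M_GAMMA": "0.0"}

plan = plan_seed(existing, desired)
for key in plan["skip_conflict"]:
    print(f"WARN skip {key}: Console has {existing[key]!r}, seed wants {desired[key]!r}")

# Applying the plan, then planning again, yields no further writes (idempotence).
after_apply = {**existing, **{k: desired[k] for k in plan["write"]}}
second_plan = plan_seed(after_apply, desired)
```

Separating planning from writing means the dry-run output is exactly what the apply step would do.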

4.4 CI grep gate `Story workflow guardrails (G5)`

New step in .github/workflows/ci.yml :

- name: Story workflow guardrails (G5) — ADR-90 hyperparams in code
  if: always()
  shell: bash
  run: |
    set -e
    VIOLATIONS=$(grep -rn -E '(learning_rate|max_depth|reg_alpha|reg_lambda|subsample|colsample|min_child|n_estimators|num_leaves|gamma|l2_leaf_reg|depth|iterations|early_stopping_rounds)[^=]*=\s*[0-9]' \
      src/training/ src/commun/finetune/ \
      --include='*.py' \
      --exclude-dir=__pycache__ \
      | grep -v 'hyperparams\.py' \
      || true)
    if [ -n "$VIOLATIONS" ]; then
      echo "::error title=Guardrail G5 — ADR-90 violation::Hyperparameter literals found in source. ADR-90 mandates Console-only via commun/finetune/hyperparams.py::resolve()"
      echo "$VIOLATIONS"
      exit 1
    fi
    echo "::notice title=Guardrail G5::No ADR-90 violations found"

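The step above is shown in its final fail-the-build form. It initially ships in warn-only mode (logs but does not fail) to catch any in-flight PRs ; per committee reco #3 (§10), the warn-only window is capped at 3 days post-merge, and the flip to fail-the-build mode is part of S17 closure rather than a separate PR.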
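Before the gate is flipped to fail-the-build, it may help to probe the pattern against representative lines. The sketch below runs the same regular expression through Python's `re` module, assuming it behaves like `grep -E` for this pattern (both are case-sensitive and treat `\s` as whitespace). The sample lines are invented for the check.

```python
import re

# Same pattern as the G5 grep step, split across lines for readability.
G5_PATTERN = re.compile(
    r"(learning_rate|max_depth|reg_alpha|reg_lambda|subsample|colsample|min_child"
    r"|n_estimators|num_leaves|gamma|l2_leaf_reg|depth|iterations|early_stopping_rounds)"
    r"[^=]*=\s*[0-9]"
)

# line -> whether the pattern is expected to flag it
SAMPLES = {
    "learning_rate = 0.1": True,
    "params = dict(max_depth=6)": True,
    # A comment is flagged too: the pattern does not look for a leading '#'.
    "# learning_rate=0.05 was the old value": True,
    # A resolver call with a numeric fallback is flagged via the dict key name.
    '"max_depth": resolve("XGB", tf, "MAX_DEPTH", fallback=6),': True,
    # Positional literals contain no '=' and are not flagged.
    '"max_depth": hpo_params.get("max_depth", 6),': False,
    'trial.suggest_int("max_depth", 5, 12)': False,
    # A comparison is not flagged: the character after the first '=' is not a digit.
    "if max_depth == 6:": False,
}

for line, expected in SAMPLES.items():
    flagged = bool(G5_PATTERN.search(line))
    print(f"{'FLAG' if flagged else 'ok  '}  {line}")
```

Under these assumptions the pattern flags comments and resolver fallbacks but misses literals passed positionally, which is the kind of finding the warn-only window is meant to surface.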

4.5 Parity tests (mandatory per ADR-25 + ADR-58)

tests/unit/training_harness/test_hyperparams_resolver_parity.py :
- Verify resolve("XGB", "5m", "LEARNING_RATE") returns 0.07 when CVN_HPO_XGB_5M_LEARNING_RATE=0.07 is set (matches legacy)
- Verify the resolver raises RuntimeError with the canonical message when the key is missing AND no fallback is given
- Verify the resolver emits an event=hpo_fallback_applied WARN log when the fallback is used
- Verify the seeding script's bundled values match the legacy values byte-for-byte (compare against git show e75418ca^:src/training/... ; e75418ca is the deletion commit)

tests/unit/training_harness/test_hpo_space_uses_resolver.py :
- Mock resolve_hpo_range and verify _hpo_space calls it for each param (no in-code numeric literals reach trial.suggest_*)

4.6 Grafana panel `cvntrade-hp-coverage`

Loki query :

sum by () (count_over_time({namespace="cvntrade"} |~ "event=hpo_fallback_applied" [7d]))
/ sum by () (count_over_time({namespace="cvntrade"} |~ "event=training_started" [7d]))

Dashboard rule : WARN at 1% (some HPs fallback-resolved), CRIT at 5% (Console seeding broken). The CRIT threshold was 30% in v1 and was tightened per committee reco #1 (§10).
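The dashboard rule amounts to a ratio and two cut-offs. The sketch below uses the v2 thresholds from committee reco #1 (WARN at 1%, CRIT at 5%) as defaults ; `fallback_status` and the "NO_DATA" state are illustrative assumptions, not part of the Grafana configuration.

```python
def fallback_status(fallback_events: int, training_runs: int,
                    warn_at: float = 0.01, crit_at: float = 0.05) -> tuple[float, str]:
    """Return (ratio, status) for fallback events per training start over a window."""
    if training_runs <= 0:
        return 0.0, "NO_DATA"  # avoid dividing by zero when nothing trained
    ratio = fallback_events / training_runs
    if ratio >= crit_at:
        status = "CRIT"
    elif ratio >= warn_at:
        status = "WARN"
    else:
        status = "OK"
    return ratio, status


print(fallback_status(0, 100))
print(fallback_status(3, 100))
print(fallback_status(10, 100))
```

Note that the query counts fallback events, not trainings that used a fallback : a single training that misses several keys emits several events, so the ratio can exceed 1.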

4.7 mlops_readiness file

Per ADR-70, this Story touches src/training/ + src/commun/finetune/ → mlops_readiness mandatory. File location : documentation/stories/CVN-N001-EE-S17/mlops_readiness.md. Content adapts the template :
- §1 monitoring : event=hpo_fallback_applied Loki query as the "% Console coverage" health metric
- §2 alerting : CRIT alert when fallback rate > 5% (v2 threshold per committee reco #1 ; was 30% in v1)
- §3 drift : N/A (no model drift surface ; this is a code refactor)
- §4 rollout : staged in 2 PRs (S17 = LGB+XGB+CB ; S18 = threshold/calibration/etc.) ; W1 canary on defi_top5 5m
- §5 rollback : revert helm tag = revert resolver = back to pre-S17 state ; the Console seeding script is idempotent, so seeded keys are harmless
- §6 DRI : @dococeven, sunset 2026-08-11

5. Risk analysis

Risk	Severity	Mitigation
Console seeding script applies wrong legacy values (e.g. for LGB which had no per-TF defaults, the script replicates 1 value × 5 TFs ; what if 1m needs a different value?)	Medium	Script is idempotent and SKIP-on-conflict ; operator reviews dry-run output before --apply ; parity tests gate the seeded values
Threading `timeframe` through the harness DAGs requires touching every node signature	Medium	`timeframe` is already in `Datasets` dataclass via `Datasets.feature_version` / similar metadata ; minimal additional wiring
440 keys overload `ftf_config.base_env` JSONB column (size limit)	Low	PG JSONB has multi-MB capacity ; 440 short string values ≪ 1 MB
The Console UI doesn't paginate well at 440+ keys → operator can't find what they need	High UX	Accepted explicitly by operator on 2026-05-11 ; future "Console UX" sprint addresses
The CI grep gate G5 has false positives (e.g. matches a comment that mentions `learning_rate=0.05`)	Low	The pattern matches a parameter name followed by `=` and a digit anywhere on the line, so a `#` comment containing `learning_rate=0.05` does match ; the warn-only window is used to find such false positives before the gate fails the build.
Legacy values that we re-seed are themselves out of date (e.g., the pre-harness baseline f1=0.42 may have been reachable but not optimal)	Medium	The seeding restores the pre-regression baseline ; subsequent FTF sweeps are then free to find better values via HPO with the corrected ranges
Parity test compares against `git show e75418ca^` but the legacy file may have been edited intermediate to that commit	Low	Cross-check with git log on the legacy files ; pick the LAST commit before deletion as the reference snapshot
Operator forgets to run the seeding script post-deploy → resolver fail-fasts → ALL trainings crash	High	Seeding is part of the Story closure ritual (ADR-79 §5) ; CI deploy-time check : `kubectl exec -- python -c 'from commun.finetune.hyperparams import audit_console_coverage; audit_console_coverage()'` ; refuses to mark deploy `green` if coverage < 100%
The 7-day fallback audit gate (Clause 4 of ADR-90) is missed and fallback path becomes permanent	Medium	OP Story comment template includes the 7-day check as a TODO ; calendar reminder ; Grafana panel CRIT threshold flips at d+7
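The deploy-time check in the risk table calls `audit_console_coverage()`, whose behaviour is not specified in this dossier. The sketch below is one plausible reading : it compares an expected key list against the environment and refuses anything below 100%. The two arguments are an assumption added so the function can run in isolation ; the planned helper is called without arguments.

```python
import os
from typing import Iterable, Mapping, Optional


def audit_console_coverage(expected_keys: Iterable[str],
                           env: Optional[Mapping[str, str]] = None) -> float:
    """Return 1.0 when every expected key is set and non-empty; raise otherwise."""
    env = os.environ if env is None else env
    expected = sorted(set(expected_keys))
    if not expected:
        raise ValueError("expected_keys is empty: nothing to audit")
    # An empty or whitespace-only value counts as missing.
    missing = [k for k in expected if not env.get(k, "").strip()]
    coverage = 1.0 - len(missing) / len(expected)
    if missing:
        preview = ", ".join(missing[:5])
        raise RuntimeError(
            f"Console coverage {coverage:.1%} < 100%: "
            f"{len(missing)} key(s) missing (first: {preview})"
        )
    return coverage


# Illustrative usage with two keys.
keys = ["CVN_HPO_XGB_5M_LEARNING_RATE", "CVN_HPO_XGB_5M_MAX_DEPTH"]
full_env = {"CVN_HPO_XGB_5M_LEARNING_RATE": "0.07", "CVN_HPO_XGB_5M_MAX_DEPTH": "6"}
print(audit_console_coverage(keys, full_env))
```

Raising an exception gives a non-zero exit code under `python -c`, which is what a deploy gate needs in order to refuse a `green` status.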

6. Definition of done

7. Plan B / fail-back

If the externalization PR fails post-merge (e.g., a critical resolver bug crashes every training in prod) :
- Revert the merge commit (single-commit revert)
- Helm upgrade back to pre-S17 SHA
- Console-seeded keys are harmless (the pre-S17 code ignores them and falls back to in-code defaults)
- Open a new investigation Story for the resolver bug ; do NOT block on it because the pre-S17 state, while harness-broken, IS the current production state today

8. Out of scope (deferred to follow-up Stories)

PR-2 / CVN-N001-EE-S18 : Externalize threshold sweep params, calibration choice, regime weighting alpha, focal-loss params (the rest of the training-config surface)
CVN-N001-EE-S19 : Console UX optimization sprint — search, group-by-model, presets ("legacy 5m", "Track 11 v2"), bulk-edit
CVN-N0XX : Per-PTE / per-crypto hyperparam dimensions IF empirical evidence justifies (TBD ; current ADR-90 explicitly rejects this as out-of-scope)
CVN-N0XX : Auto-tuned HPO ranges based on data drift detection (long-term FTF protocol evolution)

10. Committee plan_review v1: addressed (2026-05-11)

Session ea2e71ff — verdict PASSED / OK / strong consensus (5 experts, all in favour). Reason : "The implementation plan is sound, comprehensively addressing critical hyperparameter divergence issues and establishing a robust, observable, and mechanically enforced framework."

Recommendations addressed in v2

#	Reco	Status
1	Adjust Grafana CRIT threshold from 30% to 5-10% (more aggressive — 30% means major config issue)	✅ Updated §4.6 : WARN at 1%, CRIT at 5% (was 30%)
2	Refine LGB/CB seeding : explicitly flag the replicated TF-agnostic values as `TODO: differentiate by TF` in the script output	✅ Updated §4.3 : seeding script logs `[TODO_PER_TF]` next to each LGB/CB key during `--dry-run` and `--apply` ; documented in seeding-output template
3	Expedite CI grep gate : warn-only window must be days (not full sprint)	✅ Updated §4.4 : warn-only window capped at 3 days post-merge ; flip to fail-the-build is part of S17 closure (not a separate Story)
4	Explicitly define config injection mechanism : PG → env vars in K8s pods (full lifecycle)	✅ Added §11 below
5	Detail AuthN/AuthZ for Console UI + ftf_config access	✅ Added §12 below
6	Prioritize Console UX sprint (CVN-N001-EE-S19) to mitigate 440-key UX friction	✅ Acknowledged ; S19 priority bumped (operator to schedule in next sprint planning)

Dissents: operator decisions

TF-agnostic vs TF-aware granularity : 2 experts argued for hybrid (TF-agnostic for invariants like random_state). Operator decision : stick with TF-aware naming everywhere in PR-1 to keep the resolver mechanically simple ; invariant values like random_state=42 use CVN_HPO_<MODEL>_<TF>_RANDOM_STATE=42 replicated × 5 TFs. Documented in §3 already. Acceptable redundancy ; future ADR amendment may collapse if needed.
Strictness of fail-fast during migration : 3 experts wanted gentler migration (fallback acceptable for 1 sprint), 2 wanted strict fail-fast from PR-1. Operator decision : fail-fast with explicit fallback parameter from PR-1 (matches ADR-90 Clause 2). The fallback path is what the WARN log catches ; aggressive Grafana CRIT (reco #1) means missing keys surface fast.

11. Config injection mechanism (committee reco #4)

The full lifecycle from ftf_config.base_env to the harness Python code at runtime :

ftf_config (PG JSONB column, id=1)
    │
    │ (read at DAG parse time per ADR-65, via commun.finetune.dag_config.load_finetune_pte_defaults)
    ↓
finetune__pte DAG task `validate_params`
    │
    │ ──────────────►  os.environ injection : `os.environ[key] = value`
    │                  for every k,v in BASE_ENV
    │                  (one-shot before the K8sPodOperator spawns the worker pod)
    ↓
worker pod (Airflow KubernetesExecutor task) inherits the full os.environ
    │
    │ ──────────────►  commun.finetune.hyperparams.resolve()
    │                  reads os.environ[CVN_HPO_<MODEL>_<TF>_<PARAM>]
    ↓
LGB / XGB / CB DAG node uses the resolved value in lgb.train / xgb.train / model.fit

Critical assumption : the K8sPodOperator inherits the scheduler pod's os.environ. Verified in production : the existing CVN_EARLY_STOPPING_ROUNDS=150 Console value reaches the training pod (confirmed via event=training_started Loki sampling pre-2026-05-09 ; broken after the harness migration because the code stopped reading the env var, not because the env var stopped being injected).

Fail-fast on env-not-injected : the resolver's RuntimeError doubles as a guard against env-injection failure. If the Console value is set but os.environ.get(key) returns None, the resolver fails immediately with the canonical message ; the operator inspects kubectl exec ... env | grep CVN_HPO_ to confirm the injection path.

Lifecycle of a value change :
1. Operator edits value in Console UI
2. Console writes to ftf_config.base_env (PG)
3. Operator re-triggers the FTF DAG via Airflow UI (manual, ADR-22)
4. validate_params task reads ftf_config.base_env, injects into os.environ
5. K8sPodOperator spawns worker pod with the env
6. Training task reads via resolve() and uses the new value
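Step 4 of the lifecycle can be sketched as follows. One detail worth making explicit : JSONB may hold numbers, while environment variables must be strings, so values are stringified on injection. `inject_base_env` is a hypothetical name, and a plain dict stands in for `os.environ` so the example does not modify the real process environment.

```python
from typing import Mapping, MutableMapping


def inject_base_env(base_env: Mapping[str, object],
                    environ: MutableMapping[str, str]) -> list[str]:
    """Copy every base_env entry into environ as a string; return the injected keys."""
    injected = []
    for key, value in base_env.items():
        environ[key] = value if isinstance(value, str) else str(value)
        injected.append(key)
    return injected


# Illustrative base_env content; in production this would come from ftf_config.
base_env = {
    "CVN_EARLY_STOPPING_ROUNDS": 150,          # stored as a JSON number
    "CVN_HPO_XGB_5M_LEARNING_RATE": "0.07",    # stored as a JSON string
}
fake_environ: dict[str, str] = {}
injected_keys = inject_base_env(base_env, fake_environ)
print(fake_environ)
```

With the real `os.environ` as the target, a non-string value would raise a TypeError on assignment, which is why the conversion happens before the write.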

Wall time from edit to first training-with-new-value : <30 seconds (Console save + 1 click DAG trigger + scheduler queues task). Compare to pre-S17 : 10 min CI build + 5 min helm upgrade + 1 click DAG trigger = ~15 min. 30× speedup is the headline operator benefit.

12. AuthN/AuthZ for Console UI + ftf_config access (committee reco #5)

Current state (pre-S17) :
- Console UI : single-user, single-secret (operator-only). No role-based access. Authentication via shared cvntrade-env-secrets-bound HTTP basic auth.
- ftf_config direct PG access : same credentials as the Airflow scheduler (read+write via mlflow PG user). Anyone with kubectl exec rights into the scheduler pod can psql -c "UPDATE ftf_config ...".

S17 changes : NONE in scope. The Console UI access model is unchanged ; the PG access model is unchanged. The 440 new keys inherit the existing security posture, which is :
- Console is operator-only (single-user). No SSO. No audit log of who edited what.
- PG ftf_config.updated_by column captures the writer's identity (manual writes set it ; the script seed_hyperparams_console.py will set updated_by='seed_script@<git_sha>' for traceability).
- kubectl RBAC governs who can shell into pods to bypass the Console.

Future ADR-91 candidate : multi-user Console with role separation (read-only auditors, write-allowed operators), session-scoped audit log of every key change, MFA. NOT in S17. Documented as a known gap.

13. Open questions for the committee (v1, ANSWERED in §10)

The original §9 open questions are answered by the committee verdict (§10) and this revision. They are re-listed for reference and no longer require a response :

Is the timeframe-aware naming convention (CVN_HPO_<MODEL>_<TF>_<PARAM>) the right granularity, OR should we ALSO support timeframe-agnostic keys (for params truly invariant across TFs, e.g. random_state=42) to reduce Console clutter?
Should the resolver fail-fast RuntimeError block the WHOLE FTF DAG (current design) OR fall back to the harness in-code values silently for the first sprint then flip to fail-fast (gentler migration)? ADR-90 currently mandates fail-fast ; this is the operator's explicit decision ; flag if disagreement.
The 440-key seeding script bundles legacy values. For LGB and CB which had NO per-TF differentiation in legacy, we replicate 1 value × 5 TFs. Is this the right default, or should the seeding script flag "TODO : differentiate by TF" for LGB/CB so the operator can tune later?
The CI grep gate G5 is initially warn-only. Is the 1-sprint warn-only window acceptable, or should it ship in fails-the-build mode from PR-1?
The Grafana panel CRIT threshold for fallback rate is set to 30%. Is this appropriate, or too lenient/strict?