# ADR-0067: Pluggable Feature-Selection Framework (variance / covariance / FI / SHAP × {global, perfold})
Status: active
Date: 2026-04-25
Introduced by: CVN-N001-ED extension / issue #684
Supersedes: amends ADR-64 (preflight is no longer the unique site of FI compute)
## Context
Two structural defects surfaced in the 2026-04-24/25 feature_importance FTF run series:

- Single-shot, fold-0-only FI cache. The legacy `FiReferenceStep` (ADR-64) trains XGBoost on fold 0, persists the scores to a single global cache, then reuses that ranking for every training fold. In production the operator retrains on the latest 9 months and recomputes feature importance at every release, so the FTF was measuring something different from production behaviour. Operator quote (translated from French): "it is not possible that `variance_100` is better than an FI." It IS possible if today's FI is artificially stale.
- Hardcoded ranking method. Adding a new selector (SHAP, covariance, mutual information, ...) required editing `cvntrade_autonomous_fe.py` (an `if`/`elif`/`else` block) plus the preflight step. There was no extension surface.
Two adjacent failure modes amplified the issues:

- The FE pipeline's `if method == 'fi'` branch raised an opaque `RuntimeError("FI cache has N non-zero features ... need K")`, so the dashboard could not distinguish "K too high for this dataset" from "real training crash".
- The reference XGBoost model used `depth=6, rounds=200`, producing only ~50 features with non-zero importance per crypto. Variants requesting K > 50 always failed, regardless of the data.
## Decision
Introduce a pluggable feature-selection framework under `commun/finetune/feature_selection/`, organised as a Hamilton DAG registry per ADR-61.
### Components
```text
commun/finetune/feature_selection/
├── dags/
│   ├── shared.py          ← x_train, y_train, sample_weights, reference_model
│   ├── variance_dag.py    ← variance_scores
│   ├── covariance_dag.py  ← covariance_scores
│   ├── fi_dag.py          ← fi_scores (consumes shared.reference_model)
│   └── shap_dag.py        ← shap_scores (consumes shared.reference_model)
├── registry.py            ← REGISTRY: name → (dag_module, output_node, supports_scopes)
├── dispatch.py            ← get_selection() entry point + cache + viz
├── top_k.py               ← select_top_k(scores, K, policy={fail, truncate})
└── errors.py              ← KAboveFloorError, SelectorScopeError, UnknownSelectorError, error_payload
```
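
Below is a minimal sketch of what `registry.py` could hold. It assumes one frozen dataclass per entry, with module paths stored as strings so that importing the registry pulls in nothing heavy.

- `SelectorSpec`, `lookup`, `check_scope` and `validate_registry` are hypothetical names.
- The two error classes are local stand-ins. The real ones live in `errors.py` with signatures this ADR does not spell out.
- Scope ordering follows the Selectors table. So `global` becomes the default for `fi` and `shap` only under that assumption.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

VALID_SCOPES = ("inline", "global", "perfold")
_DAGS = "commun.finetune.feature_selection.dags"


class UnknownSelectorError(KeyError):
    """Hypothetical stand-in for errors.UnknownSelectorError."""


class SelectorScopeError(ValueError):
    """Hypothetical stand-in for errors.SelectorScopeError."""


@dataclass(frozen=True)
class SelectorSpec:
    dag_module: str                   # dotted path of the Hamilton module
    output_node: str                  # the single `<name>_scores` node
    supports_scopes: Tuple[str, ...]  # first entry is the default scope (assumed)


REGISTRY: Dict[str, SelectorSpec] = {
    "variance": SelectorSpec(f"{_DAGS}.variance_dag", "variance_scores", ("inline",)),
    "covariance": SelectorSpec(f"{_DAGS}.covariance_dag", "covariance_scores", ("inline",)),
    "fi": SelectorSpec(f"{_DAGS}.fi_dag", "fi_scores", ("global", "perfold")),
    "shap": SelectorSpec(f"{_DAGS}.shap_dag", "shap_scores", ("global", "perfold")),
}


def validate_registry(registry: Dict[str, SelectorSpec]) -> None:
    """Registry hygiene: naming convention and declared scopes."""
    for name, spec in registry.items():
        if spec.output_node != f"{name}_scores":
            raise ValueError(f"{name}: output node must be '{name}_scores'")
        if not spec.supports_scopes or not set(spec.supports_scopes) <= set(VALID_SCOPES):
            raise ValueError(f"{name}: invalid scopes {spec.supports_scopes}")


def lookup(method: str) -> SelectorSpec:
    try:
        return REGISTRY[method]
    except KeyError:
        raise UnknownSelectorError(method) from None


def default_scope_for_method(method: str) -> str:
    # The single place a default scope is chosen for a selector.
    return lookup(method).supports_scopes[0]


def check_scope(method: str, scope: str) -> None:
    spec = lookup(method)
    if scope not in spec.supports_scopes:
        raise SelectorScopeError(f"{method!r} supports {spec.supports_scopes}, not {scope!r}")


validate_registry(REGISTRY)  # fail fast at import time
```

Under this layout, registering a new selector touches only the `REGISTRY` dict. The import-time `validate_registry` call catches a misnamed output node before any run starts.
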
### Selectors shipped
| Name | Scope(s) | Compute |
|---|---|---|
| `variance` | inline | `var(X[:, i])` post-stationarisation. Cheap. Target-blind. |
| `covariance` | inline | \|cov(X[:, i], y)\|, the magnitude of the univariate covariance with the target. |
| `fi` | global, perfold | XGBoost gain on the shared `reference_model`. |
| `shap` | global, perfold | mean(\|SHAP value\|) from `shap.TreeExplainer` on the same `reference_model`. ~30 s if `fi` already trained the model in the same execute call. |
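
The two inline selectors are simple enough to sketch in full as Hamilton-style functions. Hamilton wires nodes by matching parameter names (`x_train`, `y_train`) to other nodes or inputs.

The sketch rests on these assumptions:

- `x_train` is an already-stationarised `pd.DataFrame`.
- `y_train` is a numeric `pd.Series` on the same index.
- pandas' default sample estimators (`ddof=1`) are used, which may differ from the production code.

`fi_scores` and `shap_scores` are left out because they depend on the shared XGBoost `reference_model`.

```python
import pandas as pd


def variance_scores(x_train: pd.DataFrame) -> pd.Series:
    """var(X[:, i]) per column. Target-blind; assumes x_train is stationarised."""
    return x_train.var(axis=0).rename("variance_scores")


def covariance_scores(x_train: pd.DataFrame, y_train: pd.Series) -> pd.Series:
    """|cov(X[:, i], y)| per column."""
    y = y_train.astype(float)
    # Series.cov aligns on the index and excludes missing values pairwise.
    cov = x_train.apply(lambda col: col.cov(y), axis=0)
    return cov.abs().rename("covariance_scores")
```

Because covariance is not normalised, a feature's score scales with that feature's units. A correlation-based variant would swap `col.cov(y)` for `col.corr(y)`.
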
### Scopes
- `inline`: computed in the FE pipeline at training time (no preflight cache).
- `global`: computed once in preflight, with one cache per `(symbol, strategy, timeframe)`. Cheap to read across folds, but stale (= legacy semantics).
- `perfold`: recomputed at the start of each fold's training, on that fold's training window. Mirrors the production retrain cadence; one cache per `(symbol, strategy, timeframe, fold_id)`.
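
Here is a sketch of how the three scopes could map onto the cache layout given under Invariants (`cache/feature_selection/<method>/<scope>/<SYMBOL>_<strategy>_<tf>[_fold<N>].json`).

- `cache_path` is a hypothetical helper.
- Ignoring a supplied `fold_id` for `global` is an assumption, consistent with that scope being shared across folds.
- K never enters the path, which is why `fi_50_perfold` and `fi_150_perfold` share one file.

```python
from pathlib import Path
from typing import Optional

CACHE_ROOT = Path("cache/feature_selection")
SCOPES = ("inline", "global", "perfold")


def cache_path(method: str, scope: str, symbol: str, strategy: str, tf: str,
               fold_id: Optional[int] = None, root: Path = CACHE_ROOT) -> Path:
    """Hypothetical helper: one JSON file per (method, scope, symbol, strategy, tf[, fold])."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope {scope!r}")
    if scope == "perfold" and fold_id is None:
        raise ValueError("perfold scope requires fold_id")
    stem = f"{symbol}_{strategy}_{tf}"
    # global is shared across folds, so a fold id never enters its key;
    # inline includes the fold when one is supplied, like perfold.
    if scope != "global" and fold_id is not None:
        stem += f"_fold{fold_id}"
    return root / method / scope / f"{stem}.json"
```

For example, with made-up symbol, strategy and timeframe values, `cache_path("fi", "perfold", "BTCUSDT", "trend", "1h", fold_id=3)` points at `cache/feature_selection/fi/perfold/BTCUSDT_trend_1h_fold3.json`.
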
### Console keys
```text
CVN_PREFLIGHT_ENABLED                     master kill-switch (existing)
CVN_GLOBAL_PREFLIGHT                      gate the global compute
CVN_PERFOLD_PREFLIGHT                     gate the per-fold compute
CVN_FI_REFERENCE_DEPTH                    XGBoost depth for fi/shap (default 10, was 6)
CVN_FI_REFERENCE_ROUNDS                   XGBoost rounds (default 500, was 200)
CVN_FEATURE_SELECTION_METHOD              selector name (registry key)
CVN_FEATURE_SELECTION_SCOPE               inline | global | perfold
CVN_FEATURE_SELECTION_K_OVERSHOOT_POLICY  fail | truncate (default fail, ADR-25)
CVN_FOLD_ID                               injected by regime_trainer for perfold scope
```
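
The Console keys are plain environment strings. The Invariants below bound the two reference-model knobs (`CVN_FI_REFERENCE_DEPTH` ∈ [1, 30], `CVN_FI_REFERENCE_ROUNDS` ∈ [10, 5000]) and require a `ValueError` at parse time.

What follows is a minimal parsing sketch. It assumes the values arrive through `os.environ` or an equivalent mapping. The function names and the returned dict keys are hypothetical.

```python
import os
from typing import Mapping, Optional


def _bounded_int(env: Mapping[str, str], key: str, default: int, lo: int, hi: int) -> int:
    raw = env.get(key)
    if raw is None or not raw.strip():
        return default
    try:
        value = int(raw.strip())  # rejects "6.5", "1e3", "deep"
    except ValueError:
        raise ValueError(f"{key}={raw!r} is not an integer") from None
    if not lo <= value <= hi:
        raise ValueError(f"{key}={value} outside [{lo}, {hi}]")
    return value


def reference_model_params(env: Optional[Mapping[str, str]] = None) -> dict:
    """Parse the fi/shap reference-model knobs with the ADR's defaults and bounds."""
    env = os.environ if env is None else env
    return {
        "depth": _bounded_int(env, "CVN_FI_REFERENCE_DEPTH", 10, 1, 30),
        "rounds": _bounded_int(env, "CVN_FI_REFERENCE_ROUNDS", 500, 10, 5000),
    }


def overshoot_policy(env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    policy = env.get("CVN_FEATURE_SELECTION_K_OVERSHOOT_POLICY", "fail").strip() or "fail"
    if policy not in ("fail", "truncate"):
        raise ValueError(f"K-overshoot policy must be 'fail' or 'truncate', got {policy!r}")
    return policy
```

Accepting an explicit mapping keeps the parsing testable without mutating the process environment. The `os.environ` fallback covers normal runs.
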
### Invariants
- Adding a selector is a 2-step change. (1) A new `dags/<name>_dag.py` with a single output function `<name>_scores(...)` returning a `pd.Series`. (2) A new entry in `REGISTRY`. No edits to dispatch, cache layout, FE pipeline, or `regime_trainer`.
- All compute is Hamilton. Per ADR-61, there is no imperative class hierarchy. Selectors are typed pure functions; Hamilton resolves the DAG. The dispatcher orchestrates (cache + Hamilton execute); it does not compute.
- Shared nodes deduplicate. When a single execute call requests multiple outputs that share a node (e.g. `fi_scores` + `shap_scores` both consume `reference_model`), Hamilton materialises the shared node exactly once. Zero double training.
- Anti-leakage by construction. The dispatcher signature requires `(x_train, y_train)` from the caller. The DAG never queries the cache directly. Per-fold scope passes the fold's training window; global scope passes the preflight reference window. No code path can confuse train and test.
- Strict cache-key isolation. Cache files live at `cache/feature_selection/<method>/<scope>/<SYMBOL>_<strategy>_<tf>[_fold<N>].json`. Methods, scopes, folds, and runs cannot collide. Variants on the same fold (`fi_50_perfold` + `fi_150_perfold`) share the cache by design and never double-compute.
- K-overshoot is structured. When `select_top_k(scores, K, policy='fail')` finds K greater than the non-zero count, it raises `KAboveFloorError` carrying `{requested_k, available_features, selector, scope, symbol}`. The persistence layer serialises this to a JSON payload in `finetune_results.error`, so the dashboard distinguishes `code=k_above_floor` from real training crashes. The legacy opaque `'training_failed'` still works during migration via `error_payload()`.
- OTel events at every transition. Each Hamilton node emits `event=feature_selection_node node=<name>` via `commun.observability.otel.emit_event`. The dispatcher emits `feature_selection_cache_hit`, `feature_selection_compute_start`, `feature_selection_compute_end`, and `feature_selection_compute_failed`. It also emits `feature_selection_cache_window_mismatch` when a `global` cache is served to a caller whose training window differs from the one the scores were computed on (sharing intent preserved, audit trail kept). This gives Loki/Grafana introspection without log-grep.
- Per-scope cache key and mismatch behaviour (PR #686 CR pass 2):
    - `global`: key = `(method, sym, strat, tf)`, shared across folds. A window mismatch on cache hit emits a warning event; the scores are still served.
    - `perfold`: key = `(method, sym, strat, tf, fold_id)`; `fold_id` is required. A mismatch on cache hit raises `RuntimeError`. This is a programming bug: the caller passed a different `(X, y)` for the same fold.
    - `inline`: the key includes `fold_id` when supplied (same isolation as `perfold`). Otherwise it falls back to the global-shaped key for ad-hoc one-off compute. Mismatch behaviour is the same as `perfold`.
- Reference-model hyperparameter bounds (PR #686 CR pass 1): `CVN_FI_REFERENCE_DEPTH` ∈ [1, 30], `CVN_FI_REFERENCE_ROUNDS` ∈ [10, 5000]. Out-of-range or non-integer values raise `ValueError` at the edge (Console parse time) instead of crashing inside XGBoost.
- Score alignment preserves NaN (PR #686 CR pass 5). The FE pipeline aligns scores to the actual feature columns via `aligned = scores.reindex(...)`, then `aligned.loc[~aligned.index.isin(scores.index)] = 0.0`.
    - Features absent from the cache get 0 (de-prioritised).
    - Features present in the cache with `NaN` keep `NaN` (selector-explicit "undefined").

    Both are then excluded by `select_top_k`'s `> 0` mask, but the two meanings stay distinct instead of being collapsed via `.fillna(0)`.
- Single canonical scope fallback (PR #686 CR pass 4): `commun.finetune.feature_selection.default_scope_for_method(method)` returns `REGISTRY[method].supports_scopes[0]`. It is the only place in the codebase that picks a default scope for a selector. The FE pipeline calls it directly; the Console save-time validator mirrors it. Adding a selector with a non-standard `supports_scopes` ordering therefore updates the default everywhere automatically.
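
The K-overshoot and NaN-alignment invariants above fit together as in the following sketch. It assumes scores arrive as a `pd.Series` indexed by feature name.

- `align_scores`, the `payload()` method and the exact `select_top_k` signature are illustrative assumptions rather than the package's API.
- `KAboveFloorError` is a local stand-in carrying the fields the ADR lists.

```python
from typing import List, Optional, Sequence

import pandas as pd


class KAboveFloorError(ValueError):
    """Hypothetical stand-in for errors.KAboveFloorError."""

    def __init__(self, requested_k: int, available_features: int, selector: Optional[str] = None,
                 scope: Optional[str] = None, symbol: Optional[str] = None):
        super().__init__(f"requested K={requested_k}, only {available_features} features score > 0")
        self.requested_k = requested_k
        self.available_features = available_features
        self.selector, self.scope, self.symbol = selector, scope, symbol

    def payload(self) -> dict:
        # Payload shape is assumed; only the `code` value is fixed by this ADR.
        return {"code": "k_above_floor", "requested_k": self.requested_k,
                "available_features": self.available_features, "selector": self.selector,
                "scope": self.scope, "symbol": self.symbol}


def align_scores(scores: pd.Series, columns: Sequence[str]) -> pd.Series:
    """Absent from the cache -> 0.0; present in the cache but NaN -> stays NaN."""
    aligned = scores.reindex(columns)
    aligned.loc[~aligned.index.isin(scores.index)] = 0.0
    return aligned


def select_top_k(scores: pd.Series, k: int, policy: str = "fail", *, selector: Optional[str] = None,
                 scope: Optional[str] = None, symbol: Optional[str] = None) -> List[str]:
    if policy not in ("fail", "truncate"):
        raise ValueError(f"unknown policy {policy!r}")
    positive = scores[scores > 0]  # NaN > 0 is False, so NaN and 0.0 both drop out
    if k > len(positive):
        if policy == "fail":
            raise KAboveFloorError(k, len(positive), selector, scope, symbol)
        k = len(positive)  # truncate: opt-in per variant
    # A stable sort keeps the ordering of tied scores reproducible across runs.
    return positive.sort_values(ascending=False, kind="mergesort").head(k).index.tolist()
```

With `policy="fail"` the error carries enough context for the persistence layer to write a structured payload. With `policy="truncate"` the same call quietly returns fewer than K names, which is why ADR-25 keeps `truncate` opt-in.
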
## Alternatives rejected
- OOP class hierarchy with an abstract `FeatureSelector` base. Conflicts with ADR-61 ("batch DAGs use Hamilton, not imperative code") and loses Hamilton's automatic shared-node deduplication. Rejected immediately when the operator caught it before phase 2 of the implementation.
- Per-method preflight steps. Each new selector would be a `PreflightStep` class. This multiplies the preflight surface, offers no clean way to share `reference_model` between FI and SHAP without a global cache, and does not address the per-fold need.
- Padding with zero-importance features when K > non-zero count. Violates ADR-25 (no silent fallback). It pads the model with noise features chosen at random, which would silently skew Sortino comparisons.
- Truncating by default when K > non-zero count. Easier on the operator, but it loses the signal that the FI procedure has hit its capacity ceiling. The ADR-25 default is `fail`; `truncate` is opt-in per variant.
- PCA / ICA / autoencoder selectors plugged into the same framework. These are dimensionality reduction, not feature selection. They output combinations of features (linear for PCA/ICA, non-linear for an autoencoder), not a subset. Forcing them through `select_top_k` would be a semantic lie. Tracked as a separate factor, `dimensionality_reduction`, with its own framework.
## Consequences
### Forcing functions
- New selector = new file + registry entry. Reviewer load minimal.
- Variant naming carries the (method, K, scope) tuple explicitly: `fi_80_perfold`, `shap_30_global`. Dashboards group by prefix without parsing config.
- The structured `KAboveFloorError` payload becomes a first-class column lens in Grafana (filter by `error_code = 'k_above_floor'`), so K-overshoot is visible without grep.
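
Because the name carries the tuple, a dashboard or script can recover it without reading the variant's config. Below is a hypothetical parser, built on two assumptions:

- Method names may themselves contain underscores (e.g. `mutual_info`).
- A missing scope suffix, as in `variance_100` from the Context section, means "use `default_scope_for_method`". The parser returns `None` for the scope in that case.

```python
from typing import Optional, Tuple

SCOPES = ("inline", "global", "perfold")


def parse_variant_name(name: str) -> Tuple[str, int, Optional[str]]:
    """Split '<method>_<K>[_<scope>]' into (method, K, scope); scope is None if omitted."""
    parts = name.split("_")
    scope = parts.pop() if parts[-1] in SCOPES else None
    if len(parts) < 2 or not parts[-1].isdigit():
        raise ValueError(f"variant {name!r} does not match <method>_<K>[_<scope>]")
    k = int(parts.pop())
    if k <= 0:
        raise ValueError(f"variant {name!r} has non-positive K")
    return "_".join(parts), k, scope
```

The parser reads from the right (scope, then K) so that underscores in the method name need no escaping.
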
### Costs
- Per-fold compute adds N folds × ~3 min of XGBoost training per crypto (was 1 × ~3 min for global). With `defi_top5` × 5 folds × 1 `reference_model` per fold (shared by FI and SHAP), that is ~75 min of added wall-clock per FTF run. Acceptable: the cost is offset by the deeper reference model and by per-fold caching, since variants on the same fold share one compute.
- Migration of the legacy `cache/feature_importance/<key>.json` files: the new framework writes to `cache/feature_selection/fi/global/<key>.json`. A one-shot migration script is OUT OF SCOPE for this PR. Operators can either (a) re-run preflight to populate the new layout, or (b) symlink the old paths into the new structure. The legacy `load_fi_reference()` reader stays callable for the existing preflight step.
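
Spelled out, the per-fold estimate in the first Costs bullet multiplies symbols, folds, and the ~3 min reference-model fit. FI and SHAP contribute no extra factor because they share one `reference_model` per fold:

$$
5\ \text{symbols} \times 5\ \text{folds} \times {\approx}3\ \text{min per fit} \approx 75\ \text{min per FTF run}
$$

This compares with $5 \times 1 \times 3 = 15$ min for the single global fit per symbol.
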
### What becomes easier
- Cross-method comparison in the same FTF run. A single ablation factor exercises all 4 methods across their supported scopes (`variance` and `covariance` inline, `fi` and `shap` under both `global` and `perfold`). All of them share the OHLCV data, labels and FE pipeline, with identical Optuna seeds. Statistical claims such as "FI is better than variance" become defensible.
- Production parity. Per-fold scope mirrors the deployment cadence (operator retrains weekly, recomputes FI on the new training window). The FTF measures what production will see.
- Drop-in selectors. `permutation`, `mutual_info`, custom XGB-based variants (different hyperparameters), or third-party rankers (`feature_engine.selection.*`) can ship as PRs of ~80 lines each.
### What becomes harder
- Debugging "why did variant X fail" now requires reading the JSON `error.code` instead of grepping for "training_failed". Mitigated by the dashboard panel that surfaces error codes.
- Schema migration of `finetune_results.error`. The column is `text` today; the framework writes JSON into it. This is backward-compatible (legacy reads still work), but dashboards that string-match `error = 'training_failed'` need updating to `error::json->>'code' = 'training_failed'` (a one-time SQL fix). In PostgreSQL that cast raises an error on any row still holding a plain-text legacy value, so such rows must be filtered out or rewritten as JSON first.
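
On the Python side, a reader that tolerates both shapes during the migration might look like the sketch below. It makes two assumptions:

- New rows hold a JSON object with a `code` key, as the `error::json->>'code'` query implies.
- Legacy rows hold a bare string such as `training_failed`.

`error_code` is a hypothetical helper, not part of the package.

```python
import json
from typing import Optional


def error_code(raw: Optional[str]) -> Optional[str]:
    """Return the error code stored in finetune_results.error, legacy text or JSON."""
    if raw is None:
        return None
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return raw  # legacy plain-text value, e.g. 'training_failed'
    if isinstance(payload, dict) and "code" in payload:
        return str(payload["code"])
    return raw  # valid JSON but not a structured payload: treat as opaque text
```

Grouping on `error_code(row.error)` then yields `k_above_floor` and `training_failed` side by side, whichever format each row was written in.
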
## Ownership
- DRI for the framework: to be assigned in the first follow-up issue. Responsibilities: registry hygiene (no duplicate names, scope declarations match implementation), cache layout enforcement, ADR-67 amendments when new scopes are added.
## References
- Parent: ADR-61 (Hamilton for batch dataflow), ADR-64 (preflight is a first-class phase — amended here)
- Triggering analysis: GitHub issue #684, conversation 2026-04-24/25
- Related: ADR-25 (no silent fallback, preserved by `KAboveFloorError`), ADR-56 (every change A/B testable, achieved by parametrising on (method, scope, K)), ADR-62 (OTel observability: selectors emit structured events at every node), ADR-65 (Console-driven toggles: all framework knobs surfaced in `ftf_config.base_env`)
- Implementation: `commun/finetune/feature_selection/` package