CVN-N001-ED: Feature Importance ablation (Round 2 of 3)
epic_id: CVN-N001-ED
need_id: CVN-N001 (F1 mission, #608)
Status: draft — plan for review
Created: 2026-04-24
Owner: operator
Related: CVN-N001-EC (pte_envelope, #630), n_features variance-based (#640)
Objective
Test whether feature-importance-based top-K selection outperforms the variance-based top-K selection shipped in Round 1 (n_features). Round 2 of the 3-round feature ablation roadmap.
Context: what Round 1 tells us
Latest n_features FTF run (ftf_20260423_164514_a07cce, 2026-04-23) on γ anchor sl0.5_tp1.5 H4:
- Winner: top_100 (Sortino 1.898)
- Full (~300 features) loses to top_100 by Sortino 0.26 → overfit signal
- Curve is ∩-shaped (inverted U): optimum around 100–150 features, dropoff at top_50 and below
- But 0/4 statistical significance across all pairs — the 6 variants form a cluster
Interpretation: variance-based top-K recovers some of the benefit of trimming an overfit feature set, but it cannot separate the variants cleanly. Variance is a blunt proxy: a high-variance feature isn't necessarily informative. FI is the natural next lever because it scores each feature by its contribution to a trained model rather than by its raw spread.
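To make the "blunt proxy" point concrete, here is a minimal sketch on made-up data. The two features and the use of Pearson correlation as a stand-in for informativeness are illustrative assumptions; this is not the gain-based FI proposed in this epic.

```python
from statistics import pvariance


def pearson(xs, ys):
    """Plain Pearson correlation, written out to stay stdlib-only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy data: 8 rows, binary label.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
features = {
    # Large spread, but unrelated to the label.
    "wide_noise": [-50, -50, 50, 50, -50, -50, 50, 50],
    # Tiny spread, but it tracks the label exactly.
    "narrow_signal": [0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.1, 0.2],
}

by_variance = sorted(features, key=lambda f: pvariance(features[f]), reverse=True)
by_signal = sorted(features, key=lambda f: abs(pearson(features[f], labels)), reverse=True)

print("variance ranking:", by_variance)
print("signal ranking:  ", by_signal)
```

A variance-based top-1 would keep `wide_noise` and drop the only feature that carries signal, which is the failure mode an importance-based ranking is meant to avoid.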
The leakage problem
Naïve approach (train model → extract FI → reselect features → retrain on same data) leaks labels into feature selection. Invariant to preserve: the feature set passed to trainer T must be chosen on data disjoint from T's training fold.
Pattern adopted: OOF FI on a reserved reference fold.
Rolling walk-forward (3 folds):
┌──────────────┬──────────────┬──────────────┬──────────────┐
│ FI ref fold  │ Fold 1       │ Fold 2       │ Fold 3       │
│ [train FI]   │ [train final │ [train final │ [train final │
│              │ · FI applied]│ · FI applied]│ · FI applied]│
└──────────────┴──────────────┴──────────────┴──────────────┘
       │              ↑              ↑              ↑
       │              └──────────────┴──────────────┘
       │                  FI cache loaded from here
       └→ FI cache written once, read N times, never re-computed from a fold that trains a final model
Trade-offs:
- Cost: one additional small training run per crypto (reference fold), amortized across all subsequent FI-method runs.
- Representativity: FI is computed on old data. If feature importance drifts over time, the reference FI may be stale. Mitigated by re-running the reference when the training data window shifts materially (e.g. every 3 months).
- Alternative rejected: per-fold FI recomputed within each fold on its own train set. More representative, but it needs separation from final training (e.g. stratified sub-split), which means more code and more risk of accidental leakage. Defer unless FI drift is observed.
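The invariant above (the FI reference data must be disjoint from every final-training fold) can be sketched as a small check. This is a simplified illustration that assumes contiguous, equal-sized, non-overlapping row windows as drawn in the diagram; `Window`, `split_with_reference` and `assert_disjoint` are hypothetical names, and the real FTF fold slicing may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    name: str
    start: int  # inclusive row index
    stop: int   # exclusive row index


def split_with_reference(n_rows: int, n_folds: int = 3):
    """Cut a time-ordered dataset into 1 FI reference window + n_folds windows."""
    size = n_rows // (n_folds + 1)
    if size == 0:
        raise ValueError("not enough rows for the requested number of folds")
    ref = Window("fi_ref", 0, size)
    folds = []
    for i in range(1, n_folds + 1):
        # The last fold absorbs any remainder rows.
        stop = n_rows if i == n_folds else (i + 1) * size
        folds.append(Window(f"fold_{i}", i * size, stop))
    return ref, folds


def assert_disjoint(ref: Window, folds) -> None:
    """Raise if any final-training window overlaps the FI reference window."""
    for fold in folds:
        if fold.start < ref.stop and ref.start < fold.stop:
            raise RuntimeError(f"{fold.name} overlaps {ref.name}: FI would leak")


ref, folds = split_with_reference(n_rows=1000, n_folds=3)
assert_disjoint(ref, folds)  # passes: the reference window ends where fold_1 starts
```

Running the same check at variant launch would turn an accidental overlap into an immediate failure rather than a silent leak.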
Implementation plan
1. Code changes
src/commun/cache/components/cvntrade_autonomous_fe.py (~30 lines)
Extend the top-K cap logic with a method switch:
max_features = int(os.environ.get("CVN_MAX_FEATURES", "0"))
method = os.environ.get("CVN_FEATURE_SELECTION_METHOD", "variance").lower()
# Only cap when a positive K is requested and there are more than K columns
if max_features > 0 and X_train_transformed.shape[1] > max_features:
    if method == "variance":
        scores = X_train_transformed.var()
    elif method == "fi":
        # Bind symbol once so the error message below can reference it
        symbol = os.environ["CVN_CRYPTO_SYMBOL"]
        scores = load_fi_reference(
            symbol=symbol,
            strategy=os.environ["CVN_STRATEGY"],
            timeframe=os.environ["CVN_TIMEFRAME"],
        )
        # Align scores to current feature set; raise if missing (ADR-25)
        scores = scores.reindex(X_train_transformed.columns).dropna()
        if len(scores) < max_features:
            raise RuntimeError(
                f"FI cache has {len(scores)} features, need {max_features}. "
                f"Re-run the FI reference step for {symbol}."
            )
    else:
        raise ValueError(f"Unknown CVN_FEATURE_SELECTION_METHOD={method!r}")
    # Keep the K highest-scoring columns
    top_k_cols = scores.nlargest(max_features).index.tolist()
    ...
src/commun/cache/components/feature_importance.py (~100 lines, NEW)
def compute_fi_reference(symbol, strategy, timeframe, fold_train_data):
    """Train a lightweight reference model on fold 0 train, return FI."""
    # XGBoost with default params: cheap, no HPO
    # Train on the fold 0 train window (no val, no test, no future data)
    # importance_type='gain': measures avg gain per split
    # Persist as JSON + MLflow artifact
    ...

def load_fi_reference(symbol, strategy, timeframe) -> pd.Series:
    """Load FI scores from cache. Fail-fast if absent (ADR-25)."""
    ...
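The cache I/O behind these two stubs could look roughly like the following. This is a stdlib-only sketch under stated assumptions: the JSON layout (a flat feature-name to gain mapping), the `cache_root` argument, and the helper names `fi_cache_path`, `save_fi`, `load_fi` and `top_k` are all hypothetical, and a plain dict stands in for the `pd.Series` the real `load_fi_reference` returns. Only the file naming pattern comes from this plan.

```python
import json
from pathlib import Path


def fi_cache_path(cache_root: Path, symbol: str, strategy: str, timeframe: str) -> Path:
    """Path pattern from this plan: feature_importance/<symbol>_<strategy>_<timeframe>.json."""
    return cache_root / "feature_importance" / f"{symbol}_{strategy}_{timeframe}.json"


def save_fi(cache_root, symbol, strategy, timeframe, scores):
    """Persist a {feature_name: gain} mapping (assumed schema) as JSON."""
    path = fi_cache_path(cache_root, symbol, strategy, timeframe)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(scores, indent=2, sort_keys=True))
    return path


def load_fi(cache_root, symbol, strategy, timeframe):
    """Load cached scores; fail fast instead of falling back to another method."""
    path = fi_cache_path(cache_root, symbol, strategy, timeframe)
    if not path.is_file():
        raise FileNotFoundError(f"FI cache missing: {path}. Run the FI reference step first.")
    return {name: float(value) for name, value in json.loads(path.read_text()).items()}


def top_k(scores, columns, k):
    """Keep the k best-scored columns among those present in both inputs."""
    available = {c: scores[c] for c in columns if c in scores}
    if len(available) < k:
        raise RuntimeError(f"FI cache covers {len(available)} features, need {k}.")
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(available, key=lambda c: (-available[c], c))[:k]
```

Raising on a missing file, rather than returning an empty mapping, is what keeps a missing cache from silently degrading into a different selection method.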
src/commun/finetune/guardrails.py (~15 lines)
def _validate_feature_selection_method(env, ctx):
    method = env.get("CVN_FEATURE_SELECTION_METHOD", "variance").lower()
    if method not in ("variance", "fi"):
        raise VariantGuardrailError(
            f"{ctx}CVN_FEATURE_SELECTION_METHOD={method!r} invalid. "
            "Supported: variance | fi"
        )
    # If fi + non-zero cap requested, the cache must exist (checked at runtime,
    # but warn here if running without a prior FI reference on this symbol)
src/commun/finetune/ablation_matrix.py (~15 lines)
New factor feature_importance:
AblationFactor(
    name="feature_importance",
    factor_type="training",
    category="data",
    description=(
        "Feature count via model-trained importance (round 2 of feature "
        "ablation). Requires a pre-computed FI reference (fail-fast if "
        "absent). See CVN-N001-ED epic."
    ),
    env_vars={
        "variance_100": {
            "CVN_FEATURE_SELECTION_METHOD": "variance",
            "CVN_MAX_FEATURES": "100",
        },
        "fi_30": {"CVN_FEATURE_SELECTION_METHOD": "fi", "CVN_MAX_FEATURES": "30"},
        "fi_50": {"CVN_FEATURE_SELECTION_METHOD": "fi", "CVN_MAX_FEATURES": "50"},
        "fi_100": {"CVN_FEATURE_SELECTION_METHOD": "fi", "CVN_MAX_FEATURES": "100"},
        "fi_150": {"CVN_FEATURE_SELECTION_METHOD": "fi", "CVN_MAX_FEATURES": "150"},
        "fi_200": {"CVN_FEATURE_SELECTION_METHOD": "fi", "CVN_MAX_FEATURES": "200"},
    },
)
Note: variance_100 variant included as anchor so cond1 can compare FI-top-K vs variance-top-K at the same K.
2. FI reference run (one-shot pre-FTF)
New Airflow DAG: launch__feature_importance_reference
- For each crypto in defi_top5:
  - Load fold 0 train data (same slicing as FTF fold 0 train)
  - Train XGBoost default params, binary classification, sample_weight balanced
  - Extract model.feature_importances_ with importance_type='gain'
  - Write to cache/feature_importance/<symbol>_<strategy>_<timeframe>.json
  - Log to MLflow (artifact + tag purpose=fi_reference, fold_id=0)
Runtime estimate: ~5 min per crypto × 5 = 25 min total.
3. FTF run
- Factor: feature_importance
- Variants: 6 (1 variance anchor + 5 FI variants at K=30/50/100/150/200)
- Cryptos: defi_top5
- Folds: 3
- Trials: 50 per variant
- Anchor env: γ sl0.5_tp1.5 H4 (from ftf_config)
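Runtime estimate: 6 variants × 5 cryptos × 3 folds × 50 trials = 4,500 trials. At ~5 s per trial that is ~6.25 h if run serially, so the ~3 h estimate holds only if trials run about 2× in parallel (or average ~2.4 s each).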
4. Analysis
Reuse scripts/analyze_pte_envelope_run.py with a minor adapter: the anchor is currently hardcoded to the sl0.5_tp1.5 PTE variant and needs to be generalized to a factor-aware anchor. Follow-up: replace the hardcoded anchor with a --anchor VARIANT CLI flag.
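A `--anchor VARIANT` flag could be wired in along these lines. This is only a sketch: the actual CLI of the analyzer script is not shown in this plan, so `build_parser`, `resolve_anchor` and the default value are hypothetical and merely mirror the currently hardcoded anchor.

```python
import argparse

# Assumption: default to the anchor that is hardcoded today.
DEFAULT_ANCHOR = "sl0.5_tp1.5"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Analyze an FTF run against an anchor variant")
    parser.add_argument(
        "--anchor",
        metavar="VARIANT",
        default=DEFAULT_ANCHOR,
        help="variant every other variant is compared against",
    )
    return parser


def resolve_anchor(anchor: str, variants: list) -> str:
    """Fail fast when the requested anchor is not part of the run."""
    if anchor not in variants:
        raise ValueError(f"anchor {anchor!r} not in run variants: {sorted(variants)}")
    return anchor


args = build_parser().parse_args(["--anchor", "variance_100"])
anchor = resolve_anchor(args.anchor, ["variance_100", "fi_30", "fi_50", "fi_100"])
```

For this factor the anchor would be `variance_100`, so the FI variants are compared against variance-based selection at a fixed K.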
Success criteria
Lock rule (3 conditions, plan §5 template):
1. fi_XXX winner beats the variance_100 anchor: ≥ 2 of 4 metrics BH p<0.05, d ≥ 0.3 in favor of FI
2. advantage = f1_buy − const_F1 > +0.02 for the winner
3. AAVEUSDC Sortino > -1.0 under the winner
If all 3 met → LOCK CVN_FEATURE_SELECTION_METHOD=fi and its K in ftf_config.
If cond1 fails but winner Sortino ≥ 2.0 → PARTIAL LOCK (keep fi_XXX as γ candidate, re-test after Round 3).
If no variant beats the variance anchor → NOT_LOCK → move to Round 3 (feature groups).
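The lock rule can be read as a small decision function. The sketch below is one possible reading, not the analyzer's implementation: it assumes each metric must satisfy both the BH-adjusted p < 0.05 and d ≥ 0.3 thresholds to count toward cond1, and it treats every combination the rule does not spell out (for example cond1 met but cond2 failing) as NOT_LOCK. `MetricTest` and `lock_decision` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MetricTest:
    p_adj: float     # BH-adjusted p-value, winner vs. the variance_100 anchor
    cohens_d: float  # effect size, positive = in favor of FI


def cond1(tests, min_hits=2, alpha=0.05, min_d=0.3) -> bool:
    """At least min_hits metrics are both significant and large enough."""
    hits = sum(1 for t in tests if t.p_adj < alpha and t.cohens_d >= min_d)
    return hits >= min_hits


def lock_decision(tests, f1_buy, const_f1, aave_sortino, winner_sortino) -> str:
    c1 = cond1(tests)
    c2 = (f1_buy - const_f1) > 0.02   # advantage over the constant baseline
    c3 = aave_sortino > -1.0          # AAVEUSDC under the winner
    if c1 and c2 and c3:
        return "LOCK"
    if not c1 and winner_sortino >= 2.0:
        return "PARTIAL_LOCK"
    # Assumption: anything not covered by the two rules above is NOT_LOCK.
    return "NOT_LOCK"


strong = [MetricTest(0.01, 0.5), MetricTest(0.03, 0.4), MetricTest(0.20, 0.1), MetricTest(0.60, 0.0)]
weak = [MetricTest(0.01, 0.5), MetricTest(0.30, 0.4), MetricTest(0.20, 0.1), MetricTest(0.60, 0.0)]

print(lock_decision(strong, f1_buy=0.55, const_f1=0.50, aave_sortino=-0.4, winner_sortino=2.1))
print(lock_decision(weak, f1_buy=0.55, const_f1=0.50, aave_sortino=-0.4, winner_sortino=2.1))
print(lock_decision(weak, f1_buy=0.55, const_f1=0.50, aave_sortino=-0.4, winner_sortino=1.7))
```

Writing the rule down this way also surfaces the unspecified branches, which are worth settling before the FTF run rather than after.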
Guardrails (ADR-58)
- CVN_FEATURE_SELECTION_METHOD ∈ {variance, fi}
- method=fi requires the FI cache to be present for the symbol; fail-fast at variant launch (ADR-25)
- FI cache age > 90 days → warn (soft), require operator confirm via env CVN_FI_STALE_OK=1
- FI is a guardrailed factor per ADR-58: the PR must include an integration test that method=fi without a cache raises, and a happy-path test with a stubbed cache
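The staleness guardrail might be implemented roughly as follows. This sketch makes two assumptions that the plan does not state: cache age is taken from the file's modification time, and "require operator confirm" means the launch is blocked unless `CVN_FI_STALE_OK=1` is set. `check_fi_cache_age` and `StaleFICacheError` are hypothetical names.

```python
import os
import time
import warnings
from pathlib import Path

SECONDS_PER_DAY = 86_400


class StaleFICacheError(RuntimeError):
    """Raised when a stale FI cache is used without operator confirmation."""


def check_fi_cache_age(path: Path, env=None, now=None, max_age_days: float = 90.0) -> float:
    """Return the cache age in days; warn or raise when it exceeds max_age_days."""
    env = os.environ if env is None else env
    now = time.time() if now is None else now
    if not path.is_file():
        raise FileNotFoundError(f"FI cache missing: {path}")
    # Assumption: the file mtime is a good enough proxy for when FI was computed.
    age_days = (now - path.stat().st_mtime) / SECONDS_PER_DAY
    if age_days > max_age_days:
        if env.get("CVN_FI_STALE_OK") != "1":
            raise StaleFICacheError(
                f"FI cache is {age_days:.0f} days old (> {max_age_days:.0f}). "
                "Re-run the FI reference step or set CVN_FI_STALE_OK=1."
            )
        warnings.warn(f"Using stale FI cache ({age_days:.0f} days old): {path}", stacklevel=2)
    return age_days
```

Passing `env` and `now` explicitly keeps the function easy to cover in the integration tests required above, with no need to patch the clock or the process environment.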
Alternatives rejected
- SHAP values instead of gain — 10× slower, minor gain in robustness. Keep as option if the FTF shows gain importance is noisy.
- Permutation importance — most robust but 50× slower on 300 features. Only worth it if gain FI plateaus and Round 2 doesn't unblock F1.
- Recursive Feature Elimination (RFE) — removes features one at a time, expensive + risk of instability. Rejected.
- Per-fold FI instead of reference FI — more representative but needs in-fold separation (sub-split train) — more code and leakage risk. Deferred to Round 2+ if drift observed.
Out of scope
- FI for non-XGBoost trainers (LightGBM, CatBoost). XGBoost's gain FI is used as the authoritative reference; the same top-K is applied to all 3 trainers downstream. If a trainer performs dramatically worse with XGBoost-selected features, revisit.
- Mid-flight FI refresh — the reference is computed once per symbol per anchor PTE. If the anchor PTE changes (new γ), the FI cache becomes stale → operator must re-run the reference step.
- FI for the full variant: no top-K cap, no selection, no FI needed. full is tested in Round 1 (n_features) as baseline and doesn't need a retest here.
Dependencies
- Round 1 (n_features variance-based): ✅ shipped in #640, analysis complete.
- FI cache computation step: NOT yet implemented; blocks Round 2.
- Guardrail for CVN_FEATURE_SELECTION_METHOD: NOT yet implemented.
Estimated effort
| Step | Effort |
|---|---|
| cvntrade_autonomous_fe.py method switch | 0.25 d |
| feature_importance.py module + FI cache I/O | 0.5 d |
| Guardrails + unit tests | 0.25 d |
| ablation_matrix.py new factor + integration test | 0.25 d |
| FI reference DAG (launch__feature_importance_reference) | 0.5 d |
| Documentation + ADR-64 (new: "FI-based feature selection requires OOF cache") | 0.25 d |
| CR + review cycles | 0.5 d |
| Total dev | ~2.5 days |
| Operator FI-ref run + FTF run + analysis | ~4 h |
Risks
| Risk | Mitigation |
|---|---|
| FI drift between reference fold and training folds | Re-run reference every 3 months; log timestamp + warn if > 90 days |
| Cache missing silently → fallback to variance | Fail-fast via guardrail + ADR-25 |
| FI selection overfits the reference fold | Cross-validate the reference (3-fold inside the reference window) — future hardening |
| XGBoost gain differs systematically from LightGBM / CatBoost's | If observed, switch to permutation importance (model-agnostic) |
References
- Parent need: CVN-N001 (F1 mission, #608)
- Sibling epics: CVN-N001-EC (PTE envelope, #630), round 1 (#640)
- Analyzer output: /tmp/nfeatures/analysis_ftf_20260423_164514_a07cce_ATR0.5_1.5_H4.md (not committed; regenerate with python scripts/analyze_pte_envelope_run.py …)
- Round 3 (feature groups): future epic once Round 2 concludes
- ADR-47 (meta-label on separate fold) — same leakage-prevention philosophy applied here
- ADR-56 (every change gated by CVN_* + FTF factor)
- ADR-58 (every factor has guardrail + integration test)
- ADR-59 (all params in ftf_config, editable via Console)
Stories (retro-registered in OP, 2026-06-09)
This Epic (plan dated 2026-04-24) had never been tracked in OpenProject. It was registered retroactively: Epic wp#261 (GH #1150), parent Need CVN-N001.
| Story | Title | GH · OP | Status |
|---|---|---|---|
| CVN-N001-ED-S01 | FI ablation impl: selection + FiReferenceStep + guardrail | #1151 · wp#262 | Closed (PRs #655/#656/#663/#684/#685 merged) |
Not concluded / not pursued (the programme pivoted to the ML_USELESS freeze), and not created as a Story: the FI FTF run and the LOCK/PARTIAL/NOT_LOCK decision (§3-§5) never produced a verdict. Open follow-up: #706, replace variance with MI (variance is broken after StandardScaler), which remains a standalone issue. The FI run would be reopened through a new Story if feature-selection work resumes.