CVN-N001-ED: Feature Importance ablation (Round 2 of 3)

epic_id: CVN-N001-ED
need_id: CVN-N001 (F1 mission, #608)
Status: draft, plan for review
Created: 2026-04-24
Owner: operator
Related: CVN-N001-EC (pte_envelope, #630), n_features variance-based (#640)

Objective

Test whether feature-importance-based top-K selection outperforms the variance-based top-K selection shipped in Round 1 (n_features). Round 2 of the 3-round feature ablation roadmap.

Context: what Round 1 tells us

Latest n_features FTF run (ftf_20260423_164514_a07cce, 2026-04-23) on γ anchor sl0.5_tp1.5 H4:

Winner: top_100 (Sortino 1.898)
Full (~300 features) loses to top_100 by Sortino 0.26 → overfit signal
Curve is ∩-shaped (inverted U): optimum around 100–150 features, dropoff at top_50 and below
But no pair reaches statistical significance (0/4 across all pairs): the 6 variants form a single cluster

Interpretation: variance-based top-K captures some of the overfit benefit, but can't discriminate cleanly. Variance is a blunt proxy — a high-variance feature isn't necessarily informative. FI is the natural next lever because it measures actual predictive contribution.

The leakage problem

Naïve approach (train model → extract FI → reselect features → retrain on same data) leaks labels into feature selection. Invariant to preserve: the feature set passed to trainer T must be chosen on data disjoint from T's training fold.

Pattern adopted: OOF FI on a reserved reference fold.

Rolling walk-forward (3 folds):
┌──────────────┬──────────────┬──────────────┬──────────────┐
│  FI ref fold │   Fold 1     │   Fold 2     │   Fold 3     │
│  [train FI]  │ [train final │ [train final │ [train final │
│              │ · FI applied]│ · FI applied]│ · FI applied]│
└──────────────┴──────────────┴──────────────┴──────────────┘
   │                  ↑              ↑              ↑
   │                  └──────────────┴──────────────┘
   │                       FI cache loaded from here
   └→ FI cache written once, read N times, never re-computed from a fold that trains a final model
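The layout above can be sketched as a split of row positions into one reference block and three final-training blocks. This is a minimal illustration, assuming equally sized contiguous chronological blocks; the real FTF slicing is not shown in this plan, and `split_reference_and_folds` and `assert_no_leakage` are hypothetical names.

```python
from typing import List, Tuple


def split_reference_and_folds(n_rows: int, n_folds: int = 3) -> Tuple[range, List[range]]:
    """Split row positions chronologically into an FI reference block plus n_folds blocks.

    Assumption: blocks are contiguous and equally sized; the last block absorbs
    any remainder rows.
    """
    if n_rows < n_folds + 1:
        raise ValueError("need at least one row per block")
    block = n_rows // (n_folds + 1)
    bounds = [i * block for i in range(n_folds + 1)] + [n_rows]
    blocks = [range(bounds[i], bounds[i + 1]) for i in range(n_folds + 1)]
    return blocks[0], blocks[1:]


def assert_no_leakage(fi_ref: range, folds: List[range]) -> None:
    """Enforce the invariant: no final-training fold shares rows with the FI reference."""
    ref_rows = set(fi_ref)
    for i, fold in enumerate(folds, start=1):
        overlap = ref_rows.intersection(fold)
        if overlap:
            raise AssertionError(
                f"fold {i} shares {len(overlap)} rows with the FI reference fold"
            )


fi_ref, folds = split_reference_and_folds(n_rows=1000)
assert_no_leakage(fi_ref, folds)
print(fi_ref, folds)
```

Running the disjointness check wherever the folds are built, rather than trusting the slicing code, keeps the invariant explicit if the fold boundaries are later changed.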

Trade-offs:
- Cost: one additional small training run per crypto (reference fold), amortized across all subsequent FI-method runs.
- Representativity: FI is computed on old data. If feature importance drifts over time, the reference FI may be stale. Mitigated by re-running the reference when the training data window shifts materially (e.g. every 3 months).
- Alternative rejected: per-fold FI recomputed within each fold on its own train set. More representative, but it needs separation from final training (e.g. a stratified sub-split), which means more code and more risk of accidental leakage. Defer unless FI drift is observed.

Implementation plan

1. Code changes

src/commun/cache/components/cvntrade_autonomous_fe.py (~30 lines)

Extend the top-K cap logic with a method switch:

max_features = int(os.environ.get("CVN_MAX_FEATURES", "0"))
method = os.environ.get("CVN_FEATURE_SELECTION_METHOD", "variance").lower()

if max_features > 0 and X_train_transformed.shape[1] > max_features:
    if method == "variance":
        scores = X_train_transformed.var()
    elif method == "fi":
        # Read the symbol once so the error message below can reference it
        symbol = os.environ["CVN_CRYPTO_SYMBOL"]
        scores = load_fi_reference(
            symbol=symbol,
            strategy=os.environ["CVN_STRATEGY"],
            timeframe=os.environ["CVN_TIMEFRAME"],
        )
        # Align scores to current feature set; raise if missing (ADR-25)
        scores = scores.reindex(X_train_transformed.columns).dropna()
        if len(scores) < max_features:
            raise RuntimeError(
                f"FI cache has {len(scores)} features, need {max_features}. "
                f"Re-run the FI reference step for {symbol}."
            )
    else:
        raise ValueError(f"Unknown CVN_FEATURE_SELECTION_METHOD={method!r}")
    top_k_cols = scores.nlargest(max_features).index.tolist()
    ...
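The align-then-rank step in the method switch can be illustrated without pandas. This is a sketch only: `select_top_k` is a hypothetical helper, plain dicts stand in for the `pd.Series` of scores, and the feature names are made up.

```python
from typing import Dict, List, Sequence


def select_top_k(columns: Sequence[str], scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring columns among those present in both inputs."""
    # Keep only scores for features that exist in the current feature set,
    # mirroring the reindex(...).dropna() alignment in the snippet above.
    aligned = {c: scores[c] for c in columns if c in scores}
    if len(aligned) < k:
        # Fail fast instead of silently selecting fewer features.
        raise RuntimeError(f"FI cache has {len(aligned)} usable features, need {k}.")
    ranked = sorted(aligned, key=aligned.get, reverse=True)
    return ranked[:k]


# Hypothetical feature names and scores, for illustration only.
columns = ["rsi_14", "atr_20", "vol_z", "ema_gap"]
fi_scores = {"rsi_14": 0.40, "atr_20": 0.10, "vol_z": 0.35, "stale_feature": 0.90}
print(select_top_k(columns, fi_scores, k=2))  # ['rsi_14', 'vol_z']
```

Note that `stale_feature` has the highest score but is ignored because it is absent from the current columns, and `ema_gap` cannot be selected because the cache has no score for it.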

src/commun/cache/components/feature_importance.py (~100 lines, NEW)

def compute_fi_reference(symbol, strategy, timeframe, fold_train_data):
    """Train a lightweight reference model on fold 0 train, return FI."""
    # XGBoost with default params — cheap, no HPO
    # Train on the fold 0 train window (no val, no test, no future data)
    # importance_type='gain' — measures avg gain per split
    # Persist as JSON + MLflow artifact
    ...

def load_fi_reference(symbol, strategy, timeframe) -> pd.Series:
    """Load FI scores from cache. Fail-fast if absent (ADR-25)."""
    ...
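One possible shape for the cache I/O, using the path pattern from the FI reference step (`cache/feature_importance/<symbol>_<strategy>_<timeframe>.json`). Everything here is an assumption for illustration: the function names `save_fi_scores` and `load_fi_scores`, the explicit `cache_root` argument, the JSON payload layout, and the `dict` return type (the planned `load_fi_reference` returns a `pd.Series`).

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def fi_cache_path(cache_root: Path, symbol: str, strategy: str, timeframe: str) -> Path:
    """Build the cache file path for one (symbol, strategy, timeframe) triple."""
    return cache_root / "feature_importance" / f"{symbol}_{strategy}_{timeframe}.json"


def save_fi_scores(cache_root: Path, symbol: str, strategy: str, timeframe: str,
                   scores: Dict[str, float]) -> Path:
    """Persist FI scores as JSON; the payload keys are a hypothetical layout."""
    path = fi_cache_path(cache_root, symbol, strategy, timeframe)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"fold_id": 0, "importance_type": "gain", "scores": scores}
    path.write_text(json.dumps(payload, indent=2))
    return path


def load_fi_scores(cache_root: Path, symbol: str, strategy: str,
                   timeframe: str) -> Dict[str, float]:
    """Load FI scores; raise immediately if the cache file is absent (no fallback)."""
    path = fi_cache_path(cache_root, symbol, strategy, timeframe)
    if not path.is_file():
        raise FileNotFoundError(f"No FI reference at {path}; run the FI reference step first.")
    payload = json.loads(path.read_text())
    return {name: float(value) for name, value in payload["scores"].items()}


# Round trip in a temporary directory; "demo_strategy" is a placeholder value.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    save_fi_scores(root, "AAVEUSDC", "demo_strategy", "H4", {"rsi_14": 0.4, "vol_z": 0.35})
    print(load_fi_scores(root, "AAVEUSDC", "demo_strategy", "H4"))
```

Raising on a missing file, rather than returning an empty result, is what prevents a silent fallback to variance-based selection.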

src/commun/finetune/guardrails.py (~15 lines)

def _validate_feature_selection_method(env, ctx):
    method = env.get("CVN_FEATURE_SELECTION_METHOD", "variance").lower()
    if method not in ("variance", "fi"):
        raise VariantGuardrailError(
            f"{ctx}CVN_FEATURE_SELECTION_METHOD={method!r} invalid. "
            "Supported: variance | fi"
        )
    # If fi + non-zero cap requested, the cache must exist (checked at runtime,
    # but warn here if running without a prior FI reference on this symbol)

src/commun/finetune/ablation_matrix.py (~15 lines)

New factor feature_importance:

AblationFactor(
    name="feature_importance",
    factor_type="training",
    category="data",
    description=(
        "Feature count via model-trained importance (round 2 of feature "
        "ablation). Requires a pre-computed FI reference (fail-fast if "
        "absent). See CVN-N001-ED epic."
    ),
    env_vars={
        "variance_100": {
            "CVN_FEATURE_SELECTION_METHOD": "variance",
            "CVN_MAX_FEATURES": "100",
        },
        "fi_30":  {"CVN_FEATURE_SELECTION_METHOD": "fi", "CVN_MAX_FEATURES": "30"},
        "fi_50":  {"CVN_FEATURE_SELECTION_METHOD": "fi", "CVN_MAX_FEATURES": "50"},
        "fi_100": {"CVN_FEATURE_SELECTION_METHOD": "fi", "CVN_MAX_FEATURES": "100"},
        "fi_150": {"CVN_FEATURE_SELECTION_METHOD": "fi", "CVN_MAX_FEATURES": "150"},
        "fi_200": {"CVN_FEATURE_SELECTION_METHOD": "fi", "CVN_MAX_FEATURES": "200"},
    },
)

Note: variance_100 variant included as anchor so cond1 can compare FI-top-K vs variance-top-K at the same K.

2. FI reference run (one-shot pre-FTF)

New Airflow DAG: launch__feature_importance_reference

For each crypto in defi_top5:
Load fold 0 train data (same slicing as FTF fold 0 train)
Train XGBoost default params, binary classification, sample_weight balanced
Extract model.feature_importances_ with importance_type='gain'
Write to cache/feature_importance/<symbol>_<strategy>_<timeframe>.json
Log to MLflow (artifact + tag purpose=fi_reference, fold_id=0)

Runtime estimate: ~5 min per crypto × 5 = 25 min total.

3. FTF run

Factor: feature_importance
Variants: 6 (1 variance anchor + 5 FI variants at K=30/50/100/150/200)
Cryptos: defi_top5
Folds: 3
Trials: 50 per variant
Anchor env: γ sl0.5_tp1.5 H4 (from ftf_config)

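Runtime estimate: 6 variants × 5 cryptos × 3 folds × 50 trials = 4,500 trials × ~5 s per trial ≈ 6.25 h if run sequentially; a ~3 h wall-clock figure holds only with roughly 2 trials running concurrently.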
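The arithmetic behind the estimate can be checked with a few lines. `ftf_runtime_hours` and its `parallel_workers` parameter are hypothetical; the plan does not state how many trials run concurrently, so the parallel case is shown only as a what-if.

```python
def ftf_runtime_hours(variants: int, cryptos: int, folds: int, trials: int,
                      seconds_per_trial: float, parallel_workers: int = 1) -> float:
    """Wall-clock hours for the full grid, assuming perfectly even parallelism."""
    total_trials = variants * cryptos * folds * trials
    return total_trials * seconds_per_trial / parallel_workers / 3600


# Figures from the FTF run definition above; ~5 s per trial is the plan's estimate.
sequential = ftf_runtime_hours(6, 5, 3, 50, 5.0)
two_wide = ftf_runtime_hours(6, 5, 3, 50, 5.0, parallel_workers=2)
print(f"sequential: {sequential:.2f} h, two workers: {two_wide:.2f} h")
```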

4. Analysis

Reuse scripts/analyze_pte_envelope_run.py with a minor adapter: the current anchor assumption is hardcoded to the sl0.5_tp1.5 PTE variant and needs to be generalized to a factor-aware anchor. Follow-up: replace the hardcoded anchor with a --anchor VARIANT CLI flag.
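The proposed `--anchor VARIANT` flag could look roughly like the sketch below. It does not reflect the actual contents of `scripts/analyze_pte_envelope_run.py`; `parse_args` and `challengers_for` are hypothetical helpers, and only the flag name comes from the plan.

```python
import argparse
from typing import List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the analyzer CLI; only the --anchor flag is sketched here."""
    parser = argparse.ArgumentParser(description="Analyze an FTF run against an anchor variant.")
    parser.add_argument(
        "--anchor",
        required=True,
        metavar="VARIANT",
        help="variant name that every other variant is compared against",
    )
    return parser.parse_args(argv)


def challengers_for(variants: List[str], anchor: str) -> List[str]:
    """Return all variants except the anchor; reject an anchor not present in the run."""
    if anchor not in variants:
        raise ValueError(f"anchor {anchor!r} not among variants: {variants}")
    return [v for v in variants if v != anchor]


# Variant names taken from the feature_importance factor defined earlier.
args = parse_args(["--anchor", "variance_100"])
run_variants = ["variance_100", "fi_30", "fi_50", "fi_100", "fi_150", "fi_200"]
print(challengers_for(run_variants, args.anchor))
```

Validating the anchor against the variants actually present in the run avoids comparing against a name left over from a different factor.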

Success criteria

Lock rule (3 conditions, plan §5 template):
1. fi_XXX winner beats the variance_100 anchor: ≥ 2 of 4 metrics BH p<0.05, d ≥ 0.3 in favor of FI
2. advantage = f1_buy − const_F1 > +0.02 for the winner
3. AAVEUSDC Sortino > -1.0 under the winner

If all 3 met → LOCK CVN_FEATURE_SELECTION_METHOD=fi and its K in ftf_config.
If cond1 fails but winner Sortino ≥ 2.0 → PARTIAL LOCK (keep fi_XXX as γ candidate, re-test after Round 3).
If no variant beats the variance anchor → NOT_LOCK → move to Round 3 (feature groups).
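The lock rule can be written as a small decision function. This sketch makes two assumptions the plan does not spell out: "winner Sortino" is taken to be a single aggregate value for the winning variant, and any combination not covered by the rules (for example cond1 met but cond2 failing) falls through to NOT_LOCK. `WinnerStats` and `lock_decision` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class WinnerStats:
    significant_metrics: int  # metrics (out of 4) with BH p < 0.05 and d >= 0.3 in favor of FI
    advantage: float          # f1_buy - const_F1 for the winner
    aave_sortino: float       # AAVEUSDC Sortino under the winner
    sortino: float            # winner's aggregate Sortino (assumed definition)


def lock_decision(w: WinnerStats) -> str:
    """Map the three lock conditions to LOCK / PARTIAL / NOT_LOCK."""
    cond1 = w.significant_metrics >= 2
    cond2 = w.advantage > 0.02
    cond3 = w.aave_sortino > -1.0
    if cond1 and cond2 and cond3:
        return "LOCK"
    if not cond1 and w.sortino >= 2.0:
        return "PARTIAL"
    # Assumption: every remaining case is treated as NOT_LOCK.
    return "NOT_LOCK"


# Illustrative numbers only, not results from any run.
print(lock_decision(WinnerStats(3, 0.05, 0.2, 2.1)))  # LOCK
print(lock_decision(WinnerStats(1, 0.05, 0.2, 2.1)))  # PARTIAL
print(lock_decision(WinnerStats(1, 0.05, 0.2, 1.5)))  # NOT_LOCK
```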

Guardrails (ADR-58)

CVN_FEATURE_SELECTION_METHOD ∈ {variance, fi}
method=fi requires FI cache present for the symbol — fail-fast at variant launch (ADR-25)
FI cache age > 90 days → warn (soft), require operator confirm via env CVN_FI_STALE_OK=1
FI is a guardrailed factor per ADR-58: PR must include integration test that method=fi without a cache raises, and a happy-path test with a stubbed cache
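The 90-day staleness guardrail could be sketched as follows. The reading assumed here is that a stale cache blocks the run unless `CVN_FI_STALE_OK=1` is set, in which case a warning is emitted and the run proceeds. `check_fi_cache_age` and `StaleFiCacheError` are hypothetical names, and the cache timestamp is assumed to be stored as a timezone-aware datetime.

```python
import warnings
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

MAX_FI_AGE = timedelta(days=90)


class StaleFiCacheError(RuntimeError):
    """Raised when the FI cache is too old and the operator has not confirmed."""


def check_fi_cache_age(computed_at: datetime, env: Mapping[str, str],
                       now: Optional[datetime] = None) -> None:
    """Pass silently if fresh; warn if stale but confirmed; raise otherwise."""
    now = now or datetime.now(timezone.utc)
    age = now - computed_at
    if age <= MAX_FI_AGE:
        return
    if env.get("CVN_FI_STALE_OK") == "1":
        warnings.warn(
            f"FI cache is {age.days} days old; proceeding because CVN_FI_STALE_OK=1."
        )
        return
    raise StaleFiCacheError(
        f"FI cache is {age.days} days old (limit {MAX_FI_AGE.days}). "
        "Re-run the FI reference step or set CVN_FI_STALE_OK=1."
    )


# A cache computed 10 days before the check passes without warning.
reference_now = datetime(2026, 4, 24, tzinfo=timezone.utc)
check_fi_cache_age(reference_now - timedelta(days=10), env={}, now=reference_now)
```

Passing `env` and `now` explicitly keeps the check testable without touching `os.environ` or the system clock.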

Alternatives rejected

SHAP values instead of gain — 10× slower, minor gain in robustness. Keep as option if the FTF shows gain importance is noisy.
Permutation importance — most robust but 50× slower on 300 features. Only worth it if gain FI plateaus and Round 2 doesn't unblock F1.
Recursive Feature Elimination (RFE) — removes features one at a time, expensive + risk of instability. Rejected.
Per-fold FI instead of reference FI — more representative but needs in-fold separation (sub-split train) — more code and leakage risk. Deferred to Round 2+ if drift observed.

Out of scope

FI for non-XGBoost trainers (LightGBM, CatBoost). XGBoost's gain FI is used as the authoritative reference; the same top-K is applied to all 3 trainers downstream. If a trainer performs dramatically worse with XGBoost-selected features, revisit.
Mid-flight FI refresh — the reference is computed once per symbol per anchor PTE. If the anchor PTE changes (new γ), the FI cache becomes stale → operator must re-run the reference step.
FI for full variant — no top-K cap, no selection, no FI needed. full is tested in Round 1 (n_features) as baseline and doesn't need retest here.

Dependencies

Round 1 (n_features variance-based): ✅ shipped in #640, analysis complete.
FI cache computation step: NOT yet implemented — blocks Round 2.
Guardrail for CVN_FEATURE_SELECTION_METHOD: NOT yet implemented.

Estimated effort

Step	Effort
`cvntrade_autonomous_fe.py` method switch	0.25 d
`feature_importance.py` module + FI cache I/O	0.5 d
Guardrails + unit tests	0.25 d
`ablation_matrix.py` new factor + integration test	0.25 d
FI reference DAG (`launch__feature_importance_reference`)	0.5 d
Documentation + ADR-64 (new: "FI-based feature selection requires OOF cache")	0.25 d
CR + review cycles	0.5 d
Total dev	~2.5 days
Operator FI-ref run + FTF run + analysis	~4 h

Risks

Risk	Mitigation
FI drift between reference fold and training folds	Re-run reference every 3 months; log timestamp + warn if > 90 days
Cache missing silently → fallback to variance	Fail-fast via guardrail + ADR-25
FI selection overfits the reference fold	Cross-validate the reference (3-fold inside the reference window) — future hardening
XGBoost gain differs systematically from LightGBM / CatBoost's	If observed, switch to permutation importance (model-agnostic)

References

Parent need: CVN-N001 (F1 mission, #608)
Sibling epics: CVN-N001-EC (PTE envelope, #630), round 1 (#640)
Analyzer output: /tmp/nfeatures/analysis_ftf_20260423_164514_a07cce_ATR0.5_1.5_H4.md (not committed; regenerate with python scripts/analyze_pte_envelope_run.py …)
Round 3 (feature groups): future epic once Round 2 concludes
ADR-47 (meta-label on separate fold) — same leakage-prevention philosophy applied here
ADR-56 (every change gated by CVN_* + FTF factor)
ADR-58 (every factor has guardrail + integration test)
ADR-59 (all params in ftf_config, editable via Console)

Stories (retro-registered in OP, 2026-06-09)

This Epic (plan dated 2026-04-24) had never been tracked in OpenProject. Registered retroactively: Epic wp#261 (GH #1150), parent Need CVN-N001.

Story	Title	GH · OP	Status
CVN-N001-ED-S01	FI ablation impl: selection + FiReferenceStep + guardrail	#1151 · wp#262	Closed (PRs #655/#656/#663/#684/#685 merged)

Not concluded / not pursued (the program pivoted to the ML_USELESS freeze), so not created as a Story: the FI FTF run and the LOCK/PARTIAL/NOT_LOCK decision (§3-§5) never produced a verdict. Open follow-up: #706, replace variance with MI (variance is broken after StandardScaler); it remains a standalone issue. The FI run can be reopened via a new Story if feature-selection work resumes.