Enrichment Indicator Inventory: Phase 0 of #599 refactor

- Status: v1, Phase 0 deliverable; to be ticked off by the implementation team during Phase 2 porting.
- Sources: src/ETL/cvntrade_enrich.py (1378 lines) + src/ETL/post_enrichment/cvntrade_xgboost_feature_generator.py (712 lines).
- Audit date: 2026-04-20.
- Feature count (stable): ≈ 150 columns (depends on timeframe/params).

1. How to read this inventory

Each feature is classified into one of six compute classes (five portable classes plus X for blockers), which determines the incremental-update strategy the refactor will use:

Class	Label	Update rule	Warmup needed
R	Simple recursive	`y_t = α·x_t + (1−α)·y_{t−1}` — keep 1 float per indicator	small, converges geometrically
W	Wilder smoothing (Welles Wilder EMA variant)	same as R, α = 1/N	small, converges geometrically
B	Finite-window ring buffer	keep last N values, O(1) update with deque	exactly N bars
T	Multi-timeframe resample	state = current partial higher-tf bar + sub-indicator state (recursive structure)	`N × (higher_tf / base_tf)` + sub-indicator warmup
S	Stateless per-candle function	no state — function of current bar only (or ≤ few prior bars via shift)	0 to handful of bars
X	Non-incremental (blocker)	requires full past or non-local operator	—

Verdict column:

- ✅ incremental-compatible by closed form (R/W/T)
- 🟡 incremental-compatible via ring buffer (B): memory linear in N, still O(1) update
- 🟢 stateless, trivial (S)
- 🔴 blocker: inherently batch or tricky; flag for scope discussion

Feature names that end in _shifted or _lag_* are pure shifts of other features; they're class S with the caveat that the base feature must be stored in the state.
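
As an illustration of the update rules in the class table, here is a minimal sketch of classes R, W and B. The class names (`RecursiveEma`, `RollingMean`) and the seeding rule (first observation) are assumptions made for this example, not names or conventions taken from the codebase.

```python
from collections import deque
from typing import Optional


class RecursiveEma:
    """Class R: y_t = alpha * x_t + (1 - alpha) * y_{t-1}; state is one float."""

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.value: Optional[float] = None  # None until the first sample seeds it

    def update(self, x: float) -> float:
        if self.value is None:
            self.value = x  # assumption: seed with the first observation
        else:
            self.value = self.alpha * x + (1.0 - self.alpha) * self.value
        return self.value


def wilder(period: int) -> RecursiveEma:
    """Class W: the same recursion with alpha = 1 / N."""
    return RecursiveEma(alpha=1.0 / period)


class RollingMean:
    """Class B: ring buffer of the last N values with an O(1) running sum."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.buf = deque()
        self.total = 0.0

    def update(self, x: float) -> Optional[float]:
        self.buf.append(x)
        self.total += x
        if len(self.buf) > self.window:
            self.total -= self.buf.popleft()
        # Like a pandas rolling(N) window, emit nothing until N bars are available.
        return self.total / self.window if len(self.buf) == self.window else None
```

The running sum keeps the update O(1) but can accumulate floating-point drift over very long streams; recomputing the sum from the buffer periodically is one possible mitigation.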

2. Classification summary (counts)

Class	Count (approx.)	Action
R/W Recursive / Wilder	15	Port first (simplest, closed-form, smallest risk)
B Ring buffer	80+	Port in bulk (same template pattern for rolling ops)
T Multi-timeframe	6	Requires dedicated sub-engine design
S Stateless	40+	Trivial port (shift / diff / pct_change / function of current bar)
X Blocker	1 candidate	⚠️ Flagged: Fear & Greed external merge, see §6

No other blockers identified as of Phase 0. The implementation team should re-audit the xgb_* features at the start of Phase 2.

3. pandas-ta indicators (class R/W/B)

Generated by ta.bbands / ta.rsi / ta.mfi / ta.atr / ta.macd / ta.stoch / ta.adx / ta.sma, configured via self.params. Names include the period as a suffix. For a standard timeframe the columns are:

Column pattern	Class	Formula	WARMUP_BARS	Verdict
`RSI_{period}` (×3: short/medium/long)	W	Wilder RSI `gain_smooth=(1−1/n)·prev + (1/n)·gain`	≈ 5×N for a seed residual under 1% (e^−5 ≈ 0.7%); ≈ 7×N for 1e-3; ≈ 14×N for 1e-6	✅
`MFI_{period}` (×3)	W	Money Flow Index — Wilder smoothing on money-flow ratio	≈ 5×N	✅
`ATRr_{period}` (×3: short/medium/long)	W	Wilder ATR `atr_t=(1−1/n)·atr_{t−1}+(1/n)·tr_t`	≈ 5×N	✅
`BBL/BBM/BBU/BBB/BBP_{period}_{std}` (×3)	B	SMA ± k·std on window	N bars	🟡
`MACD_{f}_{s}_{sig}`, `MACDs_`, `MACDh_` (×3)	R	EMA(fast) − EMA(slow), signal = EMA(diff)	≈ 5×max(s, sig)	✅
`STOCHk_{k}_{d}_{smooth}`, `STOCHd_*` (×3)	B	`100 × (close − low_N) / (high_N − low_N)`, smoothed	≈ k + smooth + d − 2 bars (the three windows are chained)	🟡
`ADX_{period}`, `DMP_{period}`, `DMN_{period}` (×2)	W	Wilder smoothing on directional movement	≈ 5×N	✅
`SMA_{period}` (×2: medium/long)	B	Simple rolling mean	N bars	🟡
`distance_SMA_{period}` (×2)	S	`(close − SMA_N) / SMA_N` — function of SMA	inherits SMA warmup	🟢

Total: ~30 columns. All ✅ or 🟡 — textbook streaming indicators, zero risk.
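
The WARMUP_BARS entries for classes R and W follow from geometric decay: after k updates, the seed still carries a weight of (1 − α)^k. The sketch below lets a reader check an entry against a chosen tolerance. The function names are hypothetical, and the EMA convention α = 2/(N+1) is an assumption about how the MACD EMAs are parameterised.

```python
import math


def residual_weight(alpha: float, bars: int) -> float:
    """Weight still carried by the initial seed after `bars` recursive updates."""
    return (1.0 - alpha) ** bars


def warmup_bars(alpha: float, tolerance: float) -> int:
    """Smallest k such that (1 - alpha) ** k <= tolerance."""
    return math.ceil(math.log(tolerance) / math.log(1.0 - alpha))


if __name__ == "__main__":
    n = 14
    wilder_alpha = 1.0 / n      # class W (RSI, ATR, ADX)
    ema_alpha = 2.0 / (n + 1)   # class R, assumed standard EMA convention
    for label, alpha in (("wilder", wilder_alpha), ("ema", ema_alpha)):
        print(label, residual_weight(alpha, 5 * n), warmup_bars(alpha, 1e-3))
```

For a given N, a Wilder-smoothed indicator converges roughly half as fast as a standard EMA, because its α is about half as large.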

4. Custom rolling / stateless (class B or S)

4.1 Volume-derived

Column	Class	Formula	WARMUP	Verdict
`Volume_Delta`	S	`volume.diff()`	1	🟢
`SMA_Volume_{period}` (×3)	B	rolling mean of volume	N	🟡
`volume_ma_5/10/20`	B	rolling mean	5/10/20	🟡
`volume_ratio_5/10/20`	S	`volume / volume_ma_*`	inherits	🟢
`volume_price_momentum`	S	`volume_ratio_5 × price_change_abs_1h`	inherits	🟢
`volume_weighted_price`	B	rolling sum(close × volume) / rolling sum(volume)	N	🟡
`vwap_deviation`	S	`(close − vwap) / close`	inherits	🟢
`volume_trend_3`	B	rolling `1 if last > first else -1`	3	🟡
`volume_price_efficiency`	S	`volume / (high − low)`	0	🟢
`market_impact_proxy`	S	`|price_change_1h| / (volume_ratio_5 + 0.1)`	inherits	🟢
`volume_momentum`	S	`volume.pct_change(N)`	N	🟢
`volume_autocorr_3h`	B	rolling corr on `volume.shift(lag)`	`win + lag`	🟡
`institutional_buying_pressure`	B	rolling sum of `volume × (close−open)/open if up`	N	🟡
`volume_weighted_up_pressure`	B	rolling sum `volume × max(0, price_change_1h)`	N	🟡
`order_flow_imbalance`	B	`(vol_up_ma − vol_down_ma) / (vol_up_ma + vol_down_ma + 1)`	N	🟡
`volume_acceleration`	B	`mean(rolling ratio)` over window	N	🟡
`volume_regime_change`	S	`(volume_ratio_5 > 1.3 × volume_ratio_5.shift(1))`	1	🟢
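
As an example of the class B pattern in this table, `volume_weighted_price` can be maintained with two running sums over one ring buffer, and `vwap_deviation` then becomes a stateless function of its output. This is a sketch only: `RollingVwap` and `vwap_deviation` are illustrative names, and returning `None` during warmup or on zero volume is an assumption about the desired NaN behaviour.

```python
from collections import deque
from typing import Optional


class RollingVwap:
    """Rolling sum(close * volume) / rolling sum(volume) over the last N bars."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.buf = deque()  # (close * volume, volume) pairs
        self.pv_sum = 0.0
        self.vol_sum = 0.0

    def update(self, close: float, volume: float) -> Optional[float]:
        pv = close * volume
        self.buf.append((pv, volume))
        self.pv_sum += pv
        self.vol_sum += volume
        if len(self.buf) > self.window:
            old_pv, old_vol = self.buf.popleft()
            self.pv_sum -= old_pv
            self.vol_sum -= old_vol
        if len(self.buf) < self.window or self.vol_sum == 0.0:
            return None  # still warming up, or no traded volume in the window
        return self.pv_sum / self.vol_sum


def vwap_deviation(close: float, vwap: Optional[float]) -> Optional[float]:
    """Class S on top of the ring buffer: (close - vwap) / close."""
    if vwap is None or close == 0.0:
        return None
    return (close - vwap) / close
```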

4.2 Price-volatility / range (rolling-std family)

Column	Class	Formula	WARMUP	Verdict
`price_volatility_5/10/20`	B	`rolling_std / rolling_mean` (coefficient of variation)	N	🟡
`price_change_1h/3h/6h`	S	`close.pct_change(N)`	N	🟢
`price_change_abs_1h/3h`	S	`|price_change_*|`	inherits	🟢
`high_low_range_pct`	S	`(high − low) / close`	0	🟢
`high_low_range_ma_ratio_5/10`	B	`range / rolling_mean(range)`	5/10	🟡
`close_to_high_ratio`, `close_to_low_ratio`	S	ratios on current bar	0	🟢
`atr_normalized`	S	`ATRr / close`	inherits	🟢
`atr_expansion`	B	`ATRr / rolling_mean(ATRr, N)`	N	🟡
`ATRr_{period}_std_10`, `_std_30`	B	rolling std of ATR	10/30	🟡
`volatility_regime_high/low`	B	`price_volatility > rolling_quantile(0.8)`	20	🟡 — quantile: check pandas impl is incremental-friendly (skiplist / tdigest optional)
`volatility_breakout`	B	`price_volatility > rolling_mean × 1.5`	10	🟡
`volatility_regime_change`	S	`price_vol > price_vol.shift(1) × 1.5`	1	🟢
`velocity_consistency`	B	rolling std of pct_change	3	🟡
`volatility_regime`	B	`(high − low) / rolling_mean(high−low, N)`	N	🟡
`market_trending_strength`	B	rolling std + diff composition	N	🟡
`intraday_return_volatility`	S	`(high − low) / open`	0	🟢
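
For the rolling-std family with small windows, recomputing from the ring buffer on each candle is usually enough. The sketch below does this for the coefficient of variation behind `price_volatility_*`. The class name is hypothetical, and it assumes the sample standard deviation (ddof = 1, the pandas default for `rolling().std()`).

```python
from collections import deque
from statistics import fmean, stdev
from typing import Optional


class RollingCoefficientOfVariation:
    """rolling_std / rolling_mean over the last N closes, recomputed per bar."""

    def __init__(self, window: int) -> None:
        if window < 2:
            raise ValueError("a sample standard deviation needs at least 2 values")
        self.buf = deque(maxlen=window)  # deque drops the oldest value itself

    def update(self, close: float) -> Optional[float]:
        self.buf.append(close)
        if len(self.buf) < self.buf.maxlen:
            return None  # warmup: exactly N bars
        mean = fmean(self.buf)
        if mean == 0.0:
            return None
        return stdev(self.buf) / mean  # stdev uses ddof = 1
```

The recompute costs O(N) per bar, which is negligible for N ≤ 20; a running sum and sum of squares would make it O(1) at the price of some numerical stability.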

4.3 Momentum / support-resistance

Column	Class	Formula	WARMUP	Verdict
`price_momentum_3`	B	`rolling.apply(lambda)`: first-vs-last in window	3	🟡: rewrite as closed-form `(close − close.shift(N−1)) / close.shift(N−1)` (in a window of N bars the first element is N−1 bars back)
`price_acceleration`	S	`price_change_1h.diff(1)`	2	🟢
`recent_high_5/recent_low_5`	B	rolling max/min	5	🟡
`resistance_breach`, `support_breach`	S	compare close vs shifted recent extreme	6	🟢
`recent_high_N`, `recent_low_N` (gating v2/v3/v4)	B	rolling max/min	N	🟡
`dynamic_support_strength`	B	`rolling_min(low) / close`	N	🟡
`upward_breakout_signal`	B	compare high to shifted rolling-max(high)	N	🟡
`resistance_break_strength`	B	same pattern	N	🟡
`support_level_strength`, `resistance_proximity`	B	2-window max/min ratios	max(N_short, N_long)	🟡
`momentum_divergence`	B	`price_change_1h − rolling_mean(price_change_3h)`	N	🟡
`acceleration_signal`	B	rolling mean of price_acceleration	N	🟡
`ma_cross_bullish`	B	compare two SMAs + their shifted versions	max(N)	🟡
`price_above_ma_strength`	S	`(close − SMA) / SMA`	inherits	🟢
`momentum_consistency`, `momentum_strength`	S	sum of sign(price_change_*)	inherits	🟢
`higher_highs_signal`, `higher_lows_signal`	S	compare `.shift(1)` vs `.shift(2)`	2	🟢
`bullish_structure_score`	S	sum of above	inherits	🟢
`price_position_short/medium/long`	S	`close / SMA_N − 1`	inherits	🟢
`trend_alignment_score`	S	sum of ordinal compares	inherits	🟢
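
The rolling max/min rows and the breach rows that sit on top of them can be sketched as follows. A monotonic deque is shown as one option for larger N; for N = 5 a plain ring buffer with `max()` works just as well. Class names are illustrative, and taking the rolling extreme over `high` is an assumption, since the table does not say which series `recent_high_5` uses.

```python
from collections import deque
from typing import Optional


class RollingMax:
    """Rolling max over the last N values using a monotonic deque (amortised O(1))."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.count = 0
        self.candidates = deque()  # (index, value) pairs, values strictly decreasing

    def update(self, x: float) -> Optional[float]:
        idx = self.count
        self.count += 1
        # Drop candidates that can never be the maximum again.
        while self.candidates and self.candidates[-1][1] <= x:
            self.candidates.pop()
        self.candidates.append((idx, x))
        # Drop the front candidate once it has left the window.
        if self.candidates[0][0] <= idx - self.window:
            self.candidates.popleft()
        return self.candidates[0][1] if self.count >= self.window else None


class ResistanceBreach:
    """close > recent_high.shift(1): compare against the previous bar's extreme."""

    def __init__(self, window: int = 5) -> None:
        self.rolling_high = RollingMax(window)
        self.prev_recent_high: Optional[float] = None

    def update(self, high: float, close: float) -> Optional[bool]:
        if self.prev_recent_high is None:
            breach = None  # warmup: window + 1 bars
        else:
            breach = close > self.prev_recent_high
        self.prev_recent_high = self.rolling_high.update(high)
        return breach
```

With `window=5` the first non-`None` output appears on the sixth bar, which matches the WARMUP of 6 listed for `resistance_breach`.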

4.4 Market microstructure / pattern proxies

Column	Class	Formula	WARMUP	Verdict
`bid_ask_pressure`	S	`sign(close − open)`	0	🟢
`price_tick_movement`	S	`sign(close.diff())`	1	🟢
`price_tick_persistence`	B	rolling sum of tick movement	N	🟡
`doji_pattern`, `hammer_pattern`	S	boolean on current OHLC	0	🟢
`buyer_seller_pressure`	S	`(close − open) / (high − low)`	0	🟢
`local_fractal_high`, `local_fractal_low`	S	3-bar fractal with `.shift(1)` + `.shift(2)`	2	🟢
`price_entropy`	B	entropy of `close / sum(close)` over rolling window	N	🟡 — rewrite as incremental Shannon if the window is large (today `vol_med` = 10, so ring buffer is cheap)
`price_autocorr_3h`	B	`rolling.corr(shift(lag))`	`win + lag`	🟡 — incremental corr via Welford works; ring buffer simpler for small N
`price_vs_vwap_strength`	B	inherits VWAP	N	🟡
`bullish_divergence_1h`	S	compare `price_change_1h > 0 & rsi < rsi.shift(1)`	1	🟢
`bullish_momentum_acceleration`	B	rolling mean of diff(price_change_1h)	N	🟡
`volatility_skew_up`	B	rolling mean of (high.pct_change − low.pct_change)	N	🟡
`price_efficiency_up`	S	`(close − open) / (high − low) if up else 0`	0	🟢
`price_gap`	S	`(open − close.shift(1)) / close.shift(1)`	1	🟢
`significant_gap`	S	boolean of gap > 0.005	1	🟢
`up_down_battle_score`	B	ratio of rolling sums	N	🟡
`candle_sentiment_momentum`	B	rolling mean of weighted candle pattern sum	N	🟡
`rsi_timeframe_divergence`	S	sum of indicator comparisons	inherits RSIs	🟢
`momentum_confluence_strength`	S	mean of `|rsi − 50|`	inherits RSIs	🟢
`rsi_momentum`	S	`RSI.diff(1)`	1	🟢
`rsi_action_zone`, `rsi_neutral_zone`	S	boolean on RSI	0	🟢
`adx_momentum`	S	`ADX.diff(1)`	1	🟢
`adx_action_zone`	S	`ADX > 25`	0	🟢
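
For `price_entropy`, a ring-buffer recompute is cheap at the current window size. The sketch below computes the Shannon entropy of `close / sum(close)` over the window. The class name is hypothetical, and the natural logarithm is an assumption, since the inventory does not state the log base used by the current code.

```python
import math
from collections import deque
from typing import Optional


class RollingPriceEntropy:
    """Shannon entropy of p_i = close_i / sum(close) over the last N bars."""

    def __init__(self, window: int) -> None:
        self.buf = deque(maxlen=window)

    def update(self, close: float) -> Optional[float]:
        self.buf.append(close)
        if len(self.buf) < self.buf.maxlen:
            return None  # warmup: N bars
        total = sum(self.buf)
        if total <= 0.0:
            return None
        entropy = 0.0
        for c in self.buf:
            p = c / total
            if p > 0.0:  # treat 0 * log(0) as 0
                entropy -= p * math.log(p)  # assumption: natural log
        return entropy
```

When all closes in the window are equal, the result is log(N), the maximum possible value, which makes a convenient sanity check against the batch implementation.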

4.5 Composite gating scores (v1/v2/v3/v4/v5)

Column	Class	Notes
`gating_opportunity_score` (v1)	S	sum of 3 indicator booleans
`gating_opportunity_score_v2`	S	weighted sum of clipped features
`gating_opportunity_score_v3`	S	same pattern, more terms
`gating_opportunity_score_v4_up`	S	same
`direction_prediction_score_v5`	S	same

All composite scores are stateless combinations of already-computed features. Port cost: zero as long as the inputs are ported.

4.6 Candlestick patterns (class S, 60+ columns)

ta.cdl_pattern() outputs ~60 CDL_* columns (e.g. CDL_DOJI, CDL_HAMMER, …) each of which depends on the last 3-5 bars max (stateless pattern matchers).

Column pattern	Class	Verdict
`CDL_*` (all candlestick patterns)	S	🟢
`candle_sentiment_score`	S	🟢
`bullish_pressure_score`, `bearish_pressure_score`, `indecision_score`	S	🟢

Zero risk.

4.7 Temporal cyclic (class S)

Column	Class	Verdict
`hour_sin`, `hour_cos`	S	🟢
`day_sin`, `day_cos`	S	🟢

Function of timestamp only. Trivial.
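
A minimal sketch of the cyclic encoding, assuming the usual sine/cosine mapping of hour-of-day onto a 24-step circle and day-of-week onto a 7-step circle. The inventory does not give the formulas, so the periods and the use of `weekday()` are assumptions to verify against the source.

```python
import math
from datetime import datetime, timezone


def temporal_cyclic(ts: datetime) -> dict:
    """Stateless cyclic features computed from the candle timestamp only."""
    hour_angle = 2.0 * math.pi * ts.hour / 24.0
    day_angle = 2.0 * math.pi * ts.weekday() / 7.0  # assumption: day = day of week
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "day_sin": math.sin(day_angle),
        "day_cos": math.cos(day_angle),
    }


if __name__ == "__main__":
    print(temporal_cyclic(datetime(2026, 4, 20, 6, 0, tzinfo=timezone.utc)))
```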

4.8 Lagged features (class S)

Column	Class	Verdict
`RSI_{N}_lag_1/3/5`, `ADX_{N}_lag_1/3/5`	S	🟢 — state just needs to hold the last 5 values of the base column
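
The state described in this row can be sketched as a small buffer of prior values of the base column. `LagBuffer` is a hypothetical name; the buffer holds only values from earlier bars, so five slots are enough to serve lag 5.

```python
from collections import deque
from typing import Dict, Optional, Sequence


class LagBuffer:
    """Serves `_lag_k` features from the last max(k) prior values of a base column."""

    def __init__(self, lags: Sequence[int] = (1, 3, 5)) -> None:
        self.lags = tuple(lags)
        self.history = deque(maxlen=max(self.lags))  # prior bars only

    def update(self, value: float) -> Dict[int, Optional[float]]:
        out = {}
        for lag in self.lags:
            # history[-lag] is the value seen `lag` bars before the current one.
            out[lag] = self.history[-lag] if len(self.history) >= lag else None
        self.history.append(value)  # the current value becomes lag 1 on the next bar
        return out
```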

5. Multi-timeframe resamples (class T, needs dedicated design)

Column	Class	Why it's T	WARMUP	Verdict
`atr_1h_price`	T	ATR on 1h-resampled OHLCV, computed from 15m base	≈ `4 × ATR_window` in 15m bars	✅ via sub-engine
`atr_1h_pct`	T	`atr_1h_price / close_1h`	same	✅
`atr_1h_zscore`	T	z-score of `atr_1h_pct` over 168h (1 week) = 672 × 15m bars — largest window in the inventory	672 bars	🟡 — ring buffer of 672 floats per crypto = 5.4 kB, fine
`realized_vol_1h`	T	rolling std of `close.pct_change()` over 1h = 4 × 15m bars	4	✅
`range_ratio_1h`	T	`(high − low) / atr_1h_price` (per-candle, inherits atr_1h)	inherits	✅
`trend_strength_6h`	T	`|close.pct_change(24)| / atr_1h_pct` (6h = 24 × 15m bars)	24 + atr_1h warmup	✅

Design note: the multi-timeframe sub-engine holds:

- a partial higher-tf bar (open, high, low, close, volume accumulator), updated on each base-tf candle
- a sub-indicator state (Wilder ATR on the 1h axis)
- a ring buffer of the last M higher-tf bars (for z-score and rolling stats)

When the base-tf timestamp crosses a higher-tf boundary, the partial bar becomes final, the sub-indicator updates, and a new partial bar starts. This follows the aligned-close semantics of committee decision §11.3.

This is the highest-complexity sub-task of Phase 2. Budget 1 week of the implementation timeline for this engine alone (the 6-8 week estimate in the design doc already assumes this).
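
The design note can be sketched as below: a partial 1h bar, a Wilder ATR state on the 1h axis, and a ring buffer of closed bars. All names (`Bar`, `WilderAtr`, `MultiTimeframeEngine`) are hypothetical. The sketch also makes several assumptions: candle timestamps are open times in epoch seconds, the ATR is seeded with the first true range, and a higher-tf bar that did not receive all of its base candles is dropped.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float


class WilderAtr:
    """Wilder ATR: atr_t = (1 - 1/n) * atr_{t-1} + (1/n) * tr_t."""

    def __init__(self, period: int) -> None:
        self.period = period
        self.prev_close: Optional[float] = None
        self.value: Optional[float] = None

    def update(self, bar: Bar) -> float:
        tr = bar.high - bar.low
        if self.prev_close is not None:
            tr = max(tr, abs(bar.high - self.prev_close), abs(bar.low - self.prev_close))
        if self.value is None:
            self.value = tr  # assumption: seed with the first true range
        else:
            self.value += (tr - self.value) / self.period
        self.prev_close = bar.close
        return self.value


class MultiTimeframeEngine:
    def __init__(self, base_tf_s: int = 900, higher_tf_s: int = 3600,
                 atr_period: int = 14, history: int = 168) -> None:
        self.base_tf_s = base_tf_s
        self.higher_tf_s = higher_tf_s
        self.ratio = higher_tf_s // base_tf_s
        self.partial: Optional[Bar] = None
        self.partial_bucket: Optional[int] = None
        self.partial_count = 0
        self.atr = WilderAtr(atr_period)
        self.closed_bars = deque(maxlen=history)  # last M finalised higher-tf bars

    def on_candle(self, open_ts: int, c: Bar) -> Optional[float]:
        """Feed one base-tf candle; return the higher-tf ATR when a full bar closes."""
        bucket = open_ts // self.higher_tf_s
        if self.partial is None or bucket != self.partial_bucket:
            # Start a new partial bar (an unfinished previous one is dropped).
            self.partial = Bar(c.open, c.high, c.low, c.close, c.volume)
            self.partial_bucket = bucket
            self.partial_count = 1
        else:
            p = self.partial
            p.high = max(p.high, c.high)
            p.low = min(p.low, c.low)
            p.close = c.close
            p.volume += c.volume
            self.partial_count += 1
        # Aligned close: this base candle ends exactly on a higher-tf boundary.
        if (open_ts + self.base_tf_s) % self.higher_tf_s == 0:
            closed, complete = self.partial, self.partial_count == self.ratio
            self.partial = None
            if complete:
                self.closed_bars.append(closed)
                return self.atr.update(closed)
        return None
```

Between boundaries the engine returns `None`, so per-candle features such as `range_ratio_1h` would read the last finalised ATR rather than a value computed from the partial bar. Whether that matches the batch resample exactly is something the shadow-comparison harness would need to confirm.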

6. Fear & Greed integration (class X, blocker candidate)

df_fng = _get_fear_and_greed_cached(self.logger)   # external API, process cache
df_calc["_fng_date"] = pd.to_datetime(...).dt.normalize()
df_calc["fear_and_greed"] = df_calc["_fng_date"].map(df_fng_keyed).fillna(50.0)

Column	Class	Issue
`fear_and_greed`	X (partial)	Values change on a daily calendar key, not per candle. Current impl reads the entire F&G series and does a date-based map. Incremental port needs: (a) a daily cache of `{date → F&G value}` warmed at bootstrap, (b) per-candle lookup by `candle.timestamp.date()`.

Verdict: ✅ portable, no blocker — incremental port is straightforward: load F&G series at bootstrap, look up by date at each candle, same API as today. Reclassified from X to S-with-external-state. Keep the flag on this row as a reminder: external-data features need their state loaded at bootstrap, not recomputed from the candle stream.
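
The external-state pattern described in the verdict can be sketched as a date-keyed cache loaded once at bootstrap. `FearAndGreedState` is a hypothetical name, and the sketch assumes candle timestamps are already expressed in the same (UTC) calendar as the F&G series. The 50.0 fallback mirrors the `fillna(50.0)` in the current code.

```python
from datetime import date, datetime
from typing import Dict, Mapping

NEUTRAL_FNG = 50.0  # same fallback as fillna(50.0) in the current implementation


class FearAndGreedState:
    """Daily {date -> F&G value} cache, warmed at bootstrap, read per candle."""

    def __init__(self, series: Mapping[date, float]) -> None:
        self._by_date: Dict[date, float] = dict(series)

    def lookup(self, candle_ts: datetime) -> float:
        # Assumption: candle_ts is already in the calendar used by the F&G series.
        return self._by_date.get(candle_ts.date(), NEUTRAL_FNG)


if __name__ == "__main__":
    state = FearAndGreedState({date(2026, 4, 20): 31.0})
    print(state.lookup(datetime(2026, 4, 20, 13, 45)))  # value for that day
    print(state.lookup(datetime(2026, 4, 21, 0, 0)))    # missing day, neutral fallback
```

A long-running live process would also need to refresh the cache when a new day's value is published; that refresh policy is outside this sketch.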

7. XGBoost post-enrichment features (class R/S/B, composites)

All xgb_* features are post-hoc combinations of already-enriched columns. They compute on top of the enrichment output, using the same pandas ops (rolling / diff / pct_change / ratios). None introduce new memory primitives.

Column	Input dependency	Class	Verdict
`xgb_rsi_squared_grp2`	RSI_medium	S	🟢
`xgb_rsi_log_transform_grp2`	RSI_medium	S	🟢
`xgb_rsi_exp_decay_grp2`	RSI_medium	S	🟢
`xgb_price_sma_ratio_log_grp1`	close, SMA_medium	S	🟢
`xgb_price_volatility_grp1`	close, rolling_std(close, 50)	B (N=50)	🟡
`xgb_network_activity_proxy_grp2`	volume, close	S	🟢
`xgb_mining_pressure_grp2`	volume, close, rolling means	B (N=20)	🟡: known high NaN rate (logged at runtime as `>50% NaN (historique court?)`, i.e. "short history?"); confirm the port reproduces the same NaN pattern
`xgb_sentiment_composite_grp2`	RSI, pct_change	S	🟢
`xgb_crypto_regime_indicator_grp3`	SMA_medium/long ratio	S	🟢
`xgb_momentum_volume_composite_*_grp2`	momentum × sign(volume)	S	🟢
`xgb_ratio_*_grp2` (dynamic tech ratios)	RSI, MACD, etc.	S	🟢
`xgb_ma_alignment_short_medium_grp3`	SMA(5), SMA(20)	B	🟡
`xgb_ma_alignment_medium_long_grp3`	SMA(20), SMA(50)	B	🟡
`xgb_ma_convergence_strength_grp3`	SMA(5), SMA(50)	B	🟡
`xgb_vpt_momentum_grp3`	cumulative `volume × pct_change`, diff(5)	R (cumsum has state)	✅ — state = running VPT, diff over last 6 values
`xgb_advanced_ad_indicator_grp2`	close, high, low, volume, rolling(10)	B	🟡
`xgb_hurst_proxy_5d_grp3`	rolling std/mean of `pct_change(5)` over 50 bars	B	🟡
`xgb_volume_entropy_proxy_grp2`	`volume / rolling sum(volume, 50)`, log	B	🟡
`xgb_accel_{N}_close_mean_grp2` (N=3/6/12)	`close.pct_change().diff()` rolling mean	B	🟡
`xgb_accel_{N}_volume_mean_grp2`	same for volume	B	🟡
`xgb_accel_{N}_stability_grp2`	`rolling_std² of accel`	B	🟡
`xgb_accel_{N}_cross_mom_grp2`	`price_mom × vol_mom` (both rolling)	B	🟡
`xgb_accel_{N}_divergence_grp2`	`price_mom − vol_mom`	B	🟡
`xgb_accel_{N}_curvature_norm_grp2`	`rolling_mean(accel) / rolling_std(close)`	B	🟡
`xgb_accel_regime_consistency_grp2`	compare sign(short vs long accel)	S	🟢
`xgb_accel_strength_ratio_grp2`	`short_accel / |long_accel|`	S	🟢
`xgb_accel_breakthrough_{N}_grp2` (N=6/12)	adaptive threshold via 48-bar rolling stats	B	🟡
`xgb_accel_persist_{N}_grp2`	same + 3-bar momentum	B	🟡
`xgb_accel_{N}_range_grp2` (N=3/6)	rolling mean of `range_pct.diff()`	B	🟡
`xgb_accel_asymmetry_{N}_grp2` (N=12/24)	ratio of rolling counts	B	🟡
`xgb_accel_amplitude_ratio_{N}_grp2`	ratio of conditional rolling means	B	🟡

xgb_* audit result: ~30 xgb_* features, zero blockers. All fit cleanly in class B or S except xgb_vpt_momentum_grp3, the only feature with cumulative state (class R), which is straightforward: a running sum plus a diff over the last 6 values.
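
The cumulative state behind `xgb_vpt_momentum_grp3` can be sketched as a running volume-price-trend sum plus a six-slot buffer for `diff(5)`. `VptMomentum` is a hypothetical name, and the exact VPT expression (`volume × close.pct_change()`, cumulated) is taken from the table row rather than from the source code.

```python
from collections import deque
from typing import Optional


class VptMomentum:
    """Running sum of volume * pct_change(close), then a diff over `diff_lag` bars."""

    def __init__(self, diff_lag: int = 5) -> None:
        self.prev_close: Optional[float] = None
        self.vpt = 0.0
        self.history = deque(maxlen=diff_lag + 1)  # 6 values are needed for diff(5)

    def update(self, close: float, volume: float) -> Optional[float]:
        if self.prev_close is None:
            self.prev_close = close
            return None  # pct_change is undefined on the first bar
        self.vpt += volume * (close - self.prev_close) / self.prev_close
        self.prev_close = close
        self.history.append(self.vpt)
        if len(self.history) < self.history.maxlen:
            return None
        return self.history[-1] - self.history[0]
```

Because the running sum starts at zero at bootstrap, the absolute VPT level differs from a batch run over a longer history, but the diff does not depend on that offset.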

8. Blockers identified

None at Phase 0 closure. The one candidate (fear_and_greed) was reclassified to S-with-external-state (§6) once the external-data state pattern was clarified.

However, three items warrant flagging for Phase 2 architectural attention:

1. atr_1h_zscore: largest window in the inventory (672-bar ring buffer). Not a blocker, just the sizing benchmark for state memory.
2. Rolling quantiles (volatility_regime_high/low use rolling.quantile(0.8) and rolling.quantile(0.2) over 20 bars). Pandas' rolling.quantile is a batch operation over the whole series; a strictly incremental version needs a skiplist or t-digest. Given N=20, recomputing from a ring buffer on every candle costs O(20 log 20), which is trivial. Flag as "acceptable for now, revisit if profiling shows it matters".
3. price_autocorr_3h: rolling Pearson correlation with a shifted copy. Welford's online algorithm extends to covariance; for a small window (likely 12 bars) a ring buffer is simpler.

None are scope-blockers. They are implementation-detail decisions for the relevant indicator class templates.
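
The ring-buffer recompute suggested for the rolling quantiles can be sketched as follows. Names are hypothetical, and the sketch assumes linear interpolation between ranks (the pandas default) and a window that includes the current bar.

```python
from collections import deque
from typing import Optional, Sequence


def linear_quantile(sorted_values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between the two closest ranks."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


class RollingQuantileFlag:
    """x > rolling_quantile(q) over the last N values, re-sorted on every bar."""

    def __init__(self, window: int = 20, q: float = 0.8) -> None:
        self.q = q
        self.buf = deque(maxlen=window)

    def update(self, x: float) -> Optional[bool]:
        self.buf.append(x)
        if len(self.buf) < self.buf.maxlen:
            return None  # warmup: N bars
        threshold = linear_quantile(sorted(self.buf), self.q)
        return x > threshold
```

For the low-regime flag the same buffer can be reused with `q=0.2` and the comparison reversed, so both flags share one sort per candle.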

9. WARMUP_BARS aggregation

The Phase 0 deliverable feeds the Phase 2 runtime validation in bootstrap():

# src/commun/pipeline/enrichment/engine.py
WARMUP_BARS_PER_FEATURE = {
    "RSI_14": 70,          # 5 × 14
    "ADX_14": 70,
    "SMA_50":  50,
    "MACD_24_64_18": 320,  # 5 × 64
    "atr_1h_zscore": 672,  # 168h @ 15m
    # ...
}

REQUIRED_BOOTSTRAP_BARS = max(WARMUP_BARS_PER_FEATURE.values())
# ≈ 672 bars at 15m timeframe = 7 days

Rough ballpark for the typical config (timeframe=15m, rsi_long=28, atr_long=30, macd_slow=64, zscore_1h_window=168h):

REQUIRED_BOOTSTRAP_BARS ≈ 672 (driven by atr_1h_zscore)

A typical FTF run already ingests 30k+ bars per crypto, so the warmup constraint is met 45× over. Paper/live deployments need to fetch ≥ 672 bars (≈ 7 days at 15m) at startup — matches committee decision §11.4.
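
The runtime validation in `bootstrap()` could look roughly like the sketch below. The helper names are hypothetical and the dictionary only repeats the sample entries shown above; the real check would live next to `WARMUP_BARS_PER_FEATURE` in the engine module.

```python
from typing import Dict, List

# Sample entries copied from the excerpt above; the full table is elided there.
SAMPLE_WARMUP_BARS = {
    "RSI_14": 70,
    "ADX_14": 70,
    "SMA_50": 50,
    "MACD_24_64_18": 320,
    "atr_1h_zscore": 672,
}


def features_not_warm(available_bars: int, warmup: Dict[str, int]) -> List[str]:
    """Features whose warmup requirement exceeds the bars fetched at startup."""
    return sorted(name for name, bars in warmup.items() if bars > available_bars)


def validate_bootstrap(available_bars: int, warmup: Dict[str, int]) -> None:
    """Raise if the bootstrap history is too short for any registered feature."""
    missing = features_not_warm(available_bars, warmup)
    if missing:
        required = max(warmup.values())
        raise ValueError(
            f"bootstrap needs >= {required} bars, got {available_bars}; "
            f"not warm: {', '.join(missing)}"
        )
```

Listing the affected features in the error makes it easier to see whether a short history only affects the long-window features or the whole feature set.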

10. Implementation tick-off grid

The Phase 2 implementation team ticks off each class block as it lands. Dependencies go top-down (recursive first, multi-tf last).

Block	Owner	Target phase	Status
StatefulEnricher core + EMA stub template	dococeven	Phase 1	✅ done 2026-04-21 (63/63 tests green after the CR rounds and the 2026-05-03 rebase; 57 original + 6 added during CR rounds 1-2)
Shadow-comparison harness wiring	dococeven	Phase 1	✅ done 2026-04-21 (facade flag dispatch + divergence budget)
Recursive/Wilder base templates (EMA, Wilder, MACD) — extend from the Phase 1 EMA template	dococeven	2.1	unblocked, ready to start
pandas-ta indicator shims (RSI, ATR, ADX, STOCH, BBands)	dococeven	2.2	blocked on 2.1
Ring-buffer base templates (SMA, rolling std/mean/max/min, rolling quantile)	dococeven	2.3	blocked on 2.1
Gating v1/v2/v3/v4/v5 port (stateless combinations on top of above)	dococeven	2.4	blocked on 2.2/2.3
Candlestick pattern shims (stateless)	dococeven	2.4	blocked on 2.1
Temporal cyclic (hour_sin, …)	dococeven	2.4	can start after 2.1
Multi-timeframe sub-engine + atr_1h_* family	dococeven	2.5	blocked on 2.2/2.3 (biggest block)
Fear & Greed external-state loader	dococeven	2.6	independent, can start after 2.1
xgb_* features (all ~30)	dococeven	2.7	blocked on 2.1-2.5

11. Post-inventory confidence statement (for committee)

- ~150 features audited and classified across the five portable compute classes (R, W, B, T, S); class X is empty.
- 0 hard blockers identified.
- Biggest single sub-task: the multi-timeframe sub-engine (§5); budget 1 week inside the Phase 2 allocation.
- Memory budget per (symbol, tf, feature_set) key: dominated by the atr_1h_zscore ring buffer (672 × 8 B ≈ 5.4 kB) plus a handful of smaller rings. Total ≈ 10–15 kB of state floats per key, well within the 2.5 MB budget estimated in documentation/../design/CVN-N005-stateful-enrichment.md §8quater.
- The 6–8 week estimate for 1 FTE is realistic given this inventory. Reducing to 4 weeks would require 2 FTE in parallel (one on core+simple, one on multi-tf).

This document is the authoritative spec of Phase 2 porting. Any indicator encountered in code that is not in this inventory must be added here and reclassified before being ported.