
Enrichment Indicator Inventory — Phase 0 of #599 refactor

Status: v1. Phase 0 deliverable, to be ticked off by the implementation team during Phase 2 porting.
Sources: src/ETL/cvntrade_enrich.py (1378 lines) + src/ETL/post_enrichment/cvntrade_xgboost_feature_generator.py (712 lines).
Audit date: 2026-04-20.
Feature count (stable): ≈ 150 columns (depends on timeframe/params).


1. How to read this inventory

Each feature is classified into one of six classes (five compute classes plus the blocker class X). The class determines the incremental-update strategy the refactor will use:

| Class | Label | Update rule | Warmup needed |
|---|---|---|---|
| R | Simple recursive | y_t = α·x_t + (1−α)·y_{t−1}; keep 1 float per indicator | small, converges geometrically |
| W | Wilder smoothing (Welles Wilder EMA variant) | same as R, α = 1/N | small, converges geometrically |
| B | Finite-window ring buffer | keep last N values, O(1) update with deque | exactly N bars |
| T | Multi-timeframe resample | state = current partial higher-tf bar + sub-indicator state (recursive structure) | N × (higher_tf / base_tf) + sub-indicator warmup |
| S | Stateless per-candle function | no state; function of current bar only (or ≤ few prior bars via shift) | 0 to handful of bars |
| X | Non-incremental (blocker) | requires full past or non-local operator | |

Verdict column:
- ✅ incremental-compatible by closed-form (R/W/T)
- 🟡 incremental-compatible via ring buffer (B); memory linear in N, still O(1) update
- 🟢 stateless trivial (S)
- 🔴 blocker; inherently batch or tricky; flag for scope discussion

Feature names that end in _shifted or _lag_* are pure shifts of other features; they're class S with the caveat that the base feature must be stored in the state.
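
The update rules for classes R, W and B can be sketched as three small state holders. This is a minimal illustration, not the project's templates: the class names (RecursiveEMA, RingSMA), the helper wilder(), and the choice to seed the recursion with the first observation are all assumptions made for the example.

```python
from collections import deque


class RecursiveEMA:
    """Class R: one float of state, y_t = alpha*x_t + (1-alpha)*y_{t-1}."""

    def __init__(self, alpha: float):
        self.alpha = alpha
        self.value = None  # unseeded until the first observation

    def update(self, x: float) -> float:
        if self.value is None:
            self.value = x  # seeding with the first value is an assumption
        else:
            self.value = self.alpha * x + (1.0 - self.alpha) * self.value
        return self.value


def wilder(n: int) -> RecursiveEMA:
    """Class W is class R with alpha = 1/N."""
    return RecursiveEMA(alpha=1.0 / n)


class RingSMA:
    """Class B: keep the last N values, O(1) update via a running sum."""

    def __init__(self, n: int):
        self.n = n
        self.buf = deque()
        self.total = 0.0

    def update(self, x: float):
        self.buf.append(x)
        self.total += x
        if len(self.buf) > self.n:
            self.total -= self.buf.popleft()
        # None until exactly N bars have been seen (the warmup).
        return self.total / self.n if len(self.buf) == self.n else None
```

For long-running streams, the running sum in RingSMA can drift through floating-point accumulation; recomputing the sum from the buffer periodically is one way to bound that.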


2. Classification summary (counts)

| Class | Count (approx.) | Action |
|---|---|---|
| R/W: Recursive / Wilder | 15 | Port first (simplest, closed-form, smallest risk) |
| B: Ring buffer | 80+ | Port in bulk (same template pattern for rolling ops) |
| T: Multi-timeframe | 6 | Requires dedicated sub-engine design |
| S: Stateless | 40+ | Trivial port (shift / diff / pct_change / function of current bar) |
| X: Blocker | 1 candidate | ⚠️ Flagged: Fear & Greed external merge, see §6 |

No other blockers identified as of Phase 0. The implementation team should re-audit xgb_* at the start of Phase 2.


3. pandas-ta indicators (class R/W/B)

Generated by ta.bbands / ta.rsi / ta.mfi / ta.atr / ta.macd / ta.stoch / ta.adx / ta.sma, configured via self.params. Names include the period as a suffix. For a standard timeframe the columns are:

| Column pattern | Class | Formula | WARMUP_BARS | Verdict |
|---|---|---|---|---|
| RSI_{period} (×3: short/medium/long) | W | Wilder RSI: gain_smooth = (1−1/n)·prev + (1/n)·gain | ≈ 5×N (seed weight decays to ≈ e^−5 ≈ 7e-3); ≈ 7×N for 1e-3; ≈ 14×N for 1e-6 | ✅ |
| MFI_{period} (×3) | W | Money Flow Index; Wilder smoothing on money-flow ratio | ≈ 5×N | ✅ |
| ATRr_{period} (×3: short/medium/long) | W | Wilder ATR: atr_t = (1−1/n)·atr_{t−1} + (1/n)·tr_t | ≈ 5×N | ✅ |
| BBL/BBM/BBU/BBB/BBP_{period}_{std} (×3) | B | SMA ± k·std on window | N bars | 🟡 |
| MACD_{f}_{s}_{sig}, MACDs_*, MACDh_* (×3) | R | EMA(fast) − EMA(slow), signal = EMA(diff) | ≈ 5×max(s, sig) | ✅ |
| STOCHk_{k}_{d}_{smooth}, STOCHd_* (×3) | B | 100 × (close − low_N) / (high_N − low_N), smoothed | max(k, d) bars | 🟡 |
| ADX_{period}, DMP_{period}, DMN_{period} (×2) | W | Wilder smoothing on directional movement | ≈ 5×N | ✅ |
| SMA_{period} (×2: medium/long) | B | Simple rolling mean | N bars | 🟡 |
| distance_SMA_{period} (×2) | S | (close − SMA_N) / SMA_N (function of SMA) | inherits SMA warmup | 🟢 |

Total: ~30 columns. All ✅ or 🟡 (plus the derived 🟢 distance_SMA_*): textbook streaming indicators, zero risk.
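
The warmup multiples for the Wilder rows follow from the geometric decay of the seed's weight: after k updates with α = 1/N, the initial value still carries weight (1 − 1/N)^k ≈ e^(−k/N). A short sketch for checking a chosen WARMUP_BARS against a tolerance; the function names are illustrative and not taken from the codebase.

```python
import math


def residual_weight(n: int, bars: int) -> float:
    """Weight still carried by the seed after `bars` Wilder updates (alpha = 1/n)."""
    return (1.0 - 1.0 / n) ** bars


def warmup_bars(n: int, tol: float) -> int:
    """Smallest bar count whose residual seed weight is <= tol."""
    return math.ceil(math.log(tol) / math.log(1.0 - 1.0 / n))


# Example: how much of the seed is left in RSI_14 after the 5 x N budget.
print(residual_weight(14, 5 * 14))
# Bars needed before the seed weighs less than one part in a thousand.
print(warmup_bars(14, 1e-3))
```

For an EMA with α = 2/(N+1), as used by MACD, the same reasoning applies with the base (1 − 2/(N+1)), so the seed decays roughly twice as fast per bar.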


4. Custom rolling / stateless (class B or S)

4.1 Volume-derived

| Column | Class | Formula | WARMUP | Verdict |
|---|---|---|---|---|
| Volume_Delta | S | volume.diff() | 1 | 🟢 |
| SMA_Volume_{period} (×3) | B | rolling mean of volume | N | 🟡 |
| volume_ma_5/10/20 | B | rolling mean | 5/10/20 | 🟡 |
| volume_ratio_5/10/20 | S | volume / volume_ma_* | inherits | 🟢 |
| volume_price_momentum | S | volume_ratio_5 × price_change_abs_1h | inherits | 🟢 |
| volume_weighted_price | B | rolling sum(close × volume) / rolling sum(volume) | N | 🟡 |
| vwap_deviation | S | (close − vwap) / close | inherits | 🟢 |
| volume_trend_3 | B | rolling: 1 if last > first else -1 | 3 | 🟡 |
| volume_price_efficiency | S | volume / (high − low) | 0 | 🟢 |
| market_impact_proxy | S | abs(price_change_1h) / (volume_ratio_5 + 0.1) | inherits | 🟢 |
| volume_momentum | S | volume.pct_change(N) | N | 🟢 |
| volume_autocorr_3h | B | rolling corr on volume.shift(lag) | win + lag | 🟡 |
| institutional_buying_pressure | B | rolling sum of volume × (close−open)/open if up | N | 🟡 |
| volume_weighted_up_pressure | B | rolling sum volume × max(0, price_change_1h) | N | 🟡 |
| order_flow_imbalance | B | (vol_up_ma − vol_down_ma) / (vol_up_ma + vol_down_ma + 1) | N | 🟡 |
| volume_acceleration | B | mean(rolling ratio) over window | N | 🟡 |
| volume_regime_change | S | (volume_ratio_5 > 1.3 × volume_ratio_5.shift(1)) | 1 | 🟢 |

4.2 Price-volatility / range (rolling-std family)

| Column | Class | Formula | WARMUP | Verdict |
|---|---|---|---|---|
| price_volatility_5/10/20 | B | rolling_std / rolling_mean (coefficient of variation) | N | 🟡 |
| price_change_1h/3h/6h | S | close.pct_change(N) | N | 🟢 |
| price_change_abs_1h/3h | S | abs(price_change_*) | inherits | 🟢 |
| high_low_range_pct | S | (high − low) / close | 0 | 🟢 |
| high_low_range_ma_ratio_5/10 | B | range / rolling_mean(range) | 5/10 | 🟡 |
| close_to_high_ratio, close_to_low_ratio | S | ratios on current bar | 0 | 🟢 |
| atr_normalized | S | ATRr / close | inherits | 🟢 |
| atr_expansion | B | ATRr / rolling_mean(ATRr, N) | N | 🟡 |
| ATRr_{period}_std_10, _std_30 | B | rolling std of ATR | 10/30 | 🟡 |
| volatility_regime_high/low | B | price_volatility > rolling_quantile(0.8) | 20 | 🟡 (quantile: check pandas impl is incremental-friendly; skiplist / tdigest optional) |
| volatility_breakout | B | price_volatility > rolling_mean × 1.5 | 10 | 🟡 |
| volatility_regime_change | S | price_vol > price_vol.shift(1) × 1.5 | 1 | 🟢 |
| velocity_consistency | B | rolling std of pct_change | 3 | 🟡 |
| volatility_regime | B | (high − low) / rolling_mean(high−low, N) | N | 🟡 |
| market_trending_strength | B | rolling std + diff composition | N | 🟡 |
| intraday_return_volatility | S | (high − low) / open | 0 | 🟢 |

4.3 Momentum / support-resistance

| Column | Class | Formula | WARMUP | Verdict |
|---|---|---|---|---|
| price_momentum_3 | B | rolling.apply(lambda): first-vs-last in window | 3 | 🟡 (rewrite as closed-form (close − close.shift(N)) / close.shift(N)) |
| price_acceleration | S | price_change_1h.diff(1) | 2 | 🟢 |
| recent_high_5/recent_low_5 | B | rolling max/min | 5 | 🟡 |
| resistance_breach, support_breach | S | compare close vs shifted recent extreme | 6 | 🟢 |
| recent_high_N, recent_low_N (gating v2/v3/v4) | B | rolling max/min | N | 🟡 |
| dynamic_support_strength | B | rolling_min(low) / close | N | 🟡 |
| upward_breakout_signal | B | compare high to shifted rolling-max(high) | N | 🟡 |
| resistance_break_strength | B | same pattern | N | 🟡 |
| support_level_strength, resistance_proximity | B | 2-window max/min ratios | max(N_short, N_long) | 🟡 |
| momentum_divergence | B | price_change_1h − rolling_mean(price_change_3h) | N | 🟡 |
| acceleration_signal | B | rolling mean of price_acceleration | N | 🟡 |
| ma_cross_bullish | B | compare two SMAs + their shifted versions | max(N) | 🟡 |
| price_above_ma_strength | S | (close − SMA) / SMA | inherits | 🟢 |
| momentum_consistency, momentum_strength | S | sum of sign(price_change_*) | inherits | 🟢 |
| higher_highs_signal, higher_lows_signal | S | compare .shift(1) vs .shift(2) | 2 | 🟢 |
| bullish_structure_score | S | sum of above | inherits | 🟢 |
| price_position_short/medium/long | S | close / SMA_N − 1 | inherits | 🟢 |
| trend_alignment_score | S | sum of ordinal compares | inherits | 🟢 |
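
The closed-form rewrite suggested for price_momentum_3 can be sketched with a fixed-size deque. The sketch assumes the current lambda compares the first and last element of the window; the class name MomentumN is illustrative. Under that assumption, the first and last bar of an N-bar window are N − 1 bars apart, so the shift offset should be checked against the actual lambda before porting.

```python
from collections import deque


class MomentumN:
    """Incremental first-vs-last relative change over an N-bar window."""

    def __init__(self, n: int):
        self.buf = deque(maxlen=n)  # oldest value drops out automatically

    def update(self, close: float):
        self.buf.append(close)
        if len(self.buf) < self.buf.maxlen:
            return None  # still warming up
        first, last = self.buf[0], self.buf[-1]
        return (last - first) / first
```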

4.4 Market microstructure / pattern proxies

| Column | Class | Formula | WARMUP | Verdict |
|---|---|---|---|---|
| bid_ask_pressure | S | sign(close − open) | 0 | 🟢 |
| price_tick_movement | S | sign(close.diff()) | 1 | 🟢 |
| price_tick_persistence | B | rolling sum of tick movement | N | 🟡 |
| doji_pattern, hammer_pattern | S | boolean on current OHLC | 0 | 🟢 |
| buyer_seller_pressure | S | (close − open) / (high − low) | 0 | 🟢 |
| local_fractal_high, local_fractal_low | S | 3-bar fractal with .shift(1) + .shift(2) | 2 | 🟢 |
| price_entropy | B | entropy of close / sum(close) over rolling window | N | 🟡 (rewrite as incremental Shannon if the window is large; today vol_med = 10, so ring buffer is cheap) |
| price_autocorr_3h | B | rolling.corr(shift(lag)) | win + lag | 🟡 (incremental corr via Welford works; ring buffer simpler for small N) |
| price_vs_vwap_strength | B | inherits VWAP | N | 🟡 |
| bullish_divergence_1h | S | compare price_change_1h > 0 & rsi < rsi.shift(1) | 1 | 🟢 |
| bullish_momentum_acceleration | B | rolling mean of diff(price_change_1h) | N | 🟡 |
| volatility_skew_up | B | rolling mean of (high.pct_change − low.pct_change) | N | 🟡 |
| price_efficiency_up | S | (close − open) / (high − low) if up else 0 | 0 | 🟢 |
| price_gap | S | (open − close.shift(1)) / close.shift(1) | 1 | 🟢 |
| significant_gap | S | boolean of gap > 0.005 | 1 | 🟢 |
| up_down_battle_score | B | ratio of rolling sums | N | 🟡 |
| candle_sentiment_momentum | B | rolling mean of weighted candle pattern sum | N | 🟡 |
| rsi_timeframe_divergence | S | sum of indicator comparisons | inherits RSIs | 🟢 |
| momentum_confluence_strength | S | mean of abs(rsi − 50) | inherits RSIs | 🟢 |
| rsi_momentum | S | RSI.diff(1) | 1 | 🟢 |
| rsi_action_zone, rsi_neutral_zone | S | boolean on RSI | 0 | 🟢 |
| adx_momentum | S | ADX.diff(1) | 1 | 🟢 |
| adx_action_zone | S | ADX > 25 | 0 | 🟢 |
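
For the rolling autocorrelation features (price_autocorr_3h here, volume_autocorr_3h in §4.1), the ring-buffer option can be sketched as follows: keep win + lag values and recompute Pearson's r on each candle. The names RollingAutocorr and pearson are illustrative, and returning None for a zero-variance window is an assumption about how that case should be handled.

```python
from collections import deque
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences, None if undefined."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return None  # constant window: correlation is undefined
    return cov / math.sqrt(va * vb)


class RollingAutocorr:
    """Correlation between the last `window` values and the same series shifted by `lag`."""

    def __init__(self, window: int, lag: int):
        self.window = window
        self.lag = lag
        self.buf = deque(maxlen=window + lag)  # warmup = win + lag bars

    def update(self, x: float):
        self.buf.append(x)
        if len(self.buf) < self.buf.maxlen:
            return None
        vals = list(self.buf)
        current = vals[self.lag:]       # the last `window` values
        shifted = vals[:self.window]    # the same values `lag` bars earlier
        return pearson(current, shifted)
```

The recompute costs O(window) per candle, which is negligible for the small windows in this inventory.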

4.5 Composite gating scores (v1/v2/v3/v4/v5)

| Column | Class | Notes |
|---|---|---|
| gating_opportunity_score (v1) | S | sum of 3 indicator booleans |
| gating_opportunity_score_v2 | S | weighted sum of clipped features |
| gating_opportunity_score_v3 | S | same pattern, more terms |
| gating_opportunity_score_v4_up | S | same |
| direction_prediction_score_v5 | S | same |

All composite scores are stateless combinations of already-computed features. Port cost: zero as long as the inputs are ported.

4.6 Candlestick patterns (class S, 60+ columns)

ta.cdl_pattern() outputs ~60 CDL_* columns (e.g. CDL_DOJI, CDL_HAMMER, …), each of which depends on at most the last 3-5 bars (stateless pattern matchers).

| Column pattern | Class | Verdict |
|---|---|---|
| CDL_* (all candlestick patterns) | S | 🟢 |
| candle_sentiment_score | S | 🟢 |
| bullish_pressure_score, bearish_pressure_score, indecision_score | S | 🟢 |

Zero risk.

4.7 Temporal cyclic (class S)

| Column | Class | Verdict |
|---|---|---|
| hour_sin, hour_cos | S | 🟢 |
| day_sin, day_cos | S | 🟢 |

Function of timestamp only. Trivial.
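
A minimal sketch of the cyclic encoding, assuming a 24-hour period for hour_* and a 7-day week (Monday = 0) for day_*. The periods and the function name temporal_cyclic are assumptions; the source does not state which day cycle the current code uses.

```python
import math
from datetime import datetime, timezone


def temporal_cyclic(ts: datetime) -> dict:
    """Map a candle timestamp to points on the unit circle."""
    hour_angle = 2.0 * math.pi * ts.hour / 24.0      # assumed 24 h period
    day_angle = 2.0 * math.pi * ts.weekday() / 7.0   # assumed weekly period
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "day_sin": math.sin(day_angle),
        "day_cos": math.cos(day_angle),
    }


features = temporal_cyclic(datetime(2026, 4, 20, 6, 0, tzinfo=timezone.utc))
```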

4.8 Lagged features (class S)

| Column | Class | Verdict |
|---|---|---|
| RSI_{N}_lag_1/3/5, ADX_{N}_lag_1/3/5 | S | 🟢 (state just needs to hold the last 5 values of the base column) |

5. Multi-timeframe resamples (class T — needs dedicated design)

| Column | Class | Why it's T | WARMUP | Verdict |
|---|---|---|---|---|
| atr_1h_price | T | ATR on 1h-resampled OHLCV, computed from 15m base | 4 × ATR_window in 15m bars | ✅ via sub-engine |
| atr_1h_pct | T | atr_1h_price / close_1h | same | |
| atr_1h_zscore | T | z-score of atr_1h_pct over 168h (1 week) = 672 × 15m bars; largest window in the inventory | 672 bars | 🟡 (ring buffer of 672 floats per crypto = 5.4 kB, fine) |
| realized_vol_1h | T | rolling std of close.pct_change() over 1h = 4 × 15m bars | 4 | |
| range_ratio_1h | T | (high − low) / atr_1h_price (per-candle, inherits atr_1h) | inherits | |
| trend_strength_6h | T | abs(close.pct_change(24)) / atr_1h_pct; 6h = 24 × 15m bars | 24 + atr_1h warmup | |

Design note: the multi-timeframe sub-engine holds:
- a partial higher-tf bar (open, high, low, close, volume accumulator) updated on each base-tf candle
- a sub-indicator state (Wilder ATR on the 1h axis)
- a ring buffer of the last M higher-tf bars (for z-score, rolling stats)

When the base-tf timestamp crosses a higher-tf boundary, the partial bar becomes final, the sub-indicator updates, and a new partial bar starts. Aligned-close semantics (committee decision §11.3).

This is the highest-complexity sub-task of Phase 2. Budget 1 week of the implementation timeline for this engine alone (the 6-8 week estimate in the design doc assumed this).
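
The design note can be sketched as two cooperating state objects. This is a sketch under stated assumptions, not the sub-engine's design: the names Bar, HigherTfAggregator and WilderATR are hypothetical, timestamps are taken to be bar open times in epoch seconds, a higher-tf bar is finalized when the base bar that ends on the boundary arrives, and the ATR is seeded with the first true range. The ring buffer of the last M higher-tf bars is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bar:
    ts: int  # bar open time in epoch seconds (assumption)
    open: float
    high: float
    low: float
    close: float
    volume: float


class HigherTfAggregator:
    """Folds base-tf bars into one partial higher-tf bar."""

    def __init__(self, base_sec: int, higher_sec: int):
        self.base_sec = base_sec
        self.higher_sec = higher_sec
        self.partial: Optional[Bar] = None

    def update(self, b: Bar) -> Optional[Bar]:
        """Return the finished higher-tf bar, or None while it is still partial."""
        bucket = b.ts - b.ts % self.higher_sec
        p = self.partial
        if p is None or p.ts != bucket:
            # First base bar of a new bucket (a stale partial bar is discarded).
            p = self.partial = Bar(bucket, b.open, b.high, b.low, b.close, b.volume)
        else:
            p.high = max(p.high, b.high)
            p.low = min(p.low, b.low)
            p.close = b.close
            p.volume += b.volume
        # Aligned close: final once this base bar ends on the bucket boundary.
        if b.ts + self.base_sec >= bucket + self.higher_sec:
            self.partial = None
            return p
        return None


class WilderATR:
    """Sub-indicator state on the higher-tf axis: previous close + running ATR."""

    def __init__(self, n: int):
        self.n = n
        self.prev_close: Optional[float] = None
        self.atr: Optional[float] = None

    def update(self, bar: Bar) -> float:
        tr = bar.high - bar.low
        if self.prev_close is not None:
            tr = max(tr, abs(bar.high - self.prev_close), abs(bar.low - self.prev_close))
        self.prev_close = bar.close
        # Seeding with the first true range is an assumption.
        self.atr = tr if self.atr is None else (1 - 1 / self.n) * self.atr + tr / self.n
        return self.atr
```

On each 15m candle the caller would invoke HigherTfAggregator.update(), and only when it returns a finished bar feed that bar to WilderATR.update(). How gaps in the base stream should be handled (here a stale partial bar is silently dropped) is a design decision this sketch does not settle.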


6. Fear & Greed integration (class X — blocker candidate)

```python
df_fng = _get_fear_and_greed_cached(self.logger)   # external API, process cache
df_calc["_fng_date"] = pd.to_datetime(...).dt.normalize()
df_calc["fear_and_greed"] = df_calc["_fng_date"].map(df_fng_keyed).fillna(50.0)
```

| Column | Class | Issue |
|---|---|---|
| fear_and_greed | X (partial) | Values change on a daily calendar key, not per candle. Current impl reads the entire F&G series and does a date-based map. Incremental port needs: (a) a daily cache of {date → F&G value} warmed at bootstrap, (b) per-candle lookup by candle.timestamp.date(). |

Verdict: ✅ portable, no blocker — incremental port is straightforward: load F&G series at bootstrap, look up by date at each candle, same API as today. Reclassified from X to S-with-external-state. Keep the flag on this row as a reminder: external-data features need their state loaded at bootstrap, not recomputed from the candle stream.
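
The external-state pattern described above can be sketched as a small lookup object. The class name FearGreedState and its method names are hypothetical; the 50.0 fallback mirrors the fillna(50.0) in the current code, and keying by the UTC calendar date is an assumption about how the F&G series is dated.

```python
from datetime import date, datetime, timezone


class FearGreedState:
    """Daily {date -> value} cache, loaded once at bootstrap."""

    NEUTRAL = 50.0  # same fallback as fillna(50.0) in the batch code

    def __init__(self, series: dict):
        self._by_date = dict(series)

    def lookup(self, candle_ts: datetime) -> float:
        # Per-candle lookup by calendar date; missing days fall back to neutral.
        return self._by_date.get(candle_ts.date(), self.NEUTRAL)


# Bootstrap with an illustrative series (values are made up for the example).
fng = FearGreedState({date(2026, 4, 19): 31.0, date(2026, 4, 20): 44.0})
value = fng.lookup(datetime(2026, 4, 20, 13, 45, tzinfo=timezone.utc))
```

A long-running paper/live process would also need to refresh the cache when a new day's value is published; that refresh path is outside this sketch.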


7. XGBoost post-enrichment features (class R/S/B, composites)

All xgb_* features are post-hoc combinations of already-enriched columns. They compute on top of the enrichment output, using the same pandas ops (rolling / diff / pct_change / ratios). None introduce new memory primitives.

| Column | Input dependency | Class | Verdict |
|---|---|---|---|
| xgb_rsi_squared_grp2 | RSI_medium | S | 🟢 |
| xgb_rsi_log_transform_grp2 | RSI_medium | S | 🟢 |
| xgb_rsi_exp_decay_grp2 | RSI_medium | S | 🟢 |
| xgb_price_sma_ratio_log_grp1 | close, SMA_medium | S | 🟢 |
| xgb_price_volatility_grp1 | close, rolling_std(close, 50) | B (N=50) | 🟡 |
| xgb_network_activity_proxy_grp2 | volume, close | S | 🟢 |
| xgb_mining_pressure_grp2 | volume, close, rolling means | B (N=20) | 🟡 (known high NaN rate, logged at runtime as ">50% NaN (historique court?)", i.e. "short history?"; confirm the port matches) |
| xgb_sentiment_composite_grp2 | RSI, pct_change | S | 🟢 |
| xgb_crypto_regime_indicator_grp3 | SMA_medium/long ratio | S | 🟢 |
| xgb_momentum_volume_composite_*_grp2 | momentum × sign(volume) | S | 🟢 |
| xgb_ratio_*_grp2 (dynamic tech ratios) | RSI, MACD, etc. | S | 🟢 |
| xgb_ma_alignment_short_medium_grp3 | SMA(5), SMA(20) | B | 🟡 |
| xgb_ma_alignment_medium_long_grp3 | SMA(20), SMA(50) | B | 🟡 |
| xgb_ma_convergence_strength_grp3 | SMA(5), SMA(50) | B | 🟡 |
| xgb_vpt_momentum_grp3 | cumulative volume × pct_change, diff(5) | R (cumsum has state) | ✅ (state = running VPT, diff over last 6 values) |
| xgb_advanced_ad_indicator_grp2 | close, high, low, volume, rolling(10) | B | 🟡 |
| xgb_hurst_proxy_5d_grp3 | rolling std/mean of pct_change(5) over 50 bars | B | 🟡 |
| xgb_volume_entropy_proxy_grp2 | volume / rolling sum(volume, 50), log | B | 🟡 |
| xgb_accel_{N}_close_mean_grp2 (N=3/6/12) | close.pct_change().diff() rolling mean | B | 🟡 |
| xgb_accel_{N}_volume_mean_grp2 | same for volume | B | 🟡 |
| xgb_accel_{N}_stability_grp2 | rolling_std² of accel | B | 🟡 |
| xgb_accel_{N}_cross_mom_grp2 | price_mom × vol_mom (both rolling) | B | 🟡 |
| xgb_accel_{N}_divergence_grp2 | price_mom − vol_mom | B | 🟡 |
| xgb_accel_{N}_curvature_norm_grp2 | rolling_mean(accel) / rolling_std(close) | B | 🟡 |
| xgb_accel_regime_consistency_grp2 | compare sign(short vs long accel) | S | 🟢 |
| xgb_accel_strength_ratio_grp2 | short_accel / long_accel | | |
| xgb_accel_breakthrough_{N}_grp2 (N=6/12) | adaptive threshold via 48-bar rolling stats | B | 🟡 |
| xgb_accel_persist_{N}_grp2 | same + 3-bar momentum | B | 🟡 |
| xgb_accel_{N}_range_grp2 (N=3/6) | rolling mean of range_pct.diff() | B | 🟡 |
| xgb_accel_asymmetry_{N}_grp2 (N=12/24) | ratio of rolling counts | B | 🟡 |
| xgb_accel_amplitude_ratio_{N}_grp2 | ratio of conditional rolling means | B | 🟡 |

xgb_* audit result: ~30 xgb_* features, zero blockers. All fit cleanly in class B or S except xgb_vpt_momentum_grp3 (class R), the only one with cumulative state. Its port is straightforward (running sum + diff over last 6 values).
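
The cumulative state of xgb_vpt_momentum_grp3 can be sketched as a running sum plus a six-slot history. The class name VptMomentum is illustrative, and the treatment of the very first bar (contributing 0 to the running sum) is an assumption: the pandas batch version may yield NaN one bar longer, so the first emitted value should be checked in the shadow comparison.

```python
from collections import deque


class VptMomentum:
    """Running volume-price trend with a diff over `lag` bars."""

    def __init__(self, lag: int = 5):
        self.vpt = 0.0
        self.prev_close = None
        self.hist = deque(maxlen=lag + 1)  # diff(5) needs the last 6 values

    def update(self, close: float, volume: float):
        if self.prev_close is not None:
            # volume x pct_change, accumulated like a cumsum
            self.vpt += volume * (close - self.prev_close) / self.prev_close
        self.prev_close = close
        self.hist.append(self.vpt)
        if len(self.hist) < self.hist.maxlen:
            return None
        return self.hist[-1] - self.hist[0]
```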


8. Blockers identified

None at Phase 0 closure. The one candidate (fear_and_greed) was reclassified to S-with-external-state (§6) once the external-data state pattern was clarified.

However, three items warrant flagging for Phase 2 architectural attention:

  1. atr_1h_zscore: largest window in the inventory (672-bar ring buffer). Not a blocker, just the sizing benchmark for state memory.
  2. Rolling quantiles (volatility_regime_high/low, which use rolling.quantile(0.8) / rolling.quantile(0.2) over 20 bars). pandas' rolling.quantile is a batch operation that exposes no reusable state between calls; a strictly incremental version needs a skiplist or tdigest. Given N=20, a simple ring-buffer recompute per candle costs O(20 log 20), which is trivial. Flag as "acceptable for now, revisit if profiling shows it matters".
  3. price_autocorr_3h: rolling Pearson correlation with a shifted copy. Welford's online algorithm extends to covariance; for a small window (likely 12 bars) a ring buffer is simpler.

None are scope-blockers. They are implementation-detail decisions for the relevant indicator class templates.
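
The ring-buffer recompute for item 2 can be sketched as follows. The class name RollingQuantile is illustrative; linear interpolation between order statistics is used here on the assumption that the batch code relies on the pandas default, which should be confirmed against the actual call.

```python
from collections import deque
import math


class RollingQuantile:
    """Quantile over the last `window` values, recomputed by sorting each candle."""

    def __init__(self, window: int, q: float):
        self.q = q
        self.buf = deque(maxlen=window)

    def update(self, x: float):
        self.buf.append(x)
        if len(self.buf) < self.buf.maxlen:
            return None  # warmup: exactly `window` bars
        s = sorted(self.buf)  # O(N log N), trivial for N = 20
        pos = self.q * (len(s) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(s) - 1)
        # Linear interpolation between the two neighbouring order statistics.
        return s[lo] + (pos - lo) * (s[hi] - s[lo])
```

A volatility_regime_high style flag would then compare the current price_volatility with the value returned by update() for q = 0.8.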


9. WARMUP_BARS aggregation

The Phase 0 deliverable feeds the Phase 2 runtime validation in bootstrap():

```python
# src/commun/pipeline/enrichment/engine.py
WARMUP_BARS_PER_FEATURE = {
    "RSI_14": 70,          # 5 × 14
    "ADX_14": 70,          # 5 × 14
    "SMA_50": 50,
    "MACD_24_64_18": 320,  # 5 × 64
    "atr_1h_zscore": 672,  # 168h @ 15m
    # ...
}

REQUIRED_BOOTSTRAP_BARS = max(WARMUP_BARS_PER_FEATURE.values())
# ≈ 672 bars at 15m timeframe = 7 days
```

Rough ballpark for the typical config (timeframe=15m, rsi_long=28, atr_long=30, macd_slow=64, zscore_1h_window=168h):

REQUIRED_BOOTSTRAP_BARS ≈ 672 (driven by atr_1h_zscore)

A typical FTF run already ingests 30k+ bars per crypto, so the warmup constraint is met 45× over. Paper/live deployments need to fetch ≥ 672 bars (≈ 7 days at 15m) at startup — matches committee decision §11.4.
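
The runtime validation in bootstrap() could look like the sketch below. Both check_bootstrap and InsufficientWarmupError are hypothetical names introduced for illustration; the source only states that bootstrap() validates against the aggregated warmup.

```python
class InsufficientWarmupError(ValueError):
    """Hypothetical error type for a bootstrap history that is too short."""


def check_bootstrap(n_bars: int, warmup_bars_per_feature: dict) -> int:
    """Return the required bar count, or raise if the history is too short."""
    required = max(warmup_bars_per_feature.values())
    if n_bars < required:
        # Name the feature that drives the requirement to ease debugging.
        limiting = max(warmup_bars_per_feature, key=warmup_bars_per_feature.get)
        raise InsufficientWarmupError(
            f"need {required} bars for {limiting}, got {n_bars}"
        )
    return required


warmup = {"RSI_14": 70, "SMA_50": 50, "MACD_24_64_18": 320, "atr_1h_zscore": 672}
required = check_bootstrap(30_000, warmup)
```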


10. Implementation tick-off grid

The Phase 2 implementation team ticks off each class block as it lands. Dependencies go top-down (recursive first, multi-tf last).

| Block | Owner | Target phase | Status |
|---|---|---|---|
| StatefulEnricher core + EMA stub template | dococeven | Phase 1 | done 2026-04-21 (63/63 tests green post-CR-rounds + post-rebase 2026-05-03; original 57 + 6 added during CR rounds 1-2) |
| Shadow-comparison harness wiring | dococeven | Phase 1 | done 2026-04-21 (facade flag dispatch + divergence budget) |
| Recursive/Wilder base templates (EMA, Wilder, MACD); extend from the Phase 1 EMA template | dococeven | 2.1 | unblocked, ready to start |
| pandas-ta indicator shims (RSI, ATR, ADX, STOCH, BBands) | dococeven | 2.2 | blocked on 2.1 |
| Ring-buffer base templates (SMA, rolling std/mean/max/min, rolling quantile) | dococeven | 2.3 | blocked on 2.1 |
| Gating v1/v2/v3/v4/v5 port (stateless combinations on top of above) | dococeven | 2.4 | blocked on 2.2/2.3 |
| Candlestick pattern shims (stateless) | dococeven | 2.4 | blocked on 2.1 |
| Temporal cyclic (hour_sin, …) | dococeven | 2.4 | can start after 2.1 |
| Multi-timeframe sub-engine + atr_1h_* family | dococeven | 2.5 | blocked on 2.2/2.3 (biggest block) |
| Fear & Greed external-state loader | dococeven | 2.6 | independent, can start after 2.1 |
| xgb_* features (all ~30) | dococeven | 2.7 | blocked on 2.1-2.5 |

11. Post-inventory confidence statement (for committee)

  • ~150 features audited, broken down across 5 compute classes.
  • 0 hard blockers identified.
  • Biggest single sub-task: the multi-timeframe sub-engine (§5) — budget 1 week inside the Phase 2 allocation.
  • Memory budget per (symbol, tf, feature_set) key: dominated by atr_1h_zscore ring buffer (672 × 8 B = 5.4 kB) + a handful of smaller rings. Total ≈ 10–15 kB of state floats per key, well within the 2.5 MB budget estimated in documentation/../design/CVN-N005-stateful-enrichment.md §8quater.
  • The 6–8 week estimate for 1 FTE is realistic given this inventory. Reducing to 4 weeks would require 2 FTE in parallel (one on core+simple, one on multi-tf).

This document is the authoritative spec of Phase 2 porting. Any indicator encountered in code that is not in this inventory must be added here and reclassified before being ported.