Enrichment Indicator Inventory: Phase 0 of #599 refactor
Status: v1 — Phase 0 deliverable, to be ticked off by the implementation team during Phase 2 porting.
Sources: src/ETL/cvntrade_enrich.py (1378 lines) + src/ETL/post_enrichment/cvntrade_xgboost_feature_generator.py (712 lines).
Audit date: 2026-04-20.
Feature count (stable): ≈ 150 columns (depends on timeframe/params).
1. How to read this inventory
Each feature is classified into one of six classes (five compute classes plus a blocker class, X), which determines the incremental-update strategy the refactor will use:
| Class | Label | Update rule | Warmup needed |
|---|---|---|---|
| R | Simple recursive | y_t = α·x_t + (1−α)·y_{t−1}; keep 1 float per indicator | small, converges geometrically |
| W | Wilder smoothing (Welles Wilder EMA variant) | same as R, α = 1/N | small, converges geometrically |
| B | Finite-window ring buffer | keep last N values, O(1) update with deque | exactly N bars |
| T | Multi-timeframe resample | state = current partial higher-tf bar + sub-indicator state (recursive structure) | N × (higher_tf / base_tf) + sub-indicator warmup |
| S | Stateless per-candle function | no state — function of current bar only (or ≤ few prior bars via shift) | 0 to handful of bars |
| X | Non-incremental (blocker) | requires full past or non-local operator | — |

Verdict column:

- ✅ incremental-compatible by closed form (R/W/T)
- 🟡 incremental-compatible via ring buffer (B): memory linear in N, still O(1) update
- 🟢 stateless, trivial (S)
- 🔴 blocker: inherently batch or tricky; flag for scope discussion

Feature names that end in _shifted or _lag_* are pure shifts of other
features; they're class S with the caveat that the base feature must be
stored in the state.
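
The R/W and B update rules from the class table can be sketched as two small state objects. This is a minimal illustration, not the refactor's actual template: the class names `RecursiveState` and `RollingMeanState` are hypothetical, and seeding the recursion with the first sample is an assumption that has to be aligned with whatever the batch implementation does.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class RecursiveState:
    """Class R/W: one float of state. Wilder smoothing is the case alpha = 1 / N."""
    alpha: float
    value: Optional[float] = None

    def update(self, x: float) -> float:
        # Assumption: seed with the first sample; the batch code may seed differently.
        if self.value is None:
            self.value = x
        else:
            self.value = self.alpha * x + (1.0 - self.alpha) * self.value
        return self.value


@dataclass
class RollingMeanState:
    """Class B: last N values in a deque plus a running sum, O(1) per candle."""
    window: int
    total: float = 0.0
    values: Deque[float] = field(default_factory=deque)

    def update(self, x: float) -> Optional[float]:
        self.values.append(x)
        self.total += x
        if len(self.values) > self.window:
            self.total -= self.values.popleft()  # drop the bar that left the window
        if len(self.values) < self.window:
            return None  # still warming up: exactly N bars are needed
        return self.total / self.window
```

Both objects return a value per candle and hold only what the class table lists as state: one float for R/W, N floats for B.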
2. Classification summary (counts)
| Class | Count (approx.) | Action |
|---|---|---|
| R/W Recursive / Wilder | 15 | Port first (simplest, closed-form, smallest risk) |
| B Ring buffer | 80+ | Port in bulk (same template pattern for rolling ops) |
| T Multi-timeframe | 6 | Requires dedicated sub-engine design |
| S Stateless | 40+ | Trivial port (shift / diff / pct_change / function of current bar) |
| X Blocker | 1 candidate | ⚠️ Flagged: Fear & Greed external merge, see §6 |
No other blockers identified as of Phase 0. The implementation team should
re-audit xgb_* at the start of Phase 2.
3. pandas-ta indicators (class R/W/B)
Generated by `ta.bbands` / `ta.rsi` / `ta.mfi` / `ta.atr` / `ta.macd` / `ta.stoch` / `ta.adx` / `ta.sma`, configured via `self.params`. Names include the period as a suffix. For a standard timeframe the columns are:

| Column pattern | Class | Formula | WARMUP_BARS | Verdict |
|---|---|---|---|---|
| `RSI_{period}` (×3: short/medium/long) | W | Wilder RSI: `gain_smooth = (1−1/n)·prev + (1/n)·gain` | ≈ 5×N (initial-state weight (1−1/N)^(5N) ≈ e^(−5) ≈ 7e-3); ≈ 7×N for 1e-3, ≈ 14×N for 1e-6 | ✅ |
| `MFI_{period}` (×3) | W | Money Flow Index: Wilder smoothing on money-flow ratio | ≈ 5×N | ✅ |
| `ATRr_{period}` (×3: short/medium/long) | W | Wilder ATR: `atr_t = (1−1/n)·atr_{t−1} + (1/n)·tr_t` | ≈ 5×N | ✅ |
| `BBL/BBM/BBU/BBB/BBP_{period}_{std}` (×3) | B | SMA ± k·std on window | N bars | 🟡 |
| `MACD_{f}_{s}_{sig}`, `MACDs_*`, `MACDh_*` (×3) | R | EMA(fast) − EMA(slow), signal = EMA(diff) | ≈ 5×max(s, sig) | ✅ |
| `STOCHk_{k}_{d}_{smooth}`, `STOCHd_*` (×3) | B | `100 × (close − low_N) / (high_N − low_N)`, smoothed | max(k, d) bars | 🟡 |
| `ADX_{period}`, `DMP_{period}`, `DMN_{period}` (×2) | W | Wilder smoothing on directional movement | ≈ 5×N | ✅ |
| `SMA_{period}` (×2: medium/long) | B | Simple rolling mean | N bars | 🟡 |
| `distance_SMA_{period}` (×2) | S | `(close − SMA_N) / SMA_N`, function of SMA | inherits SMA warmup | 🟢 |

Total: ~30 columns. All ✅, 🟡 or 🟢: textbook streaming indicators with no known porting risk.
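
The warmup figures for the recursive rows can be checked from the weight that the initial state still carries after k updates, which is (1 − α)^k. The helper below is a sketch for that arithmetic only; the function names are hypothetical, and α = 2/(N+1) for the plain EMA is an assumption (the usual span convention), whereas α = 1/N for Wilder smoothing comes from the class table.

```python
import math


def residual_weight(alpha: float, bars: int) -> float:
    """Weight still carried by the initial state after `bars` recursive updates."""
    return (1.0 - alpha) ** bars


def bars_for_tolerance(alpha: float, tol: float) -> int:
    """Smallest k such that (1 - alpha) ** k <= tol."""
    return math.ceil(math.log(tol) / math.log(1.0 - alpha))


n = 14
wilder_alpha = 1.0 / n        # class W, as in the class table
ema_alpha = 2.0 / (n + 1)     # class R, span convention (assumption)

# Residual weight after the 5 x N warmup, and bars needed for a 1e-3 tolerance.
print(residual_weight(wilder_alpha, 5 * n))
print(bars_for_tolerance(wilder_alpha, 1e-3))
print(bars_for_tolerance(ema_alpha, 1e-3))
```

Because the Wilder factor is roughly half the EMA factor for the same N, a Wilder-smoothed indicator needs about twice as many bars as an EMA to reach the same tolerance.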
4. Custom rolling / stateless (class B or S)
4.1 Volume-derived

| Column | Class | Formula | WARMUP | Verdict |
|---|---|---|---|---|
| `Volume_Delta` | S | `volume.diff()` | 1 | 🟢 |
| `SMA_Volume_{period}` (×3) | B | rolling mean of volume | N | 🟡 |
| `volume_ma_5/10/20` | B | rolling mean | 5/10/20 | 🟡 |
| `volume_ratio_5/10/20` | S | `volume / volume_ma_*` | inherits | 🟢 |
| `volume_price_momentum` | S | `volume_ratio_5 × price_change_abs_1h` | inherits | 🟢 |
| `volume_weighted_price` | B | rolling sum(close × volume) / rolling sum(volume) | N | 🟡 |
| `vwap_deviation` | S | `(close − vwap) / close` | inherits | 🟢 |
| `volume_trend_3` | B | rolling `1 if last > first else -1` | 3 | 🟡 |
| `volume_price_efficiency` | S | `volume / (high − low)` | 0 | 🟢 |
| `market_impact_proxy` | S | `abs(price_change_1h) / (volume_ratio_5 + 0.1)` | inherits | 🟢 |
| `volume_momentum` | S | `volume.pct_change(N)` | N | 🟢 |
| `volume_autocorr_3h` | B | rolling corr on `volume.shift(lag)` | win + lag | 🟡 |
| `institutional_buying_pressure` | B | rolling sum of `volume × (close−open)/open` if up | N | 🟡 |
| `volume_weighted_up_pressure` | B | rolling sum `volume × max(0, price_change_1h)` | N | 🟡 |
| `order_flow_imbalance` | B | `(vol_up_ma − vol_down_ma) / (vol_up_ma + vol_down_ma + 1)` | N | 🟡 |
| `volume_acceleration` | B | `mean(rolling ratio)` over window | N | 🟡 |
| `volume_regime_change` | S | `(volume_ratio_5 > 1.3 × volume_ratio_5.shift(1))` | 1 | 🟢 |

4.2 Price-volatility / range (rolling-std family)

| Column | Class | Formula | WARMUP | Verdict |
|---|---|---|---|---|
| `price_volatility_5/10/20` | B | `rolling_std / rolling_mean` (coefficient of variation) | N | 🟡 |
| `price_change_1h/3h/6h` | S | `close.pct_change(N)` | N | 🟢 |
| `price_change_abs_1h/3h` | S | `abs(price_change_*)` | inherits | 🟢 |
| `high_low_range_pct` | S | `(high − low) / close` | 0 | 🟢 |
| `high_low_range_ma_ratio_5/10` | B | `range / rolling_mean(range)` | 5/10 | 🟡 |
| `close_to_high_ratio`, `close_to_low_ratio` | S | ratios on current bar | 0 | 🟢 |
| `atr_normalized` | S | `ATRr / close` | inherits | 🟢 |
| `atr_expansion` | B | `ATRr / rolling_mean(ATRr, N)` | N | 🟡 |
| `ATRr_{period}_std_10`, `_std_30` | B | rolling std of ATR | 10/30 | 🟡 |
| `volatility_regime_high/low` | B | `price_volatility > rolling_quantile(0.8)` | 20 | 🟡 (quantile: check pandas impl is incremental-friendly; skiplist / tdigest optional) |
| `volatility_breakout` | B | `price_volatility > rolling_mean × 1.5` | 10 | 🟡 |
| `volatility_regime_change` | S | `price_vol > price_vol.shift(1) × 1.5` | 1 | 🟢 |
| `velocity_consistency` | B | rolling std of pct_change | 3 | 🟡 |
| `volatility_regime` | B | `(high − low) / rolling_mean(high−low, N)` | N | 🟡 |
| `market_trending_strength` | B | rolling std + diff composition | N | 🟡 |
| `intraday_return_volatility` | S | `(high − low) / open` | 0 | 🟢 |

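
For the rolling-std family, a ring buffer that is recomputed on every candle is often enough at these window sizes (5 to 30 bars). The sketch below shows the coefficient of variation behind `price_volatility_*` under two assumptions: the class name `RollingCV` is hypothetical, and the sample standard deviation (ddof = 1, the pandas `rolling().std()` default) is what the batch code uses.

```python
from collections import deque
from statistics import fmean, stdev
from typing import Deque, Optional


class RollingCV:
    """Coefficient of variation (std / mean) over the last `window` closes."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.buf: Deque[float] = deque(maxlen=window)  # oldest value drops automatically

    def update(self, close: float) -> Optional[float]:
        self.buf.append(close)
        if len(self.buf) < self.window:
            return None  # warmup: exactly N bars
        mean = fmean(self.buf)
        if mean == 0:
            return None
        # statistics.stdev is the sample standard deviation (ddof = 1).
        return stdev(self.buf) / mean
```

Recomputing from the buffer costs O(N) per candle instead of O(1), but it avoids the cancellation error that a running sum of squares can accumulate over long streams.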
4.3 Momentum / support-resistance

| Column | Class | Formula | WARMUP | Verdict |
|---|---|---|---|---|
| `price_momentum_3` | B | `rolling.apply(lambda)`: first-vs-last in window | 3 | 🟡 (rewrite as closed form `(close − close.shift(N−1)) / close.shift(N−1)`; a window of N bars spans N−1 shifts) |
| `price_acceleration` | S | `price_change_1h.diff(1)` | 2 | 🟢 |
| `recent_high_5`/`recent_low_5` | B | rolling max/min | 5 | 🟡 |
| `resistance_breach`, `support_breach` | S | compare close vs shifted recent extreme | 6 | 🟢 |
| `recent_high_N`, `recent_low_N` (gating v2/v3/v4) | B | rolling max/min | N | 🟡 |
| `dynamic_support_strength` | B | `rolling_min(low) / close` | N | 🟡 |
| `upward_breakout_signal` | B | compare high to shifted rolling-max(high) | N | 🟡 |
| `resistance_break_strength` | B | same pattern | N | 🟡 |
| `support_level_strength`, `resistance_proximity` | B | 2-window max/min ratios | max(N_short, N_long) | 🟡 |
| `momentum_divergence` | B | `price_change_1h − rolling_mean(price_change_3h)` | N | 🟡 |
| `acceleration_signal` | B | rolling mean of price_acceleration | N | 🟡 |
| `ma_cross_bullish` | B | compare two SMAs + their shifted versions | max(N) | 🟡 |
| `price_above_ma_strength` | S | `(close − SMA) / SMA` | inherits | 🟢 |
| `momentum_consistency`, `momentum_strength` | S | sum of sign(price_change_*) | inherits | 🟢 |
| `higher_highs_signal`, `higher_lows_signal` | S | compare `.shift(1)` vs `.shift(2)` | 2 | 🟢 |
| `bullish_structure_score` | S | sum of above | inherits | 🟢 |
| `price_position_short/medium/long` | S | `close / SMA_N − 1` | inherits | 🟢 |
| `trend_alignment_score` | S | sum of ordinal compares | inherits | 🟢 |

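
The first-vs-last pattern of `price_momentum_3` is a good candidate for a parity check between batch and streaming code. The sketch below assumes the lambda computes `(last − first) / first` over the window, which is not confirmed by the inventory; `momentum_batch` and `MomentumState` are hypothetical names. A window of N bars contains the current close and the close N−1 bars back, so the streaming side needs only an N-slot buffer.

```python
from collections import deque
from typing import Deque, List, Optional


def momentum_batch(closes: List[float], window: int) -> List[Optional[float]]:
    """Reference: first-vs-last return inside each full window."""
    out: List[Optional[float]] = []
    for t in range(len(closes)):
        if t + 1 < window:
            out.append(None)  # window not full yet
            continue
        win = closes[t - window + 1 : t + 1]
        out.append((win[-1] - win[0]) / win[0])
    return out


class MomentumState:
    """Streaming version: the oldest slot of the buffer is the window's first bar."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.buf: Deque[float] = deque(maxlen=window)

    def update(self, close: float) -> Optional[float]:
        self.buf.append(close)
        if len(self.buf) < self.window:
            return None
        first = self.buf[0]
        return (close - first) / first


closes = [100.0, 110.0, 121.0, 133.1, 120.0]
state = MomentumState(window=3)
streamed = [state.update(c) for c in closes]
assert streamed == momentum_batch(closes, window=3)
```

The same compare-against-batch pattern applies to the other `rolling.apply` features in this table.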
4.4 Market microstructure / pattern proxies

| Column | Class | Formula | WARMUP | Verdict |
|---|---|---|---|---|
| `bid_ask_pressure` | S | `sign(close − open)` | 0 | 🟢 |
| `price_tick_movement` | S | `sign(close.diff())` | 1 | 🟢 |
| `price_tick_persistence` | B | rolling sum of tick movement | N | 🟡 |
| `doji_pattern`, `hammer_pattern` | S | boolean on current OHLC | 0 | 🟢 |
| `buyer_seller_pressure` | S | `(close − open) / (high − low)` | 0 | 🟢 |
| `local_fractal_high`, `local_fractal_low` | S | 3-bar fractal with `.shift(1)` + `.shift(2)` | 2 | 🟢 |
| `price_entropy` | B | entropy of `close / sum(close)` over rolling window | N | 🟡 (rewrite as incremental Shannon if the window is large; today `vol_med = 10`, so ring buffer is cheap) |
| `price_autocorr_3h` | B | `rolling.corr(shift(lag))` | win + lag | 🟡 (incremental corr via Welford works; ring buffer simpler for small N) |
| `price_vs_vwap_strength` | B | inherits VWAP | N | 🟡 |
| `bullish_divergence_1h` | S | compare `price_change_1h > 0 & rsi < rsi.shift(1)` | 1 | 🟢 |
| `bullish_momentum_acceleration` | B | rolling mean of diff(price_change_1h) | N | 🟡 |
| `volatility_skew_up` | B | rolling mean of (high.pct_change − low.pct_change) | N | 🟡 |
| `price_efficiency_up` | S | `(close − open) / (high − low)` if up else 0 | 0 | 🟢 |
| `price_gap` | S | `(open − close.shift(1)) / close.shift(1)` | 1 | 🟢 |
| `significant_gap` | S | boolean of gap > 0.005 | 1 | 🟢 |
| `up_down_battle_score` | B | ratio of rolling sums | N | 🟡 |
| `candle_sentiment_momentum` | B | rolling mean of weighted candle pattern sum | N | 🟡 |
| `rsi_timeframe_divergence` | S | sum of indicator comparisons | inherits RSIs | 🟢 |
| `momentum_confluence_strength` | S | mean of `abs(rsi − 50)` | inherits RSIs | 🟢 |
| `rsi_momentum` | S | `RSI.diff(1)` | 1 | 🟢 |
| `rsi_action_zone`, `rsi_neutral_zone` | S | boolean on RSI | 0 | 🟢 |
| `adx_momentum` | S | `ADX.diff(1)` | 1 | 🟢 |
| `adx_action_zone` | S | `ADX > 25` | 0 | 🟢 |

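
The `win + lag` warmup of the autocorrelation rows follows directly from the ring-buffer layout: the buffer has to hold the current window and the same window shifted back by `lag` bars. A minimal sketch, with hypothetical names (`pearson`, `RollingAutocorr`) and a plain recompute per candle rather than a Welford-style update:

```python
import math
from collections import deque
from typing import Deque, Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson correlation; None when either side has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


class RollingAutocorr:
    """Correlation of the last `window` values with the same values `lag` bars earlier."""

    def __init__(self, window: int, lag: int) -> None:
        self.window, self.lag = window, lag
        self.buf: Deque[float] = deque(maxlen=window + lag)

    def update(self, x: float) -> Optional[float]:
        self.buf.append(x)
        if len(self.buf) < self.window + self.lag:
            return None  # warmup: win + lag bars
        vals = list(self.buf)
        # vals[lag:] is the current window, vals[:-lag] is the shifted copy.
        return pearson(vals[self.lag:], vals[:-self.lag])
```

With a window around a dozen bars the recompute is a few dozen multiplications per candle, which is why the ring buffer is the simpler choice here.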
4.5 Composite gating scores (v1/v2/v3/v4/v5)

| Column | Class | Notes |
|---|---|---|
| `gating_opportunity_score` (v1) | S | sum of 3 indicator booleans |
| `gating_opportunity_score_v2` | S | weighted sum of clipped features |
| `gating_opportunity_score_v3` | S | same pattern, more terms |
| `gating_opportunity_score_v4_up` | S | same |
| `direction_prediction_score_v5` | S | same |

All composite scores are stateless combinations of already-computed features. Port cost: zero as long as the inputs are ported.
4.6 Candlestick patterns (class S, 60+ columns)
`ta.cdl_pattern()` outputs ~60 `CDL_*` columns (e.g. `CDL_DOJI`, `CDL_HAMMER`, …), each of which depends on at most the last 3-5 bars (stateless pattern matchers).

| Column pattern | Class | Verdict |
|---|---|---|
| `CDL_*` (all candlestick patterns) | S | 🟢 |
| `candle_sentiment_score` | S | 🟢 |
| `bullish_pressure_score`, `bearish_pressure_score`, `indecision_score` | S | 🟢 |

Zero risk.
4.7 Temporal cyclic (class S)

| Column | Class | Verdict |
|---|---|---|
| `hour_sin`, `hour_cos` | S | 🟢 |
| `day_sin`, `day_cos` | S | 🟢 |

Function of timestamp only. Trivial.
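
As an illustration of how little these columns need, the sketch below computes all four from one timestamp. The exact encoding is not given in this inventory, so the periods used here (24 hours for `hour_*`, 7 weekdays for `day_*`, with Monday = 0) are assumptions, as is the function name `temporal_cyclic`.

```python
import math
from datetime import datetime, timezone
from typing import Dict


def temporal_cyclic(ts: datetime) -> Dict[str, float]:
    """Cyclic encoding of a candle timestamp; no state, no warmup."""
    hour_angle = 2.0 * math.pi * ts.hour / 24.0      # assumption: 24-hour cycle
    day_angle = 2.0 * math.pi * ts.weekday() / 7.0   # assumption: weekly cycle, Monday = 0
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "day_sin": math.sin(day_angle),
        "day_cos": math.cos(day_angle),
    }


features = temporal_cyclic(datetime(2026, 4, 20, 6, 0, tzinfo=timezone.utc))
```

The only porting detail worth checking is the timezone of the timestamp, since a shifted hour changes every value.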
4.8 Lagged features (class S)

| Column | Class | Verdict |
|---|---|---|
| `RSI_{N}_lag_1/3/5`, `ADX_{N}_lag_1/3/5` | S | 🟢 (state just needs to hold the last 5 values of the base column) |

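
A possible shape for that state, assuming lags of 1, 3 and 5 bars: the buffer holds the five previous values plus the current one, so it needs `max(lags) + 1` slots. `LagState` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Dict, Optional, Sequence


class LagState:
    """Pure shifts of a base column, served from a small ring buffer."""

    def __init__(self, lags: Sequence[int] = (1, 3, 5)) -> None:
        self.lags = tuple(lags)
        # Current value + max(lags) previous values.
        self.buf: Deque[float] = deque(maxlen=max(self.lags) + 1)

    def update(self, value: float) -> Dict[str, Optional[float]]:
        self.buf.append(value)
        out: Dict[str, Optional[float]] = {}
        for lag in self.lags:
            # buf[-1] is the current bar, buf[-1 - lag] is the bar `lag` steps back.
            out[f"lag_{lag}"] = self.buf[-1 - lag] if len(self.buf) > lag else None
        return out
```

Each lag becomes available after `lag` bars of the base column, on top of the base column's own warmup.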
5. Multi-timeframe resamples (class T, needs dedicated design)

| Column | Class | Why it's T | WARMUP | Verdict |
|---|---|---|---|---|
| `atr_1h_price` | T | ATR on 1h-resampled OHLCV, computed from 15m base | ≈ 4 × ATR_window in 15m bars | ✅ via sub-engine |
| `atr_1h_pct` | T | `atr_1h_price / close_1h` | same | ✅ |
| `atr_1h_zscore` | T | z-score of `atr_1h_pct` over 168h (1 week) = 672 × 15m bars; largest window in the inventory | 672 bars | 🟡 (ring buffer of 672 floats per crypto = 5.4 kB, fine) |
| `realized_vol_1h` | T | rolling std of `close.pct_change()` over 1h = 4 × 15m bars | 4 | ✅ |
| `range_ratio_1h` | T | `(high − low) / atr_1h_price` (per-candle, inherits atr_1h) | inherits | ✅ |
| `trend_strength_6h` | T | `abs(close.pct_change(24)) / atr_1h_pct`; 6h = 24 × 15m bars | 24 + atr_1h warmup | ✅ |

Design note: the multi-timeframe sub-engine holds:

- a partial higher-tf bar (open, high, low, close, volume accumulator), updated on each base-tf candle
- a sub-indicator state (Wilder ATR on the 1h axis)
- a ring buffer of the last M higher-tf bars (for z-score, rolling stats)

When the base-tf timestamp crosses a higher-tf boundary, the partial bar becomes final, the sub-indicator updates, and a new partial bar starts. Aligned-close semantics (committee decision §11.3).
This is the highest-complexity sub-task of Phase 2. Budget 1 week of the implementation timeline for this engine alone (the design doc's 6-8 week estimate assumed this).
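
The three pieces of state in the design note can be sketched in one small class. This is an illustration under stated assumptions, not the planned engine: `Bar` and `HigherTfAtr` are hypothetical names, timestamps are bar-open epoch seconds, the ATR is seeded with the first true range, and the partial bar is finalized when the first candle of the next higher-tf bucket arrives. The exact aligned-close semantics of §11.3 are not modelled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class Bar:
    ts: int  # bar open time, epoch seconds (UTC)
    open: float
    high: float
    low: float
    close: float
    volume: float


class HigherTfAtr:
    """Base-tf candles in, Wilder ATR on the higher-tf axis out."""

    def __init__(self, higher_tf_s: int = 3600, atr_window: int = 14, history: int = 168) -> None:
        self.higher_tf_s = higher_tf_s
        self.alpha = 1.0 / atr_window                     # Wilder smoothing factor
        self.partial: Optional[Bar] = None                # higher-tf bar still being built
        self.prev_close: Optional[float] = None           # close of the last final bar
        self.atr: Optional[float] = None                  # sub-indicator state
        self.closed: Deque[Bar] = deque(maxlen=history)   # last M final higher-tf bars

    def _finalize(self, bar: Bar) -> None:
        ranges = [bar.high - bar.low]
        if self.prev_close is not None:
            ranges += [abs(bar.high - self.prev_close), abs(bar.low - self.prev_close)]
        true_range = max(ranges)
        if self.atr is None:
            self.atr = true_range  # assumption: seed with the first true range
        else:
            self.atr = (1.0 - self.alpha) * self.atr + self.alpha * true_range
        self.prev_close = bar.close
        self.closed.append(bar)

    def update(self, candle: Bar) -> Optional[float]:
        bucket = candle.ts - candle.ts % self.higher_tf_s
        if self.partial is not None and bucket != self.partial.ts:
            self._finalize(self.partial)  # boundary crossed: partial bar becomes final
            self.partial = None
        if self.partial is None:
            self.partial = Bar(bucket, candle.open, candle.high, candle.low,
                               candle.close, candle.volume)
        else:
            p = self.partial
            p.high = max(p.high, candle.high)
            p.low = min(p.low, candle.low)
            p.close = candle.close
            p.volume += candle.volume
        return self.atr  # None until the first higher-tf bar has closed
```

Per-candle features such as `range_ratio_1h` would read `atr` after each `update`, while z-score style features would read from the `closed` ring.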
6. Fear & Greed integration (class X, blocker candidate)

```python
df_fng = _get_fear_and_greed_cached(self.logger)  # external API, process cache
df_calc["_fng_date"] = pd.to_datetime(...).dt.normalize()
df_calc["fear_and_greed"] = df_calc["_fng_date"].map(df_fng_keyed).fillna(50.0)
```

| Column | Class | Issue |
|---|---|---|
| `fear_and_greed` | X (partial) | Values change on a daily calendar key, not per candle. Current impl reads the entire F&G series and does a date-based map. Incremental port needs: (a) a daily cache of {date → F&G value} warmed at bootstrap, (b) per-candle lookup by `candle.timestamp.date()`. |

Verdict: ✅ portable, no blocker — incremental port is straightforward: load F&G series at bootstrap, look up by date at each candle, same API as today. Reclassified from X to S-with-external-state. Keep the flag on this row as a reminder: external-data features need their state loaded at bootstrap, not recomputed from the candle stream.
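
The external-state pattern can be sketched as a dictionary loaded once and queried per candle. `FearGreedState` and `NEUTRAL_FNG` are hypothetical names; the 50.0 fallback mirrors the `fillna(50.0)` of the batch code, and keying by the UTC calendar date is an assumption about how the daily series is indexed.

```python
from datetime import date, datetime, timezone
from typing import Dict

NEUTRAL_FNG = 50.0  # same fallback as fillna(50.0) in the batch code


class FearGreedState:
    """Daily {date -> F&G value} cache, loaded at bootstrap and read per candle."""

    def __init__(self, daily_values: Dict[date, float]) -> None:
        self._by_date = dict(daily_values)

    def lookup(self, candle_ts: datetime) -> float:
        # Assumption: candle timestamps are UTC and the series is keyed by UTC date.
        return self._by_date.get(candle_ts.date(), NEUTRAL_FNG)


state = FearGreedState({date(2026, 4, 19): 31.0})
known = state.lookup(datetime(2026, 4, 19, 23, 45, tzinfo=timezone.utc))
missing = state.lookup(datetime(2026, 4, 20, 0, 0, tzinfo=timezone.utc))
```

A long-running live process would also need to refresh the cache when a new day starts, which the batch code gets for free by re-reading the series.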
7. XGBoost post-enrichment features (class R/S/B, composites)
All xgb_* features are post-hoc combinations of already-enriched columns. They compute on top of the enrichment output, using the same pandas ops (rolling / diff / pct_change / ratios). None introduce new memory primitives.
| Column | Input dependency | Class | Verdict |
|---|---|---|---|
| `xgb_rsi_squared_grp2` | RSI_medium | S | 🟢 |
| `xgb_rsi_log_transform_grp2` | RSI_medium | S | 🟢 |
| `xgb_rsi_exp_decay_grp2` | RSI_medium | S | 🟢 |
| `xgb_price_sma_ratio_log_grp1` | close, SMA_medium | S | 🟢 |
| `xgb_price_volatility_grp1` | close, rolling_std(close, 50) | B (N=50) | 🟡 |
| `xgb_network_activity_proxy_grp2` | volume, close | S | 🟢 |
| `xgb_mining_pressure_grp2` | volume, close, rolling means | B (N=20) | 🟡 (known high NaN rate, logged at runtime as `>50% NaN (historique court?)`, i.e. "short history?"; confirm porting matches) |
| `xgb_sentiment_composite_grp2` | RSI, pct_change | S | 🟢 |
| `xgb_crypto_regime_indicator_grp3` | SMA_medium/long ratio | S | 🟢 |
| `xgb_momentum_volume_composite_*_grp2` | momentum × sign(volume) | S | 🟢 |
| `xgb_ratio_*_grp2` (dynamic tech ratios) | RSI, MACD, etc. | S | 🟢 |
| `xgb_ma_alignment_short_medium_grp3` | SMA(5), SMA(20) | B | 🟡 |
| `xgb_ma_alignment_medium_long_grp3` | SMA(20), SMA(50) | B | 🟡 |
| `xgb_ma_convergence_strength_grp3` | SMA(5), SMA(50) | B | 🟡 |
| `xgb_vpt_momentum_grp3` | cumulative volume × pct_change, `diff(5)` | R (cumsum has state) | ✅ (state = running VPT, diff over last 6 values) |
| `xgb_advanced_ad_indicator_grp2` | close, high, low, volume, rolling(10) | B | 🟡 |
| `xgb_hurst_proxy_5d_grp3` | rolling std/mean of `pct_change(5)` over 50 bars | B | 🟡 |
| `xgb_volume_entropy_proxy_grp2` | volume / rolling sum(volume, 50), log | B | 🟡 |
| `xgb_accel_{N}_close_mean_grp2` (N=3/6/12) | `close.pct_change().diff()` rolling mean | B | 🟡 |
| `xgb_accel_{N}_volume_mean_grp2` | same for volume | B | 🟡 |
| `xgb_accel_{N}_stability_grp2` | rolling_std² of accel | B | 🟡 |
| `xgb_accel_{N}_cross_mom_grp2` | price_mom × vol_mom (both rolling) | B | 🟡 |
| `xgb_accel_{N}_divergence_grp2` | price_mom − vol_mom | B | 🟡 |
| `xgb_accel_{N}_curvature_norm_grp2` | rolling_mean(accel) / rolling_std(close) | B | 🟡 |
| `xgb_accel_regime_consistency_grp2` | compare sign(short vs long accel) | S | 🟢 |
| `xgb_accel_strength_ratio_grp2` | `short_accel / abs(long_accel)` | | |
| `xgb_accel_breakthrough_{N}_grp2` (N=6/12) | adaptive threshold via 48-bar rolling stats | B | 🟡 |
| `xgb_accel_persist_{N}_grp2` | same + 3-bar momentum | B | 🟡 |
| `xgb_accel_{N}_range_grp2` (N=3/6) | rolling mean of `range_pct.diff()` | B | 🟡 |
| `xgb_accel_asymmetry_{N}_grp2` (N=12/24) | ratio of rolling counts | B | 🟡 |
| `xgb_accel_amplitude_ratio_{N}_grp2` | ratio of conditional rolling means | B | 🟡 |

xgb_* audit result: ~30 xgb_* features, zero blockers. All but `xgb_vpt_momentum_grp3` fit cleanly in class B or S; it is the only one with cumulative state (class R), which is straightforward: running sum + diff over last 6 values.
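
The cumulative state of `xgb_vpt_momentum_grp3` can be sketched as a running sum plus a six-slot ring. The update rule used here, VPT_t = VPT_{t−1} + volume_t × pct_change_t, is an assumption read off the "cumulative volume × pct_change" description, and `VptMomentumState` is a hypothetical name. The first bar contributes zero in this sketch, whereas a pandas `pct_change()` would yield NaN there, so the first valid batch value may appear one bar later.

```python
from collections import deque
from typing import Deque, Optional


class VptMomentumState:
    """Running volume-price trend and its difference over `diff_lag` bars."""

    def __init__(self, diff_lag: int = 5) -> None:
        self.diff_lag = diff_lag
        self.vpt = 0.0
        self.prev_close: Optional[float] = None
        # diff(5) compares the current value with the one 5 bars back: 6 slots.
        self.history: Deque[float] = deque(maxlen=diff_lag + 1)

    def update(self, close: float, volume: float) -> Optional[float]:
        if self.prev_close is not None and self.prev_close != 0:
            self.vpt += volume * (close - self.prev_close) / self.prev_close
        self.prev_close = close
        self.history.append(self.vpt)
        if len(self.history) <= self.diff_lag:
            return None  # fewer than diff_lag + 1 values seen
        return self.history[-1] - self.history[0]
```

Because the running sum never resets, the streaming value matches the batch value only if both start from the same first bar, which is worth stating in the shadow-comparison setup.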
8. Blockers identified
None at Phase 0 closure. The one candidate (fear_and_greed) was reclassified to S-with-external-state (§6) once the external-data state pattern was clarified.
However, three items warrant flagging for Phase 2 architectural attention:

- `atr_1h_zscore`: largest window in the inventory (672-bar ring buffer). Not a blocker, just the sizing benchmark for state memory.
- Rolling quantiles (`volatility_regime_high/low`, uses `rolling.quantile(0.8)` / `quantile(0.2)` over 20 bars). pandas' `rolling.quantile` is a batch operation with no incremental-update API; a strictly incremental version needs a skiplist or tdigest. Given N=20, a simple ring-buffer recompute per candle costs O(20 log 20), which is trivial. Flag as "acceptable for now, revisit if profiling shows it matters".
- `price_autocorr_3h`: rolling Pearson correlation with a shifted copy. Welford's online algorithm extends to covariance; for a small window (likely 12 bars) a ring buffer is simpler.

None are scope-blockers. They are implementation-detail decisions for the relevant indicator class templates.
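
The ring-buffer recompute for the rolling quantile can be sketched as follows. The names `linear_quantile` and `RollingQuantileFlag` are hypothetical; linear interpolation is used because it is the pandas default for `rolling.quantile`, and the current value is assumed to be part of its own window, as in a pandas rolling window.

```python
import math
from collections import deque
from typing import Deque, Optional, Sequence


def linear_quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between the two nearest order statistics."""
    ordered = sorted(values)                 # O(N log N), N = 20 in this inventory
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


class RollingQuantileFlag:
    """True when the current value exceeds the q-quantile of the last `window` values."""

    def __init__(self, window: int = 20, q: float = 0.8) -> None:
        self.window, self.q = window, q
        self.buf: Deque[float] = deque(maxlen=window)

    def update(self, x: float) -> Optional[bool]:
        self.buf.append(x)
        if len(self.buf) < self.window:
            return None  # warmup: exactly N bars
        return x > linear_quantile(self.buf, self.q)
```

If profiling later shows the sort to matter, the same interface can be kept while the buffer is replaced by a sorted structure.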
9. WARMUP_BARS aggregation
The Phase 0 deliverable feeds the Phase 2 runtime validation in bootstrap():

```python
# src/commun/pipeline/enrichment/engine.py
WARMUP_BARS_PER_FEATURE = {
    "RSI_14": 70,           # 5 × 14
    "ADX_14": 70,           # 5 × 14
    "SMA_50": 50,           # exactly N bars (ring buffer)
    "MACD_24_64_18": 320,   # 5 × 64
    "atr_1h_zscore": 672,   # 168h @ 15m
    # ...
}
REQUIRED_BOOTSTRAP_BARS = max(WARMUP_BARS_PER_FEATURE.values())
# ≈ 672 bars at 15m timeframe = 7 days
```

Rough ballpark for the typical config (timeframe=15m, rsi_long=28, atr_long=30, macd_slow=64, zscore_1h_window=168h): the maximum is 672 bars, set by `atr_1h_zscore`.
A typical FTF run already ingests 30k+ bars per crypto, so the warmup constraint is met 45× over. Paper/live deployments need to fetch ≥ 672 bars (≈ 7 days at 15m) at startup — matches committee decision §11.4.
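
The runtime validation itself could look like the sketch below. `check_bootstrap` is a hypothetical helper, not the actual `bootstrap()` code, and the dictionary repeats only the sample entries listed above.

```python
from typing import Dict

WARMUP_BARS_PER_FEATURE: Dict[str, int] = {
    "RSI_14": 70,
    "ADX_14": 70,
    "SMA_50": 50,
    "MACD_24_64_18": 320,
    "atr_1h_zscore": 672,
}


def check_bootstrap(n_bars: int, warmups: Dict[str, int]) -> int:
    """Return the required bar count, or raise if the bootstrap history is too short."""
    limiting = max(warmups, key=warmups.__getitem__)  # feature with the longest warmup
    required = warmups[limiting]
    if n_bars < required:
        raise ValueError(
            f"bootstrap needs >= {required} bars (set by {limiting}), got {n_bars}"
        )
    return required


required = check_bootstrap(30_000, WARMUP_BARS_PER_FEATURE)
```

Naming the limiting feature in the error message makes it clear which window to shrink or which history to extend when a paper/live deployment starts with too little data.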
10. Implementation tick-off grid
The Phase 2 implementation team ticks off each class block as it lands. Dependencies go top-down (recursive first, multi-tf last).
| Block | Owner | Target phase | Status |
|---|---|---|---|
| StatefulEnricher core + EMA stub template | dococeven | Phase 1 | ✅ done 2026-04-21 (63/63 tests green post-CR-rounds + post-rebase 2026-05-03; original 57 + 6 added during CR rounds 1-2) |
| Shadow-comparison harness wiring | dococeven | Phase 1 | ✅ done 2026-04-21 (facade flag dispatch + divergence budget) |
| Recursive/Wilder base templates (EMA, Wilder, MACD) — extend from the Phase 1 EMA template | dococeven | 2.1 | unblocked, ready to start |
| pandas-ta indicator shims (RSI, ATR, ADX, STOCH, BBands) | dococeven | 2.2 | blocked on 2.1 |
| Ring-buffer base templates (SMA, rolling std/mean/max/min, rolling quantile) | dococeven | 2.3 | blocked on 2.1 |
| Gating v1/v2/v3/v4/v5 port (stateless combinations on top of above) | dococeven | 2.4 | blocked on 2.2/2.3 |
| Candlestick pattern shims (stateless) | dococeven | 2.4 | blocked on 2.1 |
| Temporal cyclic (hour_sin, …) | dococeven | 2.4 | can start after 2.1 |
| Multi-timeframe sub-engine + atr_1h_* family | dococeven | 2.5 | blocked on 2.2/2.3 (biggest block) |
| Fear & Greed external-state loader | dococeven | 2.6 | independent, can start after 2.1 |
| xgb_* features (all ~30) | dococeven | 2.7 | blocked on 2.1-2.5 |
11. Post-inventory confidence statement (for committee)
- ~150 features audited, broken down across 5 compute classes.
- 0 hard blockers identified.
- Biggest single sub-task: the multi-timeframe sub-engine (§5) — budget 1 week inside the Phase 2 allocation.
- Memory budget per `(symbol, tf, feature_set)` key: dominated by the `atr_1h_zscore` ring buffer (672 × 8 B = 5.4 kB) + a handful of smaller rings. Total ≈ 10–15 kB of state floats per key, well within the 2.5 MB budget estimated in `documentation/../design/CVN-N005-stateful-enrichment.md` §8quater.
- The 6–8 week estimate for 1 FTE is realistic given this inventory. Reducing to 4 weeks would require 2 FTE in parallel (one on core+simple, one on multi-tf).

This document is the authoritative spec of Phase 2 porting. Any indicator encountered in code that is not in this inventory must be added here and reclassified before being ported.