CVNTrade Cache Architecture - Robust Design

Version: 2.0.0 (Sprint 9 + Feast Integration) Date: January 2026 Status: Production Ready

Overview

Hybrid cache architecture built on 6 business entities, with dependency management, automatic versioning, and smart retrieval. Nothing depends on run IDs: every lookup goes through semantic business keys.

Hybrid MLflow + Feast Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                         CHAMPOLLION CACHE LAYER                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                    LEVEL 1: FEATURE STORE                        │   │
│   │  ┌─────────────────┐              ┌─────────────────────────┐   │   │
│   │  │     Feast       │◄─── sync ───►│       MLflow            │   │   │
│   │  │  (Online/Redis) │              │    (Metadata)           │   │   │
│   │  │  (Offline/Dask) │              │                         │   │   │
│   │  └─────────────────┘              └─────────────────────────┘   │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                                    │                                     │
│                                    ▼                                     │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │              LEVELS 2-6: MLflow only                             │   │
│   │                                                                  │   │
│   │  L2: Labels ──► L3: FE ──► L4: Selection ──► L5: HPO ──► L6: Model │
│   │                                                                  │   │
│   │  Storage: PostgreSQL (Registry) + S3 (Artifacts)                │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

System Roles

System	Role	Data
Feast	Real-time feature serving	OHLCV features + indicators (76 features)
MLflow	Experiment tracking + cache	Labels, FE, FS, HPO, Models
Redis	Feast online store	Materialized features for inference
PostgreSQL	Shared registry	Feast + MLflow metadata
S3	Artifact storage	Models, datasets, HPO studies

Problems Solved (Sprint 9)

Problem	Solution
Dependence on exact hashes	Semantic business keys
Scattered cache management	Unified CVNTradeCache interface
No fallback hierarchy	4 levels (Exact → Context → Best → Default)
Coupling to MLflow run IDs	Smart SQLite index
No real-time serving	Feast integration (Redis)
Maintenance that does not scale	Automatic cache-or-generate pattern

Feast Integration (New in Sprint 9)

Data Flow with Feast

ETL Pipeline                     Feast                          Inference
     │                             │                                │
     ▼                             ▼                                ▼
┌──────────────┐            ┌─────────────┐                 ┌─────────────┐
│ OHLCV Data   │───export──►│  Parquet    │                 │   Model     │
│ + Indicators │            │  Files      │                 │             │
└──────────────┘            └──────┬──────┘                 └──────┬──────┘
     │                             │                                │
     │                             ▼                                │
     │                      ┌─────────────┐                         │
     │                      │ PostgreSQL  │◄── registry              │
     │                      │ (Registry)  │                         │
     │                      └─────────────┘                         │
     │                             │                                │
     │                   materialize│                                │
     │                             ▼                                │
     │                      ┌─────────────┐     get_online          │
     └─────────────────────►│   Redis     │◄────features────────────┘
                            │(Online)     │
                            └─────────────┘

Feast Feature Views (76 features)

Feature View	Features	TTL	Usage
`raw_ohlcv`	5	7d	Raw OHLCV
`momentum_indicators`	11	7d	RSI, MACD, Stochastic, MFI
`volatility_indicators`	11	7d	Bollinger, ATR, price_volatility
`trend_regime`	9	7d	ADX, SMA, market_regime
`volume_indicators`	7	7d	Volume ratios, delta
`gating_features`	14	7d	Gate model features
`direction_features`	10	7d	Direction model features
`temporal_features`	4	7d	Cyclical (hour/day sin/cos)
`external_features`	1	1d	Fear & Greed Index
`candlestick_features`	4	7d	Candle patterns

Cache ↔ Feast Synchronization

# Recommended synchronization pattern
# Assumes `pd` (pandas) and `EntityType` are imported, and that `etl` and
# `exporter` are the ETL pipeline and Feast exporter instances.
class CVNTradeCache:
    def get_feature_store(self, crypto: str, interval: str) -> pd.DataFrame:
        # Lookup criteria (field names assumed from the standard tags)
        criteria = {"crypto_symbol": crypto, "timeframe": interval}

        # 1. Check the MLflow cache (metadata)
        result = self.manager.get_cached_entity(EntityType.FEATURE_STORE, criteria)

        if result.found:
            # 2. Load from Feast (offline store) for training
            return self._load_from_feast_offline(crypto, interval)

        # 3. Cache MISS → ETL pipeline + Feast export
        enriched_df = etl.run_feature_enrichment(coin=crypto)

        # 4. Export to Feast (Parquet + Redis)
        exporter.export_enriched_data(enriched_df, crypto)
        exporter.materialize_to_online_store()

        return enriched_df
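
The cache-or-generate pattern used above can be isolated from MLflow and Feast. The sketch below is only an illustration: `CacheOrGenerate` and its in-memory dictionary are hypothetical stand-ins for the real backends, and the `force` argument mirrors the idea of the `FORCE_*` control variables rather than their actual implementation.

```python
from typing import Callable, Dict, Generic, TypeVar

V = TypeVar("V")


class CacheOrGenerate(Generic[V]):
    """In-memory stand-in for the MLflow/Feast-backed cache (illustrative only)."""

    def __init__(self, generate: Callable[[str], V]) -> None:
        self._generate = generate
        self._store: Dict[str, V] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str, force: bool = False) -> V:
        # Cache HIT: return the stored value unless a refresh is forced.
        if not force and key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache MISS (or forced refresh): generate, store, then return.
        self.misses += 1
        value = self._generate(key)
        self._store[key] = value
        return value


if __name__ == "__main__":
    cache = CacheOrGenerate(lambda key: f"features for {key}")
    cache.get("BTCUSDT_15m_20251002")  # miss: runs the generator
    cache.get("BTCUSDT_15m_20251002")  # hit: served from the store
    print(cache.hits, cache.misses)
```

Keeping the generator as a plain callable means the same wrapper could sit in front of the ETL pipeline, the labeling step, or HPO without knowing anything about them.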

Key Integration Points

Cache Component	Feast Integration
`CVNTradeCache.get_feature_store()`	Automatic export to Feast after ETL
`CVNTrade_CacheManager.store_entity()`	MLflow + Feast metadata sync
`CVNTrade_FeastExporter`	ETL → Feast Parquet bridge
`CVNTrade_FeastInference`	Redis reads for inference

Proposed Architecture

1. Business Entities and Semantic Keys

1.1 Feature Stores

Key: {crypto_symbol}_{timeframe}_{date}
Example: BTCUSDT_15m_20251002
Tags: crypto_symbol, timeframe, data_date, data_version
Artifacts: raw_data.parquet

1.2 Labels

Key: {crypto_symbol}_{timeframe}_{date}_{strategy}
Example: BTCUSDT_15m_20251002_SL1.2_TP1.3_H4
Tags: crypto_symbol, timeframe, data_date, strategy_config, strategy_version
Depends on: Feature Store
Artifacts: labels.parquet

1.3 Feature Engineering

Key: {crypto_symbol}_{timeframe}_{date}_{strategy}_{fe_version}
Example: BTCUSDT_15m_20251002_SL1.2_TP1.3_H4_FE_v2
Tags: crypto_symbol, timeframe, data_date, strategy_config, fe_version, fe_config
Depends on: Feature Store + Labels
Artifacts: features.pkl, feature_stats.json

1.4 Feature Selection

Key: {crypto_symbol}_{timeframe}_{date}_{strategy}_{fe_version}_{model_type}_{fs_version}
Example: BTCUSDT_15m_20251002_SL1.2_TP1.3_H4_FE_v2_xgboost_FS_v1
Tags: crypto_symbol, timeframe, data_date, strategy_config, fe_version, model_type, fs_version
Depends on: Feature Engineering
Artifacts: selected_features.json, feature_importance.json

1.5 HPO Hyperparameters

Key: {crypto_symbol}_{timeframe}_{strategy}_{model_type}_{hpo_version}
Example: BTCUSDT_15m_SL1.2_TP1.3_H4_xgboost_HPO_v3
Tags: crypto_symbol, timeframe, strategy_config, model_type, hpo_version, best_score
Depends on: Feature Selection (compatible)
Artifacts: best_params.json, study_data.pkl

1.6 Trained Models

Key: {crypto_symbol}_{timeframe}_{date}_{strategy}_{model_type}_{model_version}
Example: BTCUSDT_15m_20251002_SL1.2_TP1.3_H4_xgboost_v1
Tags: crypto_symbol, timeframe, data_date, strategy_config, model_type, model_version, performance_metrics
Depends on: all of the above
Artifacts: model.pkl, metrics.json, feature_names.json
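
The six key templates above can be generated from a single table of field orders. The following is a minimal sketch under stated assumptions: `KeyParts`, `KEY_TEMPLATES`, and `build_key` are hypothetical names, and the single `version` field stands in for `fs_version`, `hpo_version`, or `model_version` depending on the entity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyParts:
    """Fields that may appear in a semantic cache key (hypothetical helper)."""
    crypto_symbol: str
    timeframe: str
    date: Optional[str] = None        # e.g. "20251002"
    strategy: Optional[str] = None    # e.g. "SL1.2_TP1.3_H4"
    fe_version: Optional[str] = None  # e.g. "FE_v2"
    model_type: Optional[str] = None  # e.g. "xgboost"
    version: Optional[str] = None     # fs/hpo/model version, e.g. "FS_v1"


# Field order per entity, copied from the key templates in sections 1.1 to 1.6.
KEY_TEMPLATES = {
    "feature_store": ("crypto_symbol", "timeframe", "date"),
    "labels": ("crypto_symbol", "timeframe", "date", "strategy"),
    "feature_engineering": ("crypto_symbol", "timeframe", "date", "strategy", "fe_version"),
    "feature_selection": ("crypto_symbol", "timeframe", "date", "strategy",
                          "fe_version", "model_type", "version"),
    "hpo_params": ("crypto_symbol", "timeframe", "strategy", "model_type", "version"),
    "trained_model": ("crypto_symbol", "timeframe", "date", "strategy",
                      "model_type", "version"),
}


def build_key(entity: str, parts: KeyParts) -> str:
    """Join the fields required by `entity`, failing loudly if one is missing."""
    values = []
    for name in KEY_TEMPLATES[entity]:
        value = getattr(parts, name)
        if not value:
            raise ValueError(f"{entity} key requires '{name}'")
        values.append(value)
    return "_".join(values)


if __name__ == "__main__":
    parts = KeyParts("BTCUSDT", "15m", "20251002", "SL1.2_TP1.3_H4",
                     fe_version="FE_v2", model_type="xgboost", version="FS_v1")
    print(build_key("feature_selection", parts))
```

Because strategy strings such as `SL1.2_TP1.3_H4` already contain underscores, keys built this way are easy to construct but not safe to split back into fields; the tags remain the reliable way to query individual components.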

2. Cache Hierarchy and Fallback Strategies

2.1 Level 1 - Exact Match

# Exact search on every criterion
cache_key = build_exact_key(crypto, timeframe, date, strategy, model_type, version)

2.2 Level 2 - Context Match

# Search by business context (same strategy, crypto, timeframe)
# Most recent compatible date
context_match = find_by_context(crypto, timeframe, strategy, model_type)

2.3 Level 3 - Best Available

# Best available model for this crypto/timeframe
# Ranked by descending performance
best_available = find_best_by_performance(crypto, timeframe, model_type)

2.4 Level 4 - Default Configuration

# Baseline configuration when no cache entry is found
default_config = get_default_config(model_type)
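
The four levels can be chained so that the first strategy returning a result wins. The sketch below runs against a plain list of tag dictionaries rather than MLflow; the function names, the in-memory index, and the shape of the default entry are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Entry = Dict[str, Any]
Strategy = Callable[[List[Entry], Entry], Optional[Entry]]


def _same(entry: Entry, criteria: Entry, keys: Tuple[str, ...]) -> bool:
    return all(entry.get(k) == criteria.get(k) for k in keys)


def exact_match(index: List[Entry], criteria: Entry) -> Optional[Entry]:
    # Level 1: every criterion must match.
    keys = tuple(criteria)
    return next((e for e in index if _same(e, criteria, keys)), None)


def context_match(index: List[Entry], criteria: Entry) -> Optional[Entry]:
    # Level 2: same business context, most recent date.
    keys = ("crypto_symbol", "timeframe", "strategy_config", "model_type")
    hits = [e for e in index if _same(e, criteria, keys)]
    # YYYYMMDD strings sort chronologically, so max() picks the latest date.
    return max(hits, key=lambda e: str(e["data_date"]), default=None)


def best_available(index: List[Entry], criteria: Entry) -> Optional[Entry]:
    # Level 3: any strategy, ranked by performance score.
    keys = ("crypto_symbol", "timeframe", "model_type")
    hits = [e for e in index if _same(e, criteria, keys)]
    return max(hits, key=lambda e: float(e["performance_score"]), default=None)


def default_config(index: List[Entry], criteria: Entry) -> Optional[Entry]:
    # Level 4: baseline configuration, always available.
    return {"model_type": criteria.get("model_type"), "params": {}}


LEVELS: List[Tuple[str, Strategy]] = [
    ("exact", exact_match),
    ("context", context_match),
    ("best_available", best_available),
    ("default", default_config),
]


def resolve(index: List[Entry], criteria: Entry) -> Tuple[str, Entry]:
    """Return (level name, entry) from the first level that produces a hit."""
    for level, strategy in LEVELS:
        hit = strategy(index, criteria)
        if hit is not None:
            return level, hit
    raise LookupError("unreachable: the default level always returns a value")
```

Returning the level name alongside the entry lets the caller log how far the lookup had to fall back, which is useful when a degraded result needs to be flagged.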

3. Technical Architecture

3.1 Central Cache Manager

from typing import Any, Dict

from mlflow.tracking import MlflowClient

# CacheLevel, EntityType, CacheResult and the strategy classes are defined elsewhere.
class CVNTradeCacheManager:
    def __init__(self, mlflow_uri: str):
        self.client = MlflowClient(tracking_uri=mlflow_uri)
        self.cache_strategies = {
            CacheLevel.EXACT: ExactMatchStrategy(),
            CacheLevel.CONTEXT: ContextMatchStrategy(),
            CacheLevel.BEST_AVAILABLE: BestAvailableStrategy(),
            CacheLevel.DEFAULT: DefaultConfigStrategy()
        }

    def get_cached_entity(self, entity_type: EntityType, criteria: Dict) -> CacheResult:
        """Retrieve an entity with automatic fallback"""

    def store_entity(self, entity_type: EntityType, key: str, data: Any, metadata: Dict):
        """Store an entity with standardized metadata"""

    def invalidate_dependencies(self, entity_key: str):
        """Invalidate the dependent entities"""

3.2 Entity Registry

class EntityRegistry:
    DEPENDENCIES = {
        EntityType.LABELS: [EntityType.FEATURE_STORE],
        EntityType.FEATURE_ENGINEERING: [EntityType.FEATURE_STORE, EntityType.LABELS],
        EntityType.FEATURE_SELECTION: [EntityType.FEATURE_ENGINEERING],
        EntityType.HPO_PARAMS: [EntityType.FEATURE_SELECTION],
        EntityType.TRAINED_MODEL: [EntityType.HPO_PARAMS, EntityType.FEATURE_SELECTION]
    }

    def get_dependencies(self, entity_type: EntityType) -> List[EntityType]:
        """Return the dependencies of an entity"""

    def validate_dependencies(self, entity_key: str) -> bool:
        """Check that all dependencies are satisfied"""

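Invalidating an entity means finding everything downstream of it in the dependency table above. The sketch below is one way to do that with a breadth-first walk over the inverted graph; the `EntityType` string values and the `dependents_of` helper are assumptions, while the edges are copied from `EntityRegistry.DEPENDENCIES`.

```python
from collections import deque
from enum import Enum
from typing import Dict, List, Set


class EntityType(Enum):
    # Member names follow the registry; the string values are assumed.
    FEATURE_STORE = "feature_store"
    LABELS = "labels"
    FEATURE_ENGINEERING = "feature_engineering"
    FEATURE_SELECTION = "feature_selection"
    HPO_PARAMS = "hpo_params"
    TRAINED_MODEL = "trained_model"


# Same edges as EntityRegistry.DEPENDENCIES: entity -> what it depends on.
DEPENDENCIES: Dict[EntityType, List[EntityType]] = {
    EntityType.LABELS: [EntityType.FEATURE_STORE],
    EntityType.FEATURE_ENGINEERING: [EntityType.FEATURE_STORE, EntityType.LABELS],
    EntityType.FEATURE_SELECTION: [EntityType.FEATURE_ENGINEERING],
    EntityType.HPO_PARAMS: [EntityType.FEATURE_SELECTION],
    EntityType.TRAINED_MODEL: [EntityType.HPO_PARAMS, EntityType.FEATURE_SELECTION],
}


def dependents_of(root: EntityType) -> Set[EntityType]:
    """Return every entity type that transitively depends on `root`."""
    # Invert the graph: dependency -> entities that consume it.
    consumers: Dict[EntityType, List[EntityType]] = {}
    for entity, deps in DEPENDENCIES.items():
        for dep in deps:
            consumers.setdefault(dep, []).append(entity)

    seen: Set[EntityType] = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


if __name__ == "__main__":
    for entity in sorted(dependents_of(EntityType.FEATURE_ENGINEERING), key=lambda e: e.value):
        print(entity.value)
```

With this table, invalidating a feature store reaches all five downstream entity types, while invalidating a trained model reaches none.
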
3.3 Unified Access Interface

import os
from typing import Optional

class CVNTradeCache:
    def __init__(self, mlflow_uri: Optional[str] = None):
        # CVNTradeCacheManager requires a tracking URI; default to MLflow's
        # standard MLFLOW_TRACKING_URI environment variable.
        self.manager = CVNTradeCacheManager(mlflow_uri or os.environ["MLFLOW_TRACKING_URI"])
        self.registry = EntityRegistry()

    # Simple API, one accessor per entity (bodies elided)
    def get_feature_store(self, crypto: str, timeframe: str, date: str) -> FeatureStore: ...
    def get_labels(self, crypto: str, timeframe: str, date: str, strategy: str) -> Labels: ...
    def get_feature_engineering(self, crypto: str, timeframe: str, date: str, strategy: str) -> FeatureEngineering: ...
    def get_feature_selection(self, crypto: str, timeframe: str, strategy: str, model_type: str) -> FeatureSelection: ...
    def get_hpo_params(self, crypto: str, timeframe: str, strategy: str, model_type: str) -> HpoParams: ...
    def get_trained_model(self, crypto: str, timeframe: str, date: str, strategy: str, model_type: str) -> TrainedModel: ...

4. Execution Workflow

4.1 Launching a New Run

import os

def _flag(name: str) -> bool:
    """Read a boolean control variable (see 4.2)."""
    return os.getenv(name, "false").lower() == "true"

# today() is assumed to return the current date as YYYYMMDD.
def launch_training_run(crypto: str, timeframe: str, strategy: str, model_type: str):
    cache = CVNTradeCache()

    # 1. Look for an existing trained model
    model = cache.get_trained_model(crypto, timeframe, today(), strategy, model_type)
    if model and not _flag("FORCE_RETRAIN"):
        return model

    # 2. Look for HPO parameters; run HPO on a miss or when forced
    hpo_params = cache.get_hpo_params(crypto, timeframe, strategy, model_type)
    if not hpo_params or _flag("FORCE_HPO"):
        # Run HPO, falling back to existing parameters
        hpo_params = run_hpo_with_fallback(cache, crypto, timeframe, strategy, model_type)

    # 3. Look for a feature selection; recompute on a miss or when forced
    feature_selection = cache.get_feature_selection(crypto, timeframe, strategy, model_type)
    if not feature_selection or _flag("FORCE_FEATURE_SELECTION"):
        feature_selection = run_feature_selection(cache, crypto, timeframe, strategy, model_type)

    # 4. Train with the retrieved/computed elements
    model = train_model(hpo_params, feature_selection, ...)

    # 5. Save the new model
    cache.store_trained_model(model, crypto, timeframe, today(), strategy, model_type)

    return model

4.2 Control Environment Variables

export FORCE_RETRAIN=true          # Force retraining
export FORCE_HPO=true               # Force HPO re-optimization
export FORCE_FEATURE_SELECTION=true # Force feature re-selection
export FORCE_FEATURE_ENGINEERING=true # Force FE recomputation
export FORCE_LABELS=true            # Force label recomputation
export FORCE_FEATURE_STORE=true     # Force data reload

export CACHE_STRATEGY=flexible      # exact|flexible|best_available
export CACHE_TTL=7d                 # Cache lifetime
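
These variables arrive as strings, so they need parsing before use. The following is a minimal sketch, assuming hypothetical helpers `env_flag` and `parse_ttl`; the set of accepted truthy spellings and the `s`/`m`/`h`/`d` unit suffixes are assumptions, since the document only shows `true`, `1`/`0`, and `7d`.

```python
import os
import re
from datetime import timedelta
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}  # assumed units


def env_flag(name: str, default: bool = False,
             env: Optional[Mapping[str, str]] = None) -> bool:
    """Interpret an environment variable as a boolean switch."""
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


def parse_ttl(value: str) -> timedelta:
    """Convert a value such as '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported TTL format: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


if __name__ == "__main__":
    print(env_flag("FORCE_HPO"), parse_ttl(os.getenv("CACHE_TTL", "7d")))
```

Passing the environment as an optional mapping keeps the helpers testable without mutating `os.environ`.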

5. MLflow Structure

5.1 Experiments per Entity Type

CVNTrade_FeatureStores     # Raw crypto data
CVNTrade_Labels           # Strategy labels
CVNTrade_FeatureEng       # Feature engineering
CVNTrade_FeatureSelect    # Feature selections
CVNTrade_HPO              # Hyperparameter optimizations
CVNTrade_Models           # Trained models

5.2 Standardized Tags

STANDARD_TAGS = {
    "entity_type": str,           # feature_store|labels|feature_eng|feature_select|hpo|model
    "crypto_symbol": str,         # BTCUSDT|ETHUSDT|...
    "timeframe": str,             # 15m|1h|4h|1d
    "data_date": str,             # 20251002
    "strategy_config": str,       # SL1.2_TP1.3_H4
    "model_type": str,            # xgboost|lightgbm|catboost
    "version": str,               # v1|v2|...
    "created_at": str,            # 2025-10-02T15:30:00Z
    "performance_score": float,   # Performance score (used for ranking)
    "dependencies": str,          # JSON of the dependencies
    "compatibility_hash": str     # Compatibility hash (looser than an exact hash)
}

6. Advantages of this Architecture

6.1 Robustness

Automatic fallback: a lookup always returns something, down to the default configuration
Dependency validation: guaranteed consistency
Error handling: graceful degradation

6.2 Scalability

Semantic keys: independent of technical IDs
Optimized indexing: fast tag-based search
Automatic purge: keeps growth under control

6.3 Maintainability

Separation of concerns: each entity is managed separately
Unified interface: simple, consistent API
Easier debugging: full traceability of dependencies

6.4 Flexibility

Configurable strategies: adapts to needs
Automatic versioning: controlled evolution
Environment variables: fine-grained control over behavior

7. Migration and Deployment

7.1 Phase 1 - Core Implementation

Central cache manager
Entity registry
Fallback strategies

7.2 Phase 2 - Migration of Existing Caches

Migration script for the current caches
Integrity validation
Compatibility tests

7.3 Phase 3 - Workflow Integration

Launcher changes
End-to-end tests
Monitoring and metrics

8. S3 Storage Architecture

8.1 S3 Directory Structure

s3://cvntrade-mlflow-artifacts/
├── feature_stores/
│   ├── BTCUSDT/
│   │   ├── 15m/
│   │   │   ├── 20251002/
│   │   │   │   ├── raw_data.parquet
│   │   │   │   └── metadata.json
│   │   │   └── 20251003/
│   │   └── 1h/
│   └── ETHUSDT/
├── labels/
│   ├── BTCUSDT/
│   │   ├── 15m/
│   │   │   ├── SL1.2_TP1.3_H4/
│   │   │   │   ├── 20251002/
│   │   │   │   │   ├── labels.parquet
│   │   │   │   │   └── strategy_stats.json
│   │   │   │   └── 20251003/
│   │   │   └── SL1.5_TP2.0_H4/
│   │   └── 1h/
│   └── ETHUSDT/
├── feature_engineering/
│   ├── BTCUSDT/
│   │   ├── 15m/
│   │   │   ├── SL1.2_TP1.3_H4/
│   │   │   │   ├── FE_v2/
│   │   │   │   │   ├── 20251002/
│   │   │   │   │   │   ├── features.pkl
│   │   │   │   │   │   ├── feature_stats.json
│   │   │   │   │   │   └── processing_log.txt
│   │   │   │   │   └── 20251003/
│   │   │   │   └── FE_v3/
│   │   │   └── SL1.5_TP2.0_H4/
│   │   └── 1h/
│   └── ETHUSDT/
├── feature_selection/
│   ├── BTCUSDT/
│   │   ├── 15m/
│   │   │   ├── SL1.2_TP1.3_H4/
│   │   │   │   ├── xgboost/
│   │   │   │   │   ├── FS_v1/
│   │   │   │   │   │   ├── selected_features.json
│   │   │   │   │   │   ├── feature_importance.json
│   │   │   │   │   │   └── selection_metrics.json
│   │   │   │   │   └── FS_v2/
│   │   │   │   ├── lightgbm/
│   │   │   │   └── catboost/
│   │   │   └── SL1.5_TP2.0_H4/
│   │   └── 1h/
│   └── ETHUSDT/
├── hpo_params/
│   ├── BTCUSDT/
│   │   ├── 15m/
│   │   │   ├── SL1.2_TP1.3_H4/
│   │   │   │   ├── xgboost/
│   │   │   │   │   ├── HPO_v3/
│   │   │   │   │   │   ├── best_params.json
│   │   │   │   │   │   ├── study_data.pkl
│   │   │   │   │   │   ├── optimization_history.json
│   │   │   │   │   │   └── hyperopt_report.html
│   │   │   │   │   └── HPO_v4/
│   │   │   │   ├── lightgbm/
│   │   │   │   └── catboost/
│   │   │   └── SL1.5_TP2.0_H4/
│   │   └── 1h/
│   └── ETHUSDT/
├── trained_models/
│   ├── BTCUSDT/
│   │   ├── 15m/
│   │   │   ├── SL1.2_TP1.3_H4/
│   │   │   │   ├── xgboost/
│   │   │   │   │   ├── v1/
│   │   │   │   │   │   ├── 20251002/
│   │   │   │   │   │   │   ├── model.pkl
│   │   │   │   │   │   │   ├── metrics.json
│   │   │   │   │   │   │   ├── feature_names.json
│   │   │   │   │   │   │   ├── training_log.txt
│   │   │   │   │   │   │   └── model_summary.html
│   │   │   │   │   │   └── 20251003/
│   │   │   │   │   └── v2/
│   │   │   │   ├── lightgbm/
│   │   │   │   └── catboost/
│   │   │   └── SL1.5_TP2.0_H4/
│   │   └── 1h/
│   └── ETHUSDT/
└── cache_metadata/
    ├── indexes/
    │   ├── entity_registry.json
    │   ├── dependency_graph.json
    │   └── performance_index.json
    ├── locks/
    │   └── cache_operations.lock
    └── logs/
        ├── cache_operations.log
        └── cleanup_history.log
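
The directory layout above follows one path template per entity, which makes artifact locations computable from the key fields. The sketch below is illustrative only: `LAYOUTS` and `artifact_uri` are hypothetical names, and the templates are read off the tree rather than taken from the actual implementation.

```python
BUCKET = "cvntrade-mlflow-artifacts"

# Path template per top-level directory, as read from the tree in 8.1.
LAYOUTS = {
    "feature_stores": "{crypto}/{timeframe}/{date}",
    "labels": "{crypto}/{timeframe}/{strategy}/{date}",
    "feature_engineering": "{crypto}/{timeframe}/{strategy}/{fe_version}/{date}",
    "feature_selection": "{crypto}/{timeframe}/{strategy}/{model_type}/{version}",
    "hpo_params": "{crypto}/{timeframe}/{strategy}/{model_type}/{version}",
    "trained_models": "{crypto}/{timeframe}/{strategy}/{model_type}/{version}/{date}",
}


def artifact_uri(entity: str, filename: str, **parts: str) -> str:
    """Build the S3 URI of one artifact from the semantic key fields."""
    if entity not in LAYOUTS:
        raise ValueError(f"unknown entity directory: {entity}")
    try:
        prefix = LAYOUTS[entity].format(**parts)
    except KeyError as exc:
        # str.format raises KeyError for a missing placeholder value.
        raise ValueError(f"{entity} path requires {exc}") from exc
    return f"s3://{BUCKET}/{entity}/{prefix}/{filename}"


if __name__ == "__main__":
    print(artifact_uri("labels", "labels.parquet", crypto="BTCUSDT",
                       timeframe="15m", strategy="SL1.2_TP1.3_H4", date="20251002"))
```

Note that the date sits at different depths depending on the entity (last for labels and models, absent for feature selection and HPO), so a single generic template would not reproduce the tree.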

8.2 MLflow S3 Configuration

import os

# S3 artifact configuration
MLFLOW_ARTIFACT_ROOT = "s3://cvntrade-mlflow-artifacts/"
MLFLOW_S3_ENDPOINT_URL = None  # Standard AWS S3
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
AWS_DEFAULT_REGION = "eu-west-1"

# Advanced performance settings
S3_CACHE_SETTINGS = {
    "multipart_threshold": 64 * 1024 * 1024,  # 64MB
    "multipart_chunksize": 16 * 1024 * 1024,  # 16MB
    "max_concurrency": 10,
    "transfer_config": {
        "use_threads": True,
        "max_bandwidth": None
    }
}

9. Cache Administration Functions

9.1 Administration Manager

from typing import Any, Dict, Optional

import boto3
from mlflow.tracking import MlflowClient

class CVNTradeCacheAdmin:
    def __init__(self, mlflow_uri: str, s3_bucket: str):
        self.client = MlflowClient(tracking_uri=mlflow_uri)
        self.s3_client = boto3.client('s3')
        self.bucket = s3_bucket
        self.cache_manager = CVNTradeCacheManager(mlflow_uri)

    def cleanup_expired_caches(self, ttl_days: int = 30) -> Dict[str, int]:
        """Clean up expired caches according to the TTL policy"""

    def verify_cache_integrity(self, entity_type: Optional[EntityType] = None) -> Dict[str, bool]:
        """Check the integrity of the caches and their dependencies"""

    def migrate_cache_entity(self, source_key: str, target_key: str) -> bool:
        """Move a cache entity to a new key"""

    def delete_cache_entity(self, entity_key: str, cascade: bool = False) -> bool:
        """Delete a cache entity, optionally cascading to dependents"""

    def rebuild_cache_indexes(self) -> Dict[str, int]:
        """Rebuild the performance and dependency indexes"""

    def generate_cache_report(self) -> Dict[str, Any]:
        """Generate a full report on the state of the caches"""

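The TTL policy behind `cleanup_expired_caches` comes down to comparing each entity's creation time with a cutoff. The sketch below shows that selection step only, assuming a hypothetical `find_expired` helper that receives a mapping from entity key to the `created_at` tag value; it does not delete anything and does not reflect the real admin implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def parse_created_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2025-10-02T15:30:00Z."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_expired(created_at_by_key: Dict[str, str], ttl_days: int = 30,
                 now: Optional[datetime] = None) -> List[str]:
    """Return the keys whose creation time is older than the TTL."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=ttl_days)
    return sorted(
        key for key, created in created_at_by_key.items()
        if parse_created_at(created) < cutoff
    )


if __name__ == "__main__":
    entries = {
        "BTCUSDT_15m_20251002": "2025-10-02T15:30:00Z",
        "BTCUSDT_15m_20251101": "2025-11-01T00:00:00Z",
    }
    print(find_expired(entries, ttl_days=30))
```

Separating selection from deletion also gives the `--dry-run` option of the CLI a natural implementation: print the selected keys and stop.
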
9.2 Administration CLI Commands

# Clean up expired caches
python -m src.cache.admin cleanup --ttl 30 --dry-run

# Integrity check
python -m src.cache.admin verify --entity-type hpo_params --fix-broken

# Entity migration
python -m src.cache.admin migrate \
    --source "BTCUSDT_15m_SL1.2_TP1.3_H4_xgboost_v1" \
    --target "BTCUSDT_15m_SL1.2_TP1.3_H4_xgboost_v2"

# Delete with cascade
python -m src.cache.admin delete \
    --entity "BTCUSDT_15m_20251001_SL1.2_TP1.3_H4_FE_v1" \
    --cascade

# Full report
python -m src.cache.admin report --output cache_report.json

# Rebuild the indexes
python -m src.cache.admin rebuild-indexes

# Usage statistics
python -m src.cache.admin stats --format table

9.3 Automated Cleanup Functions

class CacheCleanupOrchestrator:
    def __init__(self, mlflow_uri: str, s3_bucket: str):
        # CVNTradeCacheAdmin requires both arguments (see 9.1)
        self.admin = CVNTradeCacheAdmin(mlflow_uri, s3_bucket)
        self.policies = CacheCleanupPolicies()

    def daily_maintenance(self):
        """Automated daily maintenance"""
        # 1. Clean up expired caches
        expired = self.admin.cleanup_expired_caches(ttl_days=30)

        # 2. Check integrity
        integrity = self.admin.verify_cache_integrity()

        # 3. Rebuild the indexes if any integrity check failed
        if any(not status for status in integrity.values()):
            self.admin.rebuild_cache_indexes()

        # 4. Archive old models
        self.archive_old_models(keep_best_n=5)

        # 5. Maintenance report
        return self.admin.generate_cache_report()

    def emergency_cleanup(self, free_space_gb: float):
        """Emergency cleanup to free up space"""
        # Run the cleanups in decreasing priority order,
        # stopping as soon as enough space is available
        strategies = [
            self.cleanup_duplicate_features,
            self.cleanup_failed_experiments,
            self.cleanup_old_hpo_studies,
            self.cleanup_intermediate_models
        ]

        for strategy in strategies:
            if self.get_available_space() > free_space_gb:
                break
            strategy()

10. MLflow UI Integration and Accessibility

10.1 Custom MLflow Interface

class CVNTradeCacheUIPlugin:
    """MLflow plugin for the CVNTrade cache interface"""

    def register_custom_views(self):
        """Register the custom views in the MLflow UI"""

        @app.route("/cache-dashboard")
        def cache_dashboard():
            """CVNTrade cache dashboard"""
            cache_stats = self.get_cache_statistics()
            return render_template("cache_dashboard.html", stats=cache_stats)

        @app.route("/cache-explorer/<entity_type>")
        def cache_explorer(entity_type):
            """Cache entity explorer"""
            entities = self.get_entities_by_type(entity_type)
            return render_template("cache_explorer.html", entities=entities)

        @app.route("/cache-lineage/<entity_key>")
        def cache_lineage(entity_key):
            """Dependency lineage visualization"""
            lineage = self.build_dependency_graph(entity_key)
            return render_template("cache_lineage.html", lineage=lineage)

10.2 Enriched MLflow Tags

# MLflow stores tag values as strings; the types below are the logical types.
MLFLOW_ENRICHED_TAGS = {
    # Navigation tags
    "cache.entity_type": str,        # feature_store|labels|feature_eng|...
    "cache.crypto_symbol": str,      # BTCUSDT|ETHUSDT
    "cache.timeframe": str,          # 15m|1h|4h|1d
    "cache.strategy": str,           # SL1.2_TP1.3_H4
    "cache.model_type": str,         # xgboost|lightgbm|catboost
    "cache.version": str,            # v1|v2|v3

    # Quality tags
    "cache.performance_score": float,   # Performance score
    "cache.data_quality": str,          # excellent|good|fair|poor
    "cache.validation_status": str,     # validated|pending|failed

    # Lifecycle tags
    "cache.created_date": str,          # 2025-10-02
    "cache.last_accessed": str,         # 2025-10-02T15:30:00Z
    "cache.access_count": int,          # Number of accesses
    "cache.ttl_expires": str,           # Expiration date

    # Provenance tags
    "cache.source_run_id": str,         # Original source run
    "cache.dependencies": str,          # JSON of the dependencies
    "cache.compute_time": float,        # Compute time in seconds
    "cache.data_hash": str,             # Hash of the source data

    # Functional tags
    "cache.production_ready": bool,     # Ready for production
    "cache.experimental": bool,         # Experimental version
    "cache.archived": bool,             # Archived
    "cache.backup_location": str        # Backup location
}
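
Since MLflow tag values are stored as strings, typed values such as the booleans, numbers, and JSON payloads listed above need a serialization step before they are written. The sketch below shows one possible convention; `to_mlflow_tags` is a hypothetical helper, and the lowercase `"true"`/`"false"` spelling is chosen to agree with the `tags.cache.production_ready = 'true'` filter used in this document.

```python
import json
from typing import Any, Dict


def to_mlflow_tags(tags: Dict[str, Any]) -> Dict[str, str]:
    """Serialize typed tag values into the strings MLflow stores."""
    serialized: Dict[str, str] = {}
    for key, value in tags.items():
        if isinstance(value, bool):
            # Check bool before other types: lowercase keeps search filters simple.
            serialized[key] = "true" if value else "false"
        elif isinstance(value, (dict, list)):
            # Structured values (e.g. dependencies) become compact, stable JSON.
            serialized[key] = json.dumps(value, sort_keys=True)
        else:
            serialized[key] = str(value)
    return serialized


if __name__ == "__main__":
    print(to_mlflow_tags({
        "cache.entity_type": "trained_model",
        "cache.production_ready": True,
        "cache.access_count": 12,
        "cache.dependencies": ["BTCUSDT_15m_SL1.2_TP1.3_H4_xgboost_HPO_v3"],
    }))
```

Values that must be compared numerically, such as the performance score, are usually better logged as MLflow metrics as well, because string tags only support equality and pattern matching in searches.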

10.3 MLflow UI Filters and Searches

class MLflowCacheSearchFilters:
    """Search filters optimized for the MLflow UI"""

    @staticmethod
    def build_search_filters():
        return {
            # Filters by entity
            "feature_stores": "tags.cache.entity_type = 'feature_store'",
            "best_models": "tags.cache.entity_type = 'trained_model' AND metrics.performance_score > 0.7",
            # Tag filters only support =, !=, LIKE and ILIKE, so the date range
            # uses the run start time (2025-10-01 UTC, in milliseconds)
            "recent_caches": "attributes.start_time >= 1759276800000",

            # Filters by crypto
            "btc_caches": "tags.cache.crypto_symbol = 'BTCUSDT'",
            "eth_caches": "tags.cache.crypto_symbol = 'ETHUSDT'",

            # Filters by performance
            "high_performance": "metrics.performance_score > 0.75",
            "production_ready": "tags.cache.production_ready = 'true'",

            # Filters by timeframe
            "scalping_15m": "tags.cache.timeframe = '15m'",
            "swing_4h": "tags.cache.timeframe = '4h'",

            # Filters by model
            "xgboost_models": "tags.cache.model_type = 'xgboost'",
            "ensemble_models": "tags.cache.model_type LIKE '%ensemble%'"
        }

    @staticmethod
    def get_suggested_searches():
        """Suggested searches for the UI"""
        return [
            "Best BTCUSDT 15m models",
            "Recent HPO caches",
            "Production-ready models",
            "Feature engineering by strategy",
            "Complete optimization studies"
        ]

10.4 Integrated Dashboards

<!-- Template cache_dashboard.html -->
<div class="cache-dashboard">
    <div class="stats-grid">
        <div class="stat-card">
            <h3>Cached Entities</h3>
            <div class="entity-breakdown">
                <div>Feature Stores: {{ stats.feature_stores }}</div>
                <div>Labels: {{ stats.labels }}</div>
                <div>Feature Engineering: {{ stats.feature_engineering }}</div>
                <div>Feature Selection: {{ stats.feature_selection }}</div>
                <div>HPO Params: {{ stats.hpo_params }}</div>
                <div>Trained Models: {{ stats.trained_models }}</div>
            </div>
        </div>

        <div class="stat-card">
            <h3>Performance Overview</h3>
            <div class="performance-chart">
                <!-- Model performance chart -->
            </div>
        </div>

        <div class="stat-card">
            <h3>Cache Utilization</h3>
            <div class="utilization-metrics">
                <div>Hit Rate: {{ stats.hit_rate }}%</div>
                <div>Storage Used: {{ stats.storage_used_gb }}GB</div>
                <div>Active Caches: {{ stats.active_caches }}</div>
            </div>
        </div>
    </div>

    <div class="quick-actions">
        <button onclick="cleanupExpired()">Clean Up Expired</button>
        <button onclick="rebuildIndexes()">Rebuild Indexes</button>
        <button onclick="generateReport()">Full Report</button>
    </div>
</div>

10.5 REST API for Integration

from flask import jsonify, request

# `app` (the Flask application) and `cache_manager` are assumed to exist.
@app.route("/api/cache/entities/<entity_type>")
def list_cache_entities(entity_type):
    """REST API to list cache entities"""
    entities = cache_manager.list_entities(entity_type)
    return jsonify({
        "entity_type": entity_type,
        "count": len(entities),
        "entities": [entity.to_dict() for entity in entities]
    })

@app.route("/api/cache/search")
def search_caches():
    """Cache search API"""
    query = request.args.get('q', '')
    filters = request.args.getlist('filter')

    results = cache_manager.search(query, filters)
    return jsonify({
        "query": query,
        "filters": filters,
        "results": results,
        "total": len(results)
    })

@app.route("/api/cache/dependency-graph/<entity_key>")
def get_dependency_graph(entity_key):
    """API to retrieve the dependency graph"""
    graph = cache_manager.build_dependency_graph(entity_key)
    return jsonify(graph)

11. Monitoring and Metrics

11.1 Cache Performance Metrics

class CacheMetricsCollector:
    def collect_metrics(self):
        return {
            "cache_hit_rate": self.calculate_hit_rate(),
            "avg_retrieval_time": self.calculate_avg_retrieval_time(),
            "storage_efficiency": self.calculate_storage_efficiency(),
            "dependency_resolution_time": self.calculate_dependency_time(),
            "cache_freshness": self.calculate_freshness_score(),
            "failed_retrievals": self.count_failed_retrievals(),
            "s3_transfer_metrics": self.get_s3_metrics()
        }

11.2 Alerts and Notifications

class CacheAlerting:
    def setup_alerts(self):
        alerts = [
            Alert("cache_hit_rate_low", threshold=0.7),
            Alert("storage_usage_high", threshold=0.85),
            Alert("dependency_errors", threshold=5),
            Alert("retrieval_timeout", threshold=30.0)
        ]
        return alerts
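
The `Alert` type is not defined in this document, and the four thresholds do not all fire in the same direction: a hit rate is a problem when it drops below 0.7, while storage usage is a problem when it rises above 0.85. The sketch below is one possible shape for it; the `metric` and `fire_when` fields, and the metric names they point to, are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alert:
    """Threshold alert (hypothetical definition; not the project's class)."""
    name: str
    threshold: float
    metric: str               # assumed: key of the metric to watch
    fire_when: str = "above"  # assumed: "above" or "below" the threshold

    def is_triggered(self, value: float) -> bool:
        if self.fire_when == "below":
            return value < self.threshold
        return value > self.threshold


# Thresholds copied from setup_alerts(); metric names are assumed.
ALERTS = [
    Alert("cache_hit_rate_low", 0.7, metric="cache_hit_rate", fire_when="below"),
    Alert("storage_usage_high", 0.85, metric="storage_usage"),
    Alert("dependency_errors", 5, metric="dependency_errors"),
    Alert("retrieval_timeout", 30.0, metric="avg_retrieval_time"),
]


def triggered_alerts(metrics: Dict[str, float]) -> List[str]:
    """Return the names of the alerts whose threshold is crossed."""
    return [
        alert.name
        for alert in ALERTS
        if alert.metric in metrics and alert.is_triggered(metrics[alert.metric])
    ]


if __name__ == "__main__":
    print(triggered_alerts({"cache_hit_rate": 0.65, "storage_usage": 0.5}))
```

Metrics missing from the input are skipped rather than treated as zero, so a collector outage does not silently raise or suppress alerts.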

12. Validation Points

Does this architecture cover your 6 business entities?
Are the semantic keys expressive enough?
Is the fallback hierarchy appropriate?
Does the API suit you?
Is the S3 structure optimal for your needs?
Do the administration functions cover your use cases?
Does the MLflow UI integration make navigation easier?
Are any use cases not covered?
Any changes or improvements you would like?

This architecture removes the current fragility and provides a solid, maintainable, and scalable foundation for managing MLflow caches, with full administration tooling and improved accessibility.

13. Impact of Feast on the Cache Architecture

13.1 Analysis of Changes

Integrating Feast fundamentally changes Level 1 (Feature Store) of the cache architecture:

Aspect	Before (MLflow only)	After (MLflow + Feast)
L1 storage	Parquet in S3 via MLflow	Feast Parquet + MLflow metadata
Training serving	Direct S3 read	Feast offline store (Dask)
Inference serving	Full model load	Feast online store (Redis)
Inference latency	~500ms	~15ms
Multi-symbol sync	Manual	Automatic Feast materialize

13.2 Coexistence Points

┌──────────────────────────────────────────────────────────────────┐
│                     COEXISTENCE CACHE/FEAST                       │
├──────────────────────────────────────────────────────────────────┤
│                                                                   │
│  LEVEL 1 (Feature Store) - DUAL WRITE                            │
│  ┌─────────────────────────────────────────────────────────┐     │
│  │                                                         │     │
│  │   ETL Pipeline ──┬──► Feast Parquet (76 features)      │     │
│  │                  │                                      │     │
│  │                  └──► MLflow (metadata + lineage)       │     │
│  │                                                         │     │
│  └─────────────────────────────────────────────────────────┘     │
│                                                                   │
│  LEVELS 2-6 - MLflow only (unchanged)                            │
│  ┌─────────────────────────────────────────────────────────┐     │
│  │                                                         │     │
│  │   Labels ──► FE ──► Selection ──► HPO ──► Model        │     │
│  │                                                         │     │
│  │   Storage: S3 artifacts + PostgreSQL metadata          │     │
│  │                                                         │     │
│  └─────────────────────────────────────────────────────────┘     │
│                                                                   │
└──────────────────────────────────────────────────────────────────┘

13.3 Phase 2 Implementation (Sprint 9) - ACTIVE

Implemented Architecture

# cvntrade_cache_interface.py - get_feature_store()
def get_feature_store(self, crypto: str, interval: str) -> pd.DataFrame:
    # 1. Try Feast first (if enabled)
    if self.use_feast_l1:
        feast_df = self._get_feature_store_from_feast(crypto, interval)
        if feast_df is not None and not feast_df.empty:
            return feast_df  # ✅ Feast HIT

    # 2. MLflow fallback (previous behavior)
    # Lookup criteria (field names assumed from the standard tags)
    criteria = {"crypto_symbol": crypto, "timeframe": interval}
    result = self.manager.get_cached_entity(EntityType.FEATURE_STORE, criteria)
    if result.found:
        return result.entity.raw_data  # ✅ MLflow HIT

    # 3. Cache MISS → ETL pipeline + Feast export
    # (the remaining arguments are elided in the original)
    enriched_df = etl.run_feature_enrichment(coin=crypto)

    # 4. Export to Feast (Phase 2)
    if self.use_feast_l1 and self.feast_exporter:
        self._export_to_feast(enriched_df, crypto)

    return enriched_df

Modified Files (Sprint 9)

File	Change
`src/commun/cache/cvntrade_cache_interface.py`	Feast as primary L1 source
`src/ETL/cvntrade_etl_pipeline.py`	Automatic export to Feast after ETL
`dags/dag_etl_pipeline.py`	`sync_feast_feature_store` task
`.env.example`	`CACHE_USE_FEAST_L1` variables, etc.

Rollback Flag

# Disable Feast and fall back to MLflow only
CACHE_USE_FEAST_L1=0

Phase 3 (Future): Full Unification

Level	Data Source	Metadata	Serving
L1: Feature Store	Feast	MLflow	Redis (online)
L2: Labels	MLflow	MLflow	S3
L3: FE	MLflow	MLflow	S3
L4: Selection	MLflow	MLflow	S3
L5: HPO	MLflow	MLflow	S3
L6: Model	MLflow + Feast (optional)	MLflow	Redis (features) + S3 (model)

13.4 Feast Environment Variables

# Feast configuration in .env
FEAST_FEATURE_STORE_PATH=feature_repo
FEAST_OFFLINE_STORE_TYPE=dask
FEAST_ONLINE_STORE_TYPE=redis
FEAST_REDIS_HOST=localhost
FEAST_REDIS_PORT=6379
FEAST_REGISTRY_TYPE=sql
FEAST_REGISTRY_PATH=postgresql+psycopg2://mlflow:mlflow@localhost:5432/champollion
FEAST_CACHE_TTL_SECONDS=60

# Cache/Feast integration control
CACHE_USE_FEAST_L1=1              # Use Feast for Level 1
CACHE_FEAST_AUTO_MATERIALIZE=1    # Automatic materialization after ETL
CACHE_FEAST_SYNC_MLFLOW=1         # Sync metadata to MLflow

13.5 Feast Integration Files

File	Status	Description
`src/commun/cache/cvntrade_cache_interface.py`	Modified	Feast as primary L1 + MLflow fallback
`src/ETL/cvntrade_etl_pipeline.py`	Modified	Automatic export to Feast after ETL
`src/ETL/cvntrade_feast_exporter.py`	Created	ETL → Feast Parquet export
`src/inference/cvntrade_feast_inference.py`	Created	Inference via Redis
`dags/dag_etl_pipeline.py`	Modified	`sync_feast_feature_store` task
`feature_repo/feature_store.yaml`	Created	Feast config
`feature_repo/features.py`	Created	10 Feature Views (76 features)
`.env.example`	Modified	Feast variables added

14. References

architecture/OVERVIEW.md - System overview
architecture/FEAST_INTEGRATION.md - Detailed Feast guide
architecture/MLFLOW.md - MLflow design

Last updated: January 2026