MLflow Backbone 8★ Audit — Execution plan

Status: v2, post committee review.

Committee v1 verdict: REJECTED / METHODOLOGY_FLAW (Architect 8.0, Ops 8.0, Data Scientist 8.0, ML Engineer 8.0, Crypto Trader 7.0). Rejection cause: missing RBAC and comprehensive lineage pillars for a financial trading system. v2 adds both as Pillar 7 (RBAC) and Pillar 8 (Lineage), plus 8 strengthening recommendations integrated inline.

Authors: CVNTrade research, 2026-04-20
Parent issue: #604
Reference framework: MLflow official docs (Architecture Overview, Backend Stores, Artifact Stores, Model Registry Workflows), NIST RBAC principles for Pillar 7, and MLflow/Feast/Airflow lineage patterns for Pillar 8.

Changelog

  • v2 (2026-04-20) — post committee review. 10 recommendations integrated:
  • Pillar 7 RBAC / Access Control added (§12) — non-negotiable for a financial system.
  • Pillar 8 Data & Model Lineage/Provenance added (§12): covers the OHLCV → Enrichment → FE → MLflow → deployed model chain.
  • §3.1 Decision 1 partially reversed — Pillar 6 now includes hash/manifest parity checks for backtest↔live artifacts (not just grep of consumption paths).
  • §5 Phase B gains an explicit REVISE iteration pattern (targeted Phase A re-run for disputed pillars).
  • Pillar 4 section adds a dedicated ADR-42 atomic promotion subsection with explicit grep + workflow validation.
  • Pillar 5 section now verifies ADR-23 (version-pinned artifacts) and ADR-15 (OOS theta calibration) enforcement via tags.
  • §5 Phase D requires shadow-logging for one sprint before any mandatory-tag enforcement flag flip.
  • §5 Phase B adds risk-adjusted scoring per gap (P&L impact × operational stability × compliance exposure).
  • Follow-up out-of-scope: MLflow service observability (health/performance/alerting of the infrastructure itself), noted in §9 as a separate issue to be opened.
  • DRI (Directly Responsible Individual) role for MLflow backbone governance defined in §10bis; the individual is named at Phase A kickoff.
  • v1 (2026-04-20) — initial plan with 6 pillars. REJECTED by committee for missing Pillars 7 and 8.

1. Executive summary

CVNTrade's MLflow backbone has grown organically over the last year. Models, feature sets, backtest results, PDF reports, and FTF metadata all land in the same MLflow instance, but we have never audited whether the eight structural pillars listed in §2 are actually respected. Recent issues surfaced indirect evidence of convention gaps:

  • #596 (cache versioning) revealed that versioning conventions were scattered across entity types and not centrally enforced.
  • #599 (shadow divergence) revealed that the claimed enrich_batch ↔ enrich_streaming invariant had never been tested.
  • ADR-42 (atomic promotion per crypto) — unknown implementation state.

Pattern: we rely on conventions that are documented but not enforced. MLflow likely has similar gaps.

This plan defines a 4-phase audit (Discovery → Analysis → Conversion → Execution) against the 8-pillar framework of §2 (the six MLflow pillars plus RBAC and Lineage), with a fail-fast pivot after the three highest-risk pillars (tags, consumption, RBAC) to avoid wasting time if critical gaps need immediate fixes.

Total pre-execution effort: ~5 days senior engineer. Non-blocking for P0-A tuning (pure read/query work, no production interference).


2. The 8 pillars (v2 — extended post committee review)

| # | Pillar | Principle | Source |
|---|---|---|---|
| 1 | Centralised tracking | Remote Tracking Server, no local ./mlruns anywhere | MLflow docs |
| 2 | SQL metadata backend | PostgreSQL (or equivalent), not FileStore/SQLite | MLflow docs |
| 3 | Object-store artifacts | S3/MinIO for heavy artifacts, never in the DB | MLflow docs |
| 4 | Registry as promotion boundary | Model Registry is the official dev-to-prod frontier (+ ADR-42 atomic promotion) | MLflow docs + ADR-42 |
| 5 | Tag-driven governance | Every run carries mandatory tags for traceability (code, data, feature set, business need) enforcing ADR-23 (version-pinned) + ADR-15 (OOS theta) | MLflow docs + ADR-23 + ADR-15 |
| 6 | Consume via alias/version | Never load from raw runs:/ or hard-coded S3 paths. Hash/manifest parity between backtest and live for every consumed artifact | MLflow docs + committee v2 |
| 7 | Access Control & Security (RBAC) | Authenticated writes, authorised reads. Boundaries between Tracking / Artifacts / Registry. Audit trail for write operations | NIST RBAC + committee v2 |
| 8 | Data & Model Lineage / Provenance | Full traceability from raw OHLCV → Enrichment → Feature Engineering (Feast) → MLflow runs → deployed model. Every stage tagged with its upstream input hash | ADR-23 + committee v2 |

3. Scoping decisions (proposed — to be validated by committee)

Three scoping decisions shape the entire audit. All three follow a recommend-then-validate pattern: the committee can flip any of them in review.

Decision 1 — Scope extended to 8 pillars, backtest↔live parity reinforced (v2)

v1 proposed: stay on 6 pillars, leave backtest↔live artifact parity entirely to #599.

Committee v2 verdict: partially overruled. Two required additions:

  • A dedicated Pillar 7 RBAC and Pillar 8 Lineage, non-negotiable for a financial trading system (all 5 experts agreed).
  • Pillar 6 must include hash/manifest identity checks between backtest and live artifacts, not only a grep of consumption paths (4/5 experts agreed).

v2 adopted: audit now covers 8 pillars. Pillar 6 gains an "artifact content parity" sub-section (§12, Pillar 6.2) with hash-comparison commands for models, feature-set artifacts, and enrichment configs. The broader enrichment parity topic stays with #599; only the artifact identity surface is audited here.

Decision 2 — Anchor to MLflow official docs, not a custom CVNTrade backbone spec

Proposed: audit against the external MLflow official architecture, not against a self-written "CVNTrade backbone v1" target.

Rationale: writing our own target first risks codifying "what we already do" instead of "what we should do" (self-validation bias). The official standard is neutral, widely understood, and the issue body is already structured around its pillars. If any pillar doesn't fit our context, we document the deviation explicitly in the §Verdict with a justification — we never bend the standard upstream.

Decision 3 — Audit order (v2): 5 → 6 → 7 → 3 → 4 → 8 → 2 → 1

v2 update: Pillars 7 (RBAC) and 8 (Lineage) inserted. Pillar 7 positioned early because security gaps would trigger immediate remediation (stop-the-world). Pillar 8 positioned after Registry because lineage tagging relies on 3/4/5 being clean.

| Order | Pillar | Reason |
|---|---|---|
| 1st | 5 Tags / governance | Highest expected gap rate: recent convention-related issues (#596, #599) suggest tags are inconsistent. v2 adds ADR-23/ADR-15 enforcement check (high blast radius if broken). |
| 2nd | 6 Consumption | Downstream of 5: aliases can't be used cleanly if tags are weak. v2 adds hash/manifest parity check. |
| 3rd | 7 RBAC (new) | Security is stop-the-world if missing. Auditing early guarantees any critical finding triggers the fail-fast pivot before we invest days on the rest. |
| 4th | 3 Object storage | Tightly linked to 5/6/7; hard-coded paths + permission gaps likely to appear. |
| 5th | 4 Registry | Promotion logic (ADR-42) depends on 3/5/6/7 being clean. v2 adds explicit ADR-42 atomic promotion subsection. |
| 6th | 8 Lineage (new) | Cross-cutting: checks the end-to-end chain. Makes sense after we've audited each stage individually. |
| 7th | 2 Backend SQL | Infra configured once, probably clean. |
| 8th | 1 Tracking URI | Same reasoning: basic checks only. |

Fail-fast pivot (extended v2): if Pillars 5, 6 or 7 reveal critical gaps (e.g., mandatory tags missing across all runs, production consumption via raw runs:/, unauthenticated write access to the Registry), the audit pauses and pivots to immediate remediation PRs before continuing. A partial audit that triggers action beats a full audit that sits on the shelf.
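The pivot rule above can be written down as a small predicate so that it is applied the same way by whoever runs Phase A. The following is only a sketch: the `Gap` record and the `should_pivot` name are hypothetical, not existing CVNTrade code, and the severity labels are the ones defined in §4.

```python
from dataclasses import dataclass

# Pillars whose critical gaps pause the audit (5 = tags, 6 = consumption, 7 = RBAC).
FAIL_FAST_PILLARS = frozenset({5, 6, 7})


@dataclass(frozen=True)
class Gap:
    """One audit finding (hypothetical record, for illustration only)."""
    pillar: int
    severity: str  # "Critical", "Major" or "Minor"
    summary: str


def should_pivot(gaps: list[Gap]) -> bool:
    """Return True when any critical gap sits on a fail-fast pillar."""
    return any(
        g.severity == "Critical" and g.pillar in FAIL_FAST_PILLARS for g in gaps
    )
```

A critical gap on Pillar 2, for example, would still be reported but would not pause the audit under this reading of the rule.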


4. Methodology (per-pillar audit template)

Every pillar produces one markdown section in the deliverable, following the same template:

### Pillar N — <name>

**Principle**: <one-sentence restatement>

**Method**:
- Evidence gathered: <commands run, files read, DB queries, kubectl dumps>

**State observed**: <factual findings>

**Gaps identified**:
| Gap | Severity | Evidence | Proposed fix sketch |
|-----|---|---|---|
| ... | Critical/Major/Minor | link or snippet | one line |

Severity classification (same as CodeRabbit, to keep continuity):

  • Critical: blocks a downstream guarantee (e.g., backtest-live parity, security, atomicity)
  • Major: creates drift or maintenance burden under growth but system works today
  • Minor: polish, hygiene, not operational

Per-pillar commands/queries are listed in Appendix A (§12).


5. Phases

Phase A — Discovery (3–4 days, senior engineer)

Per-pillar audit in the order 5 → 6 → 7 → 3 → 4 → 8 → 2 → 1 (§3, Decision 3) with the template in §4.

Deliverable: documentation/MLFLOW_BACKBONE_AUDIT.md with 8 pillar sections.

Exit criteria:

  • Every pillar has a populated state-observed + gaps table
  • All evidence commands are reproducible (listed inline in the section, not just described)
  • If a critical gap is found on Pillar 5, 6 or 7, Phase A pauses and triggers the fail-fast pivot

Phase B — Analysis + risk-adjusted prioritisation (1 day + committee session)

Consolidate Phase A into:

  • A pillar × gap × severity matrix on a single page
  • Risk-adjusted scoring per gap (committee rec. 8). Each gap carries three scores:
      • P&L impact (Low/Med/High): how much real money is at risk
      • Operational stability (Low/Med/High): likelihood of production breakage
      • Compliance exposure (Low/Med/High): regulatory / auditor implications
      • Composite score = weighted average, used to sort the backlog
  • Top-3 production risks (what breaks if MLflow misbehaves today?), each with its risk-adjusted score
  • Quick wins (gaps fixable in < 1 day each with composite Low/Low/Low scores; first batch of fix PRs)
  • A §Verdict per pillar: compliant / deviation-justified / non-compliant
  • A §Backlog ordered by composite risk-adjusted score
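To make the composite score concrete, here is a minimal sketch of the weighted average. The numeric mapping (Low = 1, Med = 2, High = 3) and the weights are placeholder assumptions; the plan leaves the actual weights to the committee.

```python
LEVELS = {"Low": 1, "Med": 2, "High": 3}

# Placeholder weights, NOT a committee decision: P&L > ops > compliance.
WEIGHTS = {"pnl": 0.5, "ops": 0.3, "compliance": 0.2}


def composite_score(pnl: str, ops: str, compliance: str) -> float:
    """Weighted average of the three Low/Med/High ratings, on a 1-3 scale."""
    ratings = {"pnl": pnl, "ops": ops, "compliance": compliance}
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[k] * LEVELS[v] for k, v in ratings.items()) / total_weight


def sort_backlog(gaps: list[dict]) -> list[dict]:
    """Order gaps from highest to lowest composite score."""
    return sorted(
        gaps,
        key=lambda g: composite_score(g["pnl"], g["ops"], g["compliance"]),
        reverse=True,
    )
```

Dividing by the sum of the weights keeps the result on the 1-3 scale even if the committee picks weights that do not add up to 1.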

Deliverable: §Verdict + §Backlog + §RiskScores appended to MLFLOW_BACKBONE_AUDIT.md, submitted to the expert committee for independent review.

Exit criteria: committee PASSED verdict on the audit report. If REVISE, follow the v2 iteration pattern below.

v2 REVISE iteration pattern (committee rec. 4):

  1. Identify the specific pillars in dispute from the committee feedback
  2. Re-execute Phase A commands only for those pillars (not a full re-audit)
  3. Re-analyse findings + re-score with the risk-adjusted method
  4. Re-prioritise the backlog with the revised findings integrated
  5. Submit a delta document (v2 of the audit) to committee, not a full rewrite
  6. Target one revision cycle; a second REJECT triggers a scoping escalation, not another iteration.

Phase C — Conversion to child issues (0.5 day)

One gap = one GitHub issue, each carrying:

  • Pillar reference + section of the audit report
  • Severity
  • Effort estimate (T-shirt: XS/S/M/L)
  • Closing criteria (what evidence proves the gap is fixed)

Issue #604 shifts to umbrella role — status tracked via its list of child issues.

Deliverable: N child issues (N depends on Phase A findings).

Exit criteria: all non-Minor gaps have an assigned child issue. Minor gaps may be batched into a "polish" issue.

Phase D — Execution (duration depends on Phase C output)

Individual PRs, one per child issue, normal PR/CR workflow. Not planned at this stage — sequencing is decided by the committee in Phase B.

Some gaps (e.g. adding a required tag) are cross-cutting and may be grouped into a single PR that touches multiple emission sites; the committee arbitrates.

v2 shadow-logging requirement (committee rec. 9): for any "mandatory tag enforcement" flip, the flag must run in shadow mode for one sprint first — the tag is logged but not required — with zero WARN for the final week. Only then can the enforcement flip hard (ADR-25). Same pattern used by the stateful enrichment refactor P7 (#599).
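A shadow-mode gate of this kind could look like the sketch below. The `check_tags` helper, the `enforce` flag, and the choice of mandatory tags (borrowed from the lineage tags listed in Appendix A) are illustrative assumptions, not the existing enforcement code.

```python
import logging

logger = logging.getLogger("mlflow_tag_gate")

# Hypothetical mandatory set, borrowed from the lineage tags in Appendix A.
MANDATORY_TAGS = (
    "mlflow.source.git.commit",
    "cvn.data_version",
    "cvn.feature_set_version",
)


class MissingTagError(ValueError):
    """Raised in enforce mode when a mandatory tag is absent or empty."""


def check_tags(tags: dict[str, str], enforce: bool = False) -> list[str]:
    """Return the missing mandatory tags; warn in shadow mode, raise in enforce mode."""
    missing = [t for t in MANDATORY_TAGS if not tags.get(t)]
    if missing and enforce:
        raise MissingTagError(f"missing mandatory tags: {missing}")
    if missing:
        logger.warning("shadow mode: missing mandatory tags %s", missing)
    return missing
```

During the shadow sprint the call sites would run with `enforce=False` and the WARN lines would be counted; flipping to `enforce=True` is then a one-flag change once the final week shows zero warnings.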


6. Deliverables

| Phase | Deliverable | Size | Where |
|---|---|---|---|
| A | documentation/MLFLOW_BACKBONE_AUDIT.md v1 (8 pillar sections) | ~500 lines | repo |
| B | Same doc v2 with §Verdict + §Backlog + committee session | +200 lines | repo + committee/sessions/ |
| C | N child issues of #604 | | GitHub |
| D | N PRs closing child issues | | GitHub |

7. Timing and non-blockness

This audit is pure read/query work. It touches no production path and no in-flight run. It can be executed:

  • In parallel with P0-A tuning, as a background senior-engineer track, OR
  • After P0-A stabilises (estimated mid-next-week once the proper baseline-reference run lands and we pick a direction on lever results)

v2 effort estimate (revised after adding Pillars 7 + 8)

| Phase | v1 estimate | v2 estimate | Why the change |
|---|---|---|---|
| A (Discovery) | 2–3 days | 3–4 days | Pillars 7 (RBAC) + 8 (Lineage) add ~1 day of cross-system audit (S3 ACL, Ingress auth, Feast+MLflow join) |
| B (Analysis + committee) | 0.5 day | 1 day | Risk-adjusted scoring per gap + explicit REVISE pattern |
| C (Child issues) | 0.5 day | 0.5 day | unchanged |
| D (Execution) | TBD | TBD | unchanged, depends on B |

Total pre-execution: ~5 days senior engineer (was 4).

Recommendation: start Phase A in parallel. The value of surfacing convention gaps early is high, and the cost to P0-A is zero.


8. Risks

| Risk | Mitigation |
|---|---|
| Audit surfaces many gaps, backlog explodes, nothing gets fixed | Fail-fast pivot after Pillars 5/6/7 (§3.3): if critical, pause audit and fix first |
| Senior engineer gets pulled back to P0-A firefighting | Phase A is 3–4 days of read-only work; it can be paused and resumed without state |
| Report turns into a political document ("we already do that") | Committee review in Phase B is the independent check: §Verdict is committee-signed, REVISE pattern documented |
| Gap fixes break model loading (if tags become mandatory retroactively) | Any "mandatory tag" enforcement runs in shadow-logging for one full sprint before the hard flip (ADR-25). Same pattern as stateful enrichment (#599) |
| Auditing ADR-42 implementation state reveals it's not implemented | Expected outcome, not a risk: that's exactly what an audit is for |
| Pillar 7 (RBAC) audit reveals publicly writable Tracking Server | Stop-the-world: immediate remediation PR (fail-fast pivot), no audit continuation until fixed |
| Pillar 8 (Lineage) audit finds entire stages without upstream tags | Document as backbone-critical, treat as pre-requisite to any future production scale-up |

9. Interfaces with in-flight work

  • #596 (cache versioning): the monotonic accumulation pattern used there is a model for tag governance (Pillar 5). Cross-reference in Phase B.
  • #599 (enrichment refactor): the design stipulates the enricher state could become an MLflow artifact (P6 of that doc). The MLflow audit may surface constraints that make this more or less viable — align in Phase B.
  • ADR-23 (version-pinned MLflow artifacts): direct support material for Pillar 5 audit. Cross-reference.
  • ADR-15 (OOS theta calibration): Pillar 5 audit now enforces ADR-15 compliance via tag check (committee rec. 6).
  • ADR-42 (atomic promotion per crypto): direct audit target for Pillar 4 — dedicated subsection (committee rec. 5).
  • #608 (ThresholdCalibrator): Pillar 8 (Lineage) will verify the calibrator artifact is properly linked to its model version in Registry.
  • #612 (Inference gate audit): out-of-scope here but the findings on threshold resolution may cross-reference Pillar 5 tag gaps.
  • Console UI #611: not in scope; if the audit surfaces a need for an MLflow-viewing page in Console, it's a follow-up issue.

Follow-up issues to open during Phase B (v2 — committee rec. 7)

  • MLflow service observability — a separate audit of the MLflow infrastructure's own health, performance, and alerting (pod health, query latency, S3 artifact access latency, Registry operations throughput). Explicitly out of scope of this usage audit per committee recommendation, to be opened as a sibling issue to #604.

10bis. Ownership — DRI (Directly Responsible Individual)

Committee rec. 10: a single named owner for MLflow backbone governance.

  • DRI for the MLflow backbone: TBD — to be named at kickoff of Phase A. Candidates are the senior engineer allocated to the audit plus someone from the ops rotation.
  • Responsibilities:
  • Owns documentation/MLFLOW_BACKBONE_AUDIT.md and the Backbone Ops runbook to be derived from it (Phase C deliverable).
  • Reviews every PR touching mlflow imports, src/commun/mlflow/, infra/helm/mlflow/.
  • Co-signs child issues that promote a gap to "Critical" severity.
  • Points of escalation: re-auditing cadence (quarterly), incident response if an MLflow outage affects trading.
  • Handover pattern: if the DRI rotates, the successor re-runs the full Phase A grep/kubectl commands on the new codebase state and certifies the result (takes ~1 day, keeps the audit living).

10. Decisions taken (v2 — was "open questions" in v1)

All v1 open questions are resolved post committee review. Kept here as a decision log with the source recommendation.

  1. Fail-fast pivot threshold (§3.3). DECISION: triggered on critical gaps in Pillars 5, 6 or 7 (v2 expansion). Audit pauses, pivot to remediation, resume after fix.

  2. ADR-42 implementation assessment. DECISION: stays inside Pillar 4 with a dedicated subsection §12 Pillar 4.2 (committee rec. 5).

  3. Scope Decision 1 (§3.1). DECISION: Pillar 6 gains explicit backtest↔live artifact hash/manifest parity (§12 Pillar 6.2, committee rec. 3). Broader enrichment parity stays with #599.

  4. Scope Decision 2 (§3.2). DECISION: MLflow official docs as the primary referential, augmented with NIST RBAC (Pillar 7) and ADR-23/ADR-15 tag enforcement (Pillar 5).

  5. Audit order (§3.3). DECISION: 5 → 6 → 7 → 3 → 4 → 8 → 2 → 1 (v2). RBAC inserted early because security findings trigger stop-the-world.

  6. Phase B exit on REVISE. DECISION: targeted re-execution of Phase A for disputed pillars only, not a full re-audit. Pattern spelled out in §5 Phase B (committee rec. 4). Max one revision cycle; a second REJECT triggers scoping escalation.

  7. Blind spots. DECISION: two new pillars added in v2. Neither is optional: both were required by 5/5 experts for a financial trading system.
     • Pillar 7 RBAC (committee rec. 1)
     • Pillar 8 Lineage (committee rec. 2)

Remaining open questions (for future iterations)

  1. DRI identity (§10bis) — to be named at kickoff of Phase A, not decidable in this plan.
  2. MLflow service observability (§9 follow-up) — to be scoped as a sibling issue, not blended into this audit.
  3. Quarterly re-audit cadence: the plan is one-shot; committee rec. 10 implies the DRI runs it periodically. Proposal: quarterly re-audit (same commands, delta document). Not blocking.

11. Acceptance criteria for this plan document

  • Problem framed with evidence from adjacent issues
  • The 8 pillars recapped (v2)
  • Three scoping decisions made with explicit rationale, + 2 additions from committee
  • Methodology (per-pillar template with risk-adjusted scoring)
  • Phases A/B/C/D with exit criteria + REVISE iteration pattern
  • Timing and non-blockness stated (5 days pre-exec, revised from 4)
  • Risks + mitigations (including RBAC/Lineage blast-radius)
  • Interfaces with other in-flight work (#596, #599, #608, #612, ADR-15/23/42, #611)
  • DRI named as §10bis (committee rec. 10)
  • Shadow-logging requirement for mandatory-tag enforcement (committee rec. 9)
  • Follow-up MLflow service observability noted separately (committee rec. 7)
  • All v1 open questions resolved (7/7) — see §10
  • v1 Committee review completed: REJECTED / METHODOLOGY_FLAW, 8.0/8.0/8.0/8.0/7.0 scores, 10 recommendations fully integrated in v2
  • v2 Committee review (pending — submission in §14)

12. Appendix A — Per-pillar evidence commands (reproducible)

Pillar 5 — Tags

# Emission sites in code
grep -rn "mlflow.set_tag\|mlflow.log_param\|mlflow.set_experiment_tag" src/ scripts/ dags/ --include="*.py"

# Sample 10 recent FTF + 10 recent training runs, dump tags
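kubectl exec -n cvntrade mlflow-XXXXX -- python -c "
import mlflow
client = mlflow.tracking.MlflowClient()
# Dump tags of the 20 most recent runs across all experiments;
# narrow exp_ids to the FTF and training experiments to get 10 of each.
exp_ids = [e.experiment_id for e in client.search_experiments()]
for run in client.search_runs(exp_ids, order_by=['attributes.start_time DESC'], max_results=20):
    print(run.info.run_id, dict(run.data.tags))
"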

# Build matrix: required_tag × (present | absent | partial)
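The matrix step can be sketched in a few lines once the tag dumps are collected. This is an illustrative helper, not existing tooling: `tag_coverage` and its input shape (one tag dict per sampled run) are assumptions.

```python
from collections import Counter


def tag_coverage(runs_tags: list[dict[str, str]], required: list[str]) -> dict[str, str]:
    """Classify each required tag as present, absent or partial across sampled runs."""
    counts = Counter()
    for tags in runs_tags:
        for key in required:
            if tags.get(key):  # empty strings count as missing
                counts[key] += 1
    total = len(runs_tags)
    matrix = {}
    for key in required:
        if total and counts[key] == total:
            matrix[key] = "present"
        elif counts[key] == 0:
            matrix[key] = "absent"
        else:
            matrix[key] = "partial"
    return matrix
```

Running it separately on the FTF sample and the training sample gives one column per run family, which is the shape the gaps table in §4 expects.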

Pillar 6 — Consumption

grep -rn 'runs:/\|artifact_uri\|file://\|s3://' src/ scripts/ dags/ --include="*.py" | grep -v test
grep -rn 'models:/' src/ scripts/ dags/ --include="*.py"
grep -rn '@production\|@staging' src/ scripts/ dags/ --include="*.py"
# Ratio of models:/ vs runs:/ = consumption-via-registry score
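One possible way to turn the grep output into the score, assuming the matched lines have been saved to a list of strings. The function name is hypothetical, and the score deliberately ignores `s3://` and `file://` hits, which would need their own count.

```python
import re
from typing import Optional

REGISTRY_URI = re.compile(r"models:/")
RAW_RUN_URI = re.compile(r"runs:/")


def registry_consumption_score(grep_lines: list[str]) -> Optional[float]:
    """Share of model-loading URIs that go through the Registry (models:/)."""
    registry = sum(len(REGISTRY_URI.findall(line)) for line in grep_lines)
    raw = sum(len(RAW_RUN_URI.findall(line)) for line in grep_lines)
    if registry + raw == 0:
        return None  # nothing to score
    return registry / (registry + raw)
```

A score of 1.0 means every load site found goes through the Registry; anything lower points at the specific `runs:/` lines to review.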

Pillar 6.2 — Backtest↔Live artifact hash parity (v2, committee rec. 3)

# For each deployed model version, compute SHA-256 of:
#   - model.pkl (or equivalent)
#   - feature_engineering_artifacts.pkl
#   - enrichment_config.json
# Compare to the SHA emitted by the backtest job that validated the model.
# Any divergence = Critical gap (backtest validated a different artifact
# than what's in production).

kubectl exec -n cvntrade mlflow-XXXXX -- python -c "
import mlflow, hashlib, os
client = mlflow.tracking.MlflowClient()
for m in client.search_registered_models():
    for v in client.search_model_versions(f'name=\"{m.name}\"'):
        # Download artifacts locally, sha256 them
        local = client.download_artifacts(v.run_id, 'model')
        h = hashlib.sha256(open(os.path.join(local, 'model.pkl'), 'rb').read()).hexdigest()
        print(m.name, v.version, v.current_stage, h[:16])
"

# Cross-check with backtest-emitted hash (if backtests log it — if they
# don't, that's itself a gap).
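The comparison step itself can be kept independent of MLflow. The sketch below assumes the artifacts have already been downloaded to a local directory and that the backtest hashes are available as a plain name-to-SHA-256 mapping; both are assumptions, as is the `parity_report` name.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large model binaries never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parity_report(artifact_dir: Path, backtest_hashes: dict[str, str]) -> dict[str, str]:
    """Compare each artifact against the hash the backtest recorded for it."""
    report = {}
    for name, expected in backtest_hashes.items():
        path = artifact_dir / name
        if not path.is_file():
            report[name] = "missing"
        elif sha256_file(path) == expected:
            report[name] = "match"
        else:
            report[name] = "mismatch"  # Critical gap per the plan
    return report
```

Reading in chunks avoids loading a multi-gigabyte model file in one call, which matters when the check runs inside a memory-limited pod.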

Pillar 3 — Object storage

# Artifact root URI
kubectl exec -n cvntrade mlflow-XXXXX -- env | grep ARTIFACT_ROOT

# Bucket inspection (Scaleway S3 API) — count, size, largest objects
# (requires S3 CLI creds)

# Runs with abnormally large DB-resident artifacts (anomaly)
kubectl exec console-XXXXX -- python3 -c "
# SELECT run_uuid, SUM(size) FROM artifacts GROUP BY run_uuid HAVING SUM(size) > 100e6
"

Pillar 4 — Registry

# Registered models
kubectl exec -n cvntrade mlflow-XXXXX -- python -c "
from mlflow.tracking import MlflowClient
c = MlflowClient()
for m in c.search_registered_models():
    for v in c.search_model_versions(f'name=\"{m.name}\"'):
        print(m.name, v.version, v.current_stage, v.aliases)
"

Pillar 4.2 — ADR-42 atomic promotion audit (v2, committee rec. 5)

# Find the promotion code path
grep -rn 'promote\|atomic\|transition_model_version_stage\|set_registered_model_alias' \
    src/commun/mlflow/ src/commun/pipeline/ scripts/ dags/ --include="*.py"

# Document the workflow:
#   1. What triggers a promotion? (launcher DAG, manual, CI)
#   2. Is it atomic per-crypto (ADR-42 invariant) or does it cross-promote?
#   3. Does it use aliases (@production) or stages (deprecated since MLflow 2.9)?
#   4. Is there a rollback path?

# Audit the promotion transactional boundary:
#   - If a promotion fails mid-way (e.g., 3 of 5 cryptos done, 4th crashes),
#     what's the observed state? Partial promotion = ADR-42 violation.
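The mid-way failure question can be answered mechanically by comparing alias targets before and after a promotion attempt. The sketch below takes the "3 of 5 cryptos" example above as its reading of ADR-42 (an assumption, since the ADR text is not reproduced here); `promotion_state` and its three dict arguments are hypothetical.

```python
from typing import Optional


def promotion_state(
    before: dict[str, Optional[int]],
    target: dict[str, int],
    observed: dict[str, Optional[int]],
) -> str:
    """Classify a promotion batch as 'complete', 'rolled_back' or 'partial'.

    Each dict maps a model name to the version its production alias points at
    (None when the alias is not set).
    """
    if all(observed.get(m) == v for m, v in target.items()):
        return "complete"
    if all(observed.get(m) == before.get(m) for m in target):
        return "rolled_back"
    return "partial"  # some models moved, others did not
```

In an audit run, `before` and `observed` would be snapshots of the Registry taken around a deliberately failed promotion in a non-production environment; only a `partial` result indicates a violation under this reading.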

Pillar 7 — RBAC / Access Control (v2, new, committee rec. 1)

# Authentication
kubectl exec -n cvntrade mlflow-XXXXX -- env | grep -i 'auth\|token\|password\|cred'
kubectl get ingress -n cvntrade | grep mlflow   # TLS + auth middleware?
kubectl get secret -n cvntrade | grep mlflow    # service account creds

# Who writes?
grep -rn 'mlflow.set_tracking_uri\|MLFLOW_TRACKING_USERNAME\|MLFLOW_TRACKING_PASSWORD' \
    src/ scripts/ dags/ infra/ --include="*.py" --include="*.yaml"

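# Can anyone POST? Print only the HTTP status code.
curl -sS -o /dev/null -w '%{http_code}\n' "$MLFLOW_URL/api/2.0/mlflow/experiments/create" \
    -X POST -d '{"name":"audit_probe_ignore"}' -H "Content-Type: application/json"
# Expected: 401 or 403. If 200 = CRITICAL gap (unauthenticated writes);
# in that case delete the audit_probe_ignore experiment afterwards.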

# Artifact store access
aws s3api get-bucket-acl --bucket cvntrade-artifacts --endpoint-url ...
aws s3api get-bucket-policy --bucket cvntrade-artifacts --endpoint-url ...
# Verify: restricted to the MLflow service account, not public, not open to all cluster pods.

# Registry mutation permissions
grep -rn 'transition_model_version_stage\|set_registered_model_alias\|delete_model_version' \
    src/ scripts/ --include="*.py"
# Audit: is there ANY non-CI path that can promote to @production?
# Expected: only the promotion DAG, and only via a service account with scoped permissions.

# Audit trail
# Every write operation on the Registry should be traceable to an identity
# (user or service account). Check MLflow access logs (if collected) and
# `tags.mlflow.user` on runs — should never be blank.
kubectl logs -n cvntrade mlflow-XXXXX --since=24h | grep -iE 'POST|PUT|DELETE' | head -20

RBAC severity grid:

  • Unauthenticated writes to Tracking Server → Critical
  • Public S3 artifacts → Critical
  • Registry mutations without identity trace → Major
  • Stale shared API tokens → Major
  • Missing role separation (one token does everything) → Minor (if single-tenant today, acceptable; document the limit).
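The grid can be encoded as data so that the Pillar 7 section of the report derives its severities mechanically. This is a sketch: the finding keys are invented labels for the five grid rows, and `classify_write_probe` interprets the HTTP status of the unauthenticated POST probe described above.

```python
from typing import Optional

SEVERITY_RANK = {"Minor": 1, "Major": 2, "Critical": 3}

# Hypothetical finding keys, one per row of the severity grid.
RBAC_GRID = {
    "unauthenticated_tracking_writes": "Critical",
    "public_s3_artifacts": "Critical",
    "registry_mutation_without_identity": "Major",
    "stale_shared_api_tokens": "Major",
    "missing_role_separation": "Minor",
}


def classify_write_probe(status_code: int) -> Optional[str]:
    """Map the probe's HTTP status to a finding key, or None when access is denied."""
    if 200 <= status_code < 300:
        return "unauthenticated_tracking_writes"
    if status_code in (401, 403):
        return None
    raise ValueError(f"inconclusive probe status: {status_code}")


def worst_severity(findings: list[str]) -> Optional[str]:
    """Return the highest severity among the findings, or None if there are none."""
    severities = [RBAC_GRID[f] for f in findings]
    return max(severities, key=SEVERITY_RANK.__getitem__, default=None)
```

Raising on any other status (404, 5xx, redirects) is a deliberate choice here: an inconclusive probe should be investigated by hand rather than silently counted as a pass.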

Pillar 8 — Data & Model Lineage / Provenance (v2, new, committee rec. 2)

The audit walks the end-to-end chain and verifies that each stage carries a verifiable pointer back to its upstream input.

OHLCV (raw)  →  Enrichment  →  Feature Engineering (Feast)  →  MLflow run  →  Registered model  →  Deployed artifact

For each stage, check:

  1. Stage identity: the stage output has a version or hash
  2. Upstream pointer: the stage's artifact tags carry the upstream version/hash
  3. Determinism: given the upstream version, rerunning the stage produces the same output hash (or documents why not)

# Stage 1 → 2: Enrichment carries OHLCV source version?
grep -rn 'ohlcv_version\|data_hash\|data_source' src/ETL/ src/commun/pipeline/enrichment* --include="*.py"

# Stage 2 → 3: FE artifact carries enrichment version?
kubectl exec console-XXXXX -- python3 -c "
# SELECT run_uuid, key, value FROM tags WHERE key LIKE '%enrich%'
"

# Stage 3 → 4: MLflow run has complete lineage tags?
# Required tags (committee rec. 6, aligning with ADR-23):
#   mlflow.source.git.commit, cvn.data_version, cvn.feature_set_version,
#   cvn.enrichment_version, cvn.calibrator_version
# Missing any one = lineage broken at that boundary.

# Stage 4 → 5: Registered model version points back to the training run?
kubectl exec -n cvntrade mlflow-XXXXX -- python -c "
from mlflow.tracking import MlflowClient
c = MlflowClient()
for m in c.search_registered_models():
    for v in c.search_model_versions(f'name=\"{m.name}\"'):
        run = c.get_run(v.run_id)
        print(m.name, v.version, 'run:', v.run_id[:8],
              'feature_set:', run.data.tags.get('cvn.feature_set_version', 'MISSING'))
"

# Stage 5 → 6: Deployed artifact identity (what the runtime loads)
# matches the Registry version (not a stale cached artifact)
grep -rn 'from_mlflow_run\|download_artifacts' src/commun/pipeline/*.py src/backtest/*.py
# Check: what identifier is used to load? run_id (weakest), version, or alias (strongest)?

Lineage severity grid:

  • Any stage produces output without a version identifier → Critical
  • Upstream pointer missing in tags → Major per boundary
  • Determinism broken (same input → different output) → Critical (non-reproducible pipeline)
  • Runtime loads by run_id instead of alias → Major (breaks rollback atomicity)
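The stage-identity and upstream-pointer checks amount to walking a linked chain. A minimal sketch, assuming each stage's identifiers have already been extracted into a `StageRecord` (a hypothetical structure); determinism is not covered because it requires re-running the stage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageRecord:
    """Identifiers collected for one pipeline stage (hypothetical structure)."""
    name: str
    output_hash: Optional[str]    # version or hash of what the stage produced
    upstream_hash: Optional[str]  # what the stage says it consumed (None for the raw source)


def lineage_gaps(chain: list[StageRecord]) -> list[tuple[str, str]]:
    """Walk the chain in order and return (stage, problem) pairs for every break."""
    gaps = []
    for i, stage in enumerate(chain):
        if not stage.output_hash:
            gaps.append((stage.name, "no version identifier"))
        if i == 0:
            continue  # the raw source has no upstream to point at
        parent = chain[i - 1]
        if not stage.upstream_hash:
            gaps.append((stage.name, "upstream pointer missing"))
        elif stage.upstream_hash != parent.output_hash:
            gaps.append((stage.name, f"upstream pointer does not match {parent.name}"))
    return gaps
```

Each returned pair maps onto one row of the severity grid: a missing identifier is Critical, a missing or mismatched pointer is Major for that boundary.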

Pillar 2 — Backend SQL

kubectl exec -n cvntrade mlflow-XXXXX -- env | grep BACKEND_STORE_URI

# Volumetry
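kubectl exec console-XXXXX -- python3 -c "
# One count per table (a comma-joined FROM would cross-join them):
# SELECT count(*) FROM experiments;  SELECT count(*) FROM runs;
# SELECT count(*) FROM metrics;      SELECT count(*) FROM params;
# SELECT pg_size_pretty(pg_total_relation_size('runs'));
"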

Pillar 1 — Tracking URI

grep -rn 'MLFLOW_TRACKING_URI\|set_tracking_uri' src/ scripts/ dags/ airflow_docker/ infra/ --include="*.py" --include="*.yaml"
kubectl get configmap cvntrade-env-config -n cvntrade -o yaml | grep -A1 MLFLOW

# Multi-instance detection
kubectl get svc -A | grep -i mlflow

13. References

  • Parent issue: #604
  • Reference framework: MLflow docs — Architecture Overview / Backend Stores / Artifact Stores / Model Registry Workflows
  • Related: #566 (architecture gaps epic), #596 (cache versioning), #599 (enrichment refactor), ADR-23, ADR-42
  • Patterns reused: same 4-phase structure as #608 and #599; same committee submission template

14. Committee submission v2 (resubmission after REJECTED v1)

Title: MLflow Backbone 8★ Audit — plan v2 review (#604)

Question (target score: 8+):

Resubmission of documentation/../needs/CVN-N006-mlflow-backbone.md after the v1 REJECTED/METHODOLOGY_FLAW verdict. The v2 addresses all 10 committee recommendations — two blockers (new Pillars 7 RBAC and 8 Lineage) and eight strengthening points integrated inline. Audit is now on 8 pillars, order 5 → 6 → 7 → 3 → 4 → 8 → 2 → 1, pre-execution effort revised from 4 to 5 days senior.

Committee v2 review requested on:

  1. Pillar 7 RBAC scope (§12): proposed evidence commands cover authentication, authorisation, and audit trail at Tracking / Artifacts / Registry boundaries. Is the severity grid (unauthenticated write = Critical, public S3 = Critical, mutations without identity trace = Major) aligned with how the committee would score a financial-system RBAC audit?

  2. Pillar 8 Lineage depth (§12): we check per-stage identity + upstream pointer + determinism across the OHLCV → Enrichment → FE → MLflow → Registered model chain. Is this the right granularity, or should the audit also cover label-generation and PTE choice lineage (arguably upstream of OHLCV vs downstream of Enrichment)?

  3. Backtest↔Live hash parity (§12 Pillar 6.2): proposed sub-audit computes SHA-256 on model.pkl and FE artifacts, cross-checking against backtest-emitted hashes. Is the committee OK with the scope being just "artifact content identity", leaving "operational behaviour parity" (same prediction for same input) to #599?

  4. Phase B REVISE iteration pattern (§5): targeted Phase A re-run for disputed pillars only, max one cycle, second REJECT escalates. Is the one-cycle cap right, or should the committee keep unlimited cycles (at the cost of slower convergence)?

  5. Risk-adjusted scoring formula (§5): composite score = weighted average of (P&L impact, operational stability, compliance exposure). Which weights would the committee suggest for a crypto trading platform (probably not equal, e.g. P&L > ops > compliance)?

  6. Audit order (§3.3): we insert RBAC at position 3 (after 5, 6) to keep fail-fast semantics, and Lineage at position 6. Would the committee prefer RBAC earlier (stop-the-world reasoning)?

  7. Effort 5 days (§7): realistic for 1 FTE including the two new pillars, or does the committee see the lineage audit being larger than 1 day (e.g. Feast integration complexity)?

  8. DRI naming process (§10bis): proposal is to name the DRI at Phase A kickoff, not in this plan. Does the committee see a risk in deferring — should the DRI be named now?

  9. Remaining blind spots: does the committee still see missing pillars beyond RBAC and Lineage? Candidates we explicitly de-scoped to follow-up issues: MLflow service observability (ops), cost/storage lifecycle policies (finops), cross-environment drift detection (staging vs prod).

Deliverable: per-expert opinion (score, confidence, findings, risks, recommendations) + consolidated verdict (PROCEED / REVISE / REJECT). If PROCEED, Phase A starts in parallel with P0-A tuning once a senior DRI is named. If REVISE, we target one iteration cycle before escalating.

Artifacts referenced:

  • documentation/../needs/CVN-N006-mlflow-backbone.md v2 (this plan, ~500 lines)
  • v1 committee session (REJECTED) and its 10 recommendations (audit trail)
  • Parent issue: #604
  • Related: #596, #599, #608, #612, ADR-15, ADR-23, ADR-42, ADR-25