
Operator-visible runtime invariants — make every operator-triggered run independently verifiable

epic_id: CVN-N014-EB
need_id: CVN-N014 (parent; see Continuous improvement)
OP: wp#197
GH issue: #1006
Status: planned
Created: 2026-05-20
Owner: dococeven
Trigger: CVN-N001-EE-S22A5 (OP wp#185), the 2026-05-19 double crash on a stale DAG. PR #1002 merged at 22:29:41Z; the operator re-triggered 3 min later, before the DAG-repo sync at 22:38:55Z and the K8s deploy at ~22:39:02Z. The pod ran the pre-fix DAG/image and crashed identically, and nothing in the Airflow UI let the operator detect the staleness.


1. Objective

Make every operator-triggered runtime artefact (DAGs first, then their dependent images and ConfigMaps) carry an operator-visible, self-describing build provenance, so that the operator can verify, before triggering, that the loaded build is the expected one. This is the operational-discipline complement to the ADR-25 (no silent fallback) and feedback_no_python_crash_visible policies, which belong to the same family: the operator must not be misled by silent state.

The Epic does not address propagation latency (that is intrinsic to the sync chain and acceptable). It addresses operator visibility into that latency.

2. Scope (files + components expected to change)

  • dags/_common.py — new dag_build_info() helper + dag_version_banner() for doc_md.
  • dags/*.py (32 DAGs) — one-liner per DAG calling the helper into doc_md + the first task log line.
  • dags/.dag_build_info.json (gitignored locally, written by sync) — the stamp file.
  • cvntrade-airflow-dags sync workflow — additional step that writes dags/.dag_build_info.json at sync time with {champollion_sha, synced_at_utc}.
  • .github/workflows/pr-workflow-guardrails.yml — new guardrail G6 (AST / regex check) that every DAG embeds the build banner + emits the event=dag_loaded first-task log line.
  • documentation/adr/0092-dag-versioning-and-build-provenance.md — new ADR formalising the invariant.
  • CLAUDE.md — mention the policy + link to ADR-92.
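
The two helpers planned for dags/_common.py could look roughly like the sketch below. It is illustrative only: the JSON keys come from the stamp description above, while the stamp_ok flag, the banner wording, the SHA truncation and the dag_loaded_line() helper are assumptions made for this example. A missing or unreadable stamp is surfaced loudly rather than hidden, in the spirit of ADR-25.

```python
import json
from pathlib import Path
from typing import Optional

STAMP_FILENAME = ".dag_build_info.json"
UNKNOWN = "unknown"


def dag_build_info(stamp_path: Optional[Path] = None) -> dict:
    """Read the sync stamp; never raise, but flag a missing or unreadable stamp."""
    if stamp_path is None:
        # Assumption: the stamp file sits next to this module, inside dags/.
        stamp_path = Path(__file__).with_name(STAMP_FILENAME)
    info = {"champollion_sha": UNKNOWN, "synced_at_utc": UNKNOWN, "stamp_ok": False}
    try:
        raw = json.loads(stamp_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file or invalid JSON: keep the "unknown" placeholders.
        return info
    if not isinstance(raw, dict):
        return info
    for key in ("champollion_sha", "synced_at_utc"):
        value = raw.get(key)
        if isinstance(value, str) and value:
            info[key] = value
    info["stamp_ok"] = UNKNOWN not in (info["champollion_sha"], info["synced_at_utc"])
    return info


def dag_version_banner(info: Optional[dict] = None) -> str:
    """Markdown banner intended to be prepended to a DAG's doc_md."""
    info = dag_build_info() if info is None else info
    if not info["stamp_ok"]:
        return "**Build provenance: STAMP MISSING OR UNREADABLE** (do not assume this DAG is current)"
    short_sha = info["champollion_sha"][:12]
    return f"**Build provenance:** champollion `{short_sha}`, synced at {info['synced_at_utc']}"


def dag_loaded_line(dag_id: str, info: dict) -> str:
    """Hypothetical helper: key=value log line for the first task."""
    return (
        f"event=dag_loaded dag_id={dag_id} "
        f"champollion_sha={info['champollion_sha']} synced_at_utc={info['synced_at_utc']}"
    )
```

In a DAG file, the one-liner would then be something like doc_md=dag_version_banner() + "\n\n" + existing_doc, with the first task logging dag_loaded_line(dag_id, dag_build_info()).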

3. Stories (initial seed)

Future Stories (out of scope of S01 ; placeholders) :

  • CVN-N014-EB-S02 — extend the build-provenance surface to the K8s ConfigMap layer (operator sees which Helm release is live for cvntrade-env-config before triggering anything that reads from it).
  • CVN-N014-EB-S03 — operator-facing checklist or runbook page rendering the latest sync + deploy timestamps for cross-checking before any high-stakes manual trigger.

4. Acceptance criteria (Epic-level)

  • Every DAG file under dags/ surfaces its champollion build SHA in doc_md AND emits the event=dag_loaded log line from its first task, as described in the Scope section (enforced by CI guardrail G6; without the gate, the policy will drift over the next mission).
  • The auto-sync workflow on cvntrade-airflow-dags writes dags/.dag_build_info.json deterministically at every sync.
  • ADR-92 merged on main and live at docs.cvntrade.eu/adr/0092-....
  • Operator can verify the loaded build of any DAG (concrete acceptance check : diagnostic__s22_a5) from the Airflow UI alone, without any external SHA lookup.
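
As an illustration of how guardrail G6 might enforce the first criterion, the sketch below uses Python's ast module to check that a DAG source file calls the provenance helpers. The helper name dag_loaded_line, the rule of skipping underscore-prefixed modules, and the function names here are assumptions for this example, not the actual guardrail.

```python
import ast
from pathlib import Path

# Assumption: each DAG must call both helpers; dag_loaded_line is a hypothetical name.
REQUIRED_CALLS = {"dag_version_banner", "dag_loaded_line"}


def called_names(source: str) -> set:
    """Collect the names of all functions called in the source (bare or attribute calls)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                names.add(func.id)
            elif isinstance(func, ast.Attribute):
                names.add(func.attr)
    return names


def missing_provenance_calls(source: str) -> set:
    """Return the required helper calls that the source does not contain."""
    return REQUIRED_CALLS - called_names(source)


def check_dags_dir(dags_dir: Path) -> dict:
    """Map each non-compliant DAG file name to the helper calls it lacks."""
    failures = {}
    for path in sorted(dags_dir.glob("*.py")):
        if path.name.startswith("_"):
            continue  # shared modules such as _common.py are not DAGs
        missing = missing_provenance_calls(path.read_text(encoding="utf-8"))
        if missing:
            failures[path.name] = missing
    return failures
```

A CI step could call check_dags_dir(Path("dags")) and fail the job when the returned mapping is non-empty. A static check of this kind only proves the calls exist in the file, not that the log line is emitted by the first task at run time.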

5. Dependencies

None blocking — depends only on the existing sync workflow being editable. The CVN-N015-EC audit-trail Epic is adjacent (build provenance feeds audit-trail context) but independent in delivery order.
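
The additional sync step could be as small as the following sketch. It assumes the workflow already knows the champollion commit SHA and passes it in; how that value is obtained is left open here. The function name write_stamp, the atomic temp-file-then-rename approach and the timestamp format are choices made for this example.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_stamp(dags_dir: Path, champollion_sha: str, now: Optional[datetime] = None) -> Path:
    """Write dags/.dag_build_info.json atomically with {champollion_sha, synced_at_utc}."""
    if now is None:
        now = datetime.now(timezone.utc)
    payload = {
        "champollion_sha": champollion_sha,
        # Timezone-aware input is expected; it is normalised to UTC before formatting.
        "synced_at_utc": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    target = dags_dir / ".dag_build_info.json"
    # Write to a temp file in the same directory, then rename, so a reader
    # never observes a half-written stamp.
    fd, tmp_name = tempfile.mkstemp(dir=dags_dir, prefix=".dag_build_info.", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, sort_keys=True)  # sorted keys keep the output stable
        fh.write("\n")
    os.replace(tmp_name, target)
    return target
```

Sorting the keys and fixing the timestamp format means two syncs of the same commit differ only in synced_at_utc, which keeps diffs of the stamp file easy to read.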

6. Cross-references

  • Project memory : project_dag_versioning_policy, feedback_dag_resync_delay, feedback_dag_sync, feedback_no_python_crash_visible.
  • Adjacent ADRs : ADR-25 (no silent fallback), ADR-30 (structured logs = stable interface), ADR-31/32 (logging discipline).
  • Trigger incident : OP wp#185 (S22A5) ; PR #1002 merged 2026-05-19T22:29:41Z ; both re-runs (22:32:49Z and the in-flight one) ran on the stale DAG.