Operator-visible runtime invariants: make every operator-triggered run independently verifiable
epic_id: CVN-N014-EB
need_id: CVN-N014 (parent — see Continuous improvement)
OP: wp#197
GH issue: #1006
Status: planned
Created: 2026-05-20
Owner: dococeven
Trigger: CVN-N001-EE-S22A5 (OP wp#185), 2026-05-19 double crash on a stale DAG. PR #1002 merged at 22:29:41Z; the operator re-triggered about 3 min later, before the DAG-repo sync at 22:38:55Z and the K8s deploy at ~22:39:02Z; the pod ran the pre-fix DAG/image and crashed identically, and nothing in the Airflow UI let the operator detect the staleness.
1. Objective
Make every operator-triggered runtime artefact (DAGs first, then their dependent images and ConfigMaps) carry an operator-visible, self-describing build provenance, so that the operator can verify before triggering that the loaded build is the one expected. This is the operational-discipline complement to the ADR-25 (no silent fallback) and feedback_no_python_crash_visible policies — same family of "the operator must not be misled by silent state".
The Epic does not address propagation latency (that is intrinsic to the sync chain and acceptable). It addresses operator visibility into that latency.
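The latency window in the trigger incident can be worked through with a few lines of arithmetic. The sketch below uses only the timestamps quoted in this document (merge, the 22:32:49Z re-run, DAG-repo sync); the function and variable names are illustrative and not part of any existing module.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 'Z' timestamp into an aware UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def ran_before_sync(triggered_at: datetime, synced_at: datetime) -> bool:
    """True when a run was triggered before the fix reached the DAG repo."""
    return triggered_at < synced_at


# Timestamps taken from the trigger incident described in this Epic.
merged_at = parse_utc("2026-05-19T22:29:41Z")     # PR #1002 merged
triggered_at = parse_utc("2026-05-19T22:32:49Z")  # operator re-run
synced_at = parse_utc("2026-05-19T22:38:55Z")     # DAG-repo sync

print(ran_before_sync(triggered_at, synced_at))        # True
print((triggered_at - merged_at).total_seconds())      # 188.0
print((synced_at - triggered_at).total_seconds())      # 366.0
```

The re-run started 188 s after the merge and 366 s before the sync, so it necessarily loaded the pre-fix DAG. A `synced_at_utc` value visible in the UI would have exposed this before the trigger.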
2. Scope (files + components expected to change)
- dags/_common.py: new dag_build_info() helper + dag_version_banner() for doc_md.
- dags/*.py (32 DAGs): one-liner per DAG calling the helper into doc_md + the first task log line.
- dags/.dag_build_info.json (gitignored locally, written by sync): the stamp file.
- cvntrade-airflow-dags sync workflow: additional step that writes dags/.dag_build_info.json at sync time with {champollion_sha, synced_at_utc}.
- .github/workflows/pr-workflow-guardrails.yml: new guardrail G6 (AST / regex check) that every DAG embeds the build banner + emits the event=dag_loaded first-task log line.
- documentation/adr/0092-dag-versioning-and-build-provenance.md: new ADR formalising the invariant.
- CLAUDE.md: mention the policy + link to ADR-92.
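The stamp file and the two helpers named above could fit together roughly as follows. This is a minimal sketch, not the planned implementation: the `dags_dir` parameter, the `write_stamp()` and `dag_loaded_log_line()` functions, the `stamp_found` key, and the exact banner and log formats are all assumptions made for illustration. Only the file name, the two JSON keys, the helper names `dag_build_info()` and `dag_version_banner()`, and the `event=dag_loaded` token come from this document.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

STAMP_NAME = ".dag_build_info.json"  # stamp file name from the Scope list


def write_stamp(dags_dir: Path, champollion_sha: str) -> Path:
    """Hypothetical sync-side step: record the source SHA and the sync time."""
    stamp = {
        "champollion_sha": champollion_sha,
        "synced_at_utc": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    path = dags_dir / STAMP_NAME
    path.write_text(json.dumps(stamp, sort_keys=True) + "\n", encoding="utf-8")
    return path


def dag_build_info(dags_dir: Path) -> dict:
    """Read the stamp; a missing or malformed stamp is reported, not hidden."""
    unknown = {"champollion_sha": "unknown", "synced_at_utc": "unknown", "stamp_found": False}
    try:
        data = json.loads((dags_dir / STAMP_NAME).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return unknown
    if not isinstance(data, dict):
        return unknown
    return {
        "champollion_sha": str(data.get("champollion_sha", "unknown")),
        "synced_at_utc": str(data.get("synced_at_utc", "unknown")),
        "stamp_found": True,
    }


def dag_version_banner(info: dict) -> str:
    """Markdown snippet intended for a DAG's doc_md (format is an assumption)."""
    return (
        f"**Build provenance**: champollion `{info['champollion_sha']}`, "
        f"synced at `{info['synced_at_utc']}`"
    )


def dag_loaded_log_line(dag_id: str, info: dict) -> str:
    """Structured key=value line for the task log (field layout is an assumption)."""
    return (
        f"event=dag_loaded dag_id={dag_id} "
        f"champollion_sha={info['champollion_sha']} synced_at_utc={info['synced_at_utc']}"
    )


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        dags_dir = Path(tmp)
        write_stamp(dags_dir, "0123abc")  # placeholder SHA, not a real commit
        info = dag_build_info(dags_dir)
        print(dag_version_banner(info))
        print(dag_loaded_log_line("diagnostic__s22_a5", info))
```

In this sketch a missing stamp produces a visible "unknown" banner instead of an exception. Whether that is acceptable under ADR-25, or whether the DAG should fail to load, is a decision for ADR-92.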
3. Stories (initial seed)
- CVN-N014-EB-S01: DAG-versioning policy (helper + retrofit of 32 DAGs + sync-workflow stamp + CI guardrail G6 + ADR-92). Issue #1005 · OP wp#198 · plan dossier 2026-05-20-cvn-n014-eb-s01-dag-versioning-policy-plan.md (in flight, awaiting committee plan_review).
Future Stories (out of scope of S01; placeholders):
- CVN-N014-EB-S02: extend the build-provenance surface to the K8s ConfigMap layer (operator sees which Helm release is live for cvntrade-env-config before triggering anything that reads from it).
- CVN-N014-EB-S03: operator-facing checklist or runbook page rendering the latest sync + deploy timestamps for cross-checking before any high-stakes manual trigger.
4. Acceptance criteria (Epic-level)
- Every DAG file under dags/ surfaces its champollion build SHA in doc_md AND emits the event=dag_loaded log line on every task (enforced by CI guardrail G6; without the gate, the policy will drift over the next mission).
- The auto-sync workflow on cvntrade-airflow-dags writes dags/.dag_build_info.json deterministically at every sync.
- ADR-92 merged on main and live at docs.cvntrade.eu/adr/0092-....
- Operator can verify the loaded build of any DAG (concrete acceptance check: diagnostic__s22_a5) from the Airflow UI alone, without any external SHA lookup.
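The G6 gate referenced in the first criterion could be approximated with a small AST pass over each DAG file. The sketch below is one possible shape under stated assumptions: it treats a call to `dag_version_banner` as evidence of the banner and a string literal containing `event=dag_loaded` as evidence of the log line. The function name `check_dag_source` and both heuristics are hypothetical. A DAG that emits the line through a shared helper would need a different rule.

```python
import ast
import re

DAG_LOADED_RE = re.compile(r"event=dag_loaded\b")


def check_dag_source(source: str) -> list:
    """Return a list of violation messages for one DAG file's source text."""
    tree = ast.parse(source)
    called = set()
    has_log_literal = False

    for node in ast.walk(tree):
        # Collect the names of called functions, e.g. f() or module.f().
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                called.add(node.func.id)
            elif isinstance(node.func, ast.Attribute):
                called.add(node.func.attr)
        # Inspect string literals only (including the literal parts of
        # f-strings), so a mention in a comment does not satisfy the check.
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            if DAG_LOADED_RE.search(node.value):
                has_log_literal = True

    problems = []
    if "dag_version_banner" not in called:
        problems.append("missing dag_version_banner() call")
    if not has_log_literal:
        problems.append("missing event=dag_loaded log line")
    return problems


if __name__ == "__main__":
    sample = (
        "from _common import dag_version_banner\n"
        "doc_md = dag_version_banner()\n"
        "print(f'event=dag_loaded dag_id={1}')\n"
    )
    print(check_dag_source(sample))  # []
```

Parsing the file instead of grepping it avoids false passes from commented-out code, at the cost of rejecting files that do not parse as Python.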
5. Dependencies
None blocking: the Epic depends only on the existing sync workflow being editable. The CVN-N015-EC audit-trail Epic is adjacent (build provenance feeds audit-trail context) but independent in delivery order.
6. Cross-references
- Project memory: project_dag_versioning_policy, feedback_dag_resync_delay, feedback_dag_sync, feedback_no_python_crash_visible.
- Adjacent ADRs: ADR-25 (no silent fallback), ADR-30 (structured logs = stable interface), ADR-31/32 (logging discipline).
- Trigger incident: OP wp#185 (S22A5); PR #1002 merged 2026-05-19T22:29:41Z; both re-runs (22:32:49Z and the in-flight one) on the stale DAG.