ADR-0100 — Inter-task data transport: object-storage XCom backend, threshold-driven, no bespoke per-task storage¶
Status: proposed (CVN-N014-ED-S01 ; plan_review 8d7efca0 PASSED, consensus strong, 0 blocker)
Context-of-record: the CVN-N001-EI-S05 / s43 incident — cross-pod prediction transport via np.savez to a pod-local /tmp (not shared) → empty cohort → INCONCLUSIVE_TOOLING. Root cause: no project standard for inter-task transport — each DAG improvises its intermediate storage, with a configurable target that can be wrong and that unit tests (tmp_path) never see. See Epic CVN-N014-ED.
Context¶
Airflow XCom is persisted in the metadata DB: designed for small messages (return values, flags), bounded in size (~48 KB on Postgres → silent truncation beyond) and in serializable types. Large arrays/DataFrames in XCom are a documented anti-pattern. The sound practice is pass-by-reference: small messages → XCom; large data → external store, push the reference (URI), read it downstream.
The XComObjectStorageBackend (apache-airflow-providers-common-io) makes this a centralized policy rather than per-task code: a task does return value / xcom_pull, and the backend offloads values above a size threshold to object storage (S3 — already the project substrate), keeping only the URI in the DB. The "wrong-target" bug class becomes structurally impossible for the payloads the backend can serialize.
Verified mechanics (in-pod, common-io 1.4.2 — CVN-N014-ED-S01 v4) — these bound the standard and are load-bearing:
serialize_valuerunsjson.dumps(value, cls=XComEncoder)unconditionally, before the threshold check.XComEncoderdoes not serialize numpy (TypeErroron a bare/unicodendarrayand on a nested dict-of-arrays). So the backend transports JSON-serializable values, not arbitrary large objects: numpy/pandas fail at any size.- Below threshold the backend returns the same
XComEncoderJSON bytesBaseXComwould store → behavior is byte-identical (a flip on a JSON-only, sub-threshold parc is near-inert). - Above threshold the DB stores the object-store path string; the value lives in S3. A revert to
BaseXComthen hands downstream the path string, not the data → rollback is revert + re-trigger, not a bare revert.
enable_xcom_pickling=False + donot_pickle=True are effective in-pod (env does not override), so the current XCom universe is JSON-only and enumerable.
Decision¶
Adopt the object-storage XCom backend as the project standard for inter-task data transport, threshold-driven, with no bespoke per-task storage — under the following disciplines:
- Global flip, threshold-gated incrementality.
AIRFLOW__CORE__XCOM_BACKENDis instance-global (all DAGs at once). The migration is not per-DAG opt-in; the threshold provides incrementality — set above the control-flow p99.9 so existing small XComs stay inline in the DB, and only large payloads offload. Rejected: a custom per-dag_idallowlist backend (tests the wiring, not the shipped backend). - Serialization is JSON-by-
XComEncoder, not "arbitrary object". A value transported by XCom MUST beXComEncoder-serializable. numpy/pandas require a registered serializer (airflow.serializationserde) or explicit pass-by-reference (XCom = pointer, data in store). Pass-by-reference is the preferred default for numpy/large ML objects (CVN-N014-ED-S02). - No bespoke per-task intermediate storage. A task does
return value/xcom_pull; it MUST NOT hand-roll its own S3/FS write for inter-task transport. (Reviewer gate, CVN-N014-ED-S04.) - Object storage, not a shared RWX filesystem. S3 is already the project substrate; a shared mutable RWX FS (concurrency, SPOF, manual GC) is rejected as against the stateless-pods architecture, unless a real POSIX need is opened separately.
- Flip is gated on proof, not hope. A hardened pre-flight (
scripts/xcom_flip_preflight.py) MUST prove in the real worker/scheduler context: provider present, S3 connection usable, and that the dotted-sectioncommon.ioconfig actually took (airflow config get-value, not the file — guards the silent env-var drop). A serialization audit (scripts/xcom_serialization_audit.pystatic +scripts/xcom_roundtrip_probe.pydynamic in-pod) MUST show control-flow round-trips green before flip. - Rollback = revert + re-trigger, rehearsed. The kill-switch reverts
XCOM_BACKENDtoBaseXComand re-triggers in-flight runs that wrote offloaded XComs (per-run, self-heals on re-run). The sequence MUST be rehearsed before the prod cut-over. Runbook:runbook_xcom_backend_flip.md. - Bounded object-store growth. An S3 lifecycle rule expires the
xcom/prefix; the backend purges objects on XCom clear. Orphan-sweeping (if needed) is CVN-N014-ED-S03.
Invariants¶
- Inv 1 — JSON-serializable or pass-by-reference. Any XCom value is
XComEncoder-serializable; numpy/pandas go by registered serializer or explicit reference. Never assume the backend "natively" carries arrays. - Inv 2 — no bespoke per-task transport storage. Inter-task data moves via
return/xcom_pull(+ the backend) or an explicit, reviewed pass-by-reference — never a hand-rolled per-task store target. - Inv 3 — flip is proof-gated. No flip without a green pre-flight (provider + S3 conn + effective config proven in-pod) and a green control-flow round-trip.
- Inv 4 — rollback is revert + re-trigger, rehearsed. A bare backend revert is a defect once offload traffic exists; the runbook sequence is rehearsed before cut-over.
- Inv 5 — threshold protects location, not serialization. Calibrate the threshold (p99.9 control-flow) to keep control-flow inline; never treat it as a serialization gate (the
json.dumpsprecedes it).
Relation to other ADRs¶
- ADR-25 (no silent fallback) — an unreachable object store fails the task loudly; no fallback path.
- ADR-26 — the flip/rollback runbook starts at Grafana.
- Scoped by the Epic CVN-N014-ED; S02 (numpy routing X′/X″/Y, default Y) and S04 (author guideline + review gate) build on this standard.