CVN-N001-EI-S07: Validation runs (Gate-3 size + clean-fail), 2026-05-27
Two operator-triggered captures via diagnostic__s18_step1_4_chain, build 5b26d1b (PR #1080 merged + DAG-synced 09:43:53Z), defi_top5 fold 3. Serialized (max_active_runs=1, ADR-22): LDOUSDC then AAVEUSDC.
Results
| Run | Step-0 | observed / expected f1_buy | \|Δ\| vs ε=0.005 | parquet_bytes | n_features | n_train / n_val | Phase-A elapsed | Task |
|---|---|---|---|---|---|---|---|---|
| LDOUSDC/3 | PASS (reproduced) | 0.3092 / 0.3092 | 0.0000 | 21 835 187 (20.8 MiB / 21.84 MB) | 319 | 9060 / 1923 | 1340 s (~22 min) | GREEN |
| AAVEUSDC/3 | FAIL (non-reproducing) | 0.3591 / 0.3520 | 0.0070 > ε | 24 370 983 (23.2 MiB / 24.37 MB) | 320 | 9853 / 2093 | 1377 s (~23 min) | GREEN |
Both lines carry the new parquet_bytes field → confirms the instrumented 5b26d1b build is what ran.
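The Step-0 column can be read as a simple tolerance check on f1_buy. The following is a minimal sketch under stated assumptions: the function name `step0_status` is hypothetical, and the inclusive comparison (`<=`) is a guess, since the report only shows `> ε` for the failing case. It works from the rounded table values, so the last digit of the computed delta may differ from the logged one.

```python
EPSILON = 0.005  # tolerance taken from the Results table header


def step0_status(observed_f1: float, expected_f1: float, eps: float = EPSILON):
    """Return (status, |delta|) for a baseline reproduction check.

    Hypothetical helper: PASS when the observed f1_buy is within eps of
    the expected (canonical anchor) value, FAIL otherwise.
    """
    delta = abs(observed_f1 - expected_f1)
    status = "PASS" if delta <= eps else "FAIL"
    return status, delta


# Values copied from the Results table (rounded to 4 decimals there).
runs = [
    ("LDOUSDC/3", 0.3092, 0.3092),
    ("AAVEUSDC/3", 0.3591, 0.3520),
]

for run, observed, expected in runs:
    status, delta = step0_status(observed, expected)
    print(f"{run}: {status} (|delta|={delta:.4f}, eps={EPSILON})")
```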
Finding 1: Clean-fail validated (PR #1080) ✅
AAVEUSDC re-diverged (live data drift, the §2bis phenomenon: today's fetched data ≠ canonical anchor, observed 0.3591 vs expected 0.3520) — exactly the case that previously crashed with a RuntimeError traceback. With #1080 the chain now surfaces it cleanly:
ERROR - event=s18_chain_verdict severity=error outcome=PHASE_A_FAIL phase_a_status=FAIL
observed_f1=0.3591 expected_f1=0.3520 next_action=ESCALATE reason=non_reproducing_baseline
- Task stayed GREEN; no traceback from the chain task.
- The no-Python-crash rule is satisfied: a non-reproducing baseline is a loud structured severity=error verdict (ADR-25/26/30; Loki→Grafana is the alert channel), not a stacktrace.
- LDOUSDC reproduced cleanly (NO_DIVERGENCE, path already fixed by #947) → its verdict is the happy-path control.
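The clean-fail behaviour described above can be illustrated with a short sketch: log a structured key=value verdict at ERROR level and return normally instead of raising. This is not the code from PR #1080. The function name `report_non_reproducing_baseline` and the logger name are hypothetical, and only the field names and values are taken from the log excerpt in this section.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")
logger = logging.getLogger("s18_chain")  # hypothetical logger name


def report_non_reproducing_baseline(observed_f1: float, expected_f1: float) -> str:
    """Emit a structured error verdict and return it, without raising.

    Because no exception escapes, the calling task can finish normally
    while the log line carries the failure to the alert channel.
    """
    fields = {
        "event": "s18_chain_verdict",
        "severity": "error",
        "outcome": "PHASE_A_FAIL",
        "phase_a_status": "FAIL",
        "observed_f1": f"{observed_f1:.4f}",
        "expected_f1": f"{expected_f1:.4f}",
        "next_action": "ESCALATE",
        "reason": "non_reproducing_baseline",
    }
    line = " ".join(f"{key}={value}" for key, value in fields.items())
    logger.error(line)
    return line


verdict_line = report_non_reproducing_baseline(0.3591, 0.3520)
```

Keeping the verdict as flat key=value pairs means a log query can filter on `severity=error` or `reason=non_reproducing_baseline` without parsing a traceback.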
Finding 2: Gate-3 (artifact size) PASS, knee raised to 30 MB (safety buffer) ⚠️
Original threshold (design §274): p95 ≤ 25 MB/fold · audit ≤ 1 GB · read+verify ≤ 10 s. Knee raised 25 → 30 MB on 2026-05-27 (operator-directed, this evidence) — a 5 MB/fold safety buffer so the gate doesn't start at the limit.
- Per-fold size: max observed 24.37 MB (AAVEUSDC) — was hugging the old 25 MB knee; now ≤ 30 MB with ~5.6 MB headroom → PASS.
- Audit budget: unchanged ≤ 1 GB ⇒ at 30 MB × ~30 folds ≈ ~900 MB (worst case at the new knee); observed ~24 MB × 30 ≈ ~720 MB → within budget.
- Caveat A — bigger than the §9 estimate: the parametric estimate predicted ~3–6 MB/fold (snappy on float32 ~10k×320). Observed is ~22–24 MB = 4–8× larger. Likely drivers: train+val both persisted, label/weight/split columns, object/index columns, codec defaults. → flag for §9: revisit compression (codec/level), and whether dedup (§4b) holds at this real size. (The buffer is headroom, not a substitute for this.)
- Caveat B — n=2, both fold 3: not a true p95. The 30 MB knee gives margin, but larger cells/folds should be re-checked on a wider sample during Gate 4.
- Not measured here: read+verify ≤ 10 s and S3 upload/read latency — those are Gate 4 (cold→warm in-cluster), deferred.
Verdict: Gate-3 size does not block Lever #1 implementation (within both knees, now with a 5 MB/fold buffer), but §9 storage/compression should still be revisited given the 4–8× gap vs estimate, and the per-fold size re-checked on a wider sample during Gate 4.
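The size arithmetic behind this verdict can be reproduced from the two parquet_bytes values. The sketch below is illustrative only: the helper `size_report` is hypothetical, 1 GB is assumed to mean 1000 MB (decimal), and the fold count of 30 is the approximate figure quoted above, not a measured one.

```python
MB = 1_000_000        # decimal megabyte
MIB = 1024 * 1024     # binary mebibyte

KNEE_MB = 30          # per-fold knee after the 2026-05-27 change
AUDIT_BUDGET_MB = 1000  # assumed: 1 GB read as 1000 MB
N_FOLDS = 30          # approximate fold count ("~30 folds")

# parquet_bytes copied from the Results table.
observed_bytes = {"LDOUSDC/3": 21_835_187, "AAVEUSDC/3": 24_370_983}


def size_report(parquet_bytes: int) -> dict:
    """Convert a byte count and compare it against the per-fold knee."""
    size_mb = parquet_bytes / MB
    return {
        "mb": round(size_mb, 2),
        "mib": round(parquet_bytes / MIB, 1),
        "headroom_mb": round(KNEE_MB - size_mb, 2),
        "within_knee": size_mb <= KNEE_MB,
    }


for run, nbytes in observed_bytes.items():
    print(run, size_report(nbytes))

# Audit-budget projections: worst case at the knee, and at the largest
# observed fold size.
worst_case_mb = KNEE_MB * N_FOLDS
observed_max_mb = max(observed_bytes.values()) / MB * N_FOLDS
print(f"worst case {worst_case_mb} MB, observed-max projection {observed_max_mb:.0f} MB")
print("within audit budget:", worst_case_mb <= AUDIT_BUDGET_MB)
```

With only two samples, the largest observed value stands in for p95 here, which is the limitation Caveat B already notes.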
Caveat: infra noise observed (not a regression, not our DAG)
Two psycopg2.OperationalError tracebacks at 10:32:35 in process DagFileProcessor12001 (Airflow's DAG-file parser losing its connection to the metadata DB at 172.16.16.4:5432 — "server closed the connection unexpectedly"). This is a transient infra fault (scheduler ↔ Postgres reconnect), explicitly exempt from the no-crash rule, and separate from our chain task — AAVEUSDC completed GREEN with its clean verdict. No action required beyond noting it.
Net
- #1080 fix proven on the real divergence case (AAVEUSDC): clean severity=error verdict, GREEN task, no traceback.
- Gate-3 measured: ~22–24 MB/fold; knee raised 25 → 30 MB (5 MB/fold safety buffer) → Lever #1 unblocked on size with headroom; §9 compression still to revisit.
- Lever #1 entry gates status: Gate 1 (value/reuse) = GO (Phase-0), Gate 3 (size) = PASS-with-caveat (this run). Gate 2 (drift-rate) still pending; Gate 4 is the in-cluster release gate.