
Runtime orchestrator — Decision dossier

Status : v1 — neutral decision dossier, no recommendation. For committee / operator review.
Authors : CVNTrade research, 2026-04-21
Decision required : §9
Revisits : ADR-60 (hot-path CandlePipeline) and ADR-61 (batch Hamilton).


1. Executive summary

CVNTrade needs an orchestrator for per-candle runtime flows (backtest, paper trading, live trading). Two credible options exist. Both have been used in production systems at similar scale. Both have measurable costs and benefits. This dossier gathers the facts to make a decision.

  • Option 1 — Keep the home-grown CandlePipeline at runtime (current state, ADR-60). Hamilton stays scoped to batch training (ADR-61).
  • Option 2 — Unify on Hamilton for both runtime and batch. ADR-60 retracted, ADR-61 widened.

The decision affects the ~570 lines of home-grown orchestration code (§3.1), the surface of the parity certification (#614), and future maintenance burden. It does not affect the correctness of the trading logic itself.


2. Problem statement

The #599 stateful refactor restructures feature computation into atomic update / emit functions. These functions are shared between two orchestration contexts :

  • Batch (training) — Hamilton orchestrates a DAG over all historical candles once, to produce the fitted artifacts the model is trained on.
  • Runtime (backtest, paper, live) — an orchestrator iterates candles one at a time and applies the feature/inference/filter/execution pipeline.

Current state : batch uses Hamilton (ADR-61), runtime uses CandlePipeline (ADR-60). The two orchestrators share the atomic functions but not the orchestration layer itself. The decision under consideration is whether unifying the orchestration layer on Hamilton would be a net improvement.

Why now

  • #599 Phase 2 (bulk indicator port) will multiply the number of orchestrated nodes from 6 to ~150. The cost of the orchestration layer changes at that scale.

  • #614 (FTF↔Runner parity certification) needs to prove behavioural equivalence. A single orchestrator reduces the surface of equivalences to prove. Two orchestrators keep the current surface.

  • Decisions taken now shape ~8 weeks of Phase 2 work. Changing course mid-Phase 2 is expensive.

Why it matters independently of the "right" answer

Either choice, adopted explicitly and coherently, is better than the current drift. The current drift is : ADRs written, not enforced, with a home-grown orchestrator accumulating edge cases while Hamilton grows elsewhere. The monolith cvntrade_backtest_engine.py (~2 800 lines) is the cautionary tale of "home-grown that kept growing" ; §7 addresses whether CandlePipeline is on the same trajectory.


3. Current architecture (context for the decision)

3.1 ADR-60 as it stands

"Hot-path sequential flows use CandlePipeline — NOT Hamilton. Hot path = sub-ms latency per candle ; Hamilton adds orchestration overhead that doesn't amortize on linear flows."

Concrete artefacts :

  • src/commun/pipeline/runner/pipeline.py (~50 lines) — CandlePipeline class (a for-loop with short-circuit)
  • src/commun/pipeline/runner/builder.py (~80 lines) — wires steps into a pipeline
  • src/commun/pipeline/runner/steps.py (~200 lines) — step implementations (CUSUM gate, Enrichment, FE, Inference, FilterChain)
  • src/commun/pipeline/runner/step.py (~30 lines) — PipelineStep ABC
  • src/commun/pipeline/runner/context.py (~60 lines) — CandleContext + SkipReason
  • src/commun/pipeline/runner/candle_loop.py (~90 lines) — iterates a pipeline over a candle stream
  • src/commun/pipeline/runner/assertion.py (~60 lines) — runtime invariant checks

Total home-grown surface : ~570 lines.
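
For readers unfamiliar with the pattern, the following is a minimal sketch of a "for-loop with short-circuit" orchestrator. It is not the code in pipeline.py: the class names come from the artefact list above, but every signature, field, and the two toy steps are assumptions made for illustration.

```python
from __future__ import annotations

import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class SkipReason(Enum):
    CUSUM_REJECTED = "cusum_rejected"  # hypothetical member


@dataclass
class CandleContext:
    close: float  # hypothetical field; the real context carries more
    features: dict[str, float] = field(default_factory=dict)


class PipelineStep(ABC):
    name: str = "step"

    @abstractmethod
    def run(self, ctx: CandleContext) -> Optional[SkipReason]:
        """Mutate ctx; return a SkipReason to stop the pipeline, else None."""


class CusumGate(PipelineStep):
    name = "cusum_gate"

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def run(self, ctx: CandleContext) -> Optional[SkipReason]:
        # Toy gate only: a real CUSUM filter tracks cumulative deviations.
        if abs(ctx.close) >= self.threshold:
            return None
        return SkipReason.CUSUM_REJECTED


class Enrichment(PipelineStep):
    name = "enrichment"

    def run(self, ctx: CandleContext) -> Optional[SkipReason]:
        ctx.features["close_x2"] = ctx.close * 2  # placeholder feature
        return None


class CandlePipeline:
    def __init__(self, steps: list[PipelineStep]) -> None:
        self.steps = steps
        self.timings_ns: dict[str, int] = {}

    def process(self, ctx: CandleContext) -> Optional[SkipReason]:
        for step in self.steps:
            start = time.perf_counter_ns()
            result = step.run(ctx)
            self.timings_ns[step.name] = time.perf_counter_ns() - start
            if isinstance(result, SkipReason):
                return result  # short-circuit: later steps never run
        return None


pipeline = CandlePipeline([CusumGate(threshold=100.0), Enrichment()])
rejected = pipeline.process(CandleContext(close=50.0))
```

The whole orchestration contract fits in the body of process(), which is the property P5 (debugging) relies on.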

3.2 ADR-61 as it stands

"Batch dataflow computations — enrichment, feature engineering (fit), labelling — MUST use Hamilton."

Concrete artefacts : none yet on main. Hamilton is in requirements.txt but the feature DAG files (src/commun/pipeline/features/*.py) are planned for Phase 5 of #568.

3.3 The atomic layer (shared by both ADRs)

From #599 Phase 1 :

  • src/commun/pipeline/enrichment/protocol.py — Indicator Protocol (bootstrap_state, update, emit)
  • src/commun/pipeline/enrichment/engine.py — StatefulEnricher (orchestrates indicators bar-by-bar, O(1) per candle per indicator)
  • src/commun/pipeline/enrichment/indicators/ema.py — reference R-class indicator

Both ADR-60 and ADR-61 consume this layer ; the decision is about which orchestrator wraps the atomic functions at runtime.
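
To make the atomic layer concrete, here is a hedged sketch of what a three-method indicator contract could look like, using an EMA as the example. Only the method names (bootstrap_state, update, emit) and the class names come from the list above; the argument types, the on_candle method, and the EMA seeding rule are assumptions and may differ from protocol.py, engine.py and ema.py.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


class Indicator(Protocol):
    # Assumed signatures; the real protocol may pass richer candle objects.
    def bootstrap_state(self, history: Sequence[float]) -> None: ...
    def update(self, close: float) -> None: ...
    def emit(self) -> Optional[float]: ...


@dataclass
class Ema:
    period: int
    value: Optional[float] = None

    @property
    def alpha(self) -> float:
        return 2.0 / (self.period + 1)

    def bootstrap_state(self, history: Sequence[float]) -> None:
        # Replay history once so live updates start from a warm state.
        for close in history:
            self.update(close)

    def update(self, close: float) -> None:
        # O(1) per candle: only the previous EMA value is kept.
        if self.value is None:
            self.value = close  # assumed seeding: first close
        else:
            self.value += self.alpha * (close - self.value)

    def emit(self) -> Optional[float]:
        return self.value


class StatefulEnricher:
    """Applies every indicator to each new candle, bar by bar."""

    def __init__(self, indicators: dict[str, Indicator]) -> None:
        self.indicators = indicators

    def on_candle(self, close: float) -> dict[str, Optional[float]]:
        out: dict[str, Optional[float]] = {}
        for name, indicator in self.indicators.items():
            indicator.update(close)
            out[name] = indicator.emit()
        return out


ema = Ema(period=3)  # alpha = 0.5
ema.bootstrap_state([10.0])
enricher = StatefulEnricher({"ema_3": ema})
first = enricher.on_candle(20.0)  # {"ema_3": 15.0}
```

Either orchestrator would call the same update / emit pair; the options differ only in what drives that call.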


4. Option 1 — Keep CandlePipeline at runtime, Hamilton at batch

4.1 What it is

Status quo. CandlePipeline orchestrates per-candle flow. Hamilton (once Phase 5 of #568 ships) orchestrates batch training.

4.2 Measured / documented facts

| Fact | Source | Measurement |
|---|---|---|
| Per-candle orchestration overhead | CandlePipeline.process() structure : Python for-loop, time.perf_counter_ns() per step, isinstance check. Measured : ~600 ns for 6-step pipeline, empty steps. | ~100 ns per step added by the orchestrator (rest is Python function call cost) |
| Code to maintain | wc -l src/commun/pipeline/runner/*.py | ~570 lines (+ tests) |
| Short-circuit on skip | Native — returns SkipReason on first failing step | No orchestration cost on skip (common case in CUSUM gate) |
| Compile / warmup cost | None | 0 ms |
| Memory overhead | Holds step list + timings dict | ~1 KB per pipeline instance |
| Lineage emission | None natively — would need to be implemented | Manual list of step names |
| Parallelism | Sequential only | Single thread per candle |
| Learning curve for new contributors | Stdlib Python patterns only | Zero external framework |

4.3 Pros

  • P1 — Minimal orchestration overhead : measured ~600 ns per candle for 6 steps, will scale linearly to ~15 µs for 150 steps (Phase 2 complete). At 8 000 candles × 150 nodes × 100 ns per Python call = 120 ms per fold for pure orchestration — ~0.6 % of a typical 20 s fold.
  • P2 — Zero cold start : no DAG compilation step. A fresh Python process can start executing candles within milliseconds.
  • P3 — No external dependency for runtime-critical path : requirements.txt still has Hamilton (for batch), but a broken Hamilton release or regression cannot affect trading runtime if Hamilton is not on the runtime hot path.
  • P4 — Short-circuit native : CUSUM rejects ~80 % of candles in standard regimes (stable markets). Returning SkipReason on the first failing step means the remaining steps never execute, so for ~80 % of candles the orchestration cost is limited to the gate step itself.
  • P5 — Debugging experience : a stack trace through a Python for-loop is trivially readable. Step-through in a debugger works with no framework-specific tooling.
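
The arithmetic behind P1 and P4 can be reproduced with a few lines. This is a sketch under stated assumptions: the function name is hypothetical, the gate is assumed to be the first step, and a rejected candle is assumed to pay for exactly one step.

```python
def orchestration_ms(
    candles: int,
    nodes: int,
    ns_per_node: float,
    skip_rate: float = 0.0,
    gate_nodes: int = 1,
) -> float:
    """Expected pure-orchestration cost per fold, in milliseconds."""
    full = (1.0 - skip_rate) * nodes   # candles that run every step
    skipped = skip_rate * gate_nodes   # candles rejected at the gate
    return candles * (full + skipped) * ns_per_node / 1e6


# P1 as stated: every candle runs all 150 steps.
worst_case = orchestration_ms(8_000, 150, 100)            # 120.0 ms
share_of_fold = worst_case / 20_000                       # 0.006, i.e. 0.6 % of 20 s

# P4 applied: 80 % of candles stop after the gate.
with_skip = orchestration_ms(8_000, 150, 100, skip_rate=0.8)  # about 24.64 ms
```

Under these assumptions the 120 ms figure in P1 is an upper bound; with the ~80 % rejection rate from P4 the expected cost drops to roughly a fifth of it.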

4.4 Cons

  • C1 — Two orchestrators to maintain : 570 lines of hot-path code plus Hamilton for batch. When a pattern is introduced (e.g. a new tag, a new context field), it must be added in both places. Shared tests also live in two surfaces.
  • C2 — Dual parity surface : #614 certifies feature-level equivalence between the batch (Hamilton) and runtime (CandlePipeline) paths. The orchestrators themselves must be proven equivalent — by tests that walk real data through both paths and assert matching trade lists. This is ≥20 % of the parity test budget.
  • C3 — No built-in lineage : tracing which feature depends on what requires manual docs. When a feature breaks, debug time grows with indirection.
  • C4 — FTF factor gating is custom : every A/B lever (per ADR-56) needs its own conditional plumbing in the step code. Growing factor count → growing conditional surface.
  • C5 — Monolith-like trajectory risk : the home-grown orchestrator accumulates edge cases over time. Precedent : cvntrade_backtest_engine.py grew from a minimal backtest engine to 2 800 lines over a year.
  • C6 — No parallelism : multi-timeframe branches (1h and 6h) execute sequentially even when they have no data dependency. Phase 2 of #599 will make this measurable.

5. Option 2 — Unify on Hamilton (runtime + batch)

5.1 What it is

Hamilton orchestrates both batch training and per-candle runtime. CandlePipeline and its supporting files are removed. Features, filter chain, inference, and execution become Hamilton nodes, invoked per candle at runtime and on the full OHLCV at training.

5.2 Measured / documented facts

| Fact | Source | Measurement |
|---|---|---|
| Per-candle orchestration overhead (150-node DAG) | Hamilton docs + community benchmarks (Stitch Fix, DAGWorks Inc.). Actual overhead depends on node count, dict operations, traceability level. | 5–50 µs per execute() call after driver warmup. Benchmark needed on our workload for precision. |
| Code to maintain (runtime side) | Hamilton @tag / @config.when decorators on existing functions. No orchestrator class to maintain. | ~80 lines of decorators + module organisation ; orchestrator layer reduced to Hamilton's Driver (not ours). |
| Short-circuit on skip | Not native — requires custom handling : node-level exceptions caught by driver, or sentinel return values, or dynamic graph pruning via Driver.raw_execute. | Adds framework-specific code or forces non-idiomatic patterns. |
| Compile / warmup cost | Driver compilation walks the function registry, builds the graph. | 100–500 ms per process at first use. Amortized across the run's candles. |
| Memory overhead | Hamilton Driver holds the compiled DAG. | ~50–200 KB per Driver instance. |
| Lineage emission | Native — Driver.visualize_execution() produces a graph image. Driver.list_available_variables() lists nodes. | Zero-cost for the developer. |
| Parallelism | hamilton.parallelism.ParallelExecutor opt-in for independent branches. | Multi-tf 1h and 6h could run in parallel. |
| Learning curve for new contributors | Hamilton requires familiarity with the decorator-based node contract and the Driver API. | 1–2 days of onboarding for a new engineer. Hamilton docs are thorough. |
| Dependency stability | Hamilton is maintained by DAGWorks Inc. (formerly Stitch Fix open source). Production use since 2019. Semver released, breaking changes documented. | Low-risk dep. Version pin in requirements.txt. |

5.3 Pros

  • P1 — One orchestrator, one parity surface : feature functions are invoked through the same Hamilton execute() in batch and runtime. #614's "orchestration equivalence" becomes trivial by construction, so no separate orchestration proof is needed.
  • P2 — ~570 lines removed : the src/commun/pipeline/runner/*.py files are deleted. After adding ~80 lines of decorators, net lines of code drop by ~490.
  • P3 — Lineage for free : every backtest / live run emits a DAG visualisation as an MLflow artefact. Debug and audit gain an automatic tool.
  • P4 — FTF factor gating native : @hamilton.function_modifiers.config.when(enabled="confidence_v2") is the canonical way to A/B-test a feature set. ADR-56 compliance by construction, not by manual plumbing.
  • P5 — Parallelism ready : multi-timeframe independent branches can use ParallelExecutor without refactoring. Expected gain on the multi-tf sub-engine of Phase 2.
  • P6 — Industry-tested pattern : Hamilton is used in production at Stitch Fix, Axon, Ryan Abernathey's climate pipeline. Non-experimental.

5.4 Cons

  • C1 — Higher orchestration overhead : 5–50 µs per execute() call vs ~100 ns per step in CandlePipeline (≈ 15 µs per candle at 150 steps). Benchmark needed ; literature reports suggest 0.5–3 % wall-clock impact on 8 000-candle × 150-node workloads.
  • C2 — Short-circuit unnatural : Hamilton's execution model is "compute all requested nodes". CUSUM-gated skip requires sentinel-value propagation or custom driver behaviour. Possible, but it adds non-idiomatic complexity.
  • C3 — Cold-start cost : 100–500 ms to compile the DAG at first execute() in each process. Negligible for long runs, noticeable for short test runs (pytest).
  • C4 — Framework dependency on hot path : a Hamilton release regression can break trading runtime. Lock version in production, have rollback procedure.
  • C5 — Debugging more indirect : Hamilton's internal call stack appears between user code and node functions. Reading the resulting tracebacks requires Hamilton knowledge.
  • C6 — ADR-60 retraction + ADR-61 extension : two ADRs change, documentation drift, team has to internalise the new mental model.
  • C7 — Committee review needed : not a tactical call. Architectural decision that affects the parity certification track (#614) and the #599 Phase 2 plan. Formal review process.
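
To illustrate what "sentinel-value propagation" in C2 means in a "compute all requested nodes" model, here is a stdlib-only sketch. It does not use Hamilton and is not Hamilton's API: the tiny executor, the SKIP sentinel, and the three node functions are all hypothetical. It borrows only Hamilton's convention that a function's parameter names identify its upstream nodes.

```python
import inspect

SKIP = object()  # hypothetical sentinel meaning "this candle was rejected"


def cusum_pass(close: float, threshold: float):
    # Toy gate only: returns the sentinel instead of raising.
    return True if abs(close) >= threshold else SKIP


def feature(close: float, cusum_pass):
    # Depends on cusum_pass purely so the sentinel can reach it.
    return close * 2


def signal(feature: float):
    return feature > 250


NODES = {fn.__name__: fn for fn in (cusum_pass, feature, signal)}


def execute(target: str, inputs: dict):
    """Resolve `target` recursively; return (result, names of nodes run)."""
    cache = dict(inputs)
    called = []

    def resolve(name: str):
        if name in cache:
            return cache[name]
        fn = NODES[name]
        args = {p: resolve(p) for p in inspect.signature(fn).parameters}
        if any(value is SKIP for value in args.values()):
            cache[name] = SKIP        # propagate without calling the node
        else:
            called.append(name)
            cache[name] = fn(**args)
        return cache[name]

    return resolve(target), called


rejected, ran_on_reject = execute("signal", {"close": 50.0, "threshold": 100.0})
accepted, ran_on_accept = execute("signal", {"close": 150.0, "threshold": 100.0})
```

Note the trade-off the sketch makes visible: node bodies are skipped, but the executor still visits every node on a rejected candle, and every downstream function has to declare a dependency on the gate. That is the extra plumbing C2 refers to.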

6. Quantitative side-by-side

Measured / estimated from existing code, Hamilton docs, and community benchmarks. Precision of Hamilton benchmarks should be improved by local measurement before committing — see §8.

| Dimension | Option 1 (CandlePipeline) | Option 2 (Hamilton) | Delta |
|---|---|---|---|
| Orchestration latency per candle (150 nodes) | ~15 µs | ~75 µs | +60 µs (≈ 5× the raw orchestration cost) |
| Per-fold overhead (8 000 candles) | ~120 ms | ~600 ms | +480 ms |
| Per-FTF-run overhead (45 cells × 15 trials × 20 s fold) | | +10 min on 3h45 run (≈ +4.4 %) | +10 min / full sweep |
| Cold start per process | <1 ms | 100–500 ms | +100–500 ms one-off |
| Code maintained locally | ~570 lines (orchestrator) | ~80 lines (decorators) + external dep | −490 lines |
| Parity certification (#614) equivalences to prove | 4 atomic + 1 integration + 1 orchestration = 6 | 4 atomic + 1 integration = 5 | −1 equivalence |
| Lineage emission | Manual (if needed) | Native (dr.visualize_execution) | qualitative gain |
| FTF factor gating | Manual (conditional in step code) | @config.when decorator | qualitative gain |
| Parallelism on independent branches | Sequential | Opt-in ParallelExecutor | qualitative gain (~5–20 % gain on multi-tf) |
| Dependency on external framework for runtime | No | Yes (Hamilton pin) | qualitative change |
| Short-circuit on CUSUM skip | Native, zero cost | Requires custom logic | qualitative difference |
| Debug experience | stdlib Python | Hamilton framework knowledge | qualitative difference |
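
The latency rows can be cross-checked by chaining the per-candle figures through to a full sweep. The sketch below assumes one 20 s fold per trial (so 45 × 15 = 675 folds) and counts pure orchestration only; the constant and function names are hypothetical.

```python
CANDLES_PER_FOLD = 8_000
FOLDS = 45 * 15          # cells x trials, assuming one fold per trial
FOLD_SECONDS = 20


def per_fold_ms(per_candle_us: float) -> float:
    """Orchestration overhead per fold, from a per-candle cost in microseconds."""
    return CANDLES_PER_FOLD * per_candle_us / 1_000


option1_ms = per_fold_ms(15)              # 120.0
option2_ms = per_fold_ms(75)              # 600.0
delta_ms = option2_ms - option1_ms        # 480.0

sweep_minutes = FOLDS * FOLD_SECONDS / 60             # 225.0, i.e. 3h45
sweep_delta_minutes = FOLDS * delta_ms / 1_000 / 60   # about 5.4
sweep_delta_pct = 100 * sweep_delta_minutes / sweep_minutes  # about 2.4
```

Under these assumptions the per-fold rows reproduce exactly, but the full-sweep delta comes out near 5.4 min (≈ 2.4 %) rather than the +10 min (≈ 4.4 %) quoted in the table. The quoted figure may include costs beyond pure orchestration; either way, it is one of the numbers the §8 benchmark should pin down.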

7. Non-technical considerations

7.1 Team familiarity

  • CandlePipeline : any Python developer can read a 50-line for-loop. Zero onboarding.
  • Hamilton : declarative DAG model, @config.when, ParallelExecutor. Estimated 1–2 days of ramp-up per engineer, one-time.

7.2 Drift risk

  • CandlePipeline : 570 lines, likely to grow as the pipeline evolves. Precedent (cvntrade_backtest_engine.py : 2 800 lines accumulated) shows home-grown orchestrators do grow.
  • Hamilton : 80 lines of decorators scale sub-linearly with pipeline complexity. New features add function definitions, not orchestrator code.

7.3 Contributor attraction

  • Hamilton is a known pattern that transfers across projects. A Python/ML engineer joining CVNTrade likely has (or can easily get) Hamilton experience.
  • CandlePipeline knowledge is project-specific.

7.4 Production operational surface

  • CandlePipeline : one Python file, easy to reason about during an incident.
  • Hamilton : richer call stack on errors, but the visualisation tool (dr.visualize_execution) gives incident responders a clearer model.

7.5 Hamilton's maturity (addressing C4)

  • Hamilton 2.x in production at DAGWorks / Stitch Fix since 2019.
  • Stable API since 2022 ; breaking changes gated by major version.
  • Active maintenance : 50+ contributors, regular releases.
  • Licensed under BSD-3. No licensing risk.

8. Blind spots / unknowns

Facts we do not have precisely enough :

  • Exact Hamilton overhead on our workload. The 5–50 µs range from the literature covers DAGs of varying complexity. A real benchmark on our 150-node target DAG, with CUSUM short-circuit on 80 % of candles, is required before committing. Proposed : 1-day benchmark spike on a fork of #615 before the architecture decision is finalised.
  • Hamilton's short-circuit idioms on our specific pattern. CUSUM gate is the entry point ; rejecting 80 % of candles early is essential for runtime perf. Hamilton-native patterns exist (mutate, custom nodes, sentinel values) but none is "obviously right" for this case. Requires design exploration.
  • FTF cumulative perf impact at full 150-node scale. §6 estimates ≈ +4 % wall-clock on full FTF runs, while Option 2 C1 cites 0.5–3 % from literature reports. Both figures are extrapolations. Real measurement on Phase 2 would tighten or widen this.
  • Rollback procedure if Hamilton regresses in production. Pinning a version works, but a breaking change relative to our pinned version would force a migration. Mitigation procedure not yet documented.

9. Decision required

Question to the committee / decision-maker :

Should the CVNTrade per-candle runtime orchestrator be (Option 1) the current home-grown CandlePipeline, with Hamilton scoped to batch training only, or (Option 2) unified on Hamilton for both batch and runtime — at the cost of retracting ADR-60 and widening ADR-61?

The decision is binary. No middle-ground option has been identified that preserves a single parity surface (the main benefit of Option 2) while avoiding the external-dependency risk on runtime (the main benefit of Option 1).

Suggested inputs before deciding

  • Run a 1-day benchmark spike to measure actual Hamilton overhead on our target DAG (§8 blind spots #1 and #3).
  • Review Hamilton's short-circuit idioms for the CUSUM-gate case (§8 blind spot #2).
  • Agree on Hamilton version-pin + rollback procedure (§8 blind spot #4).

What this doc does NOT recommend

No recommendation is made. Both options have a coherent rationale. The decision depends on :

  • Weight given to "reduce parity surface" (favours Option 2)
  • Weight given to "no external framework on hot path" (favours Option 1)
  • Weight given to "eliminate home-grown orchestrators to avoid monolith drift" (favours Option 2)
  • Weight given to "preserve rejection-heavy short-circuit efficiency" (favours Option 1)
  • Weight given to "one orchestrator reduces maintenance" (favours Option 2)
  • Weight given to "Hamilton overhead on our actual workload is not yet measured" (favours deferring to Option 1 until measured)

These weights are outside the scope of this dossier.


10. References

  • ADR-60 (current) — Hot-path sequential flows use CandlePipeline. Candidate for retraction under Option 2.
  • ADR-61 (current) — Batch DAGs use Hamilton, not imperative code. Candidate for widening under Option 2.
  • ADR-40 — Paper/live share the same kernel ; only the adapter differs. Orthogonal — both options satisfy this.
  • #568 — Pipeline runner implementation track. Current work is on the CandlePipeline codebase (Phases 1–4 shipped).
  • #599 — Stateful enrichment refactor. Provides the atomic layer shared by both options.
  • #614 — FTF ↔ Runner parity certification. Scope shrinks by one equivalence (orchestration) under Option 2.
  • Hamilton documentation — https://hamilton.dagworks.io/ — for the facts cited in §5.
  • Existing code :
      • src/commun/pipeline/runner/pipeline.py (Option 1 artefact)
      • src/commun/pipeline/enrichment/engine.py (atomic layer, shared)

11. Appendix — how to measure before deciding (proposed spike)

  1. Clone a minimal branch from feat/599-phase1-stateful-core.
  2. Implement the 6-step current pipeline (CUSUM / Enrich / FE / Inference / FilterChain / Execution) as Hamilton nodes, with @config.when gating the CUSUM branch.
  3. Run the backtest on an 8 000-candle BTCUSDC fixture and measure :
      • Wall-clock total
      • Per-candle orchestration time (profile with cProfile)
      • Cold-start time (first execute() after module import)
  4. Compare against the current CandlePipeline on the same fixture.
  5. Publish the benchmark as an appendix to this dossier. Decision is taken on measured data, not estimates.

Time estimate : 1 day senior engineer. Can be run in parallel with any other work, does not block Phase 2 of #599 if started immediately.
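
One possible shape for the measurement harness is sketched below. It is orchestrator-agnostic: it takes a factory that builds a per-candle callable, so the same function could wrap CandlePipeline and a Hamilton-backed runner. The function name, the factory convention, and the no-op orchestrator are assumptions for illustration; cold start is approximated as construction plus the first call.

```python
import time
from typing import Callable, Iterable


def benchmark(
    make_orchestrator: Callable[[], Callable[[float], object]],
    candles: Iterable[float],
) -> dict[str, float]:
    """Time one orchestrator over a candle stream.

    Assumes the stream holds at least one candle.
    """
    stream = list(candles)

    # Cold start: build the orchestrator and process the first candle.
    t0 = time.perf_counter_ns()
    process = make_orchestrator()
    process(stream[0])
    cold_ns = time.perf_counter_ns() - t0

    # Warm path: remaining candles, one call each.
    t1 = time.perf_counter_ns()
    for candle in stream[1:]:
        process(candle)
    warm_ns = time.perf_counter_ns() - t1

    warm_count = max(len(stream) - 1, 1)
    return {
        "cold_start_ms": cold_ns / 1e6,
        "wall_clock_ms": (cold_ns + warm_ns) / 1e6,
        "per_candle_us": warm_ns / warm_count / 1e3,
    }


def make_noop() -> Callable[[float], object]:
    # Stand-in orchestrator: measures the harness floor, not a real pipeline.
    def process(candle: float) -> None:
        return None

    return process


floor = benchmark(make_noop, (float(i) for i in range(8_000)))
```

Running the no-op factory first gives the harness's own floor, which can be subtracted from both orchestrators' per-candle numbers so the comparison isolates orchestration cost.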