Unified Configuration Platform — Product Need
need_id: CVN-N008
Status: draft (awaiting committee review)
Created: 2026-04-24
Owner: to be assigned
Source: GitHub issue #675
Related: GitHub issue #673 (PTE silent override incident — 3h compute wasted, ADR-65 fix merged), ADR-59 (config in PostgreSQL, Console-only write path), ADR-65 (DAG params run-level only)
1. Executive Summary
The trading system is moving from batch experimentation toward continuous paper-then-live operation. In that regime the configuration surface (every named parameter driving feature engineering, model inference, filtering, execution and risk) must be safely editable, fully traceable, and propagated to running workers in real time. The current configuration stack cannot meet those requirements: the admin tool is a developer-grade Streamlit prototype, the parameter catalog is hardcoded in Python source (every change requires a code release and a deploy), and there is no formal binding between an experimental run and the exact parameter snapshot that produced it. A recent incident (issue #673) wasted three hours of compute because a deprecated parameter value was silently picked up from a Python default; the root cause was structural (two sources of truth with no enforced precedence), not an isolated bug.
This Need defines the product-level direction for a Unified Configuration Platform: a single web application that owns every parameter (catalog + active values + history), pins every run to an immutable parameter snapshot for reproducibility, and distributes changes to paper and live trading workers within seconds under controlled, auditable conditions.
The Need decomposes into four independently-deliverable epics listed in §9 Breakdown. Each epic ships as its own release, in the gated order given there; together they replace the legacy tooling end to end.
2. Context
2.1 What the system does today
The product is an automated cryptocurrency trading system with three operating modes sharing a common pipeline (enrichment → feature engineering → inference → filter chain → execution):
- Research / experimentation — a fine-tuning framework (FTF) systematically sweeps parameter variants across cryptos × folds × trials to pick the best settings before promotion. Runs are batch, orchestrated by Airflow, and typically last from 30 minutes to several hours.
- Paper trading — the same pipeline runs candle-by-candle against real market data without capital at risk, on long-lived worker pods.
- Live trading — the same pipeline runs with capital committed, on the same kind of long-lived worker pods.
All three modes consume the same configuration schema: ~50 to 70 named parameters (feature flags, thresholds, model hyperparameters, execution rules) stored as a JSONB row in a PostgreSQL table (ftf_config.base_env). Changing any parameter can alter trading behavior.
2.2 How configuration changes are made today
The sanctioned path is the internal Streamlit admin tool (~800 lines, single file) with three pages: parameter editor, experiment run history, audit trail. Behind the scenes there are in fact three unconsolidated sources of a parameter's value: the PostgreSQL row, a local JSON fallback file (config/ftf_baseline.json), and per-pod environment variables. Precedence is implicit, inconsistent across call sites, and not enforced by the type system.
The parameter catalog itself — the list of known parameters, their allowed drop-down values, their descriptions, their grouping — is not in the database. It is hardcoded in two Python dictionaries (PARAM_OPTIONS, PARAM_DESCRIPTIONS) in the Streamlit source. Adding a new parameter, widening allowed values, or deprecating one requires a pull request and a deploy. The same is true of the experimental grid: each AblationFactor in the fine-tuning framework lists its variants inline in Python, so narrowing or widening an ablation study requires a code change.
2.3 How configuration reaches the runtime today
Workers read the PostgreSQL row at pod start and parse it into environment variables. Any change made after a worker starts is ignored until the pod is restarted. Operators avoid touching configuration in paper or live mode because the only effective propagation path is a restart, and a restart loses in-memory state. The configuration therefore drifts away from the market conditions it was supposed to adapt to, exactly the opposite of the intended design.
2.4 Triggering evidence
- Issue #673 (2026-04-24) — a 3-hour FTF run on defi_top5 produced unusable results (f1_buy < 0.20) because the run executed under the deprecated PTE envelope ATR1.5_3.0_H5 instead of the current policy ATR0.5_1.5_H4. Root cause: dual source of truth (Python default in the DAG vs. operator-set value in the Console), precedence not enforced. ADR-65 was introduced to close the specific run-level vs. configuration boundary but did not address the broader platform-level issue.
- Issues #606 / #607 — a succession of bugs in Streamlit's session-state handling (pending deletes, pending adds, concurrent save races) surfaced during progressive hardening of the editor. Each fix introduced a new complication; the pattern suggests the framework is now bearing weight it was not designed for.
- Issue #580 — Superset was previously decommissioned because the same class of friction (schema drift, hardcoded queries, opaque admin surface) made it unfit for production operator workflows. The Streamlit tool is approaching the same fate.
3. Problem
Four specific, coupled problems require a platform-level answer rather than successive point fixes:
- The admin tool has hit its engineering ceiling. Streamlit was chosen for rapid research dashboards, not production configuration management. It has no role-based access control, no approval workflow, no multi-environment separation, no client-side validation, a poor diff viewer, and a mobile experience that is effectively unusable. Its logic is not unit-testable in isolation. Further hardening returns diminishing value.
- The catalog is in code. Adding a parameter, widening an allowed-value set, or deprecating an obsolete entry is a code release. For a platform whose entire purpose is configurability, this is structurally wrong. It also means the experimental grid (FTF variants) is frozen in source — narrowing or widening a sweep takes hours of engineering instead of a 30-second operator decision.
- Runs are not bound to parameter snapshots. Two runs a week apart, nominally under "the same configuration," may not actually be comparable: the catalog may have drifted, a default may have changed, a worker pod may have inherited a stale environment variable. There is no formal artifact tying a run to the exact, reproducible parameter state that produced it. The audit trail is best-effort JSON on the run row, not a first-class, immutable, content-addressable object.
- Paper and live trading cannot absorb configuration changes safely at runtime. Changes require a pod restart, which loses state. The operator therefore does not change configuration on live workers, even when the market would justify it. This is a symptom of an architecture that never treated the runtime as a configuration subscriber — only as a one-shot reader at boot.
Treating any of these problems individually is possible but would move the friction sideways rather than resolve it: a prettier UI on top of a hardcoded catalog still requires releases; an editable catalog without run-binding worsens traceability; run-binding without runtime distribution has no teeth in paper/live; runtime distribution without RBAC and safety boundaries is dangerous. The four must be addressed as a single platform direction.
4. Impact
If nothing changes, the following failure modes compound as the system moves from research into paper and live operation:
- Recurring wasted compute. The #673 incident (3h wasted) is not an outlier. It is the expected outcome of a platform with two sources of truth and no structural disambiguation. As experimental cadence increases, the expected amount of compute lost to such runs grows linearly with the number of runs.
- Unreproducible results. Statistical claims produced by the fine-tuning framework (BH-corrected p-values, bootstrap confidence intervals, Cohen's d effect sizes) require byte-identical re-runnability to hold their meaning. Without run-to-ParamSet binding, the claims are defensible only as long as the catalog and defaults have not drifted — a window that shrinks as the platform grows.
- Operator aversion to live configuration changes. Operators who cannot safely change a parameter without a restart will default to not changing it, independent of whether a change is warranted. The platform's adaptability is then limited by its deployment mechanics, not by the strategy.
- Audit deficit. A trading system must be able to answer, after the fact, "under what exact configuration did this decision run, and who changed what when?" The current platform cannot produce that answer without manual reconstruction across PostgreSQL, JSON files, Git history, and pod environment snapshots.
- Continued organic growth of the Streamlit tool. Each new feature requested costs increasing engineering time to integrate safely. The trajectory is toward a rewrite anyway; delaying the rewrite multiplies the throwaway work in the meantime.
Bottom line: without this platform, the transition from research to live operation is gated not by strategy performance but by the inability to configure, trace, and adapt the system safely.
5. KPIs
Each KPI has a measurable target and a source of truth.
- KPI 1 — Catalog edits require no code release. Adding a new parameter, widening its allowed values, or deprecating it is performed entirely through the Console by an authorized operator. Target: 100% of parameter-catalog changes go through the Console, zero through PRs to PARAM_OPTIONS or equivalent. Measured by: Git history of the legacy tool (post-decommission, zero commits touching parameter definitions) + Console audit log.
- KPI 2 — Every run bound to an immutable parameter snapshot. Target: 100% of experiment / paper / live runs carry a foreign key to a parameter_sets row; database schema rejects inserts without it. Measured by: NOT NULL + FK constraint at the DB level + CI test.
- KPI 3 — Real-time propagation latency. Time from "Save" in the Console to a paper or live worker applying the new value in memory. Target: under 5 seconds p95 for hot-reloadable parameters, over a 30-day window. Measured by: instrumentation on the Console save path + worker-emitted config_version_applied metric, aggregated in Grafana.
- KPI 4 — Zero mid-trade application. No hot-reload applies in the middle of an open trade, an active decision cycle, or a critical section. Target: zero violations recorded in the worker's structured log over any release window. Measured by: a specific log event category that the worker must never emit; alert on any occurrence.
- KPI 5 — Reproducibility of statistical claims. Given a Parameter Set hash and a crypto list, a second run produces outputs matching the first within the documented numeric tolerance (deterministic paths) or the documented seed-dependent band (non-deterministic paths). Target: 100% match on deterministic paths, documented reproducibility interval on non-deterministic ones. Measured by: reproducibility test suite run at each release.
- KPI 6 — Parity with legacy Console at MVP. No operator workflow currently available in the Streamlit tool is lost in the MVP. Target: zero regressed workflows. Measured by: end-to-end Playwright test suite covering every Streamlit flow, passing on the new Console before decommission.
- KPI 7 — RBAC compliance. A viewer cannot write. A change to a parameter tagged "critical" cannot be applied without an approver step. Target: 100% enforcement. Measured by: RBAC integration tests + audit log coverage.
- KPI 8 — Mobile usability of read-only surfaces. The dashboard, run list, and rollback action work on a phone screen without layout breakage. Target: Lighthouse accessibility + responsive scores ≥ 90. Measured by: Lighthouse CI on deployed URL.
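
The database-level guarantee behind KPI 2 can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, purely so the example runs anywhere. The table and column names follow §9.3, but the column types, the helper names (make_db, try_insert_run) and the sample hash are assumptions made for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory schema where every run must reference a parameter set."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE parameter_sets (
            hash    TEXT PRIMARY KEY,
            payload TEXT NOT NULL
        );
        CREATE TABLE finetune_runs (
            run_id             INTEGER PRIMARY KEY,
            parameter_set_hash TEXT NOT NULL REFERENCES parameter_sets(hash)
        );
        """
    )
    return conn


def try_insert_run(conn: sqlite3.Connection, run_id: int, parameter_set_hash) -> bool:
    """Return True if the run row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO finetune_runs (run_id, parameter_set_hash) VALUES (?, ?)",
            (run_id, parameter_set_hash),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised for both the NOT NULL violation and the dangling foreign key.
        return False


conn = make_db()
conn.execute("INSERT INTO parameter_sets (hash, payload) VALUES (?, ?)", ("abc123", "{}"))
accepted = try_insert_run(conn, 1, "abc123")      # bound to an existing snapshot
unbound = try_insert_run(conn, 2, None)           # no snapshot at all
dangling = try_insert_run(conn, 3, "not-stored")  # snapshot that does not exist
```

A CI test along these lines would assert that only the first insert succeeds, so the KPI is checked by the schema itself instead of by application code.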
6. Constraints
- ADRs in scope: ADR-25 (no silent fallbacks), ADR-26 (Grafana single entry point — observability dashboards must flow there), ADR-56 (every pipeline change FTF-testable), ADR-59 (configuration in PostgreSQL, Console-only write path), ADR-65 (DAG params run-level only). This Need extends ADR-59 and ADR-65; any new invariant introduced must reach ADR status before it becomes enforceable in CI.
- Source of truth: PostgreSQL. No JSON fallback in production code paths. Local development retains a convenience file for offline tests only.
- Deployment platform: the existing Scaleway Kapsule cluster. The Console, API, and runtime subscribers run on Kubernetes, behind the existing nginx + cert-manager ingress on the cvntrade.eu domain.
- Existing data: the ftf_config and ftf_config_history tables and their audit semantics must be preserved (history is append-only, no data loss during migration).
- Budget: no new paid SaaS. Transport and storage must reuse existing infrastructure (PostgreSQL, Redis if needed, Prometheus, Loki, Grafana). OIDC provider is acceptable if already deployed; otherwise basic authentication over HTTPS is the starting point.
- No regression window: operators keep their current workflows until the MVP ships with end-to-end parity. The Streamlit tool remains live until the new Console passes the parity gate.
- Code-level behavior guardrails stay in code: hard blocks on known-bad parameter values (e.g., the deprecated PTE envelope rejected by AblationRunner) are enforced in the pipeline code next to the usage site. The catalog describes what operators may enter; the code describes what the pipeline refuses to execute. The two are consistency-tested in CI.
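
The catalog-versus-code consistency test mentioned in the last constraint could look roughly like the sketch below. The parameter name pte_envelope, the two dictionaries and the function find_conflicts are hypothetical; only the two envelope values come from the #673 incident description.

```python
# Hypothetical code-level guardrail: values the pipeline refuses to execute.
BLOCKED_VALUES = {"pte_envelope": {"ATR1.5_3.0_H5"}}

# Hypothetical catalog content: values an operator may select in the Console.
CATALOG_ALLOWED = {"pte_envelope": {"ATR0.5_1.5_H4"}}


def find_conflicts(catalog_allowed: dict, blocked_values: dict) -> list:
    """Return (parameter, value) pairs that the catalog offers but the code rejects."""
    conflicts = []
    for name, blocked in blocked_values.items():
        offered = catalog_allowed.get(name, set())
        # Any overlap means an operator could pick a value the pipeline will refuse.
        conflicts.extend((name, value) for value in sorted(offered & blocked))
    return conflicts


# Consistent state: the catalog never offers a blocked value.
clean = find_conflicts(CATALOG_ALLOWED, BLOCKED_VALUES)

# Inconsistent state: someone re-added the deprecated envelope to the catalog.
drifted = find_conflicts(
    {"pte_envelope": {"ATR0.5_1.5_H4", "ATR1.5_3.0_H5"}}, BLOCKED_VALUES
)
```

In CI the check would fail the build whenever the returned list is non-empty, which keeps the two layers from contradicting each other without moving the hard block out of the pipeline code.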
7. Out of Scope
- Model artifacts and training data — this Need covers operational configuration only. MLflow and the data lake are not restructured here.
- General-purpose feature-flag service — the parameter catalog serves the dual role of configuration and flags for the trading platform. No separate feature-flag service is introduced.
- Broad observability redesign — beyond the specific config_version tagging required for attribution, the observability stack (Grafana, Prometheus, Loki) is used as-is.
- Retroactive restructuring of past runs — parameter-set reconstruction for runs that predate the FK constraint is best-effort; pre-migration runs are flagged "legacy" and exempted from comparability rules.
- Rename semantics for parameters — deprecation + replacement pointer covers the operator need; a full rename with history rewriting is explicitly not supported.
- Choice of pub/sub transport — the transport (PostgreSQL LISTEN/NOTIFY, Redis, NATS, other) is a design-doc decision, not a Need-level commitment. This Need fixes the SLO (5s p95), not the mechanism.
- ML Platform governance (CVN-N004) — parameter metadata overlaps with model tags but the two surfaces are governed separately. This Need does not redesign MLflow tagging.
8. Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Scope grows into a platform-of-everything and never ships | Medium | High | Strict increment gates. MVP (epic EA) must ship in parity before any Value-add epic starts; the Advanced increment starts only after Value-add closes. Committee signs off on the Need with this gating as a binding constraint. |
| Catalog metadata drifts from code that consumes the parameters (unused catalog entries, consumed parameters absent from catalog) | High | Medium | CI job scans source for parameter references, cross-checks against the catalog row set, fails the build on either-direction drift. |
| Mid-trade configuration change destabilises a live position | Low | Critical | Safe-boundary contract declared per parameter category, unit-tested. Hot-reload is opt-in per parameter in the catalog (default: restart-required). Freeze windows configurable on the live environment. |
| Real-time distribution transport fails silently (worker stops receiving) | Medium | High | Heartbeat + lag-behind alert on the observability dashboard. Automatic fallback from push to poll if the channel is lost. Alert pages on-call if any worker falls more than N versions behind. |
| Schema migration of existing parameters loses metadata | Low | Medium | One-shot migration script with superset-check before write. Dry-run mode gates the live run. Reversible via restore-from-history. |
| Streamlit parity missed in MVP, operator rejects cut-over | Medium | Medium | End-to-end Playwright tests cover every legacy flow. Cut-over is gated on zero regressed workflows. Streamlit stays live until the new Console passes the gate. |
| RBAC rollout breaks existing automated writers (DAGs, CI) | Medium | High | Service-account tokens issued per writer before RBAC flips on. Shadow-mode period (one sprint) during which RBAC violations are logged but not enforced — same pattern used in #604 Phase D. |
| Real-time distribution stalls under many concurrent subscribers | Low | Medium | Load test in staging with 10× expected worker count before production enablement. Design doc captures fan-out limits. |
| Committee rejects the Need (methodology flaw) | Medium | Low | Target one revision cycle per committee rejection; a second REJECT triggers scoping escalation (split into smaller Needs). Same REVISE pattern as #604 v1 → v2. |
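
The either-direction drift check named as mitigation for the catalog-drift risk can be sketched as follows. It assumes, purely for illustration, that pipeline code reads parameters through a get_param("name") accessor; that accessor, the regular expression and the function names are hypothetical and would need to match the real access pattern.

```python
import re

# Hypothetical access pattern: get_param("some_name") or get_param('some_name').
PARAM_REF = re.compile(r"""get_param\(\s*["']([A-Za-z_][A-Za-z0-9_]*)["']\s*\)""")


def referenced_params(sources: dict) -> set:
    """Collect every parameter name referenced in the given {path: source_text} mapping."""
    names = set()
    for text in sources.values():
        names.update(PARAM_REF.findall(text))
    return names


def catalog_drift(sources: dict, catalog_names: set) -> dict:
    """Report drift in both directions between code references and catalog rows."""
    used = referenced_params(sources)
    return {
        "missing_from_catalog": sorted(used - catalog_names),
        "unused_in_code": sorted(catalog_names - used),
    }


# Toy inputs standing in for the repository scan and the catalog row set.
example_sources = {
    "pipeline/filter.py": 'limit = get_param("max_open_trades")\n',
    "pipeline/infer.py": "thr = get_param('buy_threshold')\n",
}
example_catalog = {"buy_threshold", "legacy_flag"}
report = catalog_drift(example_sources, example_catalog)
```

A CI job would fail the build when either list in the report is non-empty. A plain regular-expression scan misses dynamically built names, so a real job might need an AST-based pass or an allow-list for such cases.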
9. Breakdown (epics)
The Need decomposes into four independently-deliverable epics. Each epic has its own acceptance criteria, stories, and release. They are sequenced, not strictly parallel: EA gates EB, EB gates EC and ED; EC and ED can run in parallel once EB ships.
| Epic | Title | Increment | Gate |
|---|---|---|---|
| CVN-N008-EA | Modern Console — MVP parity with the legacy Streamlit tool, decommission plan | MVP | Required before EB starts |
| CVN-N008-EB | Catalog-driven parameter management — variable catalog in DB, FTF variants derived from LoV, Admin pages for self-service parameter administration | Value-add | Requires EA shipped |
| CVN-N008-EC | Run ↔ Parameter Set strict traceability — immutable snapshots, FK on runs, diff + re-run workflows | Advanced | Requires EB shipped |
| CVN-N008-ED | Real-time distribution to paper and live trading — hot-reload, safe boundaries, config-version tagging, rollback, freeze windows, observability dashboard | Advanced | Requires EB shipped |
Each epic document (to be created after committee approval of this Need) will carry its own stories, design decisions, and ADRs.
9.1 EA — Modern Console (MVP parity)
- Replace the Streamlit admin tool with a Next.js web application, shadcn/ui components, typed end-to-end via TypeScript, Zod-validated forms shared between browser and API
- Three pages ported with full workflow parity: parameter editor, experiment run history + PDF download, audit trail with diff and restore
- Basic authentication over HTTPS at minimum; OIDC optional
- Deployment via existing Helm chart pattern, same ingress and certificate mechanisms
- End-to-end parity tests (Playwright) cover every legacy operator flow
- Streamlit tool decommissioned on cut-over; old URL redirects or is removed
9.2 EB — Catalog-driven parameter management
- New database table variable_catalog stores every parameter's metadata: name, type (enum/bool/number/string), allowed values or numeric bounds, default, description, display group, required flag, deprecation state, optional validator expression
- New Admin pages in the Console: edit existing parameter metadata, create a new parameter, mark a parameter deprecated with a replacement pointer
- The fine-tuning framework reads allowed values from the catalog at run time instead of hardcoded inline dictionaries; editing the catalog changes the next run's experimental grid with no code release
- RBAC: three roles (viewer, editor, approver); catalog mutations restricted to an admin role on top of these
- Environment separation: distinct active-configuration rows per environment (dev/staging/prod), promotion flow between environments is explicit and audited
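
To make the catalog idea concrete, here is a minimal sketch of how a catalog row could drive both value validation and the FTF variant grid. The CatalogEntry class, its field subset, the validate and ftf_variants helpers, and the parameter names are all assumptions for illustration; the real variable_catalog schema is left to the EB design doc.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class CatalogEntry:
    """One row of a hypothetical variable_catalog table (subset of the listed fields)."""
    name: str
    type: str  # "enum", "bool", "number" or "string"
    allowed_values: tuple = ()
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    deprecated: bool = False


def validate(entry: CatalogEntry, value: Any) -> list:
    """Return a list of error messages; an empty list means the value is acceptable."""
    errors = []
    if entry.deprecated:
        errors.append(f"{entry.name}: parameter is deprecated")
    if entry.type == "enum":
        if value not in entry.allowed_values:
            errors.append(f"{entry.name}: {value!r} is not an allowed value")
    elif entry.type == "bool":
        if not isinstance(value, bool):
            errors.append(f"{entry.name}: expected a boolean")
    elif entry.type == "number":
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{entry.name}: expected a number")
        else:
            if entry.min_value is not None and value < entry.min_value:
                errors.append(f"{entry.name}: {value} is below {entry.min_value}")
            if entry.max_value is not None and value > entry.max_value:
                errors.append(f"{entry.name}: {value} is above {entry.max_value}")
    elif entry.type == "string":
        if not isinstance(value, str):
            errors.append(f"{entry.name}: expected a string")
    else:
        errors.append(f"{entry.name}: unknown catalog type {entry.type!r}")
    return errors


def ftf_variants(entry: CatalogEntry) -> list:
    """Derive the sweep variants for an enum parameter from the catalog row."""
    if entry.type == "enum" and not entry.deprecated:
        return list(entry.allowed_values)
    return []
```

With this shape, widening a sweep is a change to allowed_values on one row, and the same row rejects values the Console should never accept.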
9.3 EC — Run ↔ Parameter Set traceability
- New database table parameter_sets stores content-hashed immutable snapshots of the full parameter state (active values + catalog metadata + resolved experimental variants + code versions) captured at run trigger
- Database trigger enforces immutability past a short grace window; de-duplication via content hash
- finetune_runs.parameter_set_hash foreign key, NOT NULL after migration
- Console workflows: "Re-run identical" button, Parameter Set diff between two runs, list of runs sharing a given Parameter Set
- Workers read the Parameter Set (not live catalog state) — zero drift between trigger and execution
- Comparability rules (strict / relaxed / forbidden) documented for statistical analysis
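
Content hashing and de-duplication of snapshots can be sketched in a few lines. The canonicalisation chosen here (JSON with sorted keys and compact separators, hashed with SHA-256) is an assumption, as are the function names and the sample snapshot fields; the EC design doc may settle on a different scheme.

```python
import hashlib
import json


def canonical_json(snapshot: dict) -> str:
    """Serialise a snapshot so that key order and whitespace cannot change the result."""
    return json.dumps(snapshot, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def parameter_set_hash(snapshot: dict) -> str:
    """Content hash of a snapshot: identical content always yields the same hex digest."""
    return hashlib.sha256(canonical_json(snapshot).encode("utf-8")).hexdigest()


def store_snapshot(store: dict, snapshot: dict) -> str:
    """Insert the snapshot unless the same content is already stored; return its hash."""
    digest = parameter_set_hash(snapshot)
    # setdefault never overwrites an existing entry, mirroring insert-if-absent semantics.
    store.setdefault(digest, canonical_json(snapshot))
    return digest


# Two snapshots with the same content but different key order (hypothetical fields).
snap_a = {"values": {"pte_envelope": "ATR0.5_1.5_H4", "threshold": 0.5}, "code_version": "abc"}
snap_b = {"code_version": "abc", "values": {"threshold": 0.5, "pte_envelope": "ATR0.5_1.5_H4"}}
store = {}
hash_a = store_snapshot(store, snap_a)
hash_b = store_snapshot(store, snap_b)
```

Both calls return the same hash and leave a single stored entry, which is what lets runs "share" a Parameter Set and makes the "Re-run identical" workflow a lookup by hash.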
9.4 ED — Real-time runtime distribution
- Each parameter declares in the catalog whether it is hot-reloadable (default: restart-required)
- Distribution transport: the design doc selects the mechanism (PostgreSQL LISTEN/NOTIFY is the default candidate); the SLO is 5 seconds p95 end-to-end
- Workers apply new values only at safe boundaries (between decision cycles, between trades) — the contract is declared per parameter category and unit-tested
- Every worker output (PnL, signals, metrics) is tagged with the active config_version_id so post-hoc analysis can attribute effects to the correct configuration epoch
- Rollback from the audit trail uses the same distribution path and safety boundaries
- Guardrails: per-parameter bounds, cross-parameter coherence rules, freeze windows, RBAC checks — enforced client-side AND server-side
- Observability dashboard: per worker, current config_version_id, last reload timestamp, lag behind the active version, pending restart-required changes; alert when a worker falls too far behind
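
The safe-boundary rule can be sketched as a two-slot holder: the subscriber side only parks a new version, and the worker loop decides when to swap it in. The class name, method names and the single in_open_trade flag are assumptions; the actual contract is declared per parameter category in the ED design doc, and the transport that calls offer is deliberately left out.

```python
import threading


class HotReloadableConfig:
    """Holds the active configuration and at most one pending newer version."""

    def __init__(self, version_id: int, values: dict):
        self._lock = threading.Lock()
        self._active = (version_id, dict(values))
        self._pending = None

    def offer(self, version_id: int, values: dict) -> None:
        """Called when a new version arrives; it is parked, never applied directly."""
        with self._lock:
            latest = self._pending[0] if self._pending else self._active[0]
            if version_id > latest:  # ignore stale or duplicate notifications
                self._pending = (version_id, dict(values))

    def apply_at_safe_boundary(self, in_open_trade: bool) -> bool:
        """Called by the worker between decision cycles; returns True if a swap happened."""
        with self._lock:
            if self._pending is None or in_open_trade:
                return False
            self._active = self._pending
            self._pending = None
            return True

    @property
    def version_id(self) -> int:
        """Version to tag on every worker output (PnL, signals, metrics)."""
        with self._lock:
            return self._active[0]

    def get(self, name: str):
        """Read an active value; a missing name raises KeyError instead of falling back."""
        with self._lock:
            return self._active[1][name]


cfg = HotReloadableConfig(1, {"threshold": 0.5})
cfg.offer(2, {"threshold": 0.7})
deferred = cfg.apply_at_safe_boundary(in_open_trade=True)   # mid-trade: nothing changes
applied = cfg.apply_at_safe_boundary(in_open_trade=False)   # safe boundary: swap
```

Because the swap only ever happens inside apply_at_safe_boundary, a unit test can assert that no value changes while a trade is open, which is the behaviour KPI 4 requires.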
10. Stakeholders
- Operators — primary users. Day-to-day edits to active configuration; approval gates for critical parameters. Must sign off on MVP parity and on the paper/live rollout of Advanced.
- Research / ML engineers — consumers of the variant grid. Must sign off on the catalog-driven FTF change (EB) and on the Parameter-Set comparability model (EC).
- On-call / infra — run the Kubernetes deployment, observability, alerting. Must sign off on the real-time transport (ED) and the observability dashboard.
- Security / compliance — RBAC design, audit completeness, immutability invariants. Informed at EA; sign off on RBAC enablement at EB and on Advanced before live cut-over.
- Committee (architectural review) — signs off on this Need document (PROCEED / REVISE / REJECT) before any epic is opened.
11. Interfaces with In-Flight Work
- ADR-65 (merged) — Airflow DAG params run-level only. First step toward the principle this Need generalizes. The broader audit of other DAGs (tracked as P1 in #673) will land under this Need's epic EB.
- CVN-N004 ML Platform — neighbor Need. Parameter metadata overlaps with model tags (training-time parameters are captured by both). The two are governed separately but cross-referenced.
- CVN-N006 MLflow backbone audit — overlaps on Lineage (Pillar 8 of that audit): parameters are upstream inputs to model training and must appear in the lineage. EC (Parameter Set) naturally feeds that.
- CVN-N007 Infra dashboard — ED's observability additions (config version per worker, distribution lag) are candidates for inclusion in the infra dashboard panels rather than a separate dashboard.
- Shadow-logging patterns established in #599 and #604 Phase D — reused for RBAC enablement rollout (EB) and hot-reload enablement per parameter (ED).
12. Committee Submission
Title: Unified Configuration Platform — Need review (CVN-N008)
Question (target score: 8+):
This Need document proposes a platform-level direction rather than a point fix. It responds to a structural pattern surfaced by incident #673 (PTE silent override, 3h compute wasted) and by the accumulated friction of the Streamlit admin tool (#606 / #607). The document decomposes into four epics (EA/EB/EC/ED) with explicit gating between increments.
Committee review requested on:
- Scope framing — is "Unified Configuration Platform" the right envelope for these four concerns, or should Run-to-ParamSet traceability (EC) and Real-Time Distribution (ED) be lifted into their own Needs given their cross-cutting impact on the trading runtime?
- KPI completeness — do the 8 KPIs in §5 cover the Need adequately? In particular, is KPI 3 (5 s p95 propagation) the right target, or should we fix a tighter SLO for live trading specifically (e.g., 1 s p99 on a subset of "trading-critical" parameters)?
- Sequencing — EA (MVP parity) gating EB is natural. Should EC and ED gate on each other, or can they truly run in parallel once EB ships? The design-doc level decision has implications on database migration ordering.
- Safe-boundary contract — §9.4 sketches per-parameter-category boundaries. Is the committee comfortable with this being defined in the ED design doc, or should there be a prior, Need-level table listing categories?
- RBAC depth — three roles (viewer / editor / approver) + one admin role for catalog mutations. Is this sufficient for a financial trading platform, or should we match the NIST pattern used in CVN-N006's Pillar 7?
- Non-goals — §7 explicitly excludes renaming parameters (deprecation + replacement only). Is this pragmatic choice acceptable, or should the platform support rename with history rewriting?
- Parity gate for cut-over — §9.1 requires "zero regressed workflows" before Streamlit decommission. Is Playwright coverage the right mechanism, or should the committee require a user-acceptance test with named operators before cut-over?
- Budget for pub/sub transport — we name PostgreSQL LISTEN/NOTIFY as the default candidate but leave the final choice to the ED design doc. Is the committee comfortable with deferring this, or should the Need fix the transport now?
- Migration of pre-N008 runs — §7 states retroactive Parameter-Set reconstruction is best-effort. Is the committee OK with pre-N008 runs being flagged "legacy" and exempted from comparability rules, or should a full back-fill be required?
- DRI — to be named at kickoff of epic EA, not in this Need document. Is deferring acceptable given the MVP lead time, or should the DRI be named before the committee approves the Need?
Deliverable: per-expert opinion (score, confidence, findings, risks, recommendations) + consolidated verdict (PROCEED / REVISE / REJECT). If PROCEED, epic EA is opened and its design doc drafted. If REVISE, we target one iteration cycle before escalating.
Artifacts referenced:
- This Need: documentation/needs/CVN-N008-NEED-unified-configuration-platform.md
- Source discussion: GitHub issue #675 (summary + four expanded comments)
- Incident trigger: GitHub issue #673 (closed via ADR-65)
- ADRs in scope: ADR-25, ADR-26, ADR-56, ADR-59, ADR-65
- Neighbor Needs: CVN-N004, CVN-N006, CVN-N007
13. Acceptance Criteria for This Need Document
- Context, problem, impact framed with evidence (incident #673, #606/#607, #580)
- Four coupled problems articulated
- Eight measurable KPIs with targets and sources of truth
- Constraints and out-of-scope lists
- Risk register with likelihood, impact, mitigation
- Breakdown into four epics with explicit gating
- Stakeholder list
- Interfaces with in-flight work (ADR-65, CVN-N004, CVN-N006, CVN-N007)
- Committee submission section with 10 review questions
- Committee v1 review (pending — submission above)
14. Closure
Filled when status moves to delivered. Will summarize:
- Which epics shipped, in what order
- Measured KPI outcomes vs. targets (especially propagation latency and parity)
- Streamlit decommission PR link
- ADRs introduced (expected: at least one for the immutability model, one for the real-time distribution SLO, one for the RBAC scheme if not covered by ADR-65's extension)
- Lessons learned for future cross-cutting platform Needs