ADR-0068: Expert Committee is the default channel for plan and PR reviews
Status: active
Date: 2026-04-26
Introduced by: ADR-52 (committee auditability) + dev-process step 3 (CLAUDE.md)
Supersedes: none
Context
The development process documented in CLAUDE.md (steps 1-13) makes step 3 (plan review) mandatory before any implementation, and step 8 (PR review) mandatory before merge. The text reads, translated from the French: "The plan is reviewed by an external AI (or a peer) BEFORE any implementation".
In practice, this has been applied informally:
- Sometimes a plan is pasted into a third-party chat (ChatGPT, Gemini) and the verdict is forgotten the same day
- Sometimes the operator self-reviews, which defeats the angle-blindness purpose
- Sometimes the review is skipped entirely under time pressure
- The verdict, when produced, is not archived alongside the issue / PR
We already built a structured tool for this: the Expert Committee (scripts/expert_committee.py, GUI at port 8502). It runs 5 expert personas (crypto-trader, data-scientist, ml-engineer, architect, ops) plus a consolidator across LiteLLM providers, with RAG-injected per-expert bibliography, and writes auditable session JSON per ADR-52. We use it occasionally; we should use it by default.
The forcing function for this ADR: on 2026-04-26 the Track A signal-ceiling-audit plan was REJECTED by the committee for a methodology flaw the operator would have shipped without the review (a Fano-derived F1 ceiling on an imbalanced minority class is unsound). The committee paid for itself in one session.
Decision
Every plan review (process step 3) and every PR review of substantive code (process step 8) MUST go through the Expert Committee unless explicitly waived in the issue.
The committee output is the system of record for the review verdict — not chat transcripts, not Slack threads.
The trigger conditions are:
| Process step | Committee invocation | session_type |
|---|---|---|
| Step 3 — plan review | mandatory before implementation | plan_review |
| Step 8 — PR review for substantive code | mandatory before merge | pr_review |
| Experiment / FTF run results interpretation | recommended | experiment_review |
| Anything else | optional | general |
"Substantive code" is defined as: anything that touches src/commun/pipeline/, src/commun/finetune/, src/commun/cache/, src/backtest/, model training, label generation, or production trading code. Pure docs / dashboards / config changes are exempt.
The committee verdict is advisory — the operator retains final authority. But a verdict of REJECTED requires either (a) addressing the blockers and re-submitting, or (b) recording an explicit waiver in the issue with a written justification.
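The gate described above (the substantive-code test plus the handling of a REJECTED verdict) can be sketched in Python 3.9+. This is a minimal illustration, not the project's tooling: the helper names `is_substantive` and `may_proceed` are hypothetical, only the path prefixes come from the definition above, and the non-path criteria (model training, label generation, production trading code) still need a human call. Treating every verdict other than REJECTED as non-blocking is also an assumption, since REJECTED is the only verdict value this ADR names.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional

# Path prefixes taken from the "substantive code" definition.
# Non-path criteria (model training, label generation, production
# trading code) are NOT covered and still require human judgment.
SUBSTANTIVE_PREFIXES = (
    "src/commun/pipeline",
    "src/commun/finetune",
    "src/commun/cache",
    "src/backtest",
)


def is_substantive(changed_paths: Iterable[str]) -> bool:
    """Return True if any changed file lives under a substantive prefix.

    Hypothetical helper. is_relative_to compares whole path components,
    so "src/backtest_utils/x.py" does not match "src/backtest".
    """
    return any(
        PurePosixPath(raw).is_relative_to(prefix)
        for raw in changed_paths
        for prefix in SUBSTANTIVE_PREFIXES
    )


def may_proceed(verdict: str, waiver_justification: Optional[str] = None) -> bool:
    """Decide whether the next process step is unblocked.

    Hypothetical helper. Any verdict other than REJECTED is treated as
    non-blocking here, which is an assumption.
    """
    if verdict.strip().upper() != "REJECTED":
        return True
    # A REJECTED verdict is lifted only by a written justification;
    # an empty or whitespace-only waiver counts as silence.
    return bool(waiver_justification and waiver_justification.strip())
```

Option (a), addressing the blockers and re-submitting, needs no special case in this sketch: the re-submission produces a new verdict, and `may_proceed` is simply evaluated again on that verdict.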
Invariants
- Committee invocation traceable per issue: the --issue CLI flag MUST be set to the GitHub issue ref (e.g. #690). Sessions without issue_ref are non-compliant per ADR-52 and should not be cited as review evidence.
- Verdict archived before merge: for PR reviews, the committee session JSON path MUST be linked from the PR description. Reviewer comments alone are not sufficient.
- Dossier is self-contained: the artifact submitted to the committee MUST be readable without project context. The committee runs without filesystem access; assume a fresh reader. Standard location: documentation/reviews/YYYY-MM-DD-<slug>.md.
- Question is sharp: the --question CLI flag MUST list the explicit decisions the committee is asked to validate. Vague questions ("is this OK?") produce vague verdicts.
- REJECTED is not optional to address: a REJECTED verdict blocks the next process step unless the operator records a waiver in the issue. "I disagree with the LLM" is a valid waiver if argued; silence is not.
- Session ID surfaced in commits: when committing work that addresses committee feedback, the commit body SHOULD reference the session id (e.g. addresses committee session b2e4c384).
- Cost ceiling per session: a single committee invocation costing more than $2.00 SHOULD be flagged in the session log; iterate by submitting a tighter dossier rather than retry-spamming the full panel.
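Several of these invariants are mechanical enough to check automatically. The sketch below is one possible shape for such a check, under stated assumptions: `issue_ref` is the field name used in this ADR, but the keys `dossier_path` and `cost_usd` and the function `check_session` are hypothetical, and the authoritative session schema is the one defined by ADR-52.

```python
import re

COST_FLAG_USD = 2.00  # ceiling from the cost invariant
ISSUE_RE = re.compile(r"#\d+")
# Mirrors documentation/reviews/YYYY-MM-DD-<slug>.md
DOSSIER_RE = re.compile(r"documentation/reviews/\d{4}-\d{2}-\d{2}-[^/]+\.md")


def check_session(session: dict) -> list:
    """Return human-readable findings for one session record.

    The keys "dossier_path" and "cost_usd" are assumptions made for
    illustration; the real schema is defined by ADR-52.
    """
    findings = []

    issue_ref = session.get("issue_ref") or ""
    if not ISSUE_RE.fullmatch(issue_ref):
        findings.append("non-compliant: issue_ref missing or not of the form #<number>")

    dossier = session.get("dossier_path") or ""
    if not DOSSIER_RE.fullmatch(dossier):
        findings.append("dossier is not at the standard location")

    # Strictly greater than: a session costing exactly $2.00 is not flagged.
    if session.get("cost_usd", 0.0) > COST_FLAG_USD:
        findings.append("flag: session cost exceeds $2.00")

    return findings
```

An empty list means the record passes these three checks. The remaining invariants (a sharp question, a self-contained dossier) depend on content quality and are not something a pattern match can verify.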
Alternatives rejected
- Pair review only (single human) — defeats the angle-blindness purpose; the operator works mostly alone and a single pair reviewer reproduces the same blind spots.
- Generic ChatGPT / Gemini chat — no archive, no per-expert specialization, no FinOps tracking, no link to the issue. We tried this informally; verdicts evaporated.
- CodeRabbit only on PRs (skip plan review) — CodeRabbit is excellent for code-level diff review but cannot evaluate methodology, hypothesis quality, or fitness-of-plan questions that step 3 is meant to catch.
- Committee on every change — too expensive (~$0.20 per session); over-applies expert capacity to trivial changes. Hence the "substantive code" gate.
Consequences
- Positive: methodology flaws caught before implementation, not after; verdicts archived in committee/sessions/ alongside issues; per-expert specialization lets us see issues a generalist reviewer would miss; cost-tracked via FinOps log.
- Positive: forces dossier discipline; an unreviewable plan becomes obvious because the dossier won't write itself.
- Negative: ~5-10 min latency per invocation; ~$0.10–$0.30 per session at current Mistral/Gemini rates.
- Negative: produces verbose JSON; the GUI at port 8502 is the right way to consume it, not raw cat.
- Neutral: the committee remains advisory. Final authority stays with the operator.
Rollback
This ADR is process, not code. To roll back, remove the file and restore CLAUDE.md step 3 to its prior text. No technical migration is required.
If the committee proves systematically wrong (false REJECTED rate > 50% over a sample of ≥ 10 sessions), revisit the prompts in prompt-library/experts/*.yaml before retiring the policy.
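The revisit trigger above is simple arithmetic and can be written down explicitly. In this sketch the function name `should_revisit_prompts` is hypothetical, and the denominator is an assumption: the rate is computed as false REJECTED verdicts divided by the number of sampled sessions, since the ADR does not say whether the sample counts all sessions or only REJECTED ones.

```python
MIN_SAMPLE = 10                 # "a sample of >= 10 sessions"
MAX_FALSE_REJECTED_RATE = 0.50  # "false REJECTED rate > 50%"


def should_revisit_prompts(false_rejected: int, sample_size: int) -> bool:
    """Return True when the expert prompts should be revisited.

    Assumption: rate = false_rejected / sample_size. Below the minimum
    sample size no conclusion is drawn.
    """
    if sample_size < MIN_SAMPLE:
        return False
    return false_rejected / sample_size > MAX_FALSE_REJECTED_RATE
```

For example, 6 false REJECTED verdicts in 10 sampled sessions crosses the threshold, while 5 in 10 does not, because the comparison is strict.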
References
- ADR-52: Auditabilité des sessions du comité d'experts ("Auditability of expert committee sessions"; the data model)
- CLAUDE.md: process steps 3 and 8
- scripts/expert_committee.py: CLI
- src/scripts/committee_gui.py: Streamlit GUI (port 8502)
- documentation/OPERATIONS.md §15: runbook (how to invoke, how to interpret)
- config/committee/experts.yaml: bibliography and RAG tuning per expert
- prompt-library/experts/*.yaml: versioned prompts
- Issues: #476 (committee inception), #480 (GUI), #481 (auditability), #690 (Track A, first ADR-68 application)