
ADR-0097 — Every experiment report MUST follow the canonical template structure

Status: active (operator-mandated 2026-06-03; committee plan_review optional because this is a process ADR)

Context-of-record: established while producing the HP-stability report reports/2026-06-03-multi-fold-hp-stability-filtering.md (the CVN-N001-EI-S04 HP-swap deliverable). Template: templates/TEMPLATE_experiment_report.md.

Context

The diagnostic programme produces experiment reports: documents that describe a diagnostic or experiment, its method, and its verdict (as distinct from a lightweight internal review dossier). Without a fixed structure, these reports vary in quality and in what they expose:

  • pre-registration is implicit or absent;
  • decision rules are stated after results are seen, blurring confirmatory and exploratory analysis;
  • comparisons are reported as bare significance stars rather than effect sizes;
  • limitations are buried;
  • reproducibility is incomplete: fold indices are recorded instead of calendar windows, and because the fold generator is now-anchored, the same index points to a different window a month later;
  • project jargon makes the reports unreadable outside the team.

Several of these failures are not cosmetic. A decision rule chosen after the fact is exactly the analysis flexibility (the "garden of forking paths") that the programme's pre-registration discipline exists to forbid, and fold-index reproducibility breaks silently. A canonical structure makes the load-bearing disciplines mandatory and auditable rather than dependent on the author.

Decision

Every experiment report (a report of a diagnostic or experiment that produces a verdict or a recommendation) MUST follow the canonical structure of TEMPLATE_experiment_report.md: front-matter → Abstract → Introduction → Background/Related Work → Hypotheses & Pre-Registration → Data & Setup → Methods → Results → Discussion → Threats to Validity/Limitations → Conclusion & Next Steps → Reproducibility Statement → Glossary → References → Appendices (Full results, Pre-registration snapshot). The invariants below refer to sections by number: §3 is Hypotheses & Pre-Registration, §8 is Threats to Validity/Limitations, and §10 is the Reproducibility Statement.

Reports live under documentation/reports/ and must be force-added to git, because reports/ matches a generic .gitignore pattern. Each report is added to the Reports: section of the MkDocs nav (ADR-77, docs as single source of truth).

Scope. This ADR applies to experiment reports. A short internal review dossier (a rapid decision aid for the operator or committee) is exempt from the full structure, but a review whose findings are promoted to a shareable or publishable artefact MUST be reshaped into the template.

Invariants

  • Invariant 1: pre-registration before results. §3 records the hypotheses and decision rules, with a link and a timestamp showing they were registered before the confirmatory (out-of-sample) data were observed. A report whose decision rule post-dates its results is exploratory and MUST declare itself so. Editing §3 after seeing confirmatory data voids the confirmatory claim.
  • Invariant 2: effect sizes with confidence intervals, not bare p-values, for every reported comparison [Wasserstein2016]. Equivalence claims use TOST (two one-sided tests) against a stated equivalence margin, never a bare "n.s.": a non-significant difference is not evidence of equivalence.
  • Invariant 3: Threats to Validity/Limitations is a first-class section (§8), not a footnote. Each limitation names the inference it threatens and its mitigation or follow-up.
  • Invariant 4: the Reproducibility Statement (§10) gives run identifiers, the code commit SHA, calendar windows (never fold indices) whenever the fold generator is time-anchored, and software versions: enough to reconstruct every number.
  • Invariant 5: standalone and anonymisable. A Glossary defines every internal term, so an outside reader can follow the report without knowing project jargon. Asset and product names are anonymised for external circulation; the real-name mapping is kept internal.
  • Invariant 6: pending analysis appears as an explicit placeholder, never as an omission, so an in-flight report is honest about what has not yet been measured.
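To make Invariant 2 concrete, here is a minimal Python sketch of the TOST procedure cited in the template (Schuirmann1987, Lakens2017) for a difference of two means. It assumes a large-sample normal approximation with an unpooled standard error; a t-based TOST is the safer choice for small samples. The names `tost_z` and `EquivalenceResult` are hypothetical and not part of the template. The result carries the effect size (difference in means) and its (1 − 2α) confidence interval alongside the two one-sided p-values, so a report never has to fall back on a bare p-value.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist, fmean, variance


@dataclass
class EquivalenceResult:
    diff: float        # effect size: difference in means (a minus b)
    ci_low: float      # lower bound of the (1 - 2*alpha) confidence interval
    ci_high: float     # upper bound of the same interval
    p_lower: float     # one-sided p-value for H0: diff <= -margin
    p_upper: float     # one-sided p-value for H0: diff >= +margin
    equivalent: bool   # True only if both one-sided nulls are rejected


def tost_z(a, b, margin, alpha=0.05):
    """Hypothetical TOST for equivalence of two means (normal approximation).

    Assumes samples large enough for the z-approximation; uses an
    unpooled (Welch-style) standard error of the difference.
    """
    diff = fmean(a) - fmean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist()
    # Test 1: reject H0 diff <= -margin if diff sits far enough above -margin.
    p_lower = 1 - z.cdf((diff + margin) / se)
    # Test 2: reject H0 diff >= +margin if diff sits far enough below +margin.
    p_upper = z.cdf((diff - margin) / se)
    # Equivalently: the (1 - 2*alpha) CI lies entirely inside (-margin, +margin).
    crit = z.inv_cdf(1 - alpha)
    ci_low, ci_high = diff - crit * se, diff + crit * se
    equivalent = max(p_lower, p_upper) < alpha
    return EquivalenceResult(diff, ci_low, ci_high, p_lower, p_upper, equivalent)
```

The interval form is what a report would show: equivalence holds exactly when the (1 − 2α) interval falls inside the margin. With noisy data the same near-zero difference is neither significant nor equivalent, which is why "n.s." cannot stand in for an equivalence claim.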

Alternatives rejected

  • Free-form reports plus a style guide: rejected. A style guide is a slogan, not a control; the disciplines that matter (pre-registration, calendar-window reproducibility, first-class limitations) erode when they are optional. The template and this ADR make them structural.
  • Full structure for every internal note: rejected as overhead. Rapid review dossiers stay lightweight; the mandate applies once a finding becomes a shareable or publishable report.
  • Fold indices instead of calendar windows: rejected. The walk-forward generator is now-anchored, so a fold index is not a stable identity across months (Invariant 4).
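The index-instability argument can be illustrated with a small Python sketch. The generator below is hypothetical (the programme's real walk-forward generator is not shown in this ADR): it assumes one-month test windows that end at the start of the current month, with index 0 as the oldest fold. Running it one month apart shows that "fold 0" names two different calendar windows, while the window itself remains a stable identity.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CalendarWindow:
    start: date  # inclusive
    end: date    # exclusive


def month_start(d: date, months_back: int) -> date:
    """First day of the month that lies `months_back` months before d's month."""
    total = d.year * 12 + (d.month - 1) - months_back
    return date(total // 12, total % 12 + 1, 1)


def now_anchored_folds(now: date, n_folds: int) -> list[CalendarWindow]:
    """Hypothetical now-anchored generator: one-month test windows ending at
    the start of `now`'s month. Index 0 is the oldest fold."""
    return [
        CalendarWindow(month_start(now, k), month_start(now, k - 1))
        for k in range(n_folds, 0, -1)
    ]


june = now_anchored_folds(date(2026, 6, 3), 3)
july = now_anchored_folds(date(2026, 7, 3), 3)
# In June, fold 0 is [2026-03-01, 2026-04-01); in July it is [2026-04-01, 2026-05-01).
# June's fold 1 has become July's fold 0, so "fold 0" alone is not reproducible.
```

Recording the window as "2026-04-01 to 2026-05-01" instead of "fold 1" lets a reader regenerate the same split in any later month.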

Consequences

  • Experiment reports are publishable by construction, and the asymmetric multi-fold principle, pre-registration, and selection-bias framing become reusable best practice rather than one-off prose.
  • Writing a new report means copying the template, filling it in section by section, and keeping §3, §8 and §10 honest. A CI lint check that verifies the mandatory section headers and a populated §10 is a candidate follow-up (not blocking).
  • Minor authoring overhead, paid back in auditability and external shareability.
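The candidate CI lint check could start as small as the following Python sketch. It is not an existing tool: it assumes Markdown headings (optionally prefixed by a number such as "3." or "§10") whose text matches the section names listed in this ADR; the real template's heading text and levels may differ, and the `REQUIRED_SECTIONS`, `HEADING` and `lint_report` names are hypothetical. It reports missing sections, sections out of canonical order, and an empty Reproducibility Statement. As a simplification it does not skip `#` lines inside fenced code blocks.

```python
from __future__ import annotations

import re

# Assumed heading names, taken from the section order in this ADR; the real
# template's exact heading text may differ.
REQUIRED_SECTIONS = [
    "Abstract", "Introduction", "Background/Related Work",
    "Hypotheses & Pre-Registration", "Data & Setup", "Methods", "Results",
    "Discussion", "Threats to Validity/Limitations", "Conclusion & Next Steps",
    "Reproducibility Statement", "Glossary", "References",
]

# A Markdown heading, optionally numbered ("## 3. Methods", "## §10 ...").
HEADING = re.compile(r"^#{1,6}\s+(?:§?\d+\.?\s+)?(.+?)\s*$")


def lint_report(markdown: str) -> list[str]:
    """Return a list of problems; an empty list means the report passes."""
    lines = markdown.splitlines()
    headings = []  # (line index, heading title)
    for i, line in enumerate(lines):
        m = HEADING.match(line)
        if m:
            headings.append((i, m.group(1)))
    titles = [t for _, t in headings]
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in titles]
    # Sections that are present must appear in canonical order.
    present = [t for t in titles if t in REQUIRED_SECTIONS]
    if present != sorted(present, key=REQUIRED_SECTIONS.index):
        problems.append("sections out of canonical order")
    # The Reproducibility Statement (§10) must contain some non-blank text.
    for n, (i, title) in enumerate(headings):
        if title == "Reproducibility Statement":
            end = headings[n + 1][0] if n + 1 < len(headings) else len(lines)
            if not "\n".join(lines[i + 1:end]).strip():
                problems.append("Reproducibility Statement is empty")
    return problems
```

A CI job would run this over every file under documentation/reports/ and fail on a non-empty result.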

References

  • Template: templates/TEMPLATE_experiment_report.md · worked example: reports/2026-06-03-multi-fold-hp-stability-filtering.md
  • Related ADRs: ADR-77 (MkDocs/Structurizr as the docs single source of truth; reports live on the docs site), ADR-0095 (diagnostic-story canonical template; the sibling structural-template ADR), ADR-79/80 (FTF results dossier and Story closure; experiment reports are their publishable complement), ADR-68 (committee review channel).
  • Method canon embedded in the template: Wasserstein2016 (p-values), Gelman2014/Simmons2011 (forking paths), Cawley2010/Bailey2014 (selection bias), LopezDePrado2018/Bergmeir2012 (walk-forward), Efron1993 (bootstrap), Lakens2017/Schuirmann1987 (TOST).