CVN-N012-EA-S02 — Service catalog module on console-next (plan dossier)

Story : CVN-N012-EA-S02 (wp#96) — Service catalog module on console-next
Issue : (to file pre-merge — proposed slug feat: service catalog module on console-next)
Branch : feat/cvn-n012-ea-s02-catalog-module
Author : Dominique (operator) + Claude
Date : 2026-04-30
Status : Draft → committee plan_review pending → impl on PASS
Estimated effort : 5-7 days


1. Mission

Build the first IDP module on console-next : a /catalog route + /catalog/[name] detail page backed by catalog-info.yaml files in git. The module gives operators a single browsable inventory of CVN's ~17 services with owner, lifecycle, links, and dependencies — replacing the current tribal-knowledge-plus-scattered-Helm-values state.

This is the first concrete deliverable on the IDP umbrella (Epic CVN-N012-EA, reframed 2026-04-30 to IDP modules on console-next post the Backstage-vs-console-next decision in S01). Catalog before Grafana embed (S03) before RBAC closure (S04).

2. Hypothesis

If we ship a YAML-driven catalog whose descriptor format mirrors Backstage's catalog-info.yaml, we get :

  • A machine-readable service inventory the team can grep / diff / PR-review.
  • A portability invariant (per ADR-78 I2 from the IDP choice dossier §9) — the YAML files round-trip through @backstage/catalog-model so the choice of console-next over Backstage stays reversible at the data layer. If 2 years from now we ever need to migrate to Backstage, the catalog content lifts as-is.
  • A reusable pattern for the next IDP modules (Grafana embed S03 will reuse the YAML-registry-loaded-at-build-time approach).

The catalog is read-only from console-next — edits go through git PRs (which is itself the audit trail for ADR-78 I7). No SaaS dependency (ADR-78 I5).

3. Scope

In scope

  1. Schema : catalog-info.yaml structure compatible with the Backstage Component schema. Fields : apiVersion: backstage.io/v1alpha1, kind: Component, metadata.name, metadata.description, metadata.tags, metadata.links[], spec.type (service / library / website), spec.owner, spec.lifecycle (production / experimental / deprecated), spec.system, spec.dependsOn[].

  2. Storage layer : YAML files under documentation/catalog/<service-name>.yaml. One file per service. PR-reviewed like any other docs change.

  3. Parser (console-next/lib/catalog/) : TypeScript module that reads YAML files at build time (Next.js convention — server-only fs access during build). Exports typed Component[] and getComponent(name).

  4. Portability gate (ADR-78 I2) : CI step that runs each YAML file through @backstage/catalog-model validator. Fail the build on schema drift. Concrete : a console-next/scripts/validate-catalog.ts script invoked by console-next/package.json::test:catalog-portability, wired into .github/workflows/console-next-ci.yml.

  5. Routes :
      • /catalog — list view : grid of <ServiceCard> components, filter by owner + lifecycle + type, search by name, link to detail.
      • /catalog/[name] — detail view : full metadata, links, dependencies (rendered as a list of links to other catalog entries — the dependency-graph SVG is out of scope, deferred to a follow-up).

  6. Components (per ADR-66) :
      • <ServiceCard> (presentational, in components/catalog/) : 5 Storybook states required (default, with-tags, deprecated, missing-owner, mobile).
      • <CatalogFilters> (client component, 'use client' for URL-state sync). v1 uses native <select> + <input> styled with Tailwind ; migration to shadcn Select + Input primitives deferred to when a second consumer materialises (per dossier §3 trade-off).
      • <DependencyList> (presentational) : 1 Storybook state.

  7. Anchor entries : 7 catalog-info.yaml files committed for the most-used services — Airflow, MLflow, Grafana, console-next itself, PostgreSQL, Redis, S3. The original target was 5 anchors ; bumped to 7 during CR pass 1 to close the dependency graph (Airflow dependsOn Redis + MLflow dependsOn S3 ; both deps now resolve to a real /catalog/<name> page).

  8. Runbook : documentation/runbooks/catalog-add-service.md — step-by-step "how to add a service" (the YAML pattern + the PR review gate). Linked from OPERATIONS.md.

  9. Tests : vitest unit tests for the parser (valid YAML, malformed YAML, missing required fields, unknown fields tolerated, schema validator integration). Component tests for <ServiceCard> rendering states.

  10. mkdocs SSoT (ADR-77) : new nav entry under documentation/runbooks/ for the catalog runbook ; link from documentation/missions/index.md (catalog as part of IDP mission).
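The schema item above could translate into TypeScript types roughly as follows. This is a minimal sketch, not the committed schema.ts : the type names (CatalogComponent, ComponentLink), the isLifecycle helper and the sample values other than the Airflow owner, lifecycle and dependsOn from Phase 2 are assumptions for illustration. Only the field names and allowed values come from the Schema item.

```typescript
// Hypothetical sketch of console-next/lib/catalog/schema.ts.
// Field names follow the "Schema" scope item; type names are illustrative.

const COMPONENT_TYPES = ["service", "library", "website"] as const;
const LIFECYCLES = ["production", "experimental", "deprecated"] as const;

type ComponentType = (typeof COMPONENT_TYPES)[number];
type Lifecycle = (typeof LIFECYCLES)[number];

interface ComponentLink {
  url: string;
  title?: string;
}

interface CatalogComponent {
  apiVersion: "backstage.io/v1alpha1";
  kind: "Component";
  metadata: {
    name: string;
    description?: string;
    tags?: string[];
    links?: ComponentLink[];
  };
  spec: {
    type: ComponentType;
    owner: string; // required by Backstage's Component kind
    lifecycle: Lifecycle;
    system?: string;
    dependsOn?: string[];
  };
}

// Type guard for narrowing an untyped YAML value to a Lifecycle.
function isLifecycle(value: unknown): value is Lifecycle {
  return typeof value === "string" && (LIFECYCLES as readonly string[]).includes(value);
}

// Sample entry. Owner, lifecycle and dependsOn mirror the Airflow anchor in
// Phase 2; the description and tag are made up for the example.
const exampleAirflow: CatalogComponent = {
  apiVersion: "backstage.io/v1alpha1",
  kind: "Component",
  metadata: {
    name: "airflow",
    description: "Workflow scheduler (illustrative description)",
    tags: ["orchestration"],
  },
  spec: {
    type: "service",
    owner: "cvntrade-ops",
    lifecycle: "production",
    dependsOn: ["postgresql", "redis"],
  },
};
```

Keeping the allowed values in `as const` arrays lets the same list drive both the static type and any runtime check, so the two cannot drift apart.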

Out of scope

  • Multi-user RBAC — operator-only for now ; deferred to S04 (CVN-N012-EA-S04, RBAC closure gate).
  • Live service health — catalog is metadata only ; probes / uptime tracking is a separate concern (the existing Grafana stack handles it).
  • Editing UI — read-only ; edits via git PRs (audit-trailed by design).
  • Dependency graph SVG — list-of-links suffices for v1 ; graph deferred until > 30 services warrant it (currently ~17).
  • TechDocs integration — ADR-77 makes mkdocs the SSoT ; the catalog links to mkdocs runbooks instead.
  • API endpoints — pure build-time YAML, no runtime API. If we ever need dynamic catalog updates (e.g., from Helm charts), that's a separate Story.
  • Audit log table — not in this Story (the audit invariant ADR-78 I7 is satisfied by git history for read-only catalog ; live audit table lands in S04 RBAC).

Explicitly withdrawn vs the original wp#96 framing

Original S03 listed "~17 CVN services + ownership groups". I'm relaxing the gate to 5 anchor services because :

  • The remaining 12 can be added incrementally via PR after merge (the runbook makes this trivial).
  • Forcing all 17 now turns the Story into a research sprint (cataloguing every microservice's owner / lifecycle / dependency graph) which is uncertain effort and orthogonal to the infrastructure of the catalog module.
  • Operator can add the remaining services as ops work, not framework work.

4. Implementation plan

Phase 1 — schema + parser + tests (1.5 days)

  1. Add yaml@^2.8.3 to console-next/package.json (already in monorepo pnpm-lock as transitive ; add as direct dep).
  2. Add @backstage/catalog-model as a dev dep (only used by the portability validator script ; not shipped in the Next.js bundle).
  3. console-next/lib/catalog/schema.ts — TypeScript types matching the Backstage Component schema (subset we use). Single source of truth for the type system.
  4. console-next/lib/catalog/parser.ts : loadCatalog(): Component[] reads documentation/catalog/*.yaml at build time using fs.readdir + yaml.parse. Validates against the local TS schema (zod-style or hand-rolled — TBD per committee). Emits a typed list.
  5. console-next/lib/catalog/parser.test.ts — vitest unit tests : valid file → parsed ; malformed → throws with line number ; missing required field → throws ; extra field → tolerated and logged.
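The validation half of the parser (the hand-rolled option) might look like the sketch below. It assumes the YAML text has already been turned into a plain value by yaml.parse, so it has no dependency and says nothing about line-number reporting for malformed YAML. The names parseComponent, ParsedComponent and ParseResult are hypothetical, and treating spec.type, spec.owner and spec.lifecycle as required is an assumption based on Backstage's Component kind.

```typescript
// Hypothetical sketch of the validation step in parser.ts.
// `raw` is assumed to be the result of yaml.parse(fileContents).

interface ParsedComponent {
  name: string;
  owner: string;
  lifecycle: string;
  type: string;
  dependsOn: string[];
}

interface ParseResult {
  component: ParsedComponent;
  warnings: string[]; // unknown fields are tolerated and reported here
}

const KNOWN_TOP_LEVEL = new Set(["apiVersion", "kind", "metadata", "spec"]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns obj[key] if it is a non-empty string, otherwise throws.
function requireString(
  obj: Record<string, unknown>,
  key: string,
  path: string,
  source: string,
): string {
  const value = obj[key];
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(`${source}: missing required field ${path}.${key}`);
  }
  return value;
}

function parseComponent(raw: unknown, source: string): ParseResult {
  if (!isRecord(raw)) throw new Error(`${source}: document is not a YAML mapping`);
  const metadata = raw["metadata"];
  const spec = raw["spec"];
  if (!isRecord(metadata)) throw new Error(`${source}: missing required field metadata`);
  if (!isRecord(spec)) throw new Error(`${source}: missing required field spec`);

  // Lenient mode: extra top-level fields produce warnings, not errors.
  const warnings = Object.keys(raw)
    .filter((key) => !KNOWN_TOP_LEVEL.has(key))
    .map((key) => `${source}: unknown top-level field "${key}" (tolerated)`);

  const deps = spec["dependsOn"];
  return {
    component: {
      name: requireString(metadata, "name", "metadata", source),
      owner: requireString(spec, "owner", "spec", source),
      lifecycle: requireString(spec, "lifecycle", "spec", source),
      type: requireString(spec, "type", "spec", source),
      dependsOn: Array.isArray(deps)
        ? deps.filter((d): d is string => typeof d === "string")
        : [],
    },
    warnings,
  };
}
```

Returning warnings instead of calling console.warn directly keeps the function pure, which makes the "extra field → tolerated and logged" test case a plain assertion on the result ; the caller decides where to log.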

Phase 2 — anchor data + portability validator (1 day)

  1. documentation/catalog/airflow.yaml — first anchor entry. Owner cvntrade-ops, lifecycle production, links to grafana / loki / kubernetes namespace, dependsOn [postgresql, redis].
  2. documentation/catalog/{mlflow,grafana,console-next,postgresql,redis,s3}.yaml — 6 more anchors (originally 4 ; redis and s3 added during CR pass 1 to close the dependency graph). Each one PR-able as a separate diff but bundled here for the Story.
  3. console-next/scripts/validate-catalog.ts — Node script, invoked via pnpm test:catalog-portability. Imports @backstage/catalog-model, iterates documentation/catalog/*.yaml, validates each. Exits non-zero on any drift.
  4. CI wiring : add test:catalog-portability step to .github/workflows/console-next-ci.yml between typecheck and test.
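The control flow of the portability script could be sketched as below, with the validator injected rather than imported. This deliberately avoids guessing at the @backstage/catalog-model API : EntityValidator, validateCatalog and standInValidator are hypothetical names, the stand-in only checks apiVersion and kind, and the real validator may well be asynchronous, in which case the loop would need to await it.

```typescript
// Hypothetical runner for scripts/validate-catalog.ts.
// The real script would pass a validator built on @backstage/catalog-model;
// here the validator is any function that throws on an invalid entity.

type EntityValidator = (entity: unknown) => void;

interface CatalogFile {
  path: string;
  entity: unknown; // already parsed from YAML
}

interface ValidationFailure {
  path: string;
  reason: string;
}

interface ValidationReport {
  passed: string[];
  failed: ValidationFailure[];
  exitCode: 0 | 1; // non-zero on any drift, as required by the CI gate
}

function validateCatalog(files: CatalogFile[], validate: EntityValidator): ValidationReport {
  const passed: string[] = [];
  const failed: ValidationFailure[] = [];
  for (const file of files) {
    try {
      validate(file.entity);
      passed.push(file.path);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failed.push({ path: file.path, reason });
    }
  }
  return { passed, failed, exitCode: failed.length === 0 ? 0 : 1 };
}

// Stand-in validator for the example only; it is NOT the Backstage validator.
const standInValidator: EntityValidator = (entity) => {
  const e = entity as { apiVersion?: unknown; kind?: unknown } | null;
  if (e?.apiVersion !== "backstage.io/v1alpha1" || e?.kind !== "Component") {
    throw new Error("expected apiVersion backstage.io/v1alpha1 and kind Component");
  }
};
```

Validating every file before deciding the exit code means a single CI run reports all drifting files at once instead of stopping at the first one.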

Phase 3 — routes + components (2 days)

  1. console-next/components/catalog/ServiceCard.tsx — presentational, props : { component: Component }. Renders shadcn Card with name / description / owner badge / lifecycle badge / tags. 5 states for Storybook : default, with-tags, deprecated (lifecycle visual treatment), missing-owner (graceful fallback), mobile.
  2. console-next/components/catalog/CatalogFilters.tsx : 'use client'. URL-state-driven (search params for owner / lifecycle / type / query). Native <select> + <input> styled with Tailwind in v1 ; shadcn Select + Input deferred (see §3, In scope). 1 Storybook state.
  3. console-next/components/catalog/DependencyList.tsx — presentational. Renders dependsOn[] as a list of internal <Link> to /catalog/[name]. 1 Storybook state.
  4. console-next/app/catalog/page.tsx — server component, calls loadCatalog(), renders grid of <ServiceCard> filtered via <CatalogFilters> (client component reading search params).
  5. console-next/app/catalog/[name]/page.tsx — server component, calls getComponent(params.name), renders detail. 404 if missing (Next.js notFound()).
  6. Stories : console-next/stories/ServiceCard.stories.tsx, CatalogFilters.stories.tsx, DependencyList.stories.tsx. axe-core a11y green required.
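The URL-state filtering shared by the list page and <CatalogFilters> could be reduced to two pure functions, sketched below. The function and type names are hypothetical, as are the matching rules (exact match on owner / lifecycle / type, case-insensitive substring on name), which the plan does not specify. The parameter keys follow the owner / lifecycle / type / query list above.

```typescript
// Hypothetical filtering helpers for /catalog; names and match rules are assumptions.

interface CatalogListItem {
  name: string;
  owner: string;
  lifecycle: string;
  type: string;
}

interface CatalogFilterState {
  owner?: string;
  lifecycle?: string;
  type?: string;
  query?: string;
}

// Reads filter state from URL search params; blank values count as "no filter".
function readFilters(params: URLSearchParams): CatalogFilterState {
  const pick = (key: string): string | undefined => {
    const value = params.get(key)?.trim();
    return value ? value : undefined;
  };
  return {
    owner: pick("owner"),
    lifecycle: pick("lifecycle"),
    type: pick("type"),
    query: pick("query"),
  };
}

// Applies every active filter; an unset filter matches all items.
function filterComponents(
  items: CatalogListItem[],
  filters: CatalogFilterState,
): CatalogListItem[] {
  const needle = filters.query?.toLowerCase();
  return items.filter(
    (item) =>
      (!filters.owner || item.owner === filters.owner) &&
      (!filters.lifecycle || item.lifecycle === filters.lifecycle) &&
      (!filters.type || item.type === filters.type) &&
      (!needle || item.name.toLowerCase().includes(needle)),
  );
}

// Sample data: the mlflow owner is made up for the example.
const sampleItems: CatalogListItem[] = [
  { name: "airflow", owner: "cvntrade-ops", lifecycle: "production", type: "service" },
  { name: "grafana", owner: "cvntrade-ops", lifecycle: "production", type: "service" },
  { name: "mlflow", owner: "cvntrade-ml", lifecycle: "production", type: "service" },
];

const opsOnly = filterComponents(
  sampleItems,
  readFilters(new URLSearchParams("owner=cvntrade-ops")),
);
console.log(opsOnly.map((item) => item.name));
```

Because both functions are pure, they can be unit-tested without rendering, and the server page and the client component can share them without crossing the 'use client' boundary.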

Phase 4 — runbook + nav + OPERATIONS update (0.5 day)

  1. documentation/runbooks/catalog-add-service.md — runbook with copy-paste YAML template + PR checklist.
  2. mkdocs.yml nav — register runbooks/catalog-add-service.md.
  3. documentation/OPERATIONS.md — new sub-section under §16 with link to catalog UI + runbook.
  4. documentation/missions/index.md — new IDP mission entry pointing to catalog as the first delivery.

Phase 5 — committee pr_review + merge (1 day)

  1. PR description references Story: CVN-N012-EA-S02, links the plan dossier, links the run.
  2. CodeRabbit pass(es). Wait full CR cycle per feedback_cr_rounds_before_merge.md.
  3. Expert Committee pr_review (mandatory per ADR-68 for substantial frontend changes touching the IDP umbrella).
  4. Squash merge ; OP wp#96 flipped Closed with merge SHA + acceptance criteria checklist.

5. Files to create / modify

Created

documentation/catalog/airflow.yaml
documentation/catalog/mlflow.yaml
documentation/catalog/grafana.yaml
documentation/catalog/console-next.yaml
documentation/catalog/postgresql.yaml
documentation/catalog/redis.yaml
documentation/catalog/s3.yaml
documentation/runbooks/catalog-add-service.md
documentation/reviews/2026-04-30-cvn-n012-ea-s02-catalog-module-plan.md (this file)
console-next/lib/catalog/schema.ts
console-next/lib/catalog/parser.ts
console-next/lib/catalog/parser.test.ts
console-next/scripts/validate-catalog.ts
console-next/components/catalog/ServiceCard.tsx
console-next/components/catalog/CatalogFilters.tsx
console-next/components/catalog/DependencyList.tsx
console-next/stories/ServiceCard.stories.tsx
console-next/stories/CatalogFilters.stories.tsx
console-next/stories/DependencyList.stories.tsx
console-next/app/catalog/page.tsx
console-next/app/catalog/[name]/page.tsx
console-next/app/catalog/loading.tsx

Modified

console-next/package.json                     # +yaml dep, +@backstage/catalog-model dev dep, +test:catalog-portability script
.github/workflows/console-next-ci.yml         # +catalog-portability step
documentation/OPERATIONS.md                   # +catalog access section
documentation/missions/index.md               # +IDP mission entry
mkdocs.yml                                    # +runbook nav entry

6. Test plan

Unit tests

  • console-next/lib/catalog/parser.test.ts :
      • parses valid YAML → typed Component
      • throws on malformed YAML with line number context
      • throws on missing required field (metadata.name)
      • tolerates extra fields with console.warn
      • integration : loadCatalog() reads the 7 anchor files end-to-end

  • console-next/components/catalog/ServiceCard.test.tsx :
      • renders name / description / owner
      • renders deprecated badge when spec.lifecycle === 'deprecated'
      • graceful fallback when spec.owner missing

Integration tests

  • console-next/scripts/validate-catalog.ts runs against the 7 anchor files in CI and passes (validates against @backstage/catalog-model).

Storybook + a11y

  • 5 ServiceCard states + 1 CatalogFilters state + 1 DependencyList state — all axe-core color-contrast green per ADR-66.

Smoke (manual)

  • pnpm dev ; navigate /catalog → see the 7 anchor services ; filter by owner=cvntrade-ops → see Airflow + Grafana ; click Airflow → detail page renders ; click dependsOn link → navigates to PostgreSQL detail.

7. Risks & mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| @backstage/catalog-model is heavy / pulls Backstage's whole runtime | medium | medium | dev dep only, run only in CI script ; if heavyweight even there, replace with a hand-rolled JSON Schema validator generated from Backstage's published schema |
| YAML schema drift between our subset and Backstage's evolving format | low | low | the portability test catches drift ; if Backstage breaks the schema in a future major, we pin the version of @backstage/catalog-model and revisit |
| 5 anchor entries is too few — review feedback says "not enough to validate the pattern" | medium | low | the runbook makes adding more trivial ; add via follow-up PRs post-merge |
| Build-time YAML loading breaks Next.js dev hot-reload | low | medium | Next.js 14 supports fs reads in server components ; verify in Phase 1 with the parser test ; if dev-mode is broken, fall back to a useState reload-on-change pattern in dev only |
| dependency-graph-as-list looks ugly for services with many deps | low | low | accepted as v1 ; SVG graph deferred ; max ~5 deps per service today |
| Operator regrets the read-only-via-git constraint | low | medium | the constraint is the audit trail (ADR-78 I7) ; revisiting requires ADR + new Story, friction by design |

8. ADR & invariant compliance

| ADR / Invariant | Compliance |
|---|---|
| ADR-66 — UI Stack | Storybook stories with required states ; axe-core a11y green ; DTCG tokens (no inline CSS) ; shadcn primitives used ; CVA variants only |
| ADR-77 — MkDocs SSoT | runbook in documentation/runbooks/ registered in mkdocs nav ; strict-mode build green ; no doc duplication |
| ADR-78 (stub) I1 — Next.js routes inside console-next | /catalog is a route in console-next/app/, not a standalone service |
| ADR-78 I2 — catalog-info.yaml portability | CI portability test using @backstage/catalog-model ; round-trip validation mandatory |
| ADR-78 I3 — TechDocs honored by mkdocs | catalog links point to mkdocs runbooks at docs.cvntrade.eu, not a separate TechDocs site |
| ADR-78 I4 — every IDP module respects ADR-66 | yes, see above |
| ADR-78 I5 — no SaaS dependencies | YAML in git, no external service |
| ADR-78 I6 — IDP kill-switch | catalog is read-only ; trivially killed by removing the route file or scaling console-next to 0 (operator-controlled) ; documented in runbook |
| ADR-78 I7 — Immutable audit trail | git history is the audit trail for catalog edits (read-only catalog ; no runtime mutations) |
| ADR-78 I8 — Blast radius | catalog module imports nothing from app/config/* (the existing Console module's surface) ; isolated by file structure |

9. Out-of-band considerations

  • ADR-78 itself — the dossier §13 acceptance checklist marks ADR-78 as a follow-up PR (separate from S01 dossier merge). This Story relies on the stub form of ADR-78 ; the formal ADR-78 lands as part of S04 (RBAC closure gate) per the reframed Epic plan.
  • @backstage/catalog-model license — Apache-2.0, compatible with our stack. No lock-in : we pin a version, and if Backstage ever changes direction, we own a YAML format spec that any tool can read.
  • Schema versioning — we adopt apiVersion: backstage.io/v1alpha1 (Backstage's current). When it stabilizes to v1, we add a migration script in a follow-up Story.

10. Acceptance gate (mirror of OP wp#96 §Acceptance)

  • console-next/lib/catalog/ exists with typed parser + vitest tests green
  • /catalog route renders the 7 anchor services with filter (owner / lifecycle / type) + search
  • /catalog/[name] detail page renders full metadata + dependency list
  • documentation/catalog/ has 7 anchor catalog-info.yaml files (airflow, mlflow, grafana, console-next, postgresql, redis, s3 — closes the dependency graph)
  • CI job test:catalog-portability fails on schema drift (verified by intentionally breaking a YAML in a draft commit, observing failure, fixing)
  • <ServiceCard> Storybook : 5 required states + axe-core green + DTCG tokens (no inline CSS)
  • <CatalogFilters> + <DependencyList> stories green
  • mkdocs runbook documentation/runbooks/catalog-add-service.md (build strict green)
  • OPERATIONS.md updated with catalog access link
  • documentation/missions/index.md entry for the IDP mission with catalog link
  • Plan dossier (this file) committee plan_review PASSED
  • PR review : CodeRabbit full cycle + Expert Committee pr_review PASSED
  • On merge : OP wp#96 status flipped Closed with merge SHA + acceptance summary
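The "closes the dependency graph" criterion for the anchor files can be checked mechanically. The sketch below is one possible way, under assumptions : findUnresolvedDependencies and targetName are hypothetical helpers, dependsOn entries are assumed to be either bare names or Backstage-style refs such as component:default/redis, and the only dependency edges used are the ones this dossier states (Airflow on PostgreSQL and Redis, MLflow on S3). Other anchors are given empty dependsOn lists purely for the example.

```typescript
// Hypothetical check that every dependsOn target has its own catalog entry.

interface DependencyEntry {
  name: string;
  dependsOn: string[];
}

interface UnresolvedDependency {
  from: string;
  missing: string;
}

// Accepts "redis" or "component:default/redis" and returns "redis".
function targetName(ref: string): string {
  const parts = ref.split(/[:/]/);
  return parts[parts.length - 1] ?? ref;
}

function findUnresolvedDependencies(entries: DependencyEntry[]): UnresolvedDependency[] {
  const known = new Set(entries.map((entry) => entry.name));
  const unresolved: UnresolvedDependency[] = [];
  for (const entry of entries) {
    for (const ref of entry.dependsOn) {
      const target = targetName(ref);
      if (!known.has(target)) unresolved.push({ from: entry.name, missing: target });
    }
  }
  return unresolved;
}

// The original 5-anchor set leaves two dangling links.
const fiveAnchors: DependencyEntry[] = [
  { name: "airflow", dependsOn: ["postgresql", "redis"] },
  { name: "mlflow", dependsOn: ["s3"] },
  { name: "grafana", dependsOn: [] },
  { name: "console-next", dependsOn: [] },
  { name: "postgresql", dependsOn: [] },
];

// Adding redis and s3 closes the graph.
const sevenAnchors: DependencyEntry[] = [
  ...fiveAnchors,
  { name: "redis", dependsOn: [] },
  { name: "s3", dependsOn: [] },
];

console.log(findUnresolvedDependencies(fiveAnchors)); // airflow → redis, mlflow → s3
console.log(findUnresolvedDependencies(sevenAnchors)); // empty array
```

Run as a parser test or as part of validate-catalog.ts, a check like this would keep every <DependencyList> link pointing at a real /catalog/<name> page as more services are added.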

11. Open questions for committee plan_review

  1. Validator choice : @backstage/catalog-model direct dev dep vs. a hand-rolled JSON Schema validator generated from Backstage's published schema. Trade-off : ecosystem alignment vs. dep-tree weight. Operator preference ?

  2. Schema strictness : strict (reject unknown fields) vs. lenient (console.warn on unknown fields). I picked lenient because Backstage extensions (Spotify-style annotations) are common and we shouldn't break catalogs that adopt them. Committee endorse ?

  3. Anchor count : 5 services for v1 vs. all 17 ? I argued 5 in §3 ; want explicit committee endorsement of the relaxation.

  4. Client/server split : /catalog page is server component ; <CatalogFilters> is 'use client' reading search params. Alternative : everything client-side with Next 14's 'use client' boundary at the page level. Trade-off : SSR-rendered SEO + first-paint vs. simpler mental model. I picked the split. Endorse ?

  5. Dependency rendering : list-of-links for v1, SVG graph deferred. If a reviewer feels strongly that "no graph = no value" speak now ; otherwise defer.

  6. Owner field : free-form string vs. enum constrained to known OIDC groups (cvntrade-ops, cvntrade-ml, etc.). Free-form is simpler ; enum catches typos at build time but adds maintenance. I picked free-form for v1 ; committee preference ?

  7. mkdocs nav placement : new top-level Catalog section vs. tucked under Runbooks ? I picked under Runbooks (it's a single "how to add a service" page) ; alternative is a new top-level entry once we have multiple catalog-related docs.

13. Committee plan_review triage (session 4bdcdd34, 2026-04-30)

Verdict : PASSED / EXECUTION_RISK — strong consensus across 5 experts (architect 8.5, ops 8.0, ml-engineer 8.0, data-scientist 7.5, crypto-trader 8.5 — avg 8.1/10). 0 blockers. 11 recommendations.

Reason cited : "The plan is architecturally sound, well-scoped, and compliant with ADRs, but carries execution risks related to data quality, operator adoption, and the lack of explicit operational success metrics for the catalog's mission."

Open questions resolution :

  • Q1 (validator choice → @backstage/catalog-model dev-dep) : endorsed unanimously ; pin version, consider fallback for schema version mismatches.
  • Q2 (lenient schema strictness, console.warn on unknown) : endorsed unanimously.
  • Q3 (5 anchor services) : endorsed unanimously.
  • Q4 (client/server split) : endorsed unanimously.
  • Q5 (dep rendering as list, SVG deferred) : endorsed unanimously.
  • Q6 (owner field free-form vs enum) : dissent — 2 experts endorse free-form, 3 want validation. Resolved by combining : free-form in v1 + CI lint warning against a curated list of known OIDC groups (cvntrade-ops, cvntrade-ml, cvntrade-viewer, cvntrade-architect). Hard enum deferred to S04 RBAC closure where the OIDC group catalog is finalized.
  • Q7 (mkdocs nav under Runbooks) : endorsed unanimously.
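The Q6 resolution (free-form owner plus a non-blocking lint) could be sketched as follows. The four group names are the ones listed in the resolution ; the names KNOWN_OWNERS and lintOwner, and the wording of the warning, are assumptions.

```typescript
// Hypothetical owner-allowlist lint: warns, never fails the build.

const KNOWN_OWNERS: readonly string[] = [
  "cvntrade-ops",
  "cvntrade-ml",
  "cvntrade-viewer",
  "cvntrade-architect",
];

// Returns a warning message, or null when the owner is on the curated list.
function lintOwner(file: string, owner: string | undefined): string | null {
  if (owner === undefined || owner.trim() === "") {
    return `${file}: spec.owner is empty`;
  }
  if (!KNOWN_OWNERS.includes(owner)) {
    return `${file}: spec.owner "${owner}" is not in the curated owner list`;
  }
  return null;
}

// Collects warnings for a batch of files without affecting the exit code.
function lintOwners(entries: { file: string; owner?: string }[]): string[] {
  return entries
    .map((entry) => lintOwner(entry.file, entry.owner))
    .filter((message): message is string => message !== null);
}

// "cvntrade-opps" is a deliberate typo for the example.
const ownerWarnings = lintOwners([
  { file: "airflow.yaml", owner: "cvntrade-ops" },
  { file: "mlflow.yaml", owner: "cvntrade-opps" },
]);
console.log(ownerWarnings);
```

Returning messages rather than throwing keeps the lint advisory, which matches the "warning, not error" decision and leaves room to promote it to a hard enum in S04.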

13.1 Recommendations integrated pre-impl (locked into the plan)

| Reco | Source | Integration |
|---|---|---|
| #1 — Define operational success metrics (KPIs) | expert-ops + expert-data-scientist | New §13.2 below ; KPIs measured manually post-merge for first 90 days, automated dashboard deferred to a follow-up Story |
| #3 — Owner field CI lint against curated list | expert-ops + expert-ml-engineer + expert-data-scientist | console-next/scripts/validate-catalog.ts already exists per §4 phase 2 ; add owner-allowlist warning : if spec.owner ∉ {curated list} → emit warning (not error). Curated list lives in console-next/lib/catalog/owners.ts |
| #8 — Failure isolation for YAML parser | expert-architect | loadCatalog() wraps each file parse in a try/catch ; malformed file → log + skip + report in CI summary (not crash the whole build). Documented in runbook §"What happens if my YAML breaks" |
| #10 — Rollback playbook in OPERATIONS.md | expert-ops | Per ADR-68 substantial-FE-change requirement. Adds 1-paragraph rollback section to documentation/runbooks/catalog-add-service.md AND a 3-line entry in OPERATIONS.md §16 (catalog access section) |
| #11 — CI check for YAML diff alerts | expert-ops | Already covered by the portability test in §4 phase 2 ; strengthen : when CI detects modifications to documentation/catalog/*.yaml, post a short summary comment on the PR (filename + owner change + lifecycle change) for review attention. Reinforces ADR-78 I7 audit trail |
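Reco #8 (failure isolation) amounts to a per-file try/catch around the parse step. A minimal sketch, assuming the per-file parser is passed in as a function ; loadCatalogIsolated, formatCiSummary and the summary wording are hypothetical.

```typescript
// Hypothetical failure-isolated loader: one bad file is skipped, not fatal.

interface SkippedFile {
  file: string;
  error: string;
}

interface LoadOutcome<T> {
  loaded: T[];
  skipped: SkippedFile[];
}

function loadCatalogIsolated<T>(
  files: string[],
  parseFile: (file: string) => T,
): LoadOutcome<T> {
  const outcome: LoadOutcome<T> = { loaded: [], skipped: [] };
  for (const file of files) {
    try {
      outcome.loaded.push(parseFile(file));
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      outcome.skipped.push({ file, error: message });
    }
  }
  return outcome;
}

// Builds the text that would go into the CI summary.
function formatCiSummary(skipped: SkippedFile[]): string {
  if (skipped.length === 0) return "catalog: all files loaded";
  const lines = skipped.map((entry) => `  - ${entry.file}: ${entry.error}`);
  return [`catalog: ${skipped.length} file(s) skipped`, ...lines].join("\n");
}

// Stand-in parser for the example: fails on one file name.
const demoOutcome = loadCatalogIsolated(["airflow.yaml", "broken.yaml"], (file) => {
  if (file === "broken.yaml") throw new Error("malformed YAML");
  return { name: file.replace(/\.yaml$/, "") };
});
console.log(formatCiSummary(demoOutcome.skipped));
```

The trade-off is that a skipped service silently disappears from /catalog, so the CI summary is what surfaces the problem ; a stricter variant could still exit non-zero on pull requests while keeping production builds tolerant.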

13.2 KPIs (mission success criteria — measured post-merge)

The catalog module replaces "tribal knowledge + scattered Helm values" with "single browsable inventory". Falsifiability test for that mission :

| KPI | Target (90 days post-merge) | Measure |
|---|---|---|
| Catalog completeness | ≥ 12 services in documentation/catalog/ (currently committing 7 anchors ; expected +5 via ops PRs) | find documentation/catalog -maxdepth 1 \( -name '*.yaml' -o -name '*.yml' \) \| wc -l |
| Catalog freshness | 0 files with last commit > 90 days ago (matches FRESHNESS_THRESHOLD_DAYS in scripts/validate-catalog.ts) | per file git log -1 --format=%ct -- documentation/catalog/<file> (covers both .yaml and .yml) |
| Operator adoption | ≥ 5 distinct PR authors touched a catalog file | git log --pretty=%ae -- 'documentation/catalog/*.yaml' 'documentation/catalog/*.yml' \| sort -u \| wc -l |
| Mission validation | post-90-day operator survey : "Did you use /catalog at least once this week ?" — yes from ≥ 3 of the team | manual check, doc'd in OPERATIONS §16 |
| Negative falsification | if all 4 KPIs miss → catalog mission failed → revisit (either re-launch comms, simplify entry barrier, or sunset the module and write a post-mortem) | quarterly review |

These are not Story acceptance gates (the Story closes on the §10 acceptance list). They are mission gates evaluated 90 days post-merge to validate the IDP umbrella's first delivery actually works.
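The freshness KPI reduces to comparing each file's last-commit timestamp with a threshold. A minimal sketch, assuming the timestamps have already been collected per file with git log -1 --format=%ct (committer date in Unix seconds) ; findStaleFiles is a hypothetical name, and reusing the FRESHNESS_THRESHOLD_DAYS constant name here is for illustration, not a copy of the real script.

```typescript
// Hypothetical freshness check over pre-collected git timestamps.

const FRESHNESS_THRESHOLD_DAYS = 90;
const SECONDS_PER_DAY = 86_400;

// lastCommit maps file path → Unix seconds of its last commit (git %ct).
// A file is stale when its age is strictly greater than the threshold.
function findStaleFiles(
  lastCommit: Record<string, number>,
  nowSeconds: number,
  thresholdDays: number = FRESHNESS_THRESHOLD_DAYS,
): string[] {
  return Object.entries(lastCommit)
    .filter(([, committedAt]) => (nowSeconds - committedAt) / SECONDS_PER_DAY > thresholdDays)
    .map(([file]) => file)
    .sort();
}

// Fixed clock so the example is deterministic.
const fixedNow = 1_800_000_000;
const staleFiles = findStaleFiles(
  {
    "documentation/catalog/airflow.yaml": fixedNow - 10 * SECONDS_PER_DAY,
    "documentation/catalog/redis.yaml": fixedNow - 90 * SECONDS_PER_DAY,
    "documentation/catalog/s3.yaml": fixedNow - 91 * SECONDS_PER_DAY,
  },
  fixedNow,
);
console.log(staleFiles); // only s3.yaml is older than 90 days
```

Passing the clock in as a parameter keeps the function testable ; the CI step would call it with the current time and print the result as a warning, in line with the "warning only, not failure" decision for reco #2.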

13.3 Recommendations applied at impl time (in code)

| Reco | When |
|---|---|
| #2 — Data freshness CI check | Phase 2 (validate-catalog.ts) — adds a stale-warning step (warning only, not failure) |
| #5 — Runtime observability | Phase 3 — basic Next.js route timing logs (no full Prometheus integration ; that's S04) |
| #9 — Monitor @backstage/catalog-model dep size | Phase 2 — CI step measures pnpm why @backstage/catalog-model \| wc -l baseline, alerts if > 2× growth |

13.4 Recommendations deferred (out of scope for S02)

  • Reco #4 (Adoption strategy beyond runbook) — proactive comms / training is operator-led work, not impl. Tracked as a checkbox in the wp#96 closure comment.
  • Reco #6 (Proactive Backstage schema monitoring) — process not code. Documented in the runbook §"Pin & monitor" with a quarterly check. Real automation deferred to S04.
  • Reco #7 (Custom schema extension ADR) — when CVN actually needs an extension (not now). Filed as a follow-up note, no Story until the need surfaces.

13.5 Falsifiability gap closed

Pre-committee, the dossier had no explicit "how do we know the catalog mission succeeded" beyond the technical acceptance gates. §13.2 closes that gap with 4 measurable KPIs + a negative-falsification clause.


14. Linked context

  • IDP choice dossier — 2026-04-29-idp-choice-plan.md §3.1 + §8 + §9 (especially I2 portability)
  • Need CVN-N012 — wp#75 (deferral comment 203 explains §7 consolidation deferral)
  • Epic CVN-N012-EA — wp#77 (reframed 2026-04-30 to IDP modules on console-next)
  • Story CVN-N012-EA-S02 — wp#96
  • ADR-66 — UI Stack invariants
  • ADR-77 — MkDocs SSoT
  • ADR-78 (stub) — IDP framework choice + invariants I1-I8 (formal documentation/adr/0078-...md lands as a follow-up PR per CVN-N012-EA-S04)
  • Existing console-next CI — .github/workflows/console-next-ci.yml