Runbook — Adding a service to the catalog
Owner : cvntrade-ops
Status : production
Story : CVN-N012-EA-S02 (wp#96)
Linked module : console-next/lib/catalog/
This runbook covers the day-2 operation of adding (or updating) a service entry in the IDP catalog. Catalog content is read-only at runtime — every change goes through a git PR, which is itself the audit trail per ADR-78 stub invariant I7 (formal documentation/adr/0078-...md lands as a follow-up PR per CVN-N012-EA-S04).
1. Add a new service
1.1 Create the YAML file
Create documentation/catalog/<service-name>.yaml (one file per service, kebab-case name, must match metadata.name inside the file).
Template — copy and adjust :
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: <service-name>
  description: >-
    Short paragraph (one or two sentences). What does this service do, where
    does it run, who depends on it.
  tags:
    - <free-form keyword>
    - production
  links:
    - url: https://<service>.cvntrade.eu
      title: <service> UI
    - url: https://grafana.cvntrade.eu/d/<dashboard-uid>
      title: Grafana — <service>
spec:
  type: service            # service | library | website | documentation
  owner: cvntrade-ops      # OIDC group claim (see lib/catalog/owners.ts)
  lifecycle: production    # production | experimental | deprecated
  system: cvntrade-platform
  dependsOn:
    - <other catalog entry name>
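The naming rule from 1.1 (one kebab-case file per service, file name equal to metadata.name) can be checked mechanically. The sketch below is illustrative only: `checkCatalogFileName` and the `KEBAB_CASE` pattern are hypothetical names, not part of console-next, and the exact kebab-case definition is an assumption.

```typescript
// Hypothetical helper, not part of console-next: checks the 1.1 naming rule.
// Assumption: kebab-case means lowercase alphanumerics separated by single hyphens.
const KEBAB_CASE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function checkCatalogFileName(filePath: string, metadataName: string): string[] {
  const problems: string[] = [];
  const base = filePath.split("/").pop() ?? "";
  // Strip the extension; both .yaml and .yml are accepted here.
  const stem = base.replace(/\.ya?ml$/, "");
  if (stem === base) {
    problems.push(`${base}: expected a .yaml or .yml extension`);
  }
  if (!KEBAB_CASE.test(stem)) {
    problems.push(`${stem}: file name is not kebab-case`);
  }
  if (stem !== metadataName) {
    problems.push(`${stem}: file name does not match metadata.name "${metadataName}"`);
  }
  return problems;
}
```

An empty array means the file name satisfies the rule; each string describes one violation.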
1.2 Validate locally
The validation script (scripts/validate-catalog.ts) runs each YAML file through @backstage/catalog-model's canonical validator (the portability gate, ADR-78 I2) plus two non-fatal checks :
- Owner allowlist : warning if spec.owner falls outside the curated KNOWN_OWNERS list in console-next/lib/catalog/owners.ts. Update that list when adding a new OIDC group.
- Freshness : warning if a YAML file's last commit is more than 90 days old.
Errors → fix the YAML. Warnings → triage on the PR review.
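The two non-fatal checks can be pictured as small pure functions that return a warning string or nothing. This is a minimal sketch under assumptions, not the actual contents of the validation script: `ownerWarning` and `freshnessWarning` are hypothetical names, and the `KNOWN_OWNERS` array here is a one-element stand-in for the real list in console-next/lib/catalog/owners.ts.

```typescript
// Threshold named in the KPI table of this runbook.
const FRESHNESS_THRESHOLD_DAYS = 90;
const SECONDS_PER_DAY = 86400;

// Illustrative stand-in for the curated list in console-next/lib/catalog/owners.ts.
const KNOWN_OWNERS: readonly string[] = ["cvntrade-ops"];

// Returns a warning when spec.owner is outside the allowlist, otherwise null.
function ownerWarning(owner: string, known: readonly string[] = KNOWN_OWNERS): string | null {
  return known.includes(owner) ? null : `owner "${owner}" is not in KNOWN_OWNERS`;
}

// Returns a warning when the last commit is more than 90 days old, otherwise null.
// Both arguments are Unix timestamps in seconds (the unit of git's %ct format).
function freshnessWarning(lastCommitEpochSec: number, nowEpochSec: number): string | null {
  const ageSec = nowEpochSec - lastCommitEpochSec;
  if (ageSec <= FRESHNESS_THRESHOLD_DAYS * SECONDS_PER_DAY) {
    return null;
  }
  const ageDays = Math.floor(ageSec / SECONDS_PER_DAY);
  return `last commit is ${ageDays} days old (threshold ${FRESHNESS_THRESHOLD_DAYS})`;
}
```

Returning null instead of throwing mirrors the runbook's split: warnings are collected for PR triage, while only schema errors block the build.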
1.3 PR
Branch convention : feat/catalog-<service-name> or chore/catalog-<service-name> for cross-cutting catalog touch-ups.
PR title : docs(catalog): add <service-name> (under 70 characters).
PR body must reference a Story or GH issue per STORY_WORKFLOW.md §5 rule 1.
CI gate : the Catalog portability step in console-next-ci.yml runs the same validator — drift between local + CI surfaces here.
CodeRabbit review handles the prose. The portability gate handles the schema. Operator review handles the truth of the entry (is the owner correct ? is the dependency real ?).
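The branch and title conventions above are simple enough to lint before opening the PR. The helper below is a hypothetical sketch (`checkPrConventions` does not exist in the repository); it assumes every catalog PR title starts with `docs(catalog): ` and treats "under 70 characters" as a strict limit.

```typescript
// Hypothetical pre-PR lint for the 1.3 conventions; not part of the repository.
function checkPrConventions(branch: string, title: string): string[] {
  const problems: string[] = [];
  // feat/catalog-<service-name> or chore/catalog-<service-name>, kebab-case name.
  if (!/^(feat|chore)\/catalog-[a-z0-9]+(-[a-z0-9]+)*$/.test(branch)) {
    problems.push(`branch "${branch}" does not match feat/catalog-<name> or chore/catalog-<name>`);
  }
  // Assumption: all catalog PR titles share the docs(catalog) prefix.
  if (!title.startsWith("docs(catalog): ")) {
    problems.push(`title "${title}" does not start with "docs(catalog): "`);
  }
  if (title.length >= 70) {
    problems.push(`title is ${title.length} characters; keep it under 70`);
  }
  return problems;
}
```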
2. Update an existing service
Same pattern — edit the YAML in place, re-validate locally, PR. The git history of documentation/catalog/<name>.yaml is the per-service changelog.
3. Deprecate a service
Two phases :
- Flip spec.lifecycle: deprecated and add a links[] entry pointing to the replacement service or a deprecation note. Merge.
- After a sunset period (typically a quarter), git rm documentation/catalog/<name>.yaml to remove from the catalog. Detail page returns 404 from then on.
Don't delete the YAML before the sunset window — operators may still reference it for context.
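As a data transformation, phase one amounts to two changes on the entry: the lifecycle value and one extra link. The sketch below shows that shape only; in practice the edit is made by hand in the YAML file. `CatalogEntity`, `CatalogLink`, and `markDeprecated` are hypothetical names covering just the fields this runbook mentions.

```typescript
// Hypothetical, trimmed-down types covering only the fields used in this runbook.
interface CatalogLink {
  url: string;
  title: string;
}

interface CatalogEntity {
  metadata: { name: string; links?: CatalogLink[] };
  spec: { lifecycle: "production" | "experimental" | "deprecated"; owner: string };
}

// Returns a copy of the entry with lifecycle flipped and the replacement link appended.
// The input is left untouched.
function markDeprecated(entity: CatalogEntity, replacement: CatalogLink): CatalogEntity {
  return {
    ...entity,
    metadata: {
      ...entity.metadata,
      links: [...(entity.metadata.links ?? []), replacement],
    },
    spec: { ...entity.spec, lifecycle: "deprecated" },
  };
}
```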
4. What happens if my YAML is broken
Per committee 4bdcdd34 reco #8, parsing is failure-isolated :
- A broken YAML file is logged + skipped at build time. The catalog page renders without it. The error message names the file + line number.
- The CI portability test fails the build — broken YAML never lands in main.
If a YAML somehow lands broken (e.g., a hot-fix that bypassed CI) :
- /catalog still renders (the file is skipped, not crashing the route).
- Build console emits [catalog] skipped <path>: <reason>.
- Open a PR fixing the file ; portability gate re-runs.
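Failure isolation of this kind usually comes down to one try/catch per file instead of one around the whole load. The sketch below illustrates the pattern and is not the actual loader in console-next/lib/catalog/: `loadCatalog`, `Parser`, and `LoadResult` are hypothetical names, and the parser is injected so the example has no YAML dependency.

```typescript
// Any function that turns file text into an entry, throwing on malformed input.
type Parser<T> = (text: string) => T;

interface LoadResult<T> {
  entries: T[];
  skipped: string[];
}

// Parses every file independently: a failure is logged and skipped,
// and the remaining files still load.
function loadCatalog<T>(
  files: Record<string, string>, // path -> file contents
  parse: Parser<T>,
  log: (message: string) => void = console.warn,
): LoadResult<T> {
  const entries: T[] = [];
  const skipped: string[] = [];
  for (const [path, text] of Object.entries(files)) {
    try {
      entries.push(parse(text));
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      // Same message shape as the build console output described above.
      log(`[catalog] skipped ${path}: ${reason}`);
      skipped.push(path);
    }
  }
  return { entries, skipped };
}
```

Because the catch sits inside the loop, one malformed file costs only its own entry, which is why the route keeps rendering.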
5. Rollback playbook (per ADR-68)
The catalog module is read-only, so rollback is symmetric to deployment :
| Symptom | Action |
|---|---|
| /catalog route returns 500 | Last code change to console-next/lib/catalog/ or app/catalog/ ; revert that PR. |
| /catalog shows wrong data | Last YAML PR introduced a typo or stale dependency ; revert via git revert <sha> and merge. The service detail page reverts on the same merge. |
| Portability gate (CI) starts failing globally | A @backstage/catalog-model upgrade broke our subset ; pin the prior version in console-next/package.json and open a Story to migrate. |
| Catalog kill-switch triggered (per ADR-78 I6) | Operator can scale console-next to 0 replicas (kubectl scale deploy console-next --replicas=0) ; the catalog dies with the rest of the IDP. To kill just the catalog : git mv console-next/app/catalog console-next/app/catalog.disabled then pnpm build (Next.js skips routes outside app/) — the /catalog route returns 404 in the next deploy. Restore by reversing the rename. Single-route surgical kill is intentional ; don't add a feature flag for it without an ADR. |
Full operator escalation : OPERATIONS.md §16 → IDP modules section.
6. Maintaining the curated OIDC owner allowlist
console-next/lib/catalog/owners.ts lists the OIDC groups we recognize. Pre-merge :
- New OIDC group landed in the cluster ? Add it to KNOWN_OWNERS.
- An OIDC group renamed ? Update both the source list and any spec.owner references in the catalog.
A hard enum (reject unknown owners as build errors, not warnings) lands with CVN-N012-EA-S04 when the OIDC group catalog is finalized in documentation/rbac/console-next-rbac.yaml.
7. Pin & monitor @backstage/catalog-model
The portability invariant (ADR-78 I2) depends on the canonical validator. Per committee reco #6 :
- The version is pinned in console-next/package.json (currently 1.7.0).
- Quarterly check : compare against the latest published version. Major bumps → open a Story to evaluate schema changes before merging the upgrade.
- If Backstage ever changes the schema in a way that breaks our apiVersion: backstage.io/v1alpha1 subset, don't auto-migrate : the schema choice is a deliberate ADR-78 decision and the migration deserves its own dossier.
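The quarterly check reduces to comparing the major component of the pinned version with that of the latest published one. A minimal sketch, assuming plain `major.minor.patch` version strings; `majorOf` and `isMajorBump` are hypothetical helpers, and how the latest version is fetched is left out.

```typescript
// Extracts the major component from a "major.minor.patch" version string.
function majorOf(version: string): number {
  const match = /^(\d+)\.\d+\.\d+/.exec(version);
  if (match === null) {
    throw new Error(`not a semver version: ${version}`);
  }
  return Number(match[1]);
}

// True when moving from `pinned` to `latest` crosses a major version boundary,
// which per this runbook calls for a Story before the upgrade is merged.
function isMajorBump(pinned: string, latest: string): boolean {
  return majorOf(latest) > majorOf(pinned);
}
```

With the currently pinned 1.7.0, any 1.x release returns false and any 2.x release returns true.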
8. KPIs (mission gates — measured 90 days post-merge)
Per dossier §13.2, the catalog mission "replace tribal knowledge" is falsified if all 4 KPIs miss at the 90-day review :
| KPI | Target | How to measure |
|---|---|---|
| Catalog completeness | ≥ 12 services | find documentation/catalog -maxdepth 1 \( -name '*.yaml' -o -name '*.yml' \) \| wc -l |
| Catalog freshness | 0 files with last commit > 90 days (matches FRESHNESS_THRESHOLD_DAYS in scripts/validate-catalog.ts) | portability gate's freshness warning ; per file git log -1 --format=%ct -- documentation/catalog/<name> (covers both .yaml and .yml) |
| Operator adoption | ≥ 5 distinct PR authors | git log --pretty=%ae -- 'documentation/catalog/*.yaml' 'documentation/catalog/*.yml' \| sort -u \| wc -l (quoted globs — git handles expansion) |
| Mission validation | post-90d operator survey says yes from ≥ 3 of the team | manual check, recorded in OPERATIONS.md §16 |
If all 4 miss → revisit (relaunch comms, simplify entry barrier, or sunset the module with a post-mortem).
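The falsification rule is stricter than it may first read: the mission counts as falsified only when every one of the four KPIs misses, so a single met target keeps it alive. The sketch below encodes the targets from the table; `KpiReadings`, `kpiResults`, and `missionFalsified` are hypothetical names, and the readings are supplied manually from the measurements described in the table.

```typescript
// Measured values at the 90-day review, gathered as described in the KPI table.
interface KpiReadings {
  services: number;        // catalog files counted
  staleFiles: number;      // files whose last commit is more than 90 days old
  distinctAuthors: number; // distinct PR authors
  surveyYes: number;       // operators answering yes in the survey
}

// Evaluates each KPI against its target from the table.
function kpiResults(r: KpiReadings): Record<string, boolean> {
  return {
    completeness: r.services >= 12,
    freshness: r.staleFiles === 0,
    adoption: r.distinctAuthors >= 5,
    validation: r.surveyYes >= 3,
  };
}

// The mission is falsified only if all four KPIs miss.
function missionFalsified(r: KpiReadings): boolean {
  return Object.values(kpiResults(r)).every((met) => !met);
}
```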
9. Linked context
- Plan dossier — full impl plan + committee 4bdcdd34 triage
- IDP choice dossier — why console-next over Backstage
- ADR-66 — Storybook + a11y + DTCG tokens
- ADR-77 — MkDocs as docs SSoT
- ADR-78 stub — IDP framework + I1-I8 invariants (formal ADR-78 file lands as a follow-up PR)
- Source code : console-next/lib/catalog/ + console-next/app/catalog/