Runbook — Adding a service to the catalog

Owner : cvntrade-ops
Status : production
Story : CVN-N012-EA-S02 (wp#96)
Linked module : console-next/lib/catalog/

This runbook covers the day-2 operation of adding (or updating) a service entry in the IDP catalog. Catalog content is read-only at runtime — every change goes through a git PR, which is itself the audit trail per ADR-78 stub invariant I7 (formal documentation/adr/0078-...md lands as a follow-up PR per CVN-N012-EA-S04).


1. Add a new service

1.1 Create the YAML file

Create documentation/catalog/<service-name>.yaml (one file per service, kebab-case name, must match metadata.name inside the file).

Template — copy and adjust :

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: <service-name>
  description: >-
    Short paragraph (one or two sentences). What does this service do, where
    does it run, who depends on it.
  tags:
    - <free-form keyword>
    - production
  links:
    - url: https://<service>.cvntrade.eu
      title: <service> UI
    - url: https://grafana.cvntrade.eu/d/<dashboard-uid>
      title: Grafana — <service>
spec:
  type: service          # service | library | website | documentation
  owner: cvntrade-ops    # OIDC group claim (see lib/catalog/owners.ts)
  lifecycle: production  # production | experimental | deprecated
  system: cvntrade-platform
  dependsOn:
    - <other catalog entry name>
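The naming rule from 1.1 (kebab-case file name that matches metadata.name) can be checked mechanically. The following is a minimal sketch, not part of console-next : the helper name checkCatalogFileName and the kebab-case pattern are assumptions made for illustration, and the real portability gate may enforce the rule differently.

```typescript
// Hypothetical helper, for illustration only. It checks the naming rule
// described in section 1.1 against a file name and the metadata.name value.
const KEBAB_CASE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function checkCatalogFileName(fileName: string, metadataName: string): string[] {
  const problems: string[] = [];

  // Strip the extension. Both .yaml and .yml are accepted here because the
  // KPI commands in section 8 count both; section 1.1 itself uses .yaml.
  const base = fileName.replace(/\.ya?ml$/, "");
  if (base === fileName) {
    problems.push(`${fileName}: expected a .yaml or .yml extension`);
  }
  if (!KEBAB_CASE.test(base)) {
    problems.push(`${fileName}: file name is not kebab-case`);
  }
  if (base !== metadataName) {
    problems.push(`${fileName}: file name does not match metadata.name "${metadataName}"`);
  }
  return problems;
}
```

An empty array means the name is acceptable ; for example, checkCatalogFileName("order-router.yaml", "order-router") returns no problems (order-router is a made-up service name).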

1.2 Validate locally

cd console-next
pnpm test:catalog-portability

This runs each YAML file through @backstage/catalog-model's canonical validator (the portability gate, ADR-78 I2) plus two non-fatal checks :

  • Owner allowlist : warning if spec.owner falls outside the curated KNOWN_OWNERS list in console-next/lib/catalog/owners.ts. Update that list when adding a new OIDC group.
  • Freshness : warning if a YAML file's last commit is more than 90 days old.

Errors → fix the YAML. Warnings → triage on the PR review.
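To make the two non-fatal checks concrete, here is a rough sketch of how they could be expressed. It is not the code in scripts/validate-catalog.ts : the CatalogFileInfo shape, the collectWarnings function and the single-entry KNOWN_OWNERS array are assumptions ; only the names KNOWN_OWNERS and FRESHNESS_THRESHOLD_DAYS and the 90-day threshold come from this runbook.

```typescript
// Sketch only: the real checks live in the console-next repository and may differ.
const FRESHNESS_THRESHOLD_DAYS = 90;
const KNOWN_OWNERS: readonly string[] = ["cvntrade-ops"]; // assumed shape of the list in owners.ts

interface CatalogFileInfo {
  path: string;
  owner: string;              // value of spec.owner
  lastCommitEpochSec: number; // e.g. output of `git log -1 --format=%ct -- <path>`
}

// Returns warnings, never errors: neither check fails the build.
function collectWarnings(file: CatalogFileInfo, nowEpochSec: number): string[] {
  const warnings: string[] = [];

  if (!KNOWN_OWNERS.includes(file.owner)) {
    warnings.push(`${file.path}: owner "${file.owner}" is not in KNOWN_OWNERS`);
  }

  const ageDays = (nowEpochSec - file.lastCommitEpochSec) / 86400;
  if (ageDays > FRESHNESS_THRESHOLD_DAYS) {
    warnings.push(`${file.path}: last commit is ${Math.floor(ageDays)} days old`);
  }
  return warnings;
}
```

A file last committed exactly 90 days ago is not flagged, which matches the "more than 90 days" wording above.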

1.3 PR

Branch convention : feat/catalog-<service-name> or chore/catalog-<service-name> for cross-cutting catalog touch-ups.

PR title : docs(catalog): add <service-name> (under 70 characters).

PR body must reference a Story or GH issue per STORY_WORKFLOW.md §5 rule 1.
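The branch and title conventions above are easy to get wrong by hand. As a hedged illustration, a pre-push check could look like the sketch below. Nothing in this runbook says such a check exists ; checkPrConventions, the regular expression, and the assumption that every title starts with "docs(catalog): " are all hypothetical.

```typescript
// Hypothetical pre-push check for the conventions in section 1.3.
const BRANCH_PATTERN = /^(feat|chore)\/catalog-[a-z0-9]+(?:-[a-z0-9]+)*$/;
const MAX_TITLE_LENGTH = 70; // titles must stay under this length

function checkPrConventions(branch: string, title: string): string[] {
  const problems: string[] = [];

  if (!BRANCH_PATTERN.test(branch)) {
    problems.push(
      `branch "${branch}" should look like feat/catalog-<service-name> or chore/catalog-<service-name>`,
    );
  }
  // Assumption: all catalog PR titles share the "docs(catalog): " prefix.
  if (!title.startsWith("docs(catalog): ")) {
    problems.push('title should start with "docs(catalog): "');
  }
  if (title.length >= MAX_TITLE_LENGTH) {
    problems.push(`title is ${title.length} characters; keep it under ${MAX_TITLE_LENGTH}`);
  }
  return problems;
}
```

The Story or GH issue reference in the PR body is left out on purpose : its exact format is defined in STORY_WORKFLOW.md, not here.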

CI gate : the Catalog portability step in console-next-ci.yml runs the same validator, so any drift between the local and CI results surfaces here.

CodeRabbit review handles the prose. The portability gate handles the schema. Operator review handles the truth of the entry (is the owner correct ? is the dependency real ?).


2. Update an existing service

Same pattern — edit the YAML in place, re-validate locally, PR. The git history of documentation/catalog/<name>.yaml is the per-service changelog.


3. Deprecate a service

Two phases :

  1. Flip spec.lifecycle: deprecated and add a links[] entry pointing to the replacement service or a deprecation note. Merge.
  2. After a sunset period (typically a quarter), run git rm documentation/catalog/<name>.yaml to remove the entry from the catalog. The detail page returns 404 from then on.

Don't delete the YAML before the sunset window ends ; operators may still reference it for context.
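Phase 1 amounts to two field changes on the entry. The sketch below shows them on an in-memory object ; the CatalogEntity and CatalogLink types are simplified assumptions covering only the fields used in the template from 1.1, and markDeprecated is a hypothetical name. In practice the edit is made by hand in the YAML file.

```typescript
// Simplified, assumed shapes: only the fields touched by the deprecation step.
interface CatalogLink {
  url: string;
  title: string;
}

interface CatalogEntity {
  metadata: { name: string; links?: CatalogLink[] };
  spec: { lifecycle: "production" | "experimental" | "deprecated" };
}

// Returns a new entity with lifecycle flipped and the notice link appended.
// The input object is left untouched.
function markDeprecated(entity: CatalogEntity, notice: CatalogLink): CatalogEntity {
  return {
    ...entity,
    metadata: {
      ...entity.metadata,
      links: [...(entity.metadata.links ?? []), notice],
    },
    spec: { ...entity.spec, lifecycle: "deprecated" },
  };
}
```

Existing links are preserved and the notice is appended last, so the replacement pointer does not displace the service's UI or Grafana links.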


4. What happens if my YAML is broken

Per committee 4bdcdd34 reco #8, parsing is failure-isolated :

  • A broken YAML file is logged + skipped at build time. The catalog page renders without it. The error message names the file + line number.
  • The CI portability test fails the build — broken YAML never lands in main.

If a YAML somehow lands broken (e.g., a hot-fix that bypassed CI) :

  • /catalog still renders (the file is skipped instead of crashing the route).
  • Build console emits [catalog] skipped <path>: <reason>.
  • Open a PR fixing the file ; portability gate re-runs.
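The failure-isolation behaviour described above (log, skip, keep rendering) can be sketched as a loop with a per-file try/catch. This is an illustration under assumptions, not the code in console-next/lib/catalog/ : loadCatalog and LoadResult are hypothetical names, and the parse argument stands in for whatever YAML parser and validator the real module uses. Only the log line format is taken from this runbook.

```typescript
interface LoadResult<T> {
  entries: T[];
  skipped: string[]; // one log line per file that failed to parse
}

// `parse` is injected so the sketch does not depend on a specific YAML library.
function loadCatalog<T>(
  files: ReadonlyArray<{ path: string; text: string }>,
  parse: (text: string) => T,
): LoadResult<T> {
  const result: LoadResult<T> = { entries: [], skipped: [] };

  for (const file of files) {
    try {
      result.entries.push(parse(file.text));
    } catch (err) {
      // One bad file must not take the whole catalog down: record and continue.
      const reason = err instanceof Error ? err.message : String(err);
      const line = `[catalog] skipped ${file.path}: ${reason}`;
      console.warn(line);
      result.skipped.push(line);
    }
  }
  return result;
}
```

Keeping the try/catch inside the loop is the design point : a failure in one file never prevents the remaining files from loading.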

5. Rollback playbook (per ADR-68)

The catalog module is read-only, so rollback is symmetric to deployment :

| Symptom | Action |
| --- | --- |
| /catalog route returns 500 | Last code change to console-next/lib/catalog/ or app/catalog/ ; revert that PR. |
| /catalog shows wrong data | Last YAML PR introduced a typo or stale dependency ; revert via git revert <sha> and merge. The service detail page reverts on the same merge. |
| Portability gate (CI) starts failing globally | A @backstage/catalog-model upgrade broke our subset ; pin the prior version in console-next/package.json and open a Story to migrate. |
| Catalog kill-switch triggered (per ADR-78 I6) | Operator can scale console-next to 0 replicas (kubectl scale deploy console-next --replicas=0) ; the catalog dies with the rest of the IDP. To kill just the catalog : git mv console-next/app/catalog console-next/app/catalog.disabled then pnpm build (Next.js skips routes outside app/) ; the /catalog route returns 404 in the next deploy. Restore by reversing the rename. Single-route surgical kill is intentional ; don't add a feature flag for it without an ADR. |

Full operator escalation : OPERATIONS.md §16 → IDP modules section.


6. Maintaining the curated OIDC owner allowlist

console-next/lib/catalog/owners.ts lists the OIDC groups we recognize. Before merging a catalog PR, check :

  • Has a new OIDC group landed in the cluster ? Add it to KNOWN_OWNERS.
  • Has an OIDC group been renamed ? Update both KNOWN_OWNERS and any spec.owner references in the catalog.

A hard enum (reject unknown owners as build errors, not warnings) lands with CVN-N012-EA-S04 when the OIDC group catalog is finalized in documentation/rbac/console-next-rbac.yaml.


7. Pin & monitor @backstage/catalog-model

The portability invariant (ADR-78 I2) depends on the canonical validator. Per committee reco #6 :

  • The version is pinned in console-next/package.json (currently 1.7.0).
  • Quarterly check : compare against the latest published version. Major bumps → open a Story to evaluate schema changes before merging the upgrade.
  • If Backstage ever changes the schema in a way that breaks our apiVersion: backstage.io/v1alpha1 subset, don't auto-migrate : the schema choice is a deliberate ADR-78 decision and the migration deserves its own dossier.
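For the quarterly check, the only decision that matters is whether the latest published version is a major bump relative to the pin. A minimal sketch of that comparison follows ; parseVersion and classifyBump are hypothetical helpers, and the parser handles plain MAJOR.MINOR.PATCH strings only (pre-release and build suffixes are ignored).

```typescript
// Parses "MAJOR.MINOR.PATCH". Anything after the patch number is ignored.
function parseVersion(v: string): [number, number, number] {
  const m = /^(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  if (!m) throw new Error(`not a semver version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

type Bump = "none" | "patch" | "minor" | "major";

// Classifies how far `latest` is ahead of `pinned`. An older or equal
// `latest` yields "none".
function classifyBump(pinned: string, latest: string): Bump {
  const [pMaj, pMin, pPat] = parseVersion(pinned);
  const [lMaj, lMin, lPat] = parseVersion(latest);
  if (lMaj > pMaj) return "major";
  if (lMaj === pMaj && lMin > pMin) return "minor";
  if (lMaj === pMaj && lMin === pMin && lPat > pPat) return "patch";
  return "none";
}
```

With the current pin of 1.7.0, a result of "major" is the case that calls for a Story before the upgrade is merged.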

8. KPIs (mission gates — measured 90 days post-merge)

Per dossier §13.2, the catalog mission "replace tribal knowledge" is falsified if all 4 KPIs miss at the 90-day review :

| KPI | Target | How to measure |
| --- | --- | --- |
| Catalog completeness | ≥ 12 services | find documentation/catalog -maxdepth 1 \( -name '*.yaml' -o -name '*.yml' \) \| wc -l |
| Catalog freshness | 0 files with last commit > 90 days (matches FRESHNESS_THRESHOLD_DAYS in scripts/validate-catalog.ts) | portability gate's freshness warning ; per file : git log -1 --format=%ct -- documentation/catalog/<name> (covers both .yaml and .yml) |
| Operator adoption | ≥ 5 distinct PR authors | git log --pretty=%ae -- 'documentation/catalog/*.yaml' 'documentation/catalog/*.yml' \| sort -u \| wc -l (quoted globs ; git handles expansion) |
| Mission validation | post-90d operator survey says yes from ≥ 3 of the team | manual check, recorded in OPERATIONS.md §16 |

If all 4 miss → revisit (relaunch comms, simplify entry barrier, or sunset the module with a post-mortem).
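The falsification rule is an "all four miss" test, not an "any miss" test, which is easy to misread. The sketch below encodes the four targets and the rule ; KpiMeasurements and evaluateKpis are hypothetical names, and the input numbers would come from the commands and the survey listed above.

```typescript
interface KpiMeasurements {
  serviceCount: number;      // catalog completeness
  staleFileCount: number;    // files with last commit > 90 days
  distinctPrAuthors: number; // operator adoption
  surveyYesCount: number;    // mission validation
}

function evaluateKpis(m: KpiMeasurements) {
  const met = {
    completeness: m.serviceCount >= 12,
    freshness: m.staleFileCount === 0,
    adoption: m.distinctPrAuthors >= 5,
    validation: m.surveyYesCount >= 3,
  };
  // The mission is falsified only when every KPI misses its target.
  const falsified = [met.completeness, met.freshness, met.adoption, met.validation]
    .every((ok) => !ok);
  return { met, falsified };
}
```

A single KPI on target is enough to keep the mission from being declared falsified, although the misses still deserve attention at the 90-day review.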


9. Linked context