A service catalog fails when it becomes a wiki with a prettier search box.

Situation

Platform engineering has made the service catalog a central object in the delivery system. Backstage popularized the idea that every service, API, library, resource, owner, and operational link should be discoverable from one place. Internal developer portals then extended that idea into scorecards, deployment views, incident context, onboarding workflows, software templates, and compliance evidence.

That shift is useful because modern systems are no longer understandable from source control alone. A production service is the intersection of a repository, a deployment pipeline, runtime infrastructure, ownership rules, on-call policy, observability, API contracts, data dependencies, and operational history.

The service catalog is the map engineers reach for when something breaks, when a team wants to reuse a capability, when a platform team wants to standardize production readiness, or when leadership asks which systems still depend on an old runtime.

The temptation is to put everything there.

The Problem

The catalog becomes unreliable when it stores information that changes faster than anyone is accountable for updating it. Engineers stop trusting it when service owners are stale, dashboards point nowhere, lifecycle state disagrees with deployment reality, or a page says a service is deprecated while traffic is still flowing through it.

The deeper issue is not documentation hygiene. It is source-of-truth confusion.

Some facts belong in the catalog because the catalog is the right authority. Other facts belong in CI, deployment systems, observability tools, cloud inventory, incident systems, API gateways, policy engines, or runtime control planes. If the catalog copies those facts, it becomes a cache. If it becomes a manually edited cache, it becomes fiction.

The question is not, “What can we display in the service catalog?”

The question is, “Which facts should the catalog own, and which facts should it resolve from systems that already own them?”

The Catalog Is a Control Surface, Not a Database

A good service catalog owns stable identity and stewardship. It links to volatile operational state. It should answer who owns a thing, what kind of thing it is, how it relates to other things, and which workflows apply to it. It should not pretend to be the deployment system, observability backend, asset inventory, CMDB, or incident database.

flowchart TD
  A[service catalog — identity and ownership] --> B[repository — source metadata]
  A --> C[ci system — build metadata]
  A --> D[deployment platform — release state]
  A --> E[observability — runtime signals]
  A --> F[incident system — operational history]
  A --> G[policy engine — readiness checks]

  B -->|publishes| A
  C -->|reports| A
  D -->|reports| A
  E -->|links| A
  F -->|links| A
  G -->|evaluates| A

What belongs in the catalog:

  • Service identity: canonical name, description, type, lifecycle, tier, domain, and system grouping.
  • Ownership: accountable team, escalation path, on-call rotation link, Slack or mailing list, and technical owner.
  • Relationships: upstreams, downstreams, APIs consumed, APIs provided, data dependencies, and shared resources.
  • Entry points: repository, runbook, dashboard, logs, traces, alerts, deployment page, incident queue, and API documentation.
  • Standards metadata: production readiness status, dependency freshness, ownership completeness, documentation coverage, and policy exceptions.
  • Workflow hooks: create service, request access, register API, rotate secret, deprecate service, start incident review, and archive component.

What does not belong as manually maintained catalog data:

  • Current deployment version.
  • Live health state.
  • Request rate, latency, error rate, or saturation.
  • Active incidents.
  • Cloud resources discovered from runtime inventory.
  • Vulnerability findings copied from scanners.
  • CI status copied from build tools.
  • Access control state copied from identity providers.
  • Cost numbers copied from billing systems.

Those facts may well belong on the catalog page. They should be resolved, embedded, or linked from the authoritative system at read time, not typed into the catalog by hand.

The architectural rule is simple: the catalog should own nouns and relationships; other systems should own fast-changing facts.
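One way to picture that rule is a minimal Python sketch in which the entity stores only durable fields and holds resolver functions for everything volatile. The `CatalogEntity` class, the `resolvers` field, and `fake_deployed_version` are hypothetical names invented for illustration; a real portal would call its deployment platform's API instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CatalogEntity:
    # Durable facts the catalog owns and reviews in code.
    name: str
    kind: str
    owner: str
    lifecycle: str
    depends_on: List[str] = field(default_factory=list)
    # Volatile facts are never stored. Each key maps to a function that
    # asks the authoritative system when the page is rendered.
    resolvers: Dict[str, Callable[[str], object]] = field(default_factory=dict)

    def render(self) -> dict:
        """Build a page view: owned fields plus freshly resolved facts."""
        page = {
            "name": self.name,
            "kind": self.kind,
            "owner": self.owner,
            "lifecycle": self.lifecycle,
            "depends_on": list(self.depends_on),
        }
        for fact, resolve in self.resolvers.items():
            page[fact] = resolve(self.name)
        return page


def fake_deployed_version(service: str) -> str:
    # Hypothetical stand-in for a deployment platform client.
    return {"payments": "2024.06.1"}.get(service, "unknown")


payments = CatalogEntity(
    name="payments",
    kind="service",
    owner="team-payments",
    lifecycle="production",
    depends_on=["ledger"],
    resolvers={"deployed_version": fake_deployed_version},
)
page = payments.render()
```

The deployed version appears on the rendered page but never in the stored entity, so there is nothing for a human to forget to update.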

In Practice

Context: Spotify’s Backstage model treats the catalog as a graph of entities such as components, APIs, resources, systems, domains, groups, and users. The documented pattern is that each entity carries metadata and a spec, including ownership and lifecycle fields, while integrations surface information from tools around the entity.

Action: Use that pattern to make owner, system, lifecycle, and type first-class catalog fields. Then attach tool-specific state through plugins or resolvers instead of pasting values into YAML.

Result: The catalog remains stable enough to be reviewed in code, while CI, deployment, observability, and security systems continue to publish the volatile facts they already know.

Learning: A catalog entity should be durable. A dashboard panel, alert state, deployment version, or vulnerability count should be fetched from the system that produces it.
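A small lint can help keep entities durable. The sketch below represents a Backstage-style entity as a Python dict (the `apiVersion`, `kind`, `metadata`, and `spec` layout follows the Backstage descriptor format) and flags keys that look like pasted operational state. The `VOLATILE_KEYS` set and the `deployedVersion` field are assumptions made for this example, not part of any Backstage schema.

```python
# Keys that describe observed state rather than identity.
# This list is hypothetical; tune it to your own conventions.
VOLATILE_KEYS = {
    "version", "deployedVersion", "health",
    "errorRate", "vulnerabilityCount", "ciStatus",
}


def find_volatile_keys(node, path=""):
    """Return dotted paths of keys that look like pasted operational state."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if key in VOLATILE_KEYS:
                hits.append(here)
            hits.extend(find_volatile_keys(value, here))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            hits.extend(find_volatile_keys(item, f"{path}[{i}]"))
    return hits


entity = {
    "apiVersion": "backstage.io/v1alpha1",
    "kind": "Component",
    "metadata": {"name": "payments", "description": "Card payment API"},
    "spec": {
        "type": "service",
        "lifecycle": "production",
        "owner": "team-payments",
        "system": "checkout",
        "deployedVersion": "2024.06.1",  # pasted by hand; should be resolved
    },
}

problems = find_volatile_keys(entity)
```

Run in CI against catalog definitions, a check like this turns "do not paste values into YAML" from a convention into a failing build.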

Context: Kubernetes demonstrates the difference between identity metadata and runtime state. Labels identify objects and enable selection, annotations carry non-identifying metadata for tools and integrations, and status is written by controllers. The documented system behavior is that controllers continuously reconcile observed state toward the desired state declared in the spec.

Action: Apply the same boundary to service catalogs. Put durable service metadata in catalog definitions. Let controllers, scanners, and platform integrations report current state.

Result: The catalog can drive automation without becoming responsible for every operational fact. It can say which services must meet a policy, while the policy engine decides whether they currently pass.

Learning: If a value changes because a controller, deployer, scanner, or monitor observed something, the catalog should reference that source rather than own the value.
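The same split can be sketched in a few lines of Python: the catalog record declares which policies apply, and a separate evaluation step compares that declaration with observed facts. `ServiceRecord`, `OBSERVED`, the policy names, and `evaluate` are all hypothetical; a real setup would query a policy engine and scanners rather than a dict.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ServiceRecord:
    # Catalog side: durable declaration of what must hold for this service.
    name: str
    tier: int
    required_policies: Tuple[str, ...]


# Hypothetical observations reported by scanners and platform integrations.
OBSERVED: Dict[Tuple[str, str], bool] = {
    ("payments", "has-runbook"): True,
    ("payments", "slo-defined"): False,
}


def evaluate(record: ServiceRecord, observed: Dict[Tuple[str, str], bool]) -> Dict[str, bool]:
    """Policy engine side: decide pass or fail from observed state.

    A policy with no observation is treated as failing.
    """
    return {
        policy: observed.get((record.name, policy), False)
        for policy in record.required_policies
    }


payments = ServiceRecord("payments", 1, ("has-runbook", "slo-defined"))
results = evaluate(payments, OBSERVED)
failing = sorted(policy for policy, ok in results.items() if not ok)
```

Nothing in `ServiceRecord` changes when a check starts passing; only the observation does, which is the boundary the Kubernetes example illustrates.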

Context: OpenAPI and AsyncAPI specifications provide documented contract formats for HTTP and event-driven interfaces. They are better authorities for operation names, schemas, payloads, and compatibility rules than a manually written catalog summary.

Action: Register the API in the catalog, link it to the owning service, and attach the actual contract from the API specification repository or registry.

Result: Engineers can discover the API through the catalog while contract validation remains tied to the artifact used by producers and consumers.

Learning: The catalog should explain that an API exists, who owns it, and how it fits into the system. The API specification should define the contract.
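As a rough illustration, the catalog entry below holds only identity, ownership, and a pointer to the contract, while the operation list is derived from the OpenAPI document itself. The `api_entity` shape and the `definition_ref` path are assumptions for this sketch; the `paths` and `operationId` structure comes from the OpenAPI specification.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(openapi_doc: dict) -> list:
    """Derive (METHOD, path, operationId) tuples from an OpenAPI document."""
    operations = []
    for path, item in openapi_doc.get("paths", {}).items():
        for key, value in item.items():
            # Path items may also hold non-method keys such as "parameters".
            if key in HTTP_METHODS:
                operations.append((key.upper(), path, value.get("operationId")))
    return sorted(operations)


# Hypothetical catalog entry: it names and locates the API, nothing more.
api_entity = {
    "name": "payments-api",
    "owner": "team-payments",
    "provided_by": "payments",
    "definition_ref": "specs/payments.openapi.yaml",
}

# Inline stand-in for the document that definition_ref would point to.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Payments API", "version": "1.0.0"},
    "paths": {
        "/charges": {
            "post": {
                "operationId": "createCharge",
                "responses": {"201": {"description": "Created"}},
            }
        },
        "/charges/{id}": {
            "parameters": [
                {"name": "id", "in": "path", "required": True,
                 "schema": {"type": "string"}}
            ],
            "get": {
                "operationId": "getCharge",
                "responses": {"200": {"description": "OK"}},
            },
        },
    },
}

operations = list_operations(spec)
```

Because the operations are read from the specification, the catalog page cannot drift from the contract that producers and consumers validate against.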

Where It Breaks

| Failure mode | What caused it | Better boundary |
| --- | --- | --- |
| Stale ownership | Team names are edited by hand and never reconciled | Sync owners from identity or team registry, then require catalog references |
| Fake health | Catalog stores manual status fields like healthy or degraded | Pull health from observability or deployment systems |
| Broken scorecards | Readiness checks depend on optional links and human updates | Compute checks from repositories, pipelines, alerts, and policy results |
| Catalog sprawl | Every repository becomes a service | Model libraries, jobs, APIs, resources, and services as different entity types |
| Compliance theater | Exceptions live in comments or wiki pages | Store exception metadata with owner, expiry, approver, and policy reference |
| Unclear authority | Catalog duplicates CMDB, cloud inventory, and monitoring data | Catalog owns identity and relationships, integrations own operational state |
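The compliance theater row is easy to make concrete. A policy exception stored as structured metadata can be checked for expiry automatically, which a wiki comment cannot. The sketch below is illustrative only: the `PolicyException` fields mirror the ones named in the table, and the services, policies, and dates are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class PolicyException:
    service: str
    policy: str      # reference to the policy being waived
    owner: str       # team accountable for closing the gap
    approver: str
    expires: date

    def is_active(self, today: date) -> bool:
        # The exception is valid up to and including its expiry date.
        return today <= self.expires


def expired(exceptions: List[PolicyException], today: date) -> List[PolicyException]:
    """Return exceptions that have lapsed and need renewal or removal."""
    return [e for e in exceptions if not e.is_active(today)]


# Hypothetical sample data.
exceptions = [
    PolicyException("payments", "slo-defined", "team-payments",
                    "platform-lead", date(2025, 3, 1)),
    PolicyException("legacy-billing", "supported-runtime", "team-billing",
                    "platform-lead", date(2024, 12, 31)),
]

overdue = expired(exceptions, today=date(2025, 1, 15))
```

A scheduled job that reports `overdue` to the owning team keeps exceptions temporary by default.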

A service catalog also breaks when every entry is treated equally. A batch job, shared library, customer-facing API, data pipeline, and production service have different operational responsibilities. If the catalog forces them into one shape, it either becomes too vague for production use or too heavy for lightweight components.

The catalog should support different entity types with different required fields. A tier-one customer service may require on-call, SLOs, runbooks, dashboards, dependency declarations, and incident review links. A library may require owner, repository, release process, language, dependency policy, and consumers. A deprecated system may require migration owner, target retirement date, replacement path, and known consumers.
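Type-specific requirements can be expressed as data and checked mechanically. The following is a minimal sketch: the field names in `REQUIRED_FIELDS` are hypothetical snake_case versions of the examples above, and the fallback of requiring only `owner` for unknown types is an assumption.

```python
REQUIRED_FIELDS = {
    "service": {"owner", "on_call", "slos", "runbook", "dashboard", "dependencies"},
    "library": {"owner", "repository", "release_process", "language"},
    "deprecated": {"migration_owner", "retirement_date", "replacement", "consumers"},
}


def missing_fields(entity: dict) -> list:
    """Return required fields that are absent or empty for the entity's type."""
    # Unknown types fall back to requiring only an owner (an assumption).
    required = REQUIRED_FIELDS.get(entity.get("type"), {"owner"})
    return sorted(f for f in required if entity.get(f) in (None, ""))


library = {
    "name": "retry-utils",
    "type": "library",
    "owner": "team-platform",
    "repository": "git.example.com/platform/retry-utils",
}

gaps = missing_fields(library)
```

An empty list such as `consumers: []` counts as present here, since "no known consumers" is a legitimate answer for a deprecated system.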

The catalog is most valuable when it makes those expectations explicit.

What to Do Next

  • Problem: Your catalog probably mixes durable ownership metadata with fast-changing operational state.
  • Solution: Define the catalog as the authority for identity, ownership, lifecycle, relationships, and workflow entry points.
  • Proof: Check whether deployment versions, health, vulnerabilities, costs, incidents, and CI results are copied by hand. If they are, move them behind integrations.
  • Action: Start with a small schema: name, type, owner, lifecycle, system, repository, runbook, dashboard, on-call, APIs, dependencies, and policy status. Then enforce freshness through automation instead of reminders.
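As one example of freshness enforced by automation, the sketch below reconciles catalog owners against a team registry and reports entries whose owner no longer exists. The `known_teams` set stands in for a lookup against an identity provider or team registry, and all names are hypothetical.

```python
def stale_owners(entities: list, known_teams: set) -> list:
    """Return names of entities whose owner is missing or not a known team."""
    return sorted(
        e["name"] for e in entities if e.get("owner") not in known_teams
    )


# Hypothetical snapshot of the team registry.
known_teams = {"team-payments", "team-platform"}

# Hypothetical catalog entries using a subset of the small schema.
entities = [
    {"name": "payments", "type": "service", "owner": "team-payments",
     "lifecycle": "production"},
    {"name": "invoice-export", "type": "job", "owner": "team-billing",
     "lifecycle": "production"},
    {"name": "retry-utils", "type": "library", "lifecycle": "production"},
]

stale = stale_owners(entities, known_teams)
```

Running a check like this on a schedule, and failing catalog pull requests that introduce unknown owners, replaces periodic reminder emails with a signal that cannot be ignored.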