A service catalog is not a directory of teams and repositories; it is the control plane schema for how engineering work becomes operable.

Situation

Platform engineering has moved a large part of operational knowledge out of people’s heads and into automation. CI/CD systems decide what to build. Deployment systems decide where it runs. Incident tooling decides who gets paged. Cost systems decide what to allocate. Security systems decide which controls apply.

All of those workflows need the same facts: what the service is, who owns it, what system it belongs to, what infrastructure it depends on, and what depends on it.

Without a shared model, every tool invents its own partial catalog. GitHub knows repositories. Kubernetes knows workloads. Terraform knows cloud resources. PagerDuty knows escalation policies. Datadog knows telemetry. None of them, alone, knows the product boundary.

That is the gap a service catalog fills.

The Problem

The failure mode is not that teams lack metadata. They usually have too much metadata, scattered across YAML files, spreadsheets, Terraform state, CI variables, dashboards, runbooks, and chat channels.

The problem is that the metadata does not compose.

A repository might record an owner but not the runtime service it builds. A Kubernetes deployment might expose labels but not the business system it belongs to. A cloud database might carry tags but not the service that consumes it. An on-call rotation might know who responds but not which dependencies determine blast radius.

When automation tries to act on this fragmented state, it becomes either brittle or dangerously broad. A deployment gate cannot know whether a missing test is critical. A security scanner cannot route findings to the right group. A migration tool cannot determine downstream impact. A cost report cannot distinguish shared platform spend from product service spend.

The core question is: what data model lets a service catalog become a trustworthy substrate for automation instead of another manually maintained wiki?

The Answer Is a Typed Ownership Graph

A service catalog should model the engineering estate as a typed graph. The important entities are services, systems, resources, owners, and dependencies. The important design choice is to keep those entities distinct.

flowchart TD
    SVC[Service — deployable capability] --> SYS[System — product boundary]
    SVC --> OWNER[Owner — accountable group]
    SVC --> REPO[Repository — source location]
    SVC --> API[API — contract surface]
    SVC --> RES[Resource — runtime dependency]
    SVC --> DEP[Dependency — upstream service]
    DEP --> DEPOWNER[Owner — upstream accountable group]
    RES --> CLOUD[Cloud asset — database queue bucket]
    SYS --> SYSOWNER[Owner — system accountability]

A service is a deployable or independently operable capability. It may be an HTTP API, worker, scheduled job, stream processor, or internal platform component. The catalog should not define a service as “one repository” or “one Kubernetes deployment.” Those mappings are useful, but they are implementation details.

A system is the product or platform boundary that groups services into a coherent operational domain. Systems answer questions like “what is the payments platform?” or “what belongs to the developer productivity surface?” They are essential for portfolio views, architecture review, and ownership escalation.

A resource is infrastructure or managed state consumed by a service: databases, queues, buckets, caches, topics, secrets, certificates, and cloud accounts. Resources need identity because they frequently outlive deployments and often carry the highest operational risk.

An owner is the accountable group for decisions and response. Ownership should point to a team or group, not a single person, because people change roles. The catalog can still record individuals for human readers, but automation should route through durable groups.

A dependency is a typed relationship between entities. A service can consume another service, publish an API, own a resource, read from a topic, write to a database, or belong to a system. The dependency edge should carry meaning. A generic “related to” link is not enough for automation.

The minimum viable model looks like this:

service:
  id: checkout-api
  name: Checkout API
  system: commerce-platform
  owner: payments-platform
  lifecycle: production
  repository: github.com/example/checkout-api
  dependencies:
    - type: consumes
      target: pricing-api
    - type: writes
      target: checkout-orders-db
    - type: publishes
      target: checkout-events
resources:
  - id: checkout-orders-db
    type: postgres
    owner: payments-platform

This is intentionally boring. Boring is good. A catalog schema should make the common workflows reliable before it tries to model every architectural nuance.
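To show how small the model can stay, here is a minimal Python sketch of the same entities as typed records. The class names (`EdgeType`, `Dependency`, `Resource`, `Service`) and the `parse_dependency` helper are illustrative assumptions, not part of any particular catalog product. The edge types are limited to the three used in the example above.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(Enum):
    # Only edge types with clear semantics; there is no generic "related to".
    CONSUMES = "consumes"
    WRITES = "writes"
    PUBLISHES = "publishes"


@dataclass(frozen=True)
class Dependency:
    type: EdgeType
    target: str  # stable ID of another catalog entity


@dataclass(frozen=True)
class Resource:
    id: str
    type: str
    owner: str  # a group ID, not an individual


@dataclass(frozen=True)
class Service:
    id: str
    name: str
    system: str
    owner: str
    lifecycle: str
    repository: str  # an attribute of the service, not its identity
    dependencies: tuple = ()


def parse_dependency(raw):
    # EdgeType("related-to") raises ValueError, so untyped edges are rejected.
    return Dependency(type=EdgeType(raw["type"]), target=raw["target"])


checkout = Service(
    id="checkout-api",
    name="Checkout API",
    system="commerce-platform",
    owner="payments-platform",
    lifecycle="production",
    repository="github.com/example/checkout-api",
    dependencies=tuple(
        parse_dependency(d)
        for d in [
            {"type": "consumes", "target": "pricing-api"},
            {"type": "writes", "target": "checkout-orders-db"},
            {"type": "publishes", "target": "checkout-events"},
        ]
    ),
)
orders_db = Resource(id="checkout-orders-db", type="postgres", owner="payments-platform")
```

Using an enum for the edge type is one way to make a generic "related to" link fail at parse time rather than at query time.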

In Practice

Context: Spotify’s Backstage project documents a catalog model built around entities such as Component, System, API, Resource, Group, and User. The documented pattern is that software ownership and relationships are first-class catalog data, not page decoration. See the Backstage system model and descriptor format in the public documentation: Backstage software catalog.

Action: Use a similar separation of concerns. Model services as components, systems as product boundaries, resources as infrastructure dependencies, and groups as owners. Keep relationships explicit in the entity graph instead of hiding them in prose fields.

Result: Automation can query the graph. A CI policy can ask whether a production service has an owner. An incident workflow can follow a service to its owning group. A migration tool can find services that consume a deprecated API. A compliance workflow can identify production resources without reverse-engineering cloud tags.
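The queries described above can be sketched against a plain in-memory snapshot. This is an illustration under stated assumptions, not the Backstage API: the `SERVICES` structure, the function names, and the `cart-worker` and `pricing-team` entries are hypothetical.

```python
# Hypothetical catalog snapshot keyed by service ID.
SERVICES = {
    "checkout-api": {
        "owner": "payments-platform",
        "lifecycle": "production",
        "dependencies": [("consumes", "pricing-api"), ("writes", "checkout-orders-db")],
    },
    "pricing-api": {
        "owner": "pricing-team",
        "lifecycle": "production",
        "dependencies": [],
    },
    "cart-worker": {
        "owner": None,  # deliberately missing, to exercise the CI policy
        "lifecycle": "production",
        "dependencies": [("consumes", "pricing-api")],
    },
}


def unowned_production_services(services):
    """CI policy: every production service must have an owner."""
    return sorted(
        sid
        for sid, svc in services.items()
        if svc["lifecycle"] == "production" and not svc["owner"]
    )


def owning_group(services, service_id):
    """Incident workflow: follow a service to its owning group."""
    return services[service_id]["owner"]


def consumers_of(services, target_id):
    """Migration tool: find services that declare a 'consumes' edge to target_id."""
    return sorted(
        sid
        for sid, svc in services.items()
        if ("consumes", target_id) in svc["dependencies"]
    )
```

Each function is a direct lookup or filter over typed fields, which is the point: none of them needs to parse prose or guess at edge direction.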

Learning: The catalog becomes useful when it answers operational questions directly. The documented Backstage pattern is not “create a portal.” The deeper pattern is “define software entities and relationships clearly enough that many tools can share them.”

Context: Kubernetes documents ownerReferences as a mechanism for connecting dependent objects to owning objects, which enables garbage collection and lifecycle behavior. That is a narrower runtime model than a service catalog, but the architectural lesson is relevant: ownership edges have operational consequences. See the Kubernetes documentation on owners and dependents.

Action: Treat ownership and dependency fields as control data. Validate them. Require stable identifiers. Reject catalog entries that point to nonexistent owners or ambiguous resources. Do not let free text become the source of truth for dependency direction.
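A minimal sketch of that validation follows, assuming the catalog is loaded into dictionaries keyed by stable ID. The function name `validate_catalog`, the `ALLOWED_EDGE_TYPES` set, and the sample data (including the individual owner `alice`) are assumptions made for illustration.

```python
ALLOWED_EDGE_TYPES = {"consumes", "writes", "publishes"}


def validate_catalog(groups, services, resources):
    """Return a list of error strings; an empty list means the entries pass."""
    errors = []

    # An ID used by both a service and a resource is ambiguous as an edge target.
    for entity_id in sorted(set(services) & set(resources)):
        errors.append(f"{entity_id}: ID is used by both a service and a resource")

    # Every owner must be a known group.
    for kind, entries in (("service", services), ("resource", resources)):
        for entity_id, entry in entries.items():
            owner = entry.get("owner")
            if owner not in groups:
                errors.append(f"{kind} {entity_id}: unknown owner {owner!r}")

    # Every dependency must be typed and must point at an existing entity.
    known_targets = set(services) | set(resources)
    for service_id, service in services.items():
        for dep in service.get("dependencies", []):
            if dep["type"] not in ALLOWED_EDGE_TYPES:
                errors.append(f"service {service_id}: unknown edge type {dep['type']!r}")
            if dep["target"] not in known_targets:
                errors.append(
                    f"service {service_id}: {dep['type']} target {dep['target']!r} is not in the catalog"
                )
    return errors


GROUPS = {"payments-platform"}
SERVICES = {
    "checkout-api": {
        "owner": "payments-platform",
        "dependencies": [
            {"type": "writes", "target": "checkout-orders-db"},
            {"type": "consumes", "target": "pricing-api"},  # not registered below
        ],
    },
}
RESOURCES = {
    "checkout-orders-db": {"type": "postgres", "owner": "alice"},  # a person, not a group
}

errors = validate_catalog(GROUPS, SERVICES, RESOURCES)
```

Returning all errors at once, rather than failing on the first, tends to suit CI feedback because a team can fix an entry in one pass.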

Result: The catalog can support lifecycle automation because relationships are machine-readable. Deleting, migrating, paging, reviewing, and reporting all become graph operations rather than search exercises.

Learning: A service catalog should borrow the rigor of runtime control planes even though it operates at a higher architectural level. Loose metadata produces loose automation.

Where It Breaks

| Failure mode | Why it happens | Mitigation |
| --- | --- | --- |
| Repository equals service | Monorepos, shared libraries, and multi-service repos break the assumption | Model repository as an attribute or relation, not the service identity |
| Owner equals individual | People move faster than systems | Route ownership through groups, then map people to groups |
| Resource tags become catalog truth | Cloud tags are inconsistent across accounts and providers | Ingest tags as signals, then reconcile into catalog resources |
| Dependencies are inferred only from traffic | Runtime calls miss batch jobs, queues, and planned architecture | Combine declared dependencies with observed telemetry |
| Catalog entries go stale | Manual updates lose to delivery pressure | Validate catalog metadata in CI and sync from source systems |
| Graph becomes too generic | Every edge becomes “depends on” | Use typed relationships with clear semantics |
| Platform team owns the catalog alone | Central teams cannot know every service boundary | Make teams own their entries and make the platform own schema quality |

The hardest tradeoff is declared versus discovered truth.

Declared metadata is intentional. It captures what a team believes the architecture should be. Discovered metadata is empirical. It captures what systems are actually doing. A serious catalog needs both.

Declared ownership should usually win. Observed traffic should not silently reassign accountability. But discovered dependencies should create review signals. If telemetry shows checkout calling pricing and the catalog does not declare that dependency, the right response is not an automatic correction; it is a drift finding.

The same rule applies to resources. Terraform state, Kubernetes objects, cloud tags, and observability data can all propose resources. The catalog should reconcile them into stable entities that have owners and relationships.
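The drift rule for dependencies can be sketched as a set comparison between declared and observed edges. The function name `dependency_drift` and the edge representation as `(source, target)` pairs are assumptions for illustration. The sketch only reports findings; it never rewrites the catalog.

```python
def dependency_drift(declared, observed):
    """Compare declared and observed (source, target) edges and report both gaps."""
    declared, observed = set(declared), set(observed)
    return {
        # Seen in telemetry but missing from the catalog: needs team review.
        "undeclared": sorted(observed - declared),
        # Declared but not seen: may be a batch job, a queue, or planned work.
        "unobserved": sorted(declared - observed),
    }


declared_edges = [
    ("checkout-api", "checkout-orders-db"),
]
observed_edges = [
    ("checkout-api", "checkout-orders-db"),
    ("checkout-api", "pricing-api"),  # telemetry shows checkout calling pricing
]

findings = dependency_drift(declared_edges, observed_edges)
```

Keeping "unobserved" as a separate bucket matters because runtime traffic can miss legitimate dependencies, so an unobserved edge is a prompt for review rather than evidence that the declaration is wrong.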

What to Do Next

Problem: Your platform workflows probably rely on fragmented ownership data across CI, cloud, incident, and observability tools.

Solution: Build the service catalog as a typed graph with separate entities for services, systems, resources, owners, and dependencies.

Proof: Start with three automation queries: “who owns this production service?”, “what resources does it depend on?”, and “what services consume this API?” If the catalog cannot answer those without human interpretation, the model is not ready.

Action: Define the schema first, then require catalog metadata in CI for every production service. Keep the first version small: service ID, system, owner, lifecycle, repository, resources, and typed dependencies. Expand only when a real automation workflow needs more structure.
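A CI gate for that first version might look like the sketch below. It assumes each catalog file has already been parsed into a dictionary with a `service` mapping and a top-level `resources` list, as in the minimum viable model above. The names `missing_metadata` and `gate`, and the `cart-worker` and `prototype` entries, are hypothetical.

```python
REQUIRED_SERVICE_FIELDS = ("id", "system", "owner", "lifecycle", "repository", "dependencies")


def missing_metadata(document):
    """List required fields that are absent or empty in one parsed catalog file."""
    service = document.get("service") or {}
    missing = [
        f"service.{name}"
        for name in REQUIRED_SERVICE_FIELDS
        if service.get(name) in (None, "")  # an empty list is a valid value
    ]
    if document.get("resources") is None:
        missing.append("resources")
    return missing


def gate(documents):
    """Return {service_id: missing_fields} for production services that fail the check."""
    failures = {}
    for document in documents:
        service = document.get("service") or {}
        # A missing lifecycle is treated as production so an entry cannot skip the gate.
        if (service.get("lifecycle") or "production") != "production":
            continue
        missing = missing_metadata(document)
        if missing:
            failures[service.get("id") or "<no id>"] = missing
    return failures


complete = {
    "service": {
        "id": "checkout-api",
        "system": "commerce-platform",
        "owner": "payments-platform",
        "lifecycle": "production",
        "repository": "github.com/example/checkout-api",
        "dependencies": [{"type": "consumes", "target": "pricing-api"}],
    },
    "resources": [{"id": "checkout-orders-db", "type": "postgres", "owner": "payments-platform"}],
}
incomplete = {
    "service": {
        "id": "cart-worker",
        "system": "commerce-platform",
        "lifecycle": "production",
        "repository": "github.com/example/cart-worker",
        "dependencies": [],
    },
}
experimental = {"service": {"id": "prototype", "lifecycle": "experimental"}}

failures = gate([complete, incomplete, experimental])
```

In a pipeline, a non-empty `failures` result would translate into a failing exit code. The list of required fields is the only part that should grow as new automation workflows appear.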