Most CI systems know how to run a pipeline, but they rarely know whether the change is safe for the service that owns the blast radius.

Situation

Engineering organizations have moved from a small number of deployable systems to fleets of services, jobs, data pipelines, internal tools, and infrastructure modules. Each unit has a repository, a deployment path, a runtime footprint, an on-call owner, and some promise to users. The problem is that those facts usually live in different systems.

The service catalog knows ownership and lifecycle metadata. CI knows commits, tests, build artifacts, and release gates. Deployment systems know what reached production. Observability platforms know SLOs, incidents, and error budgets. Security tools know open findings and policy exceptions. Change risk lives across all of them, but the engineer pushing a change usually sees only a narrow CI result.

A catalog-to-CI integration makes the catalog an active participant in delivery. Instead of treating ownership metadata as documentation, the pipeline queries it, enriches runs with service context, and applies different checks based on the system being changed.

The Problem

The common failure mode is not that a test fails silently. It is that a technically correct pipeline approves a change without understanding the operational context.

A low-risk documentation edit, a database migration on a tier-one service, and a deployment to an experimental internal tool may all pass the same CI template. That uniformity looks fair, but it hides real differences in ownership, SLO pressure, production exposure, and recent deployment instability.

The result is a predictable set of operational gaps:

  • Pull requests are reviewed by people near the code, who are not necessarily the currently accountable owners.
  • Deployment history is visible after an incident, but not used before the next risky release.
  • SLO burn is monitored by observability systems, but CI keeps shipping into an already unhealthy service.
  • Change approval rules are hard-coded in YAML, so they drift from the catalog and become another ownership problem.
  • Teams add manual release rituals because the automated path lacks enough context to be trusted.

The question is: how should a platform connect catalog data to CI without turning the catalog into a fragile release orchestrator?

Answer: Policy-Rich CI, Catalog-Led Context

The right architecture keeps CI as the execution engine and the catalog as the source of service context. The catalog should not run builds or deploy software. It should answer questions the pipeline cannot answer reliably on its own: who owns this component, how critical is it, what environments does it deploy to, what SLO applies, and what recent changes have happened?

flowchart TD
  A[developer change — pull request] --> B[CI pipeline — build context]
  B --> C[catalog lookup — service metadata]
  C --> D[ownership policy — reviewers and approvers]
  C --> E[runtime policy — tier and environment]
  C --> F[SLO policy — error budget state]
  C --> G[deployment history — recent change signals]
  D --> H[change risk score — combined decision]
  E --> H
  F --> H
  G --> H
  H --> I[release gate — allow, warn, or block]
  I --> J[deployment system — production rollout]
  J --> K[catalog update — deployed version and timestamp]

This design creates a feedback loop. The catalog informs CI before the release. CI and deployment systems then write back the facts that future risk checks need: deployed version, timestamp, environment, artifact digest, and rollout status.
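The writeback half of that loop can be sketched as a small event record. This is a minimal illustration, not a real catalog API: the `DeploymentEvent` type, its field names, and the `build_event` and `to_payload` helpers are all assumptions chosen to mirror the facts listed above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DeploymentEvent:
    """One deployment fact written back to the catalog (hypothetical schema)."""
    component: str
    version: str
    environment: str
    artifact_digest: str
    rollout_status: str  # assumed values: "succeeded", "failed", "rolled_back"
    deployed_at: str     # ISO 8601 timestamp in UTC


def build_event(component, version, environment, artifact_digest,
                rollout_status, now=None):
    """Build the event; `now` is injectable so the result is reproducible."""
    now = now or datetime.now(timezone.utc)
    return DeploymentEvent(component, version, environment,
                           artifact_digest, rollout_status, now.isoformat())


def to_payload(event):
    """Serialize the event as JSON for whatever ingestion endpoint exists."""
    return json.dumps(asdict(event), sort_keys=True)
```

The deployment step would call `build_event(...)` after each rollout and send `to_payload(event)` to the catalog. How the payload is delivered (HTTP, queue, file) is deliberately left out, since the source does not specify it.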

The key is to keep the integration declarative. The catalog should expose stable metadata and relationships. CI should evaluate policies against that metadata. A policy engine, whether custom or off the shelf, can translate facts into decisions: require owner approval, block deploy during SLO burn, force progressive delivery, or attach a release note to the change record.
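As a rough sketch of what "facts in, decisions out" could look like, the following evaluates two named rules against catalog metadata. The `ServiceContext` fields, the tier numbering, and the rule thresholds are assumptions for illustration; a real setup might use an off-the-shelf policy engine instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceContext:
    """Facts the pipeline reads from the catalog (hypothetical field names)."""
    name: str
    owner: str
    tier: int             # assumption: 1 is the most critical tier
    slo_burning: bool     # True when the error budget is burning too fast
    owner_approved: bool  # True when the owning group approved the change


@dataclass
class Decision:
    outcome: str = "allow"  # "allow", "warn", or "block"
    reasons: list[str] = field(default_factory=list)


_SEVERITY = {"allow": 0, "warn": 1, "block": 2}


def _escalate(decision: Decision, outcome: str, reason: str) -> None:
    """Record the reason and keep the most severe outcome seen so far."""
    decision.reasons.append(reason)
    if _SEVERITY[outcome] > _SEVERITY[decision.outcome]:
        decision.outcome = outcome


def evaluate(ctx: ServiceContext) -> Decision:
    """Apply named rules; every rule that fires leaves a readable reason."""
    decision = Decision()
    if ctx.tier == 1 and not ctx.owner_approved:
        _escalate(decision, "block",
                  f"tier-1 change needs approval from {ctx.owner}")
    if ctx.slo_burning:
        outcome = "block" if ctx.tier == 1 else "warn"
        _escalate(decision, outcome, "SLO error budget is burning")
    return decision
```

Each rule appends its own reason, so the CI run can show why a gate fired rather than only that it fired.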

In Practice

Context: Spotify created Backstage to give teams a software catalog and a unified developer portal for services, ownership, documentation, and tooling. In the documented pattern, the catalog does not replace delivery systems. It gives engineering teams a shared system of record for software components, their owners, and their relationships across the organization.

Action: A platform team can use that catalog model as the CI entry point. When a pull request modifies a repository, the pipeline resolves the affected component, reads its owner, lifecycle, tier, system, and dependency relationships, and annotates the run. If the component is production-facing and tier one, CI can require approval from the owning group, verify deployment freeze rules, and fetch the latest SLO state before allowing deployment.
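The resolve-and-annotate step might look like the sketch below. The catalog here is an in-memory dictionary keyed by repository, and the entry fields, check names, and repository slugs are hypothetical. A real integration would query the catalog service, and this is not the Backstage API.

```python
# Hypothetical catalog contents, keyed by repository slug.
CATALOG = {
    "org/checkout": {"component": "checkout", "owner": "group:payments",
                     "lifecycle": "production", "tier": 1},
    "org/wiki": {"component": "wiki", "owner": "group:docs",
                 "lifecycle": "experimental", "tier": 3},
}


def resolve_component(repo, catalog):
    """Map a repository to its catalog entry, failing loudly when unregistered."""
    entry = catalog.get(repo)
    if entry is None:
        raise LookupError(f"no catalog entry for repository {repo!r}")
    return entry


def required_checks(entry):
    """Same template for every service; extra gates come from catalog metadata."""
    checks = ["build", "test"]
    if entry["lifecycle"] == "production" and entry["tier"] == 1:
        checks += ["owner-approval", "deploy-freeze", "slo-state"]
    return checks


def annotate(run, entry):
    """Return a copy of the run metadata enriched with service context."""
    return {**run,
            "component": entry["component"],
            "owner": entry["owner"],
            "required_checks": required_checks(entry)}
```

For example, `annotate({"run_id": 42}, resolve_component("org/checkout", CATALOG))` would attach the owner and the three extra gates, while the same call for `"org/wiki"` would keep only the build and test checks.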

Result: The delivery path becomes less dependent on tribal knowledge. The same CI template can behave differently for different services because the decision comes from catalog metadata rather than copied YAML. Ownership changes happen in one place. Risk policy can follow the component even if the repository moves, the team renames itself, or the service migrates to another deployment platform.

Learning: The catalog is most valuable when it becomes operational metadata, not when it becomes a second source of release logic. Keep facts in the catalog. Keep execution in CI and deployment systems. Keep policy evaluation explicit, versioned, and observable.

A second known pattern comes from Google’s Site Reliability Engineering work on SLOs and error budgets. The important architectural idea is that reliability targets should influence release behavior. If a service is burning too much error budget, the organization should reduce risky change until reliability recovers.

Applied to catalog-to-CI integration, the service catalog stores the SLO reference or links the component to the observability object that owns the SLO. CI does not calculate reliability from raw telemetry. It asks for the current SLO state and turns that state into a release decision. A healthy service may continue through the normal path. A service with severe burn may require an override, a smaller rollout, or a deploy block for non-remediation changes.
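A minimal sketch of that mapping follows. The three state names, the intermediate "elevated" level, and the returned action strings are assumptions. The observability platform that owns the SLO would define the real states.

```python
from enum import Enum


class SloState(Enum):
    """SLO state as reported by the observability system (assumed levels)."""
    HEALTHY = "healthy"
    ELEVATED = "elevated"
    SEVERE = "severe"


def slo_gate(state, is_remediation=False, override_approved=False):
    """Turn the reported SLO state into a release action.

    CI never computes burn from raw telemetry here; it only consumes
    the state it was given.
    """
    if state is SloState.HEALTHY:
        return "normal-rollout"
    if state is SloState.ELEVATED:
        return "reduced-rollout"
    # Severe burn: only remediation changes or approved overrides proceed,
    # and even those go out through the smaller rollout.
    if is_remediation or override_approved:
        return "reduced-rollout"
    return "block"
```

Keeping the remediation and override paths explicit means a degraded service can still receive the fix that restores it.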

The DORA research program adds another useful pattern: deployment frequency, lead time, change failure rate, and recovery time are delivery signals, not just reporting metrics. A mature integration can feed deployment events from CI and CD back into the catalog so that each component has recent change context. That history lets the platform distinguish a quiet, stable service from one that has had repeated rollbacks, hotfixes, or failed rollouts in the last few days.
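One way to turn that history into a signal is a simple windowed count, sketched below. The event shape, the status names, the seven-day window, and the threshold of two are all illustrative assumptions rather than DORA definitions.

```python
from datetime import datetime, timedelta, timezone

# Assumed status values that count as a troubled deployment.
BAD_STATUSES = {"failed", "rolled_back", "hotfix"}


def recent_instability(events, now, window=timedelta(days=7), threshold=2):
    """Return True when enough troubled deployments fall inside the window.

    Each event is a dict with a timezone-aware "deployed_at" datetime
    and a "status" string.
    """
    cutoff = now - window
    bad = [e for e in events
           if e["deployed_at"] >= cutoff and e["status"] in BAD_STATUSES]
    return len(bad) >= threshold
```

A named rule such as "two or more troubled deployments in the last seven days" is easier to explain in a CI run than a weighted score built from the same events.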

The documented pattern across these examples is consistent: connect delivery decisions to service ownership, production health, and change outcomes. Do not rely on a green build as the only proxy for safety.

Where It Breaks

| Failure mode | Why it happens | Mitigation |
| --- | --- | --- |
| Catalog data goes stale | Teams update CI files but not ownership metadata | Make catalog ownership required for release and sync from identity systems where possible |
| CI becomes too slow | Every run calls multiple external systems | Cache catalog reads, separate pull request checks from deploy gates, and fail soft for non-critical metadata |
| Policies become opaque | Engineers see a block but not the reason | Emit policy inputs, decision traces, and the exact catalog fields used |
| Catalog becomes a release orchestrator | Platform teams keep adding workflow behavior to metadata | Keep the catalog declarative and run workflows in CI, CD, or a policy engine |
| SLO gates block urgent fixes | A degraded service may need a remediation deploy | Support break-glass overrides with owner approval, audit trails, and incident linkage |
| Risk scores become theater | Weighted scoring hides the real reason for concern | Prefer named rules over magic numbers, then use scores only for ranking or warnings |
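The caching and fail-soft mitigation for slow CI could be sketched as a small wrapper around the catalog client. The `CachedCatalog` class, its `fetch` callable, and the `critical` flag are hypothetical names. The TTL value is an arbitrary assumption.

```python
import time


class CachedCatalog:
    """Time-bounded cache in front of a catalog lookup (illustrative only)."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch    # callable: component name -> metadata dict
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so expiry can be tested
        self._cache = {}       # name -> (expires_at, value)

    def get(self, name, critical=True):
        hit = self._cache.get(name)
        if hit and hit[0] > self._clock():
            return hit[1]      # fresh cache entry, no external call
        try:
            value = self._fetch(name)
        except Exception:
            if critical:
                raise          # deploy gates must not guess
            # Fail soft: fall back to a stale value if one exists, else None.
            return hit[1] if hit else None
        self._cache[name] = (self._clock() + self._ttl, value)
        return value
```

Pull request checks might call `get(name, critical=False)` so that a catalog outage degrades annotations only, while deploy gates keep the default and fail closed.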

What to Do Next

  • Problem: CI pipelines approve changes with incomplete service context. A green build does not know ownership, SLO pressure, recent rollback history, or production criticality.
  • Solution: Use the service catalog as the context source for CI. Resolve the affected component, fetch ownership and operational metadata, evaluate explicit policies, and write deployment outcomes back to the catalog.
  • Proof: Backstage-style catalogs model ownership and component metadata; SRE error-budget practices connect reliability state to release behavior; DORA metrics show that deployment history and change failure are operational signals.
  • Action: Start with one release gate: owner resolution. Then add deployed-version writeback. After that, connect SLO state and recent deployment history. Keep every gate explainable, versioned, and visible in the CI run.