Service Lifecycle Workflow: Create, Promote, Deprecate, Archive, Delete

A service lifecycle is not a deployment pipeline. It is the control system that decides when a service is allowed to exist, when it is allowed to receive traffic, when consumers must move away, and when the organization can safely forget it.

Situation

Most platform teams start with service creation because that is where developer friction is most visible. A team wants a new API, worker, data pipeline, or internal tool. The platform provides a template, a repository, a CI workflow, a deployment target, logging, dashboards, and maybe an ownership record.

That solves the first ten minutes.

The harder problem arrives months later. The service has been promoted through environments, registered in discovery, granted secrets, attached to databases, added to dashboards, and depended on by other systems. It now has operational gravity. Creating it was easy because creation is additive. Retiring it is hard because retirement is subtractive.

A mature platform therefore treats lifecycle state as a first-class workflow: create, promote, deprecate, archive, delete. Each transition is explicit, policy-checked, and observable, and every transition except the final one, delete, is reversible.

The Problem

Many organizations encode lifecycle in scattered places. Repository existence means “created.” A production deployment means “promoted.” A Slack announcement means “deprecated.” Removing the Kubernetes deployment means “deleted.” None of those signals is authoritative, and nothing checks one against the others.

That ambiguity creates predictable failures.

A service marked deprecated in documentation may still be receiving traffic. A repository may be archived while secrets remain active. A DNS record may point at an empty load balancer. A database may be retained forever because nobody can prove the owning service is gone. CI pipelines may still publish images for systems that cannot be deployed. Incident responders may page the last known owner of a service that was supposedly retired two quarters ago.

The underlying issue is that service lifecycle is often treated as metadata around delivery instead of a state machine governing delivery.

The core question is: how should a platform represent service lifecycle so automation can move fast without deleting the wrong thing?

The Lifecycle Control Plane

The answer is to model lifecycle as a control plane with state, transition rules, and evidence gates. The service catalog is the source of truth for lifecycle state. CI, CD, runtime infrastructure, observability, access control, and documentation consume that state rather than inventing their own.

flowchart TD
  A[request — owner and purpose] --> B[create — repository and catalog entry]
  B --> C[promote — environment readiness]
  C --> D[active — production traffic]
  D --> E[deprecate — consumer migration window]
  E --> F[archive — runtime disabled]
  F --> G[delete — durable cleanup]

  B --> H[evidence — ownership and runbook]
  C --> I[evidence — tests and rollback]
  D --> J[evidence — telemetry and alerts]
  E --> K[evidence — dependency inventory]
  F --> L[evidence — no traffic observed]
  G --> M[evidence — retention satisfied]

  H -->|required before promote| C
  I -->|required before active| D
  K -->|required before archive| F
  L -->|required before delete| G

The important design choice is that lifecycle transitions are not comments or tags. They are guarded operations.
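
As a minimal sketch of what "guarded" could mean, assume a hypothetical in-memory model in Python. The `State` names follow the flowchart; the evidence keys, the `TRANSITIONS` table, and the reverse transitions are illustrative assumptions, not the API of any particular catalog product.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    CREATED = "created"
    PROMOTED = "promoted"
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    ARCHIVED = "archived"
    DELETED = "deleted"


# Allowed transitions and the evidence each one requires.
# Evidence keys are hypothetical labels for the gates in the flowchart.
TRANSITIONS = {
    (State.REQUESTED, State.CREATED): set(),
    (State.CREATED, State.PROMOTED): {"ownership", "runbook"},
    (State.PROMOTED, State.ACTIVE): {"tests", "rollback"},
    (State.ACTIVE, State.DEPRECATED): set(),
    (State.DEPRECATED, State.ARCHIVED): {"dependency_inventory"},
    (State.ARCHIVED, State.DELETED): {"no_traffic_observed", "retention_satisfied"},
    # Assumed reversals: everything before delete can be walked back.
    (State.DEPRECATED, State.ACTIVE): set(),
    (State.ARCHIVED, State.DEPRECATED): set(),
}


class TransitionDenied(Exception):
    """Raised when a transition is unknown or its evidence is incomplete."""


def transition(current, target, evidence):
    """Return the new state, or raise TransitionDenied with the reason."""
    key = (current, target)
    if key not in TRANSITIONS:
        raise TransitionDenied(f"{current.value} -> {target.value} is not allowed")
    missing = TRANSITIONS[key] - set(evidence)
    if missing:
        raise TransitionDenied(f"missing evidence: {sorted(missing)}")
    return target
```

Because there is no outgoing transition from `DELETED`, and no shortcut from `ACTIVE` to `DELETED`, the table itself encodes the rule that deletion is final and only reachable through deprecation and archive.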

Create should register the service before generating infrastructure. The catalog entry should include owner, purpose, classification, runtime type, data stores, on-call routing, and expected consumers. Repository scaffolding, CI setup, secret namespace creation, and baseline dashboards should be downstream effects of that registration.
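
One way to sketch "register first, scaffold second" in Python is shown below. The `CatalogEntry` fields mirror the list above; the `register` function, the dictionary-backed catalog, and the downstream task names are hypothetical and stand in for whatever the platform actually runs.

```python
from __future__ import annotations

from dataclasses import dataclass

REQUIRED_FIELDS = (
    "name", "owner", "purpose", "classification", "runtime_type", "on_call_routing",
)


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    owner: str
    purpose: str
    classification: str
    runtime_type: str
    on_call_routing: str
    data_stores: tuple[str, ...] = ()
    expected_consumers: tuple[str, ...] = ()


def register(catalog: dict[str, CatalogEntry], entry: CatalogEntry) -> list[str]:
    """Record the service, then return the downstream tasks to schedule."""
    missing = [f for f in REQUIRED_FIELDS if not getattr(entry, f).strip()]
    if missing:
        raise ValueError(f"catalog entry incomplete: {missing}")
    if entry.name in catalog:
        raise ValueError(f"{entry.name} is already registered")
    catalog[entry.name] = entry
    # Infrastructure is a consequence of registration, never a precondition.
    return [
        "scaffold_repository",
        "configure_ci",
        "create_secret_namespace",
        "create_baseline_dashboards",
    ]
```

The ordering is the point of the sketch: nothing is scaffolded for a service that has no owner or on-call route on record.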

Promote should be evidence-based. A service should not move from development to staging or production merely because a branch was merged. Promotion should require build provenance, passing checks, environment configuration, rollback capability, health checks, and observability. The exact bar can vary by risk tier, but the rule should be explicit.
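
A rough Python sketch of a tiered promotion bar follows. The tier names (`low`, `standard`, `high`) and the way evidence is split between them are assumptions made for illustration; the evidence labels themselves come from the list above.

```python
from __future__ import annotations

# Evidence every promotion needs, regardless of tier (assumed split).
BASELINE = frozenset({"build_provenance", "passing_checks", "environment_configuration"})

# Hypothetical risk tiers: higher tiers add requirements, never remove them.
REQUIRED_EVIDENCE = {
    "low": BASELINE,
    "standard": BASELINE | {"health_checks", "observability"},
    "high": BASELINE | {"health_checks", "observability", "rollback_capability"},
}


def promotion_gaps(risk_tier: str, evidence: set[str]) -> list[str]:
    """Return the evidence still missing for this tier; empty means promotable."""
    if risk_tier not in REQUIRED_EVIDENCE:
        raise ValueError(f"unknown risk tier: {risk_tier}")
    return sorted(REQUIRED_EVIDENCE[risk_tier] - evidence)
```

Returning the gaps rather than a bare yes or no lets the pipeline tell the owner exactly what is missing, which keeps the rule explicit instead of tribal.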

Deprecate should change the service contract, not just the documentation. Once deprecated, the platform should make new consumers harder or impossible to add, surface warnings in service discovery, require migration guidance, and track remaining traffic. Deprecation is not deletion. It is the period where the platform proves who still depends on the service.
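
To make "change the contract" concrete, here is a small Python sketch under stated assumptions: a hypothetical `Service` record, a `register_consumer` guard, and a `discovery_record` function standing in for whatever the discovery system returns. None of these names come from a real product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    state: str = "active"
    migration_guide: str = ""
    consumers: set[str] = field(default_factory=set)


def deprecate(service: Service, migration_guide: str) -> None:
    """Deprecation is refused unless migration guidance is supplied."""
    if not migration_guide.strip():
        raise ValueError("deprecation requires migration guidance")
    service.state = "deprecated"
    service.migration_guide = migration_guide


def register_consumer(service: Service, consumer: str) -> None:
    """Only active services accept new consumers."""
    if service.state != "active":
        raise PermissionError(f"{service.name} is {service.state}; new consumers are blocked")
    service.consumers.add(consumer)


def discovery_record(service: Service) -> dict:
    """What a discovery lookup would return, including any deprecation warning."""
    record = {"name": service.name, "state": service.state}
    if service.state == "deprecated":
        record["warning"] = f"deprecated; migrate using {service.migration_guide}"
        record["remaining_consumers"] = sorted(service.consumers)
    return record
```

The `remaining_consumers` list is the part that matters most during the migration window: it is the running answer to who still depends on the service.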

Archive should disable active operation while preserving evidence. Runtime resources may scale to zero. Scheduled jobs may be paused. CI publishing may stop. The repository may become read-only. Logs, dashboards, incidents, release history, and catalog records should remain accessible.

Delete should be the last irreversible step. It removes durable infrastructure, secrets, deployment targets, DNS records, service discovery entries, and retained data only after retention and dependency checks pass. A good delete workflow is intentionally boring because the risky work happened earlier.

In Practice

Context: Kubernetes made object lifecycle explicit through API objects, desired state, controllers, finalizers, and garbage collection. The documented pattern is that deletion is more than removal from storage. An object can carry finalizers, and it is not removed until controllers have completed cleanup and cleared them.

Action: Apply the same pattern to services. A lifecycle controller can prevent a service from moving from archive to delete while any finalizer remains: active traffic, attached secrets, retained datasets, consumer dependencies, open incidents, or compliance holds.

Result: The platform gains a mechanical way to say “not yet.” That is more useful than a wiki checklist because CI and infrastructure automation can enforce it.

Learning: Service deletion needs preconditions. Human approval can be one of them, but approval is not a substitute for observable cleanup evidence.
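
The finalizer idea can be sketched in a few lines of Python. This is an illustration of the pattern, not Kubernetes code: `ServiceRecord`, `request_delete`, `complete_cleanup`, and the finalizer labels are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    name: str
    state: str = "archived"
    finalizers: set[str] = field(default_factory=set)
    deletion_requested: bool = False


def reconcile(record: ServiceRecord) -> str:
    """Delete only when deletion was requested and no finalizer remains."""
    if record.deletion_requested and not record.finalizers:
        record.state = "deleted"
    return record.state


def request_delete(record: ServiceRecord) -> str:
    """Mark intent to delete; the record survives while finalizers exist."""
    record.deletion_requested = True
    return reconcile(record)


def complete_cleanup(record: ServiceRecord, finalizer: str) -> str:
    """Called by the cleanup task for one external system when it finishes."""
    record.finalizers.discard(finalizer)
    return reconcile(record)
```

A short usage example, with assumed finalizer names:

```python
record = ServiceRecord("legacy-auth", finalizers={"attached_secrets", "compliance_hold"})
request_delete(record)                        # still "archived": the answer is "not yet"
complete_cleanup(record, "attached_secrets")  # still "archived"
complete_cleanup(record, "compliance_hold")   # now "deleted"
```

Human approval fits this model as one more finalizer, which keeps it alongside the observable cleanup evidence rather than in place of it.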

Context: GitHub repository archiving is a public product pattern: an archived repository becomes read-only while preserving code, issues, pull requests, and history. The documented pattern is not “delete when inactive.” It is “make inactive systems visibly inactive before removal.”

Action: Use an archive state for services with the same semantics. Block new deployments, prevent new dependency registrations, freeze routine configuration changes, and keep operational history available.

Result: Teams can stop accidental resurrection while preserving auditability. Incident responders can still inspect what existed, who owned it, and how it behaved.

Learning: Archive is a lifecycle state with operational meaning. It is not a softer word for delete.
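
A sketch of archive enforcement as an admission check, in Python. The action names and the `admit` function are hypothetical; in a real platform the equivalent check would sit in front of the deploy pipeline, the dependency registry, and the configuration system.

```python
from __future__ import annotations

# Actions that an archived service must refuse (assumed action names).
BLOCKED_WHEN_ARCHIVED = frozenset({"deploy", "register_dependency", "change_config"})


def admit(action: str, lifecycle_state: str) -> tuple[bool, str]:
    """Decide whether an action is allowed given the catalog's lifecycle state."""
    if lifecycle_state == "archived" and action in BLOCKED_WHEN_ARCHIVED:
        return False, f"{action} is blocked: service is archived"
    # Reads of history, logs, and ownership stay available in every state.
    return True, "allowed"
```

For example, `admit("deploy", "archived")` is refused while `admit("read_history", "archived")` is allowed, which is the read-only semantics described above.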

Context: CI systems such as GitHub Actions and deployment platforms commonly separate workflow execution, environment protection, and deployment approval. The documented pattern is that promotion can be gated by environment-specific checks rather than being implied by source control state.

Action: Treat promotion as a transition that consumes CI evidence. The workflow should attach build identity, test results, artifact digest, policy results, and target environment to the lifecycle record.

Result: Production status becomes explainable. The platform can answer which artifact was promoted, by whom, under which checks, and with what rollback path.

Learning: Promotion without provenance is only a deploy button. Lifecycle automation needs an audit trail that survives the pipeline run.
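
A minimal Python sketch of a promotion record that outlives the pipeline run is shown below. The field names, the append-only list used as a log, and the `explain` lookup are assumptions for illustration; a real platform would write to the catalog or an audit store.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PromotionRecord:
    service: str
    target_environment: str
    artifact_digest: str
    build_id: str
    promoted_by: str
    test_results: str
    policy_results: str
    rollback_artifact_digest: str
    recorded_at: str


def record_promotion(log: list[str], **details: str) -> PromotionRecord:
    """Append one immutable, serialized record per promotion."""
    record = PromotionRecord(
        recorded_at=datetime.now(timezone.utc).isoformat(), **details
    )
    log.append(json.dumps(asdict(record), sort_keys=True))
    return record


def explain(log: list[str], service: str, environment: str) -> dict | None:
    """Return the most recent promotion of a service into an environment."""
    for line in reversed(log):
        entry = json.loads(line)
        if entry["service"] == service and entry["target_environment"] == environment:
            return entry
    return None
```

With records like these, "which artifact is in production, who promoted it, and what is the rollback target" becomes a lookup rather than an investigation.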

Where It Breaks

Failure mode	Why it happens	Platform response
Catalog drift	Teams update infrastructure without updating lifecycle state	Make lifecycle state the input to automation, not a passive record
Permanent deprecation	Owners mark services deprecated but never migrate consumers	Require migration deadlines, dependency reports, and escalation paths
Unsafe archive	Runtime is disabled before traffic reaches zero	Gate archive on observed traffic absence over a defined window
Zombie services	Deleted services leave secrets, DNS, jobs, or dashboards behind	Use finalizers and cleanup tasks for each external system
Overloaded gates	Every service must satisfy heavyweight production controls	Tier services by risk, data sensitivity, and exposure
Manual exceptions	Emergency work bypasses the workflow and is never reconciled	Allow break-glass transitions with expiry and mandatory reconciliation
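
The "unsafe archive" row can be sketched as a traffic-absence gate in Python. The 14-day window, the daily sampling interval, and the `(timestamp, request_count)` sample shape are assumptions; the key design choice is that missing telemetry counts as a failure, not as proof of silence.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def safe_to_archive(
    samples: list[tuple[datetime, int]],
    now: datetime,
    window: timedelta = timedelta(days=14),
    sample_interval: timedelta = timedelta(days=1),
) -> bool:
    """True only if telemetry covers the whole window and shows zero requests."""
    start = now - window
    in_window = [(ts, count) for ts, count in samples if start <= ts <= now]
    if not in_window:
        return False  # no telemetry is not evidence of no traffic
    earliest = min(ts for ts, _ in in_window)
    if earliest > start + sample_interval:
        return False  # telemetry does not reach back to the start of the window
    return all(count == 0 for _, count in in_window)
```

The coverage check here only looks at the earliest sample, so gaps in the middle of the window would go unnoticed; a production gate would likely check sample continuity as well.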

The architecture fails when the lifecycle controller becomes theater. If people can deploy a service that the catalog says is archived, the catalog is not a control plane. If deletion can happen without checking consumers, the workflow is not protecting anything. If every exception is permanent, the model will decay into labels.
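
One way to keep exceptions from becoming permanent is to give every break-glass grant an expiry and to block further transitions until expired grants are reconciled. The Python sketch below assumes a hypothetical `BreakGlass` record and a four-hour default lifetime; both are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BreakGlass:
    service: str
    reason: str
    granted_by: str
    expires_at: datetime
    reconciled: bool = False


def grant(service: str, reason: str, granted_by: str, now: datetime,
          ttl: timedelta = timedelta(hours=4)) -> BreakGlass:
    """Issue a time-boxed exception; a reason is mandatory."""
    if not reason.strip():
        raise ValueError("break-glass requires a reason")
    return BreakGlass(service, reason, granted_by, now + ttl)


def is_active(exception: BreakGlass, now: datetime) -> bool:
    """The bypass works only until it expires."""
    return now < exception.expires_at


def blocked_services(exceptions: list[BreakGlass], now: datetime) -> set[str]:
    """Services whose expired exceptions were never reconciled with the catalog."""
    return {
        e.service for e in exceptions
        if not e.reconciled and now >= e.expires_at
    }
```

A lifecycle controller could refuse any further transition for services returned by `blocked_services`, which turns reconciliation from a reminder into a precondition.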

What to Do Next

Problem: Service lifecycle is usually inferred from repositories, deployments, and documentation, which leaves ownership, traffic, dependencies, and cleanup scattered across systems.

Solution: Make lifecycle an explicit state machine owned by the platform: create, promote, active, deprecate, archive, delete. Put transition rules in automation and make downstream systems consume lifecycle state.

Proof: Use evidence gates from existing architectural patterns: controller finalizers for cleanup, archive states for read-only preservation, and environment promotion checks for provenance.

Action: Start with one service type. Add catalog state, promotion evidence, deprecation warnings, archive enforcement, and delete finalizers. Then block one unsafe transition at a time until lifecycle state becomes the operational source of truth.

Rajiv
