Feature Flags vs Deployments: Separating Release From Risk
A deployment moves code into production; a release changes who can be hurt by that code.
Situation
Modern engineering organizations deploy more often than they announce features. The production environment is no longer a ceremonial destination at the end of a release train. It is where compatibility is proven, latency is measured, dependencies are exercised, and operational confidence is built.
That shift changes the job of the platform team. The platform is not merely a build runner that turns commits into containers. It is a risk control system. It decides how artifacts move, how quickly blast radius expands, which health signals pause the rollout, who can change runtime behavior, and how stale release controls are retired.
Feature flags entered this picture because deployment and release are different control loops. Deployment answers: is this version of the software safely installed? Release answers: should this behavior be visible to this actor, in this environment, right now?
Those loops move at different speeds. A Kubernetes deployment may take minutes. A product release may take days. A kill switch may need to act in seconds. Treating all three as the same operation turns every rollout into an expensive, high-pressure redeploy.
The Problem
The common failure is using deployments as the only release mechanism. A team merges a change, builds an artifact, deploys it through staging, promotes it to production, and assumes the release is complete because the pipeline is green. That works until the defect is not a crash but a behavior that pipeline checks never exercise.
Some failures only appear under production traffic shape: a cache key with unexpected cardinality, an authorization edge case in one tenant, a search index path that melts under skew, or a user interface flow that drives support volume. Rolling back the deployment may be too blunt. The artifact might contain ten unrelated fixes, a database migration that must not be reversed, or backward-compatible API changes already consumed by another service.
Feature flags solve part of this, but they introduce their own failure mode: invisible production branches that never die. A flag without ownership, expiry, observability, and cleanup is just deferred complexity. Each boolean flag can double the number of code paths to test, and unowned flags confuse incident response and turn code search into archaeology.
So the architecture question is not “should we use feature flags?” It is: how do we separate deployment from release without creating a second, ungoverned deployment system?
Answer — A Release Control Plane
The answer is a release control plane: a small, explicit platform layer that treats deployment artifacts, flag state, rollout policy, and observability as separate but connected objects.
flowchart TD
A[commit merged — behavior hidden] --> B[build artifact — immutable version]
B --> C[deployment pipeline — place code safely]
C --> D[production runtime — flag evaluates request]
D --> E{release decision}
E -->|off by default| F[dark code path — no customer exposure]
E -->|targeted cohort| G[limited exposure — monitored blast radius]
G --> H[observability guardrails — metrics and errors]
H -->|healthy| I[progressive rollout — larger audience]
H -->|unhealthy| J[disable flag — stop exposure]
J --> D
I --> K[remove flag — delete dead branch]
In this model, the deployment pipeline owns artifact safety. It builds once, verifies once, promotes immutably, and rolls back versions when the installed software is bad. The flag system owns exposure safety. It decides whether a behavior is dark, internal-only, tenant-targeted, percentage-based, or globally enabled.
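The exposure states named above can be sketched as a small evaluation function. This is a minimal illustration, not any vendor's API: `Exposure`, `Actor`, `FlagState`, `bucket`, and `is_enabled` are hypothetical names, and hashing the flag key with the user ID is one assumed way to keep percentage cohorts stable.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class Exposure(Enum):
    DARK = "dark"
    INTERNAL = "internal"
    TENANTS = "tenants"
    PERCENTAGE = "percentage"
    GLOBAL = "global"


@dataclass(frozen=True)
class Actor:
    user_id: str
    tenant_id: str
    is_internal: bool = False


@dataclass(frozen=True)
class FlagState:
    key: str
    exposure: Exposure = Exposure.DARK  # new flags start dark
    tenants: frozenset = frozenset()    # used only for TENANTS exposure
    percentage: int = 0                 # 0-100, used only for PERCENTAGE exposure


def bucket(flag_key: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(state: FlagState, actor: Actor) -> bool:
    """Decide exposure for one actor; the deployed artifact is unchanged."""
    if state.exposure is Exposure.DARK:
        return False
    if state.exposure is Exposure.INTERNAL:
        return actor.is_internal
    if state.exposure is Exposure.TENANTS:
        return actor.tenant_id in state.tenants
    if state.exposure is Exposure.PERCENTAGE:
        return bucket(state.key, actor.user_id) < state.percentage
    return state.exposure is Exposure.GLOBAL
```

Because the bucket depends only on the flag key and user ID, raising the percentage adds users to the exposed cohort without removing anyone who was already in it.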
The important design point is that flags are not merely if statements. They are operational resources. They need metadata: owner, purpose, creation date, expiry date, default state, allowed environments, rollout plan, linked dashboard, and cleanup issue. Without that metadata, the platform cannot distinguish a short-lived release toggle from a permanent permission model or an experiment.
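One way to make that metadata concrete is a typed record plus a check that reports what is missing. The sketch below is an assumption about shape, not a prescribed schema: `FlagMetadata` and `missing_fields` are hypothetical names, and the example values are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FlagMetadata:
    key: str
    owner: str
    purpose: str
    created: date
    expires: Optional[date]              # None for intentionally long-lived flags
    default_enabled: bool
    allowed_environments: tuple[str, ...]
    rollout_plan: str
    dashboard_url: str
    cleanup_issue: str


def missing_fields(meta: FlagMetadata) -> list[str]:
    """Return the names of required text fields that are empty or blank."""
    required = ("owner", "purpose", "rollout_plan", "dashboard_url", "cleanup_issue")
    return [name for name in required if not getattr(meta, name).strip()]


# Hypothetical example: a release flag registered without a cleanup issue.
example = FlagMetadata(
    key="new-checkout",
    owner="payments-team",
    purpose="Hide the rewritten checkout flow until rollout completes",
    created=date(2024, 1, 10),
    expires=date(2024, 2, 10),
    default_enabled=False,
    allowed_environments=("staging", "production"),
    rollout_plan="internal, then 5%, 25%, 100%",
    dashboard_url="https://example.com/dashboards/new-checkout",
    cleanup_issue="",
)
```

A registration step could reject `example` until `missing_fields(example)` returns an empty list.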
The platform should also distinguish flag types:
| Flag type | Purpose | Expected lifetime | Failure response |
|---|---|---|---|
| Release flag | Hide incomplete or risky behavior | Days or weeks | Disable behavior |
| Ops flag | Reduce load or bypass a dependency path | As short as possible | Disable or degrade |
| Experiment flag | Compare behavior across cohorts | Experiment window | Stop experiment |
| Permission flag | Entitlement or plan boundary | Long-lived | Treat as product logic |
| Migration flag | Coordinate expand and contract rollout | Until migration completes | Pause migration |
That classification matters because the platform policy should be different for each type. A release flag should fail a hygiene check if it survives too long. A permission flag should not be deleted just because it is old. An ops flag should have incident documentation. An experiment flag should have cohort stability and analysis ownership.
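A per-type policy check might look like the following sketch. The 30-day limit for release flags is an assumed value chosen for illustration, and `FlagType`, `MAX_RELEASE_AGE`, and `hygiene_violations` are hypothetical names.

```python
from __future__ import annotations

from datetime import date, timedelta
from enum import Enum


class FlagType(Enum):
    RELEASE = "release"
    OPS = "ops"
    EXPERIMENT = "experiment"
    PERMISSION = "permission"
    MIGRATION = "migration"


# Assumed limit for illustration; pick a value that fits your release cadence.
MAX_RELEASE_AGE = timedelta(days=30)


def hygiene_violations(
    flag_type: FlagType,
    created: date,
    today: date,
    *,
    has_incident_doc: bool = False,
    has_stable_cohorts: bool = False,
    has_analysis_owner: bool = False,
) -> list[str]:
    """Return policy violations for one flag; the rules differ by flag type."""
    violations = []
    if flag_type is FlagType.RELEASE and today - created > MAX_RELEASE_AGE:
        violations.append("release flag exceeded maximum age")
    if flag_type is FlagType.OPS and not has_incident_doc:
        violations.append("ops flag has no incident documentation")
    if flag_type is FlagType.EXPERIMENT:
        if not has_stable_cohorts:
            violations.append("experiment flag has unstable cohorts")
        if not has_analysis_owner:
            violations.append("experiment flag has no analysis owner")
    # Permission and migration flags are never failed on age alone.
    return violations
```

Keeping the rules in one function makes the difference between flag types reviewable in a single place, which is the point of classifying them.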
In Practice
Context: Martin Fowler’s feature toggle taxonomy documents release toggles as a way of separating feature release from code deployment, and it also warns that release toggles should be transitional rather than permanent architecture. The documented pattern is that flags buy decoupling, but only if teams retire them after the release decision is complete. Source: Feature Toggles.
Action: Use flags for runtime exposure, not as a substitute for deployment discipline. The deployment artifact should still be tested, promoted, versioned, and rollback-capable. Kubernetes documents rolling updates and rollout undo (kubectl rollout undo) as deployment-level controls; those controls remain necessary even when every risky feature is hidden behind a flag. Source: Kubernetes rolling updates.
Result: The documented pattern is two independent rollback paths. If the container image is bad, roll back the deployment. If the code is installed correctly but the new behavior is unsafe for a cohort, disable the flag. This reduces the number of incidents where the only available response is a full redeploy.
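The two paths can be written down as a deliberately small decision function. This is a sketch of the reasoning only: `Response` and `choose_response` are hypothetical names, and real incident triage would take more signals than two booleans.

```python
from enum import Enum


class Response(Enum):
    ROLL_BACK_DEPLOYMENT = "roll back the deployment"
    DISABLE_FLAG = "disable the flag"
    NO_ACTION = "no action"


def choose_response(artifact_healthy: bool, flagged_behavior_healthy: bool) -> Response:
    """Pick the narrowest response that removes the risk."""
    if not artifact_healthy:
        # The installed version itself is bad, so no flag state can fix it.
        return Response.ROLL_BACK_DEPLOYMENT
    if not flagged_behavior_healthy:
        # The code is installed correctly; only the exposure needs to stop.
        return Response.DISABLE_FLAG
    return Response.NO_ACTION
```

The artifact check comes first because a broken image affects every request, whether or not any flag is on.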
Learning: Feature flag configuration is production configuration. Amazon’s Builders’ Library describes safe deployment pipelines with staged rollout, monitoring, bake time, and automatic rollback; it also notes that configuration and feature flag changes need the same kind of safety thinking because a bad configuration change can affect production like a bad code change. Source: Automating safe, hands-off deployments.
Context: GitLab’s public documentation describes feature flags as a way to deploy features early and roll them out incrementally, with states that start disabled, become enabled by default, and are later removed. GitLab’s development documentation also describes short-lived de-risking flags with a maximum lifespan and rollout issue. Sources: GitLab administration feature flags and GitLab development feature flags.
Action: Encode those practices into platform automation. Require a flag owner. Require a rollout issue. Require an expiry date for release flags. Require dashboards before percentage rollout. Add CI checks that fail when expired flags remain in code. Add a weekly report of stale flags grouped by owning team.
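The expired-flag CI check and the stale-flag report could be sketched as below. The registry shape, the `is_enabled("...")` call pattern, and every function name here are assumptions for illustration; a real check would read flag definitions from wherever the platform stores them.

```python
from __future__ import annotations

import re
from collections import defaultdict
from datetime import date

# Assumed convention: flags are read in code as is_enabled("flag-key").
FLAG_CALL = re.compile(r"""is_enabled\(\s*["']([A-Za-z0-9_.-]+)["']""")


def referenced_flags(source: str) -> set[str]:
    """Return every flag key referenced in one source file's text."""
    return set(FLAG_CALL.findall(source))


def expired_flags_in_code(
    sources: dict[str, str], registry: dict[str, dict], today: date
) -> dict[str, list[str]]:
    """Map each expired flag key to the files that still reference it."""
    hits = defaultdict(list)
    for path, text in sorted(sources.items()):
        for key in sorted(referenced_flags(text)):
            entry = registry.get(key)
            # Flags with no expiry (for example permission flags) never expire here.
            if entry and entry["expires"] is not None and entry["expires"] < today:
                hits[key].append(path)
    return dict(hits)


def stale_report_by_team(
    hits: dict[str, list[str]], registry: dict[str, dict]
) -> dict[str, list[str]]:
    """Group expired flag keys by owning team for a periodic report."""
    report = defaultdict(list)
    for key in sorted(hits):
        report[registry[key]["owner"]].append(key)
    return dict(report)


def exit_code(hits: dict[str, list[str]]) -> int:
    """Return 1 so the CI job fails when any expired flag is still referenced."""
    return 1 if hits else 0
```

A CI job would load the file contents and registry, call `expired_flags_in_code`, print the result, and exit with `exit_code(hits)`; the same `hits` can feed the weekly report.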
Result: The documented pattern becomes enforceable workflow instead of tribal memory. Engineers still move quickly, but the system makes hidden branches visible and forces cleanup before release controls become permanent debt.
Learning: The best flag platform is boring. It does not make every engineer learn a new release philosophy. It gives them a predictable way to ship dark, expose narrowly, watch health, expand gradually, stop quickly, and delete the branch when the release is done.
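The expand-or-stop loop described here fits in a few lines. In this sketch the stage percentages and guardrail thresholds are assumed values, and `Health`, `Guardrail`, and `next_percentage` are hypothetical names rather than part of any cited tool.

```python
from dataclasses import dataclass

# Assumed rollout stages, as a percentage of traffic exposed.
STAGES = (1, 5, 25, 50, 100)


@dataclass(frozen=True)
class Health:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float


@dataclass(frozen=True)
class Guardrail:
    max_error_rate: float = 0.01        # assumed threshold
    max_p99_latency_ms: float = 500.0   # assumed threshold

    def healthy(self, health: Health) -> bool:
        return (
            health.error_rate <= self.max_error_rate
            and health.p99_latency_ms <= self.max_p99_latency_ms
        )


def next_percentage(current: int, health: Health, guardrail: Guardrail) -> int:
    """Return 0 if unhealthy, otherwise the next larger stage (capped at 100)."""
    if not guardrail.healthy(health):
        return 0  # stop exposure; the deployed artifact stays where it is
    for stage in STAGES:
        if stage > current:
            return stage
    return 100
```

Calling `next_percentage` once per bake period gives a rollout that only widens while the guardrails hold and drops to zero exposure the moment they do not.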
Where It Breaks
| Failure mode | Why it happens | Mitigation |
|---|---|---|
| Flag sprawl | Flags are easy to create and hard to remove | Expiry dates, owners, cleanup checks |
| Untested combinations | Multiple flags create behavior permutations | Test canonical states, not every permutation |
| Slow flag evaluation | Runtime checks call remote services too often | Local caching, streaming updates, sane defaults |
| Unsafe defaults | Missing config enables risky behavior | Default to off (fail closed) for release and ops flags |
| Incident confusion | On-call cannot tell which behavior is active | Flag audit log and dashboard links |
| Data migration coupling | New behavior depends on irreversible schema changes | Expand and contract migrations with separate flags |
| Product policy leakage | Permission logic is mixed with release toggles | Separate entitlement flags from release flags |
| Stale dark code | Disabled branches remain after launch | Automated stale flag reporting and deletion work |
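Two of the mitigations in the table, local caching and safe defaults, can be combined in one small client. This is a sketch under assumptions: `CachedFlagClient` is a hypothetical class, `fetch` stands in for whatever call retrieves flag state from the flag service, and the 30-second TTL is an arbitrary example.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class CachedFlagClient:
    """Serve flag reads from a local cache and fall back to safe defaults."""

    def __init__(
        self,
        fetch: Callable[[], dict[str, bool]],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, bool] = {}
        self._fetched_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl:
            return  # cache is fresh; no remote call on the request path
        try:
            self._cache = dict(self._fetch())
        except Exception:
            # Keep the last known state. If nothing was ever fetched, the
            # cache stays empty and callers receive their default.
            pass
        # Record the attempt even on failure so a broken flag service is
        # retried once per TTL rather than on every request.
        self._fetched_at = now

    def is_enabled(self, key: str, default: bool = False) -> bool:
        """Unknown or unavailable flags evaluate to the default, which is off."""
        self._refresh_if_stale()
        return self._cache.get(key, default)
```

The `default=False` parameter is where "default closed" becomes code: a missing key or an unreachable flag service leaves the new behavior dark.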
What to Do Next
- Problem: Audit the last ten production incidents and identify which ones required redeploying code when a runtime exposure control would have been safer.
- Solution: Define three first-class objects in the platform: deployment artifact, feature flag, and rollout policy. Give each object ownership, history, and rollback semantics.
- Proof: Require every release flag to link to health metrics, an owner, a rollout plan, and a cleanup issue before it can reach production.
- Action: Start with one service. Add flag metadata, progressive rollout, audit logging, expiry checks, and stale-flag CI enforcement before scaling the pattern across the organization.
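For the three first-class objects, ownership, history, and rollback can share one shape. The sketch below is one possible design, not a prescribed model: `Change` and `ControlObject` are hypothetical names, and treating rollback as a new appended entry is an assumed choice that keeps the audit trail intact.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Change:
    actor: str
    value: object
    reason: str
    at: datetime


@dataclass
class ControlObject:
    """A deployment artifact, feature flag, or rollout policy with history."""

    name: str
    owner: str
    history: list = field(default_factory=list)

    def record(self, actor: str, value: object, reason: str) -> None:
        """Append a change; earlier entries are never edited or removed."""
        self.history.append(Change(actor, value, reason, datetime.now(timezone.utc)))

    @property
    def current(self) -> object:
        return self.history[-1].value if self.history else None

    def rollback(self, actor: str, reason: str) -> None:
        """Restore the previous value by appending it as a new change."""
        if len(self.history) < 2:
            raise ValueError("nothing to roll back to")
        self.record(actor, self.history[-2].value, reason)


# Hypothetical usage: a flag is enabled, then rolled back during an incident.
flag = ControlObject(name="new-checkout", owner="payments-team")
flag.record("alice", False, "created dark")
flag.record("alice", True, "enable for all tenants")
flag.rollback("on-call", "error rate spike")
```

After these calls `flag.current` is `False` and the history holds three entries, so on-call can see who changed what and why. Note that rolling back twice in a row returns to the value that was active before the first rollback.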