Per-App Postgres on Kubernetes Changes the Failure Boundary
Per-application PostgreSQL does not make databases easier to operate; it makes the failure boundary smaller and the operating contract larger. The trade is worth considering only when the platform can prove that every declared database can fail over, rotate credentials, archive WAL, restore into a clean namespace, and survive Kubernetes maintenance without relying on tribal memory.
Situation
The old platform default was a shared managed PostgreSQL cluster with many application databases. It is efficient, familiar, and often the right answer. It also couples teams through change windows, noisy neighbors, backup policy, major-version lifecycle, and shared operational risk.
The newer pattern is one PostgreSQL cluster per application, declared in Git and reconciled by a Kubernetes operator such as CloudNativePG. That changes what the platform owns. The platform is no longer only offering “a database”; it is offering a repeatable database lifecycle.
| Default model | Alternative model | What changes |
|---|---|---|
| One shared managed PostgreSQL cluster, many databases | One CloudNativePG cluster per application | Failure moves from shared infrastructure to per-service blast radius |
| Central database administrator controls change windows | GitOps declares database intent per service | Review moves into pull requests, admission policy, and runbooks |
| Backups and upgrades handled at the shared cluster level | Backups and upgrades handled per cluster | More isolation, more fleet operations |
| Credentials and connectivity are centrally managed | Secrets are synchronized into each namespace | Rotation becomes an end-to-end workflow, not a secret-store update |
| Database operations are concentrated in a few large systems | Database operations are repeated across many smaller systems | Templates, policy, alerts, and restore drills become the product |
CloudNativePG makes this viable because PostgreSQL becomes a Kubernetes custom resource. Argo CD can reconcile the database intent from Git. External Secrets Operator can pull credentials from Azure Key Vault or another external store into Kubernetes Secrets. Kustomize overlays can keep environment differences explicit.
That is a strong architecture. It is not managed-database simplicity with YAML in front of it.
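As a minimal sketch of what "explicit environment differences" could look like, a staging overlay might patch only the fields that differ from a shared base. The directory layout, the base Cluster name `linkding-db`, the namespace, and the reduced values are assumptions for illustration, not taken from a real repository.

```yaml
# overlays/staging/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: linkding-staging          # hypothetical namespace
resources:
  - ../../base                       # base is assumed to hold the shared Cluster manifest
patches:
  - target:
      group: postgresql.cnpg.io
      version: v1
      kind: Cluster
      name: linkding-db              # hypothetical base resource name
    patch: |-
      # JSON 6902 "replace" requires both paths to exist in the base manifest.
      - op: replace
        path: /spec/instances
        value: 2
      - op: replace
        path: /spec/storage/size
        value: 20Gi
```

Keeping the overlay this small makes a pull request show exactly how staging departs from production, which is the review property the pattern depends on.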
The Problem
The operator can create the cluster. That is the least interesting part.
The production question is whether the database survives the ordinary failures: node drains, bad migrations, storage latency, broken WAL archiving, stale credentials, object-store access errors, version drift, and emergency changes made while GitOps is still reconciling the old state.
| Failure point | What breaks | Why it matters |
|---|---|---|
| Shared cluster migrations | One application’s migration can saturate I/O, bloat catalogs, or hold locks visible to unrelated tenants | Per-database isolation inside one PostgreSQL instance is not operational isolation |
| GitOps self-healing | Argo CD can reapply the desired state after manual emergency changes when selfHeal: true is enabled | Incident response needs a documented reconciliation pause; with self-heal enabled, Argo CD retries the sync after a default self-heal timeout of 5 seconds (Argo CD docs) |
| Backup configuration | WAL archives exist, but the physical base backup is missing, stale, or unrecoverable | CloudNativePG’s docs warn that a WAL archive alone is not a restore strategy (CloudNativePG backup docs) |
| Kubernetes storage | PostgreSQL restarts cleanly, but the StorageClass has poor latency, weak snapshot behavior, or unsafe reclaim defaults | A database operator cannot paper over unreliable persistent volume semantics |
| Secret rotation | External Secrets updates a Kubernetes Secret, but PostgreSQL roles and application connection pools keep using old credentials | Secret synchronization is not end-to-end credential rotation |
| Version drift | A manifest copied from an older CloudNativePG example keeps working until the operator lifecycle changes | Starting with CloudNativePG 1.26, backup and recovery capabilities are moving toward CNPG-I plugins, so backup templates need version review (CloudNativePG backup docs) |
The right question is not “can Kubernetes run PostgreSQL?” It can. The better question is: what operational boundary are you buying, and what repeated work are you accepting for every application database?
Architecture Problem
The shared database model and the per-application database model solve different coordination problems. In the shared model, operational consistency is achieved at the cost of coupling. In the per-application model, coupling is removed at the cost of operational repetition.
The architectural problem is not technical feasibility. Kubernetes can schedule PostgreSQL pods. CloudNativePG can declare a cluster as a custom resource. Argo CD can reconcile it from Git. External Secrets Operator can synchronize credentials into namespaces. These mechanisms are documented and widely deployed.
The actual architectural problem is: which operational concerns can be automated once at the platform layer, and which must be repeated per database — and is the platform mature enough to absorb the repetition safely?
The failure mode of the shared model is coupling: one application’s migration, bloat, or connection saturation affects every tenant of the cluster. The failure mode of the per-application model is multiplication: every new database adds backup monitoring, restore verification, credential rotation, upgrade planning, and failover testing. If these are not templated, tested, and owned by platform tooling, the per-application model exchanges shared risk for invisible risk.
Design Options
Three options are in common use, and each distributes risk and work differently.
| Option | Description | Coupling risk | Multiplication risk | Recommended for |
|---|---|---|---|---|
| Shared managed cluster | One cloud-managed PostgreSQL cluster hosts many application databases; DBA team or cloud provider owns operations | High — shared change windows, noisy neighbors, shared version lifecycle | Low — operations are centralized | Teams early in database operational maturity; stable workloads without strict isolation requirements |
| Per-app PostgreSQL, manual management | Each application gets a dedicated cloud-managed database instance; teams manage their own backups, creds, and versions | Low — isolated failure boundary | High — no shared templates, policy, or tooling | Teams that need isolation but cannot invest in a Kubernetes-native platform |
| Per-app PostgreSQL via operator (CloudNativePG + GitOps) | Kubernetes operator reconciles PostgreSQL clusters from Git; external secrets, backups, monitoring, and failover are declared resources | Low — each application cluster is independent | Medium — operator and templates absorb repetition, but restore drills and upgrade testing must still run per cluster | Teams with mature Kubernetes platform capability and willingness to own the database lifecycle |
The shared managed cluster (Option A) should remain the default until coupling failure modes are actively limiting teams. The argument for per-app databases should be made from incident reports and blocking dependencies, not from preference for patterns.
Per-app PostgreSQL with manual management (Option B) increases operational isolation without a shared template layer. Teams that choose this option often discover that they have recreated the shared-cluster problem in a distributed form: many databases with inconsistent backup policies, no shared restore testing, and no centralized visibility into credential expiry or disk saturation.
Per-app PostgreSQL via an operator (Option C) is the strongest option when the platform investment has been made. CloudNativePG provides a consistent operator lifecycle, standardized service semantics, and Prometheus integration. GitOps provides audit history, review gates, and reconciliation. External Secrets Operator synchronizes credentials from the external store into each namespace. The platform team owns the templates, admission policy, and restore drill cadence. Application teams declare their database intent and rely on the platform to handle the lifecycle correctly.
Tradeoff Matrix
| Dimension | Shared managed cluster | Per-app managed instances | Per-app operator (CloudNativePG) |
|---|---|---|---|
| Failure blast radius | Shared across all tenants | Per application | Per application |
| Noisy neighbor risk | High | None | None |
| Operational repetition | Low | High | Medium — templates absorb most repetition |
| Backup and restore | Centralized, consistent | Per-team, inconsistent without tooling | Per-cluster, consistent if platform owns templates |
| Credential rotation | Central secret store | Per-instance manual or scripted | External Secrets + per-cluster runbook |
| Version upgrades | Scheduled at cluster level | Per-instance, team-owned | Per-cluster, GitOps-managed |
| GitOps compatibility | External to database | External to database | Native — cluster is a Kubernetes custom resource |
| Restore drill burden | One drill for shared cluster | One drill per instance | One drill per cluster, run from a shared template, with cadence set by tier (production, staging) |
| Platform investment | Low | Low | High — operator lifecycle, policy, monitoring, templates |
Core Concept: Per-App PostgreSQL as a Declared Failure Boundary
A per-application PostgreSQL cluster works when the platform treats the database manifest as an operating contract, not a deployment snippet.
flowchart TD
Dev[developer commit] --> Git[Git repository — apps and databases]
Git --> Argo[Argo CD — reconcile desired state]
Argo --> App[application namespace]
Argo --> CNPGCluster[CloudNativePG Cluster resource]
KeyVault[external secret store] --> ESO[External Secrets Operator]
ESO --> K8sSecret[Kubernetes Secret]
K8sSecret --> App
K8sSecret --> CNPGCluster
CNPG[CloudNativePG operator] --> Primary[PostgreSQL primary]
CNPG --> ReplicaA[PostgreSQL replica]
CNPG --> ReplicaB[PostgreSQL replica]
App --> RWService[cluster rw service]
RWService --> Primary
Primary --> WAL[WAL archive in object storage]
ReplicaA --> WAL
ReplicaB --> WAL
Backup[scheduled base backup] --> ObjectStore[object storage recovery boundary]
CloudNativePG creates service endpoints for each cluster: rw points to the current primary, ro points to replicas when available, and r can point to any instance. The rw service is essential and cannot be disabled because CloudNativePG relies on it for PostgreSQL replication behavior (CloudNativePG service docs). Application write traffic should use the generated *-rw service unless there is a deliberately tested routing layer in front of it.
A production-grade manifest should look less like a tutorial and more like a contract:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: linkding-db-prod
  labels:
    app.kubernetes.io/name: linkding
    platform.example.com/owner: bookmarks
    platform.example.com/tier: production
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4
  storage:
    size: 100Gi
    storageClass: premium-rwo
  resources:
    requests:
      cpu: "500m"
      memory: 2Gi
    limits:
      memory: 4Gi
  monitoring:
    enablePodMonitor: true
  bootstrap:
    initdb:
      database: linkding
      owner: linkding
      secret:
        name: linkding-db-owner
  backup:
    barmanObjectStore:
      destinationPath: https://example.blob.core.windows.net/postgres/linkding
      azureCredentials:
        storageAccount:
          name: linkding-backup-creds
          key: storage-account
        storageSasToken:
          name: linkding-backup-creds
          key: sas-token
      wal:
        compression: gzip
      data:
        compression: gzip
    retentionPolicy: 14d
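The manifest refers to a Secret named `linkding-db-owner` without showing where it comes from. One plausible source is an ExternalSecret that renders a basic-auth Secret from the external store. This is a sketch under assumptions: the `apiVersion` depends on the installed External Secrets Operator release, and the store name `azure-key-vault`, the remote key `linkding-db-owner-password`, the namespace, and the refresh interval are all hypothetical.

```yaml
apiVersion: external-secrets.io/v1beta1   # assumption: newer ESO releases serve external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: linkding-db-owner
  namespace: linkding-prod
spec:
  refreshInterval: 1h                     # assumed cadence; this only refreshes the Kubernetes Secret
  secretStoreRef:
    kind: ClusterSecretStore
    name: azure-key-vault                 # hypothetical store managed by the platform team
  target:
    name: linkding-db-owner               # the Secret referenced by bootstrap.initdb.secret
    creationPolicy: Owner
    template:
      type: kubernetes.io/basic-auth      # username and password keys, as CloudNativePG examples use
      data:
        username: linkding
        password: "{{ .password }}"
  data:
    - secretKey: password
      remoteRef:
        key: linkding-db-owner-password   # hypothetical entry in the external store
```

A refreshed Secret changes only the value stored in Kubernetes. Whether the PostgreSQL role and the application's connection pool pick it up is a separate step that the rotation runbook has to cover.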
The contract is not complete until it has tests.
- Split day-0 infrastructure from day-2 database intent.
Install CloudNativePG, External Secrets Operator, Argo CD, monitoring CRDs, admission policy, namespaces, and storage classes through Terraform or another cluster-admin workflow. Application repositories should declare database intent, not own operator installation.
Verification:
kubectl auth can-i create clusters.postgresql.cnpg.io -n linkding-prod
kubectl auth can-i update deployment cloudnative-pg -n cnpg-system
kubectl auth can-i patch storageclass premium-rwo
The expected shape is narrow: application delivery can create its own Cluster resource in its namespace, but cannot modify the operator deployment, cluster-wide secret stores, or storage classes.
- Make policy enforce the minimum contract.
For production clusters, reject manifests that omit ownership labels, resource requests, monitoring, backup configuration, explicit storage class, or a three-instance topology.
A CI or admission rule should fail a manifest like this:
spec:
  instances: 1
  storage:
    size: 5Gi
The exact policy engine is less important than the invariant. Kyverno, OPA Gatekeeper, Conftest, or a custom CI check can all work. The point is to stop “temporary” database YAML from becoming production state.
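As one possible encoding of that invariant, the sketch below uses a Kyverno validation pattern. It assumes Kyverno is installed, that production clusters carry the `platform.example.com/tier: production` label used in the manifest above, and that backups still use the in-tree `barmanObjectStore` stanza. The policy name and message are hypothetical, and the placement of the enforcement field varies between Kyverno releases, so check it against the installed version.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cnpg-production-contract    # hypothetical policy name
spec:
  validationFailureAction: Enforce          # assumption: some releases expect this at rule level instead
  background: true
  rules:
    - name: production-cluster-minimums
      match:
        any:
          - resources:
              kinds:
                - postgresql.cnpg.io/v1/Cluster
              selector:
                matchLabels:
                  platform.example.com/tier: production
      validate:
        message: >-
          Production PostgreSQL clusters need an owner label, at least three
          instances, an explicit storage class, resource requests, monitoring,
          and a backup destination.
        pattern:
          metadata:
            labels:
              platform.example.com/owner: "?*"   # any non-empty value
          spec:
            instances: ">=3"
            storage:
              storageClass: "?*"
            resources:
              requests:
                cpu: "?*"
                memory: "?*"
            monitoring:
              enablePodMonitor: true
            backup:
              barmanObjectStore:
                destinationPath: "?*"
```

Because the rule matches on the tier label, a manifest that simply omits the label would slip past it. Matching on production namespaces instead, or requiring the tier label in a second rule, closes that gap.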
- Route applications through the CloudNativePG read-write service.
Do not hardcode pod names. Do not point applications at ordinal 0. Do not teach application teams that the first pod is the primary. In a failover, the application needs the service abstraction to follow the writable instance.
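A sketch of what that routing could look like on the application side follows. CloudNativePG names the read-write service `<cluster-name>-rw`, so the hostname is derived from the manifest above. The image, the extra component label, and the use of libpq-style `PG*` environment variables are assumptions; the real application may read different variable names.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: linkding
  namespace: linkding-prod
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: linkding
      app.kubernetes.io/component: web        # hypothetical label to keep the selector specific
  template:
    metadata:
      labels:
        app.kubernetes.io/name: linkding
        app.kubernetes.io/component: web
    spec:
      containers:
        - name: linkding
          image: registry.example.com/linkding:1.0.0   # placeholder image reference
          env:
            - name: PGHOST                    # assumption: the app honors libpq-style variables
              value: linkding-db-prod-rw.linkding-prod.svc
            - name: PGPORT
              value: "5432"
            - name: PGDATABASE
              value: linkding
            - name: PGUSER
              valueFrom:
                secretKeyRef:
                  name: linkding-db-owner
                  key: username
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: linkding-db-owner
                  key: password
```

The hostname never names a pod, so a promotion changes where the service points without any change to this manifest.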
Verification:
kubectl -n linkding-prod get cluster linkding-db-prod \
  -o jsonpath='{.status.currentPrimary}{"\n"}'
kubectl -n linkding-prod delete pod "$(kubectl -n linkding-prod get cluster linkding-db-prod \
  -o jsonpath='{.status.currentPrimary}')"
kubectl -n linkding-prod wait cluster/linkding-db-prod \
  --for=condition=Ready \
  --timeout=300s
kubectl -n linkding-prod get cluster linkding-db-prod \
  -o jsonpath='{.status.currentPrimary}{"\n"}'
Then verify the application can still write through the same hostname:
create table if not exists platform_failover_probe (
  id bigserial primary key,
  observed_at timestamptz not null default now()
);
insert into platform_failover_probe default values;
select count(*) from platform_failover_probe;
A changed primary is not enough. The application write must succeed without changing connection strings.
- Prove recovery before calling the platform production-ready.
CloudNativePG can archive WAL to object storage and recover from physical backups. For Barman object-store backups, current CloudNativePG docs say the operator sets archive_timeout to 5min by default, giving a deterministic time-based RPO boundary for low-write workloads (CloudNativePG object-store backup docs). That boundary is meaningful only after restore has been tested.
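WAL archiving needs a recent physical base backup to be useful, and the Cluster manifest shown earlier does not schedule one by itself. A minimal sketch of a ScheduledBackup for the same cluster follows. The resource name and the 02:00 schedule are assumptions, and the example assumes the in-tree object-store backup method rather than a plugin.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: linkding-db-prod-daily      # hypothetical name
  namespace: linkding-prod
spec:
  # CloudNativePG uses a six-field cron expression; the first field is seconds.
  schedule: "0 0 2 * * *"           # assumed: every day at 02:00
  backupOwnerReference: self
  cluster:
    name: linkding-db-prod
```

The schedule only creates Backup objects. Whether those backups are restorable is what the drill below is meant to establish.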
Verification:
kubectl -n linkding-prod apply -f - <<'YAML'
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: linkding-manual-restore-drill
spec:
  cluster:
    name: linkding-db-prod
YAML
kubectl -n linkding-prod get backup linkding-manual-restore-drill
A restore drill should create a new namespace, restore from object storage, run application migrations against the restored database, and record observed RTO and RPO. The output should be boring enough to put in a runbook:
| Drill field | Recorded value |
|---|---|
| Backup identifier | Exact backup object or CloudNativePG backup name |
| Restore namespace | Isolated namespace name |
| Restore start time | Timestamp |
| Application migration result | Pass or fail |
| Observed RTO | Measured duration |
| Observed RPO | Last committed test row recovered |
| Operator version | CloudNativePG version |
| PostgreSQL image | Exact image tag |
| StorageClass | Exact class |
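The restore half of the drill can be sketched as a second Cluster that bootstraps from the production object store inside an isolated namespace. The namespace and cluster names are hypothetical, the example assumes the in-tree `barmanObjectStore` configuration shown earlier rather than a CNPG-I plugin, and it assumes a copy of the `linkding-backup-creds` Secret exists in the drill namespace.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: linkding-db-restore                 # hypothetical drill cluster
  namespace: linkding-restore-drill         # hypothetical isolated namespace
spec:
  instances: 1                              # a drill does not need HA
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4
  storage:
    size: 100Gi
    storageClass: premium-rwo
  bootstrap:
    recovery:
      source: linkding-db-prod              # refers to the externalClusters entry below
  externalClusters:
    - name: linkding-db-prod
      barmanObjectStore:
        destinationPath: https://example.blob.core.windows.net/postgres/linkding
        serverName: linkding-db-prod        # folder name written by the production cluster
        azureCredentials:
          storageAccount:
            name: linkding-backup-creds
            key: storage-account
          storageSasToken:
            name: linkding-backup-creds
            key: sas-token
```

Timing the interval from applying this manifest to the first successful application write gives the observed RTO. Comparing the last probe row present after recovery with the last one written before the drill gives the observed RPO.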
- Make GitOps incident-aware.
Automated pruning and self-healing are useful until an incident commander needs to patch a live object. Argo CD automated sync does not prune by default; pruning and self-healing are explicit settings (Argo CD docs). Database resources need operational rules around those settings.
Verification:
argocd app set linkding-db-prod --sync-policy none
kubectl -n linkding-prod annotate cluster linkding-db-prod \
  incident.example.com/reconciliation-paused="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
# Apply the emergency change, then commit the final desired state back to Git.
argocd app set linkding-db-prod --sync-policy automated --self-heal --auto-prune
argocd app sync linkding-db-prod
The runbook should say who can pause reconciliation, how the change is recorded, and how drift is reconciled afterward.
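For reference, the settings that the commands above toggle live in the Application's `syncPolicy`. The sketch below shows them declared explicitly. The repository URL, path, project, and Argo CD namespace are hypothetical placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: linkding-db-prod
  namespace: argocd                          # assumed Argo CD install namespace
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/databases.git   # hypothetical repository
    targetRevision: main
    path: apps/linkding/overlays/prod        # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: linkding-prod
  syncPolicy:
    automated:
      prune: true       # off unless set; removes resources deleted from Git
      selfHeal: true    # off unless set; reverts manual changes to live objects
```

With `prune: true`, removing the Cluster manifest from Git would delete the database. Teams that keep pruning on may want the `argocd.argoproj.io/sync-options: Prune=false` annotation on the Cluster resource itself so that deletion always requires a deliberate step.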
- Monitor the database fleet, not just one cluster.
CloudNativePG provides predefined metrics and Prometheus integration. A PodMonitor for a cluster can be created by setting .spec.monitoring.enablePodMonitor: true, and CloudNativePG publishes Grafana dashboard material for the operator and clusters (CloudNativePG monitoring docs, Grafana dashboard).
Per-application databases multiply alert surfaces. That is acceptable only if ownership is encoded.
Minimum alert classes:
| Alert class | Why it matters |
|---|---|
| Replication lag | Failover safety depends on replicas being current enough for the workload |
| Failed WAL archiving | PITR depends on the archive, not only the running pods |
| Backup age | A configured backup policy can still fail silently |
| Disk saturation | PostgreSQL availability usually fails gradually before it fails completely |
| Failover events | The application may need connection-pool and retry validation after promotion |
| Certificate or secret expiry | A synchronized Secret does not prove clients are using it correctly |
| External Secrets sync errors | The Kubernetes Secret can drift from the external source |
| Object-store errors | Restore readiness depends on credentials, network path, and storage availability |
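A few of these alert classes can be sketched as a PrometheusRule. This assumes the Prometheus Operator CRDs are installed and that the CloudNativePG exporter exposes the metric names used here; those names and all thresholds should be checked against the installed operator version, particularly where backups have moved to a plugin. The rule name and label values are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: linkding-db-prod-alerts              # hypothetical name
  namespace: linkding-prod
  labels:
    platform.example.com/owner: bookmarks
spec:
  groups:
    - name: cnpg-linkding
      rules:
        - alert: PostgresReplicationLagHigh
          # Assumed metric name from the default CloudNativePG monitoring queries.
          expr: cnpg_pg_replication_lag{namespace="linkding-prod"} > 60
          for: 5m
          labels:
            severity: warning
            owner: bookmarks
          annotations:
            summary: Replica lag has exceeded 60 seconds for 5 minutes.
        - alert: PostgresWalArchiveFailing
          # Fires when the most recent archive failure is newer than the last success.
          expr: >-
            (cnpg_pg_stat_archiver_last_failed_time{namespace="linkding-prod"}
            - cnpg_pg_stat_archiver_last_archived_time{namespace="linkding-prod"}) > 1
          for: 10m
          labels:
            severity: critical
            owner: bookmarks
          annotations:
            summary: WAL archiving is failing, so PITR coverage is shrinking.
        - alert: PostgresBackupTooOld
          # 93600 seconds is 26 hours: a daily backup plus a two-hour grace period.
          expr: >-
            time() - cnpg_collector_last_available_backup_timestamp{namespace="linkding-prod"}
            > 93600
          labels:
            severity: critical
            owner: bookmarks
          annotations:
            summary: No successful base backup in the last 26 hours.
```

Carrying the owner label on every alert is what lets a fleet of per-application databases route pages to the right team instead of to a shared channel.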
In Practice
The documented pattern is not “Kubernetes makes databases easy.” The documented pattern is “Kubernetes gives the operator a control plane, and the operator still depends on PostgreSQL, storage, object storage, secrets, and reconciliation semantics behaving correctly.”
The strongest public warning is GitLab’s January 31, 2017 database outage. It was not a Kubernetes incident, and it should not be misrepresented as one. Its relevance is narrower and more useful: GitLab’s public postmortem shows how PostgreSQL HA, replication, snapshots, dumps, and restore procedures can all look plausible until the one day they are needed together.
GitLab reported accidental removal of data from the primary database, a secondary that offered no fallback because replication had already stalled and its data directory had been cleared for a resync, missing pg_dump backups caused by a PostgreSQL client version mismatch, backup failure notifications that were not reaching operators, and a restore path bottlenecked by slow disk transfer from a staging snapshot (GitLab postmortem). The public incident summary also noted that a six-hour-old backup was used and database changes in that window were lost (GitLab incident update).
The lesson for CloudNativePG is not that Kubernetes would have prevented the incident. It would not automatically do that. The lesson is that database resilience is a chain:
flowchart TD
Write[application write] --> WAL[WAL generated]
WAL --> Archive[WAL archived]
Data[database files] --> BaseBackup[physical base backup]
Archive --> Restore[restore procedure]
BaseBackup --> Restore
Restore --> AppCheck[application migration and read write check]
AppCheck --> Evidence[recorded RTO and RPO]
If any link is assumed rather than tested, the platform is carrying hidden risk.
| Evidence type | Public mechanism | Production implication |
|---|---|---|
| GitLab public postmortem | Backup jobs failed because the wrong PostgreSQL client version was used, and failure notifications were not reaching operators (GitLab postmortem) | Backup configuration must be verified by restore tests and alert delivery, not only scheduled jobs |
| GitLab restore behavior | Restore was constrained by the available snapshot and storage transfer path (GitLab postmortem) | RTO depends on data size, object-store throughput, volume performance, and the restore procedure |
| CloudNativePG service behavior | CloudNativePG documents rw, ro, and r services, with rw pointing to the primary and being non-disableable (service docs) | Application failover depends on using the service, not pod identity |
| CloudNativePG backup behavior | CloudNativePG documents WAL archiving, physical base backups, PITR, and warns that WAL alone cannot restore a cluster (backup docs) | Backup success is not restore readiness |
| CloudNativePG object-store behavior | CloudNativePG documents a default archive_timeout of 5min for Barman object-store WAL archiving (object-store backup docs) | Low-write workloads still need explicit RPO measurement and restore validation |
| Argo CD reconciliation | Argo CD documents automated prune, self-heal, sync semantics, and rollback limits under automated sync (auto-sync docs) | Database emergency operations need a GitOps pause and resume procedure |
| External Secrets refresh | External Secrets Operator documents CreatedOnce, Periodic, and OnChange refresh policies; Periodic updates the Kubernetes Secret on refreshInterval (ExternalSecret API docs) | Secret rotation must include application reload and PostgreSQL role behavior |
| Kubernetes disruption behavior | Kubernetes distinguishes voluntary and involuntary disruptions and notes that not all voluntary disruptions are constrained by PodDisruptionBudgets (Kubernetes docs) | Node drain, pod deletion, node loss, and storage failure are separate tests |
I have not run this exact Linkding-style reference deployment at production scale personally. The documented mechanics are still enough to draw the boundary: a three-instance PostgreSQL cluster can fail over correctly at the Kubernetes object level while the user-visible service still fails because the application pinned stale connections, the volume layer stalled, External Secrets rotated a value no process reloaded, WAL archiving failed unnoticed, or Argo CD reverted an emergency patch.
That is why the proof must be operational, not visual. A green Argo CD dashboard proves convergence. It does not prove recoverability. A promoted replica proves one HA path. It does not prove connection-pool behavior, restore speed, backup freshness, or data-loss bounds.
Where It Breaks
| Failure mode | Trigger | Fix |
|---|---|---|
| Correlated downtime across replicas | Kubernetes schedules PostgreSQL instances onto nodes sharing the same failure domain | Require topology spread constraints, node affinity, and anti-affinity across zones or node pools |
| False confidence from HA | Primary pod deletion succeeds, but storage-zone failure or object-store outage was never tested | Run separate drills for pod deletion, node drain, node loss, storage latency, and restore from object storage |
| Backup drift across CloudNativePG versions | Templates depend on older barmanObjectStore examples while the operator lifecycle moves toward CNPG-I plugins from 1.26 onward | Pin operator versions, maintain upgrade notes, and test backup plus restore for every operator upgrade |
| GitOps conflicts with emergency repair | selfHeal: true reapplies Git state after manual database-related Kubernetes changes | Document Argo CD suspension, require incident annotations, and reconcile the final state back into Git |
| Secret rotation only updates Kubernetes | External Secrets updates the Secret, but PostgreSQL connections remain open with old credentials | Use explicit rotation runbooks: create new role secret, restart or reload clients, verify new logins, then revoke the old role |
| Write traffic hits the wrong endpoint | Application sends writes to ro, or uses r because it appears to work during steady state | Standardize environment variables and policy checks so write paths use only *-rw |
| Cost expands quietly | Every service gets PostgreSQL pods, persistent volumes, backups, metrics, and alerts | Define tiers: production HA, staging reduced HA, ephemeral development, and explicit cost labels |
| Noisy fleet operations | One-off manifests diverge across teams | Generate manifests from reviewed templates and enforce policy with Kyverno, OPA Gatekeeper, or CI checks |
| Restore exceeds incident budget | PITR exists in theory, but base backup size, object-store throughput, and migration replay time were never measured | Record RTO and RPO during scheduled restore drills, then publish them with the service SLO |
| Kubernetes maintenance causes failover churn | Node drains evict database pods without a maintenance strategy | Use PodDisruptionBudgets, maintenance windows, topology constraints, and CloudNativePG-aware drain procedures |
| Backup alerts are too shallow | The backup job exits successfully, but restore would fail because credentials, object paths, or versions drifted | Alert on backup age and WAL archive failures, then run scheduled restore verification into a clean namespace |
| Application retry behavior is untested | PostgreSQL primary changes while clients hold old sessions | Test failover through the real application path, including connection pool settings and transaction retry behavior |
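The first row of the table, correlated downtime across replicas, can be addressed in the Cluster spec itself. The fragment below is a sketch using CloudNativePG's affinity settings. It assumes the cluster has at least three zones labeled with `topology.kubernetes.io/zone`; with fewer zones, the `required` setting would leave instances unschedulable.

```yaml
# Fragment of the Cluster manifest: only the fields relevant to placement are shown.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: linkding-db-prod
spec:
  instances: 3
  affinity:
    enablePodAntiAffinity: true
    podAntiAffinityType: required            # "preferred" trades strictness for schedulability
    topologyKey: topology.kubernetes.io/zone # spread instances across zones rather than only nodes
```

Placement rules narrow one failure mode only. Storage-zone failure and object-store outages still need their own drills, as the second row of the table notes.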
What to Do Next
- Problem: Per-application PostgreSQL reduces blast radius, but multiplies operational surfaces across storage, backup, monitoring, secrets, upgrades, GitOps, and cost.
- Solution: Build a database platform contract around CloudNativePG manifests, admission policy, restore drills, and incident-aware reconciliation.
- Proof: A valid proof creates a cluster from Git, writes test data, kills the primary, confirms application writes through *-rw, rotates credentials, restores from object storage into a clean namespace, and records observed RTO and RPO.
- Action: This week, add CI or admission checks for instances >= 3, backup configuration, monitoring enabled, resource requests, owner labels, explicit storage class, and no plaintext Secret manifests.
A per-application database is not a smaller managed service. It is a sharper failure boundary. Use it when the platform is prepared to test the edge.