Autovacuum Is a Capacity Problem, Not a Maintenance Task
Autovacuum is not a background chore; it is part of write capacity, and PostgreSQL will collect that debt during peak traffic if the system does not budget for cleanup before the workload arrives.
Situation
PostgreSQL’s multi-version concurrency control, or MVCC, makes reads and writes coexist by leaving old row versions behind after UPDATE and DELETE. VACUUM later removes or marks that dead space reusable, updates planner statistics, maintains visibility maps for index-only scans, and protects the database from transaction ID wraparound, as PostgreSQL’s own routine vacuuming documentation describes: PostgreSQL 17 routine vacuuming.
The operational mistake is treating autovacuum as maintenance instead of capacity. In a write-heavy commerce system, queue processor, billing ledger, workflow engine, or event ingestion service, dead tuples are not an after-hours concern. They are a steady byproduct of throughput.
| Default mental model | Production reality |
|---|---|
| Autovacuum is background maintenance | Autovacuum competes for I/O, workers, locks, and transaction horizon progress |
| Active connection count explains the incident | Table-level dead tuples, lock waits, and oldest xmin explain the incident |
| One cluster setting fits every table | High-churn tables need per-table settings |
| Killing autovacuum ends the emergency | Killing autovacuum creates cleanup debt that must be paid back deliberately |
The Problem
The common failure is backwards: autovacuum usually does not start as the villain. It becomes visible after the system has already created cleanup debt.
PostgreSQL standard VACUUM can run alongside ordinary SELECT, INSERT, UPDATE, and DELETE, while VACUUM FULL requires an ACCESS EXCLUSIVE lock and rewrites the table. That distinction matters. A normal autovacuum is designed to be cooperative, but it still consumes I/O and takes a SHARE UPDATE EXCLUSIVE lock. If conflicting operations keep interrupting it, if long transactions hold the visibility horizon open, or if the write rate exceeds cleanup capacity, dead tuples accumulate until the application starts paying for them in heap scans, index scans, cache churn, and longer vacuum cycles.
| Failure point | What breaks | Why it matters |
|---|---|---|
Long-running transaction or idle in transaction session | Dead tuples remain visible to the oldest snapshot and cannot be removed | Autovacuum can run and still fail to reclaim the space operators expect |
Default autovacuum_vacuum_scale_factor = 0.2 on a 200M-row table | Vacuum may wait for tens of millions of obsolete tuples before triggering | The threshold is mathematically sane for small tables and operationally late for hot large tables |
Replication slot or stale replica feedback holds xmin | Cleanup is pinned behind downstream consumption | Primary database bloat becomes a replication and availability problem, not just local storage waste |
| Large tables become eligible together | autovacuum_max_workers can be occupied by a small number of relations | Smaller hot tables wait behind large scans and latency spreads across unrelated features |
Monitoring only pg_stat_activity active count | Operators see queueing, not the relation causing cleanup debt | The dashboard points at symptoms while the table-level cause grows |
The core question is not “Why did autovacuum run during peak load?” The useful question is: why did the system enter peak load with no table-level cleanup budget, no lock visibility, and no oldest-transaction alarm?
Treat Vacuum as a Capacity Control Plane
The right architecture is a small vacuum control plane: table-level observability, per-table policy, lock and horizon detection, and an operator runbook that distinguishes emergency relief from debt repayment.
flowchart TD
App[application writes] --> MVCC[MVCC creates old row versions]
MVCC --> Stats[pg_stat_user_tables dead tuple counters]
MVCC --> Horizon[oldest xmin and replication horizon]
Stats --> Dashboard[vacuum health dashboard]
Horizon --> Dashboard
Locks[pg_locks and pg_stat_activity] --> Dashboard
Progress[pg_stat_progress_vacuum] --> Dashboard
Dashboard --> Policy[per-table autovacuum policy]
Policy --> Workers[autovacuum workers]
Workers --> Cleanup[dead tuple cleanup and freeze progress]
Cleanup --> Capacity[steady write capacity]
Dashboard --> Runbook[operator runbook]
-
Build the dashboard around relations, not sessions.
Start with
pg_stat_user_tables,pg_class,pg_stat_activity,pg_locks, andpg_stat_progress_vacuum. Active connections are only the smoke. The heat is per relation:n_dead_tup, relation size,last_autovacuum,last_autoanalyze, current vacuum phase, lock wait duration, and the oldest transaction age.SELECT s.schemaname, s.relname, s.n_live_tup, s.n_dead_tup, pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size, ROUND((s.n_dead_tup::numeric / NULLIF(s.n_live_tup, 0)) * 100, 2) AS dead_rows_pct, s.last_autovacuum, s.last_autoanalyze, age(now(), s.last_autovacuum) AS last_autovacuum_age FROM pg_stat_user_tables s JOIN pg_class c ON c.relname = s.relname JOIN pg_namespace n ON n.oid = c.relnamespace AND n.nspname = s.schemaname ORDER BY s.n_dead_tup DESC;Verification: the top 20 write-heavy tables should have visible dead tuple count, dead tuple ratio, total relation size, last autovacuum age, and last analyze age on one screen.
-
Add horizon monitoring before tuning cost limits.
Autovacuum cannot remove row versions still visible to an old snapshot. A single abandoned transaction can make vacuum appear “ineffective” even when workers are active. Check for large
backend_xmin, oldbackend_xid, prepared transactions, and replication slots.SELECT pid, usename, application_name, state, wait_event_type, wait_event, age(backend_xmin) AS backend_xmin_age, age(backend_xid) AS backend_xid_age, age(now(), xact_start) AS transaction_age, LEFT(query, 160) AS query_sample FROM pg_stat_activity WHERE backend_xmin IS NOT NULL OR backend_xid IS NOT NULL ORDER BY GREATEST( COALESCE(age(backend_xmin), 0), COALESCE(age(backend_xid), 0) ) DESC;Verification: alert when a transaction age crosses a workload-specific threshold, such as 5 minutes for OLTP checkout paths or 30 minutes for internal reporting, before tying the alert to dead tuple growth.
-
Track vacuum progress by phase.
PostgreSQL exposes
pg_stat_progress_vacuumfor active vacuum operations, including autovacuum workers. The view reports heap blocks scanned, heap blocks vacuumed, index vacuum count, dead tuple counters, and the current phase; PostgreSQL documents this under progress reporting: VACUUM progress reporting.SELECT p.pid, a.datname, p.relid::regclass AS relation, a.query, p.phase, p.heap_blks_total, p.heap_blks_scanned, p.heap_blks_vacuumed, ROUND(100 * p.heap_blks_scanned::numeric / NULLIF(p.heap_blks_total, 0), 2) AS pct_scanned, p.index_vacuum_count, p.num_dead_tuples FROM pg_stat_progress_vacuum p JOIN pg_stat_activity a USING (pid) ORDER BY p.pid;Verification: operators should be able to classify an active vacuum as scanning, vacuuming indexes, vacuuming heap, cleaning indexes, truncating heap, or performing final cleanup without reading server logs.
-
Tune hot tables with absolute thresholds, not ratios alone.
PostgreSQL triggers autovacuum when obsolete tuple count exceeds:
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuplesThat formula is documented in the PostgreSQL autovacuum daemon section: autovacuum threshold formula. On a 10M-row
orderstable, the default50 + 0.2 * 10000000means roughly 2,000,050 obsolete tuples before vacuum eligibility. On a hot table updated continuously, that is not a maintenance threshold. It is an incident waiting room with chairs.ALTER TABLE orders SET ( autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_threshold = 50000, autovacuum_analyze_scale_factor = 0.02, autovacuum_analyze_threshold = 50000, autovacuum_vacuum_cost_delay = 10 );Verification: after a realistic write-load test, the table should show smaller, more frequent vacuum cycles, stable
n_dead_tup, and no sustained increase in p95 query latency during vacuum phases. -
Separate emergency termination from recovery.
Terminating an autovacuum worker may reduce immediate pressure if it is contending with production traffic, but it does not remove the dead tuples. It postpones cleanup. Worse, if the worker is running to prevent wraparound, PostgreSQL does not treat it like ordinary background work; autovacuum behavior around wraparound prevention is intentionally harder to interrupt.
SELECT pid, query, age(now(), query_start) AS runtime, wait_event_type, wait_event FROM pg_stat_activity WHERE query ILIKE '%autovacuum%';Verification: every termination action must create a follow-up ticket with target relation, observed dead tuples, oldest transaction state, and an explicit manual
VACUUMor retuning plan.
In Practice
The documented pattern is not theoretical. GitLab publicly analyzed PostgreSQL autovacuum behavior on GitLab.com and treated it as a production tuning problem backed by stats, logs, and Prometheus data. In their autovacuum considerations issue, they reported autovacuum consuming a high share of read I/O while doing a small amount of block cleanup, then evaluated table-specific behavior and candidate configuration changes: GitLab autovacuum considerations.
The important engineering detail is scale. GitLab called out relations in the hundreds of millions to over a billion tuples, including merge_request_diff_files and merge_request_diff_commits. For those shapes, a global threshold is a blunt instrument. A scale factor that is reasonable for a 500K-row table can be absurd for a 1B-row table, and a threshold tuned for one high-churn table can make quieter tables vacuum too often.
| Public evidence | What it shows | Production lesson |
|---|---|---|
| GitLab tracked autovacuum and autoanalyze daily counts | Vacuum frequency was measured as an operational signal | Count vacuum cycles per table, not just cluster-wide activity |
| GitLab compared before and after migration behavior | Configuration changed based on observed workload | Treat autovacuum tuning as capacity testing, not folklore |
GitLab inspected pg_stat_all_table.n_dead_tup in Prometheus | Dead tuples were tracked over time | Alert on trajectory, not only threshold breach |
| GitLab selected candidate tables for custom settings | Large relations needed table-specific policy | Per-table storage parameters are normal for serious PostgreSQL operations |
This also follows directly from PostgreSQL behavior. UPDATE and DELETE leave old row versions behind under MVCC until vacuum can mark space reusable. Standard vacuum does not generally return space to the operating system; it makes space reusable inside the relation. VACUUM FULL rewrites the table and requires an exclusive lock. That is why waiting until bloat is obvious is expensive: at that point, the fix may require either a long plain vacuum that only stabilizes reuse or a rewrite operation that needs a maintenance window.
The source incident describes the recognizable operational smell: response time spikes, lock waits, autovacuum visible in pg_stat_activity, and operators reaching for termination commands. The deeper diagnosis is that the system had no pre-peak signal for cleanup debt. Once users are checking out, workers are busy, indexes are colder, heap pages are dirty, and autovacuum is behind, every option is ugly. The best time to find a bloated orders table is before the marketing email, not while the payment service is practicing interpretive latency.
A production vacuum dashboard should make five questions answerable in less than a minute:
| Question | View or metric | Bad signal |
|---|---|---|
| Which tables are accumulating cleanup debt? | pg_stat_user_tables.n_dead_tup, relation size | Dead tuples rising faster than vacuum completion |
| Is vacuum running or stalled? | pg_stat_progress_vacuum.phase | Phase unchanged while lock waits or I/O waits climb |
| What is pinning cleanup? | pg_stat_activity.backend_xmin, replication slots | Old snapshot age grows while dead tuples persist |
| Are workers saturated? | Active autovacuum workers and table queue | Large relations occupy workers for long periods |
| Is the threshold wrong? | Dead tuples at vacuum start and duration | Vacuum starts only after latency or bloat is visible |
Where It Breaks
| Failure mode | Trigger | Fix |
|---|---|---|
| Dead tuple percentage looks fine while absolute debt is huge | A 1B-row table with 1 percent dead rows still has 10M obsolete tuples | Alert on absolute n_dead_tup, dead tuple ratio, and relation size together |
| Autovacuum runs but bloat does not fall | Long transaction, prepared transaction, stale replica feedback, or replication slot pins the visibility horizon | Monitor backend_xmin, backend_xid, pg_prepared_xacts, and replication slot lag before changing vacuum cost settings |
| Vacuum becomes too aggressive after lowering scale factor | Hot tables vacuum frequently enough to compete with foreground I/O | Tune autovacuum_vacuum_cost_delay, table thresholds, and worker count under load; verify p95 latency during vacuum |
VACUUM FULL becomes the only visible cleanup option | Plain vacuum can reuse space but cannot compact most table files back to the operating system | Prefer steady plain vacuum; reserve VACUUM FULL, CLUSTER, or table rewrite for controlled maintenance windows with disk headroom |
| Partitioned parent has stale planner statistics | Autovacuum processes partitions, but parent-level statistics may not update as expected | Run explicit ANALYZE on partitioned parents after load or distribution shifts |
| Insert-heavy table misses cleanup expectations | PostgreSQL 13 and later include insert-trigger autovacuum settings, but older tuning habits focus only on update and delete churn | Include autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor in version-aware reviews |
| Terminating autovacuum becomes the runbook | Operators kill workers during peak traffic and never repay cleanup debt | Require a follow-up manual vacuum, threshold change, or capacity review for every terminated worker |
| Managed database hides host-level detail | Amazon RDS, Aurora PostgreSQL, Cloud SQL, or Azure Database for PostgreSQL restrict OS-level inspection | Use SQL-visible signals first: stats views, logs, parameter groups, Performance Insights, and query wait sampling |
What to Do Next
- Problem: Vacuum incidents happen when write throughput creates cleanup debt faster than PostgreSQL can safely remove it.
- Solution: Treat autovacuum as a capacity control plane with table-level metrics, horizon detection, progress visibility, and per-table policy.
- Proof: A healthy system shows bounded
n_dead_tup, recentlast_autovacuumon hot tables, short transaction ages, and vacuum progress that completes without sustained lock waits. - Action: This week, build a dashboard for the top 20 write-heavy tables showing dead tuples, relation size, last autovacuum age, oldest transaction age, lock waiters, and active vacuum phase.
Autovacuum does not need heroics; it needs budget, observability, and the dignity of being treated like production capacity before it collects payment at the worst possible hour.