Autovacuum Is a Capacity Problem, Not a Maintenance Task

Autovacuum is not a background chore; it is part of write capacity, and PostgreSQL will collect that debt during peak traffic if the system does not budget for cleanup before the workload arrives.

Situation

PostgreSQL’s multi-version concurrency control, or MVCC, makes reads and writes coexist by leaving old row versions behind after UPDATE and DELETE. VACUUM later removes or marks that dead space reusable, updates planner statistics, maintains visibility maps for index-only scans, and protects the database from transaction ID wraparound, as PostgreSQL’s own routine vacuuming documentation describes: PostgreSQL 17 routine vacuuming.

The operational mistake is treating autovacuum as maintenance instead of capacity. In a write-heavy commerce system, queue processor, billing ledger, workflow engine, or event ingestion service, dead tuples are not an after-hours concern. They are a steady byproduct of throughput.

Default mental model	Production reality
Autovacuum is background maintenance	Autovacuum competes for I/O, workers, locks, and transaction horizon progress
Active connection count explains the incident	Table-level dead tuples, lock waits, and oldest `xmin` explain the incident
One cluster setting fits every table	High-churn tables need per-table settings
Killing autovacuum ends the emergency	Killing autovacuum creates cleanup debt that must be paid back deliberately

The Problem

The common failure is backwards: autovacuum usually does not start as the villain. It becomes visible after the system has already created cleanup debt.

PostgreSQL standard VACUUM can run alongside ordinary SELECT, INSERT, UPDATE, and DELETE, while VACUUM FULL requires an ACCESS EXCLUSIVE lock and rewrites the table. That distinction matters. A normal autovacuum is designed to be cooperative, but it still consumes I/O and takes a SHARE UPDATE EXCLUSIVE lock. If conflicting operations keep interrupting it, if long transactions hold the visibility horizon open, or if the write rate exceeds cleanup capacity, dead tuples accumulate until the application starts paying for them in heap scans, index scans, cache churn, and longer vacuum cycles.

Failure point	What breaks	Why it matters
Long-running transaction or `idle in transaction` session	Dead tuples remain visible to the oldest snapshot and cannot be removed	Autovacuum can run and still fail to reclaim the space operators expect
Default `autovacuum_vacuum_scale_factor = 0.2` on a 200M-row table	Vacuum may wait for tens of millions of obsolete tuples before triggering	The threshold is mathematically sane for small tables and operationally late for hot large tables
Replication slot or stale replica feedback holds `xmin`	Cleanup is pinned behind downstream consumption	Primary database bloat becomes a replication and availability problem, not just local storage waste
Large tables become eligible together	`autovacuum_max_workers` can be occupied by a small number of relations	Smaller hot tables wait behind large scans and latency spreads across unrelated features
Monitoring only `pg_stat_activity` active count	Operators see queueing, not the relation causing cleanup debt	The dashboard points at symptoms while the table-level cause grows

The core question is not “Why did autovacuum run during peak load?” The useful question is: why did the system enter peak load with no table-level cleanup budget, no lock visibility, and no oldest-transaction alarm?

Treat Vacuum as a Capacity Control Plane

The right architecture is a small vacuum control plane: table-level observability, per-table policy, lock and horizon detection, and an operator runbook that distinguishes emergency relief from debt repayment.

flowchart TD
    App[application writes] --> MVCC[MVCC creates old row versions]
    MVCC --> Stats[pg_stat_user_tables dead tuple counters]
    MVCC --> Horizon[oldest xmin and replication horizon]
    Stats --> Dashboard[vacuum health dashboard]
    Horizon --> Dashboard
    Locks[pg_locks and pg_stat_activity] --> Dashboard
    Progress[pg_stat_progress_vacuum] --> Dashboard
    Dashboard --> Policy[per-table autovacuum policy]
    Policy --> Workers[autovacuum workers]
    Workers --> Cleanup[dead tuple cleanup and freeze progress]
    Cleanup --> Capacity[steady write capacity]
    Dashboard --> Runbook[operator runbook]

Build the dashboard around relations, not sessions.

Start with pg_stat_user_tables, pg_class, pg_stat_activity, pg_locks, and pg_stat_progress_vacuum. Active connections are only the smoke. The heat is per relation: n_dead_tup, relation size, last_autovacuum, last_autoanalyze, current vacuum phase, lock wait duration, and the oldest transaction age.

SELECT
    s.schemaname,
    s.relname,
    s.n_live_tup,
    s.n_dead_tup,
    pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
    ROUND((s.n_dead_tup::numeric / NULLIF(s.n_live_tup, 0)) * 100, 2) AS dead_rows_pct,
    s.last_autovacuum,
    s.last_autoanalyze,
    age(now(), s.last_autovacuum) AS last_autovacuum_age
FROM pg_stat_user_tables s
JOIN pg_class c ON c.relname = s.relname
JOIN pg_namespace n ON n.oid = c.relnamespace AND n.nspname = s.schemaname
ORDER BY s.n_dead_tup DESC;

Verification: the top 20 write-heavy tables should have visible dead tuple count, dead tuple ratio, total relation size, last autovacuum age, and last analyze age on one screen.

Add horizon monitoring before tuning cost limits.

Autovacuum cannot remove row versions still visible to an old snapshot. A single abandoned transaction can make vacuum appear “ineffective” even when workers are active. Check for large backend_xmin, old backend_xid, prepared transactions, and replication slots.

SELECT
    pid,
    usename,
    application_name,
    state,
    wait_event_type,
    wait_event,
    age(backend_xmin) AS backend_xmin_age,
    age(backend_xid) AS backend_xid_age,
    age(now(), xact_start) AS transaction_age,
    LEFT(query, 160) AS query_sample
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
   OR backend_xid IS NOT NULL
ORDER BY GREATEST(
    COALESCE(age(backend_xmin), 0),
    COALESCE(age(backend_xid), 0)
) DESC;

Verification: alert when a transaction age crosses a workload-specific threshold, such as 5 minutes for OLTP checkout paths or 30 minutes for internal reporting, before tying the alert to dead tuple growth.

Track vacuum progress by phase.

PostgreSQL exposes pg_stat_progress_vacuum for active vacuum operations, including autovacuum workers. The view reports heap blocks scanned, heap blocks vacuumed, index vacuum count, dead tuple counters, and the current phase; PostgreSQL documents this under progress reporting: VACUUM progress reporting.
```
SELECT
    p.pid,
    a.datname,
    p.relid::regclass AS relation,
    a.query,
    p.phase,
    p.heap_blks_total,
    p.heap_blks_scanned,
    p.heap_blks_vacuumed,
    ROUND(100 * p.heap_blks_scanned::numeric / NULLIF(p.heap_blks_total, 0), 2) AS pct_scanned,
    p.index_vacuum_count,
    p.num_dead_tuples
FROM pg_stat_progress_vacuum p
JOIN pg_stat_activity a USING (pid)
ORDER BY p.pid;
```
Verification: operators should be able to classify an active vacuum as scanning, vacuuming indexes, vacuuming heap, cleaning indexes, truncating heap, or performing final cleanup without reading server logs.
Tune hot tables with absolute thresholds, not ratios alone.

PostgreSQL triggers autovacuum when obsolete tuple count exceeds:
```
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
```
That formula is documented in the PostgreSQL autovacuum daemon section: autovacuum threshold formula. On a 10M-row orders table, the default 50 + 0.2 * 10000000 means roughly 2,000,050 obsolete tuples before vacuum eligibility. On a hot table updated continuously, that is not a maintenance threshold. It is an incident waiting room with chairs.
```
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 50000,
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_analyze_threshold = 50000,
    autovacuum_vacuum_cost_delay = 10
);
```
Verification: after a realistic write-load test, the table should show smaller, more frequent vacuum cycles, stable n_dead_tup, and no sustained increase in p95 query latency during vacuum phases.
Separate emergency termination from recovery.

Terminating an autovacuum worker may reduce immediate pressure if it is contending with production traffic, but it does not remove the dead tuples. It postpones cleanup. Worse, if the worker is running to prevent wraparound, PostgreSQL does not treat it like ordinary background work; autovacuum behavior around wraparound prevention is intentionally harder to interrupt.
```
SELECT
    pid,
    query,
    age(now(), query_start) AS runtime,
    wait_event_type,
    wait_event
FROM pg_stat_activity
WHERE query ILIKE '%autovacuum%';
```
Verification: every termination action must create a follow-up ticket with target relation, observed dead tuples, oldest transaction state, and an explicit manual VACUUM or retuning plan.

In Practice

The documented pattern is not theoretical. GitLab publicly analyzed PostgreSQL autovacuum behavior on GitLab.com and treated it as a production tuning problem backed by stats, logs, and Prometheus data. In their autovacuum considerations issue, they reported autovacuum consuming a high share of read I/O while doing a small amount of block cleanup, then evaluated table-specific behavior and candidate configuration changes: GitLab autovacuum considerations.

The important engineering detail is scale. GitLab called out relations in the hundreds of millions to over a billion tuples, including merge_request_diff_files and merge_request_diff_commits. For those shapes, a global threshold is a blunt instrument. A scale factor that is reasonable for a 500K-row table can be absurd for a 1B-row table, and a threshold tuned for one high-churn table can make quieter tables vacuum too often.

Public evidence	What it shows	Production lesson
GitLab tracked autovacuum and autoanalyze daily counts	Vacuum frequency was measured as an operational signal	Count vacuum cycles per table, not just cluster-wide activity
GitLab compared before and after migration behavior	Configuration changed based on observed workload	Treat autovacuum tuning as capacity testing, not folklore
GitLab inspected `pg_stat_all_table.n_dead_tup` in Prometheus	Dead tuples were tracked over time	Alert on trajectory, not only threshold breach
GitLab selected candidate tables for custom settings	Large relations needed table-specific policy	Per-table storage parameters are normal for serious PostgreSQL operations

This also follows directly from PostgreSQL behavior. UPDATE and DELETE leave old row versions behind under MVCC until vacuum can mark space reusable. Standard vacuum does not generally return space to the operating system; it makes space reusable inside the relation. VACUUM FULL rewrites the table and requires an exclusive lock. That is why waiting until bloat is obvious is expensive: at that point, the fix may require either a long plain vacuum that only stabilizes reuse or a rewrite operation that needs a maintenance window.

The source incident describes the recognizable operational smell: response time spikes, lock waits, autovacuum visible in pg_stat_activity, and operators reaching for termination commands. The deeper diagnosis is that the system had no pre-peak signal for cleanup debt. Once users are checking out, workers are busy, indexes are colder, heap pages are dirty, and autovacuum is behind, every option is ugly. The best time to find a bloated orders table is before the marketing email, not while the payment service is practicing interpretive latency.

A production vacuum dashboard should make five questions answerable in less than a minute:

Question	View or metric	Bad signal
Which tables are accumulating cleanup debt?	`pg_stat_user_tables.n_dead_tup`, relation size	Dead tuples rising faster than vacuum completion
Is vacuum running or stalled?	`pg_stat_progress_vacuum.phase`	Phase unchanged while lock waits or I/O waits climb
What is pinning cleanup?	`pg_stat_activity.backend_xmin`, replication slots	Old snapshot age grows while dead tuples persist
Are workers saturated?	Active autovacuum workers and table queue	Large relations occupy workers for long periods
Is the threshold wrong?	Dead tuples at vacuum start and duration	Vacuum starts only after latency or bloat is visible

Where It Breaks

Failure mode	Trigger	Fix
Dead tuple percentage looks fine while absolute debt is huge	A 1B-row table with 1 percent dead rows still has 10M obsolete tuples	Alert on absolute `n_dead_tup`, dead tuple ratio, and relation size together
Autovacuum runs but bloat does not fall	Long transaction, prepared transaction, stale replica feedback, or replication slot pins the visibility horizon	Monitor `backend_xmin`, `backend_xid`, `pg_prepared_xacts`, and replication slot lag before changing vacuum cost settings
Vacuum becomes too aggressive after lowering scale factor	Hot tables vacuum frequently enough to compete with foreground I/O	Tune `autovacuum_vacuum_cost_delay`, table thresholds, and worker count under load; verify p95 latency during vacuum
`VACUUM FULL` becomes the only visible cleanup option	Plain vacuum can reuse space but cannot compact most table files back to the operating system	Prefer steady plain vacuum; reserve `VACUUM FULL`, `CLUSTER`, or table rewrite for controlled maintenance windows with disk headroom
Partitioned parent has stale planner statistics	Autovacuum processes partitions, but parent-level statistics may not update as expected	Run explicit `ANALYZE` on partitioned parents after load or distribution shifts
Insert-heavy table misses cleanup expectations	PostgreSQL 13 and later include insert-trigger autovacuum settings, but older tuning habits focus only on update and delete churn	Include `autovacuum_vacuum_insert_threshold` and `autovacuum_vacuum_insert_scale_factor` in version-aware reviews
Terminating autovacuum becomes the runbook	Operators kill workers during peak traffic and never repay cleanup debt	Require a follow-up manual vacuum, threshold change, or capacity review for every terminated worker
Managed database hides host-level detail	Amazon RDS, Aurora PostgreSQL, Cloud SQL, or Azure Database for PostgreSQL restrict OS-level inspection	Use SQL-visible signals first: stats views, logs, parameter groups, Performance Insights, and query wait sampling

What to Do Next

Problem: Vacuum incidents happen when write throughput creates cleanup debt faster than PostgreSQL can safely remove it.
Solution: Treat autovacuum as a capacity control plane with table-level metrics, horizon detection, progress visibility, and per-table policy.
Proof: A healthy system shows bounded n_dead_tup, recent last_autovacuum on hot tables, short transaction ages, and vacuum progress that completes without sustained lock waits.
Action: This week, build a dashboard for the top 20 write-heavy tables showing dead tuples, relation size, last autovacuum age, oldest transaction age, lock waiters, and active vacuum phase.

Autovacuum does not need heroics; it needs budget, observability, and the dignity of being treated like production capacity before it collects payment at the worst possible hour.

Situation

The Problem

Treat Vacuum as a Capacity Control Plane

In Practice

Where It Breaks

What to Do Next

Rajiv

Related Posts

When Autovacuum Becomes a Backpressure Signal

Prometheus + Grafana for Database Engineers: Open-Source Monitoring That Actually Works

Cloud Architecture Review Checklist for Database-Backed Applications