Autovacuum is not a background chore; it is part of write capacity, and PostgreSQL will collect that debt during peak traffic if the system does not budget for cleanup before the workload arrives.

Situation

PostgreSQL’s multi-version concurrency control, or MVCC, makes reads and writes coexist by leaving old row versions behind after UPDATE and DELETE. VACUUM later removes or marks that dead space reusable, updates planner statistics, maintains visibility maps for index-only scans, and protects the database from transaction ID wraparound, as PostgreSQL’s own routine vacuuming documentation describes: PostgreSQL 17 routine vacuuming.

The operational mistake is treating autovacuum as maintenance instead of capacity. In a write-heavy commerce system, queue processor, billing ledger, workflow engine, or event ingestion service, dead tuples are not an after-hours concern. They are a steady byproduct of throughput.

Default mental modelProduction reality
Autovacuum is background maintenanceAutovacuum competes for I/O, workers, locks, and transaction horizon progress
Active connection count explains the incidentTable-level dead tuples, lock waits, and oldest xmin explain the incident
One cluster setting fits every tableHigh-churn tables need per-table settings
Killing autovacuum ends the emergencyKilling autovacuum creates cleanup debt that must be paid back deliberately

The Problem

The common failure is backwards: autovacuum usually does not start as the villain. It becomes visible after the system has already created cleanup debt.

PostgreSQL standard VACUUM can run alongside ordinary SELECT, INSERT, UPDATE, and DELETE, while VACUUM FULL requires an ACCESS EXCLUSIVE lock and rewrites the table. That distinction matters. A normal autovacuum is designed to be cooperative, but it still consumes I/O and takes a SHARE UPDATE EXCLUSIVE lock. If conflicting operations keep interrupting it, if long transactions hold the visibility horizon open, or if the write rate exceeds cleanup capacity, dead tuples accumulate until the application starts paying for them in heap scans, index scans, cache churn, and longer vacuum cycles.

Failure pointWhat breaksWhy it matters
Long-running transaction or idle in transaction sessionDead tuples remain visible to the oldest snapshot and cannot be removedAutovacuum can run and still fail to reclaim the space operators expect
Default autovacuum_vacuum_scale_factor = 0.2 on a 200M-row tableVacuum may wait for tens of millions of obsolete tuples before triggeringThe threshold is mathematically sane for small tables and operationally late for hot large tables
Replication slot or stale replica feedback holds xminCleanup is pinned behind downstream consumptionPrimary database bloat becomes a replication and availability problem, not just local storage waste
Large tables become eligible togetherautovacuum_max_workers can be occupied by a small number of relationsSmaller hot tables wait behind large scans and latency spreads across unrelated features
Monitoring only pg_stat_activity active countOperators see queueing, not the relation causing cleanup debtThe dashboard points at symptoms while the table-level cause grows

The core question is not “Why did autovacuum run during peak load?” The useful question is: why did the system enter peak load with no table-level cleanup budget, no lock visibility, and no oldest-transaction alarm?

Treat Vacuum as a Capacity Control Plane

The right architecture is a small vacuum control plane: table-level observability, per-table policy, lock and horizon detection, and an operator runbook that distinguishes emergency relief from debt repayment.

flowchart TD
    App[application writes] --> MVCC[MVCC creates old row versions]
    MVCC --> Stats[pg_stat_user_tables dead tuple counters]
    MVCC --> Horizon[oldest xmin and replication horizon]
    Stats --> Dashboard[vacuum health dashboard]
    Horizon --> Dashboard
    Locks[pg_locks and pg_stat_activity] --> Dashboard
    Progress[pg_stat_progress_vacuum] --> Dashboard
    Dashboard --> Policy[per-table autovacuum policy]
    Policy --> Workers[autovacuum workers]
    Workers --> Cleanup[dead tuple cleanup and freeze progress]
    Cleanup --> Capacity[steady write capacity]
    Dashboard --> Runbook[operator runbook]
  1. Build the dashboard around relations, not sessions.

    Start with pg_stat_user_tables, pg_class, pg_stat_activity, pg_locks, and pg_stat_progress_vacuum. Active connections are only the smoke. The heat is per relation: n_dead_tup, relation size, last_autovacuum, last_autoanalyze, current vacuum phase, lock wait duration, and the oldest transaction age.

    SELECT
        s.schemaname,
        s.relname,
        s.n_live_tup,
        s.n_dead_tup,
        pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
        ROUND((s.n_dead_tup::numeric / NULLIF(s.n_live_tup, 0)) * 100, 2) AS dead_rows_pct,
        s.last_autovacuum,
        s.last_autoanalyze,
        age(now(), s.last_autovacuum) AS last_autovacuum_age
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.relname = s.relname
    JOIN pg_namespace n ON n.oid = c.relnamespace AND n.nspname = s.schemaname
    ORDER BY s.n_dead_tup DESC;
    

    Verification: the top 20 write-heavy tables should have visible dead tuple count, dead tuple ratio, total relation size, last autovacuum age, and last analyze age on one screen.

  2. Add horizon monitoring before tuning cost limits.

    Autovacuum cannot remove row versions still visible to an old snapshot. A single abandoned transaction can make vacuum appear “ineffective” even when workers are active. Check for large backend_xmin, old backend_xid, prepared transactions, and replication slots.

    SELECT
        pid,
        usename,
        application_name,
        state,
        wait_event_type,
        wait_event,
        age(backend_xmin) AS backend_xmin_age,
        age(backend_xid) AS backend_xid_age,
        age(now(), xact_start) AS transaction_age,
        LEFT(query, 160) AS query_sample
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL
       OR backend_xid IS NOT NULL
    ORDER BY GREATEST(
        COALESCE(age(backend_xmin), 0),
        COALESCE(age(backend_xid), 0)
    ) DESC;
    

    Verification: alert when a transaction age crosses a workload-specific threshold, such as 5 minutes for OLTP checkout paths or 30 minutes for internal reporting, before tying the alert to dead tuple growth.

  3. Track vacuum progress by phase.

    PostgreSQL exposes pg_stat_progress_vacuum for active vacuum operations, including autovacuum workers. The view reports heap blocks scanned, heap blocks vacuumed, index vacuum count, dead tuple counters, and the current phase; PostgreSQL documents this under progress reporting: VACUUM progress reporting.

    SELECT
        p.pid,
        a.datname,
        p.relid::regclass AS relation,
        a.query,
        p.phase,
        p.heap_blks_total,
        p.heap_blks_scanned,
        p.heap_blks_vacuumed,
        ROUND(100 * p.heap_blks_scanned::numeric / NULLIF(p.heap_blks_total, 0), 2) AS pct_scanned,
        p.index_vacuum_count,
        p.num_dead_tuples
    FROM pg_stat_progress_vacuum p
    JOIN pg_stat_activity a USING (pid)
    ORDER BY p.pid;
    

    Verification: operators should be able to classify an active vacuum as scanning, vacuuming indexes, vacuuming heap, cleaning indexes, truncating heap, or performing final cleanup without reading server logs.

  4. Tune hot tables with absolute thresholds, not ratios alone.

    PostgreSQL triggers autovacuum when obsolete tuple count exceeds:

    autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
    

    That formula is documented in the PostgreSQL autovacuum daemon section: autovacuum threshold formula. On a 10M-row orders table, the default 50 + 0.2 * 10000000 means roughly 2,000,050 obsolete tuples before vacuum eligibility. On a hot table updated continuously, that is not a maintenance threshold. It is an incident waiting room with chairs.

    ALTER TABLE orders SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_vacuum_threshold = 50000,
        autovacuum_analyze_scale_factor = 0.02,
        autovacuum_analyze_threshold = 50000,
        autovacuum_vacuum_cost_delay = 10
    );
    

    Verification: after a realistic write-load test, the table should show smaller, more frequent vacuum cycles, stable n_dead_tup, and no sustained increase in p95 query latency during vacuum phases.

  5. Separate emergency termination from recovery.

    Terminating an autovacuum worker may reduce immediate pressure if it is contending with production traffic, but it does not remove the dead tuples. It postpones cleanup. Worse, if the worker is running to prevent wraparound, PostgreSQL does not treat it like ordinary background work; autovacuum behavior around wraparound prevention is intentionally harder to interrupt.

    SELECT
        pid,
        query,
        age(now(), query_start) AS runtime,
        wait_event_type,
        wait_event
    FROM pg_stat_activity
    WHERE query ILIKE '%autovacuum%';
    

    Verification: every termination action must create a follow-up ticket with target relation, observed dead tuples, oldest transaction state, and an explicit manual VACUUM or retuning plan.

In Practice

The documented pattern is not theoretical. GitLab publicly analyzed PostgreSQL autovacuum behavior on GitLab.com and treated it as a production tuning problem backed by stats, logs, and Prometheus data. In their autovacuum considerations issue, they reported autovacuum consuming a high share of read I/O while doing a small amount of block cleanup, then evaluated table-specific behavior and candidate configuration changes: GitLab autovacuum considerations.

The important engineering detail is scale. GitLab called out relations in the hundreds of millions to over a billion tuples, including merge_request_diff_files and merge_request_diff_commits. For those shapes, a global threshold is a blunt instrument. A scale factor that is reasonable for a 500K-row table can be absurd for a 1B-row table, and a threshold tuned for one high-churn table can make quieter tables vacuum too often.

Public evidenceWhat it showsProduction lesson
GitLab tracked autovacuum and autoanalyze daily countsVacuum frequency was measured as an operational signalCount vacuum cycles per table, not just cluster-wide activity
GitLab compared before and after migration behaviorConfiguration changed based on observed workloadTreat autovacuum tuning as capacity testing, not folklore
GitLab inspected pg_stat_all_table.n_dead_tup in PrometheusDead tuples were tracked over timeAlert on trajectory, not only threshold breach
GitLab selected candidate tables for custom settingsLarge relations needed table-specific policyPer-table storage parameters are normal for serious PostgreSQL operations

This also follows directly from PostgreSQL behavior. UPDATE and DELETE leave old row versions behind under MVCC until vacuum can mark space reusable. Standard vacuum does not generally return space to the operating system; it makes space reusable inside the relation. VACUUM FULL rewrites the table and requires an exclusive lock. That is why waiting until bloat is obvious is expensive: at that point, the fix may require either a long plain vacuum that only stabilizes reuse or a rewrite operation that needs a maintenance window.

The source incident describes the recognizable operational smell: response time spikes, lock waits, autovacuum visible in pg_stat_activity, and operators reaching for termination commands. The deeper diagnosis is that the system had no pre-peak signal for cleanup debt. Once users are checking out, workers are busy, indexes are colder, heap pages are dirty, and autovacuum is behind, every option is ugly. The best time to find a bloated orders table is before the marketing email, not while the payment service is practicing interpretive latency.

A production vacuum dashboard should make five questions answerable in less than a minute:

QuestionView or metricBad signal
Which tables are accumulating cleanup debt?pg_stat_user_tables.n_dead_tup, relation sizeDead tuples rising faster than vacuum completion
Is vacuum running or stalled?pg_stat_progress_vacuum.phasePhase unchanged while lock waits or I/O waits climb
What is pinning cleanup?pg_stat_activity.backend_xmin, replication slotsOld snapshot age grows while dead tuples persist
Are workers saturated?Active autovacuum workers and table queueLarge relations occupy workers for long periods
Is the threshold wrong?Dead tuples at vacuum start and durationVacuum starts only after latency or bloat is visible

Where It Breaks

Failure modeTriggerFix
Dead tuple percentage looks fine while absolute debt is hugeA 1B-row table with 1 percent dead rows still has 10M obsolete tuplesAlert on absolute n_dead_tup, dead tuple ratio, and relation size together
Autovacuum runs but bloat does not fallLong transaction, prepared transaction, stale replica feedback, or replication slot pins the visibility horizonMonitor backend_xmin, backend_xid, pg_prepared_xacts, and replication slot lag before changing vacuum cost settings
Vacuum becomes too aggressive after lowering scale factorHot tables vacuum frequently enough to compete with foreground I/OTune autovacuum_vacuum_cost_delay, table thresholds, and worker count under load; verify p95 latency during vacuum
VACUUM FULL becomes the only visible cleanup optionPlain vacuum can reuse space but cannot compact most table files back to the operating systemPrefer steady plain vacuum; reserve VACUUM FULL, CLUSTER, or table rewrite for controlled maintenance windows with disk headroom
Partitioned parent has stale planner statisticsAutovacuum processes partitions, but parent-level statistics may not update as expectedRun explicit ANALYZE on partitioned parents after load or distribution shifts
Insert-heavy table misses cleanup expectationsPostgreSQL 13 and later include insert-trigger autovacuum settings, but older tuning habits focus only on update and delete churnInclude autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor in version-aware reviews
Terminating autovacuum becomes the runbookOperators kill workers during peak traffic and never repay cleanup debtRequire a follow-up manual vacuum, threshold change, or capacity review for every terminated worker
Managed database hides host-level detailAmazon RDS, Aurora PostgreSQL, Cloud SQL, or Azure Database for PostgreSQL restrict OS-level inspectionUse SQL-visible signals first: stats views, logs, parameter groups, Performance Insights, and query wait sampling

What to Do Next

  • Problem: Vacuum incidents happen when write throughput creates cleanup debt faster than PostgreSQL can safely remove it.
  • Solution: Treat autovacuum as a capacity control plane with table-level metrics, horizon detection, progress visibility, and per-table policy.
  • Proof: A healthy system shows bounded n_dead_tup, recent last_autovacuum on hot tables, short transaction ages, and vacuum progress that completes without sustained lock waits.
  • Action: This week, build a dashboard for the top 20 write-heavy tables showing dead tuples, relation size, last autovacuum age, oldest transaction age, lock waiters, and active vacuum phase.

Autovacuum does not need heroics; it needs budget, observability, and the dignity of being treated like production capacity before it collects payment at the worst possible hour.