Covering Indexes Are Not Enough Without Visibility

A PostgreSQL covering index is not a performance fix by itself; it is a bet that the query, the index payload, and the visibility map will stay aligned under real production churn.

Situation

The default move is still an ordinary B-tree index on the predicate column: CREATE INDEX ON users(email). The better move, when the read path is stable, is a covering index using PostgreSQL 11’s INCLUDE clause, which stores projected columns in the index payload so an index-only scan can answer the query without visiting the heap when visibility permits it.

Approach	What it optimizes	What it still pays for
Ordinary B-tree index	Finds matching tuple IDs quickly	Heap reads for projected columns and Multi-Version Concurrency Control (MVCC) visibility
Covering index with `INCLUDE`	Keeps predicate and selected columns in one index	Larger index, write overhead, visibility map dependency
Covering index plus vacuum discipline	Avoids heap access for stable pages	Operational ownership of autovacuum and long transactions

The Problem

PostgreSQL indexes do not store complete row visibility. They can point to candidate rows, but MVCC visibility is determined from heap state unless PostgreSQL can trust the visibility map. The official PostgreSQL documentation is explicit: index-only scans only win when the needed columns are available from the index and a significant fraction of heap pages have their all-visible bits set in the visibility map.

Failure point	What breaks	Why it matters
Projection misses the index	`SELECT username, status` uses `idx_users_email(email)` and still reads the heap	The index finds rows, but the table still serves the selected columns
Visibility map is stale	Plan says `Index Only Scan`, but reports `Heap Fetches: 12000`	The scan is only “index-only” for pages marked all-visible
Autovacuum threshold is too loose	Default `autovacuum_vacuum_scale_factor = 0.2` can mean roughly 40M changed tuples on a 200M-row table before vacuum triggers	Large tables can accumulate heap pages that are not all-visible for too long
Included column churn	Updating `status` or `username` touches an indexed column	PostgreSQL must maintain the index entry, and HOT updates are less likely
Staging lies politely	Freshly loaded and manually vacuumed test data shows zero heap fetches	Production write churn, old snapshots, and delayed vacuum change the execution profile

The core question is not “did we add an index?” It is: can PostgreSQL answer this production query from the index while proving that the referenced heap pages are visible to the current snapshot?

Design the Index Around the Read Path and the Visibility Map

The right architecture is a measured covering-index loop: identify the hot read path, build the narrowest covering index, verify heap avoidance with EXPLAIN (ANALYZE, BUFFERS), and tune vacuum behavior for that table instead of celebrating the DDL.

flowchart TD
    Query[hot read query — predicate and projection] --> Cover[covering B-tree index — key and included columns]
    Cover --> VM[visibility map — all visible bit]
    VM -->|bit set| Return[index tuple returned]
    VM -->|bit clear| Heap[heap visit for MVCC check]
    Heap --> Return
    Vacuum[VACUUM and autovacuum] --> VM
    Writes[INSERT UPDATE DELETE on page] --> VM

Start from pg_stat_statements, not intuition. Pick one query by total time and call count, then write down its WHERE, ORDER BY, and SELECT columns.
Verification: the candidate query has a stable fingerprint and enough calls to matter.
Put search columns in the key and projected columns in INCLUDE. For the lookup path below, email is the key; username and status are payload.
```
CREATE INDEX CONCURRENTLY idx_users_email_covering
ON users(email)
INCLUDE (username, status);
```
Verification: CREATE INDEX CONCURRENTLY finishes without blocking ordinary reads and writes, and the index size is acceptable via pg_relation_size.
Run the real query with execution metrics.
```
EXPLAIN (ANALYZE, BUFFERS)
SELECT username, status
FROM users
WHERE email = 'dev@example.com';
```
Verification: look for Index Only Scan, low shared buffer reads, and Heap Fetches: 0 or a number small enough to survive peak traffic.
Check visibility health, not just plan shape. PostgreSQL’s visibility map stores all-visible and all-frozen state per heap page, and its bits are set by vacuum and cleared by data-modifying operations.
Verification: if heap fetches remain high after the index is used, inspect last_autovacuum, n_dead_tup, long-running transactions, and table-level autovacuum settings.
Bound the write cost. Included columns are not search keys, but they still live in the index. A wide text, jsonb, or frequently updated status column can turn a read optimization into write amplification.
Verification: compare pg_stat_user_indexes.idx_scan, write latency, WAL volume, HOT update ratio, and index size before and after rollout.

In Practice

I am not going to invent a 2:14 AM incident with a heroic graph. The documented production pattern is enough, and the public PostgreSQL material gives a concrete measurement boundary.

PostgreSQL 11 added covering indexes with INCLUDE, documented in the project release notes and in the current index-only scan documentation. The documentation says the scan is physically possible when the index type supports it and the query’s referenced columns are available from the index. B-tree indexes satisfy the access-method requirement. The same documentation adds the operational catch: because visibility data is not stored in index entries, PostgreSQL checks the visibility map before skipping the heap.

That behavior explains why a plan can contain Index Only Scan and still do heap work. The plan node describes the access strategy; Heap Fetches tells you how often the executor had to visit heap pages anyway. If heap fetches are high, the covering index may still reduce work, but it has not removed the table from the read path.

A useful public comparison comes from Dalibo’s PostgreSQL 11 workshop, which uses a 10M-row table with columns a, b, and c. With a unique index on (a, b), selecting only a, b can use an index-only scan with Heap Fetches: 0. Selecting a, b, c from the same predicate cannot be answered by that index, so PostgreSQL uses an index scan and reads the table to get c. After adding a covering index on (a, b) INCLUDE (c), the same a, b, c query returns to an index-only scan with Heap Fetches: 0.

Public PostgreSQL 11 workshop measurement	Plan shape	Heap fetch signal	Execution time
Existing unique index on `(a, b)`, query selects `a, b`	`Index Only Scan`	`0`	`12.628 ms`
Existing unique index on `(a, b)`, query selects `a, b, c`	`Index Scan`	Heap access is inherent	`16.034 ms`
Covering unique index on `(a, b) INCLUDE (c)`, query selects `a, b, c`	`Index Only Scan`	`0`	`14.263 ms`

The more interesting part is not the small read-time delta in that example. It is the storage and write tradeoff. Dalibo reports 214 MB for the unique (a, b) index and 387 MB for a separate (a, b, c) index, or 602 MB if both are kept. Replacing that pair with one unique covering index on (a, b) INCLUDE (c) is reported at 386 MB. The same workshop then inserts 100k rows: maintaining one covering index reports 502.594 ms; maintaining the two-index design reports 843.147 ms.

That is the design tradeoff senior engineers should care about. The covering index did not make writes free. It reduced a two-index design into one index while preserving uniqueness semantics on (a, b). If your alternative is no extra index, writes still pay. If your alternative is two overlapping indexes, a covering index may be the cheaper structure.

The deeper production gotcha is autovacuum math. PostgreSQL documents autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2 defaults. On small tables, that is fine. On a 200M-row relation, scale-factor-driven vacuum can wait for a very large number of changed tuples unless table storage parameters override it. That delay matters because visibility map bits are conservative: if PostgreSQL cannot prove a page is all-visible, it visits the heap.

There is also a schema-design trap. Adding INCLUDE (username, status) is reasonable for a hot lookup endpoint. Adding ten payload columns because “index-only scans are fast” is not engineering; it is moving the table into another structure with worse write economics. PostgreSQL will reject oversized index tuples, and before that hard failure, you pay with memory pressure, cache churn, WAL, and slower updates.

The useful mental model is simple: a covering index is a read-path contract. Autovacuum, transaction age, and update patterns are the parties that can break it.

Where It Breaks

Failure mode	Trigger	Fix
`Index Only Scan` still shows large `Heap Fetches`	Pages are not marked all-visible after recent `INSERT`, `UPDATE`, or `DELETE` activity	Tune table-level autovacuum and remove long-running transactions holding old snapshots
Covering index bloats quickly	`INCLUDE` contains wide `text`, `jsonb`, or low-value projected columns	Keep payload columns narrow and tied to one hot query family
Write latency rises after rollout	Included columns are frequently updated, preventing cheap heap-only behavior	Drop volatile payload columns or split read model from write-heavy table
Planner ignores the new index	Query selects extra columns, uses mismatched predicates, or statistics are stale	Re-run `ANALYZE`, verify exact projection, and compare with `EXPLAIN (ANALYZE, BUFFERS)`
Staging benchmark overstates gains	Test data was bulk-loaded, vacuumed, and mostly static	Replay production write mix or test after churn before trusting heap-fetch counts
RDS maintenance lags during peak write load	Autovacuum workers and cost limits cannot keep up with dead tuples	Use per-table autovacuum settings and monitor `pg_stat_user_tables`

What to Do Next

Problem: Ordinary indexes still force heap access when the query projects columns outside the index or when MVCC visibility cannot be proven from the visibility map.
Solution: Build narrow covering indexes only for high-call-count read paths, then treat autovacuum health as part of the index design.
Proof: The validation signal is not the presence of Index Only Scan; it is low Heap Fetches, stable buffer reads, acceptable index size, preserved HOT update ratio, and no write regression.
Action: This week, take the top query from pg_stat_statements, add one candidate covering index in staging, and compare EXPLAIN (ANALYZE, BUFFERS), pg_relation_size, write latency, WAL volume, and HOT update ratio before and after real write churn.

A fast PostgreSQL query is rarely the result of one clever index; it is the result of making the storage engine’s promises line up with the workload it is actually running.

Situation

The Problem

Design the Index Around the Read Path and the Visibility Map

In Practice

Where It Breaks

What to Do Next

Rajiv

Related Posts

Prometheus + Grafana for Database Engineers: Open-Source Monitoring That Actually Works

Cassandra Observability: Compaction, Tombstones, Repair, Latency, and Hot Partitions

Cloud Architecture Review Checklist for Database-Backed Applications