Covering Indexes Are Not Enough Without Visibility
A PostgreSQL covering index is not a performance fix by itself; it is a bet that the query, the index payload, and the visibility map will stay aligned under real production churn.
Situation
The default move is still an ordinary B-tree index on the predicate column: CREATE INDEX ON users(email). The better move, when the read path is stable, is a covering index using PostgreSQL 11’s INCLUDE clause, which stores projected columns in the index payload so an index-only scan can answer the query without visiting the heap when visibility permits it.
| Approach | What it optimizes | What it still pays for |
|---|---|---|
| Ordinary B-tree index | Finds matching tuple IDs quickly | Heap reads for projected columns and Multi-Version Concurrency Control (MVCC) visibility |
Covering index with INCLUDE | Keeps predicate and selected columns in one index | Larger index, write overhead, visibility map dependency |
| Covering index plus vacuum discipline | Avoids heap access for stable pages | Operational ownership of autovacuum and long transactions |
The Problem
PostgreSQL indexes do not store complete row visibility. They can point to candidate rows, but MVCC visibility is determined from heap state unless PostgreSQL can trust the visibility map. The official PostgreSQL documentation is explicit: index-only scans only win when the needed columns are available from the index and a significant fraction of heap pages have their all-visible bits set in the visibility map.
| Failure point | What breaks | Why it matters |
|---|---|---|
| Projection misses the index | SELECT username, status uses idx_users_email(email) and still reads the heap | The index finds rows, but the table still serves the selected columns |
| Visibility map is stale | Plan says Index Only Scan, but reports Heap Fetches: 12000 | The scan is only “index-only” for pages marked all-visible |
| Autovacuum threshold is too loose | Default autovacuum_vacuum_scale_factor = 0.2 can mean roughly 40M changed tuples on a 200M-row table before vacuum triggers | Large tables can accumulate heap pages that are not all-visible for too long |
| Included column churn | Updating status or username touches an indexed column | PostgreSQL must maintain the index entry, and HOT updates are less likely |
| Staging lies politely | Freshly loaded and manually vacuumed test data shows zero heap fetches | Production write churn, old snapshots, and delayed vacuum change the execution profile |
The core question is not “did we add an index?” It is: can PostgreSQL answer this production query from the index while proving that the referenced heap pages are visible to the current snapshot?
Design the Index Around the Read Path and the Visibility Map
The right architecture is a measured covering-index loop: identify the hot read path, build the narrowest covering index, verify heap avoidance with EXPLAIN (ANALYZE, BUFFERS), and tune vacuum behavior for that table instead of celebrating the DDL.
flowchart TD
Query[hot read query — predicate and projection] --> Cover[covering B-tree index — key and included columns]
Cover --> VM[visibility map — all visible bit]
VM -->|bit set| Return[index tuple returned]
VM -->|bit clear| Heap[heap visit for MVCC check]
Heap --> Return
Vacuum[VACUUM and autovacuum] --> VM
Writes[INSERT UPDATE DELETE on page] --> VM
-
Start from
pg_stat_statements, not intuition. Pick one query by total time and call count, then write down itsWHERE,ORDER BY, andSELECTcolumns.
Verification: the candidate query has a stable fingerprint and enough calls to matter. -
Put search columns in the key and projected columns in
INCLUDE. For the lookup path below,emailis the key;usernameandstatusare payload.CREATE INDEX CONCURRENTLY idx_users_email_covering ON users(email) INCLUDE (username, status);Verification:
CREATE INDEX CONCURRENTLYfinishes without blocking ordinary reads and writes, and the index size is acceptable viapg_relation_size. -
Run the real query with execution metrics.
EXPLAIN (ANALYZE, BUFFERS) SELECT username, status FROM users WHERE email = 'dev@example.com';Verification: look for
Index Only Scan, low shared buffer reads, andHeap Fetches: 0or a number small enough to survive peak traffic. -
Check visibility health, not just plan shape. PostgreSQL’s visibility map stores all-visible and all-frozen state per heap page, and its bits are set by vacuum and cleared by data-modifying operations.
Verification: if heap fetches remain high after the index is used, inspectlast_autovacuum,n_dead_tup, long-running transactions, and table-level autovacuum settings. -
Bound the write cost. Included columns are not search keys, but they still live in the index. A wide
text,jsonb, or frequently updated status column can turn a read optimization into write amplification.
Verification: comparepg_stat_user_indexes.idx_scan, write latency, WAL volume, HOT update ratio, and index size before and after rollout.
In Practice
I am not going to invent a 2:14 AM incident with a heroic graph. The documented production pattern is enough, and the public PostgreSQL material gives a concrete measurement boundary.
PostgreSQL 11 added covering indexes with INCLUDE, documented in the project release notes and in the current index-only scan documentation. The documentation says the scan is physically possible when the index type supports it and the query’s referenced columns are available from the index. B-tree indexes satisfy the access-method requirement. The same documentation adds the operational catch: because visibility data is not stored in index entries, PostgreSQL checks the visibility map before skipping the heap.
That behavior explains why a plan can contain Index Only Scan and still do heap work. The plan node describes the access strategy; Heap Fetches tells you how often the executor had to visit heap pages anyway. If heap fetches are high, the covering index may still reduce work, but it has not removed the table from the read path.
A useful public comparison comes from Dalibo’s PostgreSQL 11 workshop, which uses a 10M-row table with columns a, b, and c. With a unique index on (a, b), selecting only a, b can use an index-only scan with Heap Fetches: 0. Selecting a, b, c from the same predicate cannot be answered by that index, so PostgreSQL uses an index scan and reads the table to get c. After adding a covering index on (a, b) INCLUDE (c), the same a, b, c query returns to an index-only scan with Heap Fetches: 0.
| Public PostgreSQL 11 workshop measurement | Plan shape | Heap fetch signal | Execution time |
|---|---|---|---|
Existing unique index on (a, b), query selects a, b | Index Only Scan | 0 | 12.628 ms |
Existing unique index on (a, b), query selects a, b, c | Index Scan | Heap access is inherent | 16.034 ms |
Covering unique index on (a, b) INCLUDE (c), query selects a, b, c | Index Only Scan | 0 | 14.263 ms |
The more interesting part is not the small read-time delta in that example. It is the storage and write tradeoff. Dalibo reports 214 MB for the unique (a, b) index and 387 MB for a separate (a, b, c) index, or 602 MB if both are kept. Replacing that pair with one unique covering index on (a, b) INCLUDE (c) is reported at 386 MB. The same workshop then inserts 100k rows: maintaining one covering index reports 502.594 ms; maintaining the two-index design reports 843.147 ms.
That is the design tradeoff senior engineers should care about. The covering index did not make writes free. It reduced a two-index design into one index while preserving uniqueness semantics on (a, b). If your alternative is no extra index, writes still pay. If your alternative is two overlapping indexes, a covering index may be the cheaper structure.
The deeper production gotcha is autovacuum math. PostgreSQL documents autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2 defaults. On small tables, that is fine. On a 200M-row relation, scale-factor-driven vacuum can wait for a very large number of changed tuples unless table storage parameters override it. That delay matters because visibility map bits are conservative: if PostgreSQL cannot prove a page is all-visible, it visits the heap.
There is also a schema-design trap. Adding INCLUDE (username, status) is reasonable for a hot lookup endpoint. Adding ten payload columns because “index-only scans are fast” is not engineering; it is moving the table into another structure with worse write economics. PostgreSQL will reject oversized index tuples, and before that hard failure, you pay with memory pressure, cache churn, WAL, and slower updates.
The useful mental model is simple: a covering index is a read-path contract. Autovacuum, transaction age, and update patterns are the parties that can break it.
Where It Breaks
| Failure mode | Trigger | Fix |
|---|---|---|
Index Only Scan still shows large Heap Fetches | Pages are not marked all-visible after recent INSERT, UPDATE, or DELETE activity | Tune table-level autovacuum and remove long-running transactions holding old snapshots |
| Covering index bloats quickly | INCLUDE contains wide text, jsonb, or low-value projected columns | Keep payload columns narrow and tied to one hot query family |
| Write latency rises after rollout | Included columns are frequently updated, preventing cheap heap-only behavior | Drop volatile payload columns or split read model from write-heavy table |
| Planner ignores the new index | Query selects extra columns, uses mismatched predicates, or statistics are stale | Re-run ANALYZE, verify exact projection, and compare with EXPLAIN (ANALYZE, BUFFERS) |
| Staging benchmark overstates gains | Test data was bulk-loaded, vacuumed, and mostly static | Replay production write mix or test after churn before trusting heap-fetch counts |
| RDS maintenance lags during peak write load | Autovacuum workers and cost limits cannot keep up with dead tuples | Use per-table autovacuum settings and monitor pg_stat_user_tables |
What to Do Next
- Problem: Ordinary indexes still force heap access when the query projects columns outside the index or when MVCC visibility cannot be proven from the visibility map.
- Solution: Build narrow covering indexes only for high-call-count read paths, then treat autovacuum health as part of the index design.
- Proof: The validation signal is not the presence of
Index Only Scan; it is lowHeap Fetches, stable buffer reads, acceptable index size, preserved HOT update ratio, and no write regression. - Action: This week, take the top query from
pg_stat_statements, add one candidate covering index in staging, and compareEXPLAIN (ANALYZE, BUFFERS),pg_relation_size, write latency, WAL volume, and HOT update ratio before and after real write churn.
A fast PostgreSQL query is rarely the result of one clever index; it is the result of making the storage engine’s promises line up with the workload it is actually running.