Index Debt Review: How to Find Bad, Missing, and Duplicate Indexes
Indexes accumulate silently. Engineers add them to fix slow queries, migration scripts add them to enforce constraints, ORM scaffolding adds them speculatively, and nobody systematically removes them. Over several years, a database with 50 tables can accumulate 200 indexes — half of which are never used, a tenth of which duplicate each other, and several of which are invalid or bloated. The cost is paid on every write: each insert, update, and delete must maintain every index on the affected table, whether or not that index is ever scanned.
Situation
PostgreSQL’s pg_stat_user_indexes tracks cumulative scan counts for every index since the last statistics reset. An index with idx_scan = 0 has never been used in a query plan. An index that duplicates another index means two identical maintenance operations happen on every write. An invalid index — one that failed partway through a CREATE INDEX CONCURRENTLY — takes up space and maintenance overhead without ever being selected by the planner.
Index debt reviews should happen on a schedule, not just when disk is running low. Write amplification from carrying 40 unused indexes on a high-write table is not dramatic — it adds microseconds per write — but it compounds. At high write volume, the cumulative effect shows up as elevated lock contention during bulk operations and higher checkpoint I/O pressure.
The review is a structured SQL audit. No tools required beyond psql.
Symptoms
| Signal | Where to see it | What it means |
|---|---|---|
| Table size growing faster than row count | pg_size_pretty(pg_total_relation_size(...)) | Index bloat accumulating alongside table bloat |
| Slow bulk inserts or updates on large tables | Application timing logs | Too many indexes being maintained per write |
idx_scan = 0 on multiple indexes | pg_stat_user_indexes | Unused indexes consuming write bandwidth |
Duplicate entries in pg_index by indrelid and indkey | pg_index | Redundant indexes doubling maintenance overhead |
indisvalid = false in pg_index | pg_index | Invalid indexes from failed concurrent builds |
| High seq_scan count with low idx_scan | pg_stat_user_tables | Missing index on a frequently filtered column |
First Five Checks
- Unused indexes (zero scan count) — the first thing to remove:
SELECT
s.schemaname,
s.tablename,
s.indexname,
s.idx_scan,
pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
i.indisprimary
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
AND NOT i.indisprimary
AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC
LIMIT 20;
Sort by size to prioritize — a 10 GB unused index is a higher-priority removal than a 10 MB one. Exclude primary keys and unique constraints; those enforce data integrity regardless of query usage.
Check when statistics were last reset before acting on zero-scan counts:
SELECT stats_reset FROM pg_stat_database WHERE datname = current_database();
If stats_reset was yesterday, a zero scan count is not evidence. If it was 60+ days ago, it is reliable.
- Duplicate indexes — same table, same column list:
SELECT
indrelid::regclass AS tablename,
array_agg(indexrelid::regclass ORDER BY pg_relation_size(indexrelid) DESC) AS indexes,
array_agg(pg_size_pretty(pg_relation_size(indexrelid)) ORDER BY pg_relation_size(indexrelid) DESC) AS sizes
FROM pg_index
GROUP BY indrelid, indkey
HAVING count(*) > 1;
Two indexes on (customer_id) with identical definitions are pure overhead — keep the one with higher idx_scan and drop the other. Duplicates often result from migration tools generating a new index when a unique constraint was added on a column that already had a regular index.
- Bloated or low-use large indexes — high storage cost relative to usage:
SELECT
s.indexrelid::regclass AS indexname,
s.tablename,
s.idx_scan,
pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
pg_relation_size(s.indexrelid) AS raw_size
FROM pg_stat_user_indexes s
WHERE s.idx_scan < 10
ORDER BY raw_size DESC
LIMIT 10;
An index with fewer than 10 scans that takes 5 GB of storage is worth examining closely. Combine with the age of statistics reset to determine if ”< 10 scans” reflects weeks of production traffic or just a few hours.
- Tables with high sequential scan counts and missing indexes — potential missing indexes:
SELECT
relname,
seq_scan,
idx_scan,
n_live_tup,
seq_scan - idx_scan AS seq_excess
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
AND n_live_tup > 10000
AND seq_scan > 100
ORDER BY seq_scan DESC
LIMIT 15;
A table with 500,000 rows where seq_scan = 10000 and idx_scan = 50 is performing full table scans on almost every access. Pair this with EXPLAIN (ANALYZE) on the most frequent queries against that table to identify which column would benefit from an index.
- Invalid indexes — indexes that must be rebuilt:
SELECT
indexrelid::regclass AS indexname,
indrelid::regclass AS tablename,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_index
WHERE NOT indisvalid;
An invalid index results from a CREATE INDEX CONCURRENTLY that failed partway through, typically due to a deadlock or constraint violation. PostgreSQL keeps the partially-built index but marks it as invalid — it takes up space and triggers write maintenance but is never used by the planner. These must be rebuilt or dropped.
Decision Tree
flowchart TD
A[Index audit triggered] --> B{stats_reset recent?}
B -->|yes — under 30 days| C[Wait for 30 days of data before removing]
B -->|no — over 30 days of data| D{idx_scan = 0 indexes found?}
D -->|yes| E{Primary key or unique constraint?}
E -->|yes| F[Keep — data integrity requirement]
E -->|no| G[DROP INDEX CONCURRENTLY]
D -->|no| H{Duplicate indexes found?}
H -->|yes| I[Keep higher-scan index — drop duplicate]
H -->|no| J{Invalid indexes found?}
J -->|yes| K[REINDEX CONCURRENTLY]
J -->|no| L{High seq_scan on large table?}
L -->|yes| M[EXPLAIN slow query — add covering index]
L -->|no| N[Index health OK — schedule next audit]
Remediation Options
Option 1 — Drop unused indexes
Always use CONCURRENTLY to avoid blocking writes:
-- Drop a specific unused index
DROP INDEX CONCURRENTLY schema_name.unused_index_name;
-- Verify it is gone
SELECT indexname FROM pg_indexes
WHERE tablename = 'orders' AND indexname = 'unused_index_name';
DROP INDEX CONCURRENTLY waits for all transactions that reference the index to complete, then removes it. It does not hold an ACCESS EXCLUSIVE lock for the duration — it uses multiple lower-level locks and can coexist with reads and writes. It cannot run inside a transaction block.
Option 2 — Rebuild invalid or bloated indexes
For invalid indexes from failed concurrent builds:
-- Rebuild concurrently — creates new valid index, replaces old
REINDEX INDEX CONCURRENTLY schema_name.invalid_index_name;
-- Or drop and recreate
DROP INDEX CONCURRENTLY schema_name.invalid_index_name;
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);
For bloated indexes where the size has grown disproportionately to the data (common on tables with many deletes and updates), REINDEX CONCURRENTLY reclaims the space. The bloat is visible by comparing pg_relation_size(indexrelid) against pg_relation_size(indrelid) * 0.1 — an index larger than 10% of its table’s size on a low-selectivity column is worth investigating.
Option 3 — Create missing indexes for high-seq-scan tables
When pg_stat_user_tables shows a table with seq_scan >> idx_scan and large n_live_tup, identify the query pattern and create a covering index:
-- Always create concurrently in production
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC)
WHERE status IN ('pending', 'processing'); -- partial index if applicable
-- Verify the index is used after creation
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE status = 'pending'
AND created_at > now() - interval '7 days'
ORDER BY created_at DESC
LIMIT 50;
A partial index (WHERE status IN (...)) is smaller, faster to maintain, and more selective than a full index on the same column. Use it when the query always filters to a known subset.
Rollback Plan
DROP INDEX CONCURRENTLY: reversible by recreating the index withCREATE INDEX CONCURRENTLY. Keep the original index DDL in a migration file before dropping so reconstruction is a single command. Note that recreation is not instant on large tables — budget time for it.REINDEX CONCURRENTLY: leaves the original index in place until the rebuild is complete, then swaps atomically. Safe to abort at any point — if aborted, the original index is still valid.CREATE INDEX CONCURRENTLY: if the new index turns out to worsen plan choices, drop it withDROP INDEX CONCURRENTLY. The planner will revert to its prior plan immediately.- No rollback is needed for the read-only audit queries — they have no side effects.
Automation Opportunity
Index audits are well-suited to a quarterly automated report. This query generates a prioritized removal candidate list:
-- Quarterly index debt report
SELECT
'DROP INDEX CONCURRENTLY ' || schemaname || '.' || indexname || ';' AS removal_sql,
pg_size_pretty(pg_relation_size(indexrelid)) AS reclaimed,
idx_scan,
last_idx_scan
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
AND NOT i.indisprimary
AND NOT i.indisunique
AND pg_relation_size(s.indexrelid) > 10 * 1024 * 1024 -- > 10 MB only
ORDER BY pg_relation_size(s.indexrelid) DESC;
last_idx_scan (added in PostgreSQL 16) shows the timestamp of the last use, which is more precise than relying on stats_reset. For earlier versions, stats_reset from pg_stat_database is the best proxy.
In Practice
The PostgreSQL documentation for pg_stat_user_indexes explicitly notes that idx_scan is reset by pg_stat_reset() and reflects cumulative counts since the last reset. This means that before acting on zero-scan counts, verifying the age of the statistics reset is not optional — it is required. The PostgreSQL wiki recommends a minimum of 2–4 weeks of production traffic before treating a zero scan count as evidence of permanent non-use.
The documented behavior of DROP INDEX CONCURRENTLY is that it requires two table scans — one to mark the index invalid, one to remove it — and uses a series of lower-level locks rather than a single ACCESS EXCLUSIVE lock. Per the PostgreSQL documentation, it is safe to run on production tables under normal load, with the caveat that it cannot be executed inside an explicit transaction block.
Where It Breaks
| Failure mode | Trigger | Fix |
|---|---|---|
| Dropped index turns out to be needed | Statistics reset was recent; index was used before reset | Recreate with CREATE INDEX CONCURRENTLY; add to rollback script before next drop |
DROP INDEX CONCURRENTLY hangs | Long-running transaction holds a lock on the table | Wait for transaction to complete; monitor pg_stat_activity for blockers |
REINDEX CONCURRENTLY fails midway | Disk full during index rebuild | Free disk space; the original index is still valid after failure |
| Duplicate index removal breaks constraint | Duplicate was actually a unique constraint enforced via index | Check indisunique in pg_index before dropping — never drop unique indexes without confirming the constraint is covered elsewhere |
| New covering index triggers plan regression | Planner prefers new index for a query it should not | Drop the new index and use pg_hint_plan or partial index to constrain scope |
What to Do Next
- Problem: Unused and duplicate indexes consume write bandwidth on every insert, update, and delete, with no benefit — and invalid indexes waste space and maintenance work while never being selected by the planner.
- Solution: Run the five audit queries on a schedule, confirm statistics age, and use
DROP INDEX CONCURRENTLYandREINDEX CONCURRENTLYto clean up — always with CONCURRENTLY to avoid locking. - Proof: After removing a high-overhead unused index,
pg_stat_bgwriter.buffers_cleanshould stabilize or decrease on write-heavy tables, and bulk insert timing should improve. - Action: Run Check 1 and Check 5 this week. Drop any invalid indexes immediately with
REINDEX CONCURRENTLY, and flag any zero-scan indexes over 1 GB for the next review cycle.
Checklist
- Check
pg_stat_database.stats_reset— confirm statistics are at least 30 days old before acting - Query
pg_stat_user_indexesforidx_scan = 0— exclude primary keys and unique constraints - Sort zero-scan indexes by
pg_relation_size— prioritize largest for removal - Query
pg_indexfor duplicateindrelid + indkeycombinations — identify redundant indexes - For duplicates, keep the index with the higher
idx_scancount and drop the other - Query
pg_index WHERE NOT indisvalid— list all invalid indexes - Run
REINDEX CONCURRENTLYon all invalid indexes immediately - Check
pg_stat_user_tablesfor tables withseq_scan >> idx_scanandn_live_tup > 10000 - For high-seq-scan tables, run
EXPLAIN (ANALYZE)on frequent queries to identify missing indexes - Create any missing indexes with
CREATE INDEX CONCURRENTLY - Document all dropped indexes with their original DDL before removing
- Schedule the next index audit for 90 days out — add to the team runbook