Reading a Query Plan Without Getting Lost
The query plan is the database’s answer to a question you did not explicitly ask: given the data distribution I know about and the resources available, what is the cheapest path to your result? Reading that answer correctly means knowing which nodes cost the most, not which nodes appear first.
Situation
PostgreSQL’s EXPLAIN and EXPLAIN ANALYZE are the primary tools for diagnosing slow queries. Every engineer who works with databases reads query plans eventually. Most read them wrong — scanning from top to bottom, treating the first node as the first operation, and ignoring the difference between estimated and actual row counts.
The plan is a tree. Execution starts at the leaf nodes (innermost indentation) and flows up toward the root. The root node produces the final output.
The Problem
A query is slower than expected. EXPLAIN ANALYZE shows a plan with a Seq Scan, an Index Scan, a Hash Join, and a Sort. Which node is the problem? Without understanding how to read the plan, the engineer focuses on the Seq Scan — which may be entirely appropriate for a small table — while missing the Hash Join that is processing 10 million rows due to a bad row count estimate.
What are the three numbers that matter in every query plan, and how do you use them to find the slow node?
The Three Numbers
1. Rows (estimated vs actual)
Every node in the plan shows rows=N in the EXPLAIN output and, after ANALYZE, the actual row count alongside it. When these diverge significantly, the query planner made a bad estimate — which usually means a subsequent join or aggregation was sized incorrectly, causing it to use the wrong strategy.
2. Cost
The cost is expressed as cost=startup..total where both numbers are in abstract “cost units” (proportional to disk page reads). The startup cost is the cost before the first row is returned; the total cost is the cost to return all rows. Compare total costs across nodes to find the expensive one.
3. Actual time (from ANALYZE)
actual time=startup..total in milliseconds. This is the real measurement. A node with a high estimated cost but a low actual time is fine. A node with a low estimated cost but a high actual time indicates a bad estimate or a resource problem (I/O, locking, network).
-- Always use ANALYZE BUFFERS for real diagnosis
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.status, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > now() - interval '30 days';
The BUFFERS option shows how many shared buffer hits vs disk reads each node required. A node with shared read=10000 and shared hit=0 is reading entirely from disk — a cache miss problem, not an index problem.
Reading the Plan
In the plan output, each node shows its operation (Seq Scan, Index Scan, Hash Join, Sort, etc.) and its target. Read from the most-indented line outward:
Hash Join (cost=1200..5600 rows=4500 width=48) (actual time=45.2..89.3 rows=4312 loops=1)
-> Seq Scan on customers c (cost=0..350 rows=12000 width=24) (actual time=0.1..8.2 rows=12000 loops=1)
-> Hash (cost=900..900 rows=24000 width=24) (actual time=38.1..38.1 rows=23890 loops=1)
-> Index Scan using orders_created_at_idx on orders o (actual time=0.2..22.4 rows=23890 loops=1)
The Seq Scan on customers runs first. Its 12,000 rows feed the Hash node. The Index Scan on orders runs in parallel and its rows are probed against the hash. The Hash Join produces the result. The expensive node here is the Hash (38ms) — the Seq Scan on customers is cheap because it returns all 12,000 rows directly.
In Practice
PostgreSQL’s query planner documentation describes the cost model as based on sequential page reads (cost unit ≈ 1 seq page read) with random reads costing random_page_cost times more (default: 4). An SSD changes this ratio significantly — random_page_cost = 1.1 is appropriate for SSDs and often causes the planner to prefer index scans that it would otherwise avoid.
The documented signal for a missing index: a Seq Scan with rows=N where N is large and a Filter: (condition) that eliminates most rows. The database is scanning the whole table to find a few rows — a clear candidate for an index on the filter column.
Where It Breaks
| Plan symptom | What it means | Fix |
|---|---|---|
rows=1 actual rows=50000 | Severe row count underestimate; bad join strategy | ANALYZE the table; check for stale statistics |
Seq Scan on large table with filter | No index on filter column, or index not used | Create index; or lower random_page_cost for SSD |
Sort with Disk: true | Sort spilled to disk; work_mem too small | Increase work_mem per session for large queries |
Nested Loop with millions of rows | Planner underestimated join size | Force join strategy with SET enable_nestloop = off for testing |
What to Do Next
- Problem: Slow queries cannot be diagnosed without reading the plan, and most plans are misread because engineers focus on node type rather than actual time and row estimate accuracy.
- Solution: Always use
EXPLAIN (ANALYZE, BUFFERS)for slow query diagnosis; find the node with the highest actual time; check if actual rows match estimated rows. - Proof: After running EXPLAIN ANALYZE on your five slowest queries, at least one will show a row count divergence that explains the poor plan choice.
- Action: Take your slowest query today and run
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)— find the node where actual rows diverges most from estimated rows, then runANALYZE table_nameon the relevant table.