MySQL EXPLAIN: Reading the Plan Without Guessing
The most common mistake engineers make with EXPLAIN is treating type: ALL as an alarm that requires an index. It is a data point, not a verdict. Whether a full scan is a problem depends on the rows estimate, the Extra flags, and what the optimizer decided to do with the indexes that already exist. Reading the plan systematically takes two minutes.
Situation
Every engineer who has investigated a slow query has seen EXPLAIN output. Most can recognize the column names — type, key, rows, Extra — but not how to read them as a system.
The common workflow is: see type: ALL, add an index. That misses the reason the optimizer chose the plan it chose, and misses the cases where the new index will be ignored anyway. MySQL 8.0 added EXPLAIN ANALYZE, which executes the query and returns actual row counts alongside estimates. The gap between those two numbers is often the real story.
The Problem
Indexes do not guarantee the optimizer will use them. InnoDB’s cost-based optimizer weighs index access cost against cardinality estimates. If those estimates suggest the index returns a large fraction of the table, the optimizer may choose a full scan instead. This behavior is documented: MySQL uses index dive estimates and statistics from INFORMATION_SCHEMA.INNODB_TABLE_STATS to make that call.
When statistics are stale — after bulk loads, large deletes, or fast-growing tables — the optimizer’s row estimates can be wrong by an order of magnitude. A plan that looks safe in EXPLAIN may be running against a table ten times larger.
What does each column actually mean, and how do you read them together to know whether the optimizer’s choice was reasonable?
How to Read EXPLAIN Output
EXPLAIN returns one row per table in the query, in the join order the optimizer chose. The columns that carry diagnostic weight are type, key, rows, and Extra.
The type column describes the access method. From best to worst: const (single-row primary key match), eq_ref (one matching row per join from a unique index), ref (non-unique index lookup), range (bounded index scan), index (full index scan), ALL (full table scan). The useful breakpoint is between range and index — anything at index or ALL with a high rows estimate is worth investigating.
The key column shows which index the optimizer actually chose. If key is NULL and possible_keys lists candidates, the optimizer decided the available indexes were not selective enough to be worth using. That is the cardinality problem — not a missing index.
The rows column is the optimizer’s estimate of how many rows it will examine to satisfy the query. For EXPLAIN ANALYZE (MySQL 8.0+), the output also shows actual rows — the count from the real execution. A large gap between estimated and actual rows means statistics are stale. Run ANALYZE TABLE tablename; to refresh them.
The Extra column carries execution flags. Using filesort means MySQL sorted the result after retrieval — no index covers the ORDER BY, and on large result sets this spills to disk. Using temporary means an internal temp table was created, common with GROUP BY on non-indexed columns. Using index is a positive signal — a covering index served the query without touching table rows.
Reading these together: type: ALL, rows: 4000000, Extra: Using temporary; Using filesort means the optimizer scanned four million rows, built a temp table, and sorted it. That is not a statistics problem — that is a schema problem.
A concrete example with EXPLAIN ANALYZE on MySQL 8.0:
EXPLAIN ANALYZE
SELECT user_id, created_at FROM orders
WHERE status = 'pending' AND created_at > '2022-01-01'\G
-> Filter: ((orders.status = 'pending') and (orders.created_at > '2022-01-01'))
(cost=48213.45 rows=45823)
(actual time=0.112..842.361 rows=12847 loops=1)
-> Table scan on orders
(cost=48213.45 rows=458230)
(actual time=0.089..721.903 rows=458230 loops=1)
The rows estimate (458,230 for the table scan) matches actual rows — statistics are current. But actual time=842ms for a filter that returns 12,847 rows confirms the full scan is the problem: no index covers (status, created_at). Adding idx_status_created (status, created_at) would reduce the scan to an index range lookup.
In Practice
The MySQL 8.0 Reference Manual documents that InnoDB’s optimizer uses cardinality statistics from INFORMATION_SCHEMA.INNODB_TABLE_STATS to choose between an index range scan and a full table scan. EXPLAIN ANALYZE, introduced in MySQL 8.0.18, returns both estimated and actual row counts per step. The manual identifies a large gap between the two as the primary signal for stale statistics — estimated 500, actual 2,400,000 means the plan was optimized for a table that no longer exists.
Where It Breaks
| Scenario | What breaks | Why |
|---|---|---|
| Stale statistics after bulk load | rows estimate is far below actual; optimizer picks a plan sized for the old table | innodb_stats_auto_recalc threshold (10% of rows changed) was not met; run ANALYZE TABLE manually |
| JOIN order surprises | type: ALL appears on a table you expected to be driven by an index | InnoDB’s cost model may reorder joins; the id column in EXPLAIN output shows actual join order |
| Index ignored due to low cardinality | possible_keys lists the index; key is NULL | Column has few distinct values (boolean, status enum); optimizer’s index dive concluded the full scan was cheaper |
What to Do Next
- Problem: Engineers add indexes without confirming the optimizer will use them, because they read
type: ALLwithout readingkey,rows, andExtratogether. - Solution: Treat EXPLAIN output as a system — check
keyfirst, thenrows, thenExtra, before drawing any conclusion about what is wrong. - Proof: Run
EXPLAIN ANALYZEon MySQL 8.0+. If actual rows diverges significantly from estimated rows, the plan is stale — runANALYZE TABLEand re-check before adding any index. - Action: This week, take one slow query your team has been discussing and run
EXPLAIN ANALYZEon it. Readtype,key,rows,Extrain order. Write one sentence describing what the optimizer decided. That sentence is more useful than a blindCREATE INDEX.