Why SELECT * Still Hurts Production Systems

SELECT * is not a minor style violation. It is a query that opts out of covering indexes, pulls every TOAST column unconditionally, and defeats columnar storage’s only performance advantage — column pruning. Engineers know the advice, but most have never seen the actual mechanism that makes SELECT * expensive in production. The problem almost always shows up the same way: the query ran fine in development, shipped, then became the top line in I/O bytes as the table grew.

Situation

Applications accumulate columns over time. A users table starts with a dozen fields and grows incrementally — a preferences JSONB column here, a bio TEXT there, an audit field, a feature flag blob. Each migration is routine. The SELECT * queries that read that table are unchanged.

By the time a query shows up in slow query logs, the table has 50 columns and two of them are 40KB per row on average. Development databases rarely catch this because dev data is small and large TEXT or JSONB values are usually short.

The Problem

There are four distinct mechanisms through which SELECT * degrades production workloads.

Covering indexes become useless. PostgreSQL’s index-only scan resolves a query entirely from the index without touching the heap — but only when every output column is present in the index. SELECT * forces a heap fetch for every matching row regardless, turning a fast index-only scan into a random I/O operation per result.

TOAST columns are fetched unconditionally. PostgreSQL stores values larger than roughly 2KB out-of-line in a secondary TOAST table. A TEXT, JSONB, or BYTEA column that exceeds the threshold is fetched separately when accessed. SELECT * includes every column, so every oversized value triggers a secondary read — even when the application uses only two fields from the row.

Schema changes break application code silently. ORM code that maps SELECT * results onto struct fields may corrupt state when a new NOT NULL column is added or columns are reordered. The query succeeds; the struct carries unexpected data.

Columnar systems lose column pruning. Redshift, BigQuery, and DuckDB store data by column. Their foundational I/O optimization is reading only the columns the query names. SELECT * forces reads across every column in the table, with I/O cost proportional to column count.

What does a query that avoids all four problems look like, and what needs to change at the schema and index layer?

Core Concept

PostgreSQL’s index-only scan allows the executor to return results directly from index pages without visiting heap pages at all. For this to work, every column in the SELECT list and WHERE clause must be present in the index.

flowchart TD
    A[Query execution] --> B{All selected columns in index?}
    B -- Yes --> C[Index-only Scan]
    B -- No — SELECT star used --> D[Fetch full row from heap]
    D --> E{Has out-of-line TOAST columns?}
    E -- Yes --> F[Fetch secondary TOAST pages]
    E -- No --> G[Return heap data]

A query like this can use an index-only scan if an index exists on (email, id, name):

SELECT id, name FROM users WHERE email = 'user@example.com';

Change that to SELECT * and the covering index is bypassed. The executor must fetch the full heap row for every match regardless of index efficiency. The practical guidance from PostgreSQL’s documentation is direct: include output columns in the index using INCLUDE, and name only the columns the query needs. SELECT * makes both impossible because the output column list is unbounded.

For EXPLAIN-based verification, EXPLAIN (ANALYZE, BUFFERS) before and after switching from SELECT * to named columns makes the heap fetch cost visible as the difference in Buffers: shared hit counts. The MySQL EXPLAIN post walks through reading query plans systematically — the same principle applies to PostgreSQL’s EXPLAIN ANALYZE output when comparing index-only scan eligibility.

For vector queries, column selection matters in the same way. A query retrieving pgvector embeddings alongside large JSON metadata columns pays the TOAST cost on every result row when SELECT * is used. Selecting only the embedding and the fields the application reads avoids that fetch entirely. Index setup is only half the battle; column selection determines what gets fetched once the index returns its matches.

In Practice

The documented behavior of PostgreSQL’s index-only scan is that it is unavailable when the query output includes columns not present in the index. The PostgreSQL documentation states this explicitly: every column in the query’s target list and WHERE clause must be available from the index. SELECT * prevents this by construction.

The PostgreSQL TOAST documentation describes out-of-line threshold behavior: values are not fetched unless the column is accessed. This means SELECT id, name FROM users genuinely avoids reading oversized metadata values, while SELECT * fetches them for every row regardless of whether the application uses them.

Google’s BigQuery documentation is explicit under query optimization guidance: selecting only needed columns reduces bytes scanned and therefore cost. The documented design of Redshift and DuckDB follows the same principle — column pruning requires a bounded output list. SELECT * removes that bound entirely.

Where It Breaks

Scenario	What breaks	Why
Covering index bypassed	Index-only scan degrades to heap fetch per row	`SELECT *` requires columns the index cannot contain
TOAST column on every row	Seconds of extra I/O per query execution	Large out-of-line values fetched even when the app discards them
ORM struct mapping	Application reads wrong values after schema migration	Positional mapping breaks when columns are added or reordered
Columnar storage full-scan	Query cost proportional to column count instead of query selectivity	Column pruning requires knowing the output columns at parse time

What to Do Next

Problem: SELECT * bypasses covering indexes, unconditionally fetches TOAST columns, and eliminates column pruning — costs invisible in development, expensive in production.
Solution: Name only the columns the application consumes, and build indexes with INCLUDE to cover the output columns needed on frequent read paths.
Proof: Run EXPLAIN (ANALYZE, BUFFERS) before and after switching from SELECT * to named columns — a drop in shared hit buffer counts confirms the heap fetch is no longer happening.
Action: Audit the top 10 queries by I/O bytes in pg_stat_statements this week and identify which use SELECT * on tables containing TEXT, JSONB, or BYTEA columns.

The rule exists not because of style but because the optimizer needs a bounded column list to make cost decisions. Give the optimizer that list and three of these four problems disappear entirely.

Situation

The Problem

Core Concept

In Practice

Where It Breaks

What to Do Next

Rajiv

Related Posts

Deadlocks vs Blocking: The Difference Engineers Miss

Database Connection Pooling: Why Apps Kill Databases

Redis Memory Eviction Policies Explained