MVCC Explained Like a Database Engineer

Most engineers know that MVCC means “readers don’t block writers.” What they miss is the operational consequence: those non-blocking reads are paid for with storage, and if you stop collecting the debt, the database starts degrading in ways that look nothing like a concurrency problem.

Situation

MVCC — Multi-Version Concurrency Control — is the concurrency model used by PostgreSQL, MySQL InnoDB, Oracle, CockroachDB, and most other production-grade relational databases. Inside a transaction, the database does not show you the current physical state of the rows; it shows a consistent snapshot as it existed at the moment your transaction started.

Engineers rely on this without thinking about it. The property they care about — “I can run a long analytical query on a busy OLTP table without blocking inserts” — comes directly from MVCC. But few have thought through what has to be true at the storage level for that property to hold.

The Problem

The concrete failure mode is table bloat in PostgreSQL after a heavy UPDATE or DELETE workload. Engineers see a table that is 40 GB on disk with only 8 GB of live data and conclude something is wrong with storage. The actual cause is MVCC: every UPDATE leaves the old version in place; every DELETE marks the row dead without removing it. Old versions accumulate until VACUUM reclaims them.

The less visible failure is more dangerous: a long-running read transaction — a reporting query left open, a replication slot that fell behind — prevents VACUUM from advancing. PostgreSQL can eventually hit transaction ID wraparound, an emergency that takes the cluster offline.

Where is the cost of “free” snapshot isolation actually hidden?

How MVCC Works

When a transaction writes a row, the database does not overwrite the existing bytes. It writes a new version stamped with the writer’s transaction ID, leaving the old version in place. Concurrent readers see the version that was current at transaction start. Snapshot isolation without locking — but two systems store those versions very differently, and the difference shapes every operational concern that follows.

PostgreSQL stores all versions — live and dead — directly in the heap files alongside current rows. UPDATE leaves the old version in the page; DELETE flags it dead but does not remove it. VACUUM (or AUTOVACUUM) scans the heap and marks dead tuples as reclaimable. It cannot advance past any row version that is still visible to an open transaction.

You can inspect the version metadata directly. xmin is the transaction ID that created the row; xmax is the transaction ID that deleted or updated it (0 if the row is live). ctid is the physical location in the heap file:

-- Inspect row versions in PostgreSQL
SELECT xmin, xmax, ctid, id
FROM your_table
LIMIT 10;

After a series of updates, you will see multiple heap entries for the same logical row — old versions with non-zero xmax, new versions with xmax = 0. These are the dead tuples VACUUM is responsible for reclaiming.

MySQL InnoDB keeps only the current version in the clustered index. Old versions go to the undo log; when a reader needs an older snapshot, InnoDB reconstructs it by applying undo entries in reverse. A background purge thread reclaims undo space once no active transaction needs those versions. The same pressure applies: long-running reads block the purge thread.

Oracle uses a dedicated undo tablespace. The undo_retention parameter sets a fixed consistency window — simpler cleanup at the cost of a hard expiry (ORA-01555: snapshot too old).

Database	Where old versions live	Cleanup mechanism	Risk when cleanup stalls
PostgreSQL	Heap files (table data)	VACUUM — explicit or autovacuum	Table bloat, transaction ID wraparound
MySQL InnoDB	Undo log segments	Background purge thread	Undo log growth, purge lag
Oracle	Undo tablespace	Automatic undo management	ORA-01555 snapshot too old

In Practice

PostgreSQL’s MVCC documentation (chapter 13, “Concurrency Control”) states directly that dead tuples are not reclaimed until VACUUM runs, and that VACUUM cannot remove a dead tuple if any transaction older than that tuple is still open — the documented mechanism behind bloat from long-running transactions.

MySQL’s InnoDB documentation (“InnoDB Multi-Versioning”) states that the purge thread deletes undo log records no longer needed by any consistent read, and that history list length — in SHOW ENGINE INNODB STATUS — grows when the purge thread falls behind.

Where It Breaks

Scenario	What breaks	Why
Long-running read in PostgreSQL	Table bloat; VACUUM cannot advance past the open snapshot	PostgreSQL keeps every row version visible to any active transaction
Long-running read in MySQL InnoDB	Undo log grows; purge thread stalls	Purge thread cannot remove records still needed by open transactions
Transaction ID wraparound in PostgreSQL	Cluster enters emergency read-only mode	32-bit XID wraps after ~2 billion transactions; VACUUM must freeze rows before the counter laps

What to Do Next

Problem: Long-running transactions block VACUUM and the InnoDB purge thread, causing table bloat and undo log growth that degrades the database without any concurrency alarm firing.
Solution: Set idle_in_transaction_session_timeout in PostgreSQL; monitor InnoDB history list length in SHOW ENGINE INNODB STATUS.
Proof: In PostgreSQL, pg_stat_activity shows open transactions with state = 'idle in transaction'; in InnoDB, a rising history list length during write traffic is the direct signal.
Action: Run this query on your PostgreSQL instances this week to surface any sessions holding open transactions without actively executing:

SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY duration DESC;

MVCC teaches the same lesson as most database internals: reads that look free are paid for somewhere. Knowing where is what lets you diagnose degradation instead of just observing it.

Situation

The Problem

How MVCC Works

In Practice

Where It Breaks

What to Do Next

Rajiv

Related Posts

CPU vs GPU vs TPU Explained for Database Engineers

CAP Theorem in Operational Terms

Caches, Queues, and Databases: When to Use Each