Caches Do Not Remove Database Load Unless You Design the Miss Path

A cache is not a shield around the database; it is a second traffic control system whose failure mode is often a synchronized stampede back to the database.

Situation

Most production systems add caching after the database becomes visibly expensive. Read latency climbs, connection pools saturate, replica lag grows, and product teams discover that many requests ask for the same objects repeatedly. The obvious response is to place Redis, Memcached, CDN edge storage, or an application-local cache in front of the hot read path.

That response is directionally correct. Caches reduce repeated work when the same value is requested many times within a useful freshness window. They also change the shape of the system. The database is no longer serving every read, but it is now serving cache misses, cache refreshes, cold starts, evictions, invalidations, and retry storms.

The first architecture review usually asks whether the cache hit rate is high enough. The better review asks what happens when the hit rate suddenly drops.

The Problem

A cache hit is the easy path. The hard path begins when the value is missing, stale, evicted, expired, invalidated, or never warmed.

If every application instance handles a miss by immediately querying the database, the cache has only moved the load problem. Under normal traffic, a 95 percent hit rate may look excellent, but the database then sees only 5 percent of reads, so a drop to 80 percent quadruples its load with no change in user demand. Under correlated expiration, deployment cold start, regional failover, or key eviction, that same system can convert thousands of concurrent user requests into thousands of identical database queries.

This is why cache-aside implementations often fail under precisely the conditions where the database most needs protection. The cache removes load only when it is warm and healthy. The miss path decides what happens when it is not.

The core question is not, “Should we cache this?” The core question is, “Who is allowed to miss, how fast may they miss, and what happens while the value is being recovered?”

The Answer Is a Governed Miss Path

A resilient cache architecture treats misses as a controlled workflow, not as an exception buried inside a request handler.

flowchart TD
  A[client request] --> B[application read path]
  B --> C{cache lookup}
  C -->|hit| D[return cached value]
  C -->|miss| E[miss coordinator]
  E --> F{refresh already running}
  F -->|yes| G[wait briefly or serve stale value]
  F -->|no| H[acquire refresh lease]
  H --> I[load from database with budget]
  I --> J[write cache with jittered ttl]
  J --> K[return fresh value]
  I -->|budget exhausted| L[serve stale value or fail closed]
  E --> M[miss metrics and admission control]
  M --> N[rate limits and circuit breakers]

The important component is not the cache. It is the miss coordinator.

At minimum, that coordinator should provide request coalescing, so that concurrent misses on one key become one database read rather than one read per caller. It should enforce a per-key refresh lease so that only one worker repopulates a hot key at a time. It should use bounded wait times so callers do not pile up indefinitely behind a slow database query. It should support stale serving for values where slightly old data is better than taking the system down. It should apply jitter to expirations so hot keys do not all expire in the same second.
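A minimal in-process sketch of that coordinator may make the responsibilities concrete. It assumes a hypothetical `loader` callable standing in for the database read, and the names `MissCoordinator`, `MissUnavailable`, `ttl_s`, and `max_wait_s` are illustrative rather than taken from any library. It coalesces misses only within one process and never evicts stale entries; coordinating across instances would need a lease held in the shared cache.

```python
import threading
import time


class MissUnavailable(Exception):
    """No fresh or stale value could be returned within the wait budget."""


class MissCoordinator:
    """In-process request coalescing with bounded waits and stale fallback."""

    def __init__(self, loader, ttl_s=60.0, max_wait_s=0.2):
        self._loader = loader        # hypothetical callable: key -> value (the database read)
        self._ttl_s = ttl_s
        self._max_wait_s = max_wait_s
        self._lock = threading.Lock()
        self._values = {}            # key -> (value, expires_at)
        self._inflight = {}          # key -> Event set when the refresh ends

    def get(self, key):
        with self._lock:
            entry = self._values.get(key)
            if entry is not None and entry[1] > time.monotonic():
                return entry[0]                      # fresh hit
            done = self._inflight.get(key)
            is_leader = done is None
            if is_leader:                            # this caller takes the refresh lease
                done = threading.Event()
                self._inflight[key] = done

        if is_leader:
            try:
                value = self._loader(key)            # the only database read for this key
                with self._lock:
                    self._values[key] = (value, time.monotonic() + self._ttl_s)
                return value
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
                done.set()                           # release waiting followers

        done.wait(self._max_wait_s)                  # follower: bounded wait, no database call
        with self._lock:
            entry = self._values.get(key)
        if entry is not None:
            return entry[0]                          # fresh if the leader finished, else stale
        raise MissUnavailable(key)
```

The design choice worth noticing is that followers never call the loader. They either receive the leader's result, fall back to a stale value, or fail quickly, which keeps the database read count per key at one regardless of how many callers miss together.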

The database call itself needs a budget. A miss should not receive unlimited retries simply because the cache missed. Retries on the miss path multiply load exactly when the database is already exposed. Prefer short deadlines, limited attempts, and explicit fallback behavior.
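One way to express such a budget is a wrapper that caps both total time and attempt count. This is a sketch under assumptions: `query` is a hypothetical callable that accepts a timeout in seconds and honours it, and the default numbers are placeholders, not recommendations.

```python
import time


class BudgetExhausted(Exception):
    """The miss path ran out of time or attempts; the caller should fall back."""


def load_with_budget(query, deadline_s=0.25, max_attempts=2, backoff_s=0.02):
    """Call query(timeout_s) at most max_attempts times within deadline_s overall."""
    start = time.monotonic()
    last_error = None
    for attempt in range(max_attempts):
        remaining = deadline_s - (time.monotonic() - start)
        if remaining <= 0:
            break                                    # deadline spent: stop retrying
        try:
            return query(remaining)                  # each attempt gets only the time left
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
        if attempt + 1 < max_attempts:
            remaining = deadline_s - (time.monotonic() - start)
            time.sleep(max(0.0, min(backoff_s, remaining)))  # never sleep past the deadline
    raise BudgetExhausted("miss-path budget exhausted") from last_error
```

A caller would catch `BudgetExhausted` and apply the explicit fallback for that key class, such as serving a stale value or returning an error, instead of retrying again at a higher layer.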

This also means cache keys require ownership. A key is not just a string. It has a freshness contract, a rebuild cost, an invalidation source, and a blast radius. Keys that are cheap to rebuild can expire aggressively. Keys that are expensive to rebuild need warming, stale reads, or asynchronous refresh.
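Ownership is easier to enforce when the contract is written down as data. The sketch below is one possible shape; the field names, the `OnMiss` options, and the 50 ms threshold in `refresh_plan` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class OnMiss(Enum):
    LOAD_INLINE = "load_inline"      # caller waits for the database read
    SERVE_STALE = "serve_stale"      # return the old value while a refresh runs
    FAIL_CLOSED = "fail_closed"      # return an error instead of hitting the database


@dataclass(frozen=True)
class KeyPolicy:
    key_class: str                   # e.g. "profile_count"
    owner: str                       # team accountable for this key class
    max_staleness_s: float           # freshness contract
    rebuild_cost_ms: float           # typical cost of one database rebuild
    invalidated_by: Tuple[str, ...]  # events that invalidate the value
    on_miss: OnMiss                  # behaviour when the value is unavailable


def refresh_plan(policy, expensive_ms=50.0):
    """Cheap keys may expire aggressively; expensive keys need protection."""
    if policy.rebuild_cost_ms < expensive_ms:
        return "short TTL, reload inline"
    return "warm ahead of time, serve stale, refresh asynchronously"


flag = KeyPolicy("feature_flag", "platform", 5.0, 2.0, ("flag_write",), OnMiss.LOAD_INLINE)
feed = KeyPolicy("home_feed", "feed-team", 30.0, 400.0, ("post_created",), OnMiss.SERVE_STALE)
print(refresh_plan(flag))
print(refresh_plan(feed))
```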

In Practice

Context. Facebook’s published Memcache architecture (the NSDI 2013 paper “Scaling Memcache at Facebook”) treats the cache as a distributed system with operational problems around consistency, thundering herds, regional topology, and invalidation. The documented pattern is that large-scale caching requires coordination around misses and invalidations, not merely inserting Memcached between application servers and storage.

Action. The Facebook Memcache design uses leases to address two problems: stale sets and thundering herds. On a miss, the cache hands the client a lease token that grants permission to compute and fill the missing value. Other clients that miss on the same key while the lease is outstanding are told to retry shortly instead of independently regenerating the same object.
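The lease idea can be illustrated with a small single-threaded model. This is a simplified sketch, not Facebook's implementation: the class `LeaseCache`, its method names, the `(value, token)` return shape, and the 10-second window are assumptions made for the example.

```python
import itertools
import time


class LeaseCache:
    """Toy cache that issues one refill lease per missing key at a time."""

    def __init__(self, lease_window_s=10.0, clock=time.monotonic):
        self._values = {}
        self._leases = {}                    # key -> (token, issued_at)
        self._tokens = itertools.count(1)
        self._window_s = lease_window_s
        self._clock = clock

    def get(self, key):
        """Return (value, token); a token is given only to the caller allowed to refill."""
        if key in self._values:
            return self._values[key], None
        now = self._clock()
        lease = self._leases.get(key)
        if lease is not None and now - lease[1] < self._window_s:
            return None, None                # lease outstanding: back off and retry
        token = next(self._tokens)
        self._leases[key] = (token, now)
        return None, token

    def set(self, key, value, token):
        """Store the value only if the caller still holds the current lease."""
        lease = self._leases.get(key)
        if lease is None or lease[0] != token:
            return False                     # lease invalidated: reject a possibly stale set
        del self._leases[key]
        self._values[key] = value
        return True

    def delete(self, key):
        """Invalidate the value and any outstanding lease for the key."""
        self._values.pop(key, None)
        self._leases.pop(key, None)


cache = LeaseCache()
_, first = cache.get("user:1")               # first miss receives a lease token
_, second = cache.get("user:1")              # concurrent miss receives none
cache.delete("user:1")                       # a write invalidates the key mid-refill
print(cache.set("user:1", "old row", first)) # False: the stale set is rejected
```

The two behaviours map onto the two problems named above: withholding tokens limits concurrent regeneration, and rejecting a set whose lease was invalidated prevents a slow reader from overwriting newer data with an old value.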

Result. The documented result is a cache layer that can absorb high read traffic while reducing redundant backend work. The key lesson is not that Memcache is special. The lesson is that the miss path is part of the cache protocol.

Learning. The architectural pattern is request coalescing with ownership of regeneration. Without that ownership, every caller treats itself as responsible for recovery, and the database becomes the coordination mechanism by accident.

A second documented pattern appears in Amazon’s public guidance on caching and service resilience. The Amazon Builders’ Library discusses cache behavior in terms of timeouts, retries, overload, and dependency protection. The relevant lesson is that retries and cache refreshes must be limited by budgets, because uncontrolled recovery traffic can become worse than the original user traffic.

PostgreSQL illustrates the same point at the storage layer. Its shared buffer cache speeds repeated access to pages already in memory, but a buffer miss still becomes a read from the operating system page cache or from disk. If many sessions miss on the same expensive query shape, PostgreSQL does not make that application-level work disappear. The documented behavior is that caching changes where repeated reads are served from; it does not eliminate the need to control concurrency, query cost, or admission.

The pattern across these systems is consistent: caching is effective when the recovery path is engineered. A cache without miss governance is a performance optimization during calm periods and a load amplifier during incidents.

Where It Breaks

| Failure mode | What happens | Design response |
| --- | --- | --- |
| Cold start | New instances have empty local caches and all query the database | Warm critical keys and use shared cache before local cache |
| Correlated expiration | Many hot keys expire together | Add TTL jitter and refresh before expiry |
| Hot key miss | One popular key triggers many identical database reads | Use per-key leases and request coalescing |
| Cache outage | All traffic bypasses cache at once | Add database rate limits and fail closed for noncritical reads |
| Slow database recovery | Misses wait, retry, and consume application threads | Use short deadlines and bounded retry budgets |
| Over-broad invalidation | One write invalidates too much cached data | Use precise keys and versioned invalidation |
| Silent cache bloat | Low-value keys evict high-value keys | Add admission control and track hit rate by key class |
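The TTL jitter response to correlated expiration is small enough to show in full. This is a sketch with assumed names and numbers: `jittered_ttl`, the 10 percent spread, and the 300-second base TTL are illustrative choices.

```python
import random


def jittered_ttl(base_ttl_s, spread=0.1, rng=random):
    """Scale the base TTL by a random factor in [1 - spread, 1 + spread]."""
    if not 0 <= spread < 1:
        raise ValueError("spread must be in [0, 1)")
    return base_ttl_s * rng.uniform(1 - spread, 1 + spread)


# 1,000 hot keys written at the same moment no longer expire in the same second.
rng = random.Random(0)                       # seeded only to make the example repeatable
ttls = [jittered_ttl(300, spread=0.1, rng=rng) for _ in range(1000)]
print(f"expirations spread from {min(ttls):.0f}s to {max(ttls):.0f}s")
```

With a 10 percent spread on a 300-second TTL, expirations land across roughly a 60-second window instead of a single instant. The cost is that some keys live slightly longer than the nominal TTL, so the spread has to fit inside the key's freshness contract.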

The uncomfortable tradeoff is that a safer miss path sometimes returns stale data or partial results. That is often the right choice. For many product surfaces, a profile count that is thirty seconds old is better than a database outage caused by thousands of simultaneous refreshes.

The other tradeoff is complexity. A governed miss path adds leases, metrics, deadlines, fallback rules, and operational runbooks. But that complexity already exists in the system. If it is not explicit in the cache layer, it is implicit in the database, the connection pool, and the incident channel.

What to Do Next

Problem: Measure misses as first-class production events, not as the inverse of hit rate. Break them down by key class, caller, latency, database query, and retry count.
Solution: Put a miss coordinator in the read path. Start with per-key request coalescing, refresh leases, TTL jitter, and stale serving for safe data classes.
Proof: Load test cold cache, hot key expiration, cache outage, and database slowdown. The database query rate during each test is the real measure of cache design quality.
Action: Pick the ten most expensive cached objects in the system and write down their freshness contract, rebuild cost, invalidation source, and failure behavior. If those answers are unclear, the cache is not yet protecting the database.
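As a starting point for the measurement step, misses can be recorded as structured events rather than derived from hit rate. The sketch below keeps everything in memory; `MissEvent`, `MissLog`, and the reason labels are hypothetical names, and a real system would emit these to its existing metrics pipeline.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MissEvent:
    key_class: str       # e.g. "profile_count"
    caller: str          # service or endpoint that missed
    reason: str          # e.g. "expired", "evicted", "cold"
    load_ms: float       # time spent rebuilding from the database
    retries: int         # retries spent on the miss path


class MissLog:
    """Aggregates misses by (key_class, reason)."""

    def __init__(self):
        self._count = Counter()
        self._load_ms = defaultdict(float)
        self._retries = Counter()

    def record(self, event):
        group = (event.key_class, event.reason)
        self._count[group] += 1
        self._load_ms[group] += event.load_ms
        self._retries[group] += event.retries

    def summary(self):
        return {
            group: {
                "misses": n,
                "avg_load_ms": self._load_ms[group] / n,
                "retries": self._retries[group],
            }
            for group, n in self._count.items()
        }


log = MissLog()
log.record(MissEvent("profile_count", "web", "expired", 40.0, 0))
log.record(MissEvent("profile_count", "web", "expired", 60.0, 1))
log.record(MissEvent("home_feed", "api", "cold", 400.0, 2))
print(log.summary())
```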

Rajiv
