Search Indexes in Commerce: Why Elasticsearch Is Not the Source of Truth

The fastest way to corrupt a commerce platform is to let the system that finds products become the system that decides what products are true.

Situation

Commerce teams reach for Elasticsearch because the user experience demands it. Product listing pages need faceted filters. Search boxes need typo tolerance, ranking, synonyms, and language-aware tokenization. Merchandising teams need boosted products, curated collections, and category rules. Buyers expect search to feel instant even when the catalog has millions of SKUs.

A relational database is rarely the right serving layer for that experience. The transactional catalog stores products, variants, prices, inventory policies, category assignments, eligibility rules, and publishing state. Search wants something else: a denormalized document shaped for retrieval. One product document might contain title tokens, normalized attributes, category breadcrumbs, brand fields, popularity scores, availability flags, and precomputed price ranges.
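To make the shape difference concrete, here is a minimal sketch of flattening canonical records into one retrieval document. The `Product` and `Variant` types, the field names, and the `version` counter are hypothetical, chosen for illustration rather than taken from any particular catalog schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    sku: str
    price_cents: int
    in_stock: bool


@dataclass(frozen=True)
class Product:
    product_id: str
    title: str
    brand: str
    category_path: list
    variants: list
    version: int  # hypothetical: increases on every committed catalog change


def build_search_document(product: Product) -> dict:
    """Flatten canonical records into one retrieval-shaped document."""
    prices = [v.price_cents for v in product.variants]
    return {
        "product_id": product.product_id,
        "title": product.title,
        "brand": product.brand.lower(),  # normalized for filtering
        "category_breadcrumbs": " > ".join(product.category_path),
        "price_min_cents": min(prices) if prices else None,
        "price_max_cents": max(prices) if prices else None,
        "available": any(v.in_stock for v in product.variants),
        # Carried along so the index can later reject stale writes.
        "source_version": product.version,
    }
```

Everything in the returned dictionary is computed from canonical inputs, which is what makes the document safe to throw away and rebuild.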

That separation is healthy. The operational mistake is forgetting that the search document is a projection.

Elasticsearch is excellent at serving a read model. It is not the canonical catalog. It is not the pricing ledger. It is not the inventory authority. It is not the publishing workflow. It is a derived index optimized for retrieval, and every derived index can be stale, incomplete, or wrong.

The Problem

Search indexes fail in ways that look harmless until they touch money.

A product rename misses the indexer and customers keep seeing the old title. A price update lands in the transactional database but not in search, so listing pages show one price and checkout shows another. Inventory moves to zero, but cached search results continue to present the item as available. A product is unpublished for legal, compliance, or supplier reasons, but remains discoverable because deletion from the index failed. A backfill overwrites newer documents with older snapshots. A retry duplicates a stale event. A partial outage silently creates a gap.

These are not Elasticsearch bugs. They are boundary bugs.

The root cause is usually architectural ambiguity. If services read from Elasticsearch as though it were authoritative, the index becomes part database, part cache, part workflow state, and part operational hazard. Teams then patch individual symptoms: manual reindex buttons, admin scripts, replay jobs, delete queues, and dashboard alerts. Those are useful tools, but they cannot answer the deeper question.

If the search index is allowed to disagree with the commerce system, which one wins?

Source of Truth, Projection of Search

The answer is to make the ownership boundary explicit: transactional systems own facts; search owns retrieval.

In a commerce platform, facts include product identity, publication state, variant structure, price rules, inventory policy, fulfillment eligibility, and compliance status. These belong in systems that provide transactional semantics, durable writes, validation, and auditability.

Search documents are projections built from those facts. They should be disposable. If the index is deleted, corrupted, or rebuilt with a new schema, the business should lose search availability or freshness for a period, not the catalog itself.

flowchart TD
  A[commerce admin — product edits] --> B[catalog database — canonical product state]
  C[pricing service — canonical price state] --> D[event log — durable change stream]
  E[inventory service — canonical availability state] --> D
  B --> D
  D --> F[indexer workers — build search documents]
  F --> G[elasticsearch — retrieval projection]
  G --> H[storefront search — ranked discovery]
  H --> I[product detail page — confirm canonical state]
  I --> B
  I --> C
  I --> E

This architecture has a simple rule: Elasticsearch can help customers discover candidates, but the transaction path must verify canonical state before showing final commitments or accepting an order.

The product listing page may use Elasticsearch to show searchable results. The product detail page can still hydrate critical fields from canonical services or a separately validated read model. Checkout must never trust search for price, availability, eligibility, or purchasability.
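A sketch of what that verification could look like at the cart or checkout boundary. The `CanonicalOffer` and `CartLine` types and the `lookup` callable are assumptions standing in for whatever pricing and inventory services a platform actually exposes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CanonicalOffer:
    sku: str
    price_cents: int
    available_qty: int
    published: bool


@dataclass(frozen=True)
class CartLine:
    sku: str
    quantity: int
    displayed_price_cents: int  # the price the search result showed


def revalidate_line(
    line: CartLine,
    lookup: Callable[[str], Optional[CanonicalOffer]],
) -> list:
    """Return reasons the line cannot be committed as displayed (empty if OK)."""
    offer = lookup(line.sku)  # hypothetical call into the canonical services
    if offer is None or not offer.published:
        return ["not_purchasable"]
    problems = []
    if offer.price_cents != line.displayed_price_cents:
        problems.append("price_changed")
    if offer.available_qty < line.quantity:
        problems.append("insufficient_inventory")
    return problems
```

The function never consults the search document. The displayed price is only an input to compare against, so a stale index can cause a visible correction but not a wrong charge.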

That does not mean every request has to fan out to every source system. Mature platforms often introduce additional read models, caches, and materialized views. The point is not that only one database may serve reads. The point is that each derived model must have a declared authority boundary, freshness expectation, rebuild path, and conflict policy.
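One lightweight way to make those four declarations explicit is to write them down as data next to the code that builds the projection. The `ProjectionContract` type below is a hypothetical illustration, not an established convention.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProjectionContract:
    name: str
    authority: str               # the system that wins on disagreement
    freshness_budget: timedelta  # how stale the projection is allowed to be
    rebuild_path: str            # how to recreate it from canonical data
    conflict_policy: str         # what consumers do when it disagrees

    def is_within_budget(self, last_applied_event_at: datetime, now: datetime) -> bool:
        """True if the newest applied change is recent enough to serve."""
        return now - last_applied_event_at <= self.freshness_budget


# Example values are illustrative only.
PRODUCT_SEARCH = ProjectionContract(
    name="product-search",
    authority="catalog database",
    freshness_budget=timedelta(minutes=5),
    rebuild_path="replay event log into a new index, then switch alias",
    conflict_policy="canonical state wins; revalidate before commitment",
)
```

The value of the exercise is less the class than the conversation it forces: a derived model with no stated budget or rebuild path is one nobody has agreed to own.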

In Practice

Context: The documented pattern is Command Query Responsibility Segregation: separate the model used to accept writes from the model used to answer reads. In commerce search, the write model is the catalog, pricing, and inventory authority. The query model is the search document.

Action: Treat the search document as a CQRS read model. Build it from committed changes, not from best-effort application side effects. Common implementations use a transactional outbox, change data capture, or a durable event log. The important property is that catalog changes and indexable changes are not split across two unrelated writes where one can commit and the other can disappear.
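A minimal sketch of the transactional outbox variant, using Python's built-in `sqlite3` as a stand-in for the catalog database. The table names, the event payload, and the helper functions are assumptions made for the example.

```python
import json
import sqlite3


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with a products table and an outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "id TEXT PRIMARY KEY, title TEXT NOT NULL, version INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE outbox ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    return conn


def rename_product(conn: sqlite3.Connection, product_id: str, title: str) -> int:
    """Update the product and record the change event in the same transaction."""
    with conn:  # commits both writes together, or rolls both back on error
        cur = conn.execute(
            "UPDATE products SET title = ?, version = version + 1 WHERE id = ?",
            (title, product_id),
        )
        if cur.rowcount == 0:
            raise KeyError(product_id)
        (version,) = conn.execute(
            "SELECT version FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        event = {"type": "product_renamed", "id": product_id,
                 "title": title, "version": version}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))
    return version


def read_outbox_after(conn: sqlite3.Connection, last_seq: int) -> list:
    """Return (seq, event) pairs the indexer has not yet processed, in order."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox WHERE seq > ? ORDER BY seq", (last_seq,)
    )
    return [(seq, json.loads(payload)) for seq, payload in rows]
```

Because the event row lives in the same database as the product row, there is no window in which the rename is committed but the indexable change is lost. The indexer polls `read_outbox_after` with the last sequence number it applied.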

Result: Search becomes operationally recoverable. If an index mapping changes, rebuild from canonical data. If an indexer falls behind, measure lag and drain the queue. If a worker processes the same event twice, idempotent document writes converge on the same result. If a stale event arrives after a newer one, version checks or monotonic sequence numbers prevent regression.
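The duplicate and stale-event cases can be sketched with an in-memory stand-in for the index. `ProjectionIndex` is hypothetical; in Elasticsearch itself, external versioning on the index API (`version_type=external`) is one way to push a similar check into the index.

```python
from typing import Optional


class ProjectionIndex:
    """In-memory stand-in for a search index keyed by document id."""

    def __init__(self) -> None:
        self.docs: dict = {}

    def apply(self, doc_id: str, version: int, doc: Optional[dict]) -> bool:
        """Apply a change only if it is newer than what is stored.

        `doc=None` means the product was unpublished. A tombstone is kept
        so that a late, older upsert cannot resurrect the document.
        Returns True if the index changed.
        """
        current = self.docs.get(doc_id)
        if current is not None and current["source_version"] >= version:
            return False  # duplicate or stale event: ignore
        if doc is None:
            self.docs[doc_id] = {"deleted": True, "source_version": version}
        else:
            self.docs[doc_id] = {**doc, "deleted": False, "source_version": version}
        return True

    def is_searchable(self, doc_id: str) -> bool:
        doc = self.docs.get(doc_id)
        return doc is not None and not doc["deleted"]
```

Applying the same event twice returns `False` the second time, and applying version 1 after version 2 leaves version 2 in place. Delivery order stops mattering as long as every event carries the source version.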

Learning: The indexer is part of the data plane, not a background convenience. It needs replay, dead-letter handling, schema versioning, observability, and backpressure. A search outage is visible; silent search drift is worse.
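Silent drift usually starts as lag nobody is measuring. A minimal sketch of per-partition lag, assuming each partition of the change stream exposes a head sequence number and the indexer records the last one it applied (both dictionaries are hypothetical inputs):

```python
def lag_by_partition(produced: dict, applied: dict) -> dict:
    """Events written to each partition that the indexer has not applied yet."""
    return {
        partition: max(0, head - applied.get(partition, 0))
        for partition, head in produced.items()
    }


def partitions_over_budget(lag: dict, max_events: int) -> list:
    """Partitions whose backlog exceeds the agreed budget, for alerting."""
    return sorted(p for p, backlog in lag.items() if backlog > max_events)
```

A partition missing from `applied` is treated as never processed, so a worker that silently stopped shows up as maximum lag rather than as nothing at all.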

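Elasticsearch’s own behavior reinforces this design. Documents become searchable only after a refresh, which by default happens periodically rather than on every write, so an acknowledged write is not necessarily visible to the next search. A bulk request can succeed as a whole while individual items inside it fail, and the caller has to inspect the per-item results to notice. Distributed systems can retry, reorder, or duplicate work around failures. None of that is surprising; it is exactly why a search index should not be the place where business truth is born.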
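Partial bulk failure is worth handling explicitly. The sketch below sorts the per-item results of a bulk response into outcomes. The response layout (an `items` list in which each entry is keyed by its action and carries `_id` and `status`) follows the Elasticsearch bulk API; the four outcome buckets and the status-code policy are assumptions that a real indexer would tune.

```python
def triage_bulk_response(response: dict) -> dict:
    """Sort bulk items into ok, stale, retry, and dead_letter by status code."""
    outcome = {"ok": [], "stale": [], "retry": [], "dead_letter": []}
    for item in response.get("items", []):
        # Each item has a single key naming the action (index, create, ...).
        action, result = next(iter(item.items()))
        status = result.get("status", 0)
        doc_id = result.get("_id")
        if 200 <= status < 300:
            outcome["ok"].append(doc_id)
        elif action == "delete" and status == 404:
            # Assumption: deleting an already-missing document is fine.
            outcome["ok"].append(doc_id)
        elif status == 409:
            # Assumption: with version checks on, a conflict means the
            # event was stale and can be dropped.
            outcome["stale"].append(doc_id)
        elif status == 429 or status >= 500:
            outcome["retry"].append(doc_id)  # transient: back off and retry
        else:
            outcome["dead_letter"].append(doc_id)  # needs human attention
    return outcome
```

The point is that the HTTP status of the bulk request alone says little. Only the per-item statuses tell the indexer which documents are now missing or stale.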

The known pattern is therefore not “sync database rows into Elasticsearch.” It is “publish durable facts, build disposable projections, and verify money-moving decisions against authority.”

Where It Breaks

Failure mode	What happens	Architecture response
Index lag	Search shows old product data	Expose lag metrics and define freshness budgets
Partial indexing failure	Some products disappear or retain stale fields	Use durable retries, dead-letter queues, and replayable events
Stale overwrite	Older events replace newer documents	Store source version or sequence number in each indexed document
Mapping migration	New search schema cannot read old documents cleanly	Build a new index, backfill, validate counts, then switch alias
Search as checkout input	Customer sees wrong price or availability	Revalidate canonical price and inventory before commitment
Manual index edits	Operators repair symptoms that later get overwritten	Make canonical data the only durable correction path
Product deletion drift	Unpublished items remain searchable	Model publication state explicitly and include deletion events in replay
Backfill overload	Reindexing harms live traffic	Throttle workers and isolate bulk pipelines from interactive search
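
The mapping-migration row can be sketched as a guarded cutover: validate counts first, and only then produce the alias change. The request body matches the shape accepted by the Elasticsearch aliases API (`POST /_aliases`), which applies its actions atomically; the function names, index names, and tolerance parameter are hypothetical.

```python
from typing import Optional


def counts_match(canonical_count: int, indexed_count: int, tolerance: float = 0.0) -> bool:
    """True if the new index holds the expected number of documents.

    `tolerance` is the allowed relative difference, for catalogs that
    keep changing while the backfill runs.
    """
    return abs(canonical_count - indexed_count) <= tolerance * canonical_count


def plan_alias_cutover(
    alias: str,
    old_index: str,
    new_index: str,
    canonical_count: int,
    indexed_count: int,
    tolerance: float = 0.0,
) -> Optional[dict]:
    """Return the alias-swap request body, or None if validation fails."""
    if not counts_match(canonical_count, indexed_count, tolerance):
        return None  # keep serving from the old index
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

Returning `None` instead of raising keeps the old index in service by default, which fits the rule of keeping old indexes until new ones are verified.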

This design has tradeoffs. It adds infrastructure. It introduces eventual consistency. It forces teams to define ownership rather than letting every service read whatever is convenient. But the alternative is worse: a commerce system where the retrieval layer quietly becomes a second catalog with weaker guarantees and unclear accountability.

The hard part is not writing to Elasticsearch. The hard part is proving that what Elasticsearch serves is a faithful, bounded, and rebuildable projection of the commerce facts.

Good platforms make that proof routine. They compare canonical product counts against indexed counts. They sample documents and validate key fields. They track indexing lag by partition and event type. They test reindexing before emergencies. They keep old indexes until new ones are verified. They design search ranking experiments so they cannot mutate canonical product state.
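A sketch of the sampling check, assuming both sides have been loaded into dictionaries keyed by product id. Here `canonical` holds only the products that should be searchable, and the field names are up to the caller.

```python
def detect_drift(canonical: dict, indexed: dict, fields: tuple) -> dict:
    """Compare canonical records against indexed documents.

    Returns ids missing from the index, ids that should not be in the
    index, and per-id lists of fields whose values disagree.
    """
    canonical_ids = set(canonical)
    indexed_ids = set(indexed)
    mismatched = {}
    for doc_id in canonical_ids & indexed_ids:
        diffs = [
            f for f in fields
            if canonical[doc_id].get(f) != indexed[doc_id].get(f)
        ]
        if diffs:
            mismatched[doc_id] = diffs
    return {
        "missing_from_index": sorted(canonical_ids - indexed_ids),
        "orphaned_in_index": sorted(indexed_ids - canonical_ids),
        "mismatched_fields": mismatched,
    }
```

Orphans are the dangerous category: they are the unpublished products that remain discoverable. A report with all three parts empty is the routine proof the paragraph above asks for.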

Most importantly, they keep the user journey honest. Search can rank candidates. Browse can filter projections. Recommendations can suggest products. But product detail, cart, and checkout must converge on the same authoritative answer: is this item sellable, at this price, under these rules, right now?

What to Do Next

Problem: Your search index is probably carrying more authority than intended. Audit every consumer of Elasticsearch and mark which fields are discovery-only versus business-critical.
Solution: Move canonical ownership back to catalog, pricing, inventory, and policy systems. Feed search through durable events, a transactional outbox, or change data capture.
Proof: Add drift detection: indexed count versus canonical count, sampled field comparison, index lag by event stream, failed bulk item rates, and stale version rejection.
Action: Make the index disposable. Practice rebuilding it from source data, switching aliases, replaying missed changes, and validating that checkout never depends on Elasticsearch truth.


Rajiv
