Product Catalog Modeling: Relational, Document, Search Index, or All Three
Product catalogs fail when teams treat “the product” as one data shape instead of three competing workloads: correctness, merchandising flexibility, and discovery.
Situation
A catalog begins innocently. There is a products table, a few categories, a price, a description, and an image URL. Then the business asks for variants, bundles, regional availability, marketplace sellers, promotions, localized copy, regulated attributes, and category-specific fields.
Shoes need size and material. Laptops need CPU, RAM, warranty, and energy labels. Groceries need allergens, pack size, substitution rules, and fulfillment temperature. The product catalog stops being a table of products and becomes the contract between commerce, fulfillment, search, analytics, ads, and customer support.
At that point the database question becomes architectural. A relational model gives integrity and joins. A document model gives shape flexibility. A search index gives retrieval behavior that neither of the first two should be forced to emulate.
The Problem
The common failure is picking one model and making it serve all catalog workloads.
A purely relational catalog often starts clean, then accumulates entity-attribute-value tables, nullable columns, category-specific side tables, and migration anxiety. The schema protects invariants, but product teams wait on DDL for every new attribute family.
A purely document catalog moves faster, but correctness gets harder. If price, availability, tax classification, seller state, and compliance flags live as loosely governed blobs, downstream systems have to rediscover which fields are authoritative.
A search-only catalog feels fast until the index becomes the source of truth. Search indexes are optimized for denormalized retrieval, ranking, tokenization, and filtering. They are not designed to be the system of record for transactional correctness.
The core question is not “which database stores products best?” It is: which parts of the product catalog must be correct, which parts must be flexible, and which parts must be discoverable?
Core Concept
The strongest pattern is usually not relational or document or search. It is relational and document and search, with ownership boundaries that prevent each store from pretending to be the others.
flowchart TD
A[merchant tools — catalog edits] --> B[relational core — identity and invariants]
A --> C[document attributes — category shape]
B --> D[change stream — catalog events]
C --> D
D --> E[index builder — denormalized projection]
E --> F[search index — retrieval and ranking]
F --> G[customer experience — browse and search]
B --> H[commerce services — price and availability checks]
C --> I[content services — product detail pages]
The relational core owns identity and invariants: product ID, SKU, variant relationships, seller ownership, lifecycle state, tax classification references, and other fields where duplication or ambiguity creates operational risk.
The document layer owns attribute shape: category-specific specs, localized content blocks, merchandising metadata, and optional fields that change faster than the canonical model. This can be a document database, a JSON column, or a structured object store. The key is governance: the document is flexible, but not lawless.
The search index owns retrieval: tokenized text, facets, ranking signals, autocomplete fields, synonyms, and denormalized category views. It is rebuilt from upstream truth. It can be tuned aggressively because losing or corrupting it should degrade discovery, not corrupt orders.
This split also clarifies write paths. Merchant edits update the system of record. A change stream or outbox emits catalog events. Index builders create projections for search and browse. Customer-facing product pages can read from a precomputed projection, but checkout-critical decisions still revalidate against authoritative services.
In Practice
Context: PostgreSQL documents two catalog-relevant capabilities that are often combined: relational constraints for integrity and jsonb for semi-structured data, including GIN indexes for querying JSON content. The documented pattern is not “put everything in JSON.” It is that relational and semi-structured fields can coexist when the boundary is deliberate. See the PostgreSQL documentation on JSON types and indexing: https://www.postgresql.org/docs/current/datatype-json.html
Action: Keep product identity, variant hierarchy, lifecycle state, and ownership in relational columns and tables. Put category-specific attributes in governed JSON only when they do not define core transactional identity. Validate those JSON documents with application schema checks or database constraints where appropriate.
Result: The catalog can evolve attribute families without turning every new merchandising idea into a schema migration, while preserving relational guarantees where duplicate or inconsistent state would break commerce.
Learning: JSON inside a relational database is useful when it extends a relational model. It becomes a liability when it replaces the model’s authority.
Context: Elasticsearch describes its core strength as search over indexed documents, including full-text search, filtering, aggregations, and relevance scoring. The documented behavior is projection-oriented: documents are indexed for retrieval, not normalized for source-of-truth integrity. See Elastic’s guide to mapping and search behavior: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
Action: Build the search document as a derived catalog projection. Include names, descriptions, category paths, normalized facets, popularity signals, availability hints, and merchandising boosts. Do not make the search document the final authority for price, inventory, seller eligibility, or compliance.
Result: Search can be tuned for relevance and latency without coupling ranking experiments to transactional correctness. If an index build fails, the recovery path is to replay events or rebuild from source, not manually repair business truth inside the index.
Learning: Search indexes are excellent read models. They are poor systems of record.
Context: MongoDB’s public schema design guidance uses product catalogs as a natural fit for document modeling because products in different categories can carry different attribute sets. The documented pattern is flexible representation for heterogeneous entities, not abandoning data ownership. See MongoDB’s data modeling guidance: https://www.mongodb.com/docs/manual/data-modeling/
Action: Use document modeling for product attributes when category diversity is the main source of change. Keep cross-product invariants explicit: identifiers, references, lifecycle state, and integration contracts should remain stable and validated.
Result: Attribute-heavy catalogs avoid brittle table explosions, but downstream systems still receive predictable contracts.
Learning: Document flexibility pays off when the business changes shape faster than the core identity model changes.
Where It Breaks
| Architecture choice | Works well when | Breaks when | Failure mode |
|---|---|---|---|
| Relational only | Catalog shape is stable and invariants dominate | Category attributes change constantly | EAV tables, nullable sprawl, slow schema evolution |
| Document only | Products are heterogeneous and mostly read as whole objects | Checkout correctness depends on embedded mutable fields | Conflicting truth across services |
| Search index only | The problem is discovery and ranking | The index becomes authoritative | Orders use stale or denormalized data |
| Relational plus document | Core identity is stable but attributes vary | JSON fields are unvalidated | Flexible fields become hidden contracts |
| Relational plus document plus search | Multiple workloads need different read shapes | Eventing and rebuild paths are weak | Index drift, stale projections, unclear ownership |
The combined model has real cost. You now own propagation, idempotency, rebuilds, schema versioning, and observability across stores. The win is not simplicity of implementation. The win is operational clarity.
You should be able to answer these questions during an incident:
- Which store is authoritative for this field?
- Can this projection be rebuilt from upstream state?
- What happens if the search index is ten minutes stale?
- Which fields must be revalidated before checkout?
- Which schema changes require backfills?
- Which consumers are pinned to old document versions?
If those answers are unclear, adding more databases will amplify the failure rather than contain it.
What to Do Next
- Problem: Your catalog probably contains multiple workloads hidden behind one noun: product.
- Solution: Separate the relational core, flexible attribute model, and search projection by ownership and failure behavior.
- Proof: Use relational constraints for invariants, governed documents for heterogeneous attributes, and rebuildable indexes for discovery.
- Action: Audit the top twenty catalog fields by authority, freshness requirement, write owner, read path, and rebuild strategy before changing the storage engine.