Order Analytics Pipeline: OLTP, CDC, Warehouse, and Reconciliation Checks

Order analytics does not fail because teams cannot count orders. It fails because the count is computed from a pipeline that silently changed the definition of an order.

Situation

The checkout database is built for correctness at transaction time. It knows whether an order was placed, paid, cancelled, refunded, amended, or partially fulfilled. It enforces constraints close to the write path because the business cannot afford ambiguity when money changes hands.

Analytics asks a different question. Product, finance, supply chain, fraud, and support teams want to ask the same order system questions across time: revenue by channel, cancellation rate by cohort, fulfillment latency by warehouse, refunds by payment method, and operational backlog by region. Those questions do not belong on the primary OLTP database. The workload is wide, historical, concurrent, and exploratory.

The usual answer is a pipeline: OLTP database, change data capture, event log, warehouse staging, modeled facts, and dashboards. On paper this looks clean. In production it becomes a distributed accounting system with a reporting interface. Every retry, schema change, late update, duplicate event, backfill, and timezone decision can alter the number an executive sees.

The Problem

The first failure mode is treating CDC as an analytics model. CDC tells you what changed, not what the business means. An orders row updated from pending to paid to cancelled is a sequence of database facts. Whether that contributes to gross merchandise value, net revenue, cancellation rate, or inventory demand is a modeling decision.

The second failure mode is losing the difference between ingestion correctness and reporting correctness. A connector can be healthy while the warehouse is wrong. The stream can be caught up while the model has duplicated a retry. A dashboard can load quickly while excluding orders whose payment settled after the reporting window.

The third failure mode is relying on row-level tests alone. order_id is not null. order_id is unique. status is in an accepted set. Those checks are useful, but they do not prove the warehouse agrees with the source system over a closed financial window.

The core question is: how do you build an order analytics pipeline where freshness is visible, transformations are replayable, and published numbers are blocked when they cannot be reconciled?

Ledgered Analytics Pipeline

The answer is to treat the pipeline as a ledgered system, not a best-effort data feed. The OLTP database remains the source of record. CDC captures committed changes. The warehouse preserves raw changes before applying business logic. Reconciliation jobs compare source-derived control totals with warehouse-derived totals before analytics tables are published.

flowchart TD
  A[checkout service — writes order transaction] --> B[OLTP database — source of record]
  B --> C[CDC connector — reads commit log]
  C --> D[event log — ordered change stream]
  D --> E[staging tables — append only raw changes]
  E --> F[warehouse models — current order facts]
  F --> G[analytics marts — revenue and operations]
  B --> H[control totals — orders and money by window]
  F --> I[warehouse totals — same windows]
  H --> J[reconciliation checks — count and amount diffs]
  I --> J
  J --> K[alerts — block publish on breach]

This architecture has four hard boundaries.

First, the OLTP schema is not the analytics contract. The source tables are optimized for transaction processing. The analytics contract should be explicit: order lifecycle states, revenue inclusion rules, refund treatment, cancellation semantics, currency normalization, and the timestamp used for each metric.

Second, CDC output is immutable input. Land it before reshaping it. Keep source metadata such as transaction position, operation type, event timestamp, and connector timestamp. A warehouse model should be rebuildable from raw change records and deterministic transformation code.

Third, facts need stable identities. An order fact should be keyed by business identity and versioned by source ordering metadata. If the same change is processed twice, the final model should converge. If an older change arrives after a newer one, the merge logic should not regress state.

Fourth, reconciliation is a release gate. A dashboard refresh is a publish event. Before publishing, compare source and warehouse control totals for closed windows: order count, gross amount, cancelled amount, refunded amount, tax, shipping, and discounts. For open windows, report freshness and lag rather than pretending the number is final.

In Practice

Context

The documented pattern is grounded in systems that already behave like logs. PostgreSQL logical decoding exposes committed database changes from the write-ahead log, and a logical replication slot represents a replayable stream of changes in source order for that slot, according to the PostgreSQL logical decoding documentation. Debezium’s PostgreSQL connector documents source metadata such as transaction id and write-ahead log position in change events, which gives downstream systems material to reason about ordering and replay, as described in the Debezium PostgreSQL connector documentation.

LinkedIn’s original Kafka work is also relevant, not because every order pipeline needs Kafka specifically, but because the public design describes a durable log used for both online and offline consumption of event data. The Kafka paper by LinkedIn engineers documents the architectural move from point-to-point feeds toward a shared log for scalable consumption.

Action

Use CDC to copy committed source changes, not to encode business semantics. Land raw changes into append-only warehouse staging with source ordering metadata intact. Build current-state order facts through idempotent merges keyed by order_id and guarded by source version ordering. Build metric marts from those facts, not directly from connector payloads.

Add a separate reconciliation path. For each closed reporting window, compute source control totals from the OLTP database or a source-faithful replica. Compute warehouse totals from the modeled fact tables. Compare counts and money columns with explicit tolerances. If the difference exceeds tolerance, block the publish step and alert the owning team.

Result

The result is not theoretical exactly-once analytics. The result is observable convergence. If the connector replays records, idempotent merges prevent double counting. If a model change breaks revenue logic, aggregate reconciliation catches the mismatch even when row-level tests pass. If CDC lags, the freshness signal explains why open-window dashboards are incomplete.

This is derived from documented system behavior: PostgreSQL emits committed changes through logical decoding, Debezium carries source position metadata, Kafka-style logs support independent consumers, and warehouse validation frameworks such as Great Expectations include aggregate checks like table row count expectations in their expectations documentation.

Learning

CDC is transport. The warehouse model is interpretation. Reconciliation is evidence. Treating those as separate concerns makes the system easier to operate because each failure has a specific owner and a specific diagnostic path.

When finance says revenue is wrong, the first question should not be whether the dashboard query changed. It should be which invariant failed: source extraction, raw landing, merge ordering, business classification, or aggregate reconciliation.

Where It Breaks

Failure mode	Why it happens	Mitigation
Duplicate orders	Connector retry or warehouse task retry reprocesses the same change	Merge by business key and source position, not by load timestamp
Missing late updates	Dashboard window closes before payment, cancellation, or refund arrives	Separate event time, processing time, and closed financial period
Schema drift	OLTP column changes before warehouse model is updated	Version raw payloads and fail loudly on unknown required fields
Incorrect revenue	Analytics model treats all paid orders as final revenue	Encode gross, net, cancelled, refunded, and recognized revenue separately
Silent CDC lag	Connector is running but behind the source log	Track source position lag and expose freshness per table
False confidence	Row tests pass while aggregates drift	Add reconciliation checks for counts and money by closed window
Expensive backfills	Raw changes were overwritten by current-state tables	Keep append-only staging long enough to replay critical periods
Cross-table inconsistency	Orders, payments, and refunds arrive at different times	Model lifecycle state from all required entities before publishing marts

What to Do Next

Problem: Your dashboard is only as trustworthy as the weakest unverified step between checkout and the warehouse.
Solution: Build a ledgered pipeline: OLTP as source of record, CDC as committed change transport, append-only raw staging, deterministic warehouse facts, and reconciliation gates.
Proof: Require every published order metric to pass source-to-warehouse checks for closed windows, including count and money totals.
Action: Start with one metric that matters, usually daily net revenue. Define its source query, warehouse query, tolerance, owner, alert, and publish-blocking behavior before expanding the pattern.

Situation

The Problem

Ledgered Analytics Pipeline

In Practice

Context

Action

Result

Learning

Where It Breaks

What to Do Next

Rajiv

Related Posts

The Staff Engineer's System Design Review: Questions That Expose Real Risk

Designing for Peak Traffic Without Designing for Permanent Waste

Building a Commerce Platform Data Plane: OLTP, Search, Cache, Queue, Warehouse