Pub/Sub Ordering Keys: The Detail That Decides Your Event Model

Ordering is not a checkbox on a queue. It is the boundary where your event model admits which facts must move together, which facts can move independently, and which failures are allowed to stall the system.

Situation

Teams usually adopt Pub/Sub because they want distance between producers and consumers. Orders, payments, inventory reservations, invoices, model updates, and notification workflows all become events. The topic becomes a shared integration surface instead of a direct call graph.

That move works until the business starts depending on sequence. A customer profile must not apply email_changed before customer_created. A payment projection must not see captured before authorized. A search index must not publish version 42 and then overwrite it with version 41. These are not messaging problems in isolation; they are state reconstruction problems.

Google Cloud Pub/Sub gives you ordering keys for this exact class of issue. The documented guarantee is scoped: messages with the same ordering key can be delivered in order when message ordering is enabled on the subscription, while messages with different keys have no expected order. The publisher guidance also says the guarantee applies when publishes for a key happen in the same region and notes that multiple publishers using the same key may need coordination if they require strict publishing order. See the Pub/Sub ordering documentation and publisher guidance.

That sounds small. It is not. The choice of ordering key becomes the event model.

The Problem

The common failure is choosing an ordering key that reflects today’s handler instead of tomorrow’s invariant.

If you key by customer_id, every event for that customer is serialized. That is easy to reason about, but one slow workflow for that customer can build a local backlog behind it. If you key by order_id, order processing scales better, but customer-level projections must tolerate interleaving across orders. If you key by aggregate type, you have probably built a global bottleneck with better branding.

The failure mode is subtle because the system works under normal load. Then one message fails, an acknowledgment deadline expires, a subscriber restart shifts affinity, or a hot key receives a burst. Pub/Sub documents that redelivery of a message can trigger redelivery of subsequent messages for the same ordering key, even messages already acknowledged. It also documents that push subscriptions allow only one outstanding message per ordering key, which makes hot keys especially visible.

So the question is not “should we enable ordering?”

The question is: what is the smallest domain boundary inside which reordering would corrupt meaning?

The Ordering Key Boundary

An ordering key should name the consistency boundary of a stream, not the routing preference of a worker. Treat it as the unit of replay, delay, redelivery, and operational blame.

flowchart TD
  A[producer — domain event] --> B[choose ordering boundary]
  B --> C[customer stream — customer facts]
  B --> D[order stream — order facts]
  B --> E[inventory stream — sku facts]
  C --> F[ordered subscription — customer projection]
  D --> G[ordered subscription — fulfillment workflow]
  E --> H[ordered subscription — stock ledger]
  F --> I[idempotent handler — version check]
  G --> I
  H --> I
  I --> J[materialized state — replayable]

The diagram hides an important rule: the ordering key is not a database lock. It does not make two independent aggregates globally consistent. It only gives consumers an ordered lane for messages that share the key. If the invariant crosses keys, the architecture needs a second mechanism: a transaction before publishing, a saga coordinator, a projection that can reconcile late facts, or a durable workflow engine.
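To make the "ordered lane" idea concrete, here is a small pure-Python simulation. It does not call Pub/Sub at all; `interleave_lanes` and `per_key_view` are hypothetical helpers, and the random interleaving only stands in for the fact that no order is expected across keys.

```python
import random
from collections import defaultdict


def interleave_lanes(lanes, seed=0):
    """Merge per-key lanes into one delivery sequence.

    Order inside each lane is preserved; order across lanes is arbitrary.
    This models the scope of the guarantee, not Pub/Sub's real scheduling.
    """
    rng = random.Random(seed)
    # Track the next undelivered position for every non-empty lane.
    cursors = {key: 0 for key, events in lanes.items() if events}
    delivered = []
    while cursors:
        key = rng.choice(sorted(cursors))
        delivered.append((key, lanes[key][cursors[key]]))
        cursors[key] += 1
        if cursors[key] == len(lanes[key]):
            del cursors[key]
    return delivered


def per_key_view(delivered):
    """Regroup a delivery sequence by ordering key."""
    view = defaultdict(list)
    for key, event in delivered:
        view[key].append(event)
    return dict(view)


lanes = {
    "order-1": ["created", "authorized", "captured"],
    "order-2": ["created", "cancelled"],
}
delivered = interleave_lanes(lanes, seed=7)
# Each key still sees its own events in publish order.
assert per_key_view(delivered) == lanes
```

Whatever seed is used, the per-key view matches the original lanes, while the relative position of order-1 and order-2 events changes from run to run. Any invariant that depends on that relative position is a cross-key invariant and needs one of the mechanisms listed above.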

A good ordering key has three properties.

First, it maps to a real domain invariant. order_id is good when the only invalid sequence is inside one order. tenant_id is dangerous when tenants vary wildly in traffic. event_type is almost always wrong because it groups unrelated entities while separating related facts.

Second, it has enough cardinality to distribute work. Pub/Sub explicitly says ordering keys are not equivalent to partitions and are expected to have much higher cardinality than partition-based systems. That is a design hint: do not import Kafka partition thinking directly. Kafka’s documentation describes a partition as an ordered append-only sequence and says total order exists within a partition, not across partitions. Pub/Sub ordering keys let you express many more logical lanes without predeclaring a fixed partition count. See the Apache Kafka introduction.

Third, it makes failure containment acceptable. If a bad message blocks subsequent messages for the same key, is that the right blast radius? If the answer is no, the key is too broad or the handler is doing work that belongs behind another queue.
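One way to keep the first property honest is to derive the key from aggregate identity in a single place, so that no producer can quietly key by event type. The sketch below is illustrative only: `DomainEvent`, its fields, and the `aggregate:id` key format are assumptions, not part of any Pub/Sub API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainEvent:
    # Hypothetical event envelope; field names are assumptions.
    event_type: str    # e.g. "order_captured"
    aggregate: str     # e.g. "order", "customer"
    aggregate_id: str  # identity of the aggregate that owns the transition


def ordering_key(event: DomainEvent) -> str:
    """Build the ordering key from the owning aggregate, never the event type."""
    if not event.aggregate or not event.aggregate_id:
        # An empty ordering key means unordered delivery, so fail loudly.
        raise ValueError("event has no aggregate identity to order by")
    return f"{event.aggregate}:{event.aggregate_id}"


authorized = DomainEvent("order_authorized", "order", "o-1001")
captured = DomainEvent("order_captured", "order", "o-1001")
other_order = DomainEvent("order_captured", "order", "o-2002")

# Related facts share a lane; unrelated entities do not.
assert ordering_key(authorized) == ordering_key(captured) == "order:o-1001"
assert ordering_key(captured) != ordering_key(other_order)
```

Because the function ignores `event_type`, two different transitions on the same order land in the same lane, and the same transition on two different orders does not.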

In Practice

Context: Google Cloud documents that ordered delivery depends on publishing related messages with the same ordering key, enabling ordering on the subscription, and keeping publishes for a key in the same region. It also documents that empty ordering keys are unordered and that ordering is preserved per subscription, not magically across every consumer view.

Action: Model the key from the aggregate that owns the transition. For an order lifecycle, use order_id. For a customer profile projection, use customer_id. For a ledger, use the account or ledger stream identifier. Then make the handler idempotent with an event id and, when possible, a monotonic version. Ordering reduces the number of states the handler must tolerate; it does not remove retries, duplicate delivery, or replay.
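A minimal sketch of the event-id half of that advice, assuming a ledger whose events carry deltas, so applying a redelivered message twice would corrupt the balance. `LedgerProjection` and its fields are hypothetical, and the in-memory set is a stand-in: in a real system the processed-id record and the state change would need to be committed together in a durable store.

```python
from dataclasses import dataclass, field


@dataclass
class LedgerProjection:
    # In-memory stand-ins for durable storage (an assumption of this sketch).
    balances: dict = field(default_factory=dict)
    seen_event_ids: set = field(default_factory=set)

    def handle(self, event_id: str, account_id: str, delta_cents: int) -> bool:
        """Apply an event once; return False for a redelivered duplicate."""
        if event_id in self.seen_event_ids:
            return False
        self.balances[account_id] = self.balances.get(account_id, 0) + delta_cents
        self.seen_event_ids.add(event_id)
        return True


projection = LedgerProjection()
first_pass = [("e1", "acct-1", 500), ("e2", "acct-1", -200), ("e3", "acct-1", 100)]
# Redelivery of e2 also brings back e3, even if e3 was already acknowledged.
redelivered = [("e2", "acct-1", -200), ("e3", "acct-1", 100)]

results = [projection.handle(*event) for event in first_pass + redelivered]
assert results == [True, True, True, False, False]
assert projection.balances["acct-1"] == 400
```

The handler acknowledges duplicates without reapplying them, which is what lets redelivery stay an operational event rather than a correctness incident.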

Result: The documented pattern is a set of independent ordered lanes. A failure in order A does not require pausing order B. A customer projection can rebuild one customer’s state without demanding global topic order. Subscriber concurrency scales with key cardinality, while correctness remains local to the domain boundary.

Learning: Ordering keys are a schema decision. They belong in design review with aggregate boundaries, idempotency rules, dead-letter policy, and regional publishing topology. If the key is changed later, consumers may need to rebuild state because the event stream’s ordering semantics changed underneath them.

Where It Breaks

Failure mode	Why it happens	Design response
Hot key backlog	One key receives disproportionate traffic, and callback work for that key must complete in order	Narrow the key, split the aggregate, or move expensive side effects behind another asynchronous step
Cross-key invariant	Two streams need a single ordered truth, but Pub/Sub only orders within one key	Use a transactional source of truth, saga coordination, or reconciliation logic
Multi-region publishers	Publishes for the same key enter Pub/Sub through different regions	Pin publishers for ordered streams to a locational endpoint or add publisher coordination
Redelivery surprise	A failed or expired acknowledgment can cause later messages for the same key to be redelivered	Make handlers idempotent and track processed event ids or versions
Dead-letter ambiguity	Dead-letter forwarding is best effort and may not preserve the same ordering assumptions	Treat dead-letter topics as repair queues, not as ordered continuations of the main stream
Push subscription latency	Push allows only one outstanding message per ordering key	Prefer pull or streaming pull for high-volume ordered streams

The hardest case is not technical; it is semantic. Product teams often ask for “events in order” when they mean “state must never go backwards.” Those are different requirements. Ordered delivery helps with the first. The second needs version checks at the write boundary.
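The difference shows up clearly in a small experiment. The sketch below assumes each event carries the full document state plus a monotonic version (it would not work for delta-style events), and `guarded_write` is a hypothetical helper over a plain dict rather than any real index client.

```python
from itertools import permutations


def guarded_write(store, doc_id, version, body):
    """Write only if the incoming version is newer than what is stored."""
    current = store.get(doc_id)
    if current is not None and current[0] >= version:
        return False  # stale or duplicate: state must not go backwards
    store[doc_id] = (version, body)
    return True


updates = [(40, "draft"), (41, "reviewed"), (42, "published")]

final_states = set()
for arrival_order in permutations(updates):
    store = {}
    for version, body in arrival_order:
        guarded_write(store, "doc-1", version, body)
    final_states.add(store["doc-1"])

# Every arrival order converges on the newest version.
assert final_states == {(42, "published")}
```

With the guard at the write boundary, the final state is the same for all six arrival orders. Ordered delivery still matters when intermediate transitions have side effects, but it is not what keeps version 41 from overwriting version 42.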

What to Do Next

Problem: Identify every consumer that would produce incorrect state if two events arrived in the wrong order.
Solution: Assign ordering keys to the smallest aggregate boundary that protects that invariant.
Proof: Verify the design against documented Pub/Sub behavior: same key, ordering-enabled subscription, same-region publishing, idempotent processing, and explicit redelivery handling.
Action: Add the ordering key to the event contract, test replay with duplicated messages, and monitor backlog alongside the key distribution (cardinality and hot-key share) before calling the model production-ready.
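For the monitoring step, a rough skew report over a sample of ordering keys is often enough to spot a hot key before it becomes a backlog. This sketch assumes you can collect such a sample yourself, for example from publisher logs; `key_skew` is a hypothetical helper, not a Pub/Sub metric.

```python
from collections import Counter


def key_skew(ordering_keys, top_n=3):
    """Summarize how evenly traffic is spread across ordering keys."""
    counts = Counter(ordering_keys)
    total = sum(counts.values())
    if total == 0:
        return {"distinct_keys": 0, "top_share": 0.0, "top_keys": []}
    top = counts.most_common(top_n)
    return {
        "distinct_keys": len(counts),
        # Fraction of all sampled messages carried by the busiest key.
        "top_share": top[0][1] / total,
        "top_keys": top,
    }


# Ten sampled messages keyed by tenant: one tenant dominates.
sample = ["tenant-a"] * 8 + ["tenant-b", "tenant-c"]
report = key_skew(sample)
assert report["distinct_keys"] == 3
assert report["top_share"] == 0.8
assert report["top_keys"][0] == ("tenant-a", 8)
```

A high top share with low cardinality is the signature of a key that is too broad. What threshold counts as "too high" depends on handler latency and traffic, so treat the numbers as a prompt for review rather than an alert rule.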


Rajiv
