The easiest way to break an event-driven system is to treat every message as the same kind of message.

Situation

Most Azure architectures eventually need asynchronous communication. A checkout service needs to tell fulfillment to reserve inventory. A telemetry gateway needs to ingest device readings. A fraud model needs a historical stream so it can be replayed after a new feature is deployed. A billing workflow needs a command to be processed once, or at least handled idempotently enough that a retry does not create a second charge.

Azure gives teams several messaging services, but two are frequently confused: Azure Service Bus and Azure Event Hubs. The names are close enough that many diagrams reduce them to generic boxes labeled “queue” or “stream.” That is where the architectural damage starts.

Service Bus is a brokered enterprise messaging system. It is designed for high-value messages and provides queues, topics, dead-lettering, duplicate detection, sessions, deferral, scheduled delivery, and transactional workflows. Event Hubs is an event ingestion and streaming service. It is designed for partitioned append-style ingestion, many consumers, retention, replay, telemetry, and downstream analytics.

The difference is not cosmetic. It is the difference between a command that asks a specific thing to happen and an event stream that records what happened so multiple readers can interpret it independently.

The Problem

The operational failure usually appears after success. A system starts with low volume, one consumer, and one happy path. A queue holds order events. A worker drains them. Everything looks fine.

Then the system grows. Analytics wants the same data. Machine learning wants backfills. Finance wants audit reconstruction. Support wants to replay a bad day after a bug fix. Operations wants failed business commands isolated from poison telemetry. Suddenly the original design has to answer questions it was never built to answer.

If Service Bus was used as the event log, replay is painful. Once a receiver completes a message, the broker removes it from the queue, so there is nothing left to reread. Dead-letter queues hold messages that failed processing; they are not a tool for routine historical reconstruction. You can add logging, but then the log is a side effect rather than the source of replay.

If Event Hubs was used as the command queue, a different class of failure appears. Consumers must checkpoint their own offsets and handle reprocessing idempotently. A slow or failed command processor has no built-in dead-letter queue in which to isolate one bad business message. Per-command workflows such as scheduled delivery, duplicate detection windows, and sessions are not what the model is built around.

The question is not “which service is better?” The question is: which failure mode are you choosing to make cheap?

Core Concept

Use Service Bus when the publisher expects work to be done. Use Event Hubs when the publisher is recording a fact into a stream that may be read many times.

flowchart TD
  A[application service — business decision] -->|command| B[Service Bus queue — work contract]
  B --> C[worker — execute action]
  C --> D[database — state change]
  D -->|fact emitted| E[Event Hubs — append stream]
  E --> F[analytics consumer — independent offset]
  E --> G[model training — replay window]
  E --> H[capture storage — historical archive]
  B --> I[dead letter queue — failed commands]

The command path is narrow and accountable. A message such as ReserveInventory or SendInvoice has an intended handler and a business consequence. The system cares about retries, poison messages, ordering within a business key, duplicate sends, and operator repair. Service Bus gives the architecture places to express those concerns.

The event path is broad and historical. A fact such as OrderPlaced or DeviceReadingAccepted may have many consumers, some of which do not exist yet. The publisher should not know which analytics job, alerting rule, warehouse load, or feature pipeline will read it. Event Hubs gives the architecture partitioned ingestion, consumer groups, retention, and replay semantics.

The design rule is simple: commands are obligations; events are evidence.

That rule also clarifies naming. A message named CreateCustomer belongs on Service Bus because it asks a consumer to perform work. A message named CustomerCreated belongs on Event Hubs because it records that work already happened. A message named ProcessOrderEvent is a smell because it hides the contract. Is the system asking for processing, or publishing history?
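The naming rule can be turned into a rough lint check. The sketch below is only a heuristic under stated assumptions: the function name `classify_message_name` and the word lists are hypothetical, chosen for illustration, and a real codebase would need its own vocabulary. It treats a leading imperative verb as a command signal, a trailing past-tense word as an event signal, and a vague suffix such as `Event` as a reason to flag the name for review.

```python
import re

# Hypothetical vocabularies for illustration; extend them for a real codebase.
IMPERATIVE_VERBS = {"Create", "Reserve", "Send", "Calculate", "Ship", "Process"}
IRREGULAR_PAST = {"Sent", "Paid", "Built", "Made"}
VAGUE_SUFFIXES = {"Event", "Message"}


def classify_message_name(name: str) -> str:
    """Return 'command', 'event', or 'ambiguous' for a CamelCase message name."""
    words = re.findall(r"[A-Z][a-z0-9]*", name)
    # A suffix like "Event" says nothing about intent, so flag it for review.
    if not words or words[-1] in VAGUE_SUFFIXES:
        return "ambiguous"
    looks_imperative = words[0] in IMPERATIVE_VERBS
    looks_past_tense = words[-1].endswith("ed") or words[-1] in IRREGULAR_PAST
    if looks_imperative and not looks_past_tense:
        return "command"
    if looks_past_tense and not looks_imperative:
        return "event"
    return "ambiguous"


for name in ["CreateCustomer", "CustomerCreated", "ProcessOrderEvent"]:
    print(name, "->", classify_message_name(name))
```

A check like this will misjudge some names, so it is better used to produce a review list than to route messages automatically.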

In Practice

Context: Microsoft’s own Azure messaging comparison frames Service Bus as “high-value enterprise messaging” for cases like order processing and financial transactions, while Event Hubs is positioned as a big data pipeline for telemetry and distributed data streaming. That is a documented product boundary, not a stylistic preference. See Microsoft’s comparison of Event Grid, Event Hubs, and Service Bus.

Action: Put business commands on Service Bus queues or topics. Use queues when one logical handler owns the work. Use topics and subscriptions when multiple bounded contexts need filtered copies of the command-like message. Enable dead-letter handling, duplicate detection where resend ambiguity matters, and sessions when ordering must be preserved for a business key. Microsoft’s Service Bus documentation explicitly calls out features such as dead-lettering, duplicate detection, sessions, transactions, and scheduled delivery as part of the brokered messaging model.

Result: The operational surface matches the failure. A poison invoice command can be moved to a dead-letter queue, inspected, corrected, and resubmitted. A duplicate send caused by a timeout can be absorbed if the MessageId is stable within the detection window. A sequence of commands for the same aggregate can be serialized through sessions. These are command-processing concerns, and they should be visible in the broker.

Learning: Service Bus is not a durable analytics log. Its value is controlled delivery of work. Treating it as the permanent event store makes replay an afterthought.
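To make the two command-side behaviors concrete, here is a toy in-memory model of duplicate detection and dead-lettering. It does not call the Azure SDK and is not how Service Bus is implemented; `ToyQueue`, `Message`, and the parameter names are hypothetical, and the time values are passed in by hand so the example stays deterministic. It only illustrates the semantics described above: a resend with the same message ID inside the window is absorbed, and a message that keeps failing is moved aside after a delivery limit.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    message_id: str
    body: str
    delivery_count: int = 0


class ToyQueue:
    """In-memory stand-in for a queue with dedup and dead-lettering."""

    def __init__(self, max_delivery_count=3, dedup_window_seconds=60.0):
        self.max_delivery_count = max_delivery_count
        self.dedup_window_seconds = dedup_window_seconds
        self.active = deque()
        self.dead_letter = []
        self._seen = {}  # message_id -> time the ID was first accepted

    def send(self, message, now):
        # Forget IDs that have aged out of the detection window.
        self._seen = {
            mid: t for mid, t in self._seen.items()
            if now - t < self.dedup_window_seconds
        }
        if message.message_id in self._seen:
            return False  # duplicate send absorbed
        self._seen[message.message_id] = now
        self.active.append(message)
        return True

    def process_next(self, handler):
        message = self.active.popleft()
        message.delivery_count += 1
        try:
            handler(message)
        except Exception:
            if message.delivery_count >= self.max_delivery_count:
                self.dead_letter.append(message)  # park for operator repair
            else:
                self.active.append(message)  # make it available for retry
            return False
        return True


processed = []


def handle_invoice(message):
    if message.body == "poison":
        raise ValueError("cannot process this invoice")
    processed.append(message.message_id)


queue = ToyQueue(max_delivery_count=3, dedup_window_seconds=60.0)
first_send = queue.send(Message("inv-1", "ok"), now=0.0)
retry_send = queue.send(Message("inv-1", "ok"), now=10.0)  # resend after timeout
queue.send(Message("inv-2", "poison"), now=11.0)

while queue.active:
    queue.process_next(handle_invoice)

print(first_send, retry_send, processed, [m.message_id for m in queue.dead_letter])
```

The point of the model is where the failure ends up: the duplicate never reaches the handler, and the poison message sits in a separate list where it can be inspected and resubmitted without blocking other work.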

Context: Event Hubs documents a partitioned consumer model and supports retention and replay of telemetry and event stream data. It also provides Capture, which writes streaming data to Azure Blob Storage or Azure Data Lake Storage on time or size intervals. See Microsoft’s Event Hubs documentation on Capture.

Action: Publish immutable facts to Event Hubs after the source-of-truth state change commits. Assign partition keys deliberately, usually by entity or tenant when per-key ordering matters. Give each independent workload its own consumer group. Use Capture when the stream must feed both real-time consumers and batch reconstruction.

Result: Replay becomes a normal operation. A consumer can rebuild projections from retained events. A model pipeline can reprocess the same historical stream after code changes. A warehouse loader can lag without blocking a fraud detector. The stream is not depleted by one reader, because reading does not remove events and each consumer group tracks its own progress.

Learning: Event Hubs is not a command broker. Its value is high-throughput ingestion and independent consumption. If each event requires individual business repair, dead-letter triage, and workflow control, the design is asking a stream to behave like a queue.
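The stream-side behaviors can be modeled the same way. The sketch below is a toy append-only log, not the Event Hubs client library: `ToyEventStream` and its methods are hypothetical names, and `zlib.crc32` is only a stand-in for whatever hashing the real service applies to partition keys. It shows three properties from the paragraphs above: events with the same key land in the same partition in order, each consumer group keeps its own offsets, and rewinding a group replays history without affecting other readers.

```python
import zlib
from collections import defaultdict


class ToyEventStream:
    """In-memory stand-in for a partitioned, append-only event stream."""

    def __init__(self, partition_count=4):
        self.partitions = [[] for _ in range(partition_count)]
        # consumer group name -> next offset to read in each partition
        self.offsets = defaultdict(lambda: [0] * partition_count)

    def publish(self, partition_key, event):
        # Same key always maps to the same partition, preserving per-key order.
        index = zlib.crc32(partition_key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((partition_key, event))
        return index

    def read(self, consumer_group):
        # Reading advances this group's offsets only; nothing is deleted.
        offsets = self.offsets[consumer_group]
        events = []
        for i, partition in enumerate(self.partitions):
            events.extend(partition[offsets[i]:])
            offsets[i] = len(partition)
        return events

    def rewind(self, consumer_group):
        # Replay: reset this group's position to the start of every partition.
        self.offsets[consumer_group] = [0] * len(self.partitions)


stream = ToyEventStream(partition_count=4)
for event in ["OrderPlaced", "OrderPaid", "OrderShipped"]:
    stream.publish("order-1", event)
for event in ["OrderPlaced", "OrderCancelled"]:
    stream.publish("order-2", event)

analytics_first = stream.read("analytics")
analytics_again = stream.read("analytics")   # nothing new yet
training_first = stream.read("training")     # a separate group sees everything
stream.rewind("analytics")
analytics_replay = stream.read("analytics")  # same history, read again

print(len(analytics_first), len(analytics_again),
      len(training_first), len(analytics_replay))
```

Note that ordering is only guaranteed per key here. Events for `order-1` and `order-2` may interleave differently depending on which partitions they hash to, which is the same caveat that applies to partition keys in the real service.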

Where It Breaks

| Failure mode | Service Bus bias | Event Hubs bias |
| --- | --- | --- |
| A payment command times out after send | Use stable message IDs and idempotent handlers | Producer uncertainty becomes consumer logic |
| One message always crashes the worker | Dead-letter and repair the specific command | Consumer must skip, park, or handle offset carefully |
| Three systems need the same historical facts | Topics help current subscribers, but replay is limited | Consumer groups and retention fit the requirement |
| Analytics needs to rerun last week’s data | Requires separate audit storage | Replay retained stream or read captured files |
| Ordering matters for one customer | Sessions can serialize by key | Partition key preserves order only within a partition |
| Millions of telemetry readings arrive per second | Usually the wrong cost and throughput shape | Designed for streaming ingestion |
| A human operator must correct failed work | Strong fit through DLQ workflows | Must be built outside the stream |
| A new consumer is added months later | Needs historical store elsewhere | Can replay if retention or capture was designed |

The dangerous middle ground is pretending one service can erase the distinction. You can build replay around Service Bus by writing every message to storage before sending it. You can build command repair around Event Hubs by adding poison-event stores, skip lists, and custom retry policies. Sometimes those choices are justified. But they should be conscious extensions, not accidental compensations for a wrong primitive.

A robust Azure architecture often uses both. Service Bus carries work that must be completed. Event Hubs carries facts that must be observed, replayed, and analyzed. The boundary between them is usually the database commit. Before the commit, the system is coordinating intent. After the commit, it is publishing evidence.
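The commit boundary can be sketched as a handler that treats the incoming command as intent and emits a fact only after the state change is recorded. Everything here is an assumption made for illustration: `BillingService`, `handle_charge_customer`, and the `CustomerCharged` event shape are hypothetical, a dictionary stands in for the database, and a list stands in for the event stream. The sketch also does not address what happens if the process fails between the commit and the publish, which a real design would need to handle.

```python
from dataclasses import dataclass, field


@dataclass
class BillingService:
    # command_id -> (customer_id, amount_cents); stand-in for the database
    charges: dict = field(default_factory=dict)
    # stand-in for the event stream that downstream readers consume
    published: list = field(default_factory=list)

    def handle_charge_customer(self, command_id, customer_id, amount_cents):
        """Apply a ChargeCustomer command at most once, then emit a fact."""
        if command_id in self.charges:
            # Redelivered or resent command: the work is already done.
            return False
        # "Commit": the state change is the source of truth.
        self.charges[command_id] = (customer_id, amount_cents)
        # After the commit, publish evidence that the charge happened.
        self.published.append({
            "type": "CustomerCharged",
            "command_id": command_id,
            "customer_id": customer_id,
            "amount_cents": amount_cents,
        })
        return True


billing = BillingService()
applied_first = billing.handle_charge_customer("cmd-42", "cust-7", 1999)
applied_retry = billing.handle_charge_customer("cmd-42", "cust-7", 1999)

print(applied_first, applied_retry, len(billing.charges), len(billing.published))
```

Because the handler keys its idempotency check on the command ID, a broker retry neither charges the customer twice nor publishes a second fact.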

What to Do Next

Problem: Audit every asynchronous message name. If it is imperative, such as CalculateTax, ShipOrder, or SendEmail, classify it as a command. If it is past tense, such as TaxCalculated, OrderShipped, or EmailSent, classify it as an event.

Solution: Route commands through Service Bus and facts through Event Hubs. Keep handlers idempotent on both sides, but let the platform own the failure mode it was designed to expose.

Proof: Verify the design with operations questions. Where does a poison command go? How is duplicate send handled? How does a new analytics consumer replay history? How does a backfill avoid triggering business actions twice?

Action: Draw the command path and replay path as separate flows. If one arrow is carrying both obligation and evidence, split it before the system grows around the mistake.