S3 Event Architectures: Durable, Cheap, and Easy to Misorder

The dangerous part of S3 event processing is not losing the file. It is believing the event stream tells the same story as the bucket.

Situation

S3 has become the default landing zone for modern data systems. Logs, partner drops, ML features, media uploads, CDC exports, batch handoffs, and compliance artifacts all tend to arrive as objects before they become database rows, search documents, thumbnails, embeddings, or warehouse partitions.

That makes S3 event notifications attractive. They are cheap to operate, easy to wire into Lambda, SQS, SNS, or EventBridge, and close enough to the storage layer that teams treat them as the natural trigger for downstream work.

The architecture usually starts cleanly: object arrives, event fires, worker processes object, state advances. For low-volume systems, that model can survive for a long time.

Then retries happen. A user overwrites the same key. A batch job emits the same partition twice. A Lambda timeout causes redelivery. A downstream database accepts an older transformation after a newer one already committed. The event pipeline still looks healthy, but the materialized state is wrong.

The Problem

S3 event notifications are a notification mechanism, not a serialized change log.

AWS documents S3 event notifications as at-least-once delivery. That means duplicate events are part of the contract, not an outage. S3 event records also include a sequencer value for PUT and DELETE operations, but that value is only useful for comparing events for the same object key. It is not a global ordering primitive across a bucket, prefix, tenant, or workflow.

The failure mode is subtle because the infrastructure remains green. SQS depth returns to zero. Lambda invocations succeed. The object exists. Dashboards show throughput. But at least one of three things has happened:

The same object was processed more than once.
An older event overwrote the result of a newer event.
A downstream aggregate assumed cross-object ordering that S3 never promised.

The core question is: how do you keep S3’s durability and cost advantages without pretending its event notifications are a database log?

The Answer Is a Versioned Intake Ledger

Treat S3 as the durable payload store, but put an explicit intake ledger between object events and business state. The ledger records object identity, version identity when available, event identity, sequencer, processing status, and the latest accepted state transition.
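As a minimal sketch of what the ledger needs from each notification, the snippet below pulls object identity, version identity, event name, and sequencer out of a single S3 event record. The `IntakeEvent` class and `intake_event_from_record` function are hypothetical names for illustration, and the sample record is trimmed to the fields used here.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote_plus


@dataclass(frozen=True)
class IntakeEvent:
    """Hypothetical ledger input: the fields needed to make a processing decision."""
    bucket: str
    key: str
    version_id: Optional[str]   # only present when the bucket is versioned
    event_name: str
    sequencer: Optional[str]    # not every event type carries one


def intake_event_from_record(record: dict) -> IntakeEvent:
    """Extract ledger fields from one entry of an S3 notification's Records list."""
    s3 = record["s3"]
    obj = s3["object"]
    return IntakeEvent(
        bucket=s3["bucket"]["name"],
        # Object keys arrive URL-encoded in the notification payload.
        key=unquote_plus(obj["key"]),
        version_id=obj.get("versionId"),
        event_name=record["eventName"],
        sequencer=obj.get("sequencer"),
    )


# Trimmed sample record; real notifications carry more fields.
sample_record = {
    "eventName": "ObjectCreated:Put",
    "s3": {
        "bucket": {"name": "example-bucket"},
        "object": {
            "key": "drops/2024/report+final.csv",
            "versionId": "v1-example",
            "sequencer": "0055AED6DCD90281E5",
        },
    },
}
event = intake_event_from_record(sample_record)
```

Decoding the key before it reaches the ledger matters: otherwise the same object can show up under two different ledger identities.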

That ledger is the system of record for processing decisions. Workers may be stateless. Events may duplicate. Queues may redeliver. But state changes become conditional writes against the ledger, not blind writes into downstream systems.

flowchart TD
  A[S3 bucket — object writes] -->|event notification| B[SQS queue — durable buffer]
  B -->|batch delivery| C[worker pool — idempotent consumers]
  C -->|read object metadata| D[S3 object — payload and version]
  C -->|conditional write| E[intake ledger — key state and sequencer]
  E -->|accepted transition| F[downstream processor — transform and index]
  F -->|commit result| G[serving store — queryable state]
  F -->|failure record| H[dead letter queue — replay inspection]
  H -->|manual replay| B

The important design choice is that the worker does not ask, “Did I receive an event?” It asks, “Is this event still allowed to advance processing for this object?”

For a single object key, the ledger can compare the incoming event’s sequencer against the last accepted sequencer. If the incoming value is older, the worker records the event as stale and stops. If it equals the sequencer of an event that already completed, the worker records the event as a duplicate and stops. If it is newer, the worker claims the transition with a conditional write.
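The three-way decision can be sketched as a small pure function plus an in-memory ledger. This is an illustration, not a production store: `Decision`, `classify`, and `InMemoryLedger` are hypothetical names, the dictionary is not safe across workers, and the sketch assumes sequencers have already been normalized to equal-length strings so that plain string comparison is valid.

```python
from enum import Enum
from typing import Dict, Optional


class Decision(Enum):
    ACCEPT = "accept"
    DUPLICATE = "duplicate"
    STALE = "stale"


def classify(last_accepted: Optional[str], incoming: str) -> Decision:
    """Decide whether an incoming sequencer may advance state for one object key.

    Assumes both values are normalized to the same length and case.
    """
    if last_accepted is None:
        return Decision.ACCEPT          # first event seen for this key
    if incoming < last_accepted:
        return Decision.STALE           # older than what was already accepted
    if incoming == last_accepted:
        return Decision.DUPLICATE       # same event delivered again
    return Decision.ACCEPT


class InMemoryLedger:
    """Single-process stand-in for the ledger; a real one needs atomic conditional writes."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def offer(self, object_key: str, sequencer: str) -> Decision:
        decision = classify(self._last.get(object_key), sequencer)
        if decision is Decision.ACCEPT:
            self._last[object_key] = sequencer
        return decision


ledger = InMemoryLedger()
results = [
    ledger.offer("a.csv", "0055AED6DCD90281E5"),  # first delivery
    ledger.offer("a.csv", "0055AED6DCD90281E5"),  # redelivery of the same event
    ledger.offer("a.csv", "0055AED6DCD90281E4"),  # delayed older event
    ledger.offer("a.csv", "0055AED6DCD90281E6"),  # genuinely newer event
]
```

Only the first and last offers advance state. The redelivery and the delayed event are recorded outcomes, not errors.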

For versioned buckets, include the S3 version ID in the ledger key or in the ordering decision. For unversioned buckets, assume overwrites can collapse object history. If the downstream result must correspond to the exact bytes that triggered the event, versioning is not optional.

This changes the architecture from event-driven execution to event-driven reconciliation. The event wakes the system up. The ledger decides what work is valid.

In Practice

Context: AWS documents that S3 event notifications can be delivered more than once and that ordering is not guaranteed across independent object changes. AWS also documents the sequencer field as a way to determine ordering for PUT and DELETE events on the same object key: the greater hexadecimal value is the later event, and values of different lengths are compared lexicographically after right-padding the shorter one with zeros.
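A sequencer comparison can be sketched in a few lines. This assumes the AWS-documented rule of right-padding the shorter value with zeros before a lexicographic comparison. The uppercase normalization is an extra precaution of this sketch, and `compare_sequencers` is a hypothetical helper name.

```python
def compare_sequencers(a: str, b: str) -> int:
    """Return -1, 0, or 1 as sequencer a is older than, equal to, or newer than b.

    Only meaningful when both sequencers belong to the same object key.
    """
    width = max(len(a), len(b))
    # Right-pad the shorter value with zeros, then compare as strings.
    left = a.upper().ljust(width, "0")
    right = b.upper().ljust(width, "0")
    return (left > right) - (left < right)


older = compare_sequencers("0055AED6DCD90281E5", "0055AED6DCD90281E6")
padded_equal = compare_sequencers("00AB", "00AB00")
newer = compare_sequencers("00AC", "00AB00")
```

The function deliberately takes no bucket or key argument. Scoping the comparison to one object key is the caller's job.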

Action: The documented pattern is to make consumers idempotent and store enough processing state to reject duplicates or stale events. A DynamoDB table is a common fit because conditional writes can atomically claim a key, compare versions, and prevent an older event from replacing a newer decision. The store does not need to hold object bytes; it holds processing authority.
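One way to express the claim step is as a conditional update. The sketch below only builds the request parameters for the low-level DynamoDB `UpdateItem` call and does not contact AWS. The table layout, the attribute names (`object_id`, `sequencer`, `version_id`, `event_name`, `status`), the `SEQ_WIDTH` value, and the `build_claim_request` name are all assumptions for illustration.

```python
from typing import Optional

SEQ_WIDTH = 32  # hypothetical fixed width; must cover the longest sequencer you expect


def build_claim_request(table_name: str, bucket: str, key: str,
                        version_id: Optional[str], sequencer: str,
                        event_name: str) -> dict:
    """Build UpdateItem parameters that succeed only if this event is newer."""
    if len(sequencer) > SEQ_WIDTH:
        raise ValueError("sequencer longer than SEQ_WIDTH; widen the stored format")
    # Store a fixed-width, right-padded value so DynamoDB's string comparison
    # orders stored and incoming sequencers consistently.
    seq = sequencer.upper().ljust(SEQ_WIDTH, "0")
    version_attr = {"S": version_id} if version_id else {"NULL": True}
    return {
        "TableName": table_name,
        "Key": {"object_id": {"S": f"{bucket}/{key}"}},
        "UpdateExpression": (
            "SET sequencer = :seq, version_id = :ver, "
            "event_name = :evt, #st = :claimed"
        ),
        # Claim when no row exists yet, or when the stored sequencer is older.
        "ConditionExpression": "attribute_not_exists(object_id) OR sequencer < :seq",
        # "status" is a DynamoDB reserved word, so it is aliased.
        "ExpressionAttributeNames": {"#st": "status"},
        "ExpressionAttributeValues": {
            ":seq": {"S": seq},
            ":ver": version_attr,
            ":evt": {"S": event_name},
            ":claimed": {"S": "CLAIMED"},
        },
    }


request = build_claim_request(
    "intake-ledger", "example-bucket", "drops/report.csv",
    None, "0055AED6DCD90281E5", "ObjectCreated:Put",
)
```

With boto3, these parameters would be passed as `boto3.client("dynamodb").update_item(**request)`. A failed condition surfaces as a `ClientError` with code `ConditionalCheckFailedException`, which the worker can treat as "stale or duplicate" rather than as an error. The sketch does not cover a redelivery of an event that was claimed but never completed. Handling that case needs an additional check on the status attribute.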

Result: Duplicate notifications become cheap no-ops. Redelivered queue messages can be retried without fear of double committing. Older events for the same object key can be detected before downstream work runs. The downstream database, index, or warehouse table receives only accepted transitions rather than every notification S3 emits.

Learning: S3 events are excellent triggers but weak ordering boundaries. The correct abstraction is not “S3 sent me the next change.” It is “S3 told me something changed, and now I must reconcile whether this change is current, duplicate, stale, or unprocessable.”

This is also why queues alone do not solve the problem. SQS gives buffering, retry control, visibility timeouts, and dead-letter handling. FIFO queues can order within a message group, but S3 event notification architectures often still have to choose the right grouping key and handle duplicate delivery. If the business invariant is per-object correctness, the idempotency boundary belongs at the object key and version level. If the invariant is per-account, per-partition, or per-dataset correctness, the ledger must model that explicitly.

The same principle applies to EventBridge. EventBridge is useful when routing, filtering, fanout, archive, and replay matter. It does not remove the need for idempotent consumers. Replay is only safe when consumers can distinguish “run this again because we asked” from “advance state again because we forgot.”

Where It Breaks

Design choice	What works	Where it breaks	Mitigation
Direct S3 to Lambda	Very low operational overhead	Duplicate events can double-write downstream state	Add idempotency keys and conditional commits
S3 to SQS to workers	Better buffering and retry control	Queue order is not the same as object correctness	Use a ledger keyed by object and version
S3 to EventBridge	Flexible routing and replay	Replay can reapply old business actions	Make processors reconciliation-based
Sequencer only	Useful for same-key PUT and DELETE order	Not global across keys or prefixes	Scope comparisons to one object key
Last write wins	Simple for derived views	Older events can overwrite newer results	Compare sequencer or version before commit
No bucket versioning	Lower storage and mental overhead	Overwrites can hide the bytes that caused an event	Enable versioning when exact payload lineage matters
Downstream idempotency only	Protects one target system	Other side effects may still duplicate	Centralize acceptance before side effects
Dead-letter queue only	Preserves failed messages	Does not classify stale or duplicate work	Store terminal reason in the ledger

What to Do Next

Problem: Audit every S3-triggered workflow for hidden ordering assumptions. Look for object overwrites, partition rewrites, retry paths, fanout consumers, and downstream writes that do not check whether the triggering event is still current.
Solution: Add an intake ledger with conditional writes. Store bucket, key, version ID when present, event name, sequencer, processing status, attempt count, timestamps, and downstream commit identity.
Proof: Test duplicate delivery, delayed delivery, overwrite races, worker timeout, partial downstream failure, dead-letter replay, and manual reprocessing. The expected result is not “the event ran once.” The expected result is “only the valid state transition committed.”
Action: Keep S3 for durable payloads and cheap storage, but stop using its events as a serialized source of truth. Use events to trigger reconciliation, use the ledger to authorize work, and use downstream systems only after the event has proven it is current.
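The proof step can start as a very small property test: replay the same set of deliveries in every possible order, with a duplicate included, and check that the committed state never changes. The sketch below is a single-process simulation with hypothetical names (`apply_deliveries`, the sample keys and payloads). It uses short, equal-length sequencers so that plain string comparison is valid.

```python
from itertools import permutations
from typing import Dict, Iterable, Tuple

Delivery = Tuple[str, str, str]  # (object key, sequencer, payload)


def apply_deliveries(deliveries: Iterable[Delivery]) -> Dict[str, str]:
    """Run deliveries through a ledger-guarded commit and return the final state."""
    last_seq: Dict[str, str] = {}    # ledger: last accepted sequencer per key
    committed: Dict[str, str] = {}   # stand-in for the downstream serving store
    for key, seq, payload in deliveries:
        previous = last_seq.get(key)
        if previous is not None and seq <= previous:
            continue                 # duplicate or stale: no downstream write
        last_seq[key] = seq
        committed[key] = payload
    return committed


events = [
    ("orders/part-0.parquet", "0001", "v1 rows"),
    ("orders/part-0.parquet", "0002", "v2 rows"),
    ("orders/part-0.parquet", "0002", "v2 rows"),   # duplicate delivery
    ("orders/part-1.parquet", "0001", "other rows"),
]

# Every delivery order should converge on the same committed state.
outcomes = {
    tuple(sorted(apply_deliveries(order).items()))
    for order in permutations(events)
}
```

If `outcomes` ever contains more than one element, some delivery order let an older event win. That is the failure this test exists to catch.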

Rajiv
