AI can copy the shape of a storage design and still miss the contract that makes it correct: a double write buffer is not an extra write path, it is a durability boundary.

Situation

AI coding agents are now good enough to produce plausible database internals patches: new structs, recovery hooks, background workers, tests, and code that compiles. That changes the review problem. The risk is no longer only “does the code build?” The risk is “did the agent preserve the invisible contract between the database, kernel, filesystem, block device, and recovery algorithm?”

The source experiment is a useful failure: a Claude Code prototype attempted to port an InnoDB-style double write buffer into PostgreSQL. The implementation followed the surface pattern. Write page to double write buffer. Write page to the real data file. Reuse the slot. The failure was semantic: PostgreSQL and InnoDB do not share the same I/O model, process model, or recovery trust boundary.

| Mechanism | Default trust boundary | What protects against torn pages | Review question |
| --- | --- | --- | --- |
| PostgreSQL full page writes | Write-ahead log, or WAL, flush | First modified 8KB page image after checkpoint | Is the WAL image durable before recovery needs it? |
| InnoDB doublewrite buffer | Doublewrite file flush | Page copy written before final tablespace overwrite | Is the doublewrite copy durable before the destination page can tear? |
| Naive AI port | Function names and control flow | Assumed equivalence between writes | Did the patch prove the same crash states are recoverable? |

The lesson generalizes beyond databases. AI-generated infrastructure code often calls the right APIs in the wrong contract order.

The Problem

A double write buffer, or DWB, protects a database page from a torn write by writing a complete copy somewhere else before overwriting the page at its final location. InnoDB documents this directly: pages flushed from the buffer pool are written to the doublewrite buffer before their proper locations, so crash recovery can find a good copy if the final page write is torn. MySQL 8.4 documentation names that as the purpose of the feature.

PostgreSQL solves the same class of failure differently. With full_page_writes=on, PostgreSQL writes the entire content of each page to WAL during the first modification of that page after each checkpoint. The PostgreSQL docs are explicit: without that page image, a crash during a page write can leave mixed old and new data, and normal row-level WAL records are not enough to reconstruct the page. The current PostgreSQL WAL documentation also warns that turning it off can lead to unrecoverable or silent corruption after system failure.

The bug in the AI-generated design was treating those mechanisms as interchangeable.

| Failure point | What breaks | Why it matters |
| --- | --- | --- |
| write() treated as durable | PostgreSQL writes dirty buffers through the operating system page cache; the kernel can accept the bytes before media persistence | A DWB slot reused after smgrwrite() can destroy the only good recovery copy |
| sync_file_range() treated as fsync() | Linux documents SYNC_FILE_RANGE_WRITE as asynchronous and not suitable for data integrity operations; it also does not flush volatile disk write caches | Advisory writeback is performance plumbing, not a crash recovery guarantee |
| BgWriter path gets synchronous durability work | PostgreSQL’s background writer is tuned around cheap dirty-page writes and checkpoint-spread I/O | Per-page DWB fsync turns an amortized background path into a latency amplifier |
| Full page writes disabled too early | WAL no longer contains first-dirtied page images after checkpoint | Recovery must trust a DWB copy that may not actually be durable or current |
| Slot lifecycle lacks LSN accounting | DWB slot reuse is disconnected from destination file fsync progress | Crash recovery can observe a stale tablespace page and an overwritten DWB slot |

The core question is not “can PostgreSQL be given a DWB?” It is: what additional durability accounting would make a DWB at least as trustworthy as PostgreSQL’s existing WAL full page image boundary?

A Crash-State Contract for Double Write Buffering

The right design starts with crash states, not code generation. If the system crashes at any boundary, recovery must have at least one complete page image with a known log sequence number, or LSN. Anything less is wishful thinking with structs.

flowchart TD
    Dirty[dirty PostgreSQL buffer — page LSN known] --> WAL[WAL record — optional full page image]
    Dirty --> DWBWrite[DWB slot write — buffered copy]
    DWBWrite --> DWBFlush[DWB file fsync — durable recovery copy]
    DWBFlush --> DataWrite[tablespace write — page cache accepted]
    DataWrite --> DataFlush[tablespace fsync — final page durable]
    DataFlush --> Reclaim[DWB slot reclaim — safe reuse]
    WAL --> Recovery[crash recovery — choose trusted image]
    DWBFlush --> Recovery
    DataFlush --> Recovery

The invariant is narrow:

| State | DWB slot reusable? | Recovery source | Reason |
| --- | --- | --- | --- |
| Before DWB fsync | No | WAL full page image | DWB copy may not exist after power loss |
| After DWB fsync, before tablespace write | No | DWB or WAL | DWB copy is durable, destination is old |
| After tablespace write, before tablespace fsync | No | DWB | Destination may be stale or torn |
| After tablespace fsync | Yes | Tablespace | Final copy is durable through the filesystem boundary |
| After checkpoint and slot reclaim | Yes | Tablespace plus WAL from checkpoint | Recovery no longer depends on that DWB slot |

That table is the design. The implementation follows from it.
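
One way to read the invariant table is as a small state machine per DWB slot. The sketch below is illustrative Python, not PostgreSQL code: `SlotState`, `DwbSlot`, `RECOVERY_SOURCES`, and the source labels are hypothetical names chosen for this example. It assumes the caller advances a slot only after the matching system call has returned successfully.

```python
from dataclasses import dataclass
from enum import Enum


class SlotState(Enum):
    # Values repeat the row labels of the invariant table.
    COPIED = "before DWB fsync"
    DWB_DURABLE = "after DWB fsync, before tablespace write"
    DEST_WRITTEN = "after tablespace write, before tablespace fsync"
    DEST_DURABLE = "after tablespace fsync"


ORDER = list(SlotState)  # Enum iteration follows definition order

# Which image recovery may trust in each state (hypothetical labels).
RECOVERY_SOURCES = {
    SlotState.COPIED: ("wal_full_page_image",),
    SlotState.DWB_DURABLE: ("dwb", "wal_full_page_image"),
    SlotState.DEST_WRITTEN: ("dwb",),
    SlotState.DEST_DURABLE: ("tablespace",),
}


@dataclass
class DwbSlot:
    slot_id: int
    page_lsn: int
    state: SlotState = SlotState.COPIED

    def advance(self) -> SlotState:
        """Move to the next state once the matching write or fsync has returned."""
        idx = ORDER.index(self.state)
        if idx == len(ORDER) - 1:
            raise ValueError("slot is already in its final state")
        self.state = ORDER[idx + 1]
        return self.state

    def reusable(self) -> bool:
        # Only a confirmed destination fsync makes the slot safe to reuse.
        return self.state is SlotState.DEST_DURABLE

    def reclaim(self) -> int:
        """Return the slot id for reuse, refusing if recovery may still need it."""
        if not self.reusable():
            raise RuntimeError(f"slot {self.slot_id} still needed: {self.state.value}")
        return self.slot_id
```

A slot created with `DwbSlot(slot_id=0, page_lsn=100)` refuses `reclaim()` until `advance()` has been called three times, which corresponds to the three "No" rows in the table.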

  1. Keep full_page_writes=on while developing the DWB path.

    A prototype that disables full page writes before proving DWB recovery has removed PostgreSQL’s existing safety net. PostgreSQL’s documented default is full_page_writes=on, and the reason is exactly torn-page recovery after OS crashes. The first implementation should run DWB as a redundant mechanism, then compare recovery decisions against WAL.

    Verification: after crash recovery, report every page where WAL full page image and DWB recovery would have chosen different page contents or LSNs.

  2. Treat DWB slot state as a durability state machine.

    A slot is not “free” after the page is copied. It is not free after the destination write(). It is free only after the destination relation file has been synced past the page’s write. That requires at least: relation identifier, fork, block number, page LSN, DWB slot identifier, DWB fsync generation, and destination fsync generation.

    Verification: inject crashes at each transition and assert that no slot with tablespace_fsync_lsn < page_lsn is reused.

  3. Batch fsyncs around files, not pages.

    A naive per-page fsync(dwb_fd) will collapse throughput on ordinary SSDs and will be theatrical on network block devices. The DWB write path needs group commit semantics: append many page copies to DWB storage, issue one durable flush, then schedule destination writes. The destination side also needs file-level fsync grouping by relation segment, because PostgreSQL relations are spread across segment files.

    Verification: expose counters for pages per DWB fsync, relation files per destination fsync batch, p50 and p99 fsync latency, and backend buffer eviction waits.

  4. Move synchronous work out of FlushBuffer().

    FlushBuffer() is the wrong abstraction boundary for the whole protocol. It can mark that a page needs protection, enqueue the copy, and coordinate state. It should not become a per-page durability transaction. PostgreSQL already separates WAL writer, background writer, and checkpointer roles; a DWB design needs a manager that coordinates DWB slots, DWB fsync completion, destination writes, and reclaim.

    Verification: run write-heavy workloads with bgwriter_lru_maxpages, checkpoint_timeout, checkpoint_completion_target, and checkpoint_flush_after visible in logs; confirm backend writes do not spike because DWB workers are saturated.

  5. Make recovery distrustful by default.

    During startup, recovery must validate DWB records by checksum, relation identity, block number, page LSN, and DWB fsync generation. A DWB record without proof of durable completion is a hint, not a recovery source. PostgreSQL page checksums, when enabled, help detect torn pages, but detection is not repair.

    Verification: corrupt DWB records, destination pages, and WAL records independently in test images; recovery must either repair from a proven source or fail loudly.

  6. Test against the actual storage stack.

    PostgreSQL deployments differ by wal_sync_method, filesystem, cloud block device, hypervisor cache mode, RAID controller cache, and mount options. PostgreSQL documents several WAL sync methods, including fdatasync, fsync, open_sync, and open_datasync; Linux is not the whole production universe. The DWB claim is only meaningful on the stack where it is measured.

    Verification: repeat crash-injection tests on the production-like filesystem and block layer, including VM-level kill, host reboot where available, and forced process termination.
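
The batching idea in steps 3 and 4 can be sketched as a group-commit helper: many page copies are appended, then one fsync covers the whole batch. This is a minimal Python illustration under stated assumptions, not a PostgreSQL design. `DwbBatcher`, `batch_pages`, and the counters are hypothetical names, and the sketch ignores directory fsync for the newly created file, slot addressing, and concurrency.

```python
import os

PAGE_SIZE = 8192  # PostgreSQL's default block size


class DwbBatcher:
    """Append page copies to a DWB file and make them durable one batch at a time."""

    def __init__(self, path: str, batch_pages: int = 32):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.batch_pages = batch_pages
        self.pending = []        # page ids written but not yet covered by an fsync
        self.fsync_calls = 0     # counter: number of DWB fsyncs issued
        self.pages_flushed = 0   # counter: pages covered by those fsyncs

    def add(self, page_id, page: bytes) -> list:
        """Buffer one page copy; return the ids this call made durable (often none)."""
        if len(page) != PAGE_SIZE:
            raise ValueError("expected exactly one full page")
        view = memoryview(page)
        while len(view) > 0:          # os.write may accept fewer bytes than asked
            written = os.write(self.fd, view)
            view = view[written:]
        self.pending.append(page_id)
        if len(self.pending) >= self.batch_pages:
            return self.flush()
        return []

    def flush(self) -> list:
        """One fsync for the whole batch; only now may destination writes start."""
        if not self.pending:
            return []
        os.fsync(self.fd)
        self.fsync_calls += 1
        durable, self.pending = self.pending, []
        self.pages_flushed += len(durable)
        return durable

    def close(self) -> None:
        self.flush()
        os.close(self.fd)
```

The ratio `pages_flushed / fsync_calls` is the "pages per DWB fsync" counter from the verification note in step 3. The ids returned by `add()` and `flush()` are the only pages whose destination writes may be scheduled.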

In Practice

The public evidence points in one direction: the prototype failed because it copied an algorithm without copying the assumptions that make the algorithm true.

| Evidence | Type | Engineering implication |
| --- | --- | --- |
| InnoDB documents the doublewrite buffer as a separate area written before pages reach their final data-file positions | Public documented design | The protection comes from write ordering plus recovery lookup, not from an extra copy alone |
| PostgreSQL documents full_page_writes as writing the entire disk page to WAL on first modification after checkpoint | Public documented design | PostgreSQL’s trust boundary is WAL durability, not destination data-file durability |
| PostgreSQL documents wal_sync_method choices and warns that crash-safe configuration depends on system configuration | Public documented design | A DWB replacement must be validated under the configured sync method and storage layer |
| Linux documents SYNC_FILE_RANGE_WRITE as asynchronous and “not suitable for data integrity operations” | System behavior | Code that treats it as a durability boundary is wrong even if smoke tests pass |
| PostgreSQL checkpoint settings include checkpoint_flush_after, which attempts to push dirty data to storage to reduce later stalls | System behavior | PostgreSQL already distinguishes writeback pressure from confirmed persistence |
| JIN’s Claude Code experiment compiled and passed basic smoke tests before semantic review exposed the DWB flaw | Documented source experiment | Build success is not evidence of crash-state correctness |

The deeper point is that storage correctness is usually hidden behind boring verbs: write, flush, sync, checkpoint, recover. Those verbs are not portable across systems.

write() to a regular file usually means “the kernel accepted bytes.” It does not mean “the bytes survived power loss.” sync_file_range() can start writeback and can be useful for reducing dirty-page backlog, but the Linux man page explicitly separates that from data integrity. fsync() is closer to the boundary PostgreSQL recovery cares about, but even then the real guarantee depends on the filesystem, block device, drive cache behavior, and whether the stack lies about flush completion.
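
The difference between those verbs is easy to show in a few lines. The sketch below is illustrative Python for a POSIX system; `write_buffered` and `write_durable` are hypothetical helper names. Even the "durable" version only requests a flush, so the caveat above about filesystems and drive caches still applies.

```python
import os


def write_buffered(path: str, data: bytes) -> None:
    """Returns once the kernel has accepted the bytes. Not a durability boundary."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()  # moves bytes from the user-space buffer to the page cache only


def write_durable(path: str, data: bytes) -> None:
    """Write, then ask the kernel to persist both the file and its directory entry."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:          # os.write may accept fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                  # file data and metadata
    finally:
        os.close(fd)
    # Assumption: POSIX, where a directory can be opened and fsynced so that
    # the newly created name itself survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Reading either file back on a running system returns identical bytes, which is why a test without a crash cannot tell the two functions apart.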

This is exactly where AI-assisted systems work becomes dangerous. The model sees an InnoDB pattern:

| InnoDB-looking step | What the AI can reproduce | What it may miss |
| --- | --- | --- |
| Copy page to DWB | Buffer allocation and file write | Whether the copy is durable before final overwrite |
| Flush DWB | Call a function with “flush” in the name | Whether the function is advisory or a persistence barrier |
| Write destination page | smgrwrite() or equivalent call | Whether the write reached media or page cache |
| Reclaim slot | Free-list manipulation | Whether recovery still depends on that slot |
| Disable FPW | Config change or branch bypass | Whether WAL still has a complete first-touch page image |

That is not a PostgreSQL-only lesson. The same failure shape appears when agents generate Kafka consumers without understanding offset commit semantics, Kubernetes controllers without understanding finalizers, S3 pipelines without understanding read-after-write boundaries by operation type, or distributed locks without understanding fencing tokens. The API name is the shallow part. The recovery contract is the system.

For this specific DWB design, I have not run the patch at production scale personally. The documented failure mode is enough to reject the architecture as described: if a DWB slot is reused after a buffered destination write but before a confirmed destination fsync, a crash can leave no durable complete image outside WAL. If full page writes have also been disabled, PostgreSQL’s documented repair mechanism has been removed.

The most deceptive benchmark would be a clean-shutdown write throughput test. It might show lower WAL volume and acceptable latency because it never exercises the crash boundary. A correct benchmark has to kill the database and the machine at controlled points: before DWB fsync, after DWB fsync, after destination write, before destination fsync, after destination fsync, and during checkpoint. Then it has to verify page checksums, page LSNs, WAL replay behavior, and DWB reclaim metadata. Anything else is testing formatting.
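
The crash matrix can be prototyped as a toy model before any real fault injection exists. The sketch below is a deliberately pessimistic Python simulation, not a test harness for PostgreSQL. It assumes full page writes are off, treats every unsynced write as torn after a crash, and models a single page `A` and a single DWB slot. `Block`, `run`, `SAFE`, and `NAIVE` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

TORN = "TORN"


@dataclass
class Block:
    durable: Optional[str] = None   # content confirmed on media
    pending: Optional[str] = None   # accepted by write(), not yet fsynced

    def write(self, content: str) -> None:
        self.pending = content

    def fsync(self) -> None:
        if self.pending is not None:
            self.durable, self.pending = self.pending, None

    def after_crash(self) -> Optional[str]:
        # Worst case: an unsynced write reached media only partially.
        return TORN if self.pending is not None else self.durable


def run(order: list, crash_after: int) -> bool:
    """Apply the first `crash_after` steps, crash, and report whether
    page A still has one intact image (destination or DWB copy)."""
    dest_a = Block(durable="A_old")
    slot = Block()
    steps = {
        "dwb_write": lambda: slot.write("A_new"),
        "dwb_fsync": slot.fsync,
        "dest_write": lambda: dest_a.write("A_new"),
        "dest_fsync": dest_a.fsync,
        # Reuse overwrites the slot with another page's copy and flushes it.
        "reuse_slot": lambda: (slot.write("B_new"), slot.fsync()),
    }
    for name in order[:crash_after]:
        steps[name]()
    return dest_a.after_crash() != TORN or slot.after_crash() == "A_new"


SAFE = ["dwb_write", "dwb_fsync", "dest_write", "dest_fsync", "reuse_slot"]
NAIVE = ["dwb_write", "dwb_fsync", "dest_write", "reuse_slot", "dest_fsync"]


def unrecoverable_points(order: list) -> list:
    """Crash after 0..len(order) steps; list the points with no intact image."""
    return [n for n in range(len(order) + 1) if not run(order, n)]
```

In this model `unrecoverable_points(SAFE)` is empty, while `unrecoverable_points(NAIVE)` reports the crash point right after the slot is reused and before the destination fsync. A clean-shutdown benchmark corresponds to running every step with no crash, which passes for both orderings.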

Where It Breaks

| Failure mode | Trigger | Fix |
| --- | --- | --- |
| DWB slot reused too early | Slot freed after smgrwrite() or sync_file_range() instead of after destination fsync() | Track destination fsync generation per relation segment and reclaim only when tablespace_fsync_lsn >= page_lsn |
| WAL safety removed before DWB is proven | full_page_writes=off during prototype or benchmark runs | Run DWB in shadow mode first; compare recovery choices against WAL full page images |
| BgWriter stalls under durability work | Per-page DWB fsync inside dirty buffer eviction path | Use DWB workers, group commit, and file-level batching outside the critical buffer eviction path |
| Checkpoint I/O becomes spiky | DWB backlog prevents pages from becoming safely reclaimable before checkpoint pressure rises | Coordinate DWB manager with checkpointer progress and expose backlog metrics tied to checkpoint cycles |
| Advisory flush mistaken for crash safety | Linux sync_file_range() or PostgreSQL writeback hints treated as persistence | Reserve advisory writeback for latency smoothing; require fsync, fdatasync, or platform-equivalent durability boundary |
| Storage stack changes invalidate assumptions | Moving from local NVMe to EBS, Azure managed disks, GCP Persistent Disk, ZFS, ext4, XFS, or a controller with volatile cache | Certify the crash matrix per production stack and keep the result with the deployment profile |
| Recovery accepts stale DWB records | DWB metadata lacks relation identity, block number, checksum, page LSN, or fsync generation | Validate DWB records as recovery artifacts; reject ambiguous records loudly |
| Benchmark hides corruption | Tests use clean shutdown, process kill only, or no filesystem fault injection | Add power-loss style crash testing and page verification after replay |
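
The "Recovery accepts stale DWB records" row can be made concrete with a record format that carries its own proof. The layout below is a hypothetical illustration in Python, not an on-disk format from PostgreSQL or InnoDB. `DwbRecord`, `encode`, `decode_trusted`, and `durable_gen` are assumed names, and `durable_gen` is assumed to come from a separately fsynced control record that stores the last completed DWB fsync generation.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 8192
FIELDS = struct.Struct("<IIIQQ")  # rel_id, fork, block, page_lsn, fsync_gen
CRC = struct.Struct("<I")         # CRC-32 over the packed fields plus the page


@dataclass(frozen=True)
class DwbRecord:
    rel_id: int
    fork: int
    block: int
    page_lsn: int
    fsync_gen: int
    page: bytes


def encode(rec: DwbRecord) -> bytes:
    body = FIELDS.pack(rec.rel_id, rec.fork, rec.block,
                       rec.page_lsn, rec.fsync_gen) + rec.page
    return body + CRC.pack(zlib.crc32(body))


def decode_trusted(raw: bytes, expect: tuple, durable_gen: int) -> Optional[DwbRecord]:
    """Return the record only if every check passes.
    None means the slot is a hint at best, not a recovery source."""
    if len(raw) != FIELDS.size + PAGE_SIZE + CRC.size:
        return None
    body, (crc,) = raw[:-CRC.size], CRC.unpack(raw[-CRC.size:])
    if zlib.crc32(body) != crc:
        return None                      # torn or corrupted copy
    rel_id, fork, block, page_lsn, fsync_gen = FIELDS.unpack_from(body)
    if (rel_id, fork, block) != expect:
        return None                      # slot was reused for a different page
    if fsync_gen > durable_gen:
        return None                      # no proof the DWB fsync completed
    return DwbRecord(rel_id, fork, block, page_lsn, fsync_gen, body[FIELDS.size:])
```

Returning `None` keeps the decision with the caller, which can fall back to a WAL full page image when one exists or stop recovery with an explicit error when none does.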

What to Do Next

  • Problem: AI-generated systems code can preserve code shape while breaking the durability, scheduling, and recovery contracts underneath it.
  • Solution: Review infrastructure patches by crash-state matrix first, then by code diff.
  • Proof: A PostgreSQL DWB design is not credible until every page state between DWB write, DWB fsync, destination write, destination fsync, checkpoint, and slot reclaim has a verified recovery source.
  • Action: This week, take one AI-generated infrastructure patch and write its hidden contract table: API call, assumed guarantee, actual guarantee, failure if the assumption is false.

The hard part of storage engineering is not making the second write happen; it is knowing exactly which copy the system is allowed to trust after the lights come back on.