Aurora Global Database: What It Solves and What It Does Not

Aurora Global Database is frequently evaluated as an active-active multi-region database. It is not. The secondary region is read-only until you explicitly promote it, promotion does not re-point your application endpoints, and the RPO on an unplanned failover is measured in seconds, not zero. Understanding what the product actually delivers — and what it leaves to you — is the only way to size it correctly for a DR or read-scale design.

Situation

Multi-region database architecture sits at the intersection of two pressures: latency-sensitive reads that cross region boundaries unnecessarily, and disaster recovery designs that require tighter RTO/RPO than a daily snapshot gives you. Aurora Global Database is the AWS answer to both, and the marketing framing — “single database spanning multiple regions” — sounds closer to active-active than the implementation actually is.

Engineers evaluating Global Database typically encounter it while building a DR failover plan or routing global reads to a closer region. Both use cases are real. The confusion starts when teams assume they compound into active-active behavior.

The Problem

Aurora Global Database does not detect primary region failure and promote the secondary automatically. Promotion is an API call — manually triggered or triggered by your application logic. The application’s connection string still points at the old primary endpoint after promotion. The database cluster comes up cleanly; your application is still talking to a dead region.

The “sub-one-minute RTO” claim is precise: it covers the time to promote a new primary cluster. It does not include DNS propagation, application reconfiguration, or connection pool drain. The actual application recovery time is longer, and the gap is entirely under your control rather than Aurora’s.

What does Aurora Global Database actually guarantee, where does that guarantee stop, and what does your application need to provide for the rest?

How Aurora Global Database Replicates

Aurora’s replication mechanism is not binlog-based or WAL-shipping-based in the traditional sense. The Aurora storage layer replicates storage-level redo log records directly between regions. According to AWS Aurora documentation, this typically achieves under one second of replication lag using dedicated infrastructure separate from database compute nodes. Because replication does not go through the compute layer, writes on the primary are not slowed by cross-region replication — the storage tier handles it asynchronously.

The secondary cluster can serve reads from its local storage copy. Those reads are typically less than one second stale, but that figure is the usual lag, not a guaranteed bound. For dashboards, reporting, and non-transactional API endpoints that is fine. For reads that must reflect a just-completed write, it is not.

Planned vs. Unplanned Failover

AWS documents two distinct failover modes with different guarantees.

Managed planned failover (called a switchover in current AWS documentation) is for intentional region changes: maintenance, a region move, or a DR drill. Aurora coordinates the promotion, waits for the secondary to fully catch up, and promotes with an RPO of zero, meaning no data loss. The original primary must be reachable, and the operation takes longer than a forced failover.
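As a rough sketch of the planned path, the two functions below wrap the AWS CLI calls for a switchover and for checking which member cluster holds the writer role afterwards. The function names are illustrative, and the identifiers passed in are placeholders for your own global cluster and target cluster ARN.

```shell
# Planned, zero-data-loss region change.
# Args: global cluster identifier, ARN of the secondary cluster to promote.
planned_switchover() {
  aws rds switchover-global-cluster \
    --global-cluster-identifier "$1" \
    --target-db-cluster-identifier "$2"
}

# Print the ARN of the member cluster that currently holds the writer role.
# Arg: global cluster identifier.
current_writer_arn() {
  aws rds describe-global-clusters \
    --global-cluster-identifier "$1" \
    --query 'GlobalClusters[0].GlobalClusterMembers[?IsWriter==`true`].DBClusterArn' \
    --output text
}
```

Calling current_writer_arn before and after a drill is a cheap way to confirm the role actually moved before you touch any application configuration.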

Unplanned failover is what you invoke when the primary region has failed. There is no coordination; the secondary region’s data reflects whatever was replicated before the failure. Given sub-one-second typical lag, RPO in practice is low — but it is not zero. AWS documentation states the RPO depends on replication lag at the time of failure.
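Because the RPO on an unplanned failover is whatever the lag was at the moment of failure, it helps to know what that lag usually looks like. The sketch below reads the AuroraGlobalDBReplicationLag CloudWatch metric (reported in milliseconds) for a secondary cluster and compares it with an RPO budget. The function names and the ten-minute window are assumptions for illustration, and the dimension used here should be checked against what your CloudWatch console shows for the cluster.

```shell
# Worst cross-region replication lag (milliseconds) over the last 10 minutes.
# Args: secondary cluster identifier, secondary region (both placeholders).
max_replication_lag_ms() {
  now=$(date +%s)
  start=$((now - 600))
  aws cloudwatch get-metric-statistics \
    --region "$2" \
    --namespace AWS/RDS \
    --metric-name AuroraGlobalDBReplicationLag \
    --dimensions "Name=DBClusterIdentifier,Value=$1" \
    --start-time "$start" --end-time "$now" \
    --period 60 --statistics Maximum \
    --query 'max(Datapoints[].Maximum)' --output text
}

# Succeeds only if the observed lag is a number at or below the budget (ms).
# Args: observed lag in ms, RPO budget in ms.
lag_within_budget() {
  awk -v lag="$1" -v budget="$2" 'BEGIN {
    if (lag !~ /^[0-9]+(\.[0-9]+)?$/) exit 2   # "None" or empty: no datapoints
    exit !(lag + 0 <= budget + 0)
  }'
}
```

A missing datapoint is treated as a failure rather than as zero lag, since "no data" during a regional incident is exactly when you cannot assume the secondary is current.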

The promotion is an API call you must issue explicitly. For an unplanned failover:

aws rds failover-global-cluster \
  --global-cluster-identifier my-global-cluster \
  --target-db-cluster-identifier arn:aws:rds:us-west-2:123456789012:cluster:my-secondary-cluster \
  --allow-data-loss

After promotion, the secondary cluster becomes the new writer. Your application’s connection string still points at the old primary endpoint — updating that is separate from the promotion step and is your responsibility.
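One common way to handle the re-point step is to have the application connect through a DNS name you control and update that record after promotion. The sketch below assumes a Route 53 hosted zone and a CNAME record; the function names, the record name, and the default TTL of 30 seconds are hypothetical choices, not something Aurora sets up for you.

```shell
# Build a Route 53 change batch that points a CNAME at the new writer endpoint.
# Args: record name, new writer endpoint, TTL in seconds.
cname_change_batch() {
  printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"CNAME","TTL":%s,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
    "$1" "$3" "$2"
}

# Apply the change. Args: hosted zone id, record name, new endpoint, optional TTL.
repoint_writer_cname() {
  aws route53 change-resource-record-sets \
    --hosted-zone-id "$1" \
    --change-batch "$(cname_change_batch "$2" "$3" "${4:-30}")"
}
```

Even with a short TTL, clients and connection pools that cached the old address keep using it until they reconnect, so the pool reset still has to be part of the runbook.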

In Practice

The Aurora Global Database user guide documents three patterns worth internalizing before committing to the architecture.

Storage-layer replication means the secondary cluster can be promoted without replaying a long log — a genuine DR advantage over traditional streaming replication, where a lagging replica must finish replay before accepting writes.

Read routing is not automatic. The application must explicitly send reads to the secondary cluster endpoint. Those reads trail the primary by whatever the replication lag is at that moment.
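The routing decision itself is small. As a minimal sketch, assuming the application tags each query with whether stale data is acceptable, the function below picks a host. The variable names PRIMARY_WRITER_ENDPOINT and SECONDARY_READER_ENDPOINT and the "stale-ok" label are hypothetical; in a real service this logic would live in the data access layer rather than a shell script.

```shell
# Pick a database host for a query.
# "stale-ok" reads go to the local secondary; anything else (writes and
# read-after-write queries) goes to the primary writer endpoint.
endpoint_for() {
  case "$1" in
    stale-ok) echo "$SECONDARY_READER_ENDPOINT" ;;
    *)        echo "$PRIMARY_WRITER_ENDPOINT" ;;
  esac
}
```

Defaulting to the primary means a query that nobody classified is slow rather than wrong.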

Cost includes storage in both regions (a full copy in each) plus cross-region data transfer for replication. For large databases with a single secondary region, storage cost roughly doubles. This is rarely in the first-pass sizing estimate.
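The arithmetic is simple enough to put in the sizing estimate directly. The sketch below adds the two storage copies and the replication transfer; every price is an input you supply from current AWS pricing for your regions, and the function name is illustrative. It covers only the cost components named above.

```shell
# Rough monthly cost with one secondary region, in the currency of the prices.
# Args: storage_gb  price_per_gb_primary  price_per_gb_secondary
#       replicated_gb_per_month  transfer_price_per_gb
global_db_monthly_cost() {
  awk -v gb="$1" -v p1="$2" -v p2="$3" -v xfer="$4" -v px="$5" 'BEGIN {
    storage = gb * p1 + gb * p2   # a full copy of the data in each region
    transfer = xfer * px          # cross-region replication traffic
    printf "storage=%.2f transfer=%.2f total=%.2f\n", storage, transfer, storage + transfer
  }'
}
```

With made-up inputs of 1000 GB at 0.10 per GB in both regions and 200 GB of replicated traffic at 0.02 per GB, the function prints storage=200.00 transfer=4.00 total=204.00.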

Where It Breaks

Scenario	What breaks	Why
Application assumes automatic endpoint failover	Application continues targeting the old primary endpoint after promotion	Aurora promotes the cluster but does not update the application’s connection string
Writes needed in both regions simultaneously	Active-active writes are not supported	The secondary is read-only until promoted; there is no multi-primary write path
RPO must be exactly zero on unplanned failure	RPO on unplanned failover is bounded by replication lag, not guaranteed zero	Only managed planned failover provides zero data loss

What to Do Next

Problem: Aurora Global Database does not automatically re-point application traffic after a regional failure, so an untested failover plan typically means manual intervention under pressure during an outage.
Solution: Build and test the full failover path — promotion API call, DNS update or connection-string reconfiguration, connection pool reset — as a runbook that runs end-to-end in a staging environment.
Proof: A successful failover drill where the application resumes writes within your RTO target, with the promotion time and application re-point time measured separately.
Action: This week, find your current RTO target in your DR documentation, then measure how long the non-Aurora steps (DNS propagation, app reconfiguration, connection validation) actually take in your environment. That is your gap.
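To keep promotion time and re-point time separate in a drill, each runbook step can be wrapped in a small timer. This is a minimal sketch with one-second resolution; the function name and the step labels are assumptions, and the commands you wrap are whatever your own runbook uses.

```shell
# Run a command and report how long it took, labeled by runbook step.
# Usage: timed <label> <command> [args...]
# Example: timed promote aws rds failover-global-cluster ... --allow-data-loss
timed() {
  label="$1"
  shift
  t0=$(date +%s)
  status=0
  "$@" || status=$?
  t1=$(date +%s)
  echo "$label $((t1 - t0))s exit=$status"
  return "$status"
}
```

Logging one line per step (promote, DNS update, pool reset, first successful write) gives you the Aurora portion and the application portion of the RTO as separate numbers.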

Rajiv