Most teams do not outgrow Cloud SQL because they need a more interesting database. They outgrow it when the failure domain of a single primary stops matching the business contract.

Situation

Attribute | Cloud SQL | Cloud Spanner
Architecture | Single primary, optional replicas | Distributed, multi-region native
Write scaling | The primary is the ceiling | Horizontal, through key design and split routing
Read scaling | Read replicas, in-region or cross-region (asynchronous) | Reads served from the nearest replica
Consistency | Strong on the primary; replicas are asynchronous | Externally consistent globally (TrueTime)
Failover | Managed event; HA standby in a second zone (typically about 60 s) | Built in; no promotion event
Engine compatibility | PostgreSQL, MySQL, SQL Server | GoogleSQL dialect or PostgreSQL dialect
Schema changes | Standard engine DDL | Online schema changes, fully managed
Starting cost | Low | Higher floor (smallest provisioned instance is 100 processing units, one tenth of a node)
Choose when | Regional system that needs standard engine tooling | Global writes, distributed consistency, horizontal scale

The usual database decision starts too low in the stack. Teams compare PostgreSQL compatibility, MySQL familiarity, query syntax, managed backups, pricing pages, and migration tooling. Those details matter, but they are rarely the real decision between Cloud SQL and Cloud Spanner.

Cloud SQL is a managed relational database service for engines teams already know: PostgreSQL, MySQL, and SQL Server. Its operating model is familiar: one writable primary, optional replicas, managed backups, maintenance windows, and high availability inside the constraints of a traditional database architecture.

Cloud Spanner is a distributed relational database. It is built for horizontal scale, synchronous replication, strong consistency, and multi-region availability. Its operating model is less familiar because the database is not a single machine with replicas attached. It is a distributed system that happens to expose SQL and transactions.

That difference changes the architecture conversation. The question is not “which one is better?” The question is whether your system can survive the operational shape of a primary database.

The Problem

Cloud SQL works extremely well when the write path fits on a primary, the application can tolerate regional recovery behavior, and scaling pressure is mostly read-heavy. In that world, replicas absorb analytics and reporting, indexes are tuned, connection pools are sized, and vertical scaling buys time.
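To make the read-scaling half of that concrete, here is a minimal Python sketch of the routing logic teams typically put in front of a primary and its replicas. `Replica`, `route_read`, and the `max_lag_s` staleness budget are hypothetical names, not part of any Cloud SQL client library; the lag values would come from whatever replication-lag metric you already monitor.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Hypothetical model of a read replica and its observed replication lag.
    name: str
    lag_s: float


def route_read(replicas, max_lag_s, session_wrote_recently):
    """Return the endpoint a read should use (hypothetical routing policy)."""
    # Replicas apply changes asynchronously, so a session that just wrote
    # must read from the primary to be sure it sees its own write.
    if session_wrote_recently:
        return "primary"
    # Otherwise use the least-lagged replica that is inside the staleness budget.
    fresh = [r for r in replicas if r.lag_s <= max_lag_s]
    if not fresh:
        return "primary"
    return min(fresh, key=lambda r: r.lag_s).name
```

The session check is the read-your-writes exception: reporting and analytics traffic can tolerate a few seconds of lag, but a user who just saved something should not see the old value.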

The trouble begins when the application contract quietly becomes distributed while the database contract stays centralized.

A checkout system wants writes accepted during regional impairment. A financial ledger wants globally ordered transactions. A SaaS control plane wants tenant placement across regions without writing custom shard routing. A mobile backend wants low-latency reads from multiple continents but cannot allow stale business invariants. A marketplace wants inventory decrements, payment state, and fulfillment reservations to commit consistently even as traffic shifts between regions.

Teams often respond by building the missing distribution layer above Cloud SQL. They introduce application-level sharding, dual writes, queue-based reconciliation, read-your-writes exceptions, regional failover procedures, and increasingly complicated runbooks. The database remains familiar, but the system becomes less honest. The hard part has moved into application code.
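A minimal sketch of why that layer is hard, assuming two independent shards modeled as in-memory SQLite databases and a hypothetical tenant placement map maintained by the application. Each shard commits atomically on its own, but a transfer that spans shards is two separate commits, so a crash between them breaks the invariant that money is conserved.

```python
import sqlite3


def make_shard():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    return db


# Hypothetical application-owned routing: which shard holds which account.
shards = {"shard-1": make_shard(), "shard-2": make_shard()}
placement = {"alice": "shard-1", "bob": "shard-2"}


def shard_for(account):
    return shards[placement[account]]


def seed(account, balance):
    with shard_for(account) as db:  # commits on success, rolls back on error
        db.execute("INSERT INTO accounts VALUES (?, ?)", (account, balance))


def transfer(src, dst, amount, crash_between=False):
    # Commit 1: debit on the source shard.
    with shard_for(src) as db:
        db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
    # No transaction spans both shards, so a failure here leaves commit 1 in place.
    if crash_between:
        raise RuntimeError("process died after the first commit")
    # Commit 2: credit on the destination shard.
    with shard_for(dst) as db:
        db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def total():
    return sum(
        shard_for(a).execute("SELECT balance FROM accounts WHERE id = ?", (a,)).fetchone()[0]
        for a in placement
    )
```

After seeding `alice` with 100 and `bob` with 0, a clean transfer of 30 keeps `total()` at 100. A transfer that crashes between commits leaves it at 70, and nothing in either database notices; finding and repairing the missing 30 becomes the job of reconciliation code and runbooks.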

So the real question is: do you need a managed relational database, or do you need the database itself to own distributed consistency and failure recovery?

The Real Decision Boundary

The clean decision boundary is the write contract.

Use Cloud SQL when the system has a natural primary region, write throughput is within the practical limits of a single primary, and failover can be treated as an operational event. Use Cloud Spanner when the write contract is distributed, the data model must scale horizontally, and consistency across failure domains is part of the product requirement rather than an optimization.
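One way to read that boundary as a rule, sketched in Python. The `WriteContract` fields are hypothetical summaries of the properties named above, and the single-primary write ceiling should be a number measured for your workload rather than a published limit.

```python
from dataclasses import dataclass


@dataclass
class WriteContract:
    # Hypothetical summary of a system's write contract.
    has_natural_primary_region: bool
    peak_writes_per_s: float
    single_primary_write_ceiling_per_s: float  # measured for this workload
    failover_is_acceptable_event: bool
    cross_failure_domain_consistency_required: bool


def choose_database(c: WriteContract) -> str:
    # Cloud SQL fits only when every primary-database assumption holds.
    primary_fits = (
        c.has_natural_primary_region
        and c.peak_writes_per_s <= c.single_primary_write_ceiling_per_s
        and c.failover_is_acceptable_event
    )
    if primary_fits and not c.cross_failure_domain_consistency_required:
        return "Cloud SQL"
    # Any broken assumption means the write contract is distributed.
    return "Cloud Spanner"
```

A single failed condition flips the answer; treat that result as a prompt to evaluate Spanner against the failure contract, not as a verdict on its own.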

```mermaid
flowchart TD
    A["database decision: start with failure contract"] --> B["Cloud SQL: primary database architecture"]
    A --> C["Cloud Spanner: distributed database architecture"]

    B --> D["single writable primary: familiar operations"]
    B --> E["read replicas: scale read paths"]
    B --> F["regional HA: managed failover event"]

    C --> G["synchronous replication: database-owned consistency"]
    C --> H["horizontal splits: scale write paths"]
    C --> I["multi-region topology: failure domain in design"]

    D --> J["best fit: monoliths and regional services"]
    E --> J
    F --> J

    G --> K["best fit: ledgers and global control planes"]
    H --> K
    I --> K
```

Cloud SQL’s advantage is operational simplicity. You get standard engines, deep ecosystem support, straightforward local development, and a migration path that most engineers understand. If your bottleneck is schema design, query performance, connection management, or basic high availability, Cloud SQL is usually the sharper tool.

Cloud Spanner’s advantage is removing a category of application-owned distributed systems work. It gives up some engine-specific compatibility and some familiar tuning knobs, but it replaces them with a database architecture designed around replication, partitioning, and strong consistency. That trade is worth making only when the system’s correctness depends on it.

The mistake is choosing Spanner as an expensive scaling talisman. Spanner does not fix unclear ownership boundaries, unbounded transactions, careless indexes, or chatty request paths. It rewards teams that model access patterns deliberately. Poor key design can create hot ranges. Cross-region writes still pay physics. Distributed transactions are powerful, not free.
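A toy model of how key choice creates a hot range, assuming fixed, evenly spaced split boundaries over a 32-bit key space. Real Spanner splits are created and rebalanced dynamically by size and load, so this only illustrates the access-pattern problem: a burst of monotonically increasing keys lands in one range, while hashed keys spread across every range.

```python
import bisect
import hashlib

KEY_SPACE = 2**32
NUM_SPLITS = 8
# Toy assumption: static, evenly spaced split boundaries.
BOUNDARIES = [KEY_SPACE * i // NUM_SPLITS for i in range(1, NUM_SPLITS)]


def split_for(key: int) -> int:
    # Index of the range that owns this key.
    return bisect.bisect_right(BOUNDARIES, key)


def load_per_split(keys):
    counts = [0] * NUM_SPLITS
    for k in keys:
        counts[split_for(k)] += 1
    return counts


def hashed_key(n: int) -> int:
    # First 4 bytes of SHA-256, giving a roughly uniform 32-bit key.
    digest = hashlib.sha256(str(n).encode()).digest()
    return int.from_bytes(digest[:4], "big")


# A burst of 10,000 inserts with monotonically increasing ids.
start = 3_000_000_000
sequential = load_per_split(range(start, start + 10_000))
hashed = load_per_split(hashed_key(n) for n in range(start, start + 10_000))

print("sequential:", sequential)  # every insert hits one split
print("hashed:    ", hashed)      # roughly 1,250 per split
```

Hashing is one mitigation; Spanner's schema design guidance also documents options such as UUIDv4 keys and bit-reversed sequences. Each gives up ordered range scans on the key, which is exactly the kind of deliberate access-pattern trade the paragraph describes.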

The opposite mistake is staying on Cloud SQL after the architecture has already become distributed. Once teams are coordinating shards, replaying outboxes, reconciling duplicate writes, and maintaining regional promotion playbooks, they are already paying the complexity cost. They are just paying it in application code, incident response, and human judgment.

In Practice

Context: Google’s Spanner paper, “Spanner: Google’s Globally-Distributed Database,” documents the core pattern: a database designed to distribute data across datacenters while still supporting externally consistent transactions. The important lesson is not that every company needs global SQL. The lesson is that once correctness spans datacenters, the transaction protocol and clock uncertainty become first-class architecture concerns.

Action: Spanner exposes a model where replication and transaction ordering are part of the database contract. Google’s public documentation describes TrueTime and external consistency as mechanisms for making transaction order match real-time ordering. That is a database-level answer to a problem many teams otherwise approximate with queues, timestamps, locks, and compensating jobs.
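The commit-wait rule from the Spanner paper can be sketched in a few lines. This is a simplified model, assuming a hypothetical `SimulatedTrueTime` that reports the local monotonic clock plus or minus a fixed uncertainty `epsilon_s`; real TrueTime derives its bound from GPS and atomic-clock references, and real Spanner overlaps the wait with replication. The point is the ordering guarantee: a transaction takes a timestamp at the top of the uncertainty interval, then waits until that timestamp is certainly in the past before acknowledging the commit.

```python
import time
from dataclasses import dataclass


@dataclass
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Toy stand-in for TrueTime: local clock +/- a fixed uncertainty bound."""

    def __init__(self, epsilon_s: float):
        self.epsilon_s = epsilon_s

    def now(self) -> TTInterval:
        t = time.monotonic()
        return TTInterval(t - self.epsilon_s, t + self.epsilon_s)

    def after(self, ts: float) -> bool:
        # True only once ts has definitely passed on every clock.
        return self.now().earliest > ts


def commit_with_wait(tt: SimulatedTrueTime) -> float:
    # Commit timestamp: no earlier than any clock could currently read.
    s = tt.now().latest
    # Commit wait: hold the acknowledgment until s is certainly in the past,
    # so any transaction that starts afterwards gets a larger timestamp.
    while not tt.after(s):
        time.sleep(tt.epsilon_s / 10)
    return s
```

The wait lasts a little over twice the uncertainty bound, which is why keeping clock uncertainty small matters directly for write latency.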

Result: The documented pattern is simpler application reasoning at the cost of a more specialized database architecture. Application code can rely on strong consistency guarantees instead of encoding a large amount of regional coordination logic itself. The tradeoff is that schema design, key choice, and transaction shape become central performance decisions.

Learning: Cloud SQL follows the traditional managed relational pattern. Google Cloud’s documentation for Cloud SQL high availability and read replicas describes a familiar architecture: a primary instance, standby or failover behavior, backups, and replicas used to offload reads. That pattern is excellent when the system can name a primary write location. It becomes strained when the product needs the database to behave like a multi-region coordination system.
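On Cloud SQL, the application side of treating failover as an event usually looks like bounded retries. A minimal sketch, assuming a hypothetical `FailingOverPrimary` stand-in that refuses a few attempts, plus an idempotency key so a retried write is applied at most once. The 90-second default budget is an assumption chosen to cover a failover of about 60 seconds; size it from your own drills.

```python
import random
import time


class FailingOverPrimary:
    """Hypothetical stand-in for a primary that is briefly unavailable."""

    def __init__(self, unavailable_attempts: int):
        self.unavailable_attempts = unavailable_attempts
        self.applied = {}  # idempotency_key -> stored payload

    def write(self, idempotency_key: str, payload: str) -> str:
        if self.unavailable_attempts > 0:
            self.unavailable_attempts -= 1
            raise ConnectionError("primary unavailable during failover")
        # Deduplicate: a repeated key returns the originally stored payload.
        # (In a real schema this could be a unique constraint on the key.)
        return self.applied.setdefault(idempotency_key, payload)


def write_with_retry(db, key, payload, budget_s=90.0, base_s=0.5,
                     max_sleep_s=8.0, sleep=time.sleep):
    deadline = time.monotonic() + budget_s
    attempt = 0
    while True:
        try:
            return db.write(key, payload)
        except ConnectionError:
            attempt += 1
            # Capped exponential backoff with jitter.
            delay = min(max_sleep_s, base_s * 2 ** attempt) * random.uniform(0.5, 1.0)
            if time.monotonic() + delay > deadline:
                raise  # budget exhausted: surface the outage
            sleep(delay)
```

The `sleep` parameter is injectable so the behavior can be exercised without real waiting. The idempotency key is what makes retrying safe when a commit succeeded but its acknowledgment was lost during the failover.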

The practical conclusion is not “Spanner for scale, Cloud SQL for small.” Many large systems should stay on Cloud SQL because their data ownership is regional, their operational model is simple, and their engineering leverage comes from standard PostgreSQL or MySQL behavior. Some smaller systems may need Spanner because their correctness boundary is global from day one: payments, identity, inventory, entitlement, or control-plane state.

Where It Breaks

Decision area | Cloud SQL failure mode | Cloud Spanner failure mode
Write scaling | The primary becomes the ceiling for write throughput | Hot keys or poor split behavior concentrate load
Regional resilience | Failover is an event the system must tolerate | Multi-region writes pay latency and topology costs
Consistency | Cross-region correctness often moves into application code | Strong consistency can encourage oversized transactions
Ecosystem | Excellent compatibility with PostgreSQL, MySQL, or SQL Server tooling | SQL support is relational but not identical to any single engine
Operations | Familiar tuning can hide growing sharding complexity | Distributed design requires deliberate schema and key choices
Cost model | Starts simple, then grows through replicas, larger instances, and operations | Starts higher, but may replace custom coordination machinery
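The latency cost of multi-region writes is mostly arithmetic. Spanner replicates each split with Paxos, so a write commits once a majority of voting replicas acknowledge it. A simplified model that ignores lock acquisition, commit wait, and client-to-leader distance; the round-trip times below are hypothetical:

```python
def quorum_commit_latency_ms(rtt_from_leader_ms):
    """Replication latency for one commit in a simple majority-quorum model.

    rtt_from_leader_ms: round-trip times from the leader to every other
    voting replica. The leader's own vote is local and counts toward the
    majority.
    """
    voters = len(rtt_from_leader_ms) + 1
    majority = voters // 2 + 1
    acks_needed = majority - 1  # excluding the leader itself
    if acks_needed == 0:
        return 0.0
    # The commit waits for the acks_needed-th fastest acknowledgment.
    return sorted(rtt_from_leader_ms)[acks_needed - 1]


# Hypothetical regional layout: two other voters in nearby zones.
regional = quorum_commit_latency_ms([1.0, 1.0])
# Hypothetical multi-region layout: one same-region voter, three remote voters.
multi_region = quorum_commit_latency_ms([1.0, 30.0, 30.0, 70.0])
```

With five voters, the majority is three, so even with a second replica in the leader's region the commit still waits for one cross-region acknowledgment. That round trip is the physics no schema choice removes.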

What to Do Next

  • Problem: Write down the failure contract before choosing the database. Name the maximum acceptable write outage, recovery point, recovery time, and regions that must continue accepting writes.

  • Solution: Choose Cloud SQL when a primary-region relational database satisfies that contract. Choose Cloud Spanner when consistency, availability, and horizontal write scale must be owned by the database across failure domains.

  • Proof: Test the architecture under the failure it claims to survive. Promote replicas, block regions, replay writes, measure stale reads, and verify whether application invariants still hold without manual reconciliation.

  • Action: Do not migrate because “distributed” sounds safer. Migrate when the current architecture has already forced you to build a distributed database outside the database.
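The Problem and Proof steps can be kept honest by writing the contract down as data and scoring each failure drill against it. A minimal sketch; `FailureContract`, `DrillResult`, and the region name are hypothetical, and the measured values would come from your own drills.

```python
from dataclasses import dataclass, field


@dataclass
class FailureContract:
    # Hypothetical fields mirroring the checklist above.
    max_write_outage_s: float
    rpo_s: float  # maximum acceptable data-loss window
    rto_s: float  # maximum time to restore service
    regions_must_accept_writes: set = field(default_factory=set)


@dataclass
class DrillResult:
    # Measured during a drill: replica promotion, region block, write replay.
    write_outage_s: float
    data_loss_window_s: float
    recovery_time_s: float
    regions_accepting_writes: set
    invariant_violations: int  # e.g. rows that needed manual reconciliation


def violations(contract: FailureContract, drill: DrillResult) -> list[str]:
    problems = []
    if drill.write_outage_s > contract.max_write_outage_s:
        problems.append("write outage exceeded")
    if drill.data_loss_window_s > contract.rpo_s:
        problems.append("RPO exceeded")
    if drill.recovery_time_s > contract.rto_s:
        problems.append("RTO exceeded")
    missing = contract.regions_must_accept_writes - drill.regions_accepting_writes
    if missing:
        problems.append(f"regions not accepting writes: {sorted(missing)}")
    if drill.invariant_violations:
        problems.append("invariants required manual reconciliation")
    return problems
```

An empty list means the architecture survived the failure it claims to survive; anything else is evidence for the decision, in either direction.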