GCP Reference Architecture: Cloud Run, Load Balancing, Cloud SQL, Memorystore, and Pub/Sub
A serverless web tier does not remove capacity planning; it moves the hardest part to the boundaries where autoscaling compute meets stateful systems.
Situation
Cloud Run is attractive because it gives teams a small operational surface: ship a container, expose HTTP, configure concurrency, and let the platform create more instances when traffic rises. For many product systems, that is exactly the right default. The problem is not Cloud Run. The problem is treating Cloud Run as if every dependency scales the same way.
A typical GCP production path has five moving parts. The external Application Load Balancer terminates public traffic and routes to a serverless network endpoint group. Cloud Run handles request execution. Cloud SQL stores the durable relational state. Memorystore absorbs repeated reads, coordination hints, and short-lived derived data. Pub/Sub carries work that does not need to block the user request.
That architecture is common because each component has a clear job. It fails when those jobs blur. If request handlers open unbounded database connections, autoscaling becomes a database denial-of-service. If the cache becomes the source of truth, Redis maintenance becomes a data-loss event. If Pub/Sub consumers are not idempotent, retry behavior turns a transient failure into duplicated side effects.
The Problem
The dangerous moment is a traffic spike, deploy rollback, regional incident, or upstream retry storm. The load balancer and Cloud Run can admit more work quickly. Cloud SQL cannot create infinite connections. Memorystore can reduce read pressure, but only for keys that are hot and safe to recompute. Pub/Sub can preserve work, but it also extends the lifetime of bad messages unless consumers classify failures correctly.
The system therefore needs two separate control loops. The request path must protect latency and database capacity. The asynchronous path must protect correctness and recovery. They share code, identity, observability, and deployment pipelines, but they should not share the same scaling assumptions.
The core question is: how do we use managed GCP services without letting serverless elasticity overload the stateful parts of the system?
Core Concept
```mermaid
flowchart TD
U[users] --> LB[external Application Load Balancer — TLS and routing]
LB --> NEG[serverless NEG — Cloud Run backend]
NEG --> WEB[Cloud Run web service — bounded concurrency]
WEB --> CACHE[Memorystore Redis — cache aside and leases]
WEB --> DB[Cloud SQL — durable relational state]
WEB --> TOPIC[Pub Sub topic — deferred work]
TOPIC --> WORKER[Cloud Run worker — idempotent consumer]
WORKER --> CACHE
WORKER --> DB
OPS[operations plane — logs metrics traces alerts] --> LB
OPS --> WEB
OPS --> WORKER
OPS --> DB
OPS --> CACHE
OPS --> TOPIC
```
The load balancer owns the public edge: TLS certificates, global or regional ingress, URL routing, Cloud Armor policies, and a stable IP. A serverless NEG points that edge at Cloud Run, which keeps the application container independent from the ingress policy. Google documents serverless NEGs as the mechanism for connecting Cloud Run to Application Load Balancers, and the load balancer becomes the place to centralize edge controls rather than embedding them in every service.
Cloud Run owns stateless execution. Set concurrency deliberately instead of accepting it as a neutral default. High concurrency is efficient for CPU-light handlers, but it multiplies the number of simultaneous database operations per instance. Maximum instances are also a safety control, not only a cost control. A useful starting formula is:
maximum database connections = max Cloud Run instances × per-instance pool size
That number must fit under Cloud SQL connection limits with room for migrations, consoles, maintenance, background workers, and emergency access.
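The budget check can be sketched in a few lines of Python. This is a minimal sketch that assumes two hypothetical Cloud Run services (a web tier and a worker) share one Cloud SQL instance. Every number is illustrative, not a Cloud Run or Cloud SQL default: the service names, instance caps, pool sizes, the database connection limit, and the reserved headroom for admin work.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoolPlan:
    name: str            # hypothetical service name, e.g. "web" or "worker"
    max_instances: int   # Cloud Run maximum instances for this service
    pool_size: int       # connections each instance's pool may open

    @property
    def peak_connections(self) -> int:
        # Worst case: every instance is running and every pool is full.
        return self.max_instances * self.pool_size


def connection_headroom(db_max_connections: int, reserved: int, plans: List[PoolPlan]) -> int:
    """Connections left after reserving capacity for migrations, consoles, and emergencies.

    A negative result means autoscaling alone can exhaust the database.
    """
    demand = sum(p.peak_connections for p in plans)
    return db_max_connections - reserved - demand


# Illustrative numbers only.
plans = [
    PoolPlan("web", max_instances=40, pool_size=5),     # 200 connections at peak
    PoolPlan("worker", max_instances=10, pool_size=3),  # 30 connections at peak
]
print(connection_headroom(db_max_connections=400, reserved=50, plans=plans))  # 120
```

You could run a check like this in CI whenever max instances or pool sizes change. That keeps the formula enforced rather than only documented.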
Cloud SQL owns durable relational state. Prefer private connectivity where possible, use connection pooling, and assume connections will be dropped during maintenance or failover. Google’s Cloud SQL guidance explicitly calls out connection pooling, exponential backoff, testing maintenance behavior, and testing failover behavior as best practices. That means the application contract is not “connections stay alive.” The contract is “the application reconnects, retries safe operations, and sheds load when the database is unavailable.”
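The "reconnect and retry safe operations" contract can be sketched as a retry wrapper that uses capped exponential backoff with full jitter. `TransientDBError` is a hypothetical marker. Mapping a real driver's dropped-connection or failover exceptions onto it is left as an assumption. The `sleep` parameter exists only so the delay can be observed or skipped in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientDBError(Exception):
    """Hypothetical marker for retryable failures (dropped connection, failover)."""


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying transient failures with capped exponential backoff and full jitter.

    Wrap only operations that are safe to repeat: reads, or idempotent writes.
    """
    for attempt in range(attempts):
        try:
            return op()
        except TransientDBError:
            if attempt == attempts - 1:
                raise  # budget exhausted: surface the error so the caller can shed load
            # Full jitter: sleep a random time up to the capped exponential delay, so
            # many instances reconnecting after a failover do not retry in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise ValueError("attempts must be at least 1")


# Usage: a query that fails twice during maintenance, then succeeds.
calls = {"n": 0}


def flaky_query() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientDBError("connection reset during maintenance")
    return "ok"


print(retry_with_backoff(flaky_query, sleep=lambda s: None))  # ok
```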
Memorystore owns speed, not truth. Use cache-aside for expensive reads: read Redis, fall back to Cloud SQL, populate Redis with a TTL, and tolerate cache misses. Use short leases only where duplicate work is acceptable or guarded by database constraints. Do not place unrecoverable state in Redis unless the business has accepted that failure mode.
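Here is a minimal cache-aside sketch. `InMemoryCache` is a hypothetical stand-in, used so the example runs without a Redis server. It mirrors the `get(key)` and `set(key, value, ex=ttl)` call shape of a redis-py client. `load_product` stands in for a Cloud SQL query. Cache errors count as misses, so a Redis outage degrades latency instead of failing requests.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical Redis stand-in with lazily expiring TTL keys."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired, like a Redis key past its TTL
            return None
        return value

    def set(self, key: str, value: str, ex: int) -> None:
        self._data[key] = (value, self._clock() + ex)


def cache_aside(cache: Any, key: str, ttl_seconds: int, load_from_db: Callable[[], Any]) -> Any:
    """Read the cache first; on a miss or cache error, read the database and repopulate."""
    try:
        cached = cache.get(key)
    except Exception:
        cached = None  # cache unavailable: treat as a miss, never as a hard failure
    if cached is not None:
        return json.loads(cached)
    value = load_from_db()  # Cloud SQL remains the source of truth
    try:
        cache.set(key, json.dumps(value), ex=ttl_seconds)
    except Exception:
        pass  # failing to populate the cache must not fail the request
    return value


# Usage: the second read is served from the cache.
db_reads = []


def load_product() -> dict:
    db_reads.append(1)
    return {"id": 42, "name": "widget"}  # hypothetical row


cache = InMemoryCache()
cache_aside(cache, "product:42", 60, load_product)
cache_aside(cache, "product:42", 60, load_product)
print(len(db_reads))  # 1
```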
Pub/Sub owns decoupling. Publish after the durable transaction commits, or use an outbox table if the event and database write must move together. Workers should be idempotent by construction: natural keys, database uniqueness constraints, processed-event tables, or compare-and-set updates. Pub/Sub retries are useful only when repeated delivery is safe.
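The outbox variant can be sketched with Python's built-in `sqlite3` module as a stand-in for Cloud SQL. The schema, the event id format, and the `publish` callable are all hypothetical. In production `publish` would wrap a Pub/Sub publisher, and the relay would probably run as a separate job. The key property is that the order row and its event commit or roll back together.

```python
import json
import sqlite3
from typing import Callable

# Hypothetical schema: business table plus an outbox written in the same transaction.
OUTBOX_SCHEMA = """
CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL);
CREATE TABLE outbox (
    event_id TEXT PRIMARY KEY,
    topic TEXT NOT NULL,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""


def place_order(conn: sqlite3.Connection, order_id: str, total_cents: int) -> None:
    # `with conn` commits both inserts on success and rolls both back on error.
    with conn:
        conn.execute("INSERT INTO orders (id, total_cents) VALUES (?, ?)", (order_id, total_cents))
        conn.execute(
            "INSERT INTO outbox (event_id, topic, payload) VALUES (?, ?, ?)",
            (f"order-placed:{order_id}", "orders", json.dumps({"order_id": order_id})),
        )


def relay_outbox(conn: sqlite3.Connection, publish: Callable[[str, str, str], None]) -> int:
    """Publish unsent events, then mark them sent.

    A crash between publish and the update re-publishes the event later,
    which is why consumers must be idempotent.
    """
    rows = conn.execute(
        "SELECT event_id, topic, payload FROM outbox WHERE published = 0 ORDER BY rowid"
    ).fetchall()
    for event_id, topic, payload in rows:
        publish(topic, event_id, payload)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(OUTBOX_SCHEMA)
place_order(conn, "o-1", 1999)
sent = []
relay_outbox(conn, lambda topic, event_id, payload: sent.append(event_id))
print(sent)  # ['order-placed:o-1']
```

This design gives at-least-once publication, not exactly-once. Duplicates are expected, and the consumer's dedupe handles them.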
In Practice
Context: Google Cloud documents Application Load Balancers as Layer 7 proxies and serverless NEGs as backends that can point to Cloud Run. The documented pattern is to put Cloud Run behind the load balancer when the service needs centralized ingress features such as a stable external endpoint and edge policy controls. See Google Cloud’s documentation on external Application Load Balancers and serverless NEGs.
Action: Treat the load balancer as the public contract and Cloud Run as the revisioned compute target. Keep Cloud Run services private to intended callers where possible, grant invoker permissions intentionally, and route public traffic through the load balancer. This prevents every service from inventing its own edge behavior.
Result: Deployments become safer because traffic management, TLS, and application revision rollout are separate concerns. A bad revision can be rolled back without changing public DNS or certificate handling.
Learning: The load balancer is not decorative infrastructure. It is the boundary where product traffic becomes controlled platform traffic.
Context: Cloud Run documents concurrent request handling and maximum instances as service controls. Cloud SQL documents connection pooling and reconnect behavior because database connections can be dropped by the database or infrastructure. See Cloud Run’s documentation on concurrency and maximum instances, and Cloud SQL’s guidance on connecting from Cloud Run.
Action: Size Cloud Run concurrency and max instances against Cloud SQL, not only against HTTP throughput. Put a small pool inside each instance, use timeouts, use exponential backoff, and fail fast when the database is saturated.
Result: The service degrades by rejecting excess work rather than turning a spike into connection exhaustion. Users see controlled errors and retries instead of a full database collapse.
Learning: Autoscaling needs a governor whenever the next hop is stateful.
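A per-instance governor can be sketched as a semaphore sized to the connection pool. Requests wait briefly for a slot, and any excess is shed with a 503 instead of being queued behind a saturated database. `DatabaseSaturated`, the pool size, and the timeout are hypothetical choices. An asyncio-based service would need an async equivalent of the same idea.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, Tuple


class DatabaseSaturated(Exception):
    """Hypothetical signal that this instance has no spare database capacity."""


class BoundedAdmission:
    """Caps in-flight database work per Cloud Run instance and sheds the excess.

    pool_size should equal the per-instance connection pool size, so that
    max_instances * pool_size stays inside the Cloud SQL budget.
    """

    def __init__(self, pool_size: int, acquire_timeout: float) -> None:
        self._slots = threading.BoundedSemaphore(pool_size)
        self._timeout = acquire_timeout

    @contextmanager
    def slot(self) -> Iterator[None]:
        # Wait briefly for a free slot; if none frees up, fail fast instead of
        # piling more requests onto a database that is already saturated.
        if not self._slots.acquire(timeout=self._timeout):
            raise DatabaseSaturated("no database slot within timeout")
        try:
            yield
        finally:
            self._slots.release()


admission = BoundedAdmission(pool_size=5, acquire_timeout=0.05)


def handle_request() -> Tuple[int, str]:
    try:
        with admission.slot():
            # Run the query here, with a driver-level statement timeout as well.
            return 200, "ok"
    except DatabaseSaturated:
        return 503, "database busy, retry later"  # controlled rejection
```

Cloud Run concurrency can stay higher than `pool_size` for CPU-light work. With this design, only the database-bound section is gated.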
Context: Google Cloud documents Memorystore connectivity from Cloud Run through VPC access patterns, and Redis itself is commonly used as a cache with expiration semantics rather than a relational source of record. See Google Cloud’s guide to connecting Cloud Run to Memorystore for Redis.
Action: Use Redis for cache-aside reads, short-lived coordination, and rate hints. Put TTLs on cached data. Make cache population safe under concurrent misses. Keep writes authoritative in Cloud SQL.
Result: Hot reads stop hammering Cloud SQL, but the system still recovers when Redis is flushed, unavailable, or cold after maintenance.
Learning: A cache is an optimization that must be removable during an incident.
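"Safe under concurrent misses" can be sketched as a per-process singleflight combined with jittered TTLs. `SingleFlight` and `jittered_ttl` are hypothetical helpers written for this example, not a library API. Singleflight collapses misses only within one Cloud Run instance. Collapsing them across instances would need a short Redis lease, for example a `SET key value NX EX seconds` lock, which is out of scope here.

```python
import random
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """One in-flight load shared by every caller asking for the same key."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Collapses concurrent cache-miss loads of one key into a single call, per process."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
        if leader:
            try:
                call.value = fn()  # only the leader hits the database
            except BaseException as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]  # the next miss after this starts a fresh load
                call.done.set()
        else:
            call.done.wait()  # followers reuse the leader's result
        if call.error is not None:
            raise call.error
        return call.value


def jittered_ttl(base_seconds: int, spread: float = 0.1) -> int:
    """Randomize a TTL by +/- spread so keys cached together do not all expire together."""
    delta = base_seconds * spread
    return max(1, int(base_seconds + random.uniform(-delta, delta)))
```

In a handler, the loader passed to `do` would be the Cloud SQL read on the cache-miss path. The value written back would then use a TTL such as `jittered_ttl(300)`.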
Context: Pub/Sub is documented as an asynchronous messaging service with high reliability and scalability, and authenticated push to Cloud Run requires the caller identity to have Cloud Run invoker permission. See Pub/Sub’s architecture overview and push authentication guidance.
Action: Move slow and retryable work out of the user request. Publish events after durable state changes. Make workers idempotent. Use dead-letter topics for poison messages and alert on backlog age, not just message count.
Result: User-facing latency is protected, and operational recovery becomes visible. A worker outage accumulates backlog instead of losing work, while dead-letter routing separates bad data from temporary dependency failures.
Learning: Queues do not remove failure. They make failure durable enough to inspect and replay.
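An idempotent worker can be sketched with a dedupe table. Here the `processed_events` insert and the business mutation commit in the same transaction, and `sqlite3` stands in for Cloud SQL. The schema, the `apply_credit` mutation, and the `dead_letter` callback are hypothetical. Explicitly parking poison messages and acking them is one design choice. The alternative is to nack and let the subscription's dead-letter policy forward the message after its maximum delivery attempts.

```python
import json
import sqlite3
from typing import Callable

# Hypothetical schema: a dedupe table plus the business table that events mutate.
WORKER_SCHEMA = """
CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
CREATE TABLE credits (account_id TEXT PRIMARY KEY, balance_cents INTEGER NOT NULL);
"""


class TransientError(Exception):
    """A dependency hiccup (lock timeout, dropped connection) worth a redelivery."""


def apply_credit(conn: sqlite3.Connection, account_id: str, amount_cents: int) -> None:
    # Example business mutation; it runs on the caller's connection and transaction.
    conn.execute(
        "INSERT INTO credits (account_id, balance_cents) VALUES (?, ?) "
        "ON CONFLICT(account_id) DO UPDATE SET balance_cents = balance_cents + excluded.balance_cents",
        (account_id, amount_cents),
    )


def handle_message(
    conn: sqlite3.Connection,
    event_id: str,
    data: bytes,
    apply: Callable[[sqlite3.Connection, str, int], None],
    dead_letter: Callable[[str, bytes, str], None],
) -> str:
    """Return "ack" or "nack"; for push delivery, map these to a success or error HTTP status."""
    try:
        payload = json.loads(data)
        account_id = str(payload["account_id"])
        amount_cents = int(payload["amount_cents"])
    except (ValueError, KeyError, TypeError) as exc:
        # Permanent failure: redelivery cannot fix a malformed message.
        dead_letter(event_id, data, repr(exc))
        return "ack"
    try:
        with conn:  # one transaction: dedupe claim plus business mutation
            try:
                conn.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))
            except sqlite3.IntegrityError:
                return "ack"  # duplicate delivery: the effect was already committed
            apply(conn, account_id, amount_cents)
    except TransientError:
        return "nack"  # rolled back, dedupe row included, so a redelivery can succeed
    return "ack"
```

Delivering the same `event_id` twice credits the account once. A transient failure leaves no dedupe row behind, so the retry still applies the credit.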
Where It Breaks
| Failure mode | Symptom | Control |
|---|---|---|
| Cloud Run scales faster than Cloud SQL | Connection exhaustion, rising latency, failed logins | Bound max instances, bound pool size, use backoff |
| Cache stampede | Redis miss causes many identical database reads | Singleflight, leases, jittered TTLs |
| Redis treated as durable state | Data disappears after maintenance or flush | Keep source of truth in Cloud SQL |
| Pub/Sub consumer is not idempotent | Duplicate emails, double charges, repeated mutations | Idempotency keys and database constraints |
| Load balancer health hides dependency failure | Edge stays healthy while app returns 500s | Application health checks and dependency alerts |
| Cloud SQL failover is untested | Long recovery, stuck connections | Run failover tests and reconnect drills |
| Worker backlog is invisible | Async work misses business deadlines | Alert on oldest unacked message age |
What to Do Next
- Problem: Serverless compute can overload stateful dependencies faster than humans can react.
- Solution: Put Cloud Run behind an Application Load Balancer, cap concurrency and instances, use Cloud SQL as the source of truth, use Memorystore only for recoverable acceleration, and move non-blocking work through Pub/Sub.
- Proof: The documented GCP patterns all point to explicit boundaries: serverless NEGs for ingress, Cloud Run concurrency controls for admission, Cloud SQL pooling for connection survival, Redis access through private networking, and Pub/Sub authentication for asynchronous invocation.
- Action: Before production, run four drills: a traffic spike against max instances, a Cloud SQL failover, a Redis flush, and a Pub/Sub poison-message replay. If the system cannot survive those drills, the architecture is not ready; it is only deployed.