Load Balancers: The Hidden State Machine in Front of Your App
A load balancer is not a pipe; it is a distributed state machine making safety decisions on stale, partial, and sometimes misleading evidence.
Situation
Most application teams treat load balancers as infrastructure furniture. You define a listener, point it at a target group, add a health check, and move on to the application. The mental model is simple: clients arrive, the load balancer picks a backend, bad instances are removed, good instances receive traffic.
That model works until production starts changing faster than the control plane can agree on what is true.
Deployments drain connections. Autoscaling adds cold targets. Health checks pass while real requests fail. TLS handshakes saturate a node before CPU alarms fire. A single dependency outage makes every backend return the same error at the same time. Suddenly the component that was supposed to be boring is deciding whether to retry, eject, drain, panic, fail open, or send traffic to a target everyone believes is unhealthy.
The important shift is this: modern load balancers are not just traffic distributors. They encode policy, memory, timers, thresholds, and recovery behavior. They remember which endpoints were recently bad. They delay removal to avoid flapping. They preserve long connections while moving new requests elsewhere. They may intentionally route to unhealthy hosts when the alternative is a total outage.
The Problem
The common failure is not that the load balancer makes one wrong routing decision. The failure is that application teams design their services as if the load balancer were stateless.
A stateless router can be reasoned about request by request. A load balancer cannot. Its current decision depends on previous health checks, previous errors, configured thresholds, slow-start windows, connection draining state, availability zone policy, retry budgets, outlier detection, and how many targets remain eligible.
That hidden state creates several production traps.
First, health is sampled, not known. A target can pass /health while the application path that performs authentication, database access, or queue writes is broken. The load balancer sees green. Users see failure.
Second, removal is delayed by design. Health thresholds exist to prevent one transient miss from ejecting a healthy server. That same protection means a badly deployed instance may continue receiving traffic for several probe intervals.
Third, recovery is also delayed. A target must pass several consecutive probes before it is readmitted, so a fixed health check interval multiplied by the healthy threshold can turn a thirty-second application recovery into a multi-minute traffic recovery.
Fourth, all-target failure is special. Some systems fail closed, returning an error because no target is safe. Others fail open, sending traffic to all targets because every target being unhealthy may mean the health signal is wrong or the system is in a regional failure mode.
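The removal and recovery delays in the second and third traps can be estimated with simple arithmetic. The sketch below is a rough model, not any vendor's formula: `probe_delay_bounds` is a hypothetical helper, the interval and threshold values are illustrative, and probe timeouts and control-plane propagation are ignored.

```python
from typing import Tuple


def probe_delay_bounds(interval_s: int, threshold: int) -> Tuple[int, int]:
    """Best and worst case seconds between a real state change and the
    balancer acting on it, when `threshold` consecutive probe results
    are required and probes run every `interval_s` seconds.

    Best case: the change happens just before a probe, so only
    threshold - 1 further intervals are needed.
    Worst case: the change happens just after a probe, so a full
    interval passes before the first probe that sees it.
    """
    return (threshold - 1) * interval_s, threshold * interval_s


# Illustrative values (assumptions): probe every 30 s,
# 2 failures to remove a target, 5 successes to readmit it.
removal = probe_delay_bounds(30, 2)    # (30, 60)
recovery = probe_delay_bounds(30, 5)   # (120, 150)
print(f"bad target keeps traffic for {removal[0]}-{removal[1]} s")
print(f"recovered target waits {recovery[0]}-{recovery[1]} s for traffic")
```

With these numbers an application that recovers in thirty seconds still waits two to two and a half minutes before it receives traffic again.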
So the real question is not “Which load balancing algorithm should we use?” The better question is: what state machine are we placing in front of the application, and have we designed the application to survive its transitions?
The Load Balancer State Machine
A useful architecture starts by making the implicit state explicit. The load balancer tracks at least six states for each backend: unknown, warming, healthy, suspect, draining, and ejected. Different products use different names, but the operational pattern is consistent.
flowchart TD
A[client request — arrives] --> B[listener — protocol policy]
B --> C{route decision — match rules}
C -->|rule match| D[target group — weighted pool]
D --> E{endpoint state — healthy enough}
E -->|healthy| F[backend — receive request]
E -->|draining| G[connection draining — finish or timeout]
E -->|unhealthy| H[outlier set — remove from pool]
H --> I{panic rule — too few healthy targets}
I -->|normal mode| J[return failure — no safe target]
I -->|fail open| F
F --> K[feedback — latency errors resets]
K --> D
The application architecture should treat this state machine as part of the serving path.
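One way to make the six backend states concrete is a small transition table. This is a minimal sketch under stated assumptions: the event names, the transitions, and the rule for which states accept new requests are hypothetical choices for illustration, not the behavior of any particular product.

```python
from enum import Enum


class State(Enum):
    UNKNOWN = "unknown"
    WARMING = "warming"
    HEALTHY = "healthy"
    SUSPECT = "suspect"
    DRAINING = "draining"
    EJECTED = "ejected"


class Event(Enum):
    PROBE_OK = "probe_ok"
    PROBE_FAIL = "probe_fail"
    WARMUP_DONE = "warmup_done"
    THRESHOLD_HIT = "threshold_hit"   # enough consecutive failures
    COOLDOWN_OVER = "cooldown_over"   # ejection timer expired
    DEREGISTER = "deregister"         # deploy or scale-in removes the target


# (current state, event) -> next state; anything not listed keeps the state.
TRANSITIONS = {
    (State.UNKNOWN, Event.PROBE_OK): State.WARMING,
    (State.WARMING, Event.WARMUP_DONE): State.HEALTHY,
    (State.WARMING, Event.PROBE_FAIL): State.SUSPECT,
    (State.HEALTHY, Event.PROBE_FAIL): State.SUSPECT,
    (State.SUSPECT, Event.PROBE_OK): State.HEALTHY,
    (State.SUSPECT, Event.THRESHOLD_HIT): State.EJECTED,
    (State.EJECTED, Event.COOLDOWN_OVER): State.WARMING,
}


def next_state(state: State, event: Event) -> State:
    if event is Event.DEREGISTER:
        return State.DRAINING  # draining wins from any state
    return TRANSITIONS.get((state, event), state)


def accepts_new_requests(state: State) -> bool:
    # Suspect targets still serve: removal is delayed by design.
    return state in {State.WARMING, State.HEALTHY, State.SUSPECT}
```

Writing the table down, even informally, tends to surface the questions that matter in an incident: which states still receive traffic, and which event moves a target out of each one.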
The health endpoint should be intentionally boring, but not meaningless. It should verify that the process can serve the cheapest representative request, not that every dependency in the universe is perfect. A health check that fails on any downstream blip can evacuate the entire fleet during a dependency incident. A health check that only returns “process is alive” can keep broken application instances in rotation.
Readiness should be separated from liveness. A process can be alive while not ready to receive traffic. During startup, schema migration, cache warmup, model loading, or connection pool initialization, the correct state is not dead. It is warming.
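The split between liveness and readiness, and the idea of a cheap but meaningful check, can be sketched as two functions over one status object. Everything here is an assumption for illustration: the `ServingStatus` class, the status strings, and the idea that `local_checks` holds only inexpensive, process-local checks rather than calls to shared dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class ServingStatus:
    warmup_done: bool = False
    shutting_down: bool = False
    # Cheap, local checks only (for example "pool has an open connection").
    # Deep checks against shared dependencies are deliberately left out.
    local_checks: Dict[str, Callable[[], bool]] = field(default_factory=dict)

    def liveness(self) -> Tuple[int, str]:
        # If this code runs at all, the process is alive.
        return 200, "alive"

    def readiness(self) -> Tuple[int, str]:
        if self.shutting_down:
            return 503, "draining"
        if not self.warmup_done:
            return 503, "warming"
        failed = sorted(name for name, check in self.local_checks.items()
                        if not check())
        if failed:
            return 503, "not ready: " + ",".join(failed)
        return 200, "ready"


status = ServingStatus(local_checks={"db_pool": lambda: True})
print(status.liveness(), status.readiness())   # alive, but still warming
status.warmup_done = True
print(status.readiness())                       # ready for traffic
```

A web framework would expose these as two separate endpoints; the balancer probes readiness, while the process supervisor watches liveness.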
Draining should be designed as an application behavior, not only an infrastructure setting. When a target is removed from rotation, new requests should stop, but existing work should have a bounded chance to finish. That means request deadlines, idempotency keys, retry-safe handlers, and shutdown hooks that stop accepting work before terminating the process.
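On the application side, draining can be sketched as a gate that stops admitting work and then waits, with a bound, for in-flight requests to finish. `DrainGate` is a hypothetical helper, not part of any framework, and the timeout value is an assumption that would need to fit inside the platform's own deregistration and termination windows.

```python
import threading
import time


class DrainGate:
    """Tracks in-flight requests and refuses new ones once draining starts."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._accepting = True

    def try_begin(self) -> bool:
        """Call at request start; False means reject (for example with a 503)."""
        with self._cond:
            if not self._accepting:
                return False
            self._in_flight += 1
            return True

    def end(self) -> None:
        """Call when a request finishes, whether it succeeded or failed."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout_s: float) -> bool:
        """Stop admitting work, then wait for in-flight work to finish.

        Returns True if everything finished before the timeout.
        """
        with self._cond:
            self._accepting = False
            return self._cond.wait_for(lambda: self._in_flight == 0,
                                       timeout=timeout_s)


gate = DrainGate()
gate.try_begin()                                   # one request in flight
worker = threading.Thread(target=lambda: (time.sleep(0.05), gate.end()))
worker.start()
print("drained cleanly:", gate.drain(timeout_s=2.0))
worker.join()
```

In a real service, `drain` would typically be called from a shutdown hook such as a SIGTERM handler, after readiness has started reporting failure so the balancer stops sending new requests.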
Retries must be budgeted against the same pool the load balancer is protecting. Retry layers multiply: two client retries combined with a single load balancer retry allow up to six backend attempts for one user request, so a partial outage can turn the retry stack into a load amplifier. Retry policy belongs in the architecture diagram, not in a library default no one reviews.
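The amplification risk and one common mitigation, a retry budget, can be sketched as follows. The function and class names are hypothetical, and the 10 percent budget is an illustrative assumption rather than a recommended value.

```python
def worst_case_attempts(client_retries: int, lb_retries: int) -> int:
    """Backend attempts one user request can cause when every layer
    exhausts its retries. Layers multiply rather than add."""
    return (1 + client_retries) * (1 + lb_retries)


class RetryBudget:
    """Permit retries only while they stay under a percentage of
    first attempts. Integer math avoids float rounding surprises."""

    def __init__(self, percent: int) -> None:
        self.percent = percent
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def try_retry(self) -> bool:
        # Allow the retry only if the ratio stays within budget afterwards.
        if (self.retries + 1) * 100 <= self.percent * self.requests:
            self.retries += 1
            return True
        return False


print(worst_case_attempts(client_retries=2, lb_retries=1))  # 6

budget = RetryBudget(percent=10)
for _ in range(20):
    budget.record_request()
granted = sum(budget.try_retry() for _ in range(5))
print(granted)  # 2: only 10% of 20 requests may be retried
```

A production budget would usually count over a sliding time window rather than the process lifetime; the lifetime counters here only keep the sketch short.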
Finally, observability should expose state transitions, not only request totals. You need to see healthy host count, ejection count, target response codes, load balancer generated errors, backend generated errors, connection age, drain duration, and retry attempts. If those signals are split across five dashboards, incident responders are left to reconstruct the state machine from symptoms.
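Recording transitions as events, rather than only sampling counts, might look like the sketch below. `TransitionLog` and its fields are hypothetical; a real system would emit these to its metrics and logging pipeline instead of keeping them in memory.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transition:
    at_s: float   # timestamp in seconds
    target: str
    old: str
    new: str


class TransitionLog:
    def __init__(self) -> None:
        self.events: List[Transition] = []
        self.counts: Counter = Counter()   # (old, new) -> occurrences

    def record(self, at_s: float, target: str, old: str, new: str) -> None:
        if old == new:
            return  # not a transition, nothing to report
        self.events.append(Transition(at_s, target, old, new))
        self.counts[(old, new)] += 1

    def timeline(self, target: str) -> List[str]:
        """Ordered history for one target, for incident review."""
        return [f"{e.at_s:.0f}s {e.old}->{e.new}"
                for e in self.events if e.target == target]


log = TransitionLog()
log.record(0, "10.0.0.5", "healthy", "suspect")
log.record(10, "10.0.0.5", "suspect", "ejected")
print(log.timeline("10.0.0.5"))
print(log.counts[("suspect", "ejected")])
```

The per-target timeline is the piece that request totals cannot provide: it shows the order in which the balancer changed its mind.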
In Practice
Context. AWS documents a specific fail-open behavior for Application Load Balancer target groups: if all targets fail health checks in all enabled Availability Zones, the load balancer routes requests to all targets regardless of health status, using its configured load balancing algorithm. See the AWS Elastic Load Balancing documentation on target group health checks.
Action. The architectural action is to treat “all targets unhealthy” as a first-class mode. Health checks should not depend on fragile shared dependencies unless removing every target is genuinely safer than serving degraded traffic. Applications should also emit a clear degraded response when dependency failure is known.
Result. The documented result is a changed failure mode: the load balancer may prefer attempting service over returning no service. That can be correct during health-check misconfiguration or probe-path failure, and dangerous when every backend is truly unable to serve.
Learning. Do not assume unhealthy means isolated. In a systemic failure, load balancer behavior often shifts from protecting individual hosts to preserving some chance of availability.
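The difference between the two modes comes down to one branch in target selection. The function below is a simplified illustration of the idea, not AWS's implementation; `routable_targets` and its arguments are hypothetical.

```python
from typing import Dict, List


def routable_targets(health: Dict[str, bool], fail_open: bool) -> List[str]:
    """Return the targets eligible for new requests.

    health maps target name -> last known health verdict.
    """
    healthy = sorted(name for name, ok in health.items() if ok)
    if healthy:
        return healthy            # normal mode: unhealthy targets are isolated
    if fail_open:
        return sorted(health)     # no healthy target: distrust the signal
    return []                     # fail closed: caller returns an error


all_down = {"a": False, "b": False}
print(routable_targets({"a": True, "b": False}, fail_open=True))  # ['a']
print(routable_targets(all_down, fail_open=True))                 # ['a', 'b']
print(routable_targets(all_down, fail_open=False))                # []
```

The second call is the case to plan for: the same health verdicts that isolate one bad target stop isolating anything once every target shares them.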
Context. Google’s SRE material on load balancing in the datacenter describes load balancing as a capacity and overload-control problem, not merely a request distribution problem. It discusses health checking, backend overload, and algorithms that avoid sending additional traffic where capacity is already constrained.
Action. The architectural action is to feed the balancer signals that approximate serving capacity, not just binary process health. Concurrency, queue depth, latency, and overload responses can be better indicators than “port is open.”
Result. The documented pattern is that load balancing becomes part of overload prevention. It steers demand away from constrained backends before total failure, but it requires trustworthy feedback from the serving systems.
Learning. A load balancer cannot invent capacity. It can only allocate demand based on the signals it receives.
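A capacity-aware choice can be sketched by ranking backends on reported utilization instead of binary health. This is one simple policy chosen for illustration, not the algorithm from the SRE material; the `BackendLoad` fields are assumptions about what a backend might report.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BackendLoad:
    name: str
    in_flight: int       # requests currently being served
    max_in_flight: int   # concurrency the backend reports it can sustain (> 0)

    @property
    def utilization(self) -> float:
        return self.in_flight / self.max_in_flight


def pick_least_utilized(backends: Iterable[BackendLoad]) -> Optional[BackendLoad]:
    """Choose the backend with the most spare capacity.

    Returns None when every backend is saturated, so the caller can
    shed load instead of piling more work onto full servers.
    """
    candidates = [b for b in backends if b.in_flight < b.max_in_flight]
    if not candidates:
        return None
    # Ties break on name so the choice is deterministic.
    return min(candidates, key=lambda b: (b.utilization, b.name))


pool = [
    BackendLoad("a", in_flight=8, max_in_flight=10),
    BackendLoad("b", in_flight=10, max_in_flight=40),
    BackendLoad("c", in_flight=5, max_in_flight=5),
]
print(pick_least_utilized(pool).name)   # b: 25% utilized
```

Backend "c" passes any port-level health check yet is excluded, which is the point: the decision is only as good as the load figures the backends report.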
Context. Envoy documents outlier detection as a mechanism for detecting hosts behaving unlike others and ejecting them from the healthy load balancing set, with caveats around panic scenarios and active health checks that do not validate real data-plane behavior.
Action. The architectural action is to distinguish active health checks from passive traffic evidence. If live requests fail while active probes pass, passive outlier detection can protect users faster than probe-only health.
Result. The documented result is adaptive ejection based on observed behavior. It improves resilience to partial backend failure, but it introduces more state, timers, and re-entry behavior to understand.
Learning. More intelligent load balancing increases the need for operational literacy. The system is safer only if engineers know when and why it ejects, restores, or panics.
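Passive ejection can be sketched as a consecutive-failure counter with a growing ejection timer and a cap on how much of the pool may be removed. This is loosely modeled on the kind of behavior the Envoy documentation describes, but the class, parameter names, and default values are hypothetical and much simpler than a real implementation.

```python
from typing import Dict, List


class OutlierDetector:
    def __init__(self, hosts: List[str], consecutive_failures: int = 5,
                 base_ejection_s: float = 30.0,
                 max_ejection_fraction: float = 0.5) -> None:
        self.consecutive_failures = consecutive_failures
        self.base_ejection_s = base_ejection_s
        # Never eject more than this many hosts at once (at least one).
        self.max_ejected = max(1, int(len(hosts) * max_ejection_fraction))
        self._streak: Dict[str, int] = {h: 0 for h in hosts}
        self._times_ejected: Dict[str, int] = {h: 0 for h in hosts}
        self._ejected_until: Dict[str, float] = {}

    def _expire(self, now_s: float) -> None:
        for host, until in list(self._ejected_until.items()):
            if now_s >= until:
                del self._ejected_until[host]

    def record(self, host: str, ok: bool, now_s: float) -> None:
        """Feed the outcome of one live request to `host`."""
        if ok:
            self._streak[host] = 0
            return
        self._streak[host] += 1
        if self._streak[host] < self.consecutive_failures:
            return
        self._expire(now_s)
        if host in self._ejected_until:
            return
        if len(self._ejected_until) >= self.max_ejected:
            return  # protect remaining capacity; keep the host in rotation
        self._times_ejected[host] += 1
        # Repeat offenders stay out longer each time.
        self._ejected_until[host] = (
            now_s + self.base_ejection_s * self._times_ejected[host])
        self._streak[host] = 0

    def available(self, now_s: float) -> List[str]:
        self._expire(now_s)
        return sorted(h for h in self._streak if h not in self._ejected_until)
```

Even this small version carries three pieces of hidden state per host (failure streak, ejection count, re-entry time), which is the operational cost the paragraph above refers to.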
Where It Breaks
| Design choice | What it protects | Where it fails |
|---|---|---|
| Simple health check | Removes crashed processes | Misses broken application paths |
| Deep dependency health check | Avoids serving known bad requests | Can evacuate the fleet during dependency incidents |
| Aggressive ejection | Reduces user-visible errors quickly | Can shrink capacity during transient spikes |
| Slow ejection | Avoids flapping | Sends traffic to bad targets longer |
| Fail closed | Prevents known-bad backends from serving | Turns probe failure into total outage |
| Fail open | Preserves a chance of service | Sends traffic to unhealthy targets |
| Sticky sessions | Preserves cache and session locality | Concentrates failure on unlucky clients |
| Client retries | Masks isolated failures | Amplifies load during partial outages |
| Connection draining | Protects in-flight work | Extends deploy and rollback windows |
The hardest production incidents happen when several of these choices interact. A deploy adds cold targets. Slow start is missing. Latency rises. Clients retry. Passive detection ejects a few hosts. Remaining hosts take more load. Health checks begin timing out. The balancer enters a different mode. By the time the application team looks at logs, the visible error is a generic gateway failure, but the root cause is a state transition cascade.
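The capacity side of such a cascade can be illustrated with a toy model: total demand stays constant, and whenever the remaining hosts are pushed past their per-host capacity, one more host drops out. The function and all numbers are hypothetical, and the model ignores retries, which would make the real cascade faster.

```python
from typing import List


def cascade(total_rps: int, hosts: int, capacity_rps: int,
            initial_ejections: int) -> List[int]:
    """Return the number of hosts in rotation after each step.

    A step removes one host whenever demand exceeds the combined
    capacity of the hosts still in rotation.
    """
    remaining = hosts - initial_ejections
    history = [remaining]
    while remaining > 0 and total_rps > remaining * capacity_rps:
        remaining -= 1
        history.append(remaining)
    return history


# 10 hosts, 100 rps each, 900 rps of demand: 10% headroom.
print(cascade(900, 10, 100, initial_ejections=1))  # [9] stable
print(cascade(900, 10, 100, initial_ejections=2))  # [8, 7, ..., 0] collapse
```

With 10 percent headroom, losing one host is survivable and losing two is not, which is why ejection limits and spare capacity have to be sized together.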
What to Do Next
- Problem: Treating the load balancer as stateless hides the real failure modes. Write down the backend states your platform supports (unknown, warming, healthy, suspect, draining, ejected) and whether it fails open or fails closed when no target is healthy.
- Solution: Design health, readiness, retries, and draining as one serving contract. The application should know when it is ready, when it is degraded, and when it must stop accepting new work.
- Proof: Test the state machine directly. Kill one target, break the health endpoint, break the main request path while leaving health green, make every target unhealthy, and run a deploy while long requests are active.
- Action: Add dashboards and alerts around transitions, not just traffic volume. Healthy target count, ejection events, retry rate, load balancer errors, backend errors, and drain duration should tell one coherent story during an incident.