Terraform plan review is not a ritual for approving syntax; it is the last cheap place to catch a production architecture mistake before an API turns intent into infrastructure.

Situation

Infrastructure review used to happen in design documents, change tickets, and console screenshots. Terraform moved much of that decision-making into code, which improved repeatability but also changed the review surface. The pull request no longer shows the full operational consequence. The real artifact is the plan: the proposed state transition between what exists and what will exist after apply.

That shift matters because infrastructure changes are rarely isolated. A one-line variable change can replace a load balancer, widen a security group, rotate a database, delete an IAM binding, or change the blast radius of a deployment pipeline. Senior engineers know that Terraform is not merely declaring resources. It is coordinating cloud APIs, provider behavior, state history, dependency ordering, and organizational policy.

The practical question is not “does this plan look reasonable?” The question is sharper: “what failure mode becomes possible if this plan is applied exactly as shown?”

The Problem

Most teams review Terraform the way they review application code. They check naming, formatting, module usage, and whether the change matches the ticket. That catches some mistakes, but it misses the hardest ones.

The plan may say “forces replacement,” but the reviewer must know whether the resource being replaced is a harmless stateless node or a customer-facing endpoint. The plan may show a security group rule changing from one CIDR range to another, but the reviewer must infer whether this turns a private control plane into a public surface. The plan may show what looks like a cosmetic change, such as a tag update, but provider or cloud-side behavior can still turn it into a resource recreation, for example when the provider treats that attribute as immutable for the resource type.

This creates a review gap. Terraform is deterministic only inside its model. The cloud provider is not a pure function. APIs have eventual consistency, quotas, mutable defaults, regional behaviors, and constraints Terraform cannot fully encode. State can drift. Imported resources can be incomplete. Modules can hide risky defaults. CI can validate syntax while missing the operational consequence.

So the core question becomes: what should a senior engineer inspect in a Terraform plan before trusting automation to apply it?

The Senior Review Loop

Senior plan review works best as a layered control loop. The reviewer starts with intent, then checks blast radius, data safety, identity, network exposure, state behavior, and rollout mechanics. Policy automation should remove obvious mistakes, but it cannot replace architectural judgment.

flowchart TD
  A[Pull request — infrastructure intent] --> B[Terraform plan — proposed state delta]
  B --> C[Blast radius review — resources changed]
  C --> D[Data safety review — destroy and replacement]
  D --> E[Identity review — roles and permissions]
  E --> F[Network review — ingress and egress]
  F --> G[State review — drift and imports]
  G --> H[Policy review — automated guardrails]
  H --> I[Apply decision — approve or redesign]

The first thing to inspect is destructive change. Any destroy, replace, or “forces replacement” annotation deserves a pause. The key question is whether the resource is disposable, replicated, backed up, or externally referenced. Replacing an instance in an autoscaling group is different from replacing a database subnet group or a DNS zone. Terraform will describe the operation, but it will not rank the business consequence.
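As a minimal sketch of that first pass, the Python below lists every resource a plan would destroy or replace. It assumes the plan has been exported with `terraform show -json` and parsed into a dictionary, and it relies on the `resource_changes` and `change.actions` fields of that JSON representation. The sample plan and resource addresses are invented for illustration.

```python
def destructive_changes(plan: dict) -> list:
    """Return (address, kind) pairs for resources the plan destroys or replaces.

    `plan` is assumed to be the parsed output of `terraform show -json <planfile>`.
    """
    findings = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" not in actions:
            continue  # create, update, read, and no-op leave the object in place
        # A replacement carries both "delete" and "create", in either order.
        kind = "replace" if "create" in actions else "destroy"
        findings.append((rc["address"], kind))
    return findings


# Hand-written sample shaped like the JSON plan representation (illustrative only).
sample_plan = {
    "resource_changes": [
        {"address": "aws_db_subnet_group.main", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_instance.worker", "change": {"actions": ["update"]}},
        {"address": "aws_route53_zone.public", "change": {"actions": ["delete"]}},
    ]
}

for address, kind in destructive_changes(sample_plan):
    print(f"{kind:8} {address}")
```

The script only surfaces candidates. Deciding whether a listed resource is disposable, replicated, or backed up remains the reviewer's job.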

The second thing is identity. IAM, service accounts, role bindings, and trust policies often look verbose, which makes dangerous changes easy to hide. Senior reviewers look for privilege expansion, wildcard actions, cross-account trust, broad principals, and policies attached to automation identities. The highest-risk identity changes are not always the largest diffs. A small trust-policy change can turn a narrow deploy role into a general-purpose escalation path.
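One way to keep verbose policy documents from hiding a wildcard is to scan them mechanically before reading them. The sketch below assumes AWS-style IAM policy documents stored as JSON strings in the `policy` or `assume_role_policy` attributes of a resource's planned `after` value. Other providers model identity differently, and the function names are hypothetical.

```python
import json


def _as_list(value):
    """IAM accepts a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def risky_statements(policy_json: str) -> list:
    """Return findings for Allow statements with wildcard actions or principals."""
    try:
        doc = json.loads(policy_json)
    except json.JSONDecodeError:
        return []  # not a JSON policy document; nothing to inspect
    if not isinstance(doc, dict):
        return []
    findings = []
    for stmt in _as_list(doc.get("Statement")):
        if not isinstance(stmt, dict) or stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action")):
            if isinstance(action, str) and (action == "*" or action.endswith(":*")):
                findings.append(f"wildcard action: {action}")
        principal = stmt.get("Principal")
        if principal == "*":
            findings.append("wildcard principal: *")
        elif isinstance(principal, dict):
            for kind, values in principal.items():
                if "*" in _as_list(values):
                    findings.append(f"wildcard principal: {kind}=*")
    return findings


def identity_findings(plan: dict) -> dict:
    """Map resource address to findings, based on the planned `after` values."""
    out = {}
    for rc in plan.get("resource_changes", []):
        after = rc.get("change", {}).get("after") or {}
        # Attribute names are an assumption based on common AWS IAM resources.
        for attr in ("policy", "assume_role_policy"):
            raw = after.get(attr)
            if isinstance(raw, str):
                found = risky_statements(raw)
                if found:
                    out.setdefault(rc["address"], []).extend(found)
    return out
```

A check like this catches only the obvious patterns. A narrowly worded trust policy that names the wrong account passes it cleanly, which is why the small diffs still need a human reader.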

The third thing is network exposure. Look for CIDR changes, public IP assignment, route table changes, load balancer listener changes, security group ingress, firewall egress, private endpoint removal, and DNS changes. A good review asks whether the plan changes who can reach the system, what the system can reach, and whether that path bypasses an existing control.

The fourth thing is state and drift. If the plan contains unexpected changes, the reviewer should ask whether reality changed outside Terraform, whether the provider schema changed, whether a module default changed, or whether state was imported incorrectly. A resource the pull request never touched, and which should therefore be a no-op, showing up as a change is a signal. It often means Terraform is no longer just applying the proposed pull request; it is reconciling accumulated environmental drift.
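A simple way to surface these signals is to compare the plan against what the pull request claims to touch. The sketch below assumes the reviewer supplies a list of expected address prefixes (a hypothetical convention, not a Terraform feature) and that the plan comes from `terraform show -json`. It also reads the `resource_drift` section, which recent Terraform versions include when they detect changes made outside Terraform; older versions may omit it.

```python
def unexpected_changes(plan: dict, expected_prefixes: list) -> list:
    """Return addresses with real planned changes outside the expected prefixes."""
    unexpected = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if not actions or actions in (["no-op"], ["read"]):
            continue  # nothing will be modified for this resource
        if not any(rc["address"].startswith(p) for p in expected_prefixes):
            unexpected.append(rc["address"])
    return unexpected


def drifted_addresses(plan: dict) -> list:
    """Return addresses Terraform reports as changed outside of Terraform."""
    return [rd["address"] for rd in plan.get("resource_drift", [])]


# Illustrative plan: the PR is supposed to touch only module.api.
sample_plan = {
    "resource_changes": [
        {"address": "module.api.aws_lb.front", "change": {"actions": ["update"]}},
        {"address": "module.api.aws_lb_listener.https", "change": {"actions": ["no-op"]}},
        {"address": "module.db.aws_db_parameter_group.main", "change": {"actions": ["update"]}},
    ],
    "resource_drift": [
        {"address": "module.db.aws_db_parameter_group.main", "change": {"actions": ["update"]}},
    ],
}

print("unexpected:", unexpected_changes(sample_plan, ["module.api."]))
print("drifted:   ", drifted_addresses(sample_plan))
```

When the same address appears in both lists, as it does in the sample, the plan is probably reverting a manual change rather than implementing the pull request, and the review should establish who made that change and why.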

The fifth thing is rollout behavior. Some plans are correct but unsafe to apply all at once. Changes to databases, DNS, certificates, queues, and shared networking often need sequencing. Senior engineers check whether the plan can be applied atomically, whether a two-phase migration is needed, and whether rollback is actually possible. “Terraform can roll back” is often false. Terraform can apply another desired state; it cannot necessarily restore deleted data, reused names, or external side effects.

In Practice

Context: Terraform’s own plan model separates review from apply by producing an execution plan before changing real infrastructure. HashiCorp documents this as the point where Terraform compares configuration, prior state, and remote objects to decide proposed actions.

Action: Treat that plan as the review artifact, not as a formality. A senior reviewer reads the action symbols first: create (+), update in place (~), destroy (-), and replace (-/+, or +/- when the new object is created before the old one is destroyed). Then they trace the resources with the highest operational consequence.

Result: The review becomes risk-ranked instead of line-ranked. A five-line IAM change can receive more scrutiny than a large refactor that only renames local variables.

Learning: The plan is a state transition document. Review it the way you would review a production migration.
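Risk-ranked review can be approximated with a small scoring pass over the plan. The sketch below maps each `change.actions` list from the `terraform show -json` output to the symbol Terraform prints in its human-readable plan, then multiplies an action weight by a resource-type weight. Every weight and type prefix here is a made-up starting point that a team would need to tune for its own estate.

```python
# Action lists from the JSON plan mapped to the symbols in the human-readable plan.
SYMBOLS = {
    ("create",): "+",
    ("update",): "~",
    ("delete",): "-",
    ("delete", "create"): "-/+",
    ("create", "delete"): "+/-",
    ("read",): "<=",
}

# Hypothetical weights: destroying outranks replacing, which outranks updating.
ACTION_WEIGHT = {"<=": 0, "+": 1, "~": 2, "+/-": 3, "-/+": 4, "-": 5}

# Hypothetical type prefixes for resources with high operational consequence.
TYPE_WEIGHT = [
    ("aws_iam_", 5),
    ("aws_db_", 5),
    ("aws_route53_", 4),
    ("aws_security_group", 4),
]


def rank_changes(plan: dict) -> list:
    """Return (score, symbol, address) tuples, highest score first."""
    rows = []
    for rc in plan.get("resource_changes", []):
        symbol = SYMBOLS.get(tuple(rc.get("change", {}).get("actions", [])))
        if symbol is None:
            continue  # no-op or an action list this sketch does not know
        type_weight = next((w for prefix, w in TYPE_WEIGHT if rc["type"].startswith(prefix)), 1)
        rows.append((ACTION_WEIGHT[symbol] * type_weight, symbol, rc["address"]))
    return sorted(rows, key=lambda row: (-row[0], row[2]))
```

With these weights an in-place update to an IAM role scores higher than the creation of several ordinary instances, which matches the idea that a small identity change can deserve more scrutiny than a large but harmless diff.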

Context: Policy-as-code systems such as HashiCorp Sentinel and Open Policy Agent are commonly used to block classes of infrastructure changes before apply. The documented pattern is to encode organizational constraints, such as disallowing public storage buckets or requiring tags.

Action: Use policy checks for invariants that should not depend on reviewer memory. Examples include prohibiting public object storage, requiring encryption, restricting allowed regions, and blocking privileged wildcard IAM patterns.

Result: Human review moves up the stack. Reviewers spend less time catching known forbidden states and more time evaluating architecture, dependency ordering, and exceptions.

Learning: Automated policy is strongest when it blocks repeatable mistakes. It is weakest when the question requires context, such as whether replacing a resource is acceptable during a migration window.
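To make the idea of an invariant concrete, here is a Python stand-in for the kind of rule a team would normally write in Sentinel or OPA's Rego. It is not either tool's API; it simply evaluates two invariants against the `terraform show -json` plan format. The required tag names are hypothetical, and the `acl` attribute check assumes AWS-style canned ACL values.

```python
REQUIRED_TAGS = {"owner", "cost-center"}  # hypothetical organizational requirement
PUBLIC_ACLS = {"public-read", "public-read-write"}  # AWS canned ACLs that allow public access


def violations(plan: dict) -> list:
    """Return one message per invariant violated by the planned `after` values."""
    out = []
    for rc in plan.get("resource_changes", []):
        after = rc.get("change", {}).get("after")
        if not after:
            continue  # resource is being destroyed; no resulting state to check
        if after.get("acl") in PUBLIC_ACLS:
            out.append(f"{rc['address']}: public ACL {after['acl']!r}")
        if "tags" in after:  # only check resource types that expose tags
            missing = REQUIRED_TAGS - set(after["tags"] or {})
            if missing:
                out.append(f"{rc['address']}: missing tags {sorted(missing)}")
    return out
```

A pipeline would fail the run when the returned list is non-empty. Rules of this kind need no context to evaluate, which is exactly what makes them suitable for automation and unsuitable as a substitute for review.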

Context: Google’s Site Reliability Engineering guidance emphasizes risk reduction through automation, progressive rollout, and operational review of change. The documented pattern is that safe change management depends on understanding blast radius and recovery, not merely executing an approved command.

Action: Apply that same lens to Terraform. Before approval, identify the impacted service, the recovery path, the owner watching the apply, and the signal that would prove the change is healthy.

Result: Terraform review becomes connected to operations. The reviewer is no longer approving an isolated diff; they are approving a change with monitoring, ownership, and rollback assumptions.

Learning: Infrastructure automation does not remove change risk. It concentrates risk into fewer, faster, more repeatable workflows, which makes review quality more important.
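The four items named in the action above can be captured as a small record that blocks approval until each one is filled in. This is a sketch of one possible convention, with hypothetical class and field names, and not part of Terraform or any review tool.

```python
from dataclasses import dataclass, fields


@dataclass
class ApplyReview:
    """What a reviewer writes down before approving an apply (hypothetical fields)."""
    impacted_service: str = ""
    recovery_path: str = ""
    apply_owner: str = ""
    health_signal: str = ""

    def missing(self) -> list:
        """Return the names of fields that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready(self) -> bool:
        return not self.missing()


review = ApplyReview(impacted_service="checkout-api", apply_owner="on-call engineer")
print(review.ready(), review.missing())
```

The value lies less in the code than in forcing the answers to exist before the apply starts, when producing them is still cheap.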

Where It Breaks

| Failure mode | What the plan shows | What senior reviewers ask |
| --- | --- | --- |
| Hidden replacement | “forces replacement” on a resource | Is this resource disposable, replicated, and safe to recreate now? |
| Privilege expansion | IAM policy or binding update | Does this grant broader action, resource, or trust than before? |
| Public exposure | Firewall, route, listener, or CIDR change | Who can reach this system after apply? |
| Drift reconciliation | Unexpected update unrelated to the PR | Did something change outside Terraform or inside the provider? |
| Unsafe sequencing | Many dependent resources change together | Should this be split into phases with verification between applies? |
| Weak rollback | Destroy or rename of durable resource | What exactly restores service if apply succeeds but behavior fails? |
| Module opacity | Small module version or variable change | What resources does the module actually change underneath? |

The hardest reviews are the ones where the plan is technically correct but operationally premature. Terraform may be doing exactly what the configuration requested. That does not mean the organization is ready for the consequence.

What to Do Next

  • Problem: Terraform reviews often focus on code style while the real risk lives in the generated state transition.
  • Solution: Review the plan by risk category: destructive change, identity, network exposure, state drift, and rollout sequencing.
  • Proof: Use policy-as-code for repeatable guardrails, then reserve senior review for architectural judgment and operational consequence.
  • Action: Before approving the next plan, write down the highest-risk resource change, the expected blast radius, the verification signal, and the rollback path.