Agent-to-Agent Review Loops

One agent should not be both author, reviewer, risk assessor, and release manager. Human engineering organizations separate duties because each role sees different risks. The author optimizes for implementation. The reviewer looks for correctness. Security checks access boundaries. Operations checks rollback and observability.

Situation

Human engineering organizations separate duties because each role sees different risks. The author optimizes for implementation. The reviewer looks for correctness. Security checks access boundaries. Operations checks rollback and observability.

The pattern matters for database, cloud, and platform teams because agents do not operate in a vacuum. They inherit repository rules, tool permissions, deployment workflows, incident history, and the quality of the evidence available to them.

Operating layer	Default approach	Better alternative
Context	Rely on a long prompt or chat history	Give the agent task-specific evidence and rules
Tooling	Expose broad tools and inspect later	Expose narrow tools with clear approval boundaries
Verification	Read the final answer	Check the artifact, trace, and final state

The Problem

A single agent loop compresses all those roles into one context window. It may generate a migration and then accept its own reasoning about why the migration is safe. That is not review; it is self-approval.

The practical question is not whether an agent can produce a convincing response. The question is whether the engineering system around that response makes the work observable, reversible, and reviewable.

Failure point	What breaks	Why it matters
Weak boundary	Agent authority is broader than the task	A diagnostic run can become an unsafe change
Missing evidence	The agent cannot cite the state it used	Review becomes opinion instead of verification
No lifecycle	The workflow ends at a message	Ownership, audit, cleanup, and rollback disappear

Specialized Agent Review

Use specialized review agents with narrow prompts and evidence requirements: locking reviewer, rollback reviewer, Terraform reviewer, observability reviewer, and security reviewer.

flowchart TD
    A[task request — bounded intent] --> B[specialized agent review — controls]
    B --> C[tool execution — evidence collected]
    C --> D[verification — final state checked]
    D --> E[human handoff — audit retained]

Define the operating boundary.
Write down the task class, allowed tools, environment, data class, and approval mode before the agent runs.
Shape the evidence.
Return compact observations instead of raw dumps. The agent should see enough to reason, but not so much that context is wasted.
Require proof of completion.
Completion should be an artifact or state check: a passing test, a reviewed plan, a valid rollback, a trace, or a linked ticket.

The author agent produces an artifact. Review agents read only the artifact, repo policy, and test output. They return findings, not merged changes.

In Practice

Context: OpenAI’s harness engineering discussion points to agent-to-agent review as part of the productivity system around Codex. The database version of that pattern is especially valuable because operational risk is multi-dimensional. Source: OpenAI, Harness engineering.

Action: The author agent produces an artifact. Review agents read only the artifact, repo policy, and test output. They return findings, not merged changes.

Result: Specialization reduces prompt overload and makes findings easier to audit because each reviewer has a limited responsibility.

Learning: Use specialized review agents with narrow prompts and evidence requirements: locking reviewer, rollback reviewer, Terraform reviewer, observability reviewer, and security reviewer. This is a documented pattern or a direct consequence of how the named systems behave, not a fabricated production story.

Where It Breaks

Failure mode	Trigger	Fix
Self-review	Author agent validates its own work	Run independent review agents
Review sprawl	Every reviewer comments on everything	Give each reviewer one risk class
No evidence	Reviewer returns broad advice	Require file, output, or policy citation
Human overload	Five agents produce five essays	Normalize findings into severity, evidence, fix

What to Do Next

Problem: A single agent loop compresses all those roles into one context window. It may generate a migration and then accept its own reasoning about why the migration is safe. That is not review; it is self-approval.
Solution: Use specialized review agents with narrow prompts and evidence requirements: locking reviewer, rollback reviewer, Terraform reviewer, observability reviewer, and security reviewer.
Proof: Specialization reduces prompt overload and makes findings easier to audit because each reviewer has a limited responsibility.
Action: Create two review prompts for database changes: one for lock risk and one for rollback completeness. Run both against the same migration PR.

The teams that get value from agents will not be the teams with the longest prompts. They will be the teams that turn agent work into a controlled engineering workflow.

Situation

The Problem

Specialized Agent Review

In Practice

Where It Breaks

What to Do Next

Rajiv

Related Posts

Agent Productivity Depends on Context Throughput

AI Cost Incident Runbook: What to Do When Monthly Token Spend Suddenly Doubles

Agent Autonomy Ladder: Manual, Confirm, Auto-Approve, Supervised