One agent should not be author, reviewer, risk assessor, and release manager all at once.

Situation

Human engineering organizations separate duties because each role sees different risks. The author optimizes for implementation. The reviewer looks for correctness. Security checks access boundaries. Operations checks rollback and observability.

The pattern matters for database, cloud, and platform teams because agents do not operate in a vacuum. They inherit repository rules, tool permissions, deployment workflows, incident history, and the quality of the evidence available to them.

| Operating layer | Default approach | Better alternative |
| --- | --- | --- |
| Context | Rely on a long prompt or chat history | Give the agent task-specific evidence and rules |
| Tooling | Expose broad tools and inspect later | Expose narrow tools with clear approval boundaries |
| Verification | Read the final answer | Check the artifact, trace, and final state |

The Problem

A single agent loop compresses all those roles into one context window. It may generate a migration and then accept its own reasoning about why the migration is safe. That is not review; it is self-approval.

The practical question is not whether an agent can produce a convincing response. The question is whether the engineering system around that response makes the work observable, reversible, and reviewable.

| Failure point | What breaks | Why it matters |
| --- | --- | --- |
| Weak boundary | Agent authority is broader than the task | A diagnostic run can become an unsafe change |
| Missing evidence | The agent cannot cite the state it used | Review becomes opinion instead of verification |
| No lifecycle | The workflow ends at a message | Ownership, audit, cleanup, and rollback disappear |

Specialized Agent Review

Use specialized review agents with narrow prompts and evidence requirements: locking reviewer, rollback reviewer, Terraform reviewer, observability reviewer, and security reviewer.

flowchart TD
    A["task request: bounded intent"] --> B["specialized agent review: controls"]
    B --> C["tool execution: evidence collected"]
    C --> D["verification: final state checked"]
    D --> E["human handoff: audit retained"]

  1. Define the operating boundary.
    Write down the task class, allowed tools, environment, data class, and approval mode before the agent runs.

  2. Shape the evidence.
    Return compact observations instead of raw dumps. The agent should see enough to reason, but not so much that context is wasted.

  3. Require proof of completion.
    Completion should be an artifact or state check: a passing test, a reviewed plan, a valid rollback, a trace, or a linked ticket.
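
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the class `OperatingBoundary`, the helpers `compact_observation` and `is_complete`, the example field values, and the proof-kind names are all hypothetical and chosen only to mirror the wording of the steps.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OperatingBoundary:
    """Step 1: written down before the agent runs. All field values are illustrative."""
    task_class: str      # e.g. "schema-migration-review"
    environment: str     # e.g. "staging"
    data_class: str      # e.g. "no-customer-data"
    approval_mode: str   # e.g. "human-approves-writes"
    allowed_tools: frozenset[str] = field(default_factory=frozenset)

    def permits(self, tool_name: str) -> bool:
        """A tool call is allowed only if it was listed up front."""
        return tool_name in self.allowed_tools


def compact_observation(raw_output: str, max_lines: int = 20) -> str:
    """Step 2: return a bounded observation instead of a raw dump."""
    lines = raw_output.splitlines()
    if len(lines) <= max_lines:
        return raw_output
    omitted = len(lines) - max_lines
    return "\n".join(lines[:max_lines] + [f"[{omitted} more lines omitted]"])


# Step 3: hypothetical names for the kinds of proof listed in the text.
ACCEPTED_PROOF = {"passing_test", "reviewed_plan", "valid_rollback", "trace", "linked_ticket"}


def is_complete(proofs: dict[str, str]) -> bool:
    """Complete only if an accepted proof kind points at a non-empty reference."""
    return any(
        kind in ACCEPTED_PROOF and bool(reference.strip())
        for kind, reference in proofs.items()
    )
```

With this shape, a run that ends with only a chat message (`is_complete({"message": "done"})`) is not treated as finished, while one that links a passing test or a trace is.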

The author agent produces an artifact. Review agents read only the artifact, repo policy, and test output. They return findings, not merged changes.
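
One way to make that contract concrete is to give reviewers an immutable, read-only input and a return type that can only carry findings. The sketch below assumes hypothetical names (`ReviewInput`, `Finding`, `rollback_reviewer`, `run_reviewers`), and the string match on `DROP COLUMN` is a toy stand-in for a model-backed reviewer, not a real rollback analysis.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ReviewInput:
    """Everything a reviewer may read; frozen so a reviewer cannot alter it."""
    artifact: str      # e.g. migration SQL or a Terraform plan
    repo_policy: str
    test_output: str


@dataclass(frozen=True)
class Finding:
    reviewer: str
    severity: str      # "high", "medium", or "low"
    evidence: str      # the line or output the reviewer cites
    fix: str


# A reviewer maps input to findings. It has no handle for merging or executing.
Reviewer = Callable[[ReviewInput], list[Finding]]


def rollback_reviewer(review_input: ReviewInput) -> list[Finding]:
    """Toy reviewer (assumption): flags statements that drop a column."""
    findings = []
    for number, line in enumerate(review_input.artifact.splitlines(), start=1):
        if "DROP COLUMN" in line.upper():
            findings.append(Finding(
                reviewer="rollback",
                severity="high",
                evidence=f"artifact line {number}: {line.strip()}",
                fix="Document how the dropped data is restored, or split the change.",
            ))
    return findings


def run_reviewers(review_input: ReviewInput, reviewers: list[Reviewer]) -> list[Finding]:
    """Run each reviewer independently against the same input."""
    findings: list[Finding] = []
    for reviewer in reviewers:
        findings.extend(reviewer(review_input))
    return findings
```

Because every reviewer receives the same `ReviewInput` and none sees the author's reasoning, a finding has to stand on what is in the artifact, the policy, or the test output.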

In Practice

Context: OpenAI’s harness engineering discussion points to agent-to-agent review as part of the productivity system around Codex. The database version of that pattern is especially valuable because operational risk is multi-dimensional. Source: OpenAI, Harness engineering.

Action: The author agent produces an artifact. Review agents read only the artifact, repo policy, and test output. They return findings, not merged changes.

Result: Specialization reduces prompt overload and makes findings easier to audit because each reviewer has a limited responsibility.

Learning: Use specialized review agents with narrow prompts and evidence requirements: locking reviewer, rollback reviewer, Terraform reviewer, observability reviewer, and security reviewer. This is a documented pattern or a direct consequence of how the named systems behave, not a fabricated production story.

Where It Breaks

| Failure mode | Trigger | Fix |
| --- | --- | --- |
| Self-review | Author agent validates its own work | Run independent review agents |
| Review sprawl | Every reviewer comments on everything | Give each reviewer one risk class |
| No evidence | Reviewer returns broad advice | Require file, output, or policy citation |
| Human overload | Five agents produce five essays | Normalize findings into severity, evidence, fix |
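
The last two fixes in the table lend themselves to a small post-processing step. The following is a sketch under stated assumptions: reviewers are assumed to return plain dictionaries with `reviewer`, `severity`, `evidence`, and `fix` keys, and `normalize_findings` is a hypothetical helper, not part of any named tool.

```python
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def normalize_findings(raw_findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split reviewer output into accepted and rejected findings.

    A finding is accepted only with a known severity, a non-empty evidence
    citation, and a proposed fix. Broad advice without a citation is rejected.
    """
    accepted, rejected = [], []
    seen = set()
    for raw in raw_findings:
        severity = str(raw.get("severity", "")).strip().lower()
        evidence = str(raw.get("evidence", "")).strip()
        fix = str(raw.get("fix", "")).strip()
        if severity not in SEVERITY_ORDER or not evidence or not fix:
            rejected.append(raw)
            continue
        key = (evidence, fix)
        if key in seen:
            # Two reviewers reported the same issue; keep the first report.
            continue
        seen.add(key)
        accepted.append({
            "reviewer": raw.get("reviewer", "unknown"),
            "severity": severity,
            "evidence": evidence,
            "fix": fix,
        })
    # Stable sort: highest severity first, original order within a severity.
    accepted.sort(key=lambda finding: SEVERITY_ORDER[finding["severity"]])
    return accepted, rejected
```

The human then reads one short list ordered by severity, and the rejected list shows which reviewers are returning advice without evidence.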

What to Do Next

  • Problem: A single agent loop compresses all those roles into one context window. It may generate a migration and then accept its own reasoning about why the migration is safe. That is not review; it is self-approval.
  • Solution: Use specialized review agents with narrow prompts and evidence requirements: locking reviewer, rollback reviewer, Terraform reviewer, observability reviewer, and security reviewer.
  • Proof: Specialization reduces prompt overload and makes findings easier to audit because each reviewer has a limited responsibility.
  • Action: Create two review prompts for database changes: one for lock risk and one for rollback completeness. Run both against the same migration PR.
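
As a starting point for that action, the two prompts and the loop that runs them could look like the sketch below. The prompt wording is only an example, and `call_model(system_prompt, user_content)` is a hypothetical hook standing in for whatever model client a team already uses; no specific vendor API is implied.

```python
from typing import Callable

LOCK_RISK_PROMPT = """You review database migrations for lock risk only.
For each statement that could block reads or writes, cite the exact statement
and explain which lock you expect it to take. Do not comment on anything else.
If you find nothing, reply with: NO FINDINGS."""

ROLLBACK_PROMPT = """You review database migrations for rollback completeness only.
For each forward change, cite the exact statement and say whether a matching
rollback step exists in the artifact. Do not comment on anything else.
If you find nothing, reply with: NO FINDINGS."""

REVIEW_PROMPTS = {"lock_risk": LOCK_RISK_PROMPT, "rollback": ROLLBACK_PROMPT}


def review_migration(
    migration_sql: str,
    call_model: Callable[[str, str], str],
) -> dict[str, str]:
    """Run every review prompt against the same migration text.

    call_model is an assumption: a function taking (system_prompt, user_content)
    and returning the reviewer's reply as text.
    """
    return {
        name: call_model(prompt, migration_sql)
        for name, prompt in REVIEW_PROMPTS.items()
    }
```

Each prompt names one risk class and demands a citation, so the two replies can be compared side by side on the same PR without either reviewer drifting into the other's territory.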

The teams that get value from agents will not be the teams with the longest prompts. They will be the teams that turn agent work into a controlled engineering workflow.