Autonomy is not a switch; it is a ladder with different rungs for read, draft, approve, execute, and recover.

Situation

Teams adopting coding agents quickly discover that full manual control wastes the agent’s value, while full auto-approval is irresponsible for production infrastructure. Database and cloud work makes the boundary sharper because the same agent that reads a schema can also generate a migration or edit IAM.

The pattern matters for database, cloud, and platform teams because agents do not operate in a vacuum. They inherit repository rules, tool permissions, deployment workflows, incident history, and the quality of the evidence available to them.

| Operating layer | Default approach | Better alternative |
| --- | --- | --- |
| Context | Rely on a long prompt or chat history | Give the agent task-specific evidence and rules |
| Tooling | Expose broad tools and inspect later | Expose narrow tools with clear approval boundaries |
| Verification | Read the final answer | Check the artifact, trace, and final state |

The Problem

Without an autonomy model, every task becomes an argument. One engineer lets the agent apply changes freely. Another blocks every shell command. The organization ends up with inconsistent risk handling instead of a repeatable operating model.

The practical question is not whether an agent can produce a convincing response. The question is whether the engineering system around that response makes the work observable, reversible, and reviewable.

| Failure point | What breaks | Why it matters |
| --- | --- | --- |
| Weak boundary | Agent authority is broader than the task | A diagnostic run can become an unsafe change |
| Missing evidence | The agent cannot cite the state it used | Review becomes opinion instead of verification |
| No lifecycle | The workflow ends at a message | Ownership, audit, cleanup, and rollback disappear |

Autonomy Ladder

Use four modes: manual for exploration, confirm for draft changes, auto-approve for reversible low-risk reads, and supervised execution for bounded production actions with audit trails.

flowchart TD
    A[task request — bounded intent] --> B[autonomy ladder — controls]
    B --> C[tool execution — evidence collected]
    C --> D[verification — final state checked]
    D --> E[human handoff — audit retained]

  1. Define the operating boundary.
    Write down the task class, allowed tools, environment, data class, and approval mode before the agent runs.

  2. Shape the evidence.
    Return compact observations instead of raw dumps. The agent should see enough to reason, but not so much that context is wasted.

  3. Require proof of completion.
    Completion should be an artifact or state check: a passing test, a reviewed plan, a valid rollback, a trace, or a linked ticket.
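
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `Boundary`, `compact`, `PROOF_KINDS`, and `is_complete` are hypothetical, and the proof kinds simply mirror the examples listed in step 3.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MANUAL = "manual"
    CONFIRM = "confirm"
    AUTO_APPROVE = "auto-approve"
    SUPERVISED = "supervised"


@dataclass(frozen=True)
class Boundary:
    """Step 1: the operating boundary, written down before the agent runs."""
    task_class: str
    allowed_tools: frozenset[str]
    environment: str
    data_class: str
    mode: Mode

    def permits(self, tool: str) -> bool:
        # Anything not explicitly listed is outside the boundary.
        return tool in self.allowed_tools


def compact(rows: list[dict], limit: int = 3) -> dict:
    """Step 2: return a count, column names, and a small sample, not a raw dump."""
    return {
        "row_count": len(rows),
        "columns": sorted(rows[0]) if rows else [],
        "sample": rows[:limit],
    }


# Step 3: artifact kinds accepted as proof of completion (assumed labels).
PROOF_KINDS = {"passing_test", "reviewed_plan", "valid_rollback", "trace", "linked_ticket"}


def is_complete(artifacts: dict[str, str]) -> bool:
    """A task counts as complete only if a recognized proof kind has a non-empty reference."""
    return any(kind in PROOF_KINDS and ref for kind, ref in artifacts.items())
```

A final chat message alone, such as `{"message": "done"}`, does not satisfy `is_complete`; a reference to a test run or a linked ticket does.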

Map each tool and workflow to a rung. Read-only replica queries may auto-approve. Migration PR creation may require confirm. Production DDL should require supervised execution with explicit rollback.
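
One way to express this mapping is a small policy table keyed by tool and environment. The sketch below is an assumption-laden illustration: the tool names (`query_replica`, `create_migration_pr`, `run_ddl`) and the functions `rung_for` and `may_run` are hypothetical, and falling back to manual for unlisted tools is a design choice of this example rather than something the ladder itself prescribes.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    MANUAL = "manual"
    CONFIRM = "confirm"
    AUTO_APPROVE = "auto-approve"
    SUPERVISED = "supervised"


# Hypothetical tool names; each (tool, environment) pair gets exactly one rung.
POLICY = {
    ("query_replica", "production"): Mode.AUTO_APPROVE,
    ("create_migration_pr", "production"): Mode.CONFIRM,
    ("run_ddl", "production"): Mode.SUPERVISED,
}


def rung_for(tool: str, environment: str) -> Mode:
    # Fail closed: a tool nobody has classified gets the most restrictive rung.
    return POLICY.get((tool, environment), Mode.MANUAL)


def may_run(tool: str, environment: str, *, human_approved: bool = False,
            rollback_plan: Optional[str] = None) -> bool:
    mode = rung_for(tool, environment)
    if mode is Mode.AUTO_APPROVE:
        return True
    if mode is Mode.SUPERVISED:
        # Supervised execution needs both an approval and an explicit rollback.
        return human_approved and bool(rollback_plan)
    # CONFIRM and MANUAL both wait for a human decision in this sketch.
    return human_approved
```

With this table, `may_run("run_ddl", "production", human_approved=True)` is still refused until a rollback plan is supplied.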

In Practice

Context: Anthropic’s autonomy reporting frames agent behavior in terms of how much work proceeds without human intervention and where users interrupt or approve. That framing is useful for infrastructure because approvals should depend on blast radius. Source: Anthropic, Measuring AI agent autonomy in practice.

Action: Map each tool and workflow to a rung. Read-only replica queries may auto-approve. Migration PR creation may require confirm. Production DDL should require supervised execution with explicit rollback.

Result: When the rung is attached to the tool, reviewers can inspect whether the agent had the correct authority before judging the result.
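
Attaching the rung to the tool can be as simple as a decorator that records the authority in force on every call. The following is a rough sketch under stated assumptions: the `rung` decorator, the in-memory `AUDIT_LOG`, and the `describe_table` tool are all hypothetical, and a real system would write to durable storage.

```python
import functools

# In-memory stand-in for an audit store (assumption for illustration only).
AUDIT_LOG: list[dict] = []


def rung(mode: str, environment: str):
    """Attach an autonomy rung to a tool and record it on every call."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = {
                "tool": fn.__name__,
                "mode": mode,
                "environment": environment,
                "status": "started",
            }
            # Log before running, so a crash still leaves a "started" record.
            AUDIT_LOG.append(record)
            result = fn(*args, **kwargs)
            record["status"] = "finished"
            return result
        wrapper.rung = mode  # lets reviewers and tests inspect the rung directly
        return wrapper
    return decorate


@rung("auto-approve", "staging")
def describe_table(name: str) -> dict:
    # Hypothetical read-only tool returning a compact observation.
    return {"table": name, "columns": ["id", "created_at"]}
```

A reviewer can then read the log entry for a call and see which mode and environment applied before looking at the output.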

Learning: Use four modes: manual for exploration, confirm for draft changes, auto-approve for reversible low-risk reads, and supervised execution for bounded production actions with audit trails.

Where It Breaks

| Failure mode | Trigger | Fix |
| --- | --- | --- |
| One-size autonomy | All commands require approval or none do | Assign autonomy by tool and environment |
| Approval fatigue | Humans approve low-risk read commands repeatedly | Auto-approve bounded read-only actions |
| Silent write path | Draft task receives write credentials | Separate read, draft, and execute modes |
| No interrupt path | Long-running task cannot be stopped safely | Require cancellation and state checkpointing |
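
The fix for the missing interrupt path, cancellation plus state checkpointing, can be illustrated with a short loop. This is a simplified sketch: `run_steps` is a hypothetical helper, the checkpoint is a plain dictionary, and cancellation is only honored between steps, so each step is assumed to be small enough to finish safely.

```python
import threading
from typing import Callable


def run_steps(steps: list[Callable[[], None]], cancel: threading.Event,
              checkpoint: dict) -> str:
    """Run steps in order, checking for cancellation before each one.

    The checkpoint records the index of the next step, so a cancelled
    run can be resumed later without repeating finished work.
    """
    start = checkpoint.get("next_step", 0)
    for index in range(start, len(steps)):
        if cancel.is_set():
            return "cancelled"
        steps[index]()
        # Persist progress only after the step has actually finished.
        checkpoint["next_step"] = index + 1
    return "done"
```

Calling `run_steps` again with the same checkpoint after clearing the event resumes from the first unfinished step.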

What to Do Next

  • Problem: Without an autonomy model, every task becomes an argument. One engineer lets the agent apply changes freely. Another blocks every shell command. The organization ends up with inconsistent risk handling instead of a repeatable operating model.
  • Solution: Use four modes: manual for exploration, confirm for draft changes, auto-approve for reversible low-risk reads, and supervised execution for bounded production actions with audit trails.
  • Proof: When the rung is attached to the tool, reviewers can inspect whether the agent had the correct authority before judging the result.
  • Action: Inventory agent tools and label each one manual, confirm, auto-approve, or supervised for dev, staging, and production.
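
The inventory in the last bullet can be checked mechanically. The sketch below assumes the inventory is kept as a nested dictionary of tool name to per-environment label; the function `inventory_gaps` and the example tool names are hypothetical.

```python
ENVIRONMENTS = ("dev", "staging", "production")
LABELS = {"manual", "confirm", "auto-approve", "supervised"}


def inventory_gaps(inventory: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (tool, environment) pairs that are missing a valid label."""
    gaps = []
    for tool, labels in sorted(inventory.items()):
        for env in ENVIRONMENTS:
            if labels.get(env) not in LABELS:
                gaps.append((tool, env))
    return gaps


# Example inventory with hypothetical tool names.
inventory = {
    "query_replica": {"dev": "auto-approve", "staging": "auto-approve", "production": "auto-approve"},
    "run_ddl": {"dev": "confirm", "staging": "confirm"},  # production not yet labeled
}
```

Here `inventory_gaps(inventory)` reports that `run_ddl` has no production label, which is the kind of gap worth closing before the agent is given the tool.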

The teams that get value from agents will not be the teams with the longest prompts. They will be the teams that turn agent work into a controlled engineering workflow.