AI Agents Need a Control Plane, Not More Interfaces

AI agent platforms are converging on one useful primitive: a strong coding model operating inside a governed execution environment. The default approach is fragmented agent interfaces: one chat for coding, another for browser work, another for documents, another for scheduled jobs. The better alternative is an agent control plane: one permissioned runtime for files, tools, browsers, code repositories, and business artifacts.

Situation

The 2024 agent race looks noisy because every vendor is shipping new surfaces: OpenAI Codex, Claude Code, Cursor, OpenClaw, browser use, computer use, schedules, routines, dispatch, remote runs, and workflow-specific applications. Underneath the product sprawl, the architecture is becoming boring in the best possible way.

A coding model is no longer just a code generator. It is a general-purpose knowledge-work engine because code, SQL, spreadsheets, documents, slide decks, test traces, and browser sessions all reduce to structured artifacts plus tool calls.

Dimension	Fragmented agent interfaces	Agent control plane
User experience	Different apps for code, docs, browser, schedules	Task-specific views over one runtime
Permissions	Repeated per tool	Central policy and approval gates
Observability	Scattered transcripts	One audit log across actions
Failure recovery	Manual reconstruction	Replayable job history and artifact diffs
Best fit	Individual experimentation	Production teams and regulated workflows

The Problem

The failure is not that teams have too many chat boxes. The failure is that each chat box becomes a separate execution path with its own credentials, logs, filesystem assumptions, and review model. That is how a harmless “summarize this dashboard” workflow quietly becomes an unreviewed production automation path.

Failure point	What breaks	Why it matters
Filesystem access	Agent edits repo, docs, and generated artifacts without a durable diff model	Incident response cannot prove what changed, when, or why
Browser use	Agent clicks through `admin.internal.example.com` like a human with no replay trace	“It submitted the form” is not an audit strategy
Scheduled jobs	Routines, remote runs, and dispatch execute the same primitive through different paths	Policy drift appears before anyone notices
Model routing	Frontier model handles one task, open model handles another, with no shared contract	Cost drops, but behavior becomes inconsistent
Tool-specific UX	Codex, Claude Code, Cursor, Warp, and internal tools all keep separate context	Engineers spend time reconciling agent state instead of reviewing output

Modern models can infer nuance, fix typos, and handle vague intent better than skeptics expected. The production problem is different: autonomous agents still make expensive assumptions when the system does not define when they must ask for clarification. The question is how to govern agent execution paths so that exploratory work cannot slide into production without review.

Core Concept

The right architecture is an agent control plane: a single job model that routes requests into governed sandboxes, grants scoped tools, captures artifacts, and requires human approval at the boundary where risk changes.

flowchart TD
    User[senior engineer] --> Intake[agent control plane — task intake]
    Intake --> Classifier[classify — code, sql, browser, doc, schedule]
    Classifier --> Policy[RBAC policy and approval rules]
    Policy --> Sandbox[ephemeral workspace — repo checkout]
    Sandbox --> Model[strong coding model]
    Model --> FS[filesystem diff]
    Model --> Browser[browser use or Playwright]
    Model --> SQL[read-only PostgreSQL replica]
    Model --> Docs[docs and spreadsheets]
    FS --> Review[diff and artifact review]
    Browser --> Replay[browser trace and screenshots]
    SQL --> Evidence[query results and explain plans]
    Docs --> Review
    Review --> Approval[human approval gate]
    Replay --> Approval
    Evidence --> Approval
    Approval --> Publish[merge, deploy, or schedule]
    Publish --> Audit[immutable audit log]

Define one job schema for every agent task.

{
  "job_type": "browser_automation",
  "repo": "payments-api",
  "tools": ["filesystem", "browser", "playwright"],
  "approval_required_for": ["submit", "delete", "purchase"],
  "artifact_contract": "diff_plus_trace"
}

Verify: every task produces the same minimum record: prompt, tools granted, artifacts created, approvals requested, and final state.
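
One way to enforce that minimum record is to reject any run that omits a field. The sketch below is illustrative only: the names `JobRecord`, `missing_fields`, and `build_record` are hypothetical, and the field names are simply the five items listed above turned into identifiers.

```python
from dataclasses import dataclass

# The five items the text says every task must record (names are assumed).
REQUIRED_RECORD_FIELDS = (
    "prompt",
    "tools_granted",
    "artifacts",
    "approvals_requested",
    "final_state",
)


@dataclass(frozen=True)
class JobRecord:
    """Minimum per-run record; frozen so it cannot be edited after creation."""
    prompt: str
    tools_granted: tuple[str, ...]
    artifacts: tuple[str, ...]
    approvals_requested: tuple[str, ...]
    final_state: str


def missing_fields(raw: dict) -> list[str]:
    """Return the required keys that are absent from a raw record."""
    return [name for name in REQUIRED_RECORD_FIELDS if name not in raw]


def build_record(raw: dict) -> JobRecord:
    """Build a JobRecord, refusing input that omits a required field."""
    missing = missing_fields(raw)
    if missing:
        raise ValueError(f"job record is missing: {', '.join(missing)}")
    return JobRecord(
        prompt=raw["prompt"],
        tools_granted=tuple(raw["tools_granted"]),
        artifacts=tuple(raw["artifacts"]),
        approvals_requested=tuple(raw["approvals_requested"]),
        final_state=raw["final_state"],
    )
```

Failing loudly on a missing field keeps the check identical for code, browser, SQL, and document tasks.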

Treat browser and computer use as privileged automation.

Native browser control is useful for exploratory debugging. Playwright is better for repeatable continuous integration, meaning automated tests that run on every code change. Agentic browser use belongs between those modes: flexible enough to inspect unknown pages, constrained enough to produce screenshots, traces, and approval pauses.

Verify: any action that mutates data must have a replayable trace and a human approval checkpoint.
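
That rule can be sketched as a small gate in front of every browser action. This is an assumption-laden illustration: `may_execute`, `trace_path`, and `approved_by` are hypothetical names, and the gated action list is borrowed from the `approval_required_for` field of the job schema.

```python
from typing import Optional


def may_execute(
    action: str,
    approval_required_for: set[str],
    trace_path: Optional[str],
    approved_by: Optional[str],
) -> bool:
    """Allow a gated action only with both a trace and a named approver."""
    if action not in approval_required_for:
        # Read-only or otherwise ungated actions pass through.
        return True
    # Gated actions need a recorded trace AND a human approval.
    return bool(trace_path) and bool(approved_by)


gated = {"submit", "delete", "purchase"}
print(may_execute("scroll", gated, None, None))
print(may_execute("submit", gated, "traces/run-1.zip", None))
print(may_execute("submit", gated, "traces/run-1.zip", "reviewer@example.com"))
```

The trace file name and approver address are placeholders. The gate requires both conditions together, so neither a trace nor an approval is sufficient alone.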

Separate interaction layer from execution layer.

Warp, Cursor, Codex, Claude Code, and internal portals can all be front doors. They should not each invent a different security model. The execution layer owns sandboxing, credentials, logging, and rollback.

Verify: the same policy applies whether the task starts from a terminal, browser, chat panel, or scheduled job.
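
A minimal way to make that property hold by construction is to key policy on the task type and treat the entry point as metadata only. The policy table and the `resolve_policy` helper below are hypothetical, written only to show the shape of the idea.

```python
# Hypothetical policy table, keyed by job type and never by entry point.
POLICIES = {
    "browser_automation": {"approval_required_for": ["submit", "delete", "purchase"]},
    "sql_diagnostics": {"approval_required_for": []},
}


def resolve_policy(job_type: str, entry_point: str) -> dict:
    """Look up policy by job type; the entry point is recorded, not consulted."""
    policy = POLICIES[job_type]
    return {"entry_point": entry_point, "job_type": job_type, "policy": policy}


for door in ("terminal", "browser", "chat_panel", "scheduled_job"):
    print(door, resolve_policy("browser_automation", door)["policy"])
```

Because `entry_point` never participates in the lookup, a new front door cannot introduce a new security model by accident.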

Route models by risk, not fashion.

Frontier hosted models should handle ambiguous architecture changes, production debugging, and multi-artifact work. Smaller open models can handle scaffolding, search, formatting, and low-risk refactors. The control plane decides based on task class, data sensitivity, latency, and cost.

Verify: model choice is visible in the audit log and tied to an explicit task policy.
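
As a rough sketch, routing can be a pure function that returns both the chosen tier and the reason, so the decision can be written to the audit log. The task class names, tier labels, and the fallback for unclassified tasks are assumptions made for illustration. Data sensitivity, latency, and cost are left out to keep the example short.

```python
# Task classes taken from the examples in the text; identifiers are assumed.
HIGH_RISK_CLASSES = {"architecture_change", "production_debugging", "multi_artifact"}
LOW_RISK_CLASSES = {"scaffolding", "search", "formatting", "low_risk_refactor"}


def route_model(task_class: str) -> dict:
    """Pick a model tier by task class and return an audit-ready decision."""
    if task_class in LOW_RISK_CLASSES:
        tier, reason = "small_open", "low-risk task class"
    elif task_class in HIGH_RISK_CLASSES:
        tier, reason = "frontier_hosted", "high-risk task class"
    else:
        # Assumption: unknown work goes to the stronger model.
        tier, reason = "frontier_hosted", "unclassified task, defaulting to stronger model"
    return {"task_class": task_class, "model_tier": tier, "reason": reason}


print(route_model("formatting"))
print(route_model("production_debugging"))
```

Returning the reason alongside the tier ties each choice to an explicit rule.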

In Practice

Context: A unified control plane is the pattern that fits agent deployment in shared environments. Once more than one engineer uses autonomous agents against shared infrastructure, the primary operational question stops being “which agent is best” and becomes “who approved this action and what exactly did it change.”

Action: The minimum viable control plane for a small team relies on three invariant components: a job schema (what the agent may read, write, and call per task), an immutable record per run (prompt, tools granted, artifacts produced, approval decisions), and a strict policy on when the agent must ask for clarification before proceeding. SQL diagnostics should be restricted to read-only PostgreSQL replicas and standard views like pg_stat_statements, rather than production write connections. Browser actions on internal admin consoles require a human approval checkpoint before any submit or delete event. Everything else (model routing, sandboxed worktrees, artifact diffs) extends from those constraints.

Result: The first measurable gain is provenance, not speed. Debugging an agent-assisted system change becomes tractable because the immutable job record reliably answers the core operational questions: what the prompt was, which files were modified, which tools were called, and whether a human checkpoint was triggered before production state changed.

Learning: Vertical vendor stacks (e.g., Google AI Studio to Cloud Run, or Vercel’s v0 to production) are excellent when deployment friction is the primary bottleneck. The engineering tradeoff is architectural portability. A modular control plane costs more to build initially, but it ensures that model choice, system observability, and RBAC policy enforcement do not degrade into vendor-specific configuration understood by only one person on the team.
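
Returning to the immutable record per run from the Action step: one common way to approximate immutability in application code is a hash chain, where each entry commits to the one before it. This is a swapped-in technique for illustration, not something the workflow above prescribes, and it makes tampering detectable rather than impossible. The function names and record fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record's content."""
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log: list, record: dict) -> dict:
    """Append a record whose hash depends on everything before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {"prev_hash": prev_hash, "record": record, "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

In a real deployment the log would also need write-once storage. The chain only shows that something changed, not what the original value was.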

Where It Breaks

Failure mode	Trigger	Fix
Audit gaps	Agent has broad filesystem or browser access but only saves chat history	Store immutable job records, diffs, traces, screenshots, and approval decisions
False confidence	Evaluation checks only “task completed”	Add evals for permission adherence, rollback quality, artifact correctness, latency, and cost
Browser flakiness	Agent relies on visual clicking for a stable workflow	Convert repeated paths to Playwright tests with assertions and traces
Cost shock	Frontier models are used for every low-risk edit	Route simple tasks to cheaper hosted or open models with the same output contract
Permission drift	Schedules, routines, and remote jobs use separate configuration	Collapse them into one scheduler with shared policy
Bad assumptions	Agent proceeds when intent is underspecified	Require clarification when confidence is low or mutation risk is high
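
The "false confidence" row suggests evaluating more than task completion. A minimal sketch of one such check, permission adherence, compares the tools a run actually called against the tools it was granted. The function names and the shape of the inputs are assumptions.

```python
def permission_violations(tools_granted: list, tools_called: list) -> list:
    """Return tools that were called without being granted, in first-seen order."""
    granted = set(tools_granted)
    violations = []
    for tool in tools_called:
        if tool not in granted and tool not in violations:
            violations.append(tool)
    return violations


def run_passes_eval(task_completed: bool, tools_granted: list, tools_called: list) -> bool:
    """A run passes only if it finished AND stayed inside its grants."""
    return task_completed and not permission_violations(tools_granted, tools_called)


granted = ["filesystem", "browser"]
called = ["browser", "sql", "sql", "filesystem"]
print(permission_violations(granted, called))
print(run_passes_eval(True, granted, called))
```

A run that completes the task while calling an ungranted tool fails this eval, which is exactly the case a completion-only check would miss.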

What to Do Next

Problem: agent tools are multiplying faster than teams can govern them.
Solution: build one agent control plane for code, files, browser actions, SQL analysis, documents, and scheduled jobs.
Proof: the same review model can cover a code diff, a browser trace, and a generated spreadsheet.
Action: this week, define your internal agent job schema with filesystem scope, network scope, browser domains, credentials, approval gates, logging, rollback, and artifact review.
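
A starting point for that schema could look like the sketch below. It covers only some of the listed scopes. All field names and the `browser_url_allowed` helper are hypothetical, and the matching rule (exact host or subdomain of an allowed domain) is an assumption, not a recommendation from the text.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class JobScope:
    """Per-task scope; a subset of the fields named in the action item."""
    filesystem_roots: tuple[str, ...]
    browser_domains: tuple[str, ...]
    approval_required_for: tuple[str, ...]


def browser_url_allowed(scope: JobScope, url: str) -> bool:
    """Allow a URL only if its host is an allowed domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(
        host == domain or host.endswith("." + domain)
        for domain in scope.browser_domains
    )


scope = JobScope(
    filesystem_roots=("/work/payments-api",),
    browser_domains=("internal.example.com",),
    approval_required_for=("submit", "delete", "purchase"),
)
print(browser_url_allowed(scope, "https://admin.internal.example.com/users"))
print(browser_url_allowed(scope, "https://notinternal.example.com/"))
```

Matching on the dot-prefixed suffix avoids treating `notinternal.example.com` as part of `internal.example.com`.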

Rajiv
