Parallel AI Agents Need an Operating Model
Content reflects the state as of June 2025. AI tooling and model capabilities in this area change frequently.
Parallel coding agents do not fail because the model is too slow; they fail because the repository, permissions, memory, and verification loop were still designed for one human typing in one terminal.
Situation
The default approach is sequential single-agent prompting: one coding agent, one checkout, one context window, one review loop. The alternative is an agent control plane: multiple isolated agents working in parallel, with explicit rules for workspace ownership, shared memory, tool permissions, automated checks, and integration order.
| Mode | What scales | What becomes the bottleneck |
|---|---|---|
| Single agent session | Prompt quality and patience | Human steering time |
| Parallel agents in shared checkout | Nothing useful for long | File conflicts and partial edits |
| Parallel agents with control plane | Independent work streams | Review, merge order, and verification quality |
This is the same shift platform teams already made with CI, feature flags, and deployment systems. Raw execution is cheap; uncontrolled execution is expensive.
The Problem
A coding agent is not just a smarter autocomplete. Once it can edit files, run commands, open pull requests, query logs, and call Model Context Protocol (MCP) servers, it becomes an actor inside the engineering system.
| Failure point | What breaks | Why it matters |
|---|---|---|
| Shared working tree | Two agents edit the same files, generated artifacts churn, test fixes overwrite feature work | Git conflict resolution moves from rare human cleanup to the normal path |
| Unbounded memory files | CLAUDE.md becomes a policy landfill with stale rules, duplicated commands, and contradictory guidance | The agent obeys the loudest instruction, not the most correct one |
| Permission sprawl | Shell, network, secrets, deploy commands, and MCP tools sit behind the same approval habit | One careless approval can turn a coding session into an operational incident |
| Hook loops | PostToolUse formatters and Stop hooks keep chasing green tests without diagnosing root cause | The system can burn time repeatedly repairing symptoms |
| Review collision | Fifteen branches arrive with overlapping abstractions, renamed modules, and incompatible migration order | The bottleneck moves from coding to architectural arbitration |
| Weak verification | Agents run npm test when the real gate is npm run check, Playwright, migration dry runs, or mobile simulators | False confidence ships faster than correct code |
The non-obvious failure is not concurrency itself. Databases, CI systems, and distributed job runners have handled concurrency for decades. The failure is treating an autonomous coding agent like a chat window instead of a worker with identity, scope, state, privileges, and exit criteria.
The core question is simple: what operating model lets agent parallelism increase throughput without turning the repository into a merge queue with opinions?
Build an Agent Control Plane, Not a Prompt Pile
Make the control plane concrete. Consider a small Astro documentation site with this shape:
repo/
  src/content/blog/
  src/content/config.ts
  src/layouts/BaseLayout.astro
  src/pages/blog/index.astro
  src/pages/blog/[...slug].astro
  src/config/site.ts
  public/
  package.json
The request is: improve blog discovery without breaking post rendering. That sounds small, but it crosses content schema, listing UI, page rendering, and build verification. Do not put three agents into the same checkout and ask them to “make it better.” Split the work by ownership.
flowchart TD
Request[improve blog discovery] --> Planner[planning session]
Planner --> Contract[scope and verification contract]
Contract --> Router[agent router]
Router -->|content schema| AgentA[worktree A — metadata agent]
Router -->|listing UI| AgentB[worktree B — search agent]
Router -->|verification| AgentC[worktree C — review agent]
Memory[shared memory — repo rules and commands] --> Planner
Memory --> AgentA
Memory --> AgentB
Memory --> AgentC
Policy[permission policy — shell and tool boundaries] --> AgentA
Policy --> AgentB
Policy --> AgentC
AgentA --> Checks[verification matrix]
AgentB --> Checks
AgentC --> Checks
Checks --> Integrator[integration branch owner]
Integrator --> PR[pull request with evidence]
Use three worktrees and three branches:
| Agent | Branch | Worktree | Owns | Cannot touch |
|---|---|---|---|---|
| Metadata agent | agent/metadata-filter-contract | ../repo-agent-metadata | src/content/config.ts, content frontmatter validation, listing data shape | src/layouts/BaseLayout.astro, visual layout changes |
| Search agent | agent/blog-search-ui | ../repo-agent-search | src/pages/blog/index.astro, client-side search and tag behavior | content schema, Markdown post bodies |
| Review agent | agent/blog-render-verifier | ../repo-agent-review | test plan, rendered page review, Mermaid and TOC regression checks | implementation edits unless explicitly reassigned |
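The assignment table can also be kept as data, so the planner can reject a split before any agent starts. The sketch below checks the three conditions the table implies: no shared branch, no shared worktree, and no overlapping file ownership. The `AgentAssignment` shape is hypothetical and is not a git or Claude Code configuration format. Ownership is simplified to concrete paths, because entries like "listing data shape" are not path-addressable.

```typescript
// Hypothetical assignment record; field names are illustrative and not part of
// any git or Claude Code configuration format.
interface AgentAssignment {
  agent: string;
  branch: string;
  worktree: string;
  owns: string[]; // repo-relative file paths, or directory prefixes ending in "/"
}

const assignments: AgentAssignment[] = [
  {
    agent: "metadata",
    branch: "agent/metadata-filter-contract",
    worktree: "../repo-agent-metadata",
    owns: ["src/content/config.ts"],
  },
  {
    agent: "search",
    branch: "agent/blog-search-ui",
    worktree: "../repo-agent-search",
    owns: ["src/pages/blog/index.astro"],
  },
  {
    agent: "review",
    branch: "agent/blog-render-verifier",
    worktree: "../repo-agent-review",
    owns: [], // no implementation edits unless explicitly reassigned
  },
];

// Two ownership entries overlap when they are equal or one is a directory
// prefix of the other.
function ownsOverlap(a: string, b: string): boolean {
  const asDir = (p: string): string => (p.endsWith("/") ? p : p + "/");
  return a === b || b.startsWith(asDir(a)) || a.startsWith(asDir(b));
}

// Lists every violation: shared branches, shared worktrees, or overlapping
// file ownership. An empty result means the split is clean.
function findAssignmentConflicts(list: AgentAssignment[]): string[] {
  const conflicts: string[] = [];
  list.forEach((x, i) => {
    for (const y of list.slice(i + 1)) {
      if (x.branch === y.branch) {
        conflicts.push(`${x.agent}/${y.agent}: same branch ${x.branch}`);
      }
      if (x.worktree === y.worktree) {
        conflicts.push(`${x.agent}/${y.agent}: same worktree ${x.worktree}`);
      }
      for (const a of x.owns) {
        for (const b of y.owns) {
          if (ownsOverlap(a, b)) conflicts.push(`${x.agent}/${y.agent}: ${a} overlaps ${b}`);
        }
      }
    }
  });
  return conflicts;
}
```

If a fourth agent claimed all of `src/pages/blog/`, this check would report the overlap with the Search agent. The planner would then need to narrow or merge the scopes before work begins.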
The ownership rules are deliberately narrow:
| Rule | Verification |
|---|---|
| One agent owns one branch and one worktree | git branch --show-current matches the assigned branch |
| Work starts only from a clean base | git status --short is empty before assignment |
| Agents may edit only owned files unless the planner expands scope | git diff --name-only main...HEAD stays inside the assigned paths |
| Generated files are not committed unless the repo already tracks them | git status --short shows no unexpected build output |
| Integration happens in a fourth branch owned by a human or integrator agent | agent branches merge into integration/blog-discovery, not into each other |
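The "stays inside the assigned paths" rule is easy to automate. The minimal sketch below takes the text output of `git diff --name-only main...HEAD`, one repo-relative path per line, and returns any changed file the agent does not own. It assumes each owned entry is either an exact file path or a directory prefix ending in "/". The function name is illustrative.

```typescript
// Parses `git diff --name-only main...HEAD` output (one path per line) and
// returns changed files outside the agent's owned paths.
// Assumption: owned entries are exact paths or directory prefixes ending in "/".
function filesOutsideOwnership(diffNameOnly: string, owned: string[]): string[] {
  const changed = diffNameOnly
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return changed.filter(
    (file) => !owned.some((o) => (o.endsWith("/") ? file.startsWith(o) : file === o)),
  );
}

// Example: the Search agent's branch also touched the content schema.
const searchDiff = ["src/pages/blog/index.astro", "src/content/config.ts", ""].join("\n");
const violations = filesOutsideOwnership(searchDiff, ["src/pages/blog/index.astro"]);
console.log(violations.join(", ")); // prints: src/content/config.ts
```

A non-empty result should block the PR, or send the task back to the planner for an explicit scope expansion. It should not be waved through in review.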
The permission policy should be boring and explicit:
| Permission class | Allowed without approval | Requires approval |
|---|---|---|
| Git inspection | git status, git diff, git log, git branch --show-current | branch deletion, reset, force push |
| File edits | assigned source files | shared layouts, lockfiles, generated files, ignored private notes |
| Local commands | npm run check, ASTRO_TELEMETRY_DISABLED=1 npm run build | package installs, dependency upgrades |
| Network | none for this task | external fetches, package registry calls, write-capable MCP tools |
| Secrets and deploys | none | environment files, Cloudflare deploy commands, production data |
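The same table can be written as a default-to-approval classifier. The sketch below is tool-agnostic and hypothetical. Real enforcement belongs in the agent runtime's own reviewed permission settings. The sketch shows the shape of the rule: a short allowlist, prefix matching only for read-only git inspection, and a fallback to human approval for anything that chains or redirects shell commands.

```typescript
type Decision = "allow" | "ask";

// Allowed without approval for this task, taken from the policy table.
// Git inspection may take extra arguments; the rest must match exactly.
const ALLOW_WITH_ARGS = ["git status", "git diff", "git log"];
const ALLOW_EXACT = [
  "git branch --show-current",
  "npm run check",
  "ASTRO_TELEMETRY_DISABLED=1 npm run build",
];

// Shell control characters could chain or redirect commands, so their
// presence forces approval even after an allowed prefix.
const SHELL_META = /[;&|<>`$\n\r]/;

function classifyCommand(raw: string): Decision {
  const trimmed = raw.trim();
  // Check metacharacters before collapsing whitespace, so embedded newlines
  // cannot be normalized away into an "allowed" command.
  if (SHELL_META.test(trimmed)) return "ask";
  const cmd = trimmed.replace(/[ \t]+/g, " ");
  if (ALLOW_EXACT.includes(cmd)) return "allow";
  if (ALLOW_WITH_ARGS.some((p) => cmd === p || cmd.startsWith(p + " "))) return "allow";
  // Default: installs, resets, force pushes, network, and deploys need a human.
  return "ask";
}
```

The important design choice is the default. An unlisted command falls through to "ask", so new capabilities require a reviewed policy change rather than a habit of approving prompts.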
The verification matrix becomes the contract, not an afterthought:
| Check | Metadata agent | Search agent | Review agent | Integrator |
|---|---|---|---|---|
| git diff --name-only main...HEAD matches ownership | Required | Required | Required | Required |
| npm run check | Required | Required | Required | Required |
| ASTRO_TELEMETRY_DISABLED=1 npm run build | Required | Required | Required | Required |
| Blog index search still filters by text and tag | Not required | Required | Required | Required |
| Markdown post page still renders TOC for ## and ### | Not required | Not required | Required | Required |
| Mermaid blocks still target pre[data-language='mermaid'] | Not required | Not required | Required | Required |
| PR notes include commands run and remaining risk | Required | Required | Required | Required |
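To make the matrix a contract rather than a checklist someone remembers, it can travel with the task as data. The sketch below encodes the table above and reports which required checks an agent has not yet claimed as passing. The check IDs are hypothetical labels for the rows. How an agent proves a check passed (logs, CI links) is outside this sketch.

```typescript
type Role = "metadata" | "search" | "review" | "integrator";

// The verification matrix, one entry per row; check IDs are illustrative.
const MATRIX: Record<string, Role[]> = {
  "diff-matches-ownership": ["metadata", "search", "review", "integrator"],
  "npm-run-check": ["metadata", "search", "review", "integrator"],
  "astro-build": ["metadata", "search", "review", "integrator"],
  "index-search-filters-text-and-tag": ["search", "review", "integrator"],
  "post-page-toc-h2-h3": ["review", "integrator"],
  "mermaid-selector-unchanged": ["review", "integrator"],
  "pr-notes-commands-and-risk": ["metadata", "search", "review", "integrator"],
};

// All checks a role must report before its branch is eligible for integration.
function requiredChecks(role: Role): string[] {
  return Object.entries(MATRIX)
    .filter(([, roles]) => roles.includes(role))
    .map(([check]) => check);
}

// Required checks the agent has not reported as passing.
function missingChecks(role: Role, passed: string[]): string[] {
  const done = new Set(passed);
  return requiredChecks(role).filter((check) => !done.has(check));
}
```

For example, a Search agent that only ran `npm run check` and the build would still owe the ownership diff, the search behavior check, and the PR notes.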
This prevents a specific merge failure: the Search agent renames the tag data shape in src/pages/blog/index.astro while the Metadata agent changes the content schema to support the same idea differently. Each branch builds alone. Together, the index page silently drops filtering because the UI expects one field name and the collection query returns another. With branch ownership and an integration branch, the conflict appears as an interface review before it becomes a deployed behavior bug.
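One way to make that interface explicit is a single listing type that both the collection mapping (Metadata agent) and the index filter (Search agent) compile against. If either side renames a field, the type checker fails at integration time, before the page silently returns no matches. The sketch below is hypothetical. `BlogListingItem` and its fields are illustrative, and `RawPost` is an assumed stand-in for whatever shape the content collection actually returns.

```typescript
// Shared contract both branches would import from one module.
interface BlogListingItem {
  slug: string;
  title: string;
  description: string;
  tags: string[];
}

// Assumed stand-in for a content collection entry; not Astro's actual type.
interface RawPost {
  slug: string;
  data: { title: string; description?: string; tags?: string[] };
}

// Producer side: map each post into the shared listing shape.
function toListingItem(post: RawPost): BlogListingItem {
  return {
    slug: post.slug,
    title: post.data.title,
    description: post.data.description ?? "",
    tags: post.data.tags ?? [],
  };
}

// Consumer side: client-side text and tag filtering on the blog index.
function filterListing(
  items: BlogListingItem[],
  query: string,
  tag: string | null,
): BlogListingItem[] {
  const q = query.trim().toLowerCase();
  return items.filter((item) => {
    const matchesText =
      q === "" ||
      item.title.toLowerCase().includes(q) ||
      item.description.toLowerCase().includes(q);
    const matchesTag = tag === null || item.tags.includes(tag);
    return matchesText && matchesTag;
  });
}
```

The contract file itself then becomes the shared interface that the integrator reviews, instead of two independent guesses about field names.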
The control plane is not a large platform. It is the minimum set of rules that makes parallel work reviewable: isolated worktrees, file ownership, permission boundaries, a verification matrix, and one integration owner.
In Practice
Anthropic’s Claude Code documentation treats these primitives as first-class features, not prompt folklore: slash commands provide workflow entry points, and /init initializes a project by generating a CLAUDE.md guide in the repository (Anthropic slash commands).
The documented pattern is that subagents are separate workers: Claude Code states that each subagent has its own context window, custom system prompt, tool access, and independent permissions (Claude Code subagents). That maps directly to the production need to separate implementation, simplification, and verification rather than asking one saturated context window to produce and audit the same change.
Hooks are also documented as lifecycle controls, not decoration. Claude Code documents PostToolUse hooks for actions after edits and broader hook events around tool use, permissions, subagents, and stop conditions (Claude Code hooks). The documented pattern is useful, but the operational risk is plain: a hook can automate formatting or verification, and it can also hide a design problem if it repeatedly patches output without escalating the underlying cause.
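The fix from the failure table, "cap retries, persist logs, and escalate repeated failure signatures", can be sketched independently of any hook API. The code below is not a Claude Code hook. It is a hypothetical loop around a verification attempt. The loop stops early when the same normalized failure appears twice, because that usually means the loop is repairing symptoms. The normalization regexes are assumptions and would need tuning for a real toolchain.

```typescript
interface Attempt {
  ok: boolean;
  output: string;
}

type Outcome =
  | { status: "passed"; attempts: number }
  | { status: "escalate"; attempts: number; reason: string; log: string[] };

// Coarse signature: strip timings and line/column numbers so cosmetic
// differences do not hide a repeating failure. Rules are assumptions.
function failureSignature(output: string): string {
  return output
    .replace(/\d+(\.\d+)?m?s\b/g, "<time>")
    .replace(/:\d+:\d+/g, ":<line>:<col>")
    .trim();
}

// Runs `attempt` at most `maxAttempts` times, keeping every output as a log.
// Escalates to a human when the cap is hit or a failure signature repeats.
function runWithCap(attempt: () => Attempt, maxAttempts = 3): Outcome {
  const log: string[] = [];
  const seen = new Set<string>();
  for (let i = 1; i <= maxAttempts; i++) {
    const result = attempt();
    log.push(result.output);
    if (result.ok) return { status: "passed", attempts: i };
    const sig = failureSignature(result.output);
    if (seen.has(sig)) {
      return { status: "escalate", attempts: i, reason: "repeated failure signature", log };
    }
    seen.add(sig);
  }
  return { status: "escalate", attempts: maxAttempts, reason: "retry cap reached", log };
}
```

Escalation returns the full log on purpose. The person picking up the failure needs the sequence of attempts, not just the last red test.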
Git provides the isolation primitive underneath the workflow. The official git worktree documentation describes multiple working trees attached to the same repository, which lets more than one branch be checked out at a time (Git worktree). The production pattern that follows is branch-per-agent ownership with an explicit integration order, because isolation alone only moves the conflict from the filesystem to the pull request queue.
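Setting up the three worktrees from the earlier table takes one `git worktree add -b <branch> <path> <base>` per agent, all from the same base. The helper below only builds the command strings, so they can be reviewed or pasted into a setup script. It rejects duplicate branches or paths up front. Git itself refuses to check out the same branch in two worktrees without `--force`, but failing early gives a clearer error. The helper and its types are illustrative.

```typescript
interface WorktreePlan {
  branch: string;
  path: string;
}

// Builds the setup commands: a clean-base check, one worktree per agent from
// the same base commit, and a final listing to confirm the layout.
function worktreeCommands(base: string, plans: WorktreePlan[]): string[] {
  const branches = new Set<string>();
  const paths = new Set<string>();
  for (const p of plans) {
    if (branches.has(p.branch)) throw new Error(`duplicate branch: ${p.branch}`);
    if (paths.has(p.path)) throw new Error(`duplicate worktree path: ${p.path}`);
    branches.add(p.branch);
    paths.add(p.path);
  }
  return [
    "git status --short", // must print nothing before assignment
    ...plans.map((p) => `git worktree add -b ${p.branch} ${p.path} ${base}`),
    "git worktree list",
  ];
}

const plans: WorktreePlan[] = [
  { branch: "agent/metadata-filter-contract", path: "../repo-agent-metadata" },
  { branch: "agent/blog-search-ui", path: "../repo-agent-search" },
  { branch: "agent/blog-render-verifier", path: "../repo-agent-review" },
];

worktreeCommands("main", plans).forEach((cmd) => console.log(cmd));
```

The integration branch (integration/blog-discovery) can be created the same way in its own worktree, owned by the human or integrator agent.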
MCP expands the same operating model beyond the repository. The MCP specification defines servers exposing tools, resources, and prompts over JSON-RPC, and its authorization specification separates HTTP authorization from stdio-style environment credentials (MCP base protocol, MCP authorization). The practical consequence is blunt: a log, data warehouse, messaging, or deployment connector is not “context.” It is capability. Capability needs least privilege, auditability, and separate read-only and write-capable paths.
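Least privilege for connectors can start as a small gate in front of tool calls. The sketch below is a local policy model, not part of the MCP specification. The read/write classification, the server names, and the audit entry shape are all assumptions for illustration. Reads pass unless the connector is denied for the session. Writes need an explicit per-tool grant. Every decision is recorded. Separate tokens per connector would still be configured outside this check.

```typescript
// Local policy types; MCP does not define these fields.
type Access = "read" | "write";

interface ToolCall {
  server: string; // e.g. "logs", "warehouse", "chat", "deploy" (illustrative)
  tool: string;
  access: Access;
}

interface SessionPolicy {
  deniedServers: Set<string>; // connectors this session may not touch at all
  allowedWrites: Set<string>; // explicit "server/tool" grants for write-capable tools
}

interface AuditEntry {
  at: string;
  call: ToolCall;
  allowed: boolean;
}

// Read-only by default; writes require an explicit grant; all decisions audited.
function authorize(call: ToolCall, policy: SessionPolicy, audit: AuditEntry[]): boolean {
  const allowed =
    !policy.deniedServers.has(call.server) &&
    (call.access === "read" || policy.allowedWrites.has(`${call.server}/${call.tool}`));
  audit.push({ at: new Date().toISOString(), call, allowed });
  return allowed;
}
```

For this blog task, a reasonable session policy would deny the deploy connector entirely and grant no writes at all. That matches the "none for this task" network row in the permission table.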
Where It Breaks
| Failure mode | Trigger | Fix |
|---|---|---|
| Branch pileup | More than 3 to 5 active agents touching the same subsystem | Assign subsystem ownership and merge in dependency order |
| Stale shared memory | CLAUDE.md grows after every review comment and never shrinks | Review it like code; delete rules that no longer match the repo |
| Hook masking | Formatters and stop hooks modify output until checks pass | Cap retries, persist logs, and escalate repeated failure signatures |
| Permission drift | Engineers approve one-off shell or MCP actions until the exception becomes normal | Move recurring approvals into reviewed settings; keep deploys and secrets manual |
| False verification | Agent reports success after running a narrow test command | Require the repo’s real gate: typecheck, lint, unit tests, build, and domain-specific smoke tests |
| Integration conflict | Parallel agents produce individually valid but mutually incompatible changes | Use an integration branch owner and require architectural review for shared interfaces |
| Expensive model choice | A faster model needs repeated steering and reviewer cleanup | Measure elapsed time and human interventions per accepted PR, not token latency alone |
| MCP blast radius | One connector can read logs, post messages, query data, or trigger workflows | Use separate tokens, scoped environments, audit logs, and read-only defaults |
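"Merge in dependency order" can be made mechanical once the planner declares which branches consume which interfaces. The sketch below uses Kahn's topological sort to order agent branches into the integration branch. It throws on a cycle, because a cycle means two branches depend on each other's interfaces and need architectural review. The dependency edges in the usage note are illustrative, not prescribed by the workflow.

```typescript
// deps maps each branch to the branches it must merge after.
function mergeOrder(deps: Record<string, string[]>): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const [branch, needs] of Object.entries(deps)) {
    if (!indegree.has(branch)) indegree.set(branch, 0);
    for (const need of needs) {
      if (!indegree.has(need)) indegree.set(need, 0);
      indegree.set(branch, (indegree.get(branch) ?? 0) + 1);
      dependents.set(need, [...(dependents.get(need) ?? []), branch]);
    }
  }
  // Start with branches that depend on nothing; sort for a deterministic order.
  const ready = Array.from(indegree.keys()).filter((b) => indegree.get(b) === 0).sort();
  const order: string[] = [];
  while (ready.length > 0) {
    const next = ready.shift()!;
    order.push(next);
    for (const d of dependents.get(next) ?? []) {
      const left = (indegree.get(d) ?? 0) - 1;
      indegree.set(d, left);
      if (left === 0) ready.push(d);
    }
    ready.sort();
  }
  if (order.length !== indegree.size) {
    throw new Error("dependency cycle: shared interface needs architectural review");
  }
  return order;
}
```

With the Search branch depending on the Metadata branch, and the review branch depending on both, the order comes out metadata, then search, then review. Each merge is verified against the integration branch before the next one.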
What to Do Next
- Problem: Parallel agents fail when the engineering system still assumes one actor, one checkout, and one judgment loop.
- Solution: Build a small agent control plane with isolated workspaces, reviewed shared memory, command automation, permission policy, independent verification, and one integration branch owner.
- Proof: Track accepted PRs by task type, model, elapsed time, human interventions, failed checks, review fixes, and integration conflicts; the useful metric is cost per merged change.
- Action: This week, create three git worktrees, assign branch and file ownership before edits begin, write the verification matrix into the task, and require npm run check plus ASTRO_TELEMETRY_DISABLED=1 npm run build before any agent-authored PR.
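The "Proof" metric can be computed from a plain task log. The sketch below assumes a hypothetical `AgentTaskRecord` per attempted task, covering the fields listed in the Proof bullet. It divides all effort, including abandoned attempts, by merged changes only. A fast model that produces unmergeable work therefore shows up as expensive. No weighting between minutes, interventions, and rework is implied. The figures are reported side by side.

```typescript
interface AgentTaskRecord {
  taskType: string;
  model: string;
  merged: boolean;
  elapsedMinutes: number;
  humanInterventions: number;
  failedChecks: number;
  reviewFixes: number;
  integrationConflicts: number;
}

interface Summary {
  group: string;
  attempted: number;
  merged: number;
  minutesPerMerged: number;
  interventionsPerMerged: number;
  reworkPerMerged: number; // failed checks + review fixes + integration conflicts
}

// Groups by model or task type; Infinity marks effort spent with nothing merged.
function summarize(records: AgentTaskRecord[], by: "model" | "taskType"): Summary[] {
  const groups = new Map<string, AgentTaskRecord[]>();
  for (const r of records) groups.set(r[by], [...(groups.get(r[by]) ?? []), r]);
  return Array.from(groups.entries()).map(([group, rs]) => {
    const merged = rs.filter((r) => r.merged).length;
    const sum = (f: (r: AgentTaskRecord) => number): number =>
      rs.reduce((total, r) => total + f(r), 0);
    const perMerged = (total: number): number => (merged > 0 ? total / merged : Infinity);
    return {
      group,
      attempted: rs.length,
      merged,
      minutesPerMerged: perMerged(sum((r) => r.elapsedMinutes)),
      interventionsPerMerged: perMerged(sum((r) => r.humanInterventions)),
      reworkPerMerged: perMerged(
        sum((r) => r.failedChecks + r.reviewFixes + r.integrationConflicts),
      ),
    };
  });
}
```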
The teams that win with coding agents will not be the ones with the longest prompt library; they will be the ones that make autonomy boring, bounded, and observable.