Prompt Architecture Needs Load Boundaries

The default approach is a single always-on instruction pile; the production alternative is a layered instruction architecture where project memory, task skills, explicit commands, plugins, and Model Context Protocol integrations each have a load boundary.

Situation

AI coding assistants have moved from autocomplete into the build path: they read diffs, edit production code, run tests, call tools, and increasingly encode team workflow. That changes prompt files from personal preference into operational configuration.

Claude Code makes this visible through CLAUDE.md, skills, slash-style invocation, plugins, and Model Context Protocol servers. The engineering question is not “where do I put this prompt?” The question is: which instructions must be present on every turn, which should be loaded only when relevant, which require human intent, and which should be distributed as versioned team infrastructure?

Layer	Primary job	Load boundary	Production risk
`CLAUDE.md`	Repository memory and standing rules	Loaded at startup	Context bloat and stale global policy
Skill	Task-specific procedure	Auto-loaded or invoked by name	Bad descriptions cause missed or accidental routing
Command-style invocation	Human-triggered workflow	Explicit user call	Becomes tribal automation if not versioned
Plugin	Distribution package	Installed capability bundle	Silent behavior drift across machines
MCP server	External tools and data	Connected tool surface	Latency, permission, and data boundary failures

The Problem

Instruction systems fail the same way configuration systems fail: the first version is convenient, the fifth version is ambiguous, and the tenth version has undocumented precedence. A prompt layer that starts as “be concise and run tests” becomes a half-remembered operating manual for release policy, coding style, database migrations, security review, and incident response.

Failure point	What breaks	Why it matters
`CLAUDE.md` becomes a wiki	Claude Code loads memory files at startup, so every unrelated task carries old instructions and repository lore	The model spends attention on irrelevant policy before it reads the actual change
Skills are described too broadly	A description like “use for code quality” can match refactors, reviews, bug fixes, and design work	The wrong procedure runs with confidence, which is worse than no procedure
Skill and command names collide	Claude Code docs state that a skill and `.claude/commands/` file with the same name create the same invocation path, with the skill taking precedence	A developer may believe they invoked a command while the skill body controls behavior
Plugin installs are treated as local convenience	Plugins can bundle skills, commands, agents, hooks, and MCP configuration	A plugin update changes coding-agent behavior across a team without the review discipline normally applied to build tooling
MCP tools are always loaded without a reason	Claude Code `alwaysLoad` for MCP requires v2.1.121 or later and can block startup until connect, capped by the standard five-second timeout	Tool availability becomes part of first-prompt latency and reliability, not just a feature toggle

The hard part is not creating more instructions. The hard part is keeping them governable after they become part of the engineering system.

Layered Instruction Control Plane

The right architecture is to treat agent instructions as a control plane with explicit ownership, routing, verification, and rollout. CLAUDE.md should contain only invariants. Skills should contain procedures. Command-style workflows should represent deliberate human operations. Plugins should package reusable capability. MCP servers should expose external state through bounded, permissioned tools.

flowchart TD
    Task[developer asks for code change] --> Memory[CLAUDE.md — standing project rules]
    Memory --> Router[instruction router — classify task]
    Router -->|matches description| Skill[skill — detailed task procedure]
    Router -->|human invokes workflow| Command[command — explicit operation]
    Skill --> Verify[verification recipe — tests and checks]
    Command --> Verify
    Plugin[plugin — packaged team capability] --> Skill
    Plugin --> Command
    MCP[MCP server — external tool boundary] --> Skill
    Verify --> Output[code change with evidence]

Keep CLAUDE.md boring.

Put only rules that are true for almost every task: build commands, schema constraints, forbidden files, deployment model, and non-negotiable repo conventions. For an Astro technical blog, that means rules like “posts live in src/content/blog/,” “never add type frontmatter,” and “run npm run check plus ASTRO_TELEMETRY_DISABLED=1 npm run build before push.”

Verification: Start a clean session and ask for an unrelated task. If more than 10 percent of the visible instruction text is irrelevant to that task, the memory file is carrying skill content.
Move specialized work into skills.

A review procedure, migration checklist, blog editorial rubric, incident summary format, or security audit should be a skill with a narrow description. Claude Code skills use SKILL.md with frontmatter; the directory name becomes the invocation name, and the description helps decide automatic loading, according to the Claude Code skills documentation.

Verification: Create five representative prompts: one that should trigger the skill, three that should not, and one ambiguous prompt. The ambiguous case is the useful one. If it loads the skill accidentally, tighten the description.
Treat command-style workflows as human intent.

Current Claude Code documentation says custom commands have merged into skills: .claude/commands/deploy.md and .claude/skills/deploy/SKILL.md both create /deploy, while skills add supporting files and invocation controls. The conceptual distinction still matters. A deploy review, release note, data backfill, or rollback plan should require explicit invocation because the timing matters.

Verification: The workflow should not activate from vague language like “clean this up.” It should activate when the user calls the named operation or asks for that exact workflow.
Package team standards as plugins.

Plugins are the distribution layer. Claude’s plugin reference says plugins can add skills, commands, agents, hooks, and MCP servers, with plugin skills automatically discovered after installation. That makes plugins closer to internal developer tooling than prompt snippets.

Verification: Pin plugin versions in onboarding docs, keep a changelog, and run the same five-to-ten task evaluation set before and after plugin changes.
Put MCP behind permission and latency budgets.

MCP is where the assistant crosses from prompt behavior into real systems: repositories, calendars, issue trackers, databases, observability, and internal docs. Claude Code can expose MCP prompts as commands and can load tools eagerly with alwaysLoad, but eager loading changes startup behavior.

Verification: Record tool-call count, failed-tool rate, and first-response latency before enabling a new MCP server by default. If the server is not needed in most sessions, keep it discoverable rather than always loaded.

In Practice

The documented pattern from Anthropic is already a control-plane model, even if the file names make it look like convenience scripting.

Publicly documented behavior	Engineering lesson
Claude Code settings describe memory files, settings files, skills, and MCP servers as distinct customization surfaces, with managed settings taking precedence over user and project levels	Enterprise policy belongs in managed configuration, not in every repository’s prompt file
The skills docs define enterprise, personal, project, and plugin skill locations; name conflicts resolve enterprise over personal over project, while plugin skills use a plugin namespace	Skill names are API surface. Treat them like command names in a CLI, not folder labels
The slash command docs state that custom commands have merged into skills while existing `.claude/commands/` files keep working	Governance should be based on invocation semantics and ownership, not the legacy directory path
The MCP docs say prompts exposed by servers appear as commands such as `/mcp__servername__promptname`	External systems can inject operational workflows into the assistant surface, so server naming and prompt design need review
The MCP docs also specify `alwaysLoad` for Claude Code v2.1.121 or later and note startup blocking up to the standard five-second connect timeout	Tool loading is a reliability decision, not just a convenience setting

I have not run Anthropic’s managed Claude Code configuration across Raj’s organization, so the honest claim is narrower: the documented failure mode is instruction drift. If enterprise, personal, project, plugin, and MCP layers all carry overlapping review rules, the assistant can follow a different policy depending on machine, repository, plugin install, and session startup path.

That is familiar engineering terrain. PostgreSQL configuration has postgresql.conf, ALTER SYSTEM, role settings, database settings, and session settings for a reason: operational control depends on knowing which layer wins. Agent instruction stacks need the same discipline. The fact that the payload is Markdown instead of shared_buffers = 8GB does not make it less operational.

A practical evaluation does not need a large benchmark. It needs a fixed task suite and observable routing outcomes. For a repository using CLAUDE.md, skills, commands, plugins, and MCP, run the same prompts before and after an instruction change and record whether the right layer loaded.

Test prompt	Expected layer	Measurement
“Fix the Astro type error in the blog index page”	`CLAUDE.md` only, plus normal code tools	Did a blog-writing skill stay unloaded? Did the assistant run the repo check command?
“Review this draft against the blog rubric”	Blog review skill	Did the skill load? Did it preserve SCQA, CARL, and 4P structure?
“Prepare a release checklist”	Explicit command-style workflow	Did it wait for a named release workflow instead of inferring one from vague language?
“Summarize the latest production incidents from the tracker”	MCP tool, only after permissioned tool use	Did it call the intended MCP server? Did it avoid unrelated local memory as evidence?
“Clean this up”	No specialized workflow	Did broad skill descriptions cause accidental activation?

The useful numbers are simple: misrouted skill count, accidental command activation count, unnecessary MCP call count, and first-response latency. A before-and-after table with those four fields is enough to catch most instruction regressions.

Metric	Before instruction change	After instruction change	Target
Skill misroutes across fixed task suite	Measured count	Measured count	Lower
Accidental command-style workflow activation	Measured count	Measured count	Zero
Unnecessary MCP calls	Measured count	Measured count	Lower
Median first-response latency	Measured time	Measured time	No regression without a reason

The point is not to prove that the assistant is globally better. The point is to prove that a prompt, skill, plugin, or MCP change did not move operational behavior in an unreviewed direction.

Where It Breaks

Failure mode	Trigger	Fix
Global memory overload	`CLAUDE.md` contains review checklists, release steps, coding style essays, and architecture history	Restrict it to invariants; move procedures into named skills
Accidental skill activation	Skill description uses broad phrases like “quality,” “architecture,” or “best practices”	Write descriptions around user intent, input shape, and exclusion cases
Legacy command confusion	Both `.claude/commands/review.md` and `.claude/skills/review/SKILL.md` exist	Consolidate into a skill; keep one canonical invocation name
Plugin drift	Developers install different plugin versions or local forks	Version plugins, review diffs, and publish release notes like internal packages
MCP startup drag	`alwaysLoad: true` is applied to tools needed only in rare workflows	Use lazy discovery unless the first prompt truly depends on the tool
Hidden policy conflict	Enterprise, personal, and project skills define the same behavior differently	Assign ownership by layer: enterprise for policy, project for repo mechanics, personal for preferences
Unverified prompt edits	A small wording change changes model routing or test discipline	Maintain a regression set of representative tasks and compare outputs before rollout
Evaluation theater	The task suite only checks happy paths that should obviously trigger a skill	Include negative and ambiguous prompts; misrouting usually appears in the gray cases
Permission sprawl	MCP servers are added because they are convenient, not because the workflow requires them	Tie each tool surface to a named workflow, owner, and latency budget
Namespace sprawl	Skills, commands, plugin skills, and MCP prompts all expose similar names	Treat invocation names as public interfaces; reserve names, document ownership, and remove duplicates

What to Do Next

Problem: Your coding agent is probably carrying too much always-on instruction and too little explicit routing.
Solution: Split instructions into invariants, skills, deliberate workflows, packaged capabilities, and tool boundaries.
Proof: Run a fixed five-to-ten prompt task suite before and after instruction changes, then compare misroutes, accidental workflow activation, unnecessary MCP calls, and first-response latency.
Action: This week, audit CLAUDE.md, .claude/skills/, .claude/commands/, plugin installs, and MCP configuration, then remove one procedural checklist from global memory and turn it into a tested skill.

The teams that win with coding agents will not have the longest prompt files; they will have the cleanest load boundaries.

Situation

The Problem

Layered Instruction Control Plane

In Practice

Where It Breaks

What to Do Next

Rajiv

Related Posts

Agent Productivity Depends on Context Throughput

AI Cost Incident Runbook: What to Do When Monthly Token Spend Suddenly Doubles

Agent-to-Agent Review Loops