Prompt Architecture Needs Load Boundaries
The default approach is a single always-on instruction pile; the production alternative is a layered instruction architecture where project memory, task skills, explicit commands, plugins, and Model Context Protocol integrations each have a load boundary.
Situation
AI coding assistants have moved from autocomplete into the build path: they read diffs, edit production code, run tests, call tools, and increasingly encode team workflow. That changes prompt files from personal preference into operational configuration.
Claude Code makes this visible through CLAUDE.md, skills, slash-style invocation, plugins, and Model Context Protocol servers. The engineering question is not “where do I put this prompt?” The question is: which instructions must be present on every turn, which should be loaded only when relevant, which require human intent, and which should be distributed as versioned team infrastructure?
| Layer | Primary job | Load boundary | Production risk |
|---|---|---|---|
CLAUDE.md | Repository memory and standing rules | Loaded at startup | Context bloat and stale global policy |
| Skill | Task-specific procedure | Auto-loaded or invoked by name | Bad descriptions cause missed or accidental routing |
| Command-style invocation | Human-triggered workflow | Explicit user call | Becomes tribal automation if not versioned |
| Plugin | Distribution package | Installed capability bundle | Silent behavior drift across machines |
| MCP server | External tools and data | Connected tool surface | Latency, permission, and data boundary failures |
The Problem
Instruction systems fail the same way configuration systems fail: the first version is convenient, the fifth version is ambiguous, and the tenth version has undocumented precedence. A prompt layer that starts as “be concise and run tests” becomes a half-remembered operating manual for release policy, coding style, database migrations, security review, and incident response.
| Failure point | What breaks | Why it matters |
|---|---|---|
CLAUDE.md becomes a wiki | Claude Code loads memory files at startup, so every unrelated task carries old instructions and repository lore | The model spends attention on irrelevant policy before it reads the actual change |
| Skills are described too broadly | A description like “use for code quality” can match refactors, reviews, bug fixes, and design work | The wrong procedure runs with confidence, which is worse than no procedure |
| Skill and command names collide | Claude Code docs state that a skill and .claude/commands/ file with the same name create the same invocation path, with the skill taking precedence | A developer may believe they invoked a command while the skill body controls behavior |
| Plugin installs are treated as local convenience | Plugins can bundle skills, commands, agents, hooks, and MCP configuration | A plugin update changes coding-agent behavior across a team without the review discipline normally applied to build tooling |
| MCP tools are always loaded without a reason | Claude Code alwaysLoad for MCP requires v2.1.121 or later and can block startup until connect, capped by the standard five-second timeout | Tool availability becomes part of first-prompt latency and reliability, not just a feature toggle |
The hard part is not creating more instructions. The hard part is keeping them governable after they become part of the engineering system.
Layered Instruction Control Plane
The right architecture is to treat agent instructions as a control plane with explicit ownership, routing, verification, and rollout. CLAUDE.md should contain only invariants. Skills should contain procedures. Command-style workflows should represent deliberate human operations. Plugins should package reusable capability. MCP servers should expose external state through bounded, permissioned tools.
flowchart TD
Task[developer asks for code change] --> Memory[CLAUDE.md — standing project rules]
Memory --> Router[instruction router — classify task]
Router -->|matches description| Skill[skill — detailed task procedure]
Router -->|human invokes workflow| Command[command — explicit operation]
Skill --> Verify[verification recipe — tests and checks]
Command --> Verify
Plugin[plugin — packaged team capability] --> Skill
Plugin --> Command
MCP[MCP server — external tool boundary] --> Skill
Verify --> Output[code change with evidence]
-
Keep
CLAUDE.mdboring.Put only rules that are true for almost every task: build commands, schema constraints, forbidden files, deployment model, and non-negotiable repo conventions. For an Astro technical blog, that means rules like “posts live in
src/content/blog/,” “never addtypefrontmatter,” and “runnpm run checkplusASTRO_TELEMETRY_DISABLED=1 npm run buildbefore push.”Verification: Start a clean session and ask for an unrelated task. If more than 10 percent of the visible instruction text is irrelevant to that task, the memory file is carrying skill content.
-
Move specialized work into skills.
A review procedure, migration checklist, blog editorial rubric, incident summary format, or security audit should be a skill with a narrow description. Claude Code skills use
SKILL.mdwith frontmatter; the directory name becomes the invocation name, and the description helps decide automatic loading, according to the Claude Code skills documentation.Verification: Create five representative prompts: one that should trigger the skill, three that should not, and one ambiguous prompt. The ambiguous case is the useful one. If it loads the skill accidentally, tighten the description.
-
Treat command-style workflows as human intent.
Current Claude Code documentation says custom commands have merged into skills:
.claude/commands/deploy.mdand.claude/skills/deploy/SKILL.mdboth create/deploy, while skills add supporting files and invocation controls. The conceptual distinction still matters. A deploy review, release note, data backfill, or rollback plan should require explicit invocation because the timing matters.Verification: The workflow should not activate from vague language like “clean this up.” It should activate when the user calls the named operation or asks for that exact workflow.
-
Package team standards as plugins.
Plugins are the distribution layer. Claude’s plugin reference says plugins can add skills, commands, agents, hooks, and MCP servers, with plugin skills automatically discovered after installation. That makes plugins closer to internal developer tooling than prompt snippets.
Verification: Pin plugin versions in onboarding docs, keep a changelog, and run the same five-to-ten task evaluation set before and after plugin changes.
-
Put MCP behind permission and latency budgets.
MCP is where the assistant crosses from prompt behavior into real systems: repositories, calendars, issue trackers, databases, observability, and internal docs. Claude Code can expose MCP prompts as commands and can load tools eagerly with
alwaysLoad, but eager loading changes startup behavior.Verification: Record tool-call count, failed-tool rate, and first-response latency before enabling a new MCP server by default. If the server is not needed in most sessions, keep it discoverable rather than always loaded.
In Practice
The documented pattern from Anthropic is already a control-plane model, even if the file names make it look like convenience scripting.
| Publicly documented behavior | Engineering lesson |
|---|---|
| Claude Code settings describe memory files, settings files, skills, and MCP servers as distinct customization surfaces, with managed settings taking precedence over user and project levels | Enterprise policy belongs in managed configuration, not in every repository’s prompt file |
| The skills docs define enterprise, personal, project, and plugin skill locations; name conflicts resolve enterprise over personal over project, while plugin skills use a plugin namespace | Skill names are API surface. Treat them like command names in a CLI, not folder labels |
The slash command docs state that custom commands have merged into skills while existing .claude/commands/ files keep working | Governance should be based on invocation semantics and ownership, not the legacy directory path |
The MCP docs say prompts exposed by servers appear as commands such as /mcp__servername__promptname | External systems can inject operational workflows into the assistant surface, so server naming and prompt design need review |
The MCP docs also specify alwaysLoad for Claude Code v2.1.121 or later and note startup blocking up to the standard five-second connect timeout | Tool loading is a reliability decision, not just a convenience setting |
I have not run Anthropic’s managed Claude Code configuration across Raj’s organization, so the honest claim is narrower: the documented failure mode is instruction drift. If enterprise, personal, project, plugin, and MCP layers all carry overlapping review rules, the assistant can follow a different policy depending on machine, repository, plugin install, and session startup path.
That is familiar engineering terrain. PostgreSQL configuration has postgresql.conf, ALTER SYSTEM, role settings, database settings, and session settings for a reason: operational control depends on knowing which layer wins. Agent instruction stacks need the same discipline. The fact that the payload is Markdown instead of shared_buffers = 8GB does not make it less operational.
A practical evaluation does not need a large benchmark. It needs a fixed task suite and observable routing outcomes. For a repository using CLAUDE.md, skills, commands, plugins, and MCP, run the same prompts before and after an instruction change and record whether the right layer loaded.
| Test prompt | Expected layer | Measurement |
|---|---|---|
| “Fix the Astro type error in the blog index page” | CLAUDE.md only, plus normal code tools | Did a blog-writing skill stay unloaded? Did the assistant run the repo check command? |
| “Review this draft against the blog rubric” | Blog review skill | Did the skill load? Did it preserve SCQA, CARL, and 4P structure? |
| “Prepare a release checklist” | Explicit command-style workflow | Did it wait for a named release workflow instead of inferring one from vague language? |
| “Summarize the latest production incidents from the tracker” | MCP tool, only after permissioned tool use | Did it call the intended MCP server? Did it avoid unrelated local memory as evidence? |
| “Clean this up” | No specialized workflow | Did broad skill descriptions cause accidental activation? |
The useful numbers are simple: misrouted skill count, accidental command activation count, unnecessary MCP call count, and first-response latency. A before-and-after table with those four fields is enough to catch most instruction regressions.
| Metric | Before instruction change | After instruction change | Target |
|---|---|---|---|
| Skill misroutes across fixed task suite | Measured count | Measured count | Lower |
| Accidental command-style workflow activation | Measured count | Measured count | Zero |
| Unnecessary MCP calls | Measured count | Measured count | Lower |
| Median first-response latency | Measured time | Measured time | No regression without a reason |
The point is not to prove that the assistant is globally better. The point is to prove that a prompt, skill, plugin, or MCP change did not move operational behavior in an unreviewed direction.
Where It Breaks
| Failure mode | Trigger | Fix |
|---|---|---|
| Global memory overload | CLAUDE.md contains review checklists, release steps, coding style essays, and architecture history | Restrict it to invariants; move procedures into named skills |
| Accidental skill activation | Skill description uses broad phrases like “quality,” “architecture,” or “best practices” | Write descriptions around user intent, input shape, and exclusion cases |
| Legacy command confusion | Both .claude/commands/review.md and .claude/skills/review/SKILL.md exist | Consolidate into a skill; keep one canonical invocation name |
| Plugin drift | Developers install different plugin versions or local forks | Version plugins, review diffs, and publish release notes like internal packages |
| MCP startup drag | alwaysLoad: true is applied to tools needed only in rare workflows | Use lazy discovery unless the first prompt truly depends on the tool |
| Hidden policy conflict | Enterprise, personal, and project skills define the same behavior differently | Assign ownership by layer: enterprise for policy, project for repo mechanics, personal for preferences |
| Unverified prompt edits | A small wording change changes model routing or test discipline | Maintain a regression set of representative tasks and compare outputs before rollout |
| Evaluation theater | The task suite only checks happy paths that should obviously trigger a skill | Include negative and ambiguous prompts; misrouting usually appears in the gray cases |
| Permission sprawl | MCP servers are added because they are convenient, not because the workflow requires them | Tie each tool surface to a named workflow, owner, and latency budget |
| Namespace sprawl | Skills, commands, plugin skills, and MCP prompts all expose similar names | Treat invocation names as public interfaces; reserve names, document ownership, and remove duplicates |
What to Do Next
- Problem: Your coding agent is probably carrying too much always-on instruction and too little explicit routing.
- Solution: Split instructions into invariants, skills, deliberate workflows, packaged capabilities, and tool boundaries.
- Proof: Run a fixed five-to-ten prompt task suite before and after instruction changes, then compare misroutes, accidental workflow activation, unnecessary MCP calls, and first-response latency.
- Action: This week, audit
CLAUDE.md,.claude/skills/,.claude/commands/, plugin installs, and MCP configuration, then remove one procedural checklist from global memory and turn it into a tested skill.
The teams that win with coding agents will not have the longest prompt files; they will have the cleanest load boundaries.