The default approach is a single always-on instruction pile; the production alternative is a layered instruction architecture where project memory, task skills, explicit commands, plugins, and Model Context Protocol integrations each have a load boundary.

Situation

AI coding assistants have moved from autocomplete into the build path: they read diffs, edit production code, run tests, call tools, and increasingly encode team workflow. That changes prompt files from personal preference into operational configuration.

Claude Code makes this visible through CLAUDE.md, skills, slash-style invocation, plugins, and Model Context Protocol servers. The engineering question is not “where do I put this prompt?” The question is: which instructions must be present on every turn, which should be loaded only when relevant, which require human intent, and which should be distributed as versioned team infrastructure?

LayerPrimary jobLoad boundaryProduction risk
CLAUDE.mdRepository memory and standing rulesLoaded at startupContext bloat and stale global policy
SkillTask-specific procedureAuto-loaded or invoked by nameBad descriptions cause missed or accidental routing
Command-style invocationHuman-triggered workflowExplicit user callBecomes tribal automation if not versioned
PluginDistribution packageInstalled capability bundleSilent behavior drift across machines
MCP serverExternal tools and dataConnected tool surfaceLatency, permission, and data boundary failures

The Problem

Instruction systems fail the same way configuration systems fail: the first version is convenient, the fifth version is ambiguous, and the tenth version has undocumented precedence. A prompt layer that starts as “be concise and run tests” becomes a half-remembered operating manual for release policy, coding style, database migrations, security review, and incident response.

Failure pointWhat breaksWhy it matters
CLAUDE.md becomes a wikiClaude Code loads memory files at startup, so every unrelated task carries old instructions and repository loreThe model spends attention on irrelevant policy before it reads the actual change
Skills are described too broadlyA description like “use for code quality” can match refactors, reviews, bug fixes, and design workThe wrong procedure runs with confidence, which is worse than no procedure
Skill and command names collideClaude Code docs state that a skill and .claude/commands/ file with the same name create the same invocation path, with the skill taking precedenceA developer may believe they invoked a command while the skill body controls behavior
Plugin installs are treated as local conveniencePlugins can bundle skills, commands, agents, hooks, and MCP configurationA plugin update changes coding-agent behavior across a team without the review discipline normally applied to build tooling
MCP tools are always loaded without a reasonClaude Code alwaysLoad for MCP requires v2.1.121 or later and can block startup until connect, capped by the standard five-second timeoutTool availability becomes part of first-prompt latency and reliability, not just a feature toggle

The hard part is not creating more instructions. The hard part is keeping them governable after they become part of the engineering system.

Layered Instruction Control Plane

The right architecture is to treat agent instructions as a control plane with explicit ownership, routing, verification, and rollout. CLAUDE.md should contain only invariants. Skills should contain procedures. Command-style workflows should represent deliberate human operations. Plugins should package reusable capability. MCP servers should expose external state through bounded, permissioned tools.

flowchart TD
    Task[developer asks for code change] --> Memory[CLAUDE.md — standing project rules]
    Memory --> Router[instruction router — classify task]
    Router -->|matches description| Skill[skill — detailed task procedure]
    Router -->|human invokes workflow| Command[command — explicit operation]
    Skill --> Verify[verification recipe — tests and checks]
    Command --> Verify
    Plugin[plugin — packaged team capability] --> Skill
    Plugin --> Command
    MCP[MCP server — external tool boundary] --> Skill
    Verify --> Output[code change with evidence]
  1. Keep CLAUDE.md boring.

    Put only rules that are true for almost every task: build commands, schema constraints, forbidden files, deployment model, and non-negotiable repo conventions. For an Astro technical blog, that means rules like “posts live in src/content/blog/,” “never add type frontmatter,” and “run npm run check plus ASTRO_TELEMETRY_DISABLED=1 npm run build before push.”

    Verification: Start a clean session and ask for an unrelated task. If more than 10 percent of the visible instruction text is irrelevant to that task, the memory file is carrying skill content.

  2. Move specialized work into skills.

    A review procedure, migration checklist, blog editorial rubric, incident summary format, or security audit should be a skill with a narrow description. Claude Code skills use SKILL.md with frontmatter; the directory name becomes the invocation name, and the description helps decide automatic loading, according to the Claude Code skills documentation.

    Verification: Create five representative prompts: one that should trigger the skill, three that should not, and one ambiguous prompt. The ambiguous case is the useful one. If it loads the skill accidentally, tighten the description.

  3. Treat command-style workflows as human intent.

    Current Claude Code documentation says custom commands have merged into skills: .claude/commands/deploy.md and .claude/skills/deploy/SKILL.md both create /deploy, while skills add supporting files and invocation controls. The conceptual distinction still matters. A deploy review, release note, data backfill, or rollback plan should require explicit invocation because the timing matters.

    Verification: The workflow should not activate from vague language like “clean this up.” It should activate when the user calls the named operation or asks for that exact workflow.

  4. Package team standards as plugins.

    Plugins are the distribution layer. Claude’s plugin reference says plugins can add skills, commands, agents, hooks, and MCP servers, with plugin skills automatically discovered after installation. That makes plugins closer to internal developer tooling than prompt snippets.

    Verification: Pin plugin versions in onboarding docs, keep a changelog, and run the same five-to-ten task evaluation set before and after plugin changes.

  5. Put MCP behind permission and latency budgets.

    MCP is where the assistant crosses from prompt behavior into real systems: repositories, calendars, issue trackers, databases, observability, and internal docs. Claude Code can expose MCP prompts as commands and can load tools eagerly with alwaysLoad, but eager loading changes startup behavior.

    Verification: Record tool-call count, failed-tool rate, and first-response latency before enabling a new MCP server by default. If the server is not needed in most sessions, keep it discoverable rather than always loaded.

In Practice

The documented pattern from Anthropic is already a control-plane model, even if the file names make it look like convenience scripting.

Publicly documented behaviorEngineering lesson
Claude Code settings describe memory files, settings files, skills, and MCP servers as distinct customization surfaces, with managed settings taking precedence over user and project levelsEnterprise policy belongs in managed configuration, not in every repository’s prompt file
The skills docs define enterprise, personal, project, and plugin skill locations; name conflicts resolve enterprise over personal over project, while plugin skills use a plugin namespaceSkill names are API surface. Treat them like command names in a CLI, not folder labels
The slash command docs state that custom commands have merged into skills while existing .claude/commands/ files keep workingGovernance should be based on invocation semantics and ownership, not the legacy directory path
The MCP docs say prompts exposed by servers appear as commands such as /mcp__servername__promptnameExternal systems can inject operational workflows into the assistant surface, so server naming and prompt design need review
The MCP docs also specify alwaysLoad for Claude Code v2.1.121 or later and note startup blocking up to the standard five-second connect timeoutTool loading is a reliability decision, not just a convenience setting

I have not run Anthropic’s managed Claude Code configuration across Raj’s organization, so the honest claim is narrower: the documented failure mode is instruction drift. If enterprise, personal, project, plugin, and MCP layers all carry overlapping review rules, the assistant can follow a different policy depending on machine, repository, plugin install, and session startup path.

That is familiar engineering terrain. PostgreSQL configuration has postgresql.conf, ALTER SYSTEM, role settings, database settings, and session settings for a reason: operational control depends on knowing which layer wins. Agent instruction stacks need the same discipline. The fact that the payload is Markdown instead of shared_buffers = 8GB does not make it less operational.

A practical evaluation does not need a large benchmark. It needs a fixed task suite and observable routing outcomes. For a repository using CLAUDE.md, skills, commands, plugins, and MCP, run the same prompts before and after an instruction change and record whether the right layer loaded.

Test promptExpected layerMeasurement
“Fix the Astro type error in the blog index page”CLAUDE.md only, plus normal code toolsDid a blog-writing skill stay unloaded? Did the assistant run the repo check command?
“Review this draft against the blog rubric”Blog review skillDid the skill load? Did it preserve SCQA, CARL, and 4P structure?
“Prepare a release checklist”Explicit command-style workflowDid it wait for a named release workflow instead of inferring one from vague language?
“Summarize the latest production incidents from the tracker”MCP tool, only after permissioned tool useDid it call the intended MCP server? Did it avoid unrelated local memory as evidence?
“Clean this up”No specialized workflowDid broad skill descriptions cause accidental activation?

The useful numbers are simple: misrouted skill count, accidental command activation count, unnecessary MCP call count, and first-response latency. A before-and-after table with those four fields is enough to catch most instruction regressions.

MetricBefore instruction changeAfter instruction changeTarget
Skill misroutes across fixed task suiteMeasured countMeasured countLower
Accidental command-style workflow activationMeasured countMeasured countZero
Unnecessary MCP callsMeasured countMeasured countLower
Median first-response latencyMeasured timeMeasured timeNo regression without a reason

The point is not to prove that the assistant is globally better. The point is to prove that a prompt, skill, plugin, or MCP change did not move operational behavior in an unreviewed direction.

Where It Breaks

Failure modeTriggerFix
Global memory overloadCLAUDE.md contains review checklists, release steps, coding style essays, and architecture historyRestrict it to invariants; move procedures into named skills
Accidental skill activationSkill description uses broad phrases like “quality,” “architecture,” or “best practices”Write descriptions around user intent, input shape, and exclusion cases
Legacy command confusionBoth .claude/commands/review.md and .claude/skills/review/SKILL.md existConsolidate into a skill; keep one canonical invocation name
Plugin driftDevelopers install different plugin versions or local forksVersion plugins, review diffs, and publish release notes like internal packages
MCP startup dragalwaysLoad: true is applied to tools needed only in rare workflowsUse lazy discovery unless the first prompt truly depends on the tool
Hidden policy conflictEnterprise, personal, and project skills define the same behavior differentlyAssign ownership by layer: enterprise for policy, project for repo mechanics, personal for preferences
Unverified prompt editsA small wording change changes model routing or test disciplineMaintain a regression set of representative tasks and compare outputs before rollout
Evaluation theaterThe task suite only checks happy paths that should obviously trigger a skillInclude negative and ambiguous prompts; misrouting usually appears in the gray cases
Permission sprawlMCP servers are added because they are convenient, not because the workflow requires themTie each tool surface to a named workflow, owner, and latency budget
Namespace sprawlSkills, commands, plugin skills, and MCP prompts all expose similar namesTreat invocation names as public interfaces; reserve names, document ownership, and remove duplicates

What to Do Next

  • Problem: Your coding agent is probably carrying too much always-on instruction and too little explicit routing.
  • Solution: Split instructions into invariants, skills, deliberate workflows, packaged capabilities, and tool boundaries.
  • Proof: Run a fixed five-to-ten prompt task suite before and after instruction changes, then compare misroutes, accidental workflow activation, unnecessary MCP calls, and first-response latency.
  • Action: This week, audit CLAUDE.md, .claude/skills/, .claude/commands/, plugin installs, and MCP configuration, then remove one procedural checklist from global memory and turn it into a tested skill.

The teams that win with coding agents will not have the longest prompt files; they will have the cleanest load boundaries.