Context Anxiety and Harness Decay
Content reflects the state as of February 2026. AI tooling and model capabilities in this area change frequently.
A harness that patches around today’s model weakness can become tomorrow’s technical debt. Agent teams often add rules after a bad run: always restate the plan, never call this tool first, summarize every file, ask for approval every time. Some rules are durable. Others are workarounds for a specific model version.
Situation
Agent teams often add rules after a bad run: always restate the plan, never call this tool first, summarize every file, ask for approval every time. Some rules are durable. Others are workarounds for a specific model version.
The pattern matters for database, cloud, and platform teams because agents do not operate in a vacuum. They inherit repository rules, tool permissions, deployment workflows, incident history, and the quality of the evidence available to them.
| Operating layer | Default approach | Better alternative |
|---|---|---|
| Context | Rely on a long prompt or chat history | Give the agent task-specific evidence and rules |
| Tooling | Expose broad tools and inspect later | Expose narrow tools with clear approval boundaries |
| Verification | Read the final answer | Check the artifact, trace, and final state |
The Problem
As models improve, old workarounds can make the system slower, noisier, or less capable. The harness becomes a pile of anxieties rather than a clear execution contract.
The practical question is not whether an agent can produce a convincing response. The question is whether the engineering system around that response makes the work observable, reversible, and reviewable.
| Failure point | What breaks | Why it matters |
|---|---|---|
| Weak boundary | Agent authority is broader than the task | A diagnostic run can become an unsafe change |
| Missing evidence | The agent cannot cite the state it used | Review becomes opinion instead of verification |
| No lifecycle | The workflow ends at a message | Ownership, audit, cleanup, and rollback disappear |
Stable Harness Contracts
Separate durable controls from model-specific prompts. Durable controls include permissions, tool APIs, evals, logging, and approval gates. Prompt workarounds should expire unless evals prove they still help.
flowchart TD
A[task request — bounded intent] --> B[stable harness contracts — controls]
B --> C[tool execution — evidence collected]
C --> D[verification — final state checked]
D --> E[human handoff — audit retained]
-
Define the operating boundary.
Write down the task class, allowed tools, environment, data class, and approval mode before the agent runs. -
Shape the evidence.
Return compact observations instead of raw dumps. The agent should see enough to reason, but not so much that context is wasted. -
Require proof of completion.
Completion should be an artifact or state check: a passing test, a reviewed plan, a valid rollback, a trace, or a linked ticket.
Review harness rules like production code. Each rule needs an owner, reason, eval coverage, and removal condition.
In Practice
Context: Anthropic’s managed agents writing argues for decoupling the brain from the hands: stable interfaces and execution contracts should outlast current model implementations. Source: Anthropic, Scaling Managed Agents.
Action: Review harness rules like production code. Each rule needs an owner, reason, eval coverage, and removal condition.
Result: If removing a rule does not hurt eval outcomes, the rule was not a control; it was drag.
Learning: Separate durable controls from model-specific prompts. Durable controls include permissions, tool APIs, evals, logging, and approval gates. Prompt workarounds should expire unless evals prove they still help. This is a documented pattern or a direct consequence of how the named systems behave, not a fabricated production story.
Where It Breaks
| Failure mode | Trigger | Fix |
|---|---|---|
| Prompt fossil | Old workaround stays forever | Add expiration review |
| Over-constrained model | Agent cannot use improved capability | Retest against eval suite |
| Mixed concerns | Policy and style live in same prompt | Move policy to harness code |
| No ownership | Nobody can delete stale rules | Assign harness owners |
What to Do Next
- Problem: As models improve, old workarounds can make the system slower, noisier, or less capable. The harness becomes a pile of anxieties rather than a clear execution contract.
- Solution: Separate durable controls from model-specific prompts. Durable controls include permissions, tool APIs, evals, logging, and approval gates. Prompt workarounds should expire unless evals prove they still help.
- Proof: If removing a rule does not hurt eval outcomes, the rule was not a control; it was drag.
- Action: Audit one agent instruction file and label each rule as policy, tool contract, style preference, or model workaround.
The teams that get value from agents will not be the teams with the longest prompts. They will be the teams that turn agent work into a controlled engineering workflow.