Context Anxiety and Harness Decay

A harness that patches around today’s model weakness can become tomorrow’s technical debt. Agent teams often add rules after a bad run: always restate the plan, never call this tool first, summarize every file, ask for approval every time. Some rules are durable. Others are workarounds for a specific model version.

Situation

Agent teams often add rules after a bad run: always restate the plan, never call this tool first, summarize every file, ask for approval every time. Some rules are durable. Others are workarounds for a specific model version.

The pattern matters for database, cloud, and platform teams because agents do not operate in a vacuum. They inherit repository rules, tool permissions, deployment workflows, incident history, and the quality of the evidence available to them.

Operating layer	Default approach	Better alternative
Context	Rely on a long prompt or chat history	Give the agent task-specific evidence and rules
Tooling	Expose broad tools and inspect later	Expose narrow tools with clear approval boundaries
Verification	Read the final answer	Check the artifact, trace, and final state

The Problem

As models improve, old workarounds can make the system slower, noisier, or less capable. The harness becomes a pile of anxieties rather than a clear execution contract.

The practical question is not whether an agent can produce a convincing response. The question is whether the engineering system around that response makes the work observable, reversible, and reviewable.

Failure point	What breaks	Why it matters
Weak boundary	Agent authority is broader than the task	A diagnostic run can become an unsafe change
Missing evidence	The agent cannot cite the state it used	Review becomes opinion instead of verification
No lifecycle	The workflow ends at a message	Ownership, audit, cleanup, and rollback disappear

Stable Harness Contracts

Separate durable controls from model-specific prompts. Durable controls include permissions, tool APIs, evals, logging, and approval gates. Prompt workarounds should expire unless evals prove they still help.

flowchart TD
    A[task request — bounded intent] --> B[stable harness contracts — controls]
    B --> C[tool execution — evidence collected]
    C --> D[verification — final state checked]
    D --> E[human handoff — audit retained]

Define the operating boundary.
Write down the task class, allowed tools, environment, data class, and approval mode before the agent runs.
Shape the evidence.
Return compact observations instead of raw dumps. The agent should see enough to reason, but not so much that context is wasted.
Require proof of completion.
Completion should be an artifact or state check: a passing test, a reviewed plan, a valid rollback, a trace, or a linked ticket.

Review harness rules like production code. Each rule needs an owner, reason, eval coverage, and removal condition.

In Practice

Context: Anthropic’s managed agents writing argues for decoupling the brain from the hands: stable interfaces and execution contracts should outlast current model implementations. Source: Anthropic, Scaling Managed Agents.

Action: Review harness rules like production code. Each rule needs an owner, reason, eval coverage, and removal condition.

Result: If removing a rule does not hurt eval outcomes, the rule was not a control; it was drag.

Learning: Separate durable controls from model-specific prompts. Durable controls include permissions, tool APIs, evals, logging, and approval gates. Prompt workarounds should expire unless evals prove they still help. This is a documented pattern or a direct consequence of how the named systems behave, not a fabricated production story.

Where It Breaks

Failure mode	Trigger	Fix
Prompt fossil	Old workaround stays forever	Add expiration review
Over-constrained model	Agent cannot use improved capability	Retest against eval suite
Mixed concerns	Policy and style live in same prompt	Move policy to harness code
No ownership	Nobody can delete stale rules	Assign harness owners

What to Do Next

Problem: As models improve, old workarounds can make the system slower, noisier, or less capable. The harness becomes a pile of anxieties rather than a clear execution contract.
Solution: Separate durable controls from model-specific prompts. Durable controls include permissions, tool APIs, evals, logging, and approval gates. Prompt workarounds should expire unless evals prove they still help.
Proof: If removing a rule does not hurt eval outcomes, the rule was not a control; it was drag.
Action: Audit one agent instruction file and label each rule as policy, tool contract, style preference, or model workaround.

The teams that get value from agents will not be the teams with the longest prompts. They will be the teams that turn agent work into a controlled engineering workflow.

Situation

The Problem

Stable Harness Contracts

In Practice

Where It Breaks

What to Do Next

Rajiv

Related Posts

AI Governance for Engineering Teams: Preventing Shadow AI Spend Without Blocking Innovation

AI Cost Incident Runbook: What to Do When Monthly Token Spend Suddenly Doubles

AI Coding Assistant ROI: When $200/Developer/Month Is Cheap — and When It Is Waste