Durable State for Long-Running LLM Coding Sessions
A long-running LLM coding session usually fails in a predictable, boring way: the context window fills up with operational residue before the implementation is finished.
Situation
Most LLM coding workflows treat the context window as both an execution environment and a system of record. That is fine for small, isolated edits. However, as agentic coding shifts toward multi-phase, architectural changes, the session needs to retain memory of decisions, progress, and recovery instructions over a much longer horizon.
The root cause of collapse is architectural. Large changes create more than one kind of state, and each kind ages differently:
| State class | Example |
|---|---|
| Repository understanding | Entry points, call graphs, config surface |
| Decisions | Positional args vs required options |
| Execution progress | Phase 1 done, Phase 2 partial |
| Recovery instructions | What to do after reset |
The Problem
The failure signature is usually dull rather than dramatic. The session starts repeating conclusions it already reached, requires more prompting to stay on task, and spends tokens re-explaining the repository back to itself. This happens because token pressure compounds even when work is progressing: the session retains old hypotheses, rejected decisions, and raw tool output alongside the actual implementation state. The model keeps paying rent on old reasoning. Eventually, the operator faces a bad tradeoff: keep the context and risk degradation, or clear it and lose the implementation thread.
The checkpoint needs to preserve only the state that would be expensive to rediscover:
| Persist this | Do not persist this |
|---|---|
| Locked decisions | Full reasoning transcript |
| Phase status | Every exploratory dead end |
| Remaining risks | Raw tool output |
| Exact resume point | Verbose prose summaries |
| Files/modules to re-read | Ephemeral conversational phrasing |
How can an LLM session maintain durable state across a large implementation without collapsing under its own context weight?
Core Concept
The durable-state pattern separates planning from execution, externalizing execution state before the context window becomes the bottleneck.
| Problem | Default LLM workflow | Durable-state workflow |
|---|---|---|
| Planning for multi-phase changes | Lives inside one context window | Written to external state |
| Ambiguity handling | Mixed into implementation | Resolved first as explicit unanswered questions |
| Token pressure | Grows monotonically | Reset between phases |
| Session interruption | Often loses momentum | Resume with `claude --continue` |
| Cross-session continuity | Weak | Restore from GitHub issue |
| Main failure mode | Context collapse | State drift between model view and filesystem |
- Use the LLM for exploration and planning.
- Force it to emit unresolved questions first.
- Convert the result into a compact multi-phase checklist.
- Persist that checklist outside the context window (e.g., as a GitHub issue).
- Rehydrate the next session from that external state.
```mermaid
flowchart TD
    Engineer["Engineer"] -->|"Start in plan mode"| AgentA["Agent Session A"]
    AgentA -->|"Explore codebase"| Repo["Repository"]
    AgentA -->|"Return unresolved questions"| Engineer
    Engineer -->|"Provide answers"| AgentA
    AgentA -->|"Generate multi-phase plan"| Engineer
    Engineer -->|"Execute Phase 1"| AgentA
    AgentA -->|"Patch files"| Repo
    Engineer -->|"Execute Phase 2"| AgentA
    AgentA -->|"Create checkpoint issue"| GH["GitHub Issue"]
    Engineer -->|"Start fresh session"| AgentB["Agent Session B"]
    AgentB -->|"Read checkpoint issue"| GH
    AgentB -->|"Re-read relevant files"| Repo
    AgentB -->|"Resume at next pending phase"| Engineer
```
In Practice
The pattern separates planning from execution because of how models behave under load: as the context window fills with token-heavy tool output, instruction adherence degrades. Each step below keeps that output from accumulating in the live session.
1. Start in plan mode, not patch mode
A documented operational rule is to force the agent to surface uncertainties before it commits to an implementation path. Ambiguity is cheap to resolve during planning but expensive after a half-finished patch set exists.
Example operator sequence for planning:
```shell
claude
# instruct agent:
#   - explore relevant files
#   - stay concise
#   - list unresolved questions first
#   - do not implement yet
```
2. Compress the plan aggressively
Compression reduces the token footprint while preserving operational meaning. “Strict by default, fuzzy flag optional” is compressed and useful. “Matching done” is operationally useless.
Example plan format:
```text
Phase 1
- add parser opts
- validate mutually exclusive flags
- unit tests happy path
Phase 2
- strict/fuzzy matcher abstraction
- wire config
- test edge cases
```
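A compact plan in this shape is also easy to load into a structure that later steps can iterate over. The following is a minimal sketch, assuming the plan uses exactly the `Phase N` header and `- task` line format shown above; the `Phase` class and `parse_plan` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Phase:
    name: str
    tasks: list[str] = field(default_factory=list)


def parse_plan(text: str) -> list[Phase]:
    """Parse 'Phase N' headers followed by '- task' lines into Phase objects."""
    phases: list[Phase] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower().startswith("phase "):
            phases.append(Phase(name=line))
        elif line.startswith("- ") and phases:
            # Attach the task to the most recently seen phase header.
            phases[-1].tasks.append(line[2:].strip())
    return phases


PLAN = """\
Phase 1
- add parser opts
- validate mutually exclusive flags
- unit tests happy path
Phase 2
- strict/fuzzy matcher abstraction
- wire config
- test edge cases
"""

phases = parse_plan(PLAN)
```

Lines that match neither pattern are ignored, which keeps the parser tolerant of stray prose but also means a malformed plan fails silently; a stricter variant could raise instead.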
3. Execute in bounded phases
Phases are bounded units that keep the live context focused on the current step. The documented pattern is to checkpoint before the session feels degraded, not after. Waiting until the context is obviously degraded means the checkpoint itself may already be low quality.
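```python
# Pseudocode: the helper names are placeholders, not a real API.
for phase in plan.phases:
    implement(phase)
    inspect(diff)
    commit_or_iterate()
    # Checkpoint while the session is still healthy, not after it degrades.
    if context_pressure_high:
        persist_state()
        clear_context()
        resume_from_external_state()
```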
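One way to make "context pressure" concrete is a token budget with an early checkpoint threshold. The sketch below is a simplification of the loop above, not a real agent integration: `run_phases`, `estimate_tokens`, the 4-characters-per-token heuristic, and the 0.6 threshold are all assumptions chosen for illustration, and the `implement` and `persist_state` callbacks stand in for whatever the operator actually runs.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return len(text) // 4


def run_phases(
    phases: list[str],
    implement: Callable[[str], str],
    persist_state: Callable[[list[str], list[str]], None],
    budget_tokens: int = 1000,
    checkpoint_ratio: float = 0.6,
) -> int:
    """Run phases in order, checkpointing before the budget is exhausted.

    Returns the number of context resets performed.
    """
    context: list[str] = []  # stands in for the live context window
    done: list[str] = []
    resets = 0
    for i, phase in enumerate(phases):
        context.append(implement(phase))  # tool output accumulates
        done.append(phase)
        used = sum(estimate_tokens(chunk) for chunk in context)
        if used >= budget_tokens * checkpoint_ratio:
            persist_state(done, phases[i + 1:])  # write status outside the context
            context.clear()  # fresh-session baseline
            resets += 1
    return resets


saved = []
resets = run_phases(
    ["Phase 1", "Phase 2", "Phase 3"],
    implement=lambda phase: "x" * 1600,  # each phase emits roughly 400 tokens
    persist_state=lambda done, pending: saved.append((list(done), list(pending))),
)
```

With these numbers the checkpoint fires after Phase 2, at 800 of 1000 estimated tokens, well before the budget is actually exhausted. That is the point of the threshold: the state is written while the session can still summarize itself accurately.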
4. Persist execution state before the reset
GitHub’s CLI (`gh issue create`) works as a low-friction state store. The issue becomes the working-memory checkpoint, capturing what is done, decisions that should not be reopened casually, remaining risks, and exact resume instructions.
GitHub issues work well here for practical reasons:
- They are already part of the engineering workflow.
- They are durable and searchable.
- They are reviewable by humans.
- They are easy to create from the command line.
- They are stable across terminal resets and model restarts.
```shell
gh issue create \
  --title "LLM execution checkpoint: CLI refactor" \
  --body "$(cat plan-status.md)"
```
Recommended body shape:
```markdown
## Current status
- [x] Phase 1: parser changes
- [ ] Phase 2: matcher abstraction
## Decisions locked
- required flags, not positional
## Resume instruction
Start at Phase 2. Re-read parser module and tests before editing matcher code.
```
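Generating the body from structured state, rather than asking the model to write it freehand, keeps the format identical from one checkpoint to the next. A minimal sketch, assuming the three sections shown above; the `Checkpoint` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    phases: list[tuple[str, bool]]  # (description, done?)
    decisions: list[str] = field(default_factory=list)
    resume: str = ""

    def to_markdown(self) -> str:
        """Render the checkpoint as a Markdown issue body with fixed sections."""
        lines = ["## Current status"]
        lines += [f"- [{'x' if done else ' '}] {name}" for name, done in self.phases]
        lines += ["", "## Decisions locked"]
        lines += [f"- {decision}" for decision in self.decisions]
        lines += ["", "## Resume instruction", self.resume]
        return "\n".join(lines) + "\n"


cp = Checkpoint(
    phases=[
        ("Phase 1: parser changes", True),
        ("Phase 2: matcher abstraction", False),
    ],
    decisions=["required flags, not positional"],
    resume="Start at Phase 2. Re-read parser module and tests before editing matcher code.",
)
body = cp.to_markdown()
```

Writing `body` to `plan-status.md` makes it usable with the `gh issue create` command shown earlier.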
5. Clear context and rehydrate cleanly
Clearing the session and fetching the GitHub issue in a fresh prompt resets the context to a low baseline. Because the checkpoint lives in an issue, it also ties agent execution into normal engineering review habits.
```shell
# Session A
claude
# ... plan, implement, checkpoint to GitHub issue ...
# clear session

# Session B
claude
# instruct agent:
#   fetch issue 24
#   rebuild working context from issue
#   continue at next unchecked phase
```
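"Continue at next unchecked phase" can be checked mechanically instead of being left to the model's reading of the issue. The sketch below assumes the checkpoint uses GitHub task-list syntax (`- [x]` / `- [ ]`) and that the body text has already been fetched, for example with `gh issue view 24 --json body --jq .body`; `next_pending_phase` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches Markdown task-list items such as "- [x] done" or "- [ ] pending".
CHECKBOX = re.compile(r"^\s*- \[([ xX])\] (.+)$")


def next_pending_phase(body: str) -> Optional[str]:
    """Return the first unchecked task-list item, or None if all are done."""
    for line in body.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            return match.group(2).strip()
    return None


ISSUE_BODY = """\
## Current status
- [x] Phase 1: parser changes
- [ ] Phase 2: matcher abstraction
## Decisions locked
- required flags, not positional
"""

pending = next_pending_phase(ISSUE_BODY)
```

Plain bullets such as the locked decisions are ignored because they carry no checkbox, so only phase status drives the resume point.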
6. Resynchronize the filesystem deliberately
Out-of-band edits are invisible to the session: if an operator runs a formatter or modifies a file, the repository changes but the agent’s mental model of it does not, so that model is stale. An explicit refresh step forces the agent to re-read specific modules before executing the next phase.
```text
Read issue 24.
Re-read parser.ts and parser.test.ts.
Assume any earlier mental model is stale.
Continue at Phase 2 only after confirming current file state.
```
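The operator can also detect drift directly rather than relying on the agent to notice it. One option, which goes beyond what the pattern itself prescribes, is to record content hashes at checkpoint time and compare them on resume. This is a sketch under that assumption; `snapshot` and `stale_files` are hypothetical helpers, and the temporary file stands in for a real module such as `parser.ts`.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(paths: list[str]) -> dict[str, str]:
    """Map each path to the SHA-256 digest of its current contents."""
    return {p: hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def stale_files(recorded: dict[str, str]) -> list[str]:
    """Return paths whose contents changed, or vanished, since the snapshot."""
    changed = []
    for path, digest in recorded.items():
        file = Path(path)
        if not file.exists() or hashlib.sha256(file.read_bytes()).hexdigest() != digest:
            changed.append(path)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    parser = Path(tmp) / "parser.ts"
    parser.write_text("export const v = 1;\n")
    recorded = snapshot([str(parser)])  # taken at checkpoint time
    unchanged = stale_files(recorded)  # nothing edited yet
    parser.write_text("export const v = 2;\n")  # simulate a formatter or manual edit
    changed = stale_files(recorded)  # checked on resume
```

Any path returned by `stale_files` belongs in the re-read list of the resume prompt. In a Git repository, comparing against the commit recorded in the checkpoint would serve the same purpose.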
7. Keep planning prompts and execution prompts structurally different
Mode confusion occurs when planning and execution prompts sound similar. A planning prompt requires unresolved questions first; an execution prompt requires bounded diff generation against an existing plan.
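Templating the two prompts is one way to keep them structurally distinct. The wording, the `MODE:` header, and the function names below are illustrative assumptions, not a documented prompt format.

```python
def planning_prompt(task: str) -> str:
    """Planning mode: questions come first, and no edits are allowed."""
    return "\n".join([
        "MODE: PLAN",
        f"Task: {task}",
        "Explore the relevant files and stay concise.",
        "Output format:",
        "1. Unresolved questions",
        "2. Proposed phases",
        "Do not implement yet.",
    ])


def execution_prompt(issue: int, phase: str, files: list[str]) -> str:
    """Execution mode: one bounded phase against an existing plan."""
    return "\n".join([
        "MODE: EXECUTE",
        f"Read issue {issue}.",
        f"Re-read {', '.join(files)}.",
        f"Implement only: {phase}",
        "Output format: a diff limited to this phase.",
        "Do not revisit locked decisions.",
    ])


plan = planning_prompt("CLI refactor")
run = execution_prompt(24, "Phase 2: matcher abstraction", ["parser.ts", "parser.test.ts"])
```

The two templates share no output format, so a response that opens with more questions during execution is easy to spot as mode confusion.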
Where It Breaks
| Scenario | Failure Mode | Mitigation |
|---|---|---|
| Context collapse without checkpoints | Session becomes slower and noisier over time | Persist execution state before degradation |
| State drift from out-of-band edits | Agent patches code against a stale mental model | Explicitly instruct agent to re-read files upon resume |
| Mode confusion | Agent continues planning during execution | Keep planning and execution prompts structurally different |
| Rapid parallel human edits | Repository changes invalidate the checkpoint | Ensure the checkpoint locks specific, stable decisions |
| Summary drift | Each new session interprets the checkpoint differently | Make the checkpoint format stricter and operationally specific |
What to Do Next
- Problem: Long-running LLM coding sessions fail due to context collapse and state drift.
- Solution: Separate planning from execution and externalize multi-phase checklists into GitHub issues.
- Proof: Instruction adherence degrades as the context fills with tool output; clearing the context and rehydrating from compact external text resets the session to a low baseline.
- Action: Adopt a lightweight GitHub issue template with fixed sections for completion state, locked decisions, open risks, and exact resume instructions to make cross-session recovery reliable.
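A fixed template is easier to enforce when a script checks it before the issue is created. The following is a minimal sketch; the heading text in `REQUIRED_SECTIONS` is an assumption modeled on the body shape shown earlier (with `## Open risks` added to cover the fourth section named above), and `missing_sections` is a hypothetical helper.

```python
REQUIRED_SECTIONS = (
    "## Current status",
    "## Decisions locked",
    "## Open risks",  # assumed heading name for the "open risks" section
    "## Resume instruction",
)


def missing_sections(body: str) -> list[str]:
    """Return required headings that do not appear on a line of their own."""
    present = {line.strip() for line in body.splitlines()}
    return [heading for heading in REQUIRED_SECTIONS if heading not in present]


draft = """\
## Current status
- [x] Phase 1: parser changes
- [ ] Phase 2: matcher abstraction
## Decisions locked
- required flags, not positional
## Resume instruction
Start at Phase 2.
"""

missing = missing_sections(draft)
```

Refusing to create the issue while `missing` is non-empty is a cheap guard against summary drift, since every session then reads the same sections in the same order.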