A long-running LLM coding session usually fails in a predictable, boring way: the context window fills up with operational residue before the implementation is finished.

Situation

Most LLM coding workflows treat the context window as both an execution environment and a system of record. That is fine for small, isolated edits. However, as agentic coding shifts toward multi-phase, architectural changes, the session needs to retain memory of decisions, progress, and recovery instructions over a much longer horizon.

The root cause of collapse is architectural. Large changes create more than one kind of state, and each kind ages differently:

| State class | Example |
| --- | --- |
| Repository understanding | Entry points, call graphs, config surface |
| Decisions | Positional args vs required options |
| Execution progress | Phase 1 done, Phase 2 partial |
| Recovery instructions | What to do after reset |

The Problem

The failure signature is usually dull rather than dramatic. The session starts repeating conclusions it already reached, requires more prompting to stay on task, and spends tokens re-explaining the repository back to itself. This happens because token pressure compounds even when work is progressing: the session retains old hypotheses, rejected decisions, and raw tool output alongside the actual implementation state. The model keeps paying rent on old reasoning. Eventually, the operator faces a bad tradeoff: keep the context and risk degradation, or clear it and lose the implementation thread.

The checkpoint needs to preserve only the state that would be expensive to rediscover:

| Persist this | Do not persist this |
| --- | --- |
| Locked decisions | Full reasoning transcript |
| Phase status | Every exploratory dead end |
| Remaining risks | Raw tool output |
| Exact resume point | Verbose prose summaries |
| Files/modules to re-read | Ephemeral conversational phrasing |
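
The "persist this" column can be read as a small data structure. The following is a minimal sketch, not a prescribed schema: the class names `Phase` and `Checkpoint` and all field names are hypothetical, chosen only to mirror the table rows.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Phase:
    name: str
    done: bool = False


@dataclass
class Checkpoint:
    """Holds only the state that is expensive to rediscover after a reset."""
    locked_decisions: list[str] = field(default_factory=list)
    phases: list[Phase] = field(default_factory=list)
    remaining_risks: list[str] = field(default_factory=list)
    resume_point: str = ""
    files_to_reread: list[str] = field(default_factory=list)
    # Deliberately absent: reasoning transcripts, dead ends, raw tool output.

    def next_pending(self) -> Optional[Phase]:
        """Return the first phase that is not done, or None if all are done."""
        return next((p for p in self.phases if not p.done), None)


# Hypothetical values taken from the examples used later in this article.
checkpoint = Checkpoint(
    locked_decisions=["required flags, not positional"],
    phases=[Phase("parser changes", done=True), Phase("matcher abstraction")],
    resume_point="Start at Phase 2",
    files_to_reread=["parser.ts", "parser.test.ts"],
)
print(checkpoint.next_pending().name)  # prints "matcher abstraction"
```

Anything that does not fit one of these fields is a candidate for the "do not persist" column.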

How can an LLM session maintain durable state across a large implementation without collapsing under its own context weight?

Core Concept

The durable-state pattern separates planning from execution, externalizing execution state before the context window becomes the bottleneck.

| Problem | Default LLM workflow | Durable-state workflow |
| --- | --- | --- |
| Planning for multi-phase changes | Lives inside one context window | Written to external state |
| Ambiguity handling | Mixed into implementation | Resolved first as explicit unanswered questions |
| Token pressure | Grows monotonically | Reset between phases |
| Session interruption | Often loses momentum | Resume with claude --continue |
| Cross-session continuity | Weak | Restore from GitHub issue |
| Main failure mode | Context collapse | State drift between model view and filesystem |

The workflow has five steps:

  1. Use the LLM for exploration and planning.
  2. Force it to emit unresolved questions first.
  3. Convert the result into a compact multi-phase checklist.
  4. Persist that checklist outside the context window (e.g., as a GitHub issue).
  5. Rehydrate the next session from that external state.

flowchart TD
    Engineer["Engineer"] -->|"Start in plan mode"| AgentA["Agent Session A"]
    AgentA -->|"Explore codebase"| Repo["Repository"]
    AgentA -->|"Return unresolved questions"| Engineer
    Engineer -->|"Provide answers"| AgentA
    AgentA -->|"Generate multi-phase plan"| Engineer
    Engineer -->|"Execute Phase 1"| AgentA
    AgentA -->|"Patch files"| Repo
    Engineer -->|"Execute Phase 2"| AgentA
    AgentA -->|"Create checkpoint issue"| GH["GitHub Issue"]
    Engineer -->|"Start fresh session"| AgentB["Agent Session B"]
    AgentB -->|"Read checkpoint issue"| GH
    AgentB -->|"Re-read relevant files"| Repo
    AgentB -->|"Resume at next pending phase"| Engineer

In Practice

The pattern rests on separating planning from execution. The motivation is a commonly observed behavior of large language models: as the context window fills with token-heavy tool output, instruction adherence tends to degrade.

1. Start in plan mode, not patch mode

Force the agent to surface uncertainties before it commits to an implementation path. Ambiguity is cheap to resolve during planning but expensive once a half-finished patch set exists.

Example operator sequence for planning:

claude
# instruct agent:
# - explore relevant files
# - stay concise
# - list unresolved questions first
# - do not implement yet

2. Compress the plan aggressively

Compression reduces the token footprint while preserving operational meaning. “Strict by default, fuzzy flag optional” is compressed and useful. “Matching done” is operationally useless.

Example plan format:

Phase 1
- add parser opts
- validate mutually exclusive flags
- unit tests happy path

Phase 2
- strict/fuzzy matcher abstraction
- wire config
- test edge cases
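
One side benefit of a compact, regular plan format is that it can be parsed mechanically. The sketch below assumes the plan follows exactly the "Phase N" header plus "- task" layout shown above; `parse_plan` is a hypothetical helper, not part of any tool.

```python
import re

PLAN_TEXT = """\
Phase 1
- add parser opts
- validate mutually exclusive flags
- unit tests happy path

Phase 2
- strict/fuzzy matcher abstraction
- wire config
- test edge cases
"""


def parse_plan(text: str) -> dict[int, list[str]]:
    """Map each phase number to its list of task strings."""
    phases: dict[int, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"Phase (\d+)", line)
        if header:
            current = int(header.group(1))
            phases[current] = []
        elif line.startswith("- ") and current is not None:
            phases[current].append(line[2:])
    return phases


plan = parse_plan(PLAN_TEXT)
print(len(plan[1]), len(plan[2]))  # prints "3 3"
```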

3. Execute in bounded phases

Each phase is a bounded unit of work that keeps the live context focused on the current step. Checkpoint before the session feels degraded, not after: once the context is obviously degraded, the checkpoint it produces may already be low quality.

for phase in plan.phases:
    implement(phase)
    inspect(diff)
    commit_or_iterate()
    if context_pressure_high:
        persist_state()
        clear_context()
        resume_from_external_state()
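
The loop above is pseudocode. A runnable approximation, under loudly stated assumptions, might look like the following: the four-characters-per-token estimate is a crude heuristic, the 0.6 threshold is an arbitrary illustrative value, and the `context` string merely stands in for a real context window. `run_phases` and `estimate_tokens` are hypothetical names.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Crude heuristic (an assumption): roughly four characters per token."""
    return len(text) // 4


def run_phases(
    phases: list[str],
    implement: Callable[[str], str],
    budget_tokens: int,
    threshold: float = 0.6,
) -> list[str]:
    """Run phases in order; checkpoint and reset once usage crosses threshold."""
    events: list[str] = []
    context = ""  # stands in for the live context window
    for phase in phases:
        context += implement(phase)  # tool output accumulates in context
        events.append(f"done:{phase}")
        if estimate_tokens(context) >= threshold * budget_tokens:
            events.append(f"checkpoint-after:{phase}")  # persist_state()
            context = ""  # clear_context(); the next phase starts fresh
    return events


events = run_phases(
    ["phase-1", "phase-2", "phase-3"],
    implement=lambda phase: "x" * 1000,  # each phase adds about 250 tokens
    budget_tokens=800,
)
print(events)
```

With these numbers the threshold is 480 estimated tokens, so the checkpoint fires after the second phase, well before the budget of 800 is exhausted.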

4. Persist execution state before the reset

GitHub’s CLI (gh issue create) works as a low-friction state store. The issue becomes the working-memory checkpoint: it captures what is done, which decisions should not be reopened casually, the remaining risks, and exact resume instructions.

GitHub issues work well here for practical reasons:

  • They are already part of the engineering workflow.
  • They are durable and searchable.
  • They are reviewable by humans.
  • They are easy to create from the command line.
  • They are stable across terminal resets and model restarts.

# --body-file reads the issue body directly from a file
gh issue create \
  --title "LLM execution checkpoint: CLI refactor" \
  --body-file plan-status.md
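
If the checkpoint is created from a script rather than typed by hand, the same command can be wrapped in a few lines. This is a sketch that assumes `gh` is installed and authenticated; the helper names are hypothetical. It uses the `--body-file` flag of `gh issue create`, which reads the body from a file. Only the argument list is built when the example runs; the function that actually invokes `gh` is defined but not called.

```python
import subprocess


def gh_issue_create_argv(title: str, body_file: str) -> list[str]:
    """Build the argument vector for `gh issue create`."""
    return ["gh", "issue", "create", "--title", title, "--body-file", body_file]


def create_checkpoint_issue(title: str, body_file: str) -> str:
    """Run gh and return its stdout (the new issue URL on success)."""
    result = subprocess.run(
        gh_issue_create_argv(title, body_file),
        check=True,           # raise if gh exits non-zero
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


argv = gh_issue_create_argv("LLM execution checkpoint: CLI refactor", "plan-status.md")
print(" ".join(argv[:3]))  # prints "gh issue create"
```

Passing the arguments as a list avoids shell quoting problems when the title contains spaces or colons.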

Recommended body shape:

## Current status
- [x] Phase 1: parser changes
- [ ] Phase 2: matcher abstraction

## Decisions locked
- required flags, not positional

## Resume instruction
Start at Phase 2. Re-read parser module and tests before editing matcher code.
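
Generating the body from structured data, rather than writing it freehand each time, keeps the section order and checkbox syntax stable across sessions. A minimal sketch, with `render_checkpoint` as a hypothetical helper that reproduces the three sections shown above:

```python
def render_checkpoint(
    phases: list[tuple[str, bool]],
    decisions: list[str],
    resume: str,
) -> str:
    """Render the three fixed sections in a stable order."""
    out = ["## Current status"]
    for number, (name, done) in enumerate(phases, start=1):
        mark = "x" if done else " "
        out.append(f"- [{mark}] Phase {number}: {name}")
    out += ["", "## Decisions locked"]
    out += [f"- {decision}" for decision in decisions]
    out += ["", "## Resume instruction", resume]
    return "\n".join(out) + "\n"


body = render_checkpoint(
    phases=[("parser changes", True), ("matcher abstraction", False)],
    decisions=["required flags, not positional"],
    resume="Start at Phase 2. Re-read parser module and tests before editing matcher code.",
)
print(body)
```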

5. Clear context and rehydrate cleanly

Clearing the session and loading the GitHub issue into a fresh prompt resets the context to a low baseline. Because the checkpoint is an ordinary issue, humans can review it the same way they review any other engineering artifact.

# Session A
claude
# ... plan, implement, checkpoint to GitHub issue ...

# clear session

# Session B
claude
# instruct agent:
# fetch issue 24
# rebuild working context from issue
# continue at next unchecked phase
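
"Continue at next unchecked phase" can be made mechanical if the issue body uses the checkbox format from step 4. The sketch below parses a body string; in practice the body could be fetched with something like `gh issue view 24 --json body --jq .body`, but here it is inlined so the example runs on its own. `next_unchecked_phase` is a hypothetical helper.

```python
import re
from typing import Optional

ISSUE_BODY = """\
## Current status
- [x] Phase 1: parser changes
- [ ] Phase 2: matcher abstraction

## Decisions locked
- required flags, not positional
"""

# Matches task-list lines such as "- [x] Phase 1: parser changes".
CHECKBOX = re.compile(r"^- \[( |x)\] (Phase \d+: .+)$", re.MULTILINE)


def next_unchecked_phase(body: str) -> Optional[str]:
    """Return the label of the first unchecked phase, or None if all are done."""
    for mark, label in CHECKBOX.findall(body):
        if mark == " ":
            return label
    return None


print(next_unchecked_phase(ISSUE_BODY))  # prints "Phase 2: matcher abstraction"
```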

6. Resynchronize the filesystem deliberately

Files can change out-of-band between sessions: if an operator runs a formatter or edits a file by hand, the agent’s prior mental model of that file is stale. An explicit refresh step forces the agent to re-read specific modules before executing the next phase.

Read issue 24.
Re-read parser.ts and parser.test.ts.
Assume any earlier mental model is stale.
Continue at Phase 2 only after confirming current file state.
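
One way to make "confirming current file state" concrete is to record a content hash for each listed file at checkpoint time and compare on resume. This is an illustrative addition, not part of the pattern as described above; `file_digest` and `stale_files` are hypothetical helpers, and the example uses a temporary directory rather than a real repository.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_files(recorded: dict[str, str], root: Path) -> list[str]:
    """Return files that are missing or whose hash differs from the record."""
    stale = []
    for rel, digest in recorded.items():
        path = root / rel
        if not path.exists() or file_digest(path) != digest:
            stale.append(rel)
    return sorted(stale)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "parser.ts").write_text("export const a = 1;\n")
    recorded = {"parser.ts": file_digest(root / "parser.ts")}  # at checkpoint
    (root / "parser.ts").write_text("export const a = 1\n")    # formatter ran
    result = stale_files(recorded, root)

print(result)  # prints "['parser.ts']"
```

A non-empty result is a signal to re-read those files before editing, regardless of what the agent believes it remembers.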

7. Keep planning prompts and execution prompts structurally different

Mode confusion occurs when planning and execution prompts sound similar. A planning prompt asks for unresolved questions first; an execution prompt asks for a bounded diff against an existing plan.
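
A simple way to enforce the structural difference is to generate the two prompt types from separate templates with different headers and different required inputs. The wording and the `MODE:` header below are assumptions for illustration, not a documented prompt format.

```python
def planning_prompt(task: str) -> str:
    """Planning prompts need only a task and must forbid edits."""
    return "\n".join([
        "MODE: PLAN",
        f"Task: {task}",
        "Output: unresolved questions first, then a phased plan.",
        "Do not edit files.",
    ])


def execution_prompt(issue: int, phase: str) -> str:
    """Execution prompts require an existing plan and a single phase."""
    return "\n".join([
        "MODE: EXECUTE",
        f"Plan: issue {issue}",
        f"Scope: {phase} only",
        "Output: a bounded diff. Do not reopen locked decisions.",
    ])


plan_text = planning_prompt("CLI refactor")
exec_text = execution_prompt(24, "Phase 2")
print(plan_text.splitlines()[0])  # prints "MODE: PLAN"
print(exec_text.splitlines()[0])  # prints "MODE: EXECUTE"
```

Because `execution_prompt` cannot be called without an issue number and a phase, it is hard to drift into execution without a plan to execute against.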

Where It Breaks

| Scenario | Failure mode | Mitigation |
| --- | --- | --- |
| Context collapse without checkpoints | Session becomes slower and noisier over time | Persist execution state before degradation |
| State drift from out-of-band edits | Agent patches code against a stale mental model | Explicitly instruct agent to re-read files upon resume |
| Mode confusion | Agent continues planning during execution | Keep planning and execution prompts structurally different |
| Rapid parallel human edits | Repository changes invalidate the checkpoint | Ensure the checkpoint locks specific, stable decisions |
| Summary drift | Each new session interprets the checkpoint differently | Make the checkpoint format stricter and operationally specific |
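
The mitigation for summary drift, a stricter checkpoint format, can be checked mechanically before a checkpoint is accepted. The sketch below assumes four fixed section headings; three follow the body shape shown in step 4, while "## Remaining risks" is a hypothetical heading added here for the risks section.

```python
# Assumed fixed headings; "## Remaining risks" is hypothetical.
REQUIRED_SECTIONS = [
    "## Current status",
    "## Decisions locked",
    "## Remaining risks",
    "## Resume instruction",
]


def missing_sections(body: str) -> list[str]:
    """Return required headings that do not appear as their own line."""
    present = {line.strip() for line in body.splitlines()}
    return [heading for heading in REQUIRED_SECTIONS if heading not in present]


draft = """\
## Current status
- [x] Phase 1: parser changes

## Decisions locked
- required flags, not positional

## Resume instruction
Start at Phase 2.
"""

print(missing_sections(draft))  # prints "['## Remaining risks']"
```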

What to Do Next

  • Problem: Long-running LLM coding sessions fail due to context collapse and state drift.
  • Solution: Separate planning from execution and externalize multi-phase checklists into GitHub issues.
  • Proof: As context fills with tool output, instruction adherence tends to degrade; clearing context and rehydrating from external text restores a low baseline.
  • Action: Adopt a lightweight GitHub issue template with fixed sections for completion state, locked decisions, open risks, and exact resume instructions to make cross-session recovery reliable.