Runtime Boundaries for Agentic App Builders
A Replit-for-agents clone fails when the mobile chat is treated as the platform instead of the control plane. The common version is “Swift app calls a coding agent and opens the last URL it sees.” The production version is a hosted agent bridge: the iOS app is the orchestration UI that sends user intent and renders job state, while secrets, sandboxed execution, logs, retries, and preview artifacts live server-side.
Situation
AI app builders are moving from desktop coding assistants into chat-shaped product surfaces: mobile clients, internal portals, Slack commands, and browser agents. That shift changes the blast radius. A failed Codex or Claude Code session on a laptop is annoying; a failed hosted builder can leak API keys, fork duplicate projects, or leave paid model jobs running for 30 minutes.
| | Mobile-agent wrapper | Hosted agent bridge |
|---|---|---|
| Runtime | Agent logic pushed near the client | Agent logic runs behind an API |
| Secrets | Tempting to store in app config | Kept server-side or minted as short-lived tokens |
| Preview | Parse URL from assistant text | Typed artifact returned by job system |
| Failure handling | Hung chat bubble | Observable state machine with retries |
The important correction is that this is not “building Replit” yet. It is a prototype wrapper around a coding command-line interface (CLI), a tool run from a shell. That can still be useful, but only if the architecture admits what it is.
The Problem
The failure mode is not that the agent is bad at Swift. The failure mode is boundary confusion: chat, agent reasoning, generated-code execution, preview hosting, and deployment state are allowed to blur together.
| Failure point | What breaks | Why it matters |
|---|---|---|
| API keys in iOS | Claude, Vibe Code, or deployment keys can be extracted from binaries or local storage | Mobile clients are inspectable; “private app” is not a security boundary |
| Last-link parsing | The app opens the wrong URL or an old preview | Large language model (LLM) prose is not a protocol |
| No idempotency key | Mobile retry creates two projects from one prompt | Flaky networks become duplicate builds and inconsistent project history |
| Long-running build in chat state | “Jerry is thinking” hides compile, install, test, and deploy phases | Users cannot tell whether to wait, retry, or inspect logs |
| No cost accounting | Reasoning mode and tool calls run without budget visibility | A single build loop can quietly become the most expensive button in the app |
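
To make the last-link parsing row concrete, here is a minimal sketch of how scraping the final URL from assistant prose can pick a stale preview. The helper `last_url_in_text`, the example reply, the URLs, and the IDs are all hypothetical and exist only for illustration.

```python
import re
from typing import Optional

# Loose URL matcher of the kind a client might use to scrape chat text.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def last_url_in_text(message: str) -> Optional[str]:
    """Naive approach: return whatever URL appears last in the prose."""
    urls = URL_PATTERN.findall(message)
    return urls[-1] if urls else None


# Hypothetical assistant reply: the current preview is mentioned first,
# then the previous build is referenced for comparison.
assistant_text = (
    "Build 42 is live at https://preview.example.test/b/42 . "
    "The earlier version is still at https://preview.example.test/b/41"
)

# Hypothetical typed artifact a job system could emit for the same build.
artifact = {
    "type": "preview_ready",
    "url": "https://preview.example.test/b/42",
    "project_id": "proj_1",
    "build_id": "42",
}

scraped = last_url_in_text(assistant_text)
print("scraped:", scraped)          # the old build, because it came last
print("artifact:", artifact["url"])  # the build the job system produced
```

The scraper is not buggy; it does exactly what it was asked. The problem is that sentence order in a model reply carries no guarantee about which build is current.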
There is also a platform trap. If the client is a native iOS app that creates apps, executes generated code, or exposes app-building behavior, Apple review policy becomes part of the architecture. For personal use, a web app may be the right first target: faster iteration, fewer distribution constraints, and a cleaner fit for backend-heavy agent workflows.
The Implementation
The right architecture is a hosted agent bridge with typed artifacts. The iOS app is an orchestration UI. The bridge owns agent execution. The sandbox owns generated code. The preview service owns URLs. Datadog, OpenTelemetry, or LangSmith-style traces own the postmortem.
flowchart TD
Client[iOS client] --> Bridge[agent-bridge-api]
Bridge --> Agent[Claude Agent SDK — tool contract]
Agent --> Sandbox[sandbox — isolated job with timeout]
Sandbox --> CLI[vibe-code-cli — build, test, artifact manifest]
CLI --> Preview[preview host — immutable bundle]
Preview --> Bridge
Bridge --> Client
Bridge --> Trace[Datadog — request, model mode, cost]
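
The sandbox node in the flow above is described as an isolated job with a timeout. The sketch below shows only the timeout, throwaway-workspace, and environment-allowlist parts using the Python standard library. It is not real isolation: a production sandbox would add the container or microVM boundary and network policy. `run_job`, `JobResult`, and `ALLOWED_ENV` are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumption: only these non-secret variables are passed through to a job.
ALLOWED_ENV = ("PATH", "LANG", "LD_LIBRARY_PATH", "SYSTEMROOT")


@dataclass
class JobResult:
    exit_code: Optional[int]  # None when the job was killed on timeout
    timed_out: bool
    log: str


def run_job(command: Sequence[str], timeout_s: float) -> JobResult:
    """Run one build command in a throwaway workspace with a hard timeout."""
    env = {k: os.environ[k] for k in ALLOWED_ENV if k in os.environ}
    # Each job gets its own directory, deleted when the block exits.
    with tempfile.TemporaryDirectory(prefix="job-") as workspace:
        try:
            proc = subprocess.run(
                list(command),
                cwd=workspace,
                env=env,  # the bridge's own secrets are not inherited
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child process before raising.
            return JobResult(exit_code=None, timed_out=True, log="timed out")
        return JobResult(proc.returncode, False, proc.stdout + proc.stderr)


# Usage: a stand-in "build" that just prints a line.
result = run_job([sys.executable, "-c", "print('built')"], timeout_s=30)
print(result.exit_code, result.timed_out)
```

Returning a structured result rather than raising lets the bridge map a timeout onto a retryable failure state instead of a hung chat bubble.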
- Define the bridge contract first: POST /agent/messages, GET /projects/{id}/events, and a typed event schema for agent_thinking, build_running, preview_ready, and failed_retryable.
  Confirm: the Swift client can render every state from mocked JSON.
- Keep Claude Agent SDK and Vibe Code CLI credentials out of the mobile app. Use server-side secrets, per-job environment variables, and short-lived preview tokens.
  Confirm: no production key appears in the .ipa, app logs, or device storage.
- Run generated code in isolated workspaces with timeouts, network policy, dependency allowlists, and artifact cleanup. Firecracker, Docker with strict profiles, or a managed sandbox can work; the boundary matters more than the brand.
  Confirm: one failed build cannot mutate another project or read another job’s files.
- Emit typed artifacts instead of scraping assistant text. A preview is {type, url, project_id, build_id}, not “the last URL in the message.”
  Confirm: the newest preview opens deterministically after retries and revisions.
- Use tiered model reasoning. Fast mode is right for UI glue, copy edits, and conventional CRUD screens. High reasoning belongs on architecture, ambiguous build failures, security review, and final diff review.
  Confirm: cost and latency are logged per request, not guessed from the invoice.
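
The event contract from the steps above can be sketched as a small parser that rejects anything outside the four named states. The four event names and the preview fields come from the text; the flat JSON envelope, the `BridgeEvent` class, and `parse_event` are assumptions made for illustration, not a defined wire format.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The four states named in the bridge contract.
EVENT_TYPES = frozenset(
    {"agent_thinking", "build_running", "preview_ready", "failed_retryable"}
)


@dataclass(frozen=True)
class BridgeEvent:
    type: str
    project_id: str
    build_id: Optional[str] = None
    url: Optional[str] = None  # only meaningful for preview_ready


def parse_event(raw: str) -> BridgeEvent:
    """Decode one event and fail loudly on anything outside the contract."""
    data = json.loads(raw)
    event_type = data["type"]
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    if event_type == "preview_ready":
        # A preview must carry its URL and build; otherwise it is not typed.
        missing = [k for k in ("url", "build_id") if not data.get(k)]
        if missing:
            raise ValueError(f"preview_ready missing fields: {missing}")
    return BridgeEvent(
        type=event_type,
        project_id=data["project_id"],
        build_id=data.get("build_id"),
        url=data.get("url"),
    )


# Mocked JSON of the kind a client test fixture might contain.
mocked = json.dumps({
    "type": "preview_ready",
    "url": "https://preview.example.test/b/42",
    "project_id": "proj_1",
    "build_id": "42",
})
event = parse_event(mocked)
print(event.type, event.url)
```

The same fixtures can drive the Swift client's rendering tests, which is what the first confirmation step asks for.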
A design tool such as Stitch, Figma, or Paper can sit before implementation. That separation is healthy: design exploration should not compete with build repair in the same agent loop.
In Practice
The patterns below are a mechanism-based failure analysis, derived from how agentic app builder architectures behave; they are not drawn from a specific published postmortem. The simpler version of an agentic app builder ships first: the mobile client calls the agent API, the agent returns a URL in its response text, and the client parses and opens it. That design creates predictable breakpoints because the client, bridge, sandbox, and preview service share one loosely typed conversation.
Action: Split the workflow into typed events and persisted job records. A mobile retry after a network timeout should reuse an idempotency_key tied to the user action, not the HTTP call. Preview delivery should emit a typed preview_ready artifact — {type, url, project_id, build_id} — rather than asking the client to parse the last blue link in a model message. Cost tracking should persist model_mode and cost_cents per job, not wait for the monthly invoice.
Result: The validation signal is operational determinism. Duplicate project creation becomes detectable. Preview URLs stop depending on LLM prose formatting. A 15-20 minute build loop is visible as a specific job with cost, logs, artifacts, and exit code. Secret exposure risk moves out of the iOS app because execution happens behind the bridge with short-lived scoped tokens.
Learning: Agent quality is not the limiting factor in these failures. Runtime ownership is. Once the bridge owns execution, the client renders events rather than managing state, the sandbox becomes a replaceable implementation detail, and preview delivery stops depending on prose formatting. URLs are not an API just because they are blue.
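
A minimal sketch of the idempotency and cost-tracking behavior described in this section, assuming an in-memory store. The field names `idempotency_key`, `model_mode`, and `cost_cents` come from the text; `JobStore`, `JobRecord`, and their methods are hypothetical, and a real bridge would back this with a database unique constraint rather than a dict.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class JobRecord:
    job_id: str
    idempotency_key: str
    prompt: str
    model_mode: str
    cost_cents: int = 0
    status: str = "queued"


class JobStore:
    """In-memory stand-in for a persisted job table."""

    def __init__(self) -> None:
        self._by_key: Dict[str, JobRecord] = {}

    def create_or_get(
        self, idempotency_key: str, prompt: str, model_mode: str
    ) -> Tuple[JobRecord, bool]:
        """Return (job, created). A retried request reuses the first job."""
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            return existing, False
        job = JobRecord(uuid.uuid4().hex, idempotency_key, prompt, model_mode)
        self._by_key[idempotency_key] = job
        return job, True

    def add_cost(self, idempotency_key: str, cents: int) -> None:
        """Accumulate spend on the job as model calls complete."""
        self._by_key[idempotency_key].cost_cents += cents


store = JobStore()
# The key is minted once per user action (for example, one tap on "Build"),
# so the HTTP retry after a timeout sends the same key.
first, created_first = store.create_or_get("tap-001", "todo app", "fast")
retry, created_retry = store.create_or_get("tap-001", "todo app", "fast")
store.add_cost("tap-001", 12)
print(first.job_id == retry.job_id, created_retry, first.cost_cents)
```

Because the retry resolves to the existing record, duplicate project creation becomes a detectable event (`created` is false) rather than a silent second build.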
Where It Breaks
| Failure mode | Trigger | Fix |
|---|---|---|
| App Store rejection risk | Native app lets users generate or execute app-like code | Start as web app, or get explicit policy review before native distribution |
| Duplicate projects | iOS retries POST /agent/messages after timeout | Require idempotency_key per user action |
| Secret exposure | API keys placed in Swift config, Keychain, or bundled plist | Move execution to hosted bridge; use short-lived scoped tokens only |
| Runaway model spend | Maximum reasoning used for every edit-test cycle | Route by task type: fast for routine edits, high for architecture and failure analysis |
| Broken preview state | Assistant returns multiple links, old links, or Markdown-formatted links | Return typed preview_ready artifacts from the bridge |
| Non-reproducible builds | Sandbox installs floating dependencies on every run | Lock package versions, persist manifest, store generated files and command logs |
| Weak observability | Only client chat transcript is saved | Capture agent trace, CLI logs, exit code, artifacts, and cost per build |
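
The runaway-spend fix in the table, routing by task type, can be sketched as a lookup plus a per-request log entry. The task categories mirror the ones named earlier in the article; the identifiers, the `choose_model_mode` function, and the choice to default unknown tasks to the fast tier are assumptions for illustration.

```python
from typing import Dict, Union

# Task types the article assigns to the high-reasoning tier (names assumed).
HIGH_REASONING_TASKS = frozenset({
    "architecture",
    "ambiguous_build_failure",
    "security_review",
    "final_diff_review",
})


def choose_model_mode(task_type: str) -> str:
    """Route routine edits to the fast tier and hard problems to the high tier.

    Assumption: unknown task types fall back to "fast" so that a new,
    unclassified task cannot silently run at maximum cost.
    """
    return "high" if task_type in HIGH_REASONING_TASKS else "fast"


def request_log_entry(
    task_type: str, latency_ms: int, cost_cents: int
) -> Dict[str, Union[str, int]]:
    """Build the per-request record so cost is logged, not inferred later."""
    return {
        "task_type": task_type,
        "model_mode": choose_model_mode(task_type),
        "latency_ms": latency_ms,
        "cost_cents": cost_cents,
    }


print(request_log_entry("copy_edit", latency_ms=850, cost_cents=1))
print(request_log_entry("security_review", latency_ms=42000, cost_cents=37))
```

Keeping the routing rule in one function makes the spend policy reviewable in a single place instead of scattered across prompts.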
What to Do Next
- Problem: agentic app builders fail when chat UI, agent runtime, generated-code execution, and preview delivery are mixed together.
- Solution: build a hosted agent bridge with typed events, sandboxed jobs, server-side secrets, and deterministic preview artifacts.
- Proof: the first validation is operational, meaning retry safety, reproducible logs, visible cost, and previews that open without parsing LLM prose.
- Action: this week, write the bridge contract: message schema, artifact schema, error taxonomy, idempotency rules, and the exact log fields every build must persist.