A Replit-for-agents clone fails when the mobile chat is treated as the platform instead of the control plane. The common version is “Swift app calls a coding agent and opens the last URL it sees.” The production version is a hosted agent bridge: the iOS app orchestrates state, while secrets, sandboxed execution, logs, retries, and preview artifacts live server-side.

Situation

AI app builders are moving from desktop coding assistants into chat-shaped product surfaces: mobile clients, internal portals, Slack commands, and browser agents. That shift changes the blast radius. A failed Codex or Claude Code session on a laptop is annoying; a failed hosted builder can leak API keys, fork duplicate projects, or leave paid model jobs running for 30 minutes.

| | Mobile-agent wrapper | Hosted agent bridge |
|---|---|---|
| Runtime | Agent logic pushed near the client | Agent logic runs behind an API |
| Secrets | Tempting to store in app config | Kept server-side or minted as short-lived tokens |
| Preview | Parse URL from assistant text | Typed artifact returned by job system |
| Failure handling | Hung chat bubble | Observable state machine with retries |

The important correction is that this is not “building Replit” yet. It is a prototype wrapper around a coding command-line interface (CLI), a tool run from a shell. That can still be useful, but only if the architecture admits what it is.

The Problem

The failure mode is not that the agent is bad at Swift. The failure mode is boundary confusion: chat, agent reasoning, generated-code execution, preview hosting, and deployment state are allowed to blur together.

| Failure point | What breaks | Why it matters |
|---|---|---|
| API keys in iOS | Claude, Vibe Code, or deployment keys can be extracted from binaries or local storage | Mobile clients are inspectable; “private app” is not a security boundary |
| Last-link parsing | The app opens the wrong URL or an old preview | Large language model (LLM) prose is not a protocol |
| No idempotency key | Mobile retry creates two projects from one prompt | Flaky networks become duplicate builds and inconsistent project history |
| Long-running build in chat state | “Jerry is thinking” hides compile, install, test, and deploy phases | Users cannot tell whether to wait, retry, or inspect logs |
| No cost accounting | Reasoning mode and tool calls run without budget visibility | A single build loop can quietly become the most expensive button in the app |
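
The last-link failure is easy to reproduce. The sketch below is illustrative only: the message text, the example.com URLs, and the `last_url` helper are assumptions made up for the demonstration, not output from any real agent.

```python
import re
from typing import Optional

# Deliberately naive pattern: grab anything that looks like a URL.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def last_url(text: str) -> Optional[str]:
    """Return the last URL-looking substring in assistant prose."""
    matches = URL_PATTERN.findall(text)
    return matches[-1] if matches else None


# Hypothetical assistant message: the preview link comes first, a docs link last.
message = (
    "Your preview is ready at https://preview.example.com/builds/42 and "
    "if the login screen fails, see https://docs.example.com/auth."
)

picked = last_url(message)
print(picked)
```

The helper returns the docs link rather than the preview, and the sentence's closing period is swallowed into the URL. Both problems come from treating prose as a wire format.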

There is also a platform trap. If the client is a native iOS app that creates apps, executes generated code, or exposes app-building behavior, Apple review policy becomes part of the architecture. For personal use, a web app may be the right first target: faster iteration, fewer distribution constraints, and a cleaner fit for backend-heavy agent workflows.

The Implementation

The right architecture is a hosted agent bridge with typed artifacts. The iOS app is an orchestration UI. The bridge owns agent execution. The sandbox owns generated code. The preview service owns URLs. Datadog, OpenTelemetry, or LangSmith-style traces own the postmortem.

flowchart TD
    Client[iOS client] --> Bridge[agent-bridge-api]
    Bridge --> Agent["Claude Agent SDK: tool contract"]
    Agent --> Sandbox["sandbox: isolated job with timeout"]
    Sandbox --> CLI["vibe-code-cli: build, test, artifact manifest"]
    CLI --> Preview["preview host: immutable bundle"]
    Preview --> Bridge
    Bridge --> Client
    Bridge --> Trace["Datadog: request, model mode, cost"]

  1. Define the bridge contract first: POST /agent/messages, GET /projects/{id}/events, and a typed event schema for agent_thinking, build_running, preview_ready, and failed_retryable.
    Confirm: the Swift client can render every state from mocked JSON.

  2. Keep Claude Agent SDK and Vibe Code CLI credentials out of the mobile app. Use server-side secrets, per-job environment variables, and short-lived preview tokens.
    Confirm: no production key appears in the .ipa, app logs, or device storage.

  3. Run generated code in isolated workspaces with timeouts, network policy, dependency allowlists, and artifact cleanup. Firecracker, Docker with strict profiles, or a managed sandbox can work; the boundary matters more than the brand.
    Confirm: one failed build cannot mutate another project or read another job’s files.

  4. Emit typed artifacts instead of scraping assistant text. A preview is {type, url, project_id, build_id}, not “the last URL in the message.”
    Confirm: the newest preview opens deterministically after retries and revisions.

  5. Use tiered model reasoning. Fast mode is right for UI glue, copy edits, and conventional CRUD screens. High reasoning belongs on architecture, ambiguous build failures, security review, and final diff review.
    Confirm: cost and latency are logged per request, not guessed from the invoice.
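
Steps 1 and 4 can be sketched as a small typed event parser. This is a minimal sketch, not a definitive schema: the four event names and the {type, url, project_id, build_id} preview shape come from the steps above, while the class names, the `parse_event` function, and the rule that every event carries a project_id are assumptions.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Event names from the bridge contract; everything else here is assumed.
EVENT_TYPES = {"agent_thinking", "build_running", "preview_ready", "failed_retryable"}


@dataclass(frozen=True)
class PreviewArtifact:
    url: str
    project_id: str
    build_id: str


@dataclass(frozen=True)
class BridgeEvent:
    type: str
    project_id: str
    preview: Optional[PreviewArtifact] = None


def parse_event(raw: str) -> BridgeEvent:
    """Parse one JSON event and reject anything outside the contract."""
    data = json.loads(raw)
    event_type = data.get("type")
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    preview = None
    if event_type == "preview_ready":
        # The preview arrives as structured fields, never as prose to scrape.
        preview = PreviewArtifact(
            url=data["url"],
            project_id=data["project_id"],
            build_id=data["build_id"],
        )
    return BridgeEvent(type=event_type, project_id=data["project_id"], preview=preview)


event = parse_event(
    '{"type": "preview_ready", "url": "https://preview.example.com/b/7",'
    ' "project_id": "p1", "build_id": "b7"}'
)
print(event.preview.url)
```

Because unknown event types raise instead of passing through, a contract drift between bridge and client shows up as an error in testing rather than as a silent hung chat bubble.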

A design tool such as Stitch, Figma, or Paper can sit before implementation. That separation is healthy: design exploration should not compete with build repair in the same agent loop.

In Practice

The patterns below are a failure analysis reasoned from how agentic app-builder architectures behave, not findings from a specific published postmortem. The simple version usually ships first: the mobile client calls the agent API, the agent returns a URL in its response text, and the client parses and opens it. That design breaks in predictable places because the client, bridge, sandbox, and preview service all share one loosely typed conversation.

Action: Split the workflow into typed events and persisted job records. A mobile retry after a network timeout should reuse an idempotency_key tied to the user action, not the HTTP call. Preview delivery should emit a typed preview_ready artifact — {type, url, project_id, build_id} — rather than asking the client to parse the last blue link in a model message. Cost tracking should persist model_mode and cost_cents per job, not wait for the monthly invoice.
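
The retry rule can be sketched as follows. This is a minimal in-memory sketch under stated assumptions: `ProjectStore`, `create_project`, and the `proj_` id format are hypothetical names, and a real bridge would persist the key in a database with a unique constraint rather than in a dict.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProjectStore:
    """In-memory stand-in for a persisted job table."""
    projects_by_key: Dict[str, str] = field(default_factory=dict)
    prompts: Dict[str, str] = field(default_factory=dict)

    def create_project(self, idempotency_key: str, prompt: str) -> str:
        # A retry carrying the same key gets the original project back.
        existing = self.projects_by_key.get(idempotency_key)
        if existing is not None:
            return existing
        project_id = f"proj_{uuid.uuid4().hex[:8]}"
        self.projects_by_key[idempotency_key] = project_id
        self.prompts[project_id] = prompt
        return project_id


store = ProjectStore()

# The client generates the key once per user action (one tap on "Build")
# and reuses it for every HTTP retry of that action.
action_key = str(uuid.uuid4())
first = store.create_project(action_key, "todo app with login")
retry = store.create_project(action_key, "todo app with login")

# A second tap is a new user action, so it gets a new key and a new project.
second_tap = store.create_project(str(uuid.uuid4()), "todo app with login")
```

The key is tied to the tap, not to the request, which is why identical prompts can still produce two projects when the user genuinely asked twice.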

Result: The validation signal is operational determinism. Duplicate project creation becomes detectable. Preview URLs stop depending on LLM prose formatting. A 15-20 minute build loop is visible as a specific job with cost, logs, artifacts, and exit code. Secret exposure risk moves out of the iOS app because execution happens behind the bridge with short-lived scoped tokens.
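
A job that is visible "with cost, logs, artifacts, and exit code" could be recorded along these lines. This is a sketch under assumptions: `JobRecord`, `run_build_job`, and the status strings are hypothetical, cost_cents is passed in rather than computed, and a subprocess timeout is only a stand-in for the real sandbox boundary. It provides no isolation on its own.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRecord:
    build_id: str
    model_mode: str            # e.g. "fast" or "high"; labels are assumed
    cost_cents: int            # supplied by the caller in this sketch
    status: str                # "succeeded", "failed", or "timed_out"
    exit_code: Optional[int]   # None when the job was killed on timeout
    duration_s: float
    logs: str


def run_build_job(build_id: str, command: List[str], timeout_s: float,
                  model_mode: str, cost_cents: int) -> JobRecord:
    """Run one build command with a hard timeout and persist what happened."""
    started = time.monotonic()
    try:
        proc = subprocess.run(command, capture_output=True, text=True,
                              timeout=timeout_s)
        status = "succeeded" if proc.returncode == 0 else "failed"
        exit_code: Optional[int] = proc.returncode
        logs = proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing keeps running.
        status, exit_code = "timed_out", None
        logs = f"killed after {timeout_s}s timeout"
    return JobRecord(build_id, model_mode, cost_cents, status, exit_code,
                     time.monotonic() - started, logs)


ok = run_build_job("b1", [sys.executable, "-c", "print('built')"], 30, "fast", 3)
hung = run_build_job("b2", [sys.executable, "-c", "import time; time.sleep(5)"],
                     0.5, "high", 40)
print(ok.status, hung.status)
```

The point of the record is that a hung build ends as a row with a status and a duration, instead of as a chat bubble that never resolves.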

Learning: Agent quality is not the limiting factor in these failures. Runtime ownership is. Once the bridge owns execution, the client renders events rather than managing state, the sandbox becomes a replaceable implementation detail, and preview delivery stops depending on prose formatting. URLs are not an API just because they are blue.

Where It Breaks

| Failure mode | Trigger | Fix |
|---|---|---|
| App Store rejection risk | Native app lets users generate or execute app-like code | Start as web app, or get explicit policy review before native distribution |
| Duplicate projects | iOS retries POST /agent/messages after timeout | Require idempotency_key per user action |
| Secret exposure | API keys placed in Swift config, Keychain, or bundled plist | Move execution to hosted bridge; use short-lived scoped tokens only |
| Runaway model spend | Maximum reasoning used for every edit-test cycle | Route by task type: fast for routine edits, high for architecture and failure analysis |
| Broken preview state | Assistant returns multiple links, old links, or Markdown-formatted links | Return typed preview_ready artifacts from the bridge |
| Non-reproducible builds | Sandbox installs floating dependencies on every run | Lock package versions, persist manifest, store generated files and command logs |
| Weak observability | Only client chat transcript is saved | Capture agent trace, CLI logs, exit code, artifacts, and cost per build |
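
The runaway-spend fix can be sketched as a routing function plus a per-build budget. Everything named here is an assumption for illustration: the task labels, the "fast" and "high" mode names, and the `BuildBudget` class are not part of any vendor API. Defaulting unknown tasks to fast is a design choice, not a rule from the table.

```python
from dataclasses import dataclass

# Illustrative task labels, grouped the way the table suggests.
HIGH_REASONING_TASKS = {
    "architecture",
    "ambiguous_build_failure",
    "security_review",
    "final_diff_review",
}


def choose_model_mode(task_type: str) -> str:
    """Route by task type; anything not explicitly escalated stays on fast."""
    return "high" if task_type in HIGH_REASONING_TASKS else "fast"


@dataclass
class BuildBudget:
    limit_cents: int
    spent_cents: int = 0

    def charge(self, cost_cents: int) -> None:
        # Refuse the charge before spending, so a loop stops at the limit.
        if self.spent_cents + cost_cents > self.limit_cents:
            raise RuntimeError("build budget exceeded")
        self.spent_cents += cost_cents


budget = BuildBudget(limit_cents=100)
budget.charge(60)  # one hypothetical high-reasoning call
print(choose_model_mode("copy_edit"), choose_model_mode("security_review"))
```

A second 60-cent charge against this budget would raise, which turns an open-ended edit-test loop into a failure the bridge can report as an event.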

What to Do Next

  • Problem: agentic app builders fail when chat UI, agent runtime, generated-code execution, and preview delivery are mixed together.
  • Solution: build a hosted agent bridge with typed events, sandboxed jobs, server-side secrets, and deterministic preview artifacts.
  • Proof: the first validation is operational: retry safety, reproducible logs, visible cost, and previews that open without parsing LLM prose.
  • Action: this week, write the bridge contract: message schema, artifact schema, error taxonomy, idempotency rules, and the exact log fields every build must persist.