Database Runbooks as Agent Contracts

A runbook that depends on human intuition is not ready for an agent.

Situation

Most database runbooks were written for experienced operators. They say check replication lag, inspect locks, validate backup health, or apply the standard rollback. A human knows which command to use, which output is suspicious, and when to stop.

This gap matters for database, cloud, and platform teams because agents do not operate in a vacuum. They inherit repository rules, tool permissions, deployment workflows, incident history, and the quality of the evidence available to them.

Operating layer	Default approach	Better alternative
Context	Rely on a long prompt or chat history	Give the agent task-specific evidence and rules
Tooling	Expose broad tools and inspect later	Expose narrow tools with clear approval boundaries
Verification	Read the final answer	Check the artifact, trace, and final state

The Problem

Agents need the contract that experienced operators carry in their heads. Without exact inputs, commands, expected outputs, thresholds, and stop conditions, the agent fills gaps with inference. That is not acceptable for production databases.

The practical question is not whether an agent can produce a convincing response. The question is whether the engineering system around that response makes the work observable, reversible, and reviewable.

Failure point	What breaks	Why it matters
Weak boundary	Agent authority is broader than the task	A diagnostic run can become an unsafe change
Missing evidence	The agent cannot cite the state it used	Review becomes opinion instead of verification
No lifecycle	The workflow ends at a message	Ownership, audit, cleanup, and rollback disappear

Runbook Contract Architecture

Convert each runbook into a contract with five parts: trigger, allowed tools, required observations, decision table, and completion proof.

flowchart TD
    A[task request — bounded intent] --> B[runbook contract architecture — controls]
    B --> C[tool execution — evidence collected]
    C --> D[verification — final state checked]
    D --> E[human handoff — audit retained]
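
One way to make the five parts concrete is to treat the contract as a typed record that can be checked before an agent ever runs it. The sketch below is a minimal illustration, not a standard schema: the class name, field names, and the missing_parts helper are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class RunbookContract:
    """Five-part runbook contract. All names here are illustrative assumptions."""

    trigger: str = ""  # condition that starts the runbook
    allowed_tools: list[str] = field(default_factory=list)
    required_observations: list[str] = field(default_factory=list)
    decision_table: dict[str, str] = field(default_factory=dict)  # outcome -> action
    completion_proof: str = ""  # artifact or state check that ends the run

    def missing_parts(self) -> list[str]:
        """Return the names of contract parts that are still empty."""
        parts = {
            "trigger": self.trigger,
            "allowed_tools": self.allowed_tools,
            "required_observations": self.required_observations,
            "decision_table": self.decision_table,
            "completion_proof": self.completion_proof,
        }
        return [name for name, value in parts.items() if not value]


# Hypothetical draft contract that has no completion proof yet.
draft = RunbookContract(
    trigger="replication lag alert",
    allowed_tools=["read_only_sql"],
    required_observations=["lag_seconds"],
    decision_table={"lag_above_threshold": "escalate"},
)
print(draft.missing_parts())
```

Rejecting a contract whose missing_parts list is non-empty keeps an incomplete runbook from reaching the agent in the first place.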

1. Define the operating boundary. Write down the task class, allowed tools, environment, data class, and approval mode before the agent runs.
2. Shape the evidence. Return compact observations instead of raw dumps. The agent should see enough to reason, but not so much that context is wasted.
3. Require proof of completion. Completion should be an artifact or state check: a passing test, a reviewed plan, a valid rollback, a trace, or a linked ticket.
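
The "shape the evidence" step can be sketched as a small reducer that turns raw per-replica rows into one compact observation. This is only an illustration: the row keys (replica, lag_seconds), the function name, and the 30-second default threshold are assumptions, not values taken from any particular database or tool.

```python
def summarize_replicas(rows, lag_threshold_s=30.0):
    """Reduce raw per-replica rows to a compact observation for the agent.

    `rows` is assumed to be a list of dicts with hypothetical keys
    "replica" and "lag_seconds".
    """
    lags = [row["lag_seconds"] for row in rows]
    over = sorted(
        row["replica"] for row in rows if row["lag_seconds"] > lag_threshold_s
    )
    return {
        "replica_count": len(rows),
        "max_lag_seconds": max(lags) if lags else None,
        "replicas_over_threshold": over,
        "threshold_seconds": lag_threshold_s,
    }


# Made-up rows standing in for a raw tool result.
raw_rows = [
    {"replica": "replica-1", "lag_seconds": 2.0},
    {"replica": "replica-2", "lag_seconds": 45.5},
]
observation = summarize_replicas(raw_rows)
```

The agent receives four fields instead of the full result set, and the threshold used travels with the observation so a reviewer can see what "over threshold" meant at the time.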

For each operational workflow, define what the agent may read, what it may draft, what requires approval, and which evidence must be attached to the final answer.
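
The read, draft, and approval tiers can be expressed as an explicit policy that is consulted before any tool call. The following is a hedged sketch: the action names, the Authority enum, and the authorize function are hypothetical, and a real deployment would enforce this in the tool layer rather than in agent-side code.

```python
from enum import Enum


class Authority(Enum):
    READ = "read"
    DRAFT = "draft"
    NEEDS_APPROVAL = "needs_approval"
    DENIED = "denied"


# Hypothetical policy for one workflow; action names are illustrative only.
POLICY = {
    "query_replication_status": Authority.READ,
    "draft_failover_plan": Authority.DRAFT,
    "promote_replica": Authority.NEEDS_APPROVAL,
}


def authorize(action, approved_by=None):
    """Return True only if the action may run now under the policy."""
    # Anything not listed in the policy is denied by default.
    level = POLICY.get(action, Authority.DENIED)
    if level in (Authority.READ, Authority.DRAFT):
        return True
    if level is Authority.NEEDS_APPROVAL:
        return approved_by is not None
    return False
```

Denying unlisted actions by default is the design choice that matters here: the agent's authority is whatever the contract names, and nothing more.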

In Practice

Context: OpenAI’s Codex loop shows that tool outputs become future prompt context. A runbook therefore shapes not only the current action but the next reasoning step. Source: OpenAI, Unrolling the Codex agent loop.

Action: For each operational workflow, define what the agent may read, what it may draft, what requires approval, and which evidence must be attached to the final answer.

Result: A contract runbook can be tested in an eval harness against historical incidents before it is used in production.

Learning: Convert each runbook into a contract with five parts: trigger, allowed tools, required observations, decision table, and completion proof. This recommendation rests on documented patterns and on how the named systems behave, not on an invented production anecdote.
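
The Result above can be sketched as a small replay harness: feed recorded observations through the contract's decision logic and compare against what the operator actually did. Everything in this example is an assumption made for illustration, including the incident records (which are invented, not real history), the 60-second threshold, and the function names.

```python
def replay(decide, incidents):
    """Run a decision function over recorded incidents and collect mismatches."""
    failures = []
    for incident in incidents:
        got = decide(incident["observation"])
        if got != incident["expected_action"]:
            failures.append(
                {
                    "id": incident["id"],
                    "expected": incident["expected_action"],
                    "got": got,
                }
            )
    return failures


def decide(observation):
    """Toy decision table: escalate when lag exceeds an assumed 60 s threshold."""
    if observation["max_lag_seconds"] > 60:
        return "escalate"
    return "continue_monitoring"


# Invented records for illustration only.
INCIDENTS = [
    {"id": "INC-A", "observation": {"max_lag_seconds": 5},
     "expected_action": "continue_monitoring"},
    {"id": "INC-B", "observation": {"max_lag_seconds": 300},
     "expected_action": "escalate"},
    {"id": "INC-C", "observation": {"max_lag_seconds": 60},
     "expected_action": "escalate"},
]

failures = replay(decide, INCIDENTS)
```

In this sketch the harness flags INC-C, because the toy rule uses a strict comparison while the recorded operator escalated at exactly 60 seconds. That kind of boundary disagreement is what replay is meant to surface before production use.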

Where It Breaks

Failure mode	Trigger	Fix
Ambiguous command	Runbook says check lag without naming query	Provide exact SQL or script
Hidden threshold	Only humans know what value is bad	Write thresholds and escalation rules
No abort path	Agent continues after unexpected output	Define stop conditions
No completion proof	Agent summarizes instead of verifying	Require evidence artifact and owner handoff
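
The "No abort path" row can be sketched as a runner that checks every output against an explicit expectation and stops at the first surprise. This is a minimal illustration under assumed names: the step dictionary shape, StopConditionHit, and run_steps are hypothetical, and the step callables stand in for real tool invocations.

```python
class StopConditionHit(Exception):
    """Raised when an observation falls outside what the runbook expects."""


def run_steps(steps):
    """Execute steps in order and abort at the first unexpected output.

    Each step is assumed to be a dict with "name", a zero-argument "run"
    callable, and an "expected" predicate over the output.
    """
    evidence = []
    for step in steps:
        output = step["run"]()
        if not step["expected"](output):
            # Stop here: later steps never execute after a surprise.
            raise StopConditionHit(
                f"{step['name']}: unexpected output {output!r}"
            )
        evidence.append({"step": step["name"], "output": output})
    return evidence
```

The returned evidence list doubles as the start of a completion artifact, since it records which step produced which output in order.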

What to Do Next

Problem: Agents need the contract that experienced operators carry in their heads. Without exact inputs, commands, expected outputs, thresholds, and stop conditions, the agent fills gaps with inference. That is not acceptable for production databases.
Solution: Convert each runbook into a contract with five parts: trigger, allowed tools, required observations, decision table, and completion proof.
Proof: A contract runbook can be tested in an eval harness against historical incidents before it is used in production.
Action: Pick the replication-lag runbook and rewrite it as trigger, inputs, commands, thresholds, abort conditions, and proof of completion.
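
As a starting point for that rewrite, here is one possible shape for a replication-lag contract. It is a sketch under loud assumptions: it assumes PostgreSQL streaming replication with the query run on the replica, and the thresholds, action names, and dictionary layout are placeholders to be replaced with your own values.

```python
# Assumption: PostgreSQL, query executed on the replica. Other engines need
# a different command. Thresholds and action names are placeholders.
LAG_RUNBOOK = {
    "trigger": "replication lag alert fires for a replica",
    "inputs": ["replica_host"],
    "commands": {
        "lag_seconds": (
            "SELECT EXTRACT(EPOCH FROM "
            "(now() - pg_last_xact_replay_timestamp())) AS lag_seconds;"
        ),
    },
    "thresholds": {"warn_seconds": 30, "page_seconds": 300},
    "abort_conditions": [
        "query returns NULL (for example, when run on a primary)",
        "query errors or times out",
    ],
    "proof_of_completion": (
        "lag below warn_seconds on two consecutive checks, linked in the ticket"
    ),
}


def next_action(lag_seconds, thresholds=LAG_RUNBOOK["thresholds"]):
    """Map one lag reading to the next action in the decision table."""
    if lag_seconds is None:
        return "abort"  # stop condition: no usable reading
    if lag_seconds >= thresholds["page_seconds"]:
        return "page_oncall"
    if lag_seconds >= thresholds["warn_seconds"]:
        return "keep_observing"
    return "record_proof"
```

One caveat worth writing into the real contract: this query measures time since the last replayed transaction, so it can grow on a replica whose primary is simply idle. That is the sort of knowledge a human operator holds implicitly and an agent needs stated.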

The teams that get value from agents will not be the teams with the longest prompts. They will be the teams that turn agent work into a controlled engineering workflow.

Rajiv
