Ad-hoc prompting against a non-deterministic system produces non-deterministic results. It is time to stop re-typing the same EXPLAIN ANALYZE prompts and start treating LLMs like testable system components.

Situation

Every DBA has a mental library of prompts. The one that pastes in EXPLAIN ANALYZE output and asks for index candidates. The one that diffs a schema and asks for a migration with a matching rollback. The one that reads a PagerDuty timeline and drafts an RCA doc. You’ve typed variants of these hundreds of times. Each new Claude Code session starts blank, so you spend the first three minutes reconstructing context — the table names, the engine version, the constraint that you’re on Aurora MySQL 3.04 so generated columns behave differently, the rule that every migration must include a CONCURRENTLY index build to avoid table locks at 400M rows.

The Problem

At scale, this overhead burns countless engineering hours. More importantly, the output varies wildly. Ask the same slow-query prompt five times across a week and you will get five different index candidates, three different confidence levels, and at least one suggestion that would cause a lock timeout on production.

The deeper failure is that ad-hoc prompting defeats the one thing that makes LLMs useful at scale: constraining the output shape. When an ad-hoc prompt returns whatever the model decides is useful that day against a 200M-row orders_fact table, it is not an acceptable risk posture. How do we eliminate ad-hoc prompting and ensure our database automation is repeatable, testable, and constrained?

Core Concept

The fix is codification. Turn your most-used database workflows into named Claude Code skills, benchmark them against historical workloads, and automate the routine ones on a schedule.

Step 1: Extract skill candidates. Open a session and paste in your recent Jira or Linear ticket titles, PagerDuty alerts, and Slack threads. Identify recurring task patterns and group them by trigger type. Common candidates include slow query triage, index bloat checks, migration generation, schema drift detection, and RCA doc generation.

Step 2: Write the skill files. Skills live in .claude/skills/ as Markdown files. Each file is an instruction set structured like a runbook.

# slow-query-triage

## Purpose
Analyze a slow query on Aurora PostgreSQL and return structured optimization candidates.

## Inputs
- $QUERY: the slow SQL statement
- $EXPLAIN: output of EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) run against the query
- $ENGINE_VERSION: PostgreSQL major version (e.g., 15)

## Steps
1. Parse $EXPLAIN for sequential scans, hash joins on large row estimates, and high buffer hits
2. For each seq scan: estimate selectivity using pg_stats.n_distinct and pg_stats.most_common_vals
3. Propose CREATE INDEX CONCURRENTLY statements; prefer partial indexes where filter predicate is stable
4. Flag any suggestion that requires a full table rewrite (adding NOT NULL without a default on PG < 11)
5. Assign a risk label: safe | lock-risk | rewrite-required

## Output format
Return exactly:
- EXPLAIN summary (2–3 sentences)
- Index candidates table: column | type | estimated selectivity | risk
- CREATE INDEX CONCURRENTLY statements, ready to copy
- Migration risk: safe | lock-risk | rewrite-required

Step 3: Build a workflow skill for migration cascade. Individual skills compose into workflow skills. A migration cascade skill chains: schema diff → migration SQL → rollback script → staging apply → row-count validation → draft PR. Each step calls a sub-skill or a direct tool invocation.

# migration-cascade

## Steps
1. Run /schema-diff against $CURRENT_SCHEMA and $TARGET_SCHEMA
2. Write V{n}__change.sql following Flyway naming convention
3. Write V{n}__rollback.sql; every DDL must have an explicit undo statement
4. Apply to $STAGING_URL using Flyway migrate; capture exit code
5. Validate: SELECT COUNT(*) FROM $TABLE before and after; assert counts match within 0.1%
6. Open draft GitHub PR; title format: "db: V{n} — {one-line description}"

## Abort conditions
- Flyway exit code != 0: stop, write error to stdout, do not open PR
- Row count delta > 0.1%: stop, flag for manual review

Step 4: Schedule the routine skills. Local schedules run while your machine is on and have access to your CLIs, credentials, and skill files. Cloud automations cannot reach your internal $PROD_RO_URL — use them only for tasks that operate on exported data.

flowchart TD
    Trigger[DBA trigger] --> OnDemand{on demand or scheduled?}

    OnDemand -->|on demand| Invoke[invoke skill in Claude Code]
    OnDemand -->|scheduled| Cron[cron shell script]

    Invoke --> SkillFile[skills — skill-name.md]
    Cron --> SkillFile

    SkillFile --> Claude[Claude reads skill context]

    Claude --> DB[(pg_stat_statements — read replica)]
    Claude --> Files[migration files and schema definitions]

    DB --> Output[structured output]
    Files --> Output

    Output --> Report[markdown report to db-health vault]
    Output --> PR[draft GitHub PR with rollback attached]
    Output --> Alert[Slack alert if threshold exceeded]

Step 5: Benchmark before you roll out. Pull historical slow queries from pg_stat_statements where you have ground truth. Run each through the skill. Measure if the recommended index matches what was actually deployed and whether the statement compiles against the current schema. Accept the skill only if it matches on both metrics for the golden set.

In Practice

The documented pattern for database reliability, as seen in GitLab’s public engineering handbooks, emphasizes strict, declarative query plan reviews before applying migrations. Translating this to an LLM-driven workflow means replacing chat windows with version-controlled skill definitions.

When evaluating query performance, PostgreSQL’s query planner behaves predictably given accurate table statistics. By forcing the LLM to analyze pg_stats.n_distinct and pg_stats.most_common_vals rather than guessing selectivity, the skill aligns its recommendations with how PostgreSQL actually executes the plan.

The documented pattern for safe schema changes requires that every data definition language (DDL) operation has an explicit, tested inverse. A migration cascade skill enforces this by automatically coupling the generated V{n}__change.sql with a syntactically valid V{n}__rollback.sql script, ensuring that lock-risk migrations on large tables can be immediately reverted if the application metrics degrade.

Where It Breaks

ScenarioFailure ModeMitigation
Aurora MySQL 3.xEXPLAIN FORMAT=TREE output differs from JSON, causing the skill to estimate selectivity incorrectly.Pin the $ENGINE_VERSION input and branch the parsing logic in the skill.
Complex constraintsA DROP COLUMN with check constraints cannot be naively rolled back with ADD COLUMN.Add an explicit step to dump the column definition from information_schema.columns before generating the migration.
Model updatesA model update changes the output format, turning a structured index table into prose.Run a weekly cron against your benchmark suite and alert on output format regression.
Large EXPLAIN outputA 12-table join on a 500M-row table exceeds the token budget for the context window.Truncate to the first 200 lines and extract only seq scan and hash join nodes before invoking the skill.

What to Do Next

  • Problem: Ad-hoc LLM prompts for database triage yield non-deterministic results and are impossible to benchmark.
  • Solution: Codify repetitive tasks into testable, version-controlled skill files that enforce structured output.
  • Proof: PostgreSQL’s pg_stat_statements provides a ground-truth dataset to benchmark skill accuracy against historical deployments.
  • Action: Pull the last 20 slow queries from pg_stat_statements, write a .claude/skills/slow-query-triage.md file, and measure how often the skill’s suggested index matches historical decisions.