AI Agents Need Database Guardrails Below the Prompt

The strategic mistake is treating an artificial intelligence agent prompt as the safety boundary when the database is the only boundary that actually fails closed.

Situation

Model Context Protocol (MCP) is becoming the standard way for coding agents to reach real systems: files, ticket queues, cloud APIs, observability backends, and databases. The default pattern is convenience first: give the agent a credential, tell it what not to do, and hope the tool permission dialog catches the exciting parts.

The production pattern has to be different. A Postgres-connected agent should be treated as a new workload class with its own role, schema, network path, connection budget, and audit trail.

Approach	Control boundary	Failure behavior
Prompt-only guardrail	Model instruction	Fails open when the agent misinterprets context
Shared app credential	Application role	Agent inherits production write power
Dedicated read-only path	Database, MCP server, network	Destructive SQL fails mechanically
Sanitized view schema	Database object model	Sensitive columns are never readable

The Problem

The PocketOS incident, publicly reported in April 2026, is the case study everyone now quotes: coverage from SC Media, TechSpot, and others says a Cursor agent running Claude deleted a Railway production database volume and associated volume-level backups in seconds after encountering a staging credential problem and finding a broadly scoped token. The interesting part is not whether the model “knew better.” The interesting part is that the infrastructure accepted the action.

Failure point	What breaks	Why it matters
Shared credentials	The agent can perform every action the human or app role can perform	A single mistaken tool call can become a production change
Prompt-only policy	“Do not delete production” remains advisory text	The model can violate instructions while still producing a plausible explanation
Read-only without resource limits	Expensive `SELECT` queries still run	A read-only agent can create cache pressure, replica lag, connection starvation, and painful incident calls
Raw table access	`SELECT * FROM users` exposes password hashes, tokens, emails, and support notes	Confidentiality risk survives even when write risk is removed
Unscoped MCP config	One repository can reach unrelated databases	A billing debugging session should not have a path to auth, payroll, or production support data
Missing audit identity	Agent queries look like ordinary developer traffic	During an incident, “who ran this query” becomes archaeology with worse lighting

Postgres will do exactly what its privileges allow. MCP will expose exactly what the configured server exposes. The agent will then synthesize actions from instructions, tool metadata, database rows, and prior context.

The core question is simple: what is the smallest database surface an agent needs to be useful, and what hard stop prevents it from doing anything else?

Put the Guardrails Below the Agent

The right architecture is not “trust the coding assistant.” The right architecture is a constrained database access path where every layer reduces blast radius before the model sees a tool.

flowchart TD
    Human[engineer — review and approve] --> Agent[AI coding agent — MCP client]
    Agent --> MCP[MCP Postgres server — read only tools]
    MCP --> Role[Postgres role — select only]
    Role --> Views[view schema — sanitized columns]
    Views --> Replica[read replica — bounded workload]
    Replica --> Audit[logs — agent workload]
    Primary[primary database — no agent path] --> Audit

Create a dedicated role that owns nothing.

CREATE ROLE mcp_readonly
  WITH LOGIN
  PASSWORD 'use-a-real-password-here'
  CONNECTION LIMIT 4
  NOBYPASSRLS;

GRANT CONNECT ON DATABASE appdb TO mcp_readonly;
GRANT USAGE ON SCHEMA agent_safe TO mcp_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA agent_safe TO mcp_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA agent_safe
  GRANT SELECT ON TABLES TO mcp_readonly;

Verification: connect as mcp_readonly and confirm DELETE, UPDATE, CREATE TABLE, DROP TABLE, and TRUNCATE all fail.

Put the agent behind views, not raw application tables.

Expose agent_safe.customer_summary, not public.users. Expose ticket counts, order status, schema metadata, and non-sensitive operational fields. Keep password hashes, access tokens, session IDs, payment identifiers, private notes, and large free-text blobs out of the readable schema. If row-level security is used, remember that Postgres table owners and roles with BYPASSRLS bypass policies unless explicitly handled; the documentation calls this out for a reason.

Verification: run \dp agent_safe.* and check that the MCP role has SELECT only on the view schema, not the base tables.

Enforce read-only transactions in the MCP server.

A Postgres role should deny writes, and the MCP server should also issue queries inside read-only transactions. PostgreSQL documents that a read-only transaction disallows INSERT, UPDATE, DELETE, MERGE, CREATE, ALTER, DROP, GRANT, REVOKE, TRUNCATE, and write-bearing EXPLAIN ANALYZE paths. That is a real control because the database engine rejects the command.

Verification: ask the agent to run a harmless destructive test against a non-production table and confirm the error is a database error, not a model apology.

Put time, connection, and idle limits on the role.

ALTER ROLE mcp_readonly SET statement_timeout = '30s';
ALTER ROLE mcp_readonly SET idle_in_transaction_session_timeout = '60s';
ALTER ROLE mcp_readonly SET lock_timeout = '2s';

Read-only is not read-cheap. A generated SELECT count(*) FROM event_log on a multi-hundred-million-row table can still evict useful pages, burn input and output, and hold snapshots long enough to annoy vacuum. On a hot primary, that is not a philosophical problem. It is an incident with nicer SQL.

Verification: run SELECT pg_sleep(45); as the role and confirm statement_timeout cancels it.

Scope MCP configuration per project and keep secrets out of the repository.

Commit .mcp.json only when it contains command paths and server names, not credentials. Keep database passwords or cloud IAM material under a user-owned config directory with mode 600. For production-adjacent access, prefer a read replica reachable only over VPN, private networking, or an SSH tunnel.

Verification: run git grep -n "postgres://\|password\|DATABASE_URL\|mcp_readonly" and confirm no secret-bearing MCP config is committed.

Make the agent observable as its own workload.

Set a distinct role name, set application_name if the MCP server supports it, sample slow statements, and dashboard the role separately. PostgreSQL logging can include user, database, client address, application name, and query identifiers depending on configuration. That is the difference between debugging the agent and guessing around it.

Verification: query pg_stat_activity while the agent runs and confirm the role, database, client address, and current query are visible.

In Practice

The documented pattern is not “add one more confirmation dialog.” It is to make the dangerous action unreachable before the agent gets creative.

Public reporting on PocketOS describes a short chain: the agent hit a staging credential mismatch, found a broadly scoped token, called Railway, and deleted the production database volume together with volume-level backups. SC Media’s brief reports the credential mismatch, broad API token, Railway delete path, and production volume deletion. TechSpot’s report adds the operational lesson that backups in the same failure path did not behave like an independent recovery boundary.

That chain maps cleanly to database controls:

Incident action	Hard boundary that should stop it	Why the boundary matters
Agent finds a broad production token	Project-scoped MCP config and no secret-bearing repo files	The agent cannot use credentials it cannot read
Agent reaches production infrastructure from a staging task	Network and project scoping	A staging workflow should not have a route to production database deletion
Agent attempts destructive data action	Dedicated read-only database role plus read-only transactions	The database rejects writes even if the model selects the wrong tool
Agent can inspect raw operational data	Sanitized views and column-level grants	The useful context is available without exposing tokens, hashes, notes, or unrelated tenant data
Agent’s queries blend into normal traffic	Dedicated role and `application_name`	Incident response can identify the workload without reconstructing intent from chat logs

PostgreSQL’s privilege model is the first source of truth here. The PostgreSQL privileges documentation defines permissions such as SELECT, INSERT, UPDATE, DELETE, TRUNCATE, CREATE, CONNECT, and USAGE as database privileges. It also states that the right to modify or destroy an object is inherent in ownership. So the agent role should not own tables, should not inherit owner roles, and should receive only CONNECT, schema USAGE, and SELECT on a narrow view schema.

PostgreSQL’s transaction access mode gives a second hard stop. The official SET TRANSACTION documentation says read-only transactions disallow the write and definition-changing statements that matter for this risk class, including INSERT, UPDATE, DELETE, MERGE, CREATE, ALTER, DROP, GRANT, REVOKE, and TRUNCATE. The same page is explicit that this is a high-level access mode and does not prevent all disk activity. That is why read-only has to be paired with statement_timeout, connection limits, lock limits, and preferably a replica.

Row-level security is useful, but it is not magic. The PostgreSQL row security documentation says row security defaults to denying access when enabled without a policy, but also says superusers, roles with BYPASSRLS, and table owners can bypass row security. That is the operational reason for NOBYPASSRLS, non-owner roles, exact-credential testing, and sanitized views when the real concern is confidentiality rather than tenant routing.

Anthropic’s own Claude Code security documentation makes the same point from the client side. The security page says Claude Code uses strict read-only permissions by default, asks for explicit permission for actions such as editing files and running commands, requires trust verification for first-time codebases and new MCP servers, and uses fail-closed matching for unmatched commands. It also says users are responsible for reviewing proposed commands, and that Anthropic reviews connectors for listing criteria but does not security-audit or manage every MCP server. Translation: client permissions are useful friction. They are not a substitute for database privileges, network isolation, credential scoping, and backup separation.

Where It Breaks

Failure mode	Trigger	Fix
Replica lag spike	Agent runs broad scans on a physical replica under PostgreSQL 15 or later	Use `statement_timeout`, query allowlists for expensive tools, and replica lag alerts tied to the agent role
Confidentiality leak	Agent can read raw `users`, `sessions`, `api_keys`, or support note tables	Grant only sanitized views or column-level `SELECT`; keep sensitive fields unreachable
Lock annoyance	Agent issues `SELECT ... FOR SHARE`, extension-backed functions, or long `EXPLAIN ANALYZE`	Deny unsafe tools, set `lock_timeout = '2s'`, and restrict functions executable by the role
RLS bypass	Agent role owns tables, is superuser, or has `BYPASSRLS`	Use a non-owner `NOBYPASSRLS` role and test visibility with the exact MCP credential
Connection starvation	MCP server pool is too large for a small Postgres instance or PgBouncer pool	Cap `CONNECTION LIMIT`, cap MCP pool size, and reserve production app connections
Prompt injection through rows	User-controlled text tells the agent to reveal other rows or call another tool	Treat database content as untrusted input, isolate tools by project, and prevent sensitive data from being readable
False sense of safety	Agent connects to primary with read-only SQL but unrestricted table access	Use a replica, view schema, audit logging, and workload limits together
Audit gap	All queries arrive as a generic developer or app role	Dedicated role, `application_name`, slow query sampling, and retention for generated SQL

What to Do Next

Problem: AI agents connected to databases turn ordinary credentials into autonomous operational power.
Solution: Put controls below the prompt: read-only role, read-only transactions, scoped MCP config, sanitized views, network boundaries, independent backups, and workload limits.
Proof: The validation signal is mechanical failure: DELETE, UPDATE, CREATE, and DROP must fail when executed through the exact agent path.
Action: This week, create one non-production MCP Postgres profile against a read replica or disposable database, then run the destructive-command test before allowing access to anything that matters.

The agent can be helpful at the database layer, but only after the database has been made stubborn enough to survive the agent.

Situation

The Problem

Put the Guardrails Below the Agent

In Practice

Where It Breaks

What to Do Next

Rajiv

Related Posts

The Agent Should Not Have Your App Credentials

AI Governance for Engineering Teams: Preventing Shadow AI Spend Without Blocking Innovation

AI Cost Incident Runbook: What to Do When Monthly Token Spend Suddenly Doubles