AI Engineering Operating Model

Agent Loop Anatomy for DB and Cloud Engineers

A practical mental model for how coding agents plan, call tools, observe results, and complete infrastructure work without treating the model response as the whole system.

#ai-engineering #architecture #databases #cloud

Jan 9, 2026 5 min read

L1 Field Note

Evals Are the New Unit Tests for Agents

Why database and cloud teams need agent eval harnesses that grade outcomes, not persuasive transcripts.

Jan 12, 2026 4 min read

L1 Field Note

Outcome-Based Agent Evaluation vs Transcript Review

A field note on why agent evaluation should measure verified state changes instead of polished reasoning traces.

2 Operating Model

Permission boundaries and autonomy controls that govern what agents can do without human approval, and the efficiency decisions that shape their tool surfaces.

Jan 16, 2026 4 min read

L1 Field Note

Agent Autonomy Ladder: Manual, Confirm, Auto-Approve, Supervised

A governance model for deciding which database and cloud agent actions require approval and which can run automatically.

Feb 3, 2026 4 min read

L1 Field Note

Harness Engineering: The 2026 Breakthrough Concept

Why the real engineering surface around agents is the harness of tools, scripts, context, review, and telemetry.

Feb 17, 2026 4 min read

L1 Field Note

Token-Efficient Tool Use

How to design agent tool surfaces that preserve context budget for reasoning instead of wasting it on tool metadata and raw output.

Feb 20, 2026 4 min read

L1 Field Note

Tool Search vs Loading Every MCP Tool

Why production agents need discoverable tools and context budgets instead of one giant always-loaded MCP surface.

3 Production Patterns

Identity, observability, safe deployment, and context throughput — the operational concerns that only appear at scale.

May 29, 2026 7 min read

L2 Deep Dive

Agent Productivity Depends on Context Throughput

AI coding agents work better when voice, clipboard, screenshots, and MCP tools reduce context friction.

4 Historical Context — Earlier Agent Patterns

2024 writing on agent architectures, error amplification in multi-agent systems, and the shift from chat to goal-directed operation. Read these for perspective on how the field got here.

Mar 20, 2024 20 min read

L3 Reference Guide

How Paperclip Is Redefining AI Agent Orchestration for the Zero-Human Company

Paperclip's zero-human orchestration model — goal-directed agent teams instead of task-by-task prompting — and what that architecture requires from the software and data systems beneath it.

Mar 27, 2024 9 min read

L2 Deep Dive

#ai-engineering #architecture #checklist #failures

From Chat to Agents: Designing Goal-to-Result Systems for Real Work

Chat is request-response; agents are task systems that plan, call tools, iterate, and stop when done. Covers the minimum architecture (loop, tools, bounded memory, stopping conditions) required to make the transition from chat reliable.

Apr 1, 2024 7 min read

L2 Deep Dive

#ai-engineering #architecture #failures

Independent Parallel Agents Don't Cancel Errors — They Amplify Them

Google Research found that independent parallel agents amplify errors 17x compared to centralized orchestrator topologies. Adding more agents to a system with a shared context defect makes it worse, not more resilient.

May 16, 2024 6 min read

L2 Deep Dive

Use Coding Agents as a Toolchain, Not a Vendor Bet

A production-minded workflow for running Cursor and Aider together without locking engineering practice to one agent.

Additional Posts

Related posts matched to this series by topic, tags, and keywords.

Jun 4, 2024 4 min read

L1 Field Note

The Database Observability Baseline: What Every DBA Dashboard Must Show

Before you can adopt AI-assisted triage, your database dashboard needs a foundation built on saturation, locking, and lag metrics.

#databases #architecture #failures #checklist

Mar 18, 2026 3 min read

L1 Field Note

The New AI FinOps Model: Seat Cost vs Token Cost vs Agent Runtime Cost

Why traditional SaaS spend models fail for agentic AI, and how platform teams are treating LLM compute like database provisioned IOPS.

#ai-engineering #cloud #architecture #failures

Aug 20, 2024 5 min read

L2 Deep Dive

PostgreSQL Observability: Vacuum, Bloat, Locks, Replication Lag, and Query Plans

Monitoring PostgreSQL requires looking past the operating system and into the internal bookkeeping of MVCC, autovacuum, and replication streams.

#databases #architecture #failures

Sep 17, 2024 6 min read

L2 Deep Dive

Cassandra Observability: Compaction, Tombstones, Repair, Latency, and Hot Partitions

Why generic server monitoring fails for Apache Cassandra, and how to track the true operational signals of a distributed masterless database.

#databases #architecture #failures

Apr 8, 2026 4 min read

L1 Field Note

Why Agentic AI Costs Explode: Context Size, Tool Calls, MCP Servers, Repo Size, and Retry Loops

Agentic AI systems can quietly accumulate massive API bills due to compounding context windows, retry loops, and unconstrained workspace parsing.

#ai-engineering #architecture #cloud #failures

Apr 15, 2026 5 min read

L1 Field Note

Engineering Fundamentals

AI Cost Observability Dashboard: LangSmith vs Helicone

How to build an AI FinOps dashboard and choose between proxy-based and instrumentation-based observability.

Nov 19, 2024 5 min read

L2 Deep Dive

Engineering Fundamentals

Cost Observability: Build Dashboards That Show Waste Before Finance Finds It

How to expand monitoring beyond uptime by building dashboards that expose underutilized RDS instances, EBS io2 waste, and backup retention drift.

#cloud #architecture #checklist

May 6, 2026 6 min read

L2 Deep Dive

Prompt Caching, Context Pruning, and Model Routing: Practical Ways to Reduce LLM Cost

How to combine semantic routing, structured context pruning, and prompt caching to reduce production LLM API costs without degrading application quality.

Aug 19, 2025 5 min read

L2 Deep Dive

FinOps Observability: Tie Cloud Cost to Workload, Team, Product, and Customer

How to connect engineering telemetry with cost telemetry to achieve granular cloud unit economics using FinOps principles and the FOCUS specification.

#cloud #architecture #ai-engineering

Dec 9, 2025 6 min read

L2 Deep Dive

Telemetry Cost Control: Why Observability Data Itself Needs Governance

If you log everything and monitor every dimension, your observability bill can eventually exceed your database infrastructure bill. Here is how to fix it.

#cloud #architecture #ai-engineering

Jan 20, 2026 8 min read

L2 Deep Dive

#ai-engineering #architecture #failures #system-design

AI Agent Observability: Monitor Tool Calls, Token Spend, Latency, and Failure Loops

Why monitoring autonomous SRE agents requires tracking tool-call hallucinations, context window saturation, and recursive retry loops, rather than just basic CPU metrics.

Mar 10, 2026 8 min read

L2 Deep Dive

#ai-engineering #architecture #system-design #security

MCP Server Observability: The New Control Plane for AI + Enterprise Tools

How the Model Context Protocol (MCP) became the networking layer for AI agents, and why monitoring these connections is critical for enterprise security.

May 12, 2026 7 min read

L2 Deep Dive

#ai-engineering #architecture #system-design #cloud

Agentic SRE Architecture: Skills, Agents, MCP Servers, and Human Approval Loops

The definitive 2026 reference architecture for autonomous database operations, from detection to multi-agent diagnosis to human-in-the-loop remediation.

May 20, 2024 7 min read

L2 Deep Dive

The Harness Around the Agent: How Stripe Runs 1,000 Unattended Code Reviews per Week

Stripe's Minions system runs over a thousand AI code reviews weekly using a fork of an open-source agent. The reliability comes from the deterministic pipeline around it, not the model inside.

May 27, 2024 7 min read

L2 Deep Dive

AI Agents Need a Control Plane, Not More Interfaces

Production AI agents work best when coding, files, tools, and knowledge workflows share one governed execution model.

Jun 8, 2024 6 min read

L2 Deep Dive

Runtime Boundaries for Agentic App Builders

A hosted AI app generator fails when the mobile chat becomes the platform: API keys end up in binaries, execution state blurs with chat, and previews break without artifact handoff. This post lays out the control-plane architecture that keeps these concerns separated.

Dec 2, 2024 12 min read

L1 Field Note

The Agent Should Not Have Your App Credentials

Giving an AI coding agent your application's Postgres credentials is the default mistake: the agent inherits every permission the app has. Database-enforced read-only roles, replica routing, query limits, and project-scoped MCP config together form the alternative that actually fails closed.

#ai-engineering #databases #failures

Dec 10, 2024 10 min read

L3 Reference Guide

AI Agents Need Database Guardrails Below the Prompt

Prompt-level guardrails fail open when the agent misinterprets context. The only boundary that mechanically rejects destructive SQL is the database — dedicated read-only roles, sanitized view schemas, and a network path that application credentials never touch.

#ai-engineering #databases #failures

Dec 20, 2024 6 min read

L2 Deep Dive

Remote Agents Need Deployment, Permissions, and Feedback Loops

Codex mobile turns local agents into remote workflows, but production value depends on deployment, access control, and observability.

#ai-engineering #cloud #checklist

Mar 1, 2025 6 min read

L2 Deep Dive

#ai-engineering #checklist #architecture

Evaluate AI Agents by Completed Work, Not Token Price

Production AI agent selection should measure quality, retries, tokens, latency, and verification cost per completed task.

Mar 1, 2025 9 min read

L2 Deep Dive

Natural Language SQL Agents Need Guardrails Before Orchestration

How Postgres chat agents turn intent into SQL, and why production systems need schema controls, validation, and auditability.

May 17, 2025 8 min read

L2 Deep Dive

The Three-Layer Agent Infrastructure Stack for Database Operations (April 2025)

Building a database operations agent requires a workflow framework, production observability, and scalable inference — April 2025 shipped open-source solutions for all three layers simultaneously.

Jun 14, 2025 9 min read

L2 Deep Dive

Three Open-Source Tools Filling the Gaps in Database Operations (May 2025)

May 2025's most-starred new projects solve three specific database team problems: backup restores that are never verified, internal knowledge that can't be retrieved, and AI agents blind to your schema history.

Jun 21, 2025 7 min read

L2 Deep Dive

Top GitHub Breakouts: May 2025 — Agent Infrastructure Without Boilerplate

Three May 2025 open-source projects eliminate the manual scaffolding that blocks every AI agent deployment: orchestration glue, vector database setup, and MCP gateway configuration.

Jun 25, 2025 9 min read

L2 Deep Dive

Parallel AI Agents Need an Operating Model

Running many coding agents only works when git isolation, shared memory, permissions, hooks, and verification are designed as a system.

Jul 3, 2025 8 min read

L2 Deep Dive

#ai-engineering #architecture #failures

Personal AI Agents Fail in the Last 20 Percent of Integration

Self-hosted AI agents become useful only when model quality, tool access, memory, and setup completeness line up.

Jul 26, 2025 19 min read

L3 Reference Guide

#ai-engineering #databases #architecture

Natural Language SQL Agents Need Database Guardrails

The risk in a natural-language SQL agent is not bad SQL — it is authority compilation: a user sentence becomes a database operation unless the control plane proves, before execution, which role, rows, cost, and columns the query is allowed to touch.

Oct 14, 2025 7 min read

L2 Deep Dive

AI Agents in Platform Automation: Useful Assistant or Unreviewed Change Engine?

When AI agents accelerate platform operations versus when they generate unreviewed changes — the permission boundary and audit design that separates useful from risky.

Dec 6, 2025 8 min read

L2 Deep Dive

The AI-Native Engineering Stack: Agents, Inference, and Knowledge Graphs in Production (November 2025)

Three November 2025 breakout projects eliminate the manual infrastructure build that blocks teams from running AI agents in production — covering agent backends, Kubernetes LLM inference, and SQL-driven knowledge retrieval.

Dec 20, 2025 8 min read

L2 Deep Dive

Automated Reliability Across the Stack: Database Backups, Platform Observability, and SQL Quality (November 2025)

Three November 2025 open-source releases eliminate manual work from three engineering reliability tasks — multi-database backup verification, self-hosted log and trace collection, and SQL static analysis in CI pipelines.

Jan 20, 2026 4 min read

L1 Field Note

#ai-engineering #databases #architecture

Agentic Code Review for Database Repositories

Database repositories contain hidden rules human reviewers know: never add a blocking index at peak hours, never widen IAM without owner approval. Agent review surfaces these violations before merge — without displacing the human judgment that set the rules.

Jan 30, 2026 4 min read

L1 Field Note

#databases #ai-engineering #architecture #checklist

Database Runbooks as Agent Contracts

A reference operating model for turning human database runbooks into machine-usable agent contracts.

Feb 6, 2026 4 min read

L1 Field Note

Agent-to-Agent Review Loops

A practical review pattern where one agent creates a change and specialized agents review risk, rollback, security, and observability.

Feb 13, 2026 4 min read

L1 Field Note

Application Legibility for Agents

A reference architecture for making logs, metrics, test output, schemas, and deployment history readable by coding agents.

Feb 24, 2026 4 min read

L1 Field Note

Programmatic Tool Calling for DB Automation

A reference pattern for keeping large database outputs out of model context by using scripts that summarize evidence before the agent sees it.

Feb 27, 2026 4 min read

L1 Field Note

#ai-engineering #architecture #failures

Context Anxiety and Harness Decay

Why agent harnesses become stale when they overfit today's model weaknesses instead of stable execution contracts.

Mar 22, 2026 7 min read

L2 Deep Dive

Top GitHub Breakouts: February 2026 — Local Agents and MCP Bridges

February 2026's highest-starred new open-source projects connecting AI agents to local infrastructure, Kubernetes clusters, and structured data without cloud API dependencies.

#ai-engineering #cloud #architecture

Apr 22, 2026 7 min read

L2 Deep Dive

#ai-engineering #databases #architecture

Top GitHub Breakouts: March 2026 — Agent Adaptation and Production-Scale Vector Search

The second wave of March 2026 breakouts: an agent that learns from every conversation, a Rust vector index that outperforms FAISS at a fraction of the memory, and a Kubernetes-native agent control plane.

May 22, 2026 8 min read

L2 Deep Dive

Top GitHub Breakouts: April 2026 — Production Agent Infrastructure

The highest-starred new open-source projects in April 2026 targeting production-scale AI agent memory, protocol enforcement, and Postgres environment management — what breaks when agents leave single-developer scope.

#ai-engineering #databases #cloud

Mar 9, 2021 7 min read

L2 Deep Dive

Service Catalogs Are Not Portals. They Are Control Planes

A service catalog that helps engineers find links is a directory. One that owns metadata, policy, workflow, and reconciliation is a platform control plane — and only the second one solves the real scaling problem.

#architecture #cloud

May 11, 2021 7 min read

L2 Deep Dive

CI/CD Pipelines Are Distributed Systems With Bad Observability

CI/CD pipelines fail as distributed coordination systems long before they fail as broken scripts — why build badges hide partial failures, flaky retries, and ordering gaps that only appear under real delivery load.

#architecture #failures #cloud

Dec 14, 2021 7 min read

L2 Deep Dive

Automation Incident Review: When the Tool Worked and the System Failed

The hardest automation incidents are not broken tools — they happen when every tool executes exactly as asked while the surrounding system loses the ability to evaluate whether that action is still safe.

#automation #platform #ci-cd

Aug 8, 2023 9 min read

L2 Deep Dive

Backstage, Port, Cortex, and AWS Service Catalog: Different Tools, Different Control Planes

Backstage, Port, Cortex, and AWS Service Catalog compared on control-plane model — which tools provision, which only display, and where each abstraction breaks down.

#automation #platform #ci-cd

Sep 19, 2023 7 min read

L2 Deep Dive

OpenTofu vs Terraform: What Platform Teams Should Actually Evaluate

OpenTofu vs. Terraform on licensing risk, provider supply chain compatibility, state safety, and the migration cost platform teams actually absorb.

#automation #platform #ci-cd

Mar 12, 2024 8 min read

L2 Deep Dive

Internal Developer Platform Reference Architecture: Catalog, IaC, CI/CD, Policy, and Observability

Reference architecture for an IDP as a control plane that connects the service catalog, IaC, CI/CD pipelines, policy enforcement, and observability feedback.

#architecture #cloud #checklist

Oct 15, 2024 7 min read

L2 Deep Dive

CI/CD Observability: Queue Time, Flake Rate, Lead Time, Failure Domains, and Change Risk

Queue time, flake rate, lead time, failure domains, and change risk as CI/CD signals that reveal whether a delivery system is becoming safer or just busier.

#architecture #failures #cloud

Dec 17, 2024 7 min read

L2 Deep Dive