The AI-Native Engineering Stack: Agents, Inference, and Knowledge Graphs in Production (November 2025)
Content reflects the state as of December 2025. AI tooling and model capabilities in this area change frequently.
Putting AI into production engineering systems — not as a chat wrapper but as a backend service handling real operational tasks — means solving three infrastructure problems that teams have been building by hand: running agents with the same reliability properties as microservices, deploying LLM inference on your own hardware without assembling a custom platform, and making your database a queryable knowledge layer without maintaining a separate vector store. Three November 2025 open-source releases address each layer.
Situation
The gap between “AI demo” and “AI in production” is infrastructure. Engineers who want AI agents in their operational workflows — automating incident triage, reviewing schema changes, answering schema questions — have been building auth, identity, scaling, and observability into each agent by hand. Running local LLM inference on Kubernetes has required assembling GPU scheduling, model management, health checks, and API exposure into a custom operator. Using databases as a knowledge layer for AI has meant maintaining separate vector stores and ETL pipelines in sync with the primary database. All three were multi-week infrastructure projects before this month.
The Problem
| Domain | Manual bottleneck | What it costs |
|---|---|---|
| System design | AI agents coded as scripts with no auth, traceability, or scaling primitives | Production failures are opaque; every agent is a one-off with no shared operational model |
| Platform engineering | LLM inference on K8s requires assembling GPU scheduling, model management, health checks, and routing manually | Weeks of infrastructure work before the AI capability ships |
| Databases | SQL knowledge lives in the database but AI retrieval requires a separate vector store and maintained ETL | Two parallel data systems to keep in sync for what is conceptually one knowledge base |
| Platform engineering | Local inference with cloud fallback requires a custom routing layer | Air-gapped compliance and cost control require infrastructure that had no K8s-native expression |
Can these three infrastructure layers be provisioned today without building them from scratch?
The AI-Native Production Stack
These three tools form a complete AI-native engineering stack:
flowchart TD
AIProduction[AI in production engineering]
AIProduction --> AgentLayer[system design — AI agents as production microservices]
AIProduction --> InfraLayer[platform — LLM inference as a Kubernetes primitive]
AIProduction --> DataLayer[databases — SQL as the AI knowledge layer]
AgentLayer --> agentfield[agentfield — agent identity, auth, and observability from day one]
InfraLayer --> LLMKube[LLMKube — deploy any LLM on K8s in two YAML lines]
DataLayer --> SAG[SAG — SQL-driven knowledge graph built at query time]
agentfield --> Out1[agents behave like microservices — observable, auditable, scalable]
LLMKube --> Out2[any model on any GPU — NVIDIA or Apple Silicon — no custom platform]
SAG --> Out3[database becomes the knowledge base — no separate vector store to maintain]
agentfield — Agent Backends Without Building the Infrastructure Layer
The productivity problem it solves: Engineers who want to deploy a database operations agent — one that reviews migrations, answers schema questions, or escalates alerts — have to build auth, identity boundaries, scaling, audit logging, and observability into the agent before it can run in production. agentfield removes that work entirely.
According to the project README, agentfield frames itself as “The AI Backend” with the explicit position that “AI has outgrown chatbots and prompt orchestrators — backend agents need backend infrastructure.” The platform makes AI agents observable, auditable, and identity-aware from day one, with support for Kubernetes deployment and SDKs in Python, Go, and TypeScript.
from agentfield import Agent
@Agent.register(name="schema-reviewer")
async def review_schema(migration_sql: str) -> dict:
# Identity, auth, audit trail, and scaling are handled by the platform
return await analyze_migration(migration_sql)
The architecture positions agents as backend services with defined identity and authorization boundaries — the same operational model a team would apply to any API service, applied to AI agents.
Where it breaks: agentfield is a November 2025 release at v0.x. The README and SDKs describe the architecture, but production deployments at scale are not yet documented. Teams should treat it as early-adopter infrastructure and expect API changes — the project signals active development and the documentation is evolving.
LLMKube — LLM Inference as a Kubernetes Operator
The productivity problem it solves: Running LLM inference on your own Kubernetes cluster for production AI agents requires assembling GPU scheduling, model version management, health checks, scaling, and API exposure manually. LLMKube turns that into a K8s operator — define a Model and an InferenceService, and the operator handles the rest.
According to the project README, LLMKube supports llama.cpp, vLLM, TGI, and mlx-server as inference backends, with NVIDIA and Apple Silicon (Metal) GPU support across heterogeneous clusters. The operator handles model downloading, caching, GPU scheduling, health checks, and exposes an OpenAI-compatible API. A ModelRouter resource enables policy-aware routing between local models and external providers (Claude, GPT) from within the same cluster.
The README states the problem directly: after you get llama.cpp running on one machine, “you need to scale it, monitor it, manage model versions, handle GPU scheduling across nodes… Suddenly you’re building an entire platform instead of shipping your product.”
apiVersion: llmkube.io/v1
kind: Model
metadata:
name: llama-3-8b
spec:
source: huggingface
modelId: meta-llama/Meta-Llama-3-8B-Instruct
backend: llamacpp
---
apiVersion: llmkube.io/v1
kind: InferenceService
metadata:
name: db-assistant
spec:
model: llama-3-8b
replicas: 2
gpu: nvidia
Where it breaks: LLMKube requires an existing Kubernetes cluster with GPU node pools. The operator simplifies LLM deployment on K8s but doesn’t replace the K8s infrastructure prerequisite. Teams without GPU node pools need to provision that infrastructure before LLMKube provides value. The project is at an early release; production deployment documentation is still developing alongside the code.
SAG — SQL-Driven Knowledge Graph for AI Retrieval
The productivity problem it solves: Teams building AI agents that need to reason about their own data — schema structure, data relationships, operational history — typically maintain a separate vector store synchronized with the primary database. SAG uses SQL as the retrieval mechanism and builds the knowledge graph at query time from the data already in the database.
According to the project README, SAG (Smart Auto Graph Engine) is a SQL-driven RAG engine that automatically decomposes documents into semantic atomic events, extracts multi-dimensional entities, and builds relationship networks dynamically at query time rather than maintaining a pre-built static graph. The backend is FastAPI with a Next.js frontend; the English README is available at README_en.md in the repository.
For a database team, the practical application: schema documentation, query history, and change logs become queryable by AI agents without a separate vector index to maintain. The knowledge graph evolves as data does.
git clone https://github.com/Zleap-AI/SAG
cd SAG
cp .env.example .env
# Configure database connection and LLM endpoint
docker compose up -d
# Query your database in natural language at http://localhost:3000
Where it breaks: SAG’s architecture implies query-time compute cost proportional to the knowledge graph traversal depth. For high-frequency queries against large document sets, benchmark response time on a representative workload before deploying it in an agent’s hot path. The README does not publish latency benchmarks — teams should measure this against their specific data volume.
In Practice
All three descriptions above are grounded in the respective project READMEs. Items to verify:
agentfield’s claims (“observable, auditable, identity-aware from day one”) are the architectural position from the README. The specific observability implementation — what is traced, what is audited, how it integrates with existing monitoring — should be verified against current project documentation before using it as the primary agent infrastructure layer.
LLMKube’s ModelRouter routing between local and external providers is documented as a resource type in the operator. The README references a #performance section with throughput benchmarks — teams should verify against their specific model and hardware combination before committing to production deployment.
SAG’s primary README is in Chinese; the English version is README_en.md. The “dynamically builds knowledge graph at query time” architecture is described but production performance benchmarks are not yet published.
Where It Breaks
| Failure mode | Trigger | Fix |
|---|---|---|
| agentfield v0.x API instability | Breaking changes between early releases | Pin to a specific version; review changelog before each upgrade |
| LLMKube GPU prerequisite | No GPU node pool in existing K8s cluster | Provision GPU nodes before deploying; CPU inference works but latency increases significantly |
| SAG query-time latency | Large knowledge graphs with deep relationship traversal | Benchmark on a representative dataset before using SAG in an agent’s synchronous request path |
| LLMKube cloud fallback misconfiguration | ModelRouter sends requests to external provider unexpectedly | Audit ModelRouter policy rules before enabling cloud fallback; verify no sensitive schema data is included in routed requests |
| SAG documentation gap | English README may lag Chinese README on new features | Check README_en.md and compare last-modified dates with README.md |
What to Do Next
- Problem: Running AI agents in production requires three infrastructure layers — agent backend, LLM inference serving, and knowledge retrieval — that all had manual-build costs before November 2025.
- Solution: agentfield for AI agent backend infrastructure with identity and observability, LLMKube for K8s-native LLM inference deployment, SAG for SQL-driven knowledge graph retrieval.
- Proof: Deploy LLMKube on a single GPU node with Llama 3 8B and point an agentfield agent at the local endpoint. If the agent answers a schema question using the local model, you have validated the agent-plus-inference layer without a cloud API key.
- Action: This week, run SAG against a development database and ask three questions that a database engineer answered manually last quarter. If the answers are accurate, you have a knowledge layer that requires no separate vector store to maintain.