The AI-Native Engineering Stack: Agents, Inference, and Knowledge Graphs in Production (November 2025)

Putting AI into production engineering systems — not as a chat wrapper but as a backend service handling real operational tasks — means solving three infrastructure problems that teams have been building by hand: running agents with the same reliability properties as microservices, deploying LLM inference on your own hardware without assembling a custom platform, and making your database a queryable knowledge layer without maintaining a separate vector store. Three November 2025 open-source releases address each layer.

Situation

The gap between “AI demo” and “AI in production” is infrastructure. Engineers who want AI agents in their operational workflows — automating incident triage, reviewing schema changes, answering schema questions — have been building auth, identity, scaling, and observability into each agent by hand. Running local LLM inference on Kubernetes has required assembling GPU scheduling, model management, health checks, and API exposure into a custom operator. Using databases as a knowledge layer for AI has meant maintaining separate vector stores and ETL pipelines in sync with the primary database. All three were multi-week infrastructure projects before this month.

The Problem

Domain	Manual bottleneck	What it costs
System design	AI agents coded as scripts with no auth, traceability, or scaling primitives	Production failures are opaque; every agent is a one-off with no shared operational model
Platform engineering	LLM inference on K8s requires assembling GPU scheduling, model management, health checks, and routing manually	Weeks of infrastructure work before the AI capability ships
Databases	SQL knowledge lives in the database but AI retrieval requires a separate vector store and maintained ETL	Two parallel data systems to keep in sync for what is conceptually one knowledge base
Platform engineering	Local inference with cloud fallback requires a custom routing layer	Air-gapped compliance and cost control require infrastructure that had no K8s-native expression

Can these three infrastructure layers be provisioned today without building them from scratch?

The AI-Native Production Stack

These three tools form a complete AI-native engineering stack:

flowchart TD
    AIProduction[AI in production engineering]
    AIProduction --> AgentLayer[system design — AI agents as production microservices]
    AIProduction --> InfraLayer[platform — LLM inference as a Kubernetes primitive]
    AIProduction --> DataLayer[databases — SQL as the AI knowledge layer]
    AgentLayer --> agentfield[agentfield — agent identity, auth, and observability from day one]
    InfraLayer --> LLMKube[LLMKube — deploy any LLM on K8s in two YAML lines]
    DataLayer --> SAG[SAG — SQL-driven knowledge graph built at query time]
    agentfield --> Out1[agents behave like microservices — observable, auditable, scalable]
    LLMKube --> Out2[any model on any GPU — NVIDIA or Apple Silicon — no custom platform]
    SAG --> Out3[database becomes the knowledge base — no separate vector store to maintain]

agentfield — Agent Backends Without Building the Infrastructure Layer

The productivity problem it solves: Engineers who want to deploy a database operations agent — one that reviews migrations, answers schema questions, or escalates alerts — have to build auth, identity boundaries, scaling, audit logging, and observability into the agent before it can run in production. agentfield removes that work entirely.

According to the project README, agentfield frames itself as “The AI Backend” with the explicit position that “AI has outgrown chatbots and prompt orchestrators — backend agents need backend infrastructure.” The platform makes AI agents observable, auditable, and identity-aware from day one, with support for Kubernetes deployment and SDKs in Python, Go, and TypeScript.

from agentfield import Agent

@Agent.register(name="schema-reviewer")
async def review_schema(migration_sql: str) -> dict:
    # Identity, auth, audit trail, and scaling are handled by the platform
    return await analyze_migration(migration_sql)

The architecture positions agents as backend services with defined identity and authorization boundaries — the same operational model a team would apply to any API service, applied to AI agents.

Where it breaks: agentfield is a November 2025 release at v0.x. The README and SDKs describe the architecture, but production deployments at scale are not yet documented. Teams should treat it as early-adopter infrastructure and expect API changes — the project signals active development and the documentation is evolving.

LLMKube — LLM Inference as a Kubernetes Operator

The productivity problem it solves: Running LLM inference on your own Kubernetes cluster for production AI agents requires assembling GPU scheduling, model version management, health checks, scaling, and API exposure manually. LLMKube turns that into a K8s operator — define a Model and an InferenceService, and the operator handles the rest.

According to the project README, LLMKube supports llama.cpp, vLLM, TGI, and mlx-server as inference backends, with NVIDIA and Apple Silicon (Metal) GPU support across heterogeneous clusters. The operator handles model downloading, caching, GPU scheduling, health checks, and exposes an OpenAI-compatible API. A ModelRouter resource enables policy-aware routing between local models and external providers (Claude, GPT) from within the same cluster.

The README states the problem directly: after you get llama.cpp running on one machine, “you need to scale it, monitor it, manage model versions, handle GPU scheduling across nodes… Suddenly you’re building an entire platform instead of shipping your product.”

apiVersion: llmkube.io/v1
kind: Model
metadata:
  name: llama-3-8b
spec:
  source: huggingface
  modelId: meta-llama/Meta-Llama-3-8B-Instruct
  backend: llamacpp
---
apiVersion: llmkube.io/v1
kind: InferenceService
metadata:
  name: db-assistant
spec:
  model: llama-3-8b
  replicas: 2
  gpu: nvidia

Where it breaks: LLMKube requires an existing Kubernetes cluster with GPU node pools. The operator simplifies LLM deployment on K8s but doesn’t replace the K8s infrastructure prerequisite. Teams without GPU node pools need to provision that infrastructure before LLMKube provides value. The project is at an early release; production deployment documentation is still developing alongside the code.

SAG — SQL-Driven Knowledge Graph for AI Retrieval

The productivity problem it solves: Teams building AI agents that need to reason about their own data — schema structure, data relationships, operational history — typically maintain a separate vector store synchronized with the primary database. SAG uses SQL as the retrieval mechanism and builds the knowledge graph at query time from the data already in the database.

According to the project README, SAG (Smart Auto Graph Engine) is a SQL-driven RAG engine that automatically decomposes documents into semantic atomic events, extracts multi-dimensional entities, and builds relationship networks dynamically at query time rather than maintaining a pre-built static graph. The backend is FastAPI with a Next.js frontend; the English README is available at README_en.md in the repository.

For a database team, the practical application: schema documentation, query history, and change logs become queryable by AI agents without a separate vector index to maintain. The knowledge graph evolves as data does.

git clone https://github.com/Zleap-AI/SAG
cd SAG
cp .env.example .env
# Configure database connection and LLM endpoint
docker compose up -d
# Query your database in natural language at http://localhost:3000

Where it breaks: SAG’s architecture implies query-time compute cost proportional to the knowledge graph traversal depth. For high-frequency queries against large document sets, benchmark response time on a representative workload before deploying it in an agent’s hot path. The README does not publish latency benchmarks — teams should measure this against their specific data volume.

In Practice

All three descriptions above are grounded in the respective project READMEs. Items to verify:

agentfield’s claims (“observable, auditable, identity-aware from day one”) are the architectural position from the README. The specific observability implementation — what is traced, what is audited, how it integrates with existing monitoring — should be verified against current project documentation before using it as the primary agent infrastructure layer.

LLMKube’s ModelRouter routing between local and external providers is documented as a resource type in the operator. The README references a #performance section with throughput benchmarks — teams should verify against their specific model and hardware combination before committing to production deployment.

SAG’s primary README is in Chinese; the English version is README_en.md. The “dynamically builds knowledge graph at query time” architecture is described but production performance benchmarks are not yet published.

Where It Breaks

Failure mode	Trigger	Fix
agentfield v0.x API instability	Breaking changes between early releases	Pin to a specific version; review changelog before each upgrade
LLMKube GPU prerequisite	No GPU node pool in existing K8s cluster	Provision GPU nodes before deploying; CPU inference works but latency increases significantly
SAG query-time latency	Large knowledge graphs with deep relationship traversal	Benchmark on a representative dataset before using SAG in an agent’s synchronous request path
LLMKube cloud fallback misconfiguration	ModelRouter sends requests to external provider unexpectedly	Audit ModelRouter policy rules before enabling cloud fallback; verify no sensitive schema data is included in routed requests
SAG documentation gap	English README may lag Chinese README on new features	Check `README_en.md` and compare last-modified dates with `README.md`

What to Do Next

Problem: Running AI agents in production requires three infrastructure layers — agent backend, LLM inference serving, and knowledge retrieval — that all had manual-build costs before November 2025.
Solution: agentfield for AI agent backend infrastructure with identity and observability, LLMKube for K8s-native LLM inference deployment, SAG for SQL-driven knowledge graph retrieval.
Proof: Deploy LLMKube on a single GPU node with Llama 3 8B and point an agentfield agent at the local endpoint. If the agent answers a schema question using the local model, you have validated the agent-plus-inference layer without a cloud API key.
Action: This week, run SAG against a development database and ask three questions that a database engineer answered manually last quarter. If the answers are accurate, you have a knowledge layer that requires no separate vector store to maintain.

Situation

The Problem

The AI-Native Production Stack

agentfield — Agent Backends Without Building the Infrastructure Layer

LLMKube — LLM Inference as a Kubernetes Operator

SAG — SQL-Driven Knowledge Graph for AI Retrieval

In Practice

Where It Breaks

What to Do Next

Rajiv

Related Posts

Build vs Buy: The AI Platform Architecture Decision

AI Governance for Engineering Teams: Preventing Shadow AI Spend Without Blocking Innovation

AI Token Cost Overruns: Why AI Coding Assistants Are Becoming the New Cloud Bill Problem