Engineering Fundamentals

#databases #architecture #ai-engineering

CPU vs GPU vs TPU Explained for Database Engineers

How CPU, GPU, and TPU architectures differ in ways that matter for databases and AI workloads — and which compute class to reach for when adding vector search, embedding generation, or GPU-accelerated analytics.

Mar 3, 2024 5 min read

L1 Field Note

SIMD vs SIMT Explained for Database Engineers

A DBA-friendly explanation of SIMD and SIMT using query execution, vectorized processing, and GPU mental models instead of hardware jargon.

#databases #cpu #gpu #performance

Feb 14, 2022 5 min read

L1 Field Note

MVCC Explained Like a Database Engineer

How multi-version concurrency control lets readers and writers run without blocking each other — and why misunderstanding it causes table bloat, undo log growth, and stalled vacuums.

#databases #architecture

Deep Dives

L2 and L3 posts with architecture, reliability, and tradeoff detail.

Nov 19, 2024 5 min read

L2 Deep Dive

Cost Observability: Build Dashboards That Show Waste Before Finance Finds It

How to expand monitoring beyond uptime by building dashboards that expose underutilized RDS instances, EBS io2 waste, and backup retention drift.

#cloud #architecture #checklist

Apr 26, 2022 6 min read

L2 Deep Dive

Read-After-Write Consistency: The UX Bug That Becomes a Database Bug

Acknowledging a write before the system knows where the next read will land turns a clean product experience into a staleness bug that looks like data loss — how read-after-write consistency works and where it breaks under replica lag.

Apr 11, 2022 7 min read

L2 Deep Dive

Rate Limiting Is a Product Contract, Not Just a Redis Counter

Rate limiting fails when the platform enforces one behavior while the product promised another to clients. The technical mechanism matters less than treating rate limits as a documented contract with defined scope, limits, and error semantics.

Mar 27, 2022 7 min read

L2 Deep Dive

Consistent Hashing: What It Solves and What It Does Not

Consistent hashing is a damage-control mechanism for cluster membership change, not a general scalability strategy — what it limits during node additions and removals, and the tradeoffs that make it unsuitable as a universal sharding approach.

Mar 12, 2022 7 min read

L2 Deep Dive

Idempotency Keys: The Small Table That Saves Distributed Systems

The most reliable distributed systems depend on an unimpressive table with a unique constraint and a saved response — how idempotency keys prevent double charges, duplicate events, and retry amplification at the database layer.

Feb 10, 2022 7 min read

L2 Deep Dive

Caches Do Not Remove Database Load Unless You Design the Miss Path

A cache is not a shield around the database — it is a second traffic control system whose failure mode is a synchronized stampede back to the database. How to design the miss path so cache failures don't become database incidents.

Latest in Engineering Fundamentals

Apr 15, 2026 5 min read

L1 Field Note

#ai-engineering #architecture #checklist

AI Cost Observability Dashboard: LangSmith vs Helicone

How to build an AI FinOps dashboard and choose between proxy-based and instrumentation-based observability.

Oct 21, 2025 4 min read

L1 Field Note

Alert Fatigue Engineering: How to Build Fewer, Better, Actionable Alerts

A dashboard is not observability, and an alert without a specific action is just operational debt masquerading as monitoring.

#failures #checklist #architecture

Nov 19, 2024 5 min read

L2 Deep Dive

Cost Observability: Build Dashboards That Show Waste Before Finance Finds It

How to expand monitoring beyond uptime by building dashboards that expose underutilized RDS instances, EBS io2 waste, and backup retention drift.

#cloud #architecture #checklist

Mar 12, 2024 4 min read

L1 Field Note

Consistency Models Your Application Actually Needs

The difference between read committed, repeatable read, and serializable isolation in operational terms — and why most applications are running with weaker guarantees than engineers assume.

Mar 3, 2024 5 min read

L1 Field Note

SIMD vs SIMT Explained for Database Engineers

A DBA-friendly explanation of SIMD and SIMT using query execution, vectorized processing, and GPU mental models instead of hardware jargon.

#databases #cpu #gpu #performance

Mar 2, 2024 5 min read

L1 Field Note

#databases #architecture #ai-engineering

CPU vs GPU vs TPU Explained for Database Engineers

All Engineering Fundamentals Posts

Jan 9, 2024 4 min read

L1 Field Note

#databases #fundamentals #architecture

CAP Theorem in Operational Terms

What the CAP theorem actually says about distributed database tradeoffs, why the CP vs AP framing is more useful than the theory, and what it means for your system when the network fails.

Nov 14, 2023 4 min read

L1 Field Note

#databases #fundamentals #architecture

Caches, Queues, and Databases: When to Use Each

The decision framework for choosing between a cache, a queue, and a database — including the failure modes that appear when engineers use the wrong one for the job.

Sep 12, 2023 4 min read

L1 Field Note

Cardinality Estimation: Why the Query Planner Gets It Wrong

How PostgreSQL estimates row counts, why those estimates are wrong for correlated columns and skewed distributions, and what engineers can do when the planner picks a bad plan.

Jul 11, 2023 4 min read

L1 Field Note

Index Selectivity: Why Cardinality Changes Everything

Why a low-cardinality index is often worse than no index, how the query planner uses selectivity estimates, and when to build a partial index instead.

May 9, 2023 5 min read

L1 Field Note

Reading a Query Plan Without Getting Lost

How to read PostgreSQL EXPLAIN output, what seq scan vs index scan actually means in practice, and the three numbers that matter most in any query plan.

Mar 14, 2023 4 min read

L1 Field Note

Connection Pooling Explained

Why PostgreSQL connections are expensive, what a connection pool actually does, and the difference between session mode, transaction mode, and statement mode in PgBouncer.

Jan 10, 2023 4 min read

L1 Field Note

Replication Lag Explained

What replication lag actually measures in PostgreSQL, the three distinct lag components that most monitoring tools conflate, and which one matters for your RPO.

Oct 11, 2022 4 min read

L1 Field Note

Checkpoint and Flush: What Your Database Does Before It Can Rest

What a checkpoint actually does in PostgreSQL, why dirty page flush matters for recovery time, and what engineers should monitor to avoid checkpoint pressure.

Aug 9, 2022 4 min read

L1 Field Note

Redo vs Undo: How Databases Recover from Crashes

The two mechanisms databases use to survive crashes — redo brings committed changes forward, undo rolls back uncommitted ones — and why the distinction matters operationally.

Jun 14, 2022 4 min read

L1 Field Note

#databases #fundamentals #architecture

B-tree vs LSM Tree: The Storage Engine Tradeoff

Why PostgreSQL and MySQL use B-trees while Cassandra and RocksDB use LSM trees — the read/write tradeoff that determines which storage engine fits your workload.

Apr 26, 2022 6 min read

L2 Deep Dive

Read-After-Write Consistency: The UX Bug That Becomes a Database Bug

Apr 11, 2022 7 min read

L2 Deep Dive

Rate Limiting Is a Product Contract, Not Just a Redis Counter

Mar 27, 2022 7 min read

L2 Deep Dive

Consistent Hashing: What It Solves and What It Does Not

Mar 15, 2022 4 min read

L1 Field Note

WAL Explained for Database Engineers

What write-ahead logging is, why most ACID databases rely on it, and what engineers need to know about LSN ordering, crash recovery, and replication lag.

Mar 12, 2022 7 min read

L2 Deep Dive

Idempotency Keys: The Small Table That Saves Distributed Systems

Feb 14, 2022 5 min read

L1 Field Note

MVCC Explained Like a Database Engineer

How multi-version concurrency control lets readers and writers run without blocking each other — and why misunderstanding it causes table bloat, undo log growth, and stalled vacuums.

#databases #architecture

Feb 10, 2022 7 min read

L2 Deep Dive

Caches Do Not Remove Database Load Unless You Design the Miss Path

Jan 26, 2022 8 min read

L2 Deep Dive

Load Balancers: The Hidden State Machine in Front of Your App

A load balancer is not a pipe — it is a distributed state machine making routing and health decisions on stale, partial evidence. Its configuration choices propagate directly into application availability and failure modes.