Database infrastructure conversations break down the moment hardware enters the room, because engineers are asking the wrong question. “Which is faster: CPU, GPU, or TPU?” is the wrong frame. The right question is the same one you already apply to query plans: what execution pattern does this workload need, and what hardware is optimized for that pattern?

Situation

OLTP systems are adding vector similarity, analytical aggregates, and AI inference to their workloads. Infrastructure teams are being asked to provision GPU instances without a framework for deciding when a GPU is the right choice versus a larger CPU instance or a purpose-built accelerator. The same confusion that once surrounded row-store vs column-store has returned at the hardware layer.

The Problem

Engineers who treat CPU, GPU, and TPU as a linear performance hierarchy make the wrong call in both directions: they over-provision GPUs for workloads that remain CPU-bound (transactions, connection management, control flow), and they under-provision accelerators for workloads that are genuinely scan-heavy or tensor-heavy. The result is either wasted capacity or missed speedups, and in both cases the root cause is the assumption that “the GPU is faster” without a workload-specific basis.

If you already understand OLTP vs OLAP, row vs column execution, and latency vs throughput, you already have the right mental model for this hardware decision.

Matching Execution Patterns to Hardware

CPU vs GPU vs TPU mental model

Hardware | DBA Mental Model | Best At
CPU | OLTP execution brain | Branching, coordination, transactions, mixed workloads
GPU | Parallel analytics engine | Scans, filters, joins, aggregations, vector math
TPU | Matrix math appliance | Dense AI tensor operations and model inference/training

What a CPU Is

A CPU is designed to be general-purpose. It handles many kinds of work efficiently: branching, pointer chasing, transaction logic, conditional execution, scheduling and interrupts, and complex control flow.

Think of a CPU as a traditional relational engine running OLTP traffic.

SELECT *
FROM orders
WHERE customer_id = 123
AND status = 'SHIPPED';

This is CPU-friendly because it involves index lookups, branching, and low-latency response patterns.
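To make the index-lookup shape concrete, here is a minimal sketch that runs the same query against an in-memory SQLite database and prints the plan the engine chooses. SQLite is only a stand-in for an OLTP engine here, and the table layout, the index name `idx_orders_customer_status`, and the generated rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite is a stand-in for an OLTP engine; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
# Hypothetical composite index matching both predicates of the lookup.
conn.execute(
    "CREATE INDEX idx_orders_customer_status ON orders (customer_id, status)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(i % 500, "SHIPPED" if i % 3 == 0 else "PENDING") for i in range(10_000)],
)

query = "SELECT * FROM orders WHERE customer_id = ? AND status = ?"
params = (123, "SHIPPED")

# EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]
rows = conn.execute(query, params).fetchall()

print(plan)
print(len(rows), "matching rows out of 10000 stored")
```

The plan should report an index search rather than a full scan: the engine touches a handful of rows through a tree traversal, which is pointer chasing and branching rather than bulk arithmetic.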

CPUs win when the workload is transactional, branch-heavy, latency-sensitive, coordination-heavy, or dominated by smaller irregular queries.

What a GPU Is

A GPU is not a faster CPU. It is built for repeating the same operation across massive data volumes in parallel.

Think of a GPU as a massively parallel analytics engine optimized for huge scans, repeated arithmetic, columnar execution, vector operations, and parallel filtering.

SELECT SUM(price * quantity)
FROM sales;

With billions of rows, this operation is repetitive and parallelizable — it maps well to GPU threads. GPUs win when the workload is scan-heavy, arithmetic-heavy, batch-oriented, highly parallelizable, or throughput-driven.
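The reason this aggregate parallelizes so well can be sketched in a few lines of Python: every row gets the same arithmetic, and the partial results combine with a plain sum. The example below is an illustration of the decomposition only, not a GPU implementation. It uses a thread pool, which in CPython does not give a real speedup for this kind of work, and the `sales` data (prices in integer cents) is made up.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    # Same arithmetic for every row, with no data-dependent branching.
    return sum(price * qty for price, qty in chunk)


def parallel_revenue(rows, workers=4):
    # Split the rows into contiguous chunks, roughly one per worker.
    size = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Integer addition is associative, so the partials combine exactly
    # regardless of how the rows were split.
    return sum(partials)


# Hypothetical data: (price in cents, quantity) per sale.
sales = [(100 + i % 50, 1 + i % 7) for i in range(100_000)]

sequential = sum(price * qty for price, qty in sales)
chunked = parallel_revenue(sales)
print(sequential == chunked)
```

A GPU applies the same idea at a much finer grain, with thousands of hardware threads each handling a slice of the column, followed by a reduction step.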

What a TPU Is

A TPU is more specialized than CPU or GPU. It is designed for dense matrix and tensor math used heavily in neural networks. Think of a TPU as a purpose-built model-math execution appliance.

TPUs are not general database accelerators. They are strongest when model computation itself is the bottleneck: neural network training, large-scale inference, dense tensor operations, and repeated matrix multiplications with regular shapes.

Dimension | CPU | GPU | TPU
Flexibility | Highest | Medium | Lowest
Best workload | Mixed/general-purpose | Parallel analytics | AI tensor math
Latency | Strong | Moderate | Workload-specific
Throughput | Moderate | Very high | Very high for AI
Branch-heavy logic | Excellent | Weak | Poor fit
OLTP | Best | Poor | Poor
Analytics | Decent | Excellent | General mismatch
ML inference | Decent | Strong | Excellent
Matrix multiplication | Okay | Strong | Best

In Practice

PostgreSQL’s execution model runs on CPUs: its buffer manager, lock manager, and MVCC machinery are built around sequential per-backend processing with branching logic. The documented behavior when you add GPU-accelerated extensions (such as PG-Strom for vectorized scan offload) is that the optimizer continues to handle query planning on CPU while the GPU handles the data-parallel scan and aggregation phases. This division of labor, with the CPU handling control and the GPU handling data-parallel computation, is a common design pattern for heterogeneous database systems.

NVIDIA’s RAPIDS cuDF library (Apache 2.0, documented at developer.nvidia.com/rapids) runs pandas-like DataFrame operations on GPU. The documented design note is that data transfer between CPU memory and GPU memory (limited by PCIe bandwidth) is the dominant latency cost for small-to-medium datasets, so GPU acceleration does not pay off until the working set is large enough to amortize the transfer overhead.
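The amortization argument can be written down as a back-of-the-envelope cost model. The sketch below is an assumption-laden illustration, not a benchmark: the class `OffloadModel`, its field names, and every throughput figure are hypothetical placeholders you would replace with numbers measured on your own hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OffloadModel:
    # All defaults are illustrative placeholders, not measurements.
    cpu_scan_gb_per_s: float = 10.0    # CPU in-memory scan throughput
    gpu_scan_gb_per_s: float = 200.0   # GPU on-device scan throughput
    pcie_gb_per_s: float = 25.0        # host-to-device transfer bandwidth
    fixed_overhead_s: float = 0.001    # launch and setup cost per offload

    def cpu_seconds(self, gb: float) -> float:
        return gb / self.cpu_scan_gb_per_s

    def gpu_seconds(self, gb: float) -> float:
        # Data must cross the bus before the GPU can scan it.
        transfer = gb / self.pcie_gb_per_s
        compute = gb / self.gpu_scan_gb_per_s
        return self.fixed_overhead_s + transfer + compute

    def break_even_gb(self) -> Optional[float]:
        # Seconds saved per GB by offloading, after paying for the transfer.
        saving_per_gb = (
            1 / self.cpu_scan_gb_per_s
            - 1 / self.pcie_gb_per_s
            - 1 / self.gpu_scan_gb_per_s
        )
        if saving_per_gb <= 0:
            return None  # transfer cost alone outweighs the GPU's advantage
        return self.fixed_overhead_s / saving_per_gb


model = OffloadModel()
size = model.break_even_gb()
print(size, model.cpu_seconds(size), model.gpu_seconds(size))
```

Under this single-pass model, if the bus is slower than the CPU can scan memory, `break_even_gb()` returns `None`: offloading only wins when the data already lives on the device or is reused across several operations.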

Google’s TPU documentation is explicit that TPUs are optimized for matrix multiplications with regular, statically-shaped tensors, and that irregular control flow, sparse operations, and dynamic shapes fall back to CPU or GPU. This boundary is analogous to one a DBA already understands: the difference between a full table scan with a uniform per-row operation (accelerator-friendly) and a complex, branch-heavy multi-join query plan (CPU-friendly).

Where It Breaks

Scenario | What breaks | Why
GPU for OLTP | Latency increases, no throughput gain | GPU launch overhead and PCIe transfer cost exceed the per-request compute savings
CPU for large scans | Query runs 10–100x slower than GPU equivalent | CPU cannot parallelize the same scan operation across thousands of cores simultaneously
TPU for database workloads | Misfit: most DB operations are not dense tensor math | TPU lacks general-purpose branching and irregular memory access support
Heterogeneous system with small working set | GPU transfer overhead dominates | PCIe bandwidth makes GPU offload slower than in-memory CPU execution until data volume is large enough
Assuming GPU = faster for all AI workloads | Inference latency spikes at low concurrency | TPU is faster for batched dense inference; GPU wins for moderate concurrency; CPU wins for single-request light inference

What to Do Next

  • Problem: Adding GPU or TPU infrastructure without a workload-to-hardware mapping wastes capacity on the wrong execution pattern.
  • Solution: Classify hot paths by execution pattern before choosing hardware — transactions and coordination stay on CPU, scan-heavy analytics move to GPU, dense model math goes to TPU.
  • Proof: Run your heaviest analytical query on a GPU-enabled instance with a GPU-accelerated engine (RAPIDS cuDF or a GPU database) and compare elapsed time and I/O throughput against the same query on your current CPU-only setup, ideally also against a columnar CPU engine such as DuckDB. Expect a large gap for scan-heavy query shapes; for CPU-bound query shapes the gap narrows or disappears.
  • Action: This week, identify the three highest-CPU-cost queries in your monitoring dashboard and classify each as branch-heavy (CPU-bound) or scan-heavy (GPU candidate). That classification determines whether GPU provisioning is justified.
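The classification step in the Action item can start as a crude heuristic over numbers your monitoring already collects. The sketch below is one possible starting point, assuming you can export per-query call rates and rows scanned. The `QueryStats` fields, the thresholds, and the three sample queries are all hypothetical and should be tuned against your own workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryStats:
    name: str
    calls_per_min: float
    rows_scanned_per_call: float


def classify(q, min_scan_rows=1_000_000, max_calls_per_min=60):
    # Hypothetical rule: large scans issued infrequently look like batch
    # analytics; everything else is treated as latency-sensitive CPU work.
    if q.rows_scanned_per_call >= min_scan_rows and q.calls_per_min <= max_calls_per_min:
        return "scan-heavy (GPU candidate)"
    return "branch-heavy (CPU-bound)"


# Made-up examples standing in for the top three queries on a dashboard.
top_queries = [
    QueryStats("order_lookup_by_customer", calls_per_min=12_000, rows_scanned_per_call=8),
    QueryStats("daily_revenue_rollup", calls_per_min=1, rows_scanned_per_call=2_500_000_000),
    QueryStats("cart_update", calls_per_min=4_000, rows_scanned_per_call=3),
]

labels = {q.name: classify(q) for q in top_queries}
for name, label in labels.items():
    print(f"{name}: {label}")
```

A rule this simple will misclassify some queries, so treat its output as a shortlist for the GPU comparison described in the Proof item rather than as a provisioning decision.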