Database Observability Playbook

A complete guide to monitoring, alerting, and capacity planning for Postgres, MySQL, Cassandra, and MongoDB at scale.

5 posts · Databases

Who This Is For

This series is for DBAs and platform engineers who are building or inheriting database monitoring. It covers what to measure, which thresholds matter, and how to tell signal from noise across Postgres, MySQL, Cassandra, and MongoDB.

What You Will Be Able to Do

Build a dashboard that surfaces saturation before users notice degradation
Set alert thresholds that fire on real problems, not autovacuum and checkpoint noise
Identify capacity headroom from metrics before you need to scale
Instrument slow-query logging and correlate it with replication lag and connection pool pressure

Prerequisites

Familiarity with running at least one relational database in production. Experience with Prometheus or Datadog is helpful but not required.

1 Per-Database Monitoring

What to measure and which queries surface the right signals for PostgreSQL and MySQL/Aurora dashboards.

Jul 8, 2024 7 min read

L2 Deep Dive

Databases

PostgreSQL Monitoring: The Dashboard That Surfaces Problems Before Users Do

The eight PostgreSQL metric groups that matter for production operations — queries, connections, replication lag, autovacuum, locks, cache pressure, checkpoint behavior, and bloat — with exact SQL and alert thresholds.

#databases #checklist

Jul 22, 2024 8 min read

L2 Deep Dive

Databases

MySQL and Aurora Monitoring: The Dashboard That Catches Problems Before Users Do

The seven MySQL and Aurora metric groups that matter for production operations — threads, replication lag, InnoDB buffer pool, slow queries, connections, locks, and disk — with exact SQL, CloudWatch metrics, and alert thresholds.

#databases #checklist

2 Alerting Strategy

Threshold design that fires on real saturation, not autovacuum and checkpoint noise.

Aug 12, 2024 8 min read

L2 Deep Dive

Databases

Database Alert Design: Thresholds That Fire on Real Problems

How to set database alert thresholds that catch real failures without burning the team on autovacuum noise, checkpoint churn, and replication lag spikes — with specific values for PostgreSQL, MySQL, and Aurora.

#databases #checklist

3 Tooling

End-to-end setup for Prometheus/Grafana and Datadog Database Monitoring — exporters, dashboards, and retention.

Sep 9, 2024 6 min read

L2 Deep Dive

Databases

Prometheus and Grafana for Database Monitoring: PostgreSQL and MySQL Setup

How to instrument PostgreSQL and MySQL with postgres_exporter and mysqld_exporter, configure Prometheus scrape jobs, and build Grafana panels that surface the metrics that matter — with working PromQL queries.

#databases #checklist

Oct 14, 2024 8 min read

L2 Deep Dive

Databases

Datadog Database Monitoring: PostgreSQL, MySQL, and Aurora Setup

How to configure Datadog Database Monitoring for PostgreSQL, MySQL, and Aurora — query samples, explain plans, wait event analysis, and the specific Agent settings that make the difference between metric collection and real observability.

#databases #checklist