MongoDB Query Performance Workflow

A MongoDB query showing COLLSCAN in explain output is not always the root cause of a performance problem — but it is always the first place to look. When Atlas Performance Advisor flags a query or currentOp shows sessions running for seconds, the diagnostic sequence from explain output to index design to cache pressure determines whether you spend 15 minutes or 2 hours finding the fix.

Situation

The alert fires or the monitoring dashboard shows elevated read latency. Atlas Performance Advisor has flagged one or more queries lacking index coverage. Operations that normally return in single-digit milliseconds are now taking hundreds of milliseconds or seconds. The collection has grown significantly since the last schema review.

MongoDB query execution follows a straightforward path: the query planner selects a plan based on available indexes and statistics, executes it, and reports the winning plan with execution statistics. When no suitable index exists, the planner chooses COLLSCAN — a sequential scan of every document in the collection. For large collections, COLLSCAN latency scales linearly with collection size regardless of how selective the query predicate is.

The diagnostic starting point is the same in every case: understand what the query planner is actually doing, then determine whether it is doing the right thing.

Symptoms

Signal	Where to see it	What it means
`queryPlanner.winningPlan.stage: COLLSCAN`	`explain()` output	No index used — full collection scan
High `totalDocsExamined` vs `nReturned`	`explain("executionStats")`	Index exists but selectivity is low, or filter is post-index
`SORT` stage in winningPlan	`explain()` output	In-memory sort — may hit 100 MB sort limit on large result sets
`keysExamined >> nReturned`	`explain("executionStats")`	Index scan returning many keys, most filtered out after
Ops flagged in Atlas Performance Advisor	Atlas UI — Performance Advisor tab	Atlas detected slow queries without index coverage
Growing `opcounters.query` with flat throughput	`db.serverStatus().opcounters`	Query rate growing without corresponding throughput improvement

First Five Checks

Currently running slow operations — Check what is active before looking at historical patterns:

db.currentOp({
  active: true,
  secs_running: { $gt: 1 }
})

Any operation running longer than 1 second is a candidate. Note the ns (namespace), op type, and query field. If you see the same query pattern repeatedly, it is a systemic issue, not a one-off.

Explain the slow query with execution statistics — Get the actual execution plan and row counts:

db.orders.explain("executionStats").find({
  customer_id: 12345,
  status: "pending"
})

Key fields in the output:

winningPlan.stage: IXSCAN (index used) or COLLSCAN (full scan)
executionStats.nReturned: documents returned to the client
executionStats.totalDocsExamined: documents MongoDB had to read
executionStats.totalKeysExamined: index keys scanned
executionStats.executionTimeMillis: actual query duration

A healthy query has nReturned ≈ totalDocsExamined. A poorly indexed query has totalDocsExamined >> nReturned.

List existing indexes — Understand what index coverage already exists:

db.orders.getIndexes()

Check whether an index exists on the query fields. If an index exists but EXPLAIN shows COLLSCAN, the index may not match the query predicate (wrong field order in a compound index, mismatched types, or low cardinality causing planner to prefer COLLSCAN).

Enable slow query profiling — Capture slow queries for pattern analysis:

// Set profiling level 1 — log queries slower than 100ms
db.setProfilingLevel(1, { slowms: 100 })

// Read recent slow queries
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()

The profiler output includes full query shape, execution plan, and timing. On Atlas, the Query Profiler in the UI exposes the same data without manual profiling setup.

Check server-level query rate trends — Determine if this is a new regression or a gradual growth issue:

db.serverStatus().opcounters

Compare query count between two calls 60 seconds apart. If the query rate has been growing while throughput stays flat, the queries are getting slower as the collection grows — a classic missing-index signature.

Decision Tree

flowchart TD
    A[Slow MongoDB query] --> B{explain shows COLLSCAN?}
    B -->|yes| C{Index exists on query fields?}
    C -->|no| D[Create index on query predicate fields]
    C -->|yes| E{Cardinality low — many duplicate values?}
    E -->|yes| F[Consider compound index with higher-cardinality field first]
    E -->|no| G[Check field type match — query type must match schema type]
    B -->|no| H{totalDocsExamined much larger than nReturned?}
    H -->|yes| I[Compound index needed — add filter fields in ESR order]
    H -->|no| J{SORT stage in winningPlan?}
    J -->|yes| K[Add sort key to index — create covering compound index]
    J -->|no| L{WiredTiger cache fill above 90%?}
    L -->|yes| M[Cache pressure — increase wiredTigerCacheSizeGB or upgrade instance]
    L -->|no| N[Check write contention — concurrent writes to same documents]

Remediation Options

Option 1 — Create a targeted index

For a query doing COLLSCAN with no existing index on the predicate fields:

// Single-field index
db.orders.createIndex({ customer_id: 1 })

// Compound index following ESR rule (Equality, Sort, Range)
// Query: find({ customer_id: X, status: "pending" }, sort by created_at)
db.orders.createIndex({ customer_id: 1, status: 1, created_at: -1 })

The ESR rule from MongoDB documentation: place equality predicates first, sort fields second, and range predicates last in a compound index. This ordering maximizes the portion of the index that can be used for both filtering and sorting.

After index creation, re-run explain("executionStats") to confirm the plan switched from COLLSCAN to IXSCAN and totalDocsExamined dropped to match nReturned.

Option 2 — Covered query with projection

If a query frequently returns only a subset of fields and those fields plus the query predicate can all fit in an index, a covered query avoids fetching documents entirely:

// Index covers query + projection
db.orders.createIndex({ customer_id: 1, status: 1, created_at: 1, total: 1 })

// Covered query — returns only indexed fields, no document fetch
db.orders.find(
  { customer_id: 12345, status: "pending" },
  { customer_id: 1, status: 1, created_at: 1, total: 1, _id: 0 }
)

In explain() output, a covered query shows IXSCAN with no FETCH stage. totalDocsExamined will be 0.

Option 3 — Resolve in-memory sort

An in-memory SORT stage appears when no index covers the sort key. MongoDB limits in-memory sorts to 100 MB by default; queries that would exceed this limit fail with an error. Adding the sort key to the index eliminates the SORT stage:

// Before: COLLSCAN or IXSCAN followed by SORT stage
db.orders.find({ customer_id: 12345 }).sort({ created_at: -1 })

// Add compound index covering filter and sort
db.orders.createIndex({ customer_id: 1, created_at: -1 })

// After: IXSCAN with no SORT stage — sort is satisfied by index order

Rollback Plan

Index creation: Indexes can be dropped without data loss: db.orders.dropIndex("index_name"). Index name is visible in db.orders.getIndexes(). Drop takes effect immediately — query plans revert to pre-index behavior.
Profiling level change: db.setProfilingLevel(0) disables profiling. The system.profile collection is not automatically truncated — drop it manually if it has grown large: db.system.profile.drop().
No rollback needed for explain or currentOp — these are read-only diagnostic commands with no side effects.

Automation Opportunity

Atlas Performance Advisor automatically surfaces index recommendations for queries it detects as slow. For self-managed deployments, the same signal is available by querying the profiler collection on a schedule:

// Find query shapes taking longer than 200ms in the last hour
db.system.profile.find({
  ts: { $gt: new Date(Date.now() - 3600000) },
  millis: { $gt: 200 },
  op: "query"
}).sort({ millis: -1 }).limit(10)

Running this as a scheduled job and alerting when new slow query shapes appear gives early warning before a growing collection converts a borderline index miss into a hard COLLSCAN under production load.

Leadership Summary

What broke: MongoDB read latency spiked as collection growth exposed queries running without index coverage. Full collection scans were taking seconds on collections that had grown beyond their original index planning assumptions.
What was done: Used explain("executionStats") to identify COLLSCAN queries, applied compound indexes following the ESR rule, and verified plans switched from COLLSCAN to IXSCAN with totalDocsExamined matching nReturned.
What prevents recurrence: Atlas Performance Advisor monitoring surfaces new missing-index patterns automatically. A scheduled profiler query provides equivalent coverage on self-managed deployments.

Checklist

Run db.currentOp({active: true, secs_running: {$gt: 1}}) — identify active slow operations
Run explain("executionStats") on the flagged query — note winningPlan.stage
Check totalDocsExamined vs nReturned — ratio above 10:1 indicates poor selectivity or missing index
Run db.collection.getIndexes() — confirm which indexes exist and their field order
Check for SORT stage in winningPlan — if present, sort key is not covered by the index
If COLLSCAN with no index: create a targeted index using ESR rule for compound predicates
If IXSCAN but high totalDocsExamined: consider adding remaining filter fields to the compound index
Re-run explain("executionStats") after index creation — verify plan switches to IXSCAN
Check WiredTiger cache fill ratio via db.serverStatus().wiredTiger.cache — rule out cache pressure
Enable profiler at slowms: 100 if the slow query pattern is not yet fully characterized

Situation

Symptoms

First Five Checks

Decision Tree

Remediation Options

Rollback Plan

Automation Opportunity

Leadership Summary

Checklist

Rajiv

Related Posts

Autovacuum Is a Capacity Problem, Not a Maintenance Task

When Autovacuum Becomes a Backpressure Signal

Prometheus + Grafana for Database Engineers: Open-Source Monitoring That Actually Works