MongoDB Queryable Encryption Architecture Review
MongoDB Queryable Encryption is not a feature you enable after the application is built — it is a schema and key management decision that constrains every query you can run on encrypted fields for the lifetime of the collection. Getting the architecture review right before go-live is substantially cheaper than discovering a query constraint after the collection is populated and production traffic is live.
Situation
The team has decided to use MongoDB Queryable Encryption to protect a subset of sensitive document fields — PII, payment instrument data, health records, or similar categories that require protection from privileged infrastructure access. The development environment has QE configured with a local key provider. Production go-live is planned.
This runbook is the go-live gate review for a team implementing QE in MongoDB 8.0. For an introduction to what QE enables and how it differs from standard field-level encryption, see MongoDB 8.0: Why Queryable Encryption Matters.
The Problem
The pre-go-live review exists because three categories of mistakes are expensive to fix after data is encrypted at scale: wrong key management provider, wrong query type configuration per field, and insufficient performance testing for range queries. Each one requires either a collection rebuild (re-encrypt all documents with corrected configuration) or a material change to how the application queries the data.
How do we systematically validate the MongoDB QE deployment before production traffic begins?
Pre-Go-Live Architecture Review
The target architecture must satisfy stringent key management, driver, and query constraints.
flowchart TD
A[QE go-live review] --> B{KMS configured for production?}
B -->|no| C[Configure AWS KMS or GCP or Azure KV]
C --> B
B -->|yes| D{All sensitive fields classified?}
D -->|no| E[Create field inventory — QE vs standard FLE]
E --> D
D -->|yes| F{Driver version 6.0 plus with libmongocrypt?}
F -->|no| G[Upgrade driver and validate encryption round-trip]
G --> F
F -->|yes| H{Query types verified for each QE field?}
H -->|no| I[Audit application queries vs encrypted fields map]
I --> H
H -->|yes| J{Range query performance tested in staging?}
J -->|no| K[Run range query benchmark and verify latency is acceptable]
K --> J
J -->|yes| L{Key rotation procedure documented?}
L -->|no| M[Document CMK rotation and DEK re-wrap procedure]
M --> L
L -->|yes| N[Approved for production go-live]
1. Key Management Provider
Verify that production configuration uses AWS KMS, GCP Cloud KMS, Azure Key Vault, or a KMIP-compliant provider.
// Insecure: local provider (development only)
const kmsProviders = {
local: { key: localMasterKey }
};
// Required for production: external KMS
const kmsProviders = {
aws: {
accessKeyId: process.env.AWS_ACCESS_KEY_ID,
secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY
}
};
Any production deployment using the local provider has its entire encryption model broken — the key material is accessible to anyone with filesystem access to the application server.
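One way to enforce this check is a guard that runs at application startup. The sketch below is illustrative only: the function name assertProductionKmsProviders and the use of a NODE_ENV-style environment string are assumptions, not part of the MongoDB driver API.

```javascript
// Hypothetical startup guard; not part of the MongoDB driver API.
const EXTERNAL_PROVIDERS = ["aws", "gcp", "azure", "kmip"];

function assertProductionKmsProviders(kmsProviders, nodeEnv) {
  // Strip an optional ":name" suffix so "aws:primary" is treated as "aws".
  const configured = Object.keys(kmsProviders).map((name) => name.split(":")[0]);
  if (nodeEnv !== "production") {
    return configured; // the local provider is tolerated outside production
  }
  if (configured.includes("local")) {
    throw new Error("Refusing to start: local KMS provider configured in production");
  }
  if (!configured.some((name) => EXTERNAL_PROVIDERS.includes(name))) {
    throw new Error("Refusing to start: no external KMS provider configured");
  }
  return configured;
}

// Example: passes for an external provider in production.
console.log(assertProductionKmsProviders({ aws: {} }, "production"));
```

Failing fast at startup keeps a development configuration from reaching production unnoticed, which is cheaper than finding it in a later audit.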
2. Field Classification
Not every sensitive field needs Queryable Encryption. Fields that are only written and read by the application without server-side filtering belong on standard FLE.
| Field | Sensitivity | Server-side queries needed | Recommendation |
|---|---|---|---|
| ssn | High | Equality lookup only | QE (equality) |
| salary | Medium | Range queries needed | QE (range) |
| medical_notes | High | No server-side queries | Standard FLE |
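The classification rule in the table can be captured as a small function that the field inventory is run through. This is a minimal sketch: the inventory shape and the serverSideQuery property are assumptions made for illustration.

```javascript
// Hypothetical inventory format: serverSideQuery is "none", "equality", or "range".
function recommendEncryption(field) {
  switch (field.serverSideQuery) {
    case "none":
      return "Standard FLE";
    case "equality":
      return "QE (equality)";
    case "range":
      return "QE (range)";
    default:
      // Force an explicit decision for every sensitive field.
      throw new Error(`Unclassified field: ${field.name}`);
  }
}

const inventory = [
  { name: "ssn", serverSideQuery: "equality" },
  { name: "salary", serverSideQuery: "range" },
  { name: "medical_notes", serverSideQuery: "none" },
];

for (const field of inventory) {
  console.log(field.name, "->", recommendEncryption(field));
}
```

Throwing on an unclassified field makes the inventory fail loudly when a new sensitive field is added without a decision.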
3. Driver Version and Dependencies
MongoDB QE requires specific driver versions and the libmongocrypt dependency:
- Node.js driver: mongodb 6.0+
- Python driver: pymongo 4.4+ with pymongo[encryption]
- Java driver: 4.11+
- libmongocrypt: 1.8+
# Node.js: show the declared driver version range
grep '"mongodb"' package.json
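The same check can be scripted so it runs in CI. The sketch below inspects the declared range in a parsed package.json object; the helper names are hypothetical, and the parsing is deliberately naive (it reads the first number in the range). A declared range is not the installed version, so pair this with npm ls mongodb against the lockfile.

```javascript
// Hypothetical helpers; the minimum major version (6) comes from the list above.
const MIN_NODE_DRIVER_MAJOR = 6;

// Naive parse: takes the first integer in the range, e.g. "^6.3.0" -> 6.
function lowestMajor(range) {
  const match = /(\d+)/.exec(range);
  return match ? Number(match[1]) : null;
}

function checkDriverRange(pkg) {
  const range = pkg.dependencies?.mongodb;
  if (!range) {
    return { ok: false, reason: "mongodb is not a declared dependency" };
  }
  const major = lowestMajor(range);
  if (major === null) {
    return { ok: false, reason: `cannot parse version range "${range}"` };
  }
  if (major < MIN_NODE_DRIVER_MAJOR) {
    return { ok: false, reason: `range "${range}" allows major ${major}` };
  }
  return { ok: true, reason: `range "${range}" starts at major ${major}` };
}

console.log(checkDriverRange({ dependencies: { mongodb: "^6.3.0" } }));
```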
4. Query Type Configuration
const encryptedFieldsMap = {
"mydb.patients": {
fields: [
{
path: "ssn",
bsonType: "string",
queries: [{ queryType: "equality" }]
}
]
}
};
Regex, $text, $where, and most aggregation expressions that operate on encrypted field content are not supported for server-side evaluation.
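The audit of application queries against the encrypted fields map can be partly automated. The following is a minimal sketch: the operator allow-list per queryType is a simplified assumption to confirm against the MongoDB documentation for your server version, the function names are hypothetical, and only top-level filter keys are inspected (no recursion into $and or $or).

```javascript
// Simplified operator allow-list per queryType (an assumption for this sketch).
const ALLOWED_OPERATORS = {
  equality: ["$eq", "$ne", "$in", "$nin"],
  range: ["$gt", "$gte", "$lt", "$lte"],
};

// True for objects such as { $gte: 1 }, false for literals, dates, and regexes.
function isOperatorObject(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    !Array.isArray(value) &&
    !(value instanceof Date) &&
    !(value instanceof RegExp) &&
    Object.keys(value).some((key) => key.startsWith("$"))
  );
}

// Returns one entry per operator used on an encrypted field that its queryType does not allow.
function auditFilter(filter, encryptedFields) {
  const violations = [];
  for (const field of encryptedFields.fields) {
    if (!(field.path in filter)) continue;
    const allowed = new Set(
      [].concat(field.queries ?? []).flatMap((q) => ALLOWED_OPERATORS[q.queryType] ?? [])
    );
    const condition = filter[field.path];
    let operators = ["$eq"]; // a literal value is an implicit equality match
    if (condition instanceof RegExp) operators = ["$regex"];
    else if (isOperatorObject(condition)) operators = Object.keys(condition);
    for (const operator of operators) {
      if (!allowed.has(operator)) violations.push({ path: field.path, operator });
    }
  }
  return violations;
}

const patients = {
  fields: [{ path: "ssn", bsonType: "string", queries: [{ queryType: "equality" }] }],
};
console.log(auditFilter({ ssn: { $regex: "^123" } }, patients));
```

Running a check like this over the filters the application actually issues surfaces unsupported queries before data is encrypted, when changing the field configuration is still cheap.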
5. DEK Cache TTL and Rotation
The ClientEncryption object caches Data Encryption Keys (DEKs) in application memory.
const clientEncryption = new ClientEncryption(client, {
keyVaultNamespace: "encryption.__keyVault",
kmsProviders,
keyExpirationMS: 60000
});
For key rotation to take effect promptly, the cache TTL must be shorter than the rotation response SLA.
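The relationship between the TTL, the rotation SLA, and KMS load can be made concrete with a little arithmetic. This sketch assumes a worst case in which every instance re-fetches and unwraps every DEK once per TTL period; the function name and the instance and DEK counts are hypothetical inputs.

```javascript
// Rough upper bound on KMS unwrap calls, assuming each instance refreshes
// every DEK once per TTL period (a worst-case assumption for illustration).
function reviewDekCacheTtl({ keyExpirationMS, rotationSlaMS, instances, dekCount }) {
  if (keyExpirationMS <= 0) {
    // Assumption: a non-positive TTL means cached DEKs are never evicted.
    return { ttlWithinSla: false, maxKmsUnwrapsPerHour: 0 };
  }
  const refreshesPerHour = 3_600_000 / keyExpirationMS;
  return {
    ttlWithinSla: keyExpirationMS < rotationSlaMS,
    maxKmsUnwrapsPerHour: Math.ceil(refreshesPerHour * instances * dekCount),
  };
}

// 60 s TTL, 5 min rotation SLA, 10 instances, 3 DEKs:
// 60 refreshes/hour * 10 * 3 = 1800 unwrap calls per hour at most.
console.log(
  reviewDekCacheTtl({ keyExpirationMS: 60000, rotationSlaMS: 300000, instances: 10, dekCount: 3 })
);
```

Halving the TTL doubles the bound, so the TTL choice is a direct trade between rotation responsiveness and KMS request volume.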
In Practice
All patterns below are derived from MongoDB’s documented system behavior and MongoDB’s official QE documentation (MongoDB Queryable Encryption docs). I have not run QE at production scale personally; these are documented design behaviors, not field observations.
Data encrypted under local-provider DEKs in development should not be carried into production as-is. rewrapManyDataKey can re-wrap existing DEKs under a KMS-backed CMK, but it does not change the DEK material, which until then was protected only by a local master key. The conservative path before go-live is to create new KMS-backed DEKs and to decrypt and re-encrypt any documents written under the local provider; there is no in-place conversion of existing encrypted documents to a new DEK.
Range queries on QE-encrypted fields carry storage and latency overhead that must be measured rather than assumed. The documented pattern is that a range-encrypted field stores multiple auxiliary metadata entries per document, not one entry per document as a standard B-tree index does, so index storage grows significantly with collection volume. A collection with 50 million documents and two range-encrypted fields can accumulate encrypted index data substantially larger than the equivalent unencrypted field indexes. Write latency also increases because each insert must write the auxiliary range metadata. The actual impact depends heavily on collection size, the configured range bounds (min and max), and the range tuning options in the encryptedFields config (precision, sparsity, and trimFactor). Benchmarking must be done at production scale:
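// Time one range query end to end, including client-side decryption
const start = Date.now();
const results = await db.collection("patients").find({
dob: { $gte: new Date("1970-01-01"), $lte: new Date("1990-12-31") }
}).toArray();
const elapsed = Date.now() - start;
console.log(`Range query returned ${results.length} documents in ${elapsed} ms`);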
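A single timing is noisy, so a benchmark usually repeats the query and reports percentiles. The harness below is a minimal sketch: benchmark and percentile are hypothetical helper names, and runQuery stands for any async function that executes the range query shown above.

```javascript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedValues, p) {
  const rank = Math.ceil((p / 100) * sortedValues.length) - 1;
  const index = Math.min(sortedValues.length - 1, Math.max(0, rank));
  return sortedValues[index];
}

// Runs runQuery repeatedly and summarizes latency in milliseconds.
async function benchmark(runQuery, iterations = 20) {
  const timings = [];
  for (let i = 0; i < iterations; i++) {
    const start = process.hrtime.bigint();
    await runQuery();
    timings.push(Number(process.hrtime.bigint() - start) / 1e6);
  }
  timings.sort((a, b) => a - b);
  return {
    p50: percentile(timings, 50),
    p95: percentile(timings, 95),
    max: timings[timings.length - 1],
  };
}

// Usage (assumes an existing `db` handle with QE configured):
// const stats = await benchmark(() =>
//   db.collection("patients").find({ dob: { $gte: from, $lte: to } }).toArray()
// );
```

Comparing the p95 against the application's latency budget, at production data volume, is what turns the benchmark into a go-live criterion.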
Multi-pod DEK cache consistency. In multi-instance application deployments, each process holds its own in-memory DEK cache. When a DEK is revoked or a CMK is rotated, instances that still hold a cached, already-unwrapped DEK keep decrypting data until their keyExpirationMS TTL elapses, while instances whose cache has expired must call the KMS again and can fail once the change takes effect on the KMS side. During this window some application pods succeed on encrypted reads and others fail, a split-brain failure mode where errors appear intermittently across instances. The operational requirement is to either set a short TTL (accepting higher KMS call volume) or coordinate a rolling restart of application pods immediately after key rotation to flush all caches.
Customer Master Key (CMK) rotation in the KMS does not require re-encrypting document data. The documented pattern is to call rewrapManyDataKey, which re-wraps the DEKs with the new CMK while leaving the underlying collection data untouched:
await clientEncryption.rewrapManyDataKey(
{},
{
provider: "aws",
masterKey: { region: "us-east-1", key: process.env.NEW_AWS_CMK_ARN }
}
);
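After a rewrap, it is worth confirming that every DEK was actually updated. The sketch below assumes the Node.js driver resolves rewrapManyDataKey with an object carrying an optional bulkWriteResult whose modifiedCount reflects the number of re-wrapped keys; summarizeRewrap and expectedKeyCount are hypothetical names, and the result shape should be verified against your driver version.

```javascript
// Hypothetical post-rotation check. Assumes the resolved value looks like
// { bulkWriteResult?: { modifiedCount: number } }; confirm for your driver version.
function summarizeRewrap(result, expectedKeyCount) {
  const rewrapped = result?.bulkWriteResult?.modifiedCount ?? 0;
  return { rewrapped, complete: rewrapped === expectedKeyCount };
}

// Usage (assumes an existing clientEncryption and a known DEK count):
// const result = await clientEncryption.rewrapManyDataKey({}, { provider: "aws", masterKey });
// const summary = summarizeRewrap(result, dekCount);
// if (!summary.complete) throw new Error(`Only ${summary.rewrapped} DEKs were re-wrapped`);

console.log(summarizeRewrap({ bulkWriteResult: { modifiedCount: 3 } }, 3));
```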
Automating visibility into DEK health is a common operational pattern. DEK creation dates can be monitored via the key vault collection:
db.getSiblingDB("encryption").getCollection("__keyVault").find(
{},
{ keyAltNames: 1, creationDate: 1, updateDate: 1 }
).forEach(key => {
const ageDays = (Date.now() - key.creationDate) / 86400000;
if (ageDays > 90) {
print("DEK may need rotation:", key.keyAltNames, "age:", Math.round(ageDays), "days");
}
});
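For a scheduled job running in Node.js rather than the shell, the same age check can be written as a pure function over key vault documents, which makes it easy to unit test. This is a sketch only: findStaleDeks is a hypothetical name and the 90-day threshold mirrors the shell loop above.

```javascript
const MS_PER_DAY = 86_400_000;

// Returns the DEKs whose creationDate is more than maxAgeDays before `now`.
function findStaleDeks(keyDocs, maxAgeDays, now = Date.now()) {
  return keyDocs
    .map((doc) => ({
      keyAltNames: doc.keyAltNames ?? [],
      ageDays: Math.floor((now - new Date(doc.creationDate).getTime()) / MS_PER_DAY),
    }))
    .filter((entry) => entry.ageDays > maxAgeDays);
}

// Usage (assumes a connected MongoClient named `client`):
// const keys = await client.db("encryption").collection("__keyVault")
//   .find({}, { projection: { keyAltNames: 1, creationDate: 1 } }).toArray();
// const stale = findStaleDeks(keys, 90);

const now = Date.UTC(2025, 5, 1);
console.log(
  findStaleDeks(
    [
      { keyAltNames: ["patients-ssn"], creationDate: new Date(now - 100 * MS_PER_DAY) },
      { keyAltNames: ["patients-dob"], creationDate: new Date(now - 10 * MS_PER_DAY) },
    ],
    90,
    now
  )
);
```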
Where It Breaks
Symptoms of an Incomplete QE Design
| Signal | Where to see it | What it means |
|---|---|---|
| Local key provider in production config | ClientEncryption initialization in app code | Security model broken — key material accessible without KMS |
| Driver version below 6.0 | package.json or requirements.txt | libmongocrypt not supported — QE will fail at runtime |
| QE field queried with regex in application | Application code search | Unsupported query type — will fail or require application-layer workaround |
| No key rotation procedure documented | Architecture documentation | CMK rotation unplanned — compliance risk |
| Range query on equality-only field | Encrypted fields map vs query code | Runtime error when range query hits equality-only encrypted field |
| DEK cached indefinitely in application | ClientEncryption configuration | Key rotation does not take effect until cache expires |
Design Tradeoffs and Failure Modes
| Design Decision | Benefit | Tradeoff / Failure Mode |
|---|---|---|
| Standard FLE vs QE | Simpler setup, lower overhead, no strict query constraints. | Cannot run any server-side queries (equality or range) on the encrypted data. |
| Equality vs Range | Equality has faster performance and generates less metadata. | Runtime errors will occur if the application attempts a range query on an equality-only field. |
| External KMS Dependency | Meets compliance standards; security model is maintained. | KMS Unavailability: If the KMS endpoint becomes unreachable, the application cannot encrypt new writes or decrypt reads. Plan for KMS high availability. |
| Short DEK Cache TTL | Application responds quickly to CMK rotations and revocations. | Increases request volume to the external KMS, potentially impacting latency and increasing costs. |
| In-place Schema Changes | N/A | Post-Go-Live Rigidity: MongoDB does not support in-place schema changes for QE. Changing queryType requires a collection rebuild that decrypts and re-encrypts all data, and its duration scales with collection size. |
What to Do Next
- Problem: Queryable Encryption configurations are permanent; making the wrong choice on query types or KMS providers requires expensive collection rebuilds.
- Solution: Execute a pre-go-live architecture review validating field classification, driver versions, query constraints, and performance overhead.
- Proof: Benchmarking range queries at production scale and validating the rewrapManyDataKey rotation process ensures the infrastructure behaves correctly under real-world conditions.
- Action: Implement the five verification checks listed above before deploying the encrypted fields map to the production cluster, and schedule an automated job to monitor DEK age.