AWS Multi-Account Data Boundary: VPCs, KMS, IAM, and Audit Trails

Most AWS data leaks are not caused by one missing deny statement. They happen when identity, network, encryption, and audit boundaries are designed as separate controls, then operated by separate teams with no shared failure model.

Situation

The default AWS account is a convenient construction zone. It is a poor security boundary for a growing platform.

A single account lets teams move fast while they are still learning the shape of the system. The VPC is local, IAM policies are close to the workload, KMS keys are created beside the data, and CloudTrail exists somewhere in the console. That is acceptable until the organization starts asking harder questions: Which principals can reach production data? Which network paths are allowed? Which keys can decrypt which stores? Which logs survive if the workload account is compromised?

AWS has spent years pushing customers toward multi-account architectures through AWS Organizations, Control Tower, organization trails, delegated administrator accounts, and the AWS Security Reference Architecture. The documented pattern is clear: separate accounts by responsibility, centralize guardrails, and make security evidence harder to tamper with than the workload itself.

That pattern matters because an AWS account is not just a billing container. It is an administrative blast-radius boundary. A production workload account, a log archive account, a security tooling account, and a shared network account should fail differently.

The Problem

The complication is that multi-account AWS can create the appearance of isolation without delivering a real data boundary.

A team may put production workloads in separate accounts but still allow broad cross-account roles. It may encrypt data with customer managed KMS keys but leave key policy administration inside the same account that runs the application. It may place workloads in private subnets but still let them reach AWS services over public endpoints instead of VPC endpoints. It may enable CloudTrail but store logs in a bucket that workload administrators can alter. Each control is present. The boundary is still weak.

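This usually fails during an incident. A compromised role is not stopped by the VPC, because AWS API calls are authorized by IAM at the service endpoint and do not behave like east-west packet flows; stolen credentials work from any network unless a policy conditions on the source. Restricting KMS permissions on the workload role does not help if the key policy trusts the wrong account root, because that account can delegate key use to its own principals. An S3 bucket policy is not enough if the principal can assume a role outside the organization. CloudTrail logs cannot show who read which objects if data events were never enabled, and they cannot be trusted if the log archive was not separated.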

The core question is: how do you design an AWS data boundary where identity, network, encryption, and audit controls reinforce each other instead of leaving gaps between teams?

Data Boundary as Control Plane

The answer is to treat the data boundary as a control plane, not a subnet diagram.

A practical architecture has four layers. IAM defines who may ask. VPC endpoints define where requests may come from. KMS defines whether protected data can be decrypted. Audit trails define whether the decision can be reconstructed later. AWS Organizations ties those layers together with account placement, service control policies, and organization-aware condition keys.

```mermaid
flowchart TD
  Org[AWS Organizations - account guardrails] --> Workload[Workload account - application VPC]
  Org --> Data[Data account - protected data stores]
  Org --> Key[KMS key account - customer managed keys]
  Org --> Audit[Log archive account - immutable evidence]
  Org --> Sec[Security tooling account - delegated administration]

  Workload --> Principal[IAM role - workload identity]
  Workload --> Endpoint[VPC endpoint - private service path]
  Principal --> Policy[Policy set - identity resource network]
  Endpoint --> Policy
  Policy --> Data
  Data --> Key
  Workload --> Audit
  Data --> Audit
  Key --> Audit
  Sec --> Audit
```

The workload account should contain compute and the minimum IAM roles needed to run it. It should not be the final authority for data access. The data account should own durable stores such as S3 buckets, databases, streams, and queues that contain protected datasets. Resource policies should reject access unless the principal belongs to the expected AWS Organization, the role path is approved, and the request context matches the intended network path.
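As a rough illustration of that resource policy, here is a minimal sketch that builds an S3 bucket policy with one Deny statement per check. The bucket name, organization ID, role pattern, and endpoint ID are all hypothetical placeholders, and the helper name build_bucket_policy is invented for this example. It only produces the Deny side: cross-account access still needs matching Allow statements, and a real policy would need exceptions for break-glass and administrative roles before the network Deny is safe to apply.

```python
import json


def build_bucket_policy(bucket, org_id, approved_role_pattern, vpce_id):
    """Build Deny statements for the three boundary checks on one bucket.

    All argument values are placeholders supplied by the caller.
    """
    objects = f"arn:aws:s3:::{bucket}/*"

    def deny(sid, operator, key, value):
        return {
            "Sid": sid,
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": objects,
            "Condition": {
                operator: {key: value},
                # Apply the Deny only to callers that are not AWS service principals.
                "BoolIfExists": {"aws:PrincipalIsAWSService": "false"},
            },
        }

    return {
        "Version": "2012-10-17",
        "Statement": [
            deny("DenyOutsideOrganization", "StringNotEquals",
                 "aws:PrincipalOrgID", org_id),
            deny("DenyUnapprovedRolePath", "ArnNotLike",
                 "aws:PrincipalArn", approved_role_pattern),
            deny("DenyUnexpectedNetworkPath", "StringNotEquals",
                 "aws:SourceVpce", vpce_id),
        ],
    }


if __name__ == "__main__":
    policy = build_bucket_policy(
        bucket="example-protected-data",          # hypothetical bucket
        org_id="o-exampleorgid",                  # hypothetical organization ID
        approved_role_pattern="arn:aws:iam::*:role/data-access/*",
        vpce_id="vpce-0example",                  # hypothetical endpoint ID
    )
    print(json.dumps(policy, indent=2))
```

Keeping each check in its own Deny statement means a failed request can be traced to a single condition, which makes negative testing easier.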

The network layer should not be confused with the whole boundary. VPC endpoints are useful because endpoint policies and condition keys such as aws:SourceVpce can constrain AWS service access to known private paths. They do not replace IAM. They make IAM assertions harder to exercise from unintended networks.
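A VPC endpoint policy can carry the same organization assertions from the network side. The sketch below is one possible shape, assuming an S3 endpoint and a hypothetical organization ID; the function name is invented for illustration. It allows traffic through the endpoint only when both the caller and the target resource belong to the organization.

```python
import json


def build_endpoint_policy(org_id):
    """Build a VPC endpoint policy limited to org principals and org resources.

    org_id is a placeholder; the endpoint policy does not grant access by
    itself, it only narrows what may pass through this endpoint.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOrgPrincipalsToOrgResources",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "aws:PrincipalOrgID": org_id,
                        "aws:ResourceOrgID": org_id,
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_endpoint_policy("o-exampleorgid"), indent=2))
```

The aws:ResourceOrgID condition is the part that addresses data leaving the boundary: a workload using this endpoint cannot write to a bucket owned by an account outside the organization, even with valid credentials for that bucket.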

KMS should be a second authorization plane. A workload that can read an encrypted object should still need permission to use the relevant key. Key policies should be explicit about organization membership, approved principals, and service usage. For highly sensitive datasets, key administration should live outside the workload account so that compromising the application account does not automatically grant the ability to rewrite the decryption boundary.
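One way to express that separation is a key policy with distinct administration and usage statements. This is a sketch under assumptions, not a recommended production policy: the role ARNs, account IDs, organization ID, and region are placeholders, and the policy deliberately omits a statement for the key account root, which means access depends entirely on the named roles continuing to exist.

```python
import json

# Management actions only; no cryptographic operations.
ADMIN_ACTIONS = [
    "kms:DescribeKey", "kms:GetKeyPolicy", "kms:PutKeyPolicy",
    "kms:EnableKey", "kms:DisableKey", "kms:EnableKeyRotation",
    "kms:TagResource", "kms:UntagResource",
    "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion",
]

# Cryptographic use only; no policy changes.
USE_ACTIONS = ["kms:Decrypt", "kms:GenerateDataKey", "kms:DescribeKey"]


def build_key_policy(admin_role_arn, workload_role_arn, org_id, region):
    """Build a key policy that separates key administration from key use."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "KeyAdministration",
                "Effect": "Allow",
                "Principal": {"AWS": admin_role_arn},
                "Action": ADMIN_ACTIONS,
                "Resource": "*",
            },
            {
                "Sid": "WorkloadUseThroughS3",
                "Effect": "Allow",
                "Principal": {"AWS": workload_role_arn},
                "Action": USE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "aws:PrincipalOrgID": org_id,
                        # Key use is allowed only when S3 calls KMS for this role.
                        "kms:ViaService": f"s3.{region}.amazonaws.com",
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    policy = build_key_policy(
        admin_role_arn="arn:aws:iam::111122223333:role/kms-key-admin",   # placeholder
        workload_role_arn="arn:aws:iam::444455556666:role/data-reader",  # placeholder
        org_id="o-exampleorgid",
        region="us-east-1",
    )
    print(json.dumps(policy, indent=2))
```

The point of the split is that the administrator role cannot decrypt and the workload role cannot rewrite the policy, so neither compromise alone moves the decryption boundary.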

Audit trails should be centralized into a log archive account. Organization CloudTrail, CloudTrail data events for sensitive stores, AWS Config, GuardDuty, Security Hub, IAM Access Analyzer, and KMS key usage events should feed a place that workload administrators cannot casually mutate. The operational goal is not perfect visibility. The goal is evidence that survives the first account-level failure.
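Data events are opt-in, so coverage for sensitive stores has to be declared explicitly. The sketch below builds a CloudTrail advanced event selector scoped to the objects in one bucket. The bucket name is hypothetical and the helper name is invented for this example; the dictionary follows the field selector shape that CloudTrail accepts for advanced event selectors.

```python
import json


def s3_data_event_selector(bucket):
    """Build an advanced event selector for object-level events in one bucket."""
    return {
        "Name": f"Object-level events for {bucket}",
        "FieldSelectors": [
            {"Field": "eventCategory", "Equals": ["Data"]},
            {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
            # The trailing slash scopes the selector to objects in this bucket only.
            {"Field": "resources.ARN", "StartsWith": [f"arn:aws:s3:::{bucket}/"]},
        ],
    }


if __name__ == "__main__":
    selectors = [s3_data_event_selector("example-protected-data")]  # hypothetical bucket
    print(json.dumps(selectors, indent=2))
```

A list like this would typically be passed as the AdvancedEventSelectors argument when configuring the organization trail, for example through boto3's put_event_selectors call. Scoping by bucket keeps data event volume and cost tied to the datasets that actually need object-level evidence.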

In Practice

Context: AWS publicly documents the Security Reference Architecture as a multi-account baseline using a management account, security tooling, log archive, network, and workload accounts. The reference architecture also describes delegated administration for services such as GuardDuty, Security Hub, IAM Access Analyzer, AWS Config, and CloudTrail. See the AWS Security Reference Architecture: https://aws.amazon.com/blogs/security/aws-security-reference-architecture-a-guide-to-designing-with-aws-security-services/

Action: The documented pattern separates control ownership. Workload accounts run applications. A log archive account receives organization-level logs. A security tooling account aggregates findings. Guardrails are applied through AWS Organizations and Control Tower patterns rather than copied manually into each account.

Result: The result is reduced blast radius. A compromised workload role can still be dangerous, but it should not automatically own the audit trail, the detection configuration, the KMS administration path, and the organization policy layer. The boundary becomes a set of mutually reinforcing checks.

Learning: The important lesson is that account separation only works when policy context crosses account lines. AWS IAM data perimeter guidance explicitly calls out identity, resource, and network perimeters, including condition keys such as aws:PrincipalOrgID for organization membership. See AWS IAM data perimeter guidance: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_data-perimeters.html

Context: AWS KMS authorization is not governed by IAM alone. KMS key policies are part of the authorization decision, and AWS documents condition keys such as aws:SourceVpce, aws:SourceVpc, aws:PrincipalOrgID, and aws:PrincipalOrgPaths for constraining access.

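Action: Use KMS key policies to make decryption depend on the same boundary assertions as the data policy: approved organization, approved account path, approved role, and expected network source where supported. Network conditions such as aws:SourceVpce only match when the principal calls KMS directly through a VPC endpoint. When a service such as S3 calls KMS on the principal's behalf, the kms:ViaService condition key is the usual way to scope key use.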

Result: A principal that obtains S3 or database access still needs to satisfy the encryption boundary. This is not a substitute for least privilege, but it prevents a single permissive resource policy from becoming the whole security model.

Learning: KMS is most useful as an independent choke point when administration, use, and audit are separated. If the same workload administrator can edit the IAM role, bucket policy, key policy, and log destination, the architecture has controls but not meaningful independence.

Where It Breaks

Failure mode	Why it happens	Hardening move
Cross-account role sprawl	Every team creates exceptions faster than the platform can review them	Use role naming, permission boundaries, IAM Access Analyzer, and organization conditions
VPC treated as the boundary	AWS API access is authorized by IAM and resource policy, not only packet path	Combine endpoint policies with identity and resource conditions
KMS keys owned by workload admins	The same compromised account can alter decryption rules	Separate key administration for sensitive data and log all key usage
CloudTrail exists but lacks data events	Management events show control-plane activity but miss object-level reads	Enable data events for sensitive S3 buckets and high-value resources
Log archive is writable by workloads	Attackers can remove or alter evidence after compromise	Centralize logs in a separate account with restrictive bucket and key policies
Service control policies are overused	Broad denies can block operations without proving data safety	Use SCPs for coarse guardrails and enforce fine-grained access in IAM, resource policies, and KMS
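As an example of a coarse guardrail that belongs in an SCP, the sketch below denies changes to CloudTrail configuration for everyone except a designated security role. The role pattern is a hypothetical placeholder and the function name is invented for illustration. Fine-grained data access stays in IAM, resource policies, and KMS, as the last row suggests.

```python
import json


def build_audit_guardrail_scp(security_role_pattern):
    """Build an SCP that protects trail configuration from workload principals.

    SCP statements have no Principal element; they apply to every principal
    in the accounts or organizational units the policy is attached to.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ProtectTrailConfiguration",
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "cloudtrail:UpdateTrail",
                    "cloudtrail:PutEventSelectors",
                ],
                "Resource": "*",
                "Condition": {
                    # Only roles matching this pattern may change trail settings.
                    "ArnNotLike": {"aws:PrincipalArn": security_role_pattern}
                },
            }
        ],
    }


if __name__ == "__main__":
    scp = build_audit_guardrail_scp("arn:aws:iam::*:role/security/*")  # placeholder
    print(json.dumps(scp, indent=2))
```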

What to Do Next

Problem: Inventory the actual data paths, not just the accounts. For each protected dataset, record the IAM principals, VPC endpoints, KMS keys, resource policies, and CloudTrail data event coverage.
Solution: Build the boundary as layered authorization. Require organization membership, approved role identity, expected network source, explicit data resource policy, and KMS permission for sensitive reads.
Proof: Test the negative cases. Attempt access from an account outside the organization, from an unapproved role inside the organization, from the wrong VPC endpoint, and with missing KMS permissions. A boundary that has not been tested with denied paths is only a diagram.
Action: Start with one production dataset. Move logs to a dedicated archive account, tighten the resource policy with organization-aware conditions, restrict KMS use to approved principals, require VPC endpoint access where practical, and make the resulting access decision visible in audit tooling. Then turn that pattern into account vending and infrastructure modules so every new workload inherits the boundary by default.
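The negative cases in the Proof step can be written down as a small test matrix before any real requests are sent. The sketch below is a local model only: it does not call AWS, and the Request fields, identifiers, and layer names are assumptions chosen for illustration. Its purpose is to make the expected denial for each case explicit, so that real test results can be compared against it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Request:
    org_id: str
    role_arn: str
    source_vpce: str
    has_kms_permission: bool


# The single approved access path (all values are placeholders).
APPROVED = Request(
    org_id="o-exampleorgid",
    role_arn="arn:aws:iam::444455556666:role/data-reader",
    source_vpce="vpce-0example",
    has_kms_permission=True,
)


def denying_layers(request, approved=APPROVED):
    """Return the boundary layers that should deny this request."""
    denials = []
    if request.org_id != approved.org_id:
        denials.append("organization")
    if request.role_arn != approved.role_arn:
        denials.append("identity")
    if request.source_vpce != approved.source_vpce:
        denials.append("network")
    if not request.has_kms_permission:
        denials.append("encryption")
    return denials


# Each case changes exactly one attribute of the approved path.
NEGATIVE_CASES = {
    "outside organization": replace(APPROVED, org_id="o-otherorgid"),
    "unapproved role": replace(
        APPROVED, role_arn="arn:aws:iam::444455556666:role/other"),
    "wrong VPC endpoint": replace(APPROVED, source_vpce="vpce-0other"),
    "missing KMS permission": replace(APPROVED, has_kms_permission=False),
}

if __name__ == "__main__":
    for name, request in NEGATIVE_CASES.items():
        print(f"{name}: expect denial by {denying_layers(request)}")
```

Because each case varies one attribute, a real request that succeeds where the matrix expects a denial points directly at the layer that is not enforcing.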


Rajiv
