Guided by A Guide to Cloud
DP-420 Study Guide

Domain 1: Design and Implement Data Models

  • Cosmos DB: The Big Picture Free
  • Designing Your Data Model Free
  • Partition Key Strategy Free
  • Synthetic and Hierarchical Partition Keys Free
  • Relationships: Embedding vs Referencing Free
  • SDK Connectivity and Client Configuration Free
  • SDK CRUD Operations and Transactions Free
  • SQL Queries in Cosmos DB Free
  • SDK Query Pagination and LINQ Free
  • Server-Side Programming Free
  • Transactions in Practice Free

Domain 2: Design and Implement Data Distribution

  • Global Replication and Failover
  • Consistency Levels: Five Choices, Real Trade-Offs
  • Multi-Region Writes and Conflict Resolution

Domain 3: Integrate and Move Data

  • Change Feed with Azure Functions and Processors
  • Analytical Workloads: Synapse Link and Fabric Mirroring
  • Data Movement: ADF, Kafka, and Spark Connectors

Domain 4: Optimize Query and Operation Performance

  • Indexing Policies: Range, Spatial, and Composite
  • Request Units and Query Cost Optimization
  • Integrated Cache and Dedicated Gateway
  • Change Feed Patterns: Materialized Views and Estimator

Domain 5: Maintain an Azure Cosmos DB Solution

  • Monitoring: Metrics, Logs, and Alerts
  • Backup and Restore: Periodic vs Continuous
  • Network Security: Firewalls, VNets, and Private Endpoints
  • Data Security: Encryption, Keys, and RBAC
  • Cost Optimization: Throughput Modes and RU Strategy
  • DevOps: Infrastructure as Code and Deployments
  • Exam Strategy and Cross-Domain Review

Domain 1: Design and Implement Data Models Free ⏱ ~18 min read

Partition Key Strategy

Master the most important Cosmos DB design decision: choosing the right partition key. Understand physical vs logical partitions, the three rules of a good key, and common mistakes that cause hot partitions.

Why the partition key is everything

☕ Simple explanation

Imagine a library with millions of books. The partition key is how you organise the shelves. If you sort by “first letter of the author’s last name,” shelf “S” gets Shakespeare, Spielberg, and 10,000 other authors, so it overflows. But if you sort by a full author ID, each shelf has a manageable number of books.

A bad partition key creates one overloaded shelf (a hot partition). A good key spreads books evenly so every shelf does its fair share of work.

The partition key determines how data is distributed across physical storage. It affects:

  • Query efficiency: Queries that filter on the partition key hit one partition (fast). Without it, they fan out across all partitions (expensive).
  • Throughput distribution: RU/s are divided across physical partitions. A hot partition creates a bottleneck.
  • Storage balance: Each logical partition has a 20 GB limit. Skewed data hits this limit prematurely.
  • Transactions: TransactionalBatch only works within a single logical partition.

The partition key is immutable: you cannot change a container's partition key path after the container is created. Choose wisely.

Physical vs logical partitions

| Concept | Logical partition | Physical partition |
| --- | --- | --- |
| Definition | All items sharing the same partition key value | A physical storage unit managed by Cosmos DB |
| Size limit | 20 GB per logical partition | ~50 GB per physical partition |
| Throughput limit | N/A (shares the physical partition’s RU budget) | 10,000 RU/s per physical partition |
| Who manages it | You (via partition key choice) | Cosmos DB (automatic splits) |
| Contains | Items with identical PK value | One or more logical partitions |

Physical partition 1 (50 GB max, 10K RU/s)
  ├── Logical partition: tenantId = "abc"  (15 GB)
  └── Logical partition: tenantId = "def"  (8 GB)

Physical partition 2 (50 GB max, 10K RU/s)
  ├── Logical partition: tenantId = "ghi"  (12 GB)
  └── Logical partition: tenantId = "jkl"  (3 GB)

Key insight: Cosmos DB automatically splits physical partitions as data grows. But a single logical partition can never be split: all items with the same PK value must stay together on one physical partition.
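The mapping from partition key value to physical partition can be pictured with a small sketch. This is only an illustration: Cosmos DB uses its own internal hashing over key ranges, and the SHA-256 plus modulo scheme, the function name `physical_partition_for`, and the sample items below are all assumptions made for the example.

```python
import hashlib

def physical_partition_for(pk_value: str, partition_count: int) -> int:
    """Map a partition key value to a physical partition index.

    Stand-in for Cosmos DB's internal hash: SHA-256 is used here only
    because it is deterministic and spreads values evenly.
    """
    digest = hashlib.sha256(pk_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# Hypothetical items; only the tenantId (the partition key) decides placement.
items = [
    {"id": "1", "tenantId": "abc"},
    {"id": "2", "tenantId": "abc"},
    {"id": "3", "tenantId": "def"},
    {"id": "4", "tenantId": "ghi"},
]

placement = {}
for item in items:
    index = physical_partition_for(item["tenantId"], partition_count=2)
    placement.setdefault(index, set()).add(item["tenantId"])

# Items 1 and 2 share tenantId "abc", so they always land on the same partition.
for index, tenants in sorted(placement.items()):
    print(f"physical partition {index}: {sorted(tenants)}")
```

Because placement depends only on the key value, every item in a logical partition follows that value wherever it goes, which is why one logical partition cannot be spread over two physical partitions.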

The three rules of a good partition key

Rule 1: High cardinality

The key should have many distinct values: hundreds at a minimum, and ideally thousands or more, so that Cosmos DB has plenty of logical partitions to spread across physical partitions.

❌ /country       → ~200 values for millions of documents
❌ /status        → 3-5 values (active, inactive, archived)
✅ /tenantId      → thousands of distinct tenants
✅ /userId        → one per user
✅ /orderId       → one per order (highest cardinality)
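One way to check cardinality before committing to a key is to count distinct values in a sample export of your data. The sketch below assumes a small hypothetical sample of order documents; the helper name `cardinality` and the field values are made up for illustration.

```python
def cardinality(items, key):
    """Count the distinct values of `key` across the sample items."""
    return len({item[key] for item in items})

# Hypothetical sample: 6 orders from 3 users in 2 countries.
sample = [
    {"orderId": "o1", "userId": "u1", "country": "NZ", "status": "active"},
    {"orderId": "o2", "userId": "u1", "country": "NZ", "status": "active"},
    {"orderId": "o3", "userId": "u2", "country": "NZ", "status": "archived"},
    {"orderId": "o4", "userId": "u2", "country": "AU", "status": "active"},
    {"orderId": "o5", "userId": "u3", "country": "AU", "status": "inactive"},
    {"orderId": "o6", "userId": "u3", "country": "AU", "status": "active"},
]

# Low counts relative to the number of items signal a weak candidate.
for key in ("status", "country", "userId", "orderId"):
    print(f"/{key}: {cardinality(sample, key)} distinct values")
```

On real data the same count run over a representative export gives a quick first filter, although cardinality alone says nothing about how evenly items are spread.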

Rule 2: Even distribution

Traffic and storage should spread evenly across partition key values. No single value should dominate.

❌ /companyId in a B2B app where one company has 80% of data
   → That one logical partition becomes a hot partition

✅ /userId in a consumer app with millions of balanced users
   → Each user's data is roughly the same size
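Skew can be estimated the same way, by measuring what fraction of items the most common key value holds. This is a rough sketch over hypothetical data; `largest_share` is an illustrative helper, and item count is used as a stand-in for both storage and traffic.

```python
from collections import Counter

def largest_share(items, key):
    """Return (value, fraction) for the most common value of `key`."""
    counts = Counter(item[key] for item in items)
    value, count = counts.most_common(1)[0]
    return value, count / len(items)

# Hypothetical B2B data: one company owns 8 of the 10 items.
items = [{"companyId": "big-corp"}] * 8 + [
    {"companyId": "small-co"},
    {"companyId": "tiny-co"},
]

value, share = largest_share(items, "companyId")
print(f"{value} holds {share:.0%} of items")  # big-corp holds 80% of items
```

A dominant share like this is the signal described above: the key may have plenty of distinct values and still concentrate most of the load on one logical partition.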

Rule 3: Query alignment

Your most frequent queries should include the partition key in the WHERE clause.

-- ✅ Single-partition query (routed to one logical partition; low RU cost)
SELECT * FROM c WHERE c.tenantId = 'abc' AND c.type = 'task'

-- ❌ Cross-partition query (fans out to every physical partition; RU cost grows with partition count)
SELECT * FROM c WHERE c.type = 'task' AND c.status = 'overdue'
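From application code, the difference shows up in how the query is issued. The sketch below assumes the azure-cosmos Python SDK, whose `ContainerProxy.query_items` method accepts `query`, `parameters`, `partition_key`, and `enable_cross_partition_query`. It only builds the keyword arguments so it runs without an account; the two helper function names are hypothetical.

```python
def single_partition_query_args(tenant_id: str) -> dict:
    """Keyword arguments for a query scoped to one logical partition."""
    return {
        "query": "SELECT * FROM c WHERE c.tenantId = @tenantId AND c.type = @type",
        "parameters": [
            {"name": "@tenantId", "value": tenant_id},
            {"name": "@type", "value": "task"},
        ],
        # Supplying the key value lets the query target a single partition.
        "partition_key": tenant_id,
    }

def cross_partition_query_args() -> dict:
    """Keyword arguments for a query that has no partition key filter."""
    return {
        "query": "SELECT * FROM c WHERE c.type = @type AND c.status = @status",
        "parameters": [
            {"name": "@type", "value": "task"},
            {"name": "@status", "value": "overdue"},
        ],
        # Without a partition key, the fan-out has to be opted into explicitly.
        "enable_cross_partition_query": True,
    }

# With a real ContainerProxy named `container` (not created in this sketch):
#   tasks = list(container.query_items(**single_partition_query_args("abc")))
print(single_partition_query_args("abc")["partition_key"])
```

Parameterised queries are used here instead of string literals so the same helper works for any tenant.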

Priya’s scenario: choosing the right key

🚀 Priya evaluates partition key candidates for her workitems container:

| Candidate | Cardinality | Distribution | Query alignment | Verdict |
| --- | --- | --- | --- | --- |
| /tenantId | Medium (1,000 tenants) | ⚠️ One enterprise tenant has 60% of data | ✅ Every query filters by tenant | Risky: hot partition |
| /projectId | High (50,000 projects) | ✅ Projects are roughly equal size | ⚠️ Task queries need projectId + tenantId | Possible |
| /id | Perfect (unique per item) | ✅ Perfectly even | ❌ Queries never filter by id alone | Too scattered |
| /tenantId + type | Higher (1,000 × 5 types) | ✅ Better spread | ✅ Most queries filter by both | Better, but synthetic key needed |

Priya decides on /tenantId for now but plans to evaluate hierarchical partition keys (next module) if the enterprise tenant’s data looks likely to approach the 20 GB logical partition limit.
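For the /tenantId + type candidate, the combined value has to exist as a property on each item before it can serve as the partition key. A minimal sketch of that idea follows; the property name `partitionKey`, the `-` separator, and the sample task are assumptions for illustration, and the next module covers the technique properly.

```python
def with_synthetic_key(item: dict) -> dict:
    """Return a copy of the item with a combined tenantId + type key."""
    enriched = dict(item)
    # Hypothetical property; the container would use /partitionKey as its path.
    enriched["partitionKey"] = f"{item['tenantId']}-{item['type']}"
    return enriched

task = {"id": "t-100", "tenantId": "abc", "type": "task", "title": "Ship v2"}
print(with_synthetic_key(task)["partitionKey"])  # abc-task
```

Queries then need to supply the same combined value to stay single-partition, which is the cost of the better spread.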

Read-heavy vs write-heavy strategies

| Strategy | Read-heavy workloads | Write-heavy workloads |
| --- | --- | --- |
| Goal | Minimise cross-partition reads | Distribute writes across many partitions |
| Ideal PK | Matches your WHERE clause filters | High cardinality (e.g., /deviceId, /eventId) |
| Example | /tenantId for 'get all tasks for tenant X' | /deviceId for IoT sensor writes |
| Trade-off | May concentrate writes on popular tenants | Reads that span devices need fan-out |
| Common pattern | Denormalise + type discriminator | Append-only with synthetic/random suffix |
| Risk | Hot partition on popular key values | Cross-partition queries for aggregations |

Common partition key mistakes

| Mistake | Why it’s bad | Fix |
| --- | --- | --- |
| Using /date or /timestamp | All today’s writes go to one partition (hot write) | Use /deviceId or append a random suffix |
| Using a low-cardinality field like /status | Only 3-5 logical partitions in total, so it can’t scale | Use the entity’s natural ID |
| Choosing a key that doesn’t appear in queries | Every query becomes cross-partition | Align the key with your WHERE clauses |
| Ignoring the 20 GB logical limit | One large tenant fills the partition | Use hierarchical keys or synthetic keys |
| Assuming you can change it later | The partition key is immutable after container creation | Design carefully upfront; migration = new container + data copy |

Ravi’s mistake: Ravi chose /createdDate (a date string like “2025-06-15”) as the partition key for the audit log container. On a busy day, all writes hammer the same partition. He got 429 (throttled) errors even though the container had plenty of total RU/s, because one physical partition was maxed at 10,000 RU/s.
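The effect of the date key, and of the suffix fix listed in the table, can be simulated in a few lines. This is a sketch under stated assumptions: the bucket count of 10, the `audit-N` ids, and the helper names are hypothetical, and a hash of the item id is used as the suffix so the key can be recomputed later instead of being truly random.

```python
import hashlib
from collections import Counter

BUCKETS = 10  # hypothetical number of suffixes per day

def suffixed_key(created_date: str, item_id: str, buckets: int = BUCKETS) -> str:
    """Build a key such as '2025-06-15-7' from the date and a hash of the id."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return f"{created_date}-{digest[0] % buckets}"

def keys_for_day(created_date: str, buckets: int = BUCKETS) -> list:
    """All key values a reader must query to see one full day."""
    return [f"{created_date}-{n}" for n in range(buckets)]

# Simulate 1,000 audit entries written on the same day.
ids = [f"audit-{n}" for n in range(1000)]
by_date = Counter("2025-06-15" for _ in ids)
by_suffixed = Counter(suffixed_key("2025-06-15", item_id) for item_id in ids)

print(f"/createdDate: {len(by_date)} logical partition(s), busiest takes {max(by_date.values())} writes")
print(f"date + suffix: {len(by_suffixed)} logical partition(s), busiest takes {max(by_suffixed.values())} writes")
```

The trade-off is on the read side: fetching one day's audit entries now means querying every value returned by `keys_for_day`, so the bucket count should stay as small as the write load allows.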

💡 Exam tip: partition key is immutable

You cannot change the partition key after creating a container. If you chose wrong, the only fix is to create a new container with the correct key and migrate data. The exam tests this, so know that there’s no ALTER CONTAINER SET PARTITION KEY equivalent. Plan carefully or use the emulator to prototype first.

💡 Exam tip: 429 on a single partition

You can get 429 (Too Many Requests) even when total RU/s isn’t exhausted, if a single physical partition exceeds its share of the provisioned throughput (at most 10,000 RU/s). This is the classic hot partition symptom. The fix is a better partition key with more even distribution, not more total RU/s.
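The arithmetic behind this tip is short enough to sketch. Provisioned RU/s are divided evenly across physical partitions, so a partition is throttled once demand on it exceeds that share. The partition count of 5 and the demand figures below are assumptions chosen to match Ravi's 50,000 RU/s container; the function names are illustrative.

```python
def ru_per_partition(provisioned_ru: int, physical_partitions: int) -> float:
    """Each physical partition receives an equal share of provisioned RU/s."""
    return provisioned_ru / physical_partitions

def is_throttled(partition_demand_ru: float, provisioned_ru: int, physical_partitions: int) -> bool:
    """True when demand on one partition exceeds its share (429 territory)."""
    return partition_demand_ru > ru_per_partition(provisioned_ru, physical_partitions)

# Assumed layout: 50,000 RU/s spread over 5 physical partitions.
print(ru_per_partition(50_000, 5))      # 10000.0

# Hypothetical hot partition: today's date attracts 15,000 RU/s of writes.
print(is_throttled(15_000, 50_000, 5))  # True

# The other partitions see little traffic and are never throttled.
print(is_throttled(2_000, 50_000, 5))   # False
```

In this example the container as a whole is using well under its 50,000 RU/s, yet the hot partition still returns 429s, which is the pattern the exam expects you to recognise.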

Flashcards

Question

What is the maximum size of a single logical partition?

Answer

20 GB. All items sharing the same partition key value must fit within this limit. A single logical partition can never be split across physical partitions.

Question

What are the three rules for choosing a good partition key?

Answer

1) High cardinality: many distinct values. 2) Even distribution: no single value dominates storage or traffic. 3) Query alignment: your frequent queries include the partition key in WHERE clauses.

Question

Can you change a container's partition key after creation?

Answer

No. The partition key is immutable. To change it, you must create a new container with the desired key and migrate all data. There is no ALTER operation for partition keys.

Question

What is the maximum RU/s throughput per physical partition?

Answer

10,000 RU/s. Even if your container has 100,000 RU/s total, a single physical partition can only serve 10,000 RU/s. A hot partition hitting this limit causes 429 errors while the rest of the container sits idle.

Question

What's the difference between a physical and logical partition?

Answer

A logical partition is all items with the same partition key value (max 20 GB). A physical partition is a storage unit managed by Cosmos DB (max ~50 GB, 10K RU/s) that contains one or more logical partitions. Cosmos DB splits physical partitions automatically.

Knowledge check

Ravi chose /createdDate as the partition key for an audit log. The container has 50,000 RU/s but he's getting 429 errors. What's the most likely cause?

Priya is choosing a partition key for a container storing projects and tasks. Her #1 query is 'get all tasks for tenant X'. Which key best serves this query?

A logical partition for 'tenant-big-corp' has grown to 19.5 GB. What happens if it reaches 20 GB?


Next up: Synthetic & Hierarchical Keys, advanced techniques to break past the 20 GB logical partition limit and distribute data more evenly.

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.