Guided by A Guide to Cloud

DP-420 Study Guide

Domain 1: Design and Implement Data Models

  • Cosmos DB — The Big Picture Free
  • Designing Your Data Model Free
  • Partition Key Strategy Free
  • Synthetic and Hierarchical Partition Keys Free
  • Relationships — Embedding vs Referencing Free
  • SDK Connectivity and Client Configuration Free
  • SDK CRUD Operations and Transactions Free
  • SQL Queries in Cosmos DB Free
  • SDK Query Pagination and LINQ Free
  • Server-Side Programming Free
  • Transactions in Practice Free

Domain 2: Design and Implement Data Distribution

  • Global Replication and Failover
  • Consistency Levels: Five Choices, Real Trade-Offs
  • Multi-Region Writes and Conflict Resolution

Domain 3: Integrate and Move Data

  • Change Feed with Azure Functions and Processors
  • Analytical Workloads: Synapse Link and Fabric Mirroring
  • Data Movement: ADF, Kafka, and Spark Connectors

Domain 4: Optimize Query and Operation Performance

  • Indexing Policies: Range, Spatial, and Composite
  • Request Units and Query Cost Optimization
  • Integrated Cache and Dedicated Gateway
  • Change Feed Patterns: Materialized Views and Estimator

Domain 5: Maintain an Azure Cosmos DB Solution

  • Monitoring: Metrics, Logs, and Alerts
  • Backup and Restore: Periodic vs Continuous
  • Network Security: Firewalls, VNets, and Private Endpoints
  • Data Security: Encryption, Keys, and RBAC
  • Cost Optimization: Throughput Modes and RU Strategy
  • DevOps: Infrastructure as Code and Deployments
  • Exam Strategy and Cross-Domain Review

Domain 1: Design and Implement Data Models Free ⏱ ~16 min read

Designing Your Data Model

Learn to design Cosmos DB data models by starting with access patterns, choosing single vs multi-container layouts, using type discriminators, and respecting entity boundaries and the 2 MB item size limit.

Think access patterns first

☕ Simple explanation

Imagine you’re organising a kitchen. You don’t start by categorising every ingredient into “grains,” “spices,” “proteins.” You start by asking: “What meals do I cook most?” Then you put the pasta, olive oil, and garlic near the stove — because that’s your hot path.

Cosmos DB works the same way. Don’t model your data around entities (users, orders, products). Model it around how you’ll query it. List your top 5 queries first, then design containers to serve them.

In relational databases, you normalise data into tables based on entity relationships (3NF). In Cosmos DB, you design around access patterns:

  1. List your queries: What will the app read and write most often?
  2. Identify hot paths: Which queries run thousands of times per second?
  3. Design containers: Group data so that hot-path queries hit a single partition with a single point read or narrow query.
  4. Accept denormalisation: Duplicate data if it means your top queries avoid cross-partition fan-out.

The goal: hot paths served by point reads of about 1 RU (the charge for reading a 1 KB item by id and partition key), not 50 RU cross-partition queries.

Priya’s scenario: modelling NovaSaaS

🚀 Priya at NovaSaaS has a multi-tenant project management platform. Her top queries are:

  1. Get all projects for a tenant (landing page — runs every login)
  2. Get a single project with its tasks (project detail page)
  3. Create/update a task (happens thousands of times per day)
  4. Get a user’s profile (auth flow)

Sophie (Jake’s DBA) would immediately create tenants, projects, tasks, and users tables. But Priya thinks differently…

The design process

Step 1: List queries          → "Get all projects for tenant X"
Step 2: Identify hot paths    → Query #1 runs 50K times/day
Step 3: Choose partition key  → /tenantId for the projects container
Step 4: Model for that query  → Projects + their tasks in the same container?
Step 5: Evaluate trade-offs   → Embedding tasks means bigger documents, but one read

Key mindset shift: In SQL Server, Sophie asks “What are my entities?” In Cosmos DB, Priya asks “What are my queries?”
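
To see why the hot path deserves this attention, the arithmetic can be sketched as follows. The figures are the guide's own illustrative numbers (50K calls per day, 1 RU versus 50 RU per call); the function name is hypothetical, and real RU charges depend on item size, indexing, and query shape.

```python
# Illustrative figures taken from the text; real charges will differ.
CALLS_PER_DAY = 50_000       # "Query #1 runs 50K times/day"
RU_TARGETED = 1              # hot-path target quoted in the guide
RU_CROSS_PARTITION = 50      # cross-partition figure quoted in the guide


def daily_ru(calls_per_day: int, ru_per_call: int) -> int:
    """Total request units one access pattern consumes per day."""
    return calls_per_day * ru_per_call


targeted = daily_ru(CALLS_PER_DAY, RU_TARGETED)
fan_out = daily_ru(CALLS_PER_DAY, RU_CROSS_PARTITION)

print(f"targeted: {targeted:,} RU/day")
print(f"fan-out:  {fan_out:,} RU/day")
print(f"ratio:    {fan_out // targeted}x")
```

The same call volume costs fifty times more when every request fans out, which is why the model is shaped around the query rather than the entity.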

Single-container vs multi-container design

Aspect | Single container (mixed entities) | Multiple containers (one per entity)
Structure | Projects, tasks, comments all in one container | Separate containers: projects, tasks, comments
Partition key | A shared key like /tenantId | Each container has its own optimal key
Querying | Use type discriminator to filter entity types | Query the right container directly
Throughput | Shared: one pool for all entity types | Dedicated per container: fine-grained control
Transactions | TransactionalBatch works across entity types (same partition) | Batch limited to one container
Best for | Entities that are often queried together | Entities with very different access patterns or scale
Cosmos DB cost | One container = one throughput allocation | Multiple containers = multiple throughput allocations

Priya chooses a single container for projects and tasks (they’re always queried together, same tenantId) and a separate container for user profiles (different access pattern — queried by userId, not tenantId).

The type discriminator pattern

When you store multiple entity types in one container, add a type property to distinguish them:

// A project document
{
  "id": "proj-001",
  "tenantId": "tenant-abc",
  "type": "project",
  "name": "Website Redesign",
  "status": "active",
  "createdAt": "2025-01-15T10:00:00Z"
}

// A task document (same container, same partition)
{
  "id": "task-042",
  "tenantId": "tenant-abc",
  "type": "task",
  "projectId": "proj-001",
  "title": "Update hero section",
  "assignee": "ravi@novasaas.com",
  "status": "in-progress"
}

Query all tasks for one project within a tenant:

SELECT * FROM c
WHERE c.tenantId = 'tenant-abc'
  AND c.type = 'task'
  AND c.projectId = 'proj-001'

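This is a single-partition query, so it is fast and cheap: for a small result set it typically costs only a few RUs (around 3-5 RU). The exact charge grows with the number and size of the items returned.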
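
A minimal sketch of the same query in parameterised form, assuming the usual Cosmos DB query-spec shape of a query string plus name/value parameters. The helper names `build_task_query` and `filter_tasks` are hypothetical, and the in-memory filter only mirrors the WHERE clause for local testing; it is not how the service evaluates queries.

```python
from typing import Any


def build_task_query(tenant_id: str, project_id: str) -> dict[str, Any]:
    """Return a parameterised query spec for one project's tasks."""
    return {
        "query": (
            "SELECT * FROM c "
            "WHERE c.tenantId = @tenantId "
            "AND c.type = 'task' "
            "AND c.projectId = @projectId"
        ),
        "parameters": [
            {"name": "@tenantId", "value": tenant_id},
            {"name": "@projectId", "value": project_id},
        ],
    }


def filter_tasks(items: list[dict], tenant_id: str, project_id: str) -> list[dict]:
    """In-memory equivalent of the WHERE clause, for local testing only."""
    return [
        d for d in items
        if d.get("tenantId") == tenant_id
        and d.get("type") == "task"
        and d.get("projectId") == project_id
    ]


# The two sample documents from this section, trimmed to the filtered fields.
items = [
    {"id": "proj-001", "tenantId": "tenant-abc", "type": "project"},
    {"id": "task-042", "tenantId": "tenant-abc", "type": "task",
     "projectId": "proj-001"},
]

spec = build_task_query("tenant-abc", "proj-001")
tasks = filter_tasks(items, "tenant-abc", "proj-001")
print([t["id"] for t in tasks])
```

Passing values as parameters instead of concatenating them into the query text keeps the query reusable across tenants and avoids injection problems.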

💡 Exam tip: type discriminator indexing

When using the type discriminator pattern, the type property is automatically indexed (Cosmos DB indexes all properties by default). Your queries filtering on type benefit from this index. However, if you’ve customised the indexing policy and excluded paths, make sure type is still included — otherwise your discriminator queries become expensive scans.
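
As a sketch of what a customised policy might look like, the dictionary below excludes everything and then opts in the properties the hot-path query filters on. The policy content is a hypothetical example, and `is_scalar_indexed` is a deliberately simplified check written for this illustration: it only understands exact `/<prop>/?` entries and the `/*` wildcard, not nested paths or the service's full precedence rules.

```python
# Hypothetical customised policy: exclude all paths, then include the
# properties that the discriminator query filters on.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [
        {"path": "/tenantId/?"},
        {"path": "/type/?"},
        {"path": "/projectId/?"},
    ],
    "excludedPaths": [{"path": "/*"}],
}


def is_scalar_indexed(policy: dict, prop: str) -> bool:
    """Simplified check for a top-level scalar property.

    Looks only for an exact include or exclude of `/<prop>/?`, then
    falls back to whether the `/*` wildcard is included.
    """
    exact = f"/{prop}/?"
    included = {p["path"] for p in policy.get("includedPaths", [])}
    excluded = {p["path"] for p in policy.get("excludedPaths", [])}
    if exact in included:
        return True
    if exact in excluded:
        return False
    return "/*" in included


for prop in ("type", "status"):
    print(prop, "indexed:", is_scalar_indexed(indexing_policy, prop))
```

A check like this could run in a deployment pipeline so that a policy change never silently drops the discriminator from the index.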

Entity boundaries

Not everything should go in one container. Consider splitting when:

Signal | Why split
Very different partition keys | Users by /userId, orders by /customerId: forcing them into one container means one key is suboptimal
Wildly different throughput | One entity needs 50K RU/s, another needs 500 RU/s; shared throughput wastes money
Different TTL needs | Logs expire after 30 days, profiles live forever
Security isolation | Some data needs different RBAC or encryption policies
Item size divergence | One entity is 500 bytes, another is 1.5 MB: an awkward mix

The 2 MB item size limit

Every Cosmos DB item (document) has a hard limit of 2 MB including all properties, nested objects, and metadata. This matters when you embed related data:

❌ Bad: Embed 10,000 comments inside a blog post document
   → Document grows past 2 MB → write fails with 413 error

✅ Good: Embed the 5 most recent comments, reference the rest
   → Hot-path query gets recent comments in one read
   → "Load more" does a separate query
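
The "embed recent, reference the rest" split can be sketched like this. The property names `recentComments`, `commentCount`, and `postId`, the `createdAt` timestamps, and the function itself are assumptions made for the example; a real model would also carry the container's partition key on every document.

```python
RECENT_LIMIT = 5  # "Embed the 5 most recent comments"


def split_comments(post: dict, comments: list[dict],
                   limit: int = RECENT_LIMIT) -> tuple[dict, list[dict]]:
    """Return (post with newest comments embedded, older comments as documents)."""
    # ISO 8601 timestamps in one format sort correctly as plain strings.
    newest_first = sorted(comments, key=lambda c: c["createdAt"], reverse=True)
    recent, older = newest_first[:limit], newest_first[limit:]
    post_doc = {**post, "recentComments": recent, "commentCount": len(comments)}
    overflow_docs = [
        {**c, "type": "comment", "postId": post["id"]} for c in older
    ]
    return post_doc, overflow_docs


post = {"id": "post-1", "type": "post", "title": "Hello"}
comments = [
    {"id": f"c-{i}", "text": f"comment {i}",
     "createdAt": f"2025-01-{i:02d}T00:00:00Z"}
    for i in range(1, 9)
]

post_doc, overflow = split_comments(post, comments)
print([c["id"] for c in post_doc["recentComments"]])
print([c["id"] for c in overflow])
```

The post document stays bounded no matter how many comments accumulate, and the "Load more" path queries the overflow documents by `postId`.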

Ravi’s mistake: Ravi embedded an entire file attachment (base64-encoded) inside a task document. Base64 encoding inflates binary data by about a third, so a 1.8 MB PDF became roughly 2.4 MB of text and pushed the document well over 2 MB. The write failed with a 413 Request Entity Too Large error. Priya taught him to store attachments in Blob Storage and keep only a URL reference in Cosmos DB.
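
The size of the failed write follows from base64 arithmetic: every 3 bytes of binary input become 4 characters of text. A small sketch, assuming "MB" means 1024 × 1024 bytes and that the limit is exactly 2 of those (both are assumptions for the calculation):

```python
import base64

MB = 1024 * 1024
ITEM_LIMIT_BYTES = 2 * MB  # assumed byte value of the 2 MB item limit


def base64_size(raw_bytes: int) -> int:
    """Length of padded base64 text for `raw_bytes` bytes of binary input."""
    return 4 * ((raw_bytes + 2) // 3)


# Sanity-check the formula against the standard library encoder.
assert base64_size(1000) == len(base64.b64encode(b"\x00" * 1000))

pdf_bytes = int(1.8 * MB)
encoded = base64_size(pdf_bytes)

print(f"{pdf_bytes:,} raw bytes -> {encoded:,} base64 bytes")
print("fits in one item:", encoded < ITEM_LIMIT_BYTES)
```

The encoded attachment alone exceeds the limit before any other task properties are counted, which is why a URL reference to Blob Storage is the safer design.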

💡 Exam tip: what counts toward the 2 MB limit

The 2 MB limit includes the entire JSON payload — all properties, nested arrays, nested objects, and system-generated properties (_rid, _self, _etag, _ts, _attachments). The system properties add roughly 200-300 bytes. When you’re close to the limit, those bytes matter. The limit is on the UTF-8 encoded JSON, not the in-memory object size.
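
A rough pre-write size check could look like the sketch below. It measures the UTF-8 length of compact JSON and adds headroom for system properties, using the upper end of the 200-300 byte estimate above. The function names and the exact byte value of the limit are assumptions, and the service's own serialisation may differ slightly, so treat the result as an estimate rather than a guarantee.

```python
import json

ITEM_LIMIT_BYTES = 2 * 1024 * 1024  # assumed byte value of the 2 MB limit
SYSTEM_PROPS_BYTES = 300            # upper end of the rough 200-300 byte figure


def estimated_item_bytes(doc: dict) -> int:
    """UTF-8 size of the compact JSON form plus system-property headroom."""
    payload = json.dumps(doc, ensure_ascii=False, separators=(",", ":"))
    return len(payload.encode("utf-8")) + SYSTEM_PROPS_BYTES


def fits(doc: dict) -> bool:
    """True if the estimated size is within the assumed item limit."""
    return estimated_item_bytes(doc) <= ITEM_LIMIT_BYTES


small = {"id": "task-042", "title": "Update hero section"}
big = {"id": "blob", "data": "A" * ITEM_LIMIT_BYTES}

print("small fits:", fits(small))
print("big fits:  ", fits(big))
```

Measuring encoded bytes rather than character count matters for non-ASCII text, where a single character can take two or more bytes.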

Designing Priya’s final model

Container | Partition key | Entity types | Rationale
workitems | /tenantId | project, task, comment | Always queried by tenant; batch operations within a tenant
users | /userId | user | Different access pattern; queried by userId in auth flow
audit | /tenantId | auditEntry | High write volume; TTL 90 days; separate throughput
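
The final model can be written down as data, which makes the routing rule explicit. This is a sketch only: the dictionary layout and helper names are hypothetical, and the TTL is expressed in seconds because that is the unit Cosmos DB uses for time-to-live.

```python
# Container layout from the table above; None means items never expire.
CONTAINERS = {
    "workitems": {
        "partition_key": "/tenantId",
        "types": {"project", "task", "comment"},
        "default_ttl_seconds": None,
    },
    "users": {
        "partition_key": "/userId",
        "types": {"user"},
        "default_ttl_seconds": None,
    },
    "audit": {
        "partition_key": "/tenantId",
        "types": {"auditEntry"},
        "default_ttl_seconds": 90 * 24 * 60 * 60,  # 90 days
    },
}


def container_for(entity_type: str) -> str:
    """Name of the container that stores the given type discriminator."""
    for name, spec in CONTAINERS.items():
        if entity_type in spec["types"]:
            return name
    raise KeyError(f"no container configured for type {entity_type!r}")


def partition_key_value(doc: dict) -> str:
    """Read the partition key value from a document via its container's path."""
    path = CONTAINERS[container_for(doc["type"])]["partition_key"]
    return doc[path.lstrip("/")]


task = {"id": "task-042", "tenantId": "tenant-abc", "type": "task"}
print(container_for(task["type"]), partition_key_value(task))
```

Keeping the mapping in one place means a new entity type forces an explicit decision about which container and partition key it belongs to.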

Flashcards

Question

What should you do FIRST when designing a Cosmos DB data model?

Answer

List your access patterns (queries). Identify the top 5 queries by frequency and importance, then design containers and partition keys to serve those queries efficiently. Don't start from entity-relationship diagrams.

Question

What is the type discriminator pattern?

Answer

Adding a 'type' property (e.g., 'project', 'task', 'comment') to documents stored in the same container. This lets you filter by entity type in queries: WHERE c.type = 'task'. It's the standard pattern for single-container, multi-entity designs.

Question

What is the maximum size of a single Cosmos DB item?

Answer

2 MB — including all JSON properties, nested objects, arrays, and system-generated metadata (_rid, _etag, _ts, etc.). Writes that exceed this fail with a 413 error.

Question

When should you split entities into separate containers instead of using one container?

Answer

When entities need: (1) very different partition keys, (2) wildly different throughput, (3) different TTL settings, (4) different security/RBAC, or (5) vastly different item sizes. If entities share access patterns and partition keys, one container is fine.

Knowledge check

  1. Sophie (from a SQL Server background) suggests creating separate containers for 'projects', 'tasks', and 'comments', just like tables. What's the strongest argument against this for NovaSaaS?
  2. Ravi stores a document with a 1.9 MB base64-encoded PDF attachment. What happens when he tries to write it?
  3. Priya's NovaSaaS landing page query is 'Get all projects for tenant X'. Which container design optimises this query?


Next up: Partition Key Strategy — the single most important decision in Cosmos DB design. Get it right, and everything scales. Get it wrong, and you’ll hit hot partitions and 429 errors.

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.