Designing Your Data Model
Learn to design Cosmos DB data models by starting with access patterns, choosing single vs multi-container layouts, using type discriminators, and respecting entity boundaries and the 2 MB item size limit.
Think access patterns first
Imagine you’re organising a kitchen. You don’t start by categorising every ingredient into “grains,” “spices,” “proteins.” You start by asking: “What meals do I cook most?” Then you put the pasta, olive oil, and garlic near the stove — because that’s your hot path.
Cosmos DB works the same way. Don’t model your data around entities (users, orders, products). Model it around how you’ll query it. List your top 5 queries first, then design containers to serve them.
Priya’s scenario: modelling NovaSaaS
🚀 Priya at NovaSaaS has a multi-tenant project management platform. Her top queries are:
- Get all projects for a tenant (landing page — runs every login)
- Get a single project with its tasks (project detail page)
- Create/update a task (happens thousands of times per day)
- Get a user’s profile (auth flow)
Sophie (Jake’s DBA) would immediately create tenants, projects, tasks, and users tables. But Priya thinks differently…
The design process
Step 1: List queries → "Get all projects for tenant X"
Step 2: Identify hot paths → Query #1 runs 50K times/day
Step 3: Choose partition key → /tenantId for the projects container
Step 4: Model for that query → Projects + their tasks in the same container?
Step 5: Evaluate trade-offs → Embedding tasks means bigger documents, but one read
Key mindset shift: In SQL Server, Sophie asks “What are my entities?” In Cosmos DB, Priya asks “What are my queries?”
Single-container vs multi-container design
| Aspect | Single container (mixed entities) | Multiple containers (one per entity) |
|---|---|---|
| Structure | Projects, tasks, comments all in one container | Separate containers: projects, tasks, comments |
| Partition key | A shared key like /tenantId | Each container has its own optimal key |
| Querying | Use type discriminator to filter entity types | Query the right container directly |
| Throughput | Shared — one pool for all entity types | Dedicated per container — fine-grained control |
| Transactions | TransactionalBatch can span entity types within the same logical partition | A batch cannot span containers, so writes across entity types are not atomic |
| Best for | Entities that are often queried together | Entities with very different access patterns or scale |
| Cosmos DB cost | One container = one throughput allocation | Multiple containers = multiple throughput allocations |
Priya chooses a single container for projects and tasks (they’re always queried together, same tenantId) and a separate container for user profiles (different access pattern — queried by userId, not tenantId).
The type discriminator pattern
When you store multiple entity types in one container, add a type property to distinguish them:
// A project document
{
"id": "proj-001",
"tenantId": "tenant-abc",
"type": "project",
"name": "Website Redesign",
"status": "active",
"createdAt": "2025-01-15T10:00:00Z"
}
// A task document (same container, same partition)
{
"id": "task-042",
"tenantId": "tenant-abc",
"type": "task",
"projectId": "proj-001",
"title": "Update hero section",
"assignee": "ravi@novasaas.com",
"status": "in-progress"
}
Query all tasks for one project within a tenant:
SELECT * FROM c
WHERE c.tenantId = 'tenant-abc'
AND c.type = 'task'
AND c.projectId = 'proj-001'
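This is a single-partition query because it filters on the partition key (tenantId), so it is fast and cheap: typically 3-5 RU for a small result set, with the charge growing as more items are returned.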
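In application code you would normally pass the tenant and project as parameters rather than concatenating them into the query text. Here is a minimal Python sketch of that idea. The function names `build_task_query` and `tasks_for_project` are hypothetical, and the second function only mimics the filter against an in-memory list so you can see which documents match; it does not call Cosmos DB.

```python
from typing import Any


def build_task_query(tenant_id: str, project_id: str) -> dict[str, Any]:
    """Build a parameterized query spec (query text plus named parameters)."""
    return {
        "query": (
            "SELECT * FROM c "
            "WHERE c.tenantId = @tenantId "
            "AND c.type = 'task' "
            "AND c.projectId = @projectId"
        ),
        "parameters": [
            {"name": "@tenantId", "value": tenant_id},
            {"name": "@projectId", "value": project_id},
        ],
    }


def tasks_for_project(
    items: list[dict[str, Any]], tenant_id: str, project_id: str
) -> list[dict[str, Any]]:
    """In-memory stand-in for the query: same three filters, no database."""
    return [
        item
        for item in items
        if item.get("tenantId") == tenant_id
        and item.get("type") == "task"
        and item.get("projectId") == project_id
    ]


# Sample documents from the mixed-entity container described above.
container_items = [
    {"id": "proj-001", "tenantId": "tenant-abc", "type": "project"},
    {"id": "task-042", "tenantId": "tenant-abc", "type": "task", "projectId": "proj-001"},
    {"id": "task-099", "tenantId": "tenant-abc", "type": "task", "projectId": "proj-002"},
]

spec = build_task_query("tenant-abc", "proj-001")
matching = tasks_for_project(container_items, "tenant-abc", "proj-001")
```

Because every document shares the tenantId partition key, the type filter is what keeps the project document out of the task results.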
Exam tip: type discriminator indexing
When using the type discriminator pattern, the type property is automatically indexed (Cosmos DB indexes all properties by default). Your queries filtering on type benefit from this index. However, if you’ve customised the indexing policy and excluded paths, make sure type is still included — otherwise your discriminator queries become expensive scans.
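To make that concrete, here is a sketch of an opt-in indexing policy, written as a Python dict, that excludes everything by default and then includes only the properties the queries filter on. The choice of paths is an assumption based on Priya's queries, and `explicitly_included` is a hypothetical helper that only checks for exact scalar paths; it does not evaluate wildcard rules the way the service does.

```python
from typing import Any

# Assumed policy: exclude all paths, then opt in to the filter properties.
indexing_policy: dict[str, Any] = {
    "indexingMode": "consistent",
    "includedPaths": [
        {"path": "/tenantId/?"},
        {"path": "/type/?"},  # keep the discriminator indexed
        {"path": "/projectId/?"},
    ],
    "excludedPaths": [{"path": "/*"}],
}


def explicitly_included(policy: dict[str, Any], prop: str) -> bool:
    """Return True if the scalar path for `prop` is listed in includedPaths."""
    wanted = f"/{prop}/?"
    return any(entry["path"] == wanted for entry in policy["includedPaths"])


type_is_indexed = explicitly_included(indexing_policy, "type")
title_is_indexed = explicitly_included(indexing_policy, "title")
```

A quick check like this in a deployment script can catch a policy change that accidentally drops the discriminator path.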
Entity boundaries
Not everything should go in one container. Consider splitting when:
| Signal | Why split |
|---|---|
| Very different partition keys | Users by /userId, orders by /customerId — forcing them into one container means one key is suboptimal |
| Wildly different throughput | One entity needs 50K RU/s, another needs 500 RU/s — shared throughput wastes money |
| Different TTL needs | Logs expire after 30 days, profiles live forever |
| Security isolation | Some data needs different RBAC or encryption policies |
| Item size divergence | One entity is 500 bytes, another is 1.5 MB — awkward mix |
The 2 MB item size limit
Every Cosmos DB item (document) has a hard limit of 2 MB including all properties, nested objects, and metadata. This matters when you embed related data:
❌ Bad: Embed 10,000 comments inside a blog post document
→ Document grows past 2 MB → write fails with 413 error
✅ Good: Embed the 5 most recent comments, reference the rest
→ Hot-path query gets recent comments in one read
→ "Load more" does a separate query
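One way to implement the "embed recent, reference the rest" idea is to write every comment as its own item and keep a small copy of the newest ones on the post. The sketch below assumes that approach. The property names `recentComments`, `commentCount`, and `postId` are made up for illustration, and nothing here talks to Cosmos DB; the function just returns the two documents you would write.

```python
from typing import Any

MAX_EMBEDDED = 5  # the post keeps only the 5 most recent comments inline


def add_comment(
    post: dict[str, Any], comment: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (updated_post, comment_item) without mutating the inputs.

    The comment is always stored as a separate item; the post carries a
    bounded copy of the newest comments so it cannot grow without limit.
    """
    recent = [comment] + post.get("recentComments", [])
    updated_post = {
        **post,
        "recentComments": recent[:MAX_EMBEDDED],  # newest first, capped
        "commentCount": post.get("commentCount", 0) + 1,
    }
    comment_item = {**comment, "type": "comment", "postId": post["id"]}
    return updated_post, comment_item


post: dict[str, Any] = {"id": "post-001", "type": "post", "title": "Hello"}
comment_items = []
for n in range(7):
    post, item = add_comment(post, {"id": f"c-{n}", "text": f"comment {n}"})
    comment_items.append(item)
```

After seven comments the post still embeds only five, while all seven exist as separate items for the "Load more" query.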
Ravi’s mistake: Ravi embedded an entire file attachment (base64-encoded) inside a task document. Base64 encoding inflates binary data by about a third, so a 1.8 MB PDF became roughly 2.4 MB of text and pushed the document over 2 MB. The write failed with a 413 Request Entity Too Large error. Priya taught him to store attachments in Blob Storage and keep only a URL reference in Cosmos DB.
Exam tip: what counts toward the 2 MB limit
The 2 MB limit includes the entire JSON payload — all properties, nested arrays, nested objects, and system-generated properties (_rid, _self, _etag, _ts, _attachments). The system properties add roughly 200-300 bytes. When you’re close to the limit, those bytes matter. The limit is on the UTF-8 encoded JSON, not the in-memory object size.
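If you want a rough pre-flight check before writing a large item, you can measure the UTF-8 length of its JSON yourself. The sketch below is an estimate, not the service's exact accounting: it assumes 2 MB means 2 × 1024 × 1024 bytes, reserves 300 bytes for system properties (the upper end of the rough figure above), and uses hypothetical helper names.

```python
import base64
import json
from typing import Any

MAX_ITEM_BYTES = 2 * 1024 * 1024  # assumption: 2 MB interpreted as 2 MiB
SYSTEM_PROPERTY_HEADROOM = 300    # rough allowance for _rid, _self, _etag, ...


def item_size_bytes(item: dict[str, Any]) -> int:
    """Approximate the stored size: compact JSON, encoded as UTF-8."""
    text = json.dumps(item, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8"))


def fits_in_item(item: dict[str, Any]) -> bool:
    """True if the item plus system-property headroom stays under the limit."""
    return item_size_bytes(item) + SYSTEM_PROPERTY_HEADROOM <= MAX_ITEM_BYTES


# Ravi's case: a 1.8 MB binary file, base64-encoded into a task document.
pdf_bytes = bytes(int(1.8 * 1024 * 1024))  # placeholder content, zero bytes
task_with_attachment = {
    "id": "task-042",
    "tenantId": "tenant-abc",
    "type": "task",
    "attachment": base64.b64encode(pdf_bytes).decode("ascii"),
}

# Priya's fix: keep only a reference (the URL here is a made-up example).
task_with_reference = {
    "id": "task-042",
    "tenantId": "tenant-abc",
    "type": "task",
    "attachmentUrl": "https://example.invalid/attachments/task-042.pdf",
}

too_big = not fits_in_item(task_with_attachment)
reference_ok = fits_in_item(task_with_reference)
```

The base64 string alone is already larger than the limit, which is why the reference version is the one that fits.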
Designing Priya’s final model
| Container | Partition key | Entity types | Rationale |
|---|---|---|---|
| workitems | /tenantId | project, task, comment | Always queried by tenant; batch operations within a tenant |
| users | /userId | user | Different access pattern; queried by userId in auth flow |
| audit | /tenantId | auditEntry | High write volume; TTL 90 days; separate throughput |
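With three containers, application code needs a consistent way to decide where each document goes. The following sketch encodes the table above as a Python dict and routes an item by its type discriminator. The `CONTAINERS` structure and the `route` function are illustrative assumptions, not part of any SDK.

```python
from typing import Any

# Priya's final model, transcribed from the table above.
CONTAINERS: dict[str, dict[str, Any]] = {
    "workitems": {"partition_key": "/tenantId", "types": {"project", "task", "comment"}},
    "users": {"partition_key": "/userId", "types": {"user"}},
    "audit": {"partition_key": "/tenantId", "types": {"auditEntry"}, "ttl_days": 90},
}


def route(item: dict[str, Any]) -> tuple[str, str]:
    """Return (container name, partition key value) for an item."""
    for name, config in CONTAINERS.items():
        if item["type"] in config["types"]:
            key_property = config["partition_key"].lstrip("/")
            return name, item[key_property]
    raise ValueError(f"No container configured for type {item['type']!r}")


task_route = route({"id": "task-042", "tenantId": "tenant-abc", "type": "task"})
user_route = route({"id": "user-7", "userId": "user-7", "type": "user"})
audit_route = route({"id": "evt-1", "tenantId": "tenant-abc", "type": "auditEntry"})
```

Keeping this mapping in one place means a new entity type forces an explicit decision about its container and partition key.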
Designing Data Models — DP-420 Module 2
Knowledge check
Sophie (from a SQL Server background) suggests creating separate containers for 'projects', 'tasks', and 'comments' — just like tables. What's the strongest argument against this for NovaSaaS?
Ravi stores a document with a 1.9 MB base64-encoded PDF attachment. What happens when he tries to write it?
Priya's NovaSaaS landing page query is 'Get all projects for tenant X'. Which container design optimises this query?
Next up: Partition Key Strategy — the single most important decision in Cosmos DB design. Get it right, and everything scales. Get it wrong, and you’ll hit hot partitions and 429 errors.