Designing Your Data Model

Think access patterns first

Simple explanation

Imagine you’re organising a kitchen. You don’t start by categorising every ingredient into “grains,” “spices,” “proteins.” You start by asking: “What meals do I cook most?” Then you put the pasta, olive oil, and garlic near the stove — because that’s your hot path.

Cosmos DB works the same way. Don’t model your data around entities (users, orders, products). Model it around how you’ll query it. List your top 5 queries first, then design containers to serve them.

Priya’s scenario: modelling NovaSaaS

🚀 Priya at NovaSaaS has a multi-tenant project management platform. Her top queries are:

1. Get all projects for a tenant (landing page, runs on every login)
2. Get a single project with its tasks (project detail page)
3. Create/update a task (happens thousands of times per day)
4. Get a user’s profile (auth flow)

Sophie (Jake’s DBA) would immediately create tenants, projects, tasks, and users tables. But Priya thinks differently…

The design process

Step 1: List queries          → "Get all projects for tenant X"
Step 2: Identify hot paths    → Query #1 runs 50K times/day
Step 3: Choose partition key  → /tenantId for the projects container
Step 4: Model for that query  → Projects + their tasks in the same container?
Step 5: Evaluate trade-offs   → Embedding tasks means bigger documents, but one read

Key mindset shift: In SQL Server, Sophie asks “What are my entities?” In Cosmos DB, Priya asks “What are my queries?”
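
As a rough illustration of steps 1 and 2, the access patterns can be written down as data and ranked by frequency before any container exists. This is only a sketch: the `AccessPattern` class and its field names are hypothetical, and every frequency except the 50K/day figure is a placeholder invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    name: str
    calls_per_day: int  # estimated frequency
    filter_key: str     # property the query filters on


# Only the 50,000/day figure comes from the text above.
# The other numbers are illustrative placeholders.
patterns = [
    AccessPattern("Get all projects for a tenant", 50_000, "tenantId"),
    AccessPattern("Get a project with its tasks", 20_000, "tenantId"),
    AccessPattern("Create/update a task", 5_000, "tenantId"),
    AccessPattern("Get a user's profile", 10_000, "userId"),
]


def hot_paths(patterns, top_n=5):
    """Return the most frequent access patterns first."""
    return sorted(patterns, key=lambda p: p.calls_per_day, reverse=True)[:top_n]


def group_by_filter_key(patterns):
    """Group pattern names by the property they filter on."""
    groups = {}
    for p in patterns:
        groups.setdefault(p.filter_key, []).append(p.name)
    return groups


ranked = hot_paths(patterns)
groups = group_by_filter_key(patterns)
```

Patterns that share a filter property are candidates for the same container and partition key. A pattern that filters on something else (here, `userId`) is a hint that a separate container may serve it better.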

Single-container vs multi-container design

Aspect	Single container (mixed entities)	Multiple containers (one per entity)
Structure	Projects, tasks, comments all in one container	Separate containers: projects, tasks, comments
Partition key	A shared key like /tenantId	Each container has its own optimal key
Querying	Use type discriminator to filter entity types	Query the right container directly
Throughput	Shared — one pool for all entity types	Dedicated per container — fine-grained control
Transactions	TransactionalBatch works across entity types (same partition)	Batch limited to one container
Best for	Entities that are often queried together	Entities with very different access patterns or scale
Cosmos DB cost	One container = one throughput allocation	Multiple containers = multiple throughput allocations

Priya chooses a single container for projects and tasks (they’re always queried together, same tenantId) and a separate container for user profiles (different access pattern — queried by userId, not tenantId).

The type discriminator pattern

When you store multiple entity types in one container, add a type property to distinguish them:

// A project document
{
  "id": "proj-001",
  "tenantId": "tenant-abc",
  "type": "project",
  "name": "Website Redesign",
  "status": "active",
  "createdAt": "2025-01-15T10:00:00Z"
}

// A task document (same container, same partition)
{
  "id": "task-042",
  "tenantId": "tenant-abc",
  "type": "task",
  "projectId": "proj-001",
  "title": "Update hero section",
  "assignee": "ravi@novasaas.com",
  "status": "in-progress"
}

Query all tasks for one project within a tenant:

SELECT * FROM c
WHERE c.tenantId = 'tenant-abc'
  AND c.type = 'task'
  AND c.projectId = 'proj-001'

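This is a single-partition query, so it is fast and cheap. The exact RU charge depends on how many tasks match and how large they are; for a small result set it is typically around 3-5 RU.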
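
In application code the same query would normally be parameterized rather than built from string literals. The sketch below only builds the query arguments. The helper name `tasks_for_project_query` is hypothetical, and the commented-out call assumes the `azure-cosmos` Python package and an existing container client, neither of which appears in the original.

```python
def tasks_for_project_query(tenant_id: str, project_id: str) -> dict:
    """Build a parameterized query for one project's tasks in one tenant."""
    return {
        "query": (
            "SELECT * FROM c "
            "WHERE c.tenantId = @tenantId "
            "AND c.type = @type "
            "AND c.projectId = @projectId"
        ),
        "parameters": [
            {"name": "@tenantId", "value": tenant_id},
            {"name": "@type", "value": "task"},
            {"name": "@projectId", "value": project_id},
        ],
        # Supplying the partition key keeps the query in a single partition.
        "partition_key": tenant_id,
    }


spec = tasks_for_project_query("tenant-abc", "proj-001")

# Assumed usage with the azure-cosmos package (container client not shown):
# items = list(container.query_items(**spec))
```

Passing the partition key value alongside the query is what keeps it single-partition. Without it, the same SQL may fan out across partitions.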

Exam tip: type discriminator indexing

When using the type discriminator pattern, the type property is automatically indexed (Cosmos DB indexes all properties by default). Your queries filtering on type benefit from this index. However, if you’ve customised the indexing policy and excluded paths, make sure type is still included — otherwise your discriminator queries become expensive scans.
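
A customised policy that keeps the discriminator indexed might look like the sketch below, written as a Python dict in the shape of a Cosmos DB indexing policy. The choice of which paths to include is an assumption for this example, and the helper `is_explicitly_included` is hypothetical. It only checks for an exact top-level path and ignores wildcards, nested paths, and precedence rules.

```python
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    # Index only the properties the hot-path queries filter on.
    # The partition key path is listed too, on the assumption that it
    # is not indexed automatically once "/*" is excluded.
    "includedPaths": [
        {"path": "/tenantId/?"},
        {"path": "/type/?"},
        {"path": "/projectId/?"},
        {"path": "/status/?"},
    ],
    # Everything else is excluded to reduce write cost.
    "excludedPaths": [{"path": "/*"}],
}


def is_explicitly_included(policy: dict, prop: str) -> bool:
    """True if the top-level scalar property has its own included path."""
    return any(p["path"] == f"/{prop}/?" for p in policy["includedPaths"])


type_is_indexed = is_explicitly_included(indexing_policy, "type")
assignee_is_indexed = is_explicitly_included(indexing_policy, "assignee")
```

With this policy, a filter on `type` can use the index, while a filter on `assignee` would fall back to a scan.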

Entity boundaries

Not everything should go in one container. Consider splitting when:

Signal	Why split
Very different partition keys	Users by `/userId`, orders by `/customerId` — forcing them into one container means one key is suboptimal
Wildly different throughput	One entity needs 50K RU/s, another needs 500 RU/s — shared throughput wastes money
Different TTL needs	Logs expire after 30 days, profiles live forever
Security isolation	Some data needs different RBAC or encryption policies
Item size divergence	One entity is 500 bytes, another is 1.5 MB — awkward mix

The 2 MB item size limit

Every Cosmos DB item (document) has a hard limit of 2 MB including all properties, nested objects, and metadata. This matters when you embed related data:

❌ Bad: Embed 10,000 comments inside a blog post document
   → Document grows past 2 MB → write fails with 413 error

✅ Good: Embed the 5 most recent comments, reference the rest
   → Hot-path query gets recent comments in one read
   → "Load more" does a separate query
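
One way to keep the embedded list bounded is to write each comment as its own document and also keep a capped copy of the newest ones on the post. The sketch below assumes that approach. The property names `recentComments`, `commentCount`, and `postId` are hypothetical, and nothing here talks to Cosmos DB.

```python
MAX_EMBEDDED_COMMENTS = 5  # matches the "5 most recent" example above


def add_comment(post: dict, comment: dict) -> tuple:
    """Return (updated post, standalone comment document)."""
    # Newest comment goes first; anything beyond the cap is dropped
    # from the post but still exists as its own document.
    recent = [comment] + post.get("recentComments", [])
    updated_post = {
        **post,
        "recentComments": recent[:MAX_EMBEDDED_COMMENTS],
        "commentCount": post.get("commentCount", 0) + 1,
    }
    comment_doc = {**comment, "type": "comment", "postId": post["id"]}
    return updated_post, comment_doc


post = {"id": "post-1", "type": "post", "title": "Hello"}
stored_comments = []
for n in range(7):
    post, doc = add_comment(post, {"id": f"c-{n}", "text": f"comment {n}"})
    stored_comments.append(doc)
```

After seven comments the post still carries only the five newest, so its size stays bounded however long the thread grows. The trade-off is that every new comment costs two writes.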

Ravi’s mistake: Ravi embedded an entire file attachment (base64-encoded) inside a task document. Base64 encoding inflates binary data by about a third, so a 1.8 MB PDF became roughly 2.4 MB of text and pushed the document over 2 MB. The write failed with a 413 Request Entity Too Large error. Priya taught him to store attachments in Blob Storage and keep only a URL reference in Cosmos DB.

Exam tip: what counts toward the 2 MB limit

The 2 MB limit includes the entire JSON payload — all properties, nested arrays, nested objects, and system-generated properties (_rid, _self, _etag, _ts, _attachments). The system properties add roughly 200-300 bytes. When you’re close to the limit, those bytes matter. The limit is on the UTF-8 encoded JSON, not the in-memory object size.
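
A pre-write size check can catch oversized items before the service rejects them. The sketch below measures the UTF-8 length of the serialized JSON and leaves headroom for system properties. It assumes 2 MB means 2 × 1024 × 1024 bytes and uses 300 bytes of headroom based on the estimate above; the service's own accounting may differ slightly. The function names and the `attachmentUrl` property are hypothetical.

```python
import base64
import json

MAX_ITEM_BYTES = 2 * 1024 * 1024   # assumed byte value of the 2 MB limit
SYSTEM_PROPERTY_HEADROOM = 300     # rough allowance for _rid, _etag, etc.


def estimated_item_bytes(doc: dict) -> int:
    """UTF-8 size of the compact JSON form of the document."""
    payload = json.dumps(doc, ensure_ascii=False, separators=(",", ":"))
    return len(payload.encode("utf-8"))


def fits_in_item(doc: dict) -> bool:
    """True if the document is likely to fit under the item size limit."""
    return estimated_item_bytes(doc) + SYSTEM_PROPERTY_HEADROOM <= MAX_ITEM_BYTES


# Stand-in for a 1.8 MB PDF: base64 turns every 3 bytes into 4 characters.
pdf_bytes = bytes(int(1.8 * 1024 * 1024))
task_with_file = {
    "id": "task-042",
    "tenantId": "tenant-abc",
    "type": "task",
    "attachment": base64.b64encode(pdf_bytes).decode("ascii"),
}

# Same task with only a reference to the file (placeholder URL).
task_with_url = {
    "id": "task-042",
    "tenantId": "tenant-abc",
    "type": "task",
    "attachmentUrl": "https://example.invalid/files/spec.pdf",
}

too_big = not fits_in_item(task_with_file)
ok = fits_in_item(task_with_url)
```

The embedded version fails the check because of base64 inflation alone, while the URL-reference version is a few hundred bytes.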

Designing Priya’s final model

Container	Partition key	Entity types	Rationale
`workitems`	`/tenantId`	project, task, comment	Always queried by tenant; batch operations within a tenant
`users`	`/userId`	user	Different access pattern; queried by userId in auth flow
`audit`	`/tenantId`	auditEntry	High write volume; TTL 90 days; separate throughput
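
The table above can be expressed as a small routing map that a data-access layer consults before each write. This is a sketch under the assumption that every document carries the `type` discriminator. `CONTAINER_MODEL` and `route` are hypothetical names, and no SDK calls are made.

```python
CONTAINER_MODEL = {
    "workitems": {"partition_key": "tenantId", "types": {"project", "task", "comment"}},
    "users": {"partition_key": "userId", "types": {"user"}},
    "audit": {"partition_key": "tenantId", "types": {"auditEntry"}},
}


def route(doc: dict) -> tuple:
    """Return (container name, partition key value) for a document."""
    for name, spec in CONTAINER_MODEL.items():
        if doc.get("type") in spec["types"]:
            # Raises KeyError if the document lacks its partition key property.
            return name, doc[spec["partition_key"]]
    raise ValueError(f"No container configured for type {doc.get('type')!r}")


task_target = route({"id": "task-042", "tenantId": "tenant-abc", "type": "task"})
user_target = route({"id": "u-1", "userId": "u-1", "type": "user"})
```

Keeping the mapping in one place makes it harder to write an entity to the wrong container or with the wrong partition key value.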

Flashcards

Question

What should you do FIRST when designing a Cosmos DB data model?

Answer

List your access patterns (queries). Identify the top 5 queries by frequency and importance, then design containers and partition keys to serve those queries efficiently. Don't start from entity-relationship diagrams.

Question

What is the type discriminator pattern?

Answer

Adding a 'type' property (e.g., 'project', 'task', 'comment') to documents stored in the same container. This lets you filter by entity type in queries: WHERE c.type = 'task'. It's the standard pattern for single-container, multi-entity designs.

Question

What is the maximum size of a single Cosmos DB item?

Answer

2 MB — including all JSON properties, nested objects, arrays, and system-generated metadata (_rid, _etag, _ts, etc.). Writes that exceed this fail with a 413 error.

Question

When should you split entities into separate containers instead of using one container?

Answer

When entities need: (1) very different partition keys, (2) wildly different throughput, (3) different TTL settings, (4) different security/RBAC, or (5) vastly different item sizes. If entities share access patterns and partition keys, one container is fine.

Knowledge check

Sophie (from a SQL Server background) suggests creating separate containers for 'projects', 'tasks', and 'comments' — just like tables. What's the strongest argument against this for NovaSaaS?

Ravi stores a document with a 1.9 MB base64-encoded PDF attachment. What happens when he tries to write it?

Priya's NovaSaaS landing page query is 'Get all projects for tenant X'. Which container design optimises this query?

Next up: Partition Key Strategy — the single most important decision in Cosmos DB design. Get it right, and everything scales. Get it wrong, and you’ll hit hot partitions and 429 errors.