Relationships: Embedding vs Referencing
Master the two fundamental data modelling strategies in Cosmos DB: embedding related data inside a document and referencing it with separate documents. This module also covers TTL, unique keys, and keeping denormalised data in sync with the change feed.
Embedding vs referencing
Think about a recipe card. You can write the ingredients right on the card (embedding): everything you need is in one place. Or you can write "See pantry shelf 3" and look it up separately (referencing).
Embedding is faster to read (one round trip). Referencing is better when the ingredient list changes often or is shared across many recipes.
When to embed
Embed when:
| Signal | Why embedding works |
|---|---|
| 1:1 relationship | A user and their profile: always read together |
| 1:few relationship | An order with 3-5 line items: small and bounded |
| Data is read together | A blog post and its tags: always displayed together |
| Child rarely changes independently | Address embedded in a customer record |
| Bounded growth | You know the array won't exceed a reasonable size |
{
"id": "user-001",
"tenantId": "tenant-abc",
"type": "user",
"name": "Priya Sharma",
"email": "priya@novasaas.com",
"address": {
"street": "42 Cloud Lane",
"city": "Auckland",
"country": "New Zealand"
},
"roles": ["admin", "architect"]
}
One point read (about 1 RU for an item of 1 KB or less): no JOINs, no second query.
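To sanity-check the bounded-growth signal before embedding, you can estimate an item's size the way Cosmos DB measures it: the UTF-8 length of its JSON. The sketch below is plain Python, not SDK code; `can_embed` and its 50% headroom are hypothetical choices, and the review payload is made up for illustration.

```python
import json

# Assumption: the 2 MB item limit is read as 2 * 1024 * 1024 bytes.
MAX_ITEM_BYTES = 2 * 1024 * 1024


def item_size_bytes(doc: dict) -> int:
    """UTF-8 length of the document's compact JSON form.

    Cosmos DB adds system properties (_rid, _etag, _ts, ...) when it stores
    an item, so the real size is a little larger than this estimate.
    """
    text = json.dumps(doc, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8"))


def can_embed(parent: dict, field: str, value, headroom: float = 0.5) -> bool:
    """True if embedding `value` under `field` keeps the parent within
    `headroom` (a hypothetical safety fraction) of the item size limit."""
    candidate = dict(parent, **{field: value})
    return item_size_bytes(candidate) <= MAX_ITEM_BYTES * headroom


user = {"id": "user-001", "tenantId": "tenant-abc", "type": "user", "name": "Priya Sharma"}
address = {"street": "42 Cloud Lane", "city": "Auckland", "country": "New Zealand"}

product = {"id": "prod-001", "tenantId": "tenant-abc", "type": "product"}
review = {"author": "user-001", "rating": 5, "text": "x" * 250}  # hypothetical review

embed_address = can_embed(user, "address", address)
embed_reviews = can_embed(product, "reviews", [review] * 10_000)
```

The headroom leaves space for the array to keep growing after the check; an array that only fits at 99% of the limit is not really bounded.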
When to reference
Reference when:
| Signal | Why referencing works |
|---|---|
| 1:many (unbounded) | A product with 10,000 reviews: embedding would exceed the 2 MB item size limit |
| Data changes independently | A user's name changes; you don't want to update 500 embedded copies |
| Data is shared | A category referenced by 10,000 products: store once, reference everywhere |
| Data is large | A comment with 50 KB of rich text |
| Different access patterns | Comments are loaded on scroll, not with the parent |
// Parent document
{
"id": "proj-001",
"tenantId": "tenant-abc",
"type": "project",
"name": "Website Redesign",
"ownerId": "user-001"
}
// Referenced document (same or different container)
{
"id": "task-042",
"tenantId": "tenant-abc",
"type": "task",
"projectId": "proj-001",
"title": "Update hero section",
"assigneeId": "user-001"
}
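Reading a project and its tasks takes two operations (a point read for the project, then a query for tasks with that projectId), but each stays within a single logical partition as long as both documents are stored under the /tenantId partition key.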
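The two-operation read can be modelled with a toy in-memory container partitioned by `/tenantId`. This is a Python sketch of the access pattern, not the Cosmos DB SDK: `InMemoryContainer`, `point_read`, and `query_partition` are hypothetical names. The point it shows is that both steps carry the same partition key value, so neither touches another tenant's data.

```python
from typing import Callable, Dict, List, Tuple


class InMemoryContainer:
    """Toy container keyed by (tenantId, id); records which partitions each call touches."""

    def __init__(self) -> None:
        self.items: Dict[Tuple[str, str], dict] = {}
        self.partitions_touched: List[str] = []

    def upsert(self, doc: dict) -> None:
        self.items[(doc["tenantId"], doc["id"])] = doc

    def point_read(self, tenant_id: str, item_id: str) -> dict:
        # Like a point read: needs both the partition key value and the id.
        self.partitions_touched.append(tenant_id)
        return self.items[(tenant_id, item_id)]

    def query_partition(self, tenant_id: str, predicate: Callable[[dict], bool]) -> List[dict]:
        # Like a query scoped to one partition key value: other tenants are never scanned.
        self.partitions_touched.append(tenant_id)
        return [d for (pk, _), d in self.items.items() if pk == tenant_id and predicate(d)]


def load_project_with_tasks(c: InMemoryContainer, tenant_id: str, project_id: str):
    project = c.point_read(tenant_id, project_id)  # operation 1
    tasks = c.query_partition(                       # operation 2
        tenant_id, lambda d: d.get("type") == "task" and d.get("projectId") == project_id
    )
    return project, tasks


store = InMemoryContainer()
store.upsert({"id": "proj-001", "tenantId": "tenant-abc", "type": "project",
              "name": "Website Redesign", "ownerId": "user-001"})
store.upsert({"id": "task-042", "tenantId": "tenant-abc", "type": "task",
              "projectId": "proj-001", "title": "Update hero section", "assigneeId": "user-001"})
# Hypothetical task in another tenant that happens to reuse the same projectId.
store.upsert({"id": "task-007", "tenantId": "tenant-xyz", "type": "task",
              "projectId": "proj-001", "title": "Another tenant's task"})

project, tasks = load_project_with_tasks(store, "tenant-abc", "proj-001")
```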
Embedding vs referencing comparison
| Aspect | Embedding | Referencing |
|---|---|---|
| Read performance | Single read: fast (a 1 KB point read costs 1 RU) | Multiple reads: more RU, more latency |
| Write performance | Entire document rewritten on any change | Only the changed document is written |
| Data duplication | Possible: embedded copies of shared data may go stale | No duplication: single source of truth |
| Document size | Grows with embedded data: watch the 2 MB limit | Each document stays small |
| Consistency | Atomic within one document; duplicated copies elsewhere must be synced | Single source of truth; denormalised copies (if any) are eventually consistent |
| Best for | 1:1, 1:few, read-together, rarely changes | 1:many, changes often, shared across parents |
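The signal tables and the comparison above can be condensed into a small rule set. This Python sketch is one hypothetical way to encode them; the `Signals` fields and the cut-offs `BOUNDED_LIMIT` and `LARGE_CHILD_KB` are illustrative assumptions, not Cosmos DB limits, and a real design also weighs RU cost and query patterns.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical cut-offs; tune them against your own item sizes and RU budget.
BOUNDED_LIMIT = 100
LARGE_CHILD_KB = 16.0


@dataclass(frozen=True)
class Signals:
    """Hypothetical summary of one parent/child relationship."""
    max_children: Optional[int]  # None means unbounded
    read_together: bool
    changes_independently: bool
    shared_across_parents: bool
    child_size_kb: float = 1.0


def choose_strategy(s: Signals) -> Tuple[str, List[str]]:
    """Return ("embed" | "reference", reasons that pushed towards referencing)."""
    reasons = []
    if s.max_children is None or s.max_children > BOUNDED_LIMIT:
        reasons.append("unbounded or large fan-out")
    if s.changes_independently:
        reasons.append("child changes independently")
    if s.shared_across_parents:
        reasons.append("child is shared across parents")
    if s.child_size_kb >= LARGE_CHILD_KB:
        reasons.append("child is large")
    if not s.read_together:
        reasons.append("different access patterns")
    return ("reference" if reasons else "embed"), reasons


address = Signals(max_children=1, read_together=True,
                  changes_independently=False, shared_across_parents=False)
reviews = Signals(max_children=None, read_together=False,
                  changes_independently=True, shared_across_parents=False)
category = Signals(max_children=1, read_together=True,
                   changes_independently=False, shared_across_parents=True)
```

Returning the reasons, not just a verdict, makes it easier to review borderline cases where only one signal points towards referencing.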
Denormalisation and the change feed
When you embed data (e.g., a user's name inside every task they're assigned to), updates create a sync problem. The change feed solves this:
1. Priya updates her display name in the "users" container
2. A change feed processor detects the update
3. The processor queries all tasks where assigneeId = "user-001"
4. It patches each task's embedded assigneeName with the new value
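// Change feed processor handler (sketch). Assumes User exposes Id, TenantId and
// DisplayName, and TaskItem exposes Id, TenantId and AssigneeName.
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
static class AssigneeNameSync
{
// The "tasks" container, assigned at startup
public static Container tasksContainer;
// Matches the Container.ChangeFeedHandler<User> delegate
public static async Task HandleUserChanges(
ChangeFeedProcessorContext context,
IReadOnlyCollection<User> changes,
CancellationToken ct)
{
foreach (User user in changes)
{
// Find all tasks assigned to this user. Tasks live in the user's tenant,
// so scope the query to that logical partition instead of fanning out.
var query = new QueryDefinition(
"SELECT * FROM c WHERE c.assigneeId = @userId AND c.type = 'task'")
.WithParameter("@userId", user.Id);
using FeedIterator<TaskItem> feed = tasksContainer.GetItemQueryIterator<TaskItem>(
query,
requestOptions: new QueryRequestOptions { PartitionKey = new PartitionKey(user.TenantId) });
while (feed.HasMoreResults)
{
foreach (TaskItem task in await feed.ReadNextAsync(ct))
{
// Skip tasks that already show the current name; the feed fires on
// every user update and may redeliver changes
if (task.AssigneeName == user.DisplayName)
{
continue;
}
// Patch only the embedded name
await tasksContainer.PatchItemAsync<TaskItem>(
task.Id,
new PartitionKey(task.TenantId),
new[] { PatchOperation.Set("/assigneeName", user.DisplayName) },
cancellationToken: ct);
}
}
}
}
}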
This gives you fast reads (the name is embedded) with eventual consistency: updates typically reach the tasks within seconds, depending on the processor's poll interval and backlog.
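Because `PatchOperation.Set` writes an absolute value, replaying the same user change leaves the tasks in the same state, which matters because the change feed processor delivers changes at least once. The Python sketch below models the sync step in memory (no SDK); `sync_assignee_name` and the sample documents are hypothetical, assigning the field stands in for the patch, and tasks that already hold the current name are skipped so a replay costs no writes.

```python
from typing import Dict, List


def sync_assignee_name(tasks: List[Dict], user: Dict) -> int:
    """Apply one changed user to the denormalised task copies; return the number of patches.

    Matches tasks in the user's tenant by assigneeId, then sets the embedded
    name. Tasks that already hold the current name are skipped, so replaying
    the same change performs no writes.
    """
    patched = 0
    for task in tasks:
        if task["tenantId"] != user["tenantId"] or task.get("assigneeId") != user["id"]:
            continue
        if task.get("assigneeName") == user["displayName"]:
            continue  # already current
        task["assigneeName"] = user["displayName"]  # stands in for PatchOperation.Set
        patched += 1
    return patched


tasks = [
    {"id": "task-042", "tenantId": "tenant-abc", "type": "task",
     "assigneeId": "user-001", "assigneeName": "Priya S."},
    {"id": "task-043", "tenantId": "tenant-abc", "type": "task",
     "assigneeId": "user-002", "assigneeName": "Ravi"},
    # Same user id in another tenant is a different user and must not change.
    {"id": "task-900", "tenantId": "tenant-xyz", "type": "task",
     "assigneeId": "user-001", "assigneeName": "Alex"},
]
change = {"id": "user-001", "tenantId": "tenant-abc", "displayName": "Priya Sharma"}

first_delivery = sync_assignee_name(tasks, change)
redelivery = sync_assignee_name(tasks, change)  # at-least-once: same change again
```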
Time to Live (TTL)
TTL automatically deletes items a specified number of seconds after they were last modified (measured from the item's _ts). It's configured at two levels:
| Level | Setting | Behaviour |
|---|---|---|
| Container default | DefaultTimeToLive | Enables TTL for the container. Set to -1 to enable without a default, or a positive number for a default expiry. Leave it unset (null) to keep TTL off. |
| Per-item override | ttl property on the JSON document | Overrides the container default. Set to -1 to never expire, or a positive number for a custom expiry. |
// Enable TTL on container with a 90-day default
ContainerProperties props = new("audit", "/tenantId")
{
DefaultTimeToLive = 90 * 24 * 60 * 60 // 90 days in seconds
};
// Per-item: this specific item never expires
{
"id": "audit-critical-001",
"tenantId": "tenant-abc",
"type": "auditEntry",
"ttl": -1,
"action": "data-export",
"details": "Full tenant export requested by admin"
}
Key rules:
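- Container TTL must be enabled (DefaultTimeToLive not null) before per-item ttl has any effect
- ttl: -1 on an item means "never expire", even if the container has a default
- On provisioned-throughput containers, expired items are removed by a background task that uses leftover RU/s, so there is no separate charge; if the container is busy, physical deletion can be delayed, but expired items are no longer returned by queries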
Exam tip: TTL hierarchy
TTL has three states: off (container DefaultTimeToLive is null: no items expire), on with default (a positive number: all items expire unless overridden), and on without default (-1: items expire only if they have their own ttl value). Per-item ttl: -1 always means "never expire." Per-item ttl only works when the container has TTL enabled.
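The three TTL states and the per-item override reduce to a small resolution function. This Python sketch models the rules above and is not an SDK call; `effective_ttl` and `expires_at` are hypothetical helpers, and rejecting 0 or other negative values is this sketch's own validation choice.

```python
from typing import Optional


def _check(name: str, value: Optional[int]) -> None:
    # This sketch accepts None (unset), -1, or a positive number of seconds.
    if value is not None and value != -1 and value <= 0:
        raise ValueError(f"{name} must be -1 or a positive number of seconds")


def effective_ttl(container_default: Optional[int], item_ttl: Optional[int]) -> Optional[int]:
    """Seconds until expiry, counted from the item's _ts, or None for "never".

    container_default: DefaultTimeToLive (None = off, -1 = on without default, n > 0 = default)
    item_ttl: the item's "ttl" property (None = not set, -1 = never, n > 0 = override)
    """
    _check("container_default", container_default)
    _check("item_ttl", item_ttl)
    if container_default is None:
        return None  # TTL off: any per-item ttl is ignored
    if item_ttl is not None:
        return None if item_ttl == -1 else item_ttl  # per-item override wins
    return None if container_default == -1 else container_default


def expires_at(ts: int, container_default: Optional[int], item_ttl: Optional[int]) -> Optional[int]:
    """Unix time at which the item expires, or None if it never does."""
    ttl = effective_ttl(container_default, item_ttl)
    return None if ttl is None else ts + ttl


NINETY_DAYS = 90 * 24 * 60 * 60  # 7,776,000 seconds, as in the audit container example
```

For example, `effective_ttl(86400, -1)` resolves to `None`: a one-day container default does not expire an item that sets `ttl: -1`.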
Unique keys
Unique keys enforce uniqueness within a logical partition, not across the entire container:
ContainerProperties props = new("users", "/tenantId")
{
UniqueKeyPolicy = new UniqueKeyPolicy
{
UniqueKeys =
{
new UniqueKey { Paths = { "/email" } },
new UniqueKey { Paths = { "/username" } }
}
}
};
Key rules:
- Unique keys are scoped to a logical partition: two different tenants can have the same email
- Unique keys must be defined at container creation time: you cannot add them later
- The uniqueness check includes null values, and a missing property counts as null: only one item per logical partition can have a null or missing value for that path
Ravi's mistake: Ravi assumed unique keys were globally unique across the container. He set /email as a unique key, then was confused when two different tenants could both register admin@company.com. Priya explained: unique keys are per-partition-key-value, not global.
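Ravi's confusion is easier to see with a toy model of a unique key policy. This Python sketch (a hypothetical `UniqueKeyStore`, not the SDK) keys every uniqueness check by the partition key value, treats a missing property as null, and uses the two single-path keys from the policy above, so email and username are each unique on their own; a key listing several paths would make only the combination unique.

```python
import json
from typing import Sequence


class UniqueKeyViolation(Exception):
    pass


class UniqueKeyStore:
    """Toy model of a unique key policy: uniqueness is checked per partition key value."""

    def __init__(self, partition_key_path: str, unique_keys: Sequence[Sequence[str]]) -> None:
        self.pk_path = partition_key_path
        self.unique_keys = [tuple(paths) for paths in unique_keys]
        self.seen = set()  # entries of (key paths, partition value, JSON of path values)

    @staticmethod
    def _get(doc: dict, path: str):
        # "/email" -> doc["email"]; a missing property is treated as null (None).
        value = doc
        for part in path.strip("/").split("/"):
            value = value.get(part) if isinstance(value, dict) else None
        return value

    def insert(self, doc: dict) -> None:
        pk = self._get(doc, self.pk_path)
        entries = []
        for paths in self.unique_keys:
            values = json.dumps([self._get(doc, p) for p in paths])
            entry = (paths, pk, values)
            if entry in self.seen:
                raise UniqueKeyViolation(f"{paths} = {values} already used in partition {pk!r}")
            entries.append(entry)
        self.seen.update(entries)  # record only after every key passes


def rejected(store: UniqueKeyStore, doc: dict) -> bool:
    try:
        store.insert(doc)
        return False
    except UniqueKeyViolation:
        return True


users = UniqueKeyStore("/tenantId", [["/email"], ["/username"]])
users.insert({"id": "u1", "tenantId": "tenant-abc", "email": "admin@company.com", "username": "priya"})
cross_tenant = rejected(users, {"id": "u2", "tenantId": "tenant-xyz",
                                "email": "admin@company.com", "username": "priya"})
same_tenant = rejected(users, {"id": "u3", "tenantId": "tenant-abc",
                               "email": "admin@company.com", "username": "ravi"})
first_missing = rejected(users, {"id": "u4", "tenantId": "tenant-abc", "username": "sam"})
second_missing = rejected(users, {"id": "u5", "tenantId": "tenant-abc", "username": "lee"})
```

`cross_tenant` comes out `False` (allowed), which is exactly the behaviour that surprised Ravi, while the second item in tenant-abc without an email is rejected because both count as null.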
Exam tip: unique key creation time
Like partition keys, unique key policies are set at container creation and cannot be changed later. Plan your uniqueness constraints upfront. If you need to add a unique key, you must create a new container and migrate data.
Embedding vs Referencing: DP-420 Module 5
Knowledge check
Priya's 'project' document embeds the owner's display name. The owner updates their name. What's the best way to keep the embedded name current?
A container has DefaultTimeToLive set to 86400 (1 day). An item has ttl: -1. When does the item expire?
Ravi sets a unique key on /email in a container partitioned by /tenantId. Can two users in different tenants have the same email?
Next up: SDK Connectivity: learn how to create and configure the CosmosClient, choose between direct and gateway modes, and authenticate with account keys or Entra ID.