Relationships — Embedding vs Referencing

Embedding vs referencing

Simple explanation

Think about a recipe card. You can write the ingredients right on the card (embedding) — everything you need is in one place. Or you can write “See pantry shelf 3” and look it up separately (referencing).

Embedding is faster to read (one trip). Referencing is better when the ingredient list changes often or is shared across many recipes.

When to embed

Embed when:

Signal	Why embedding works
1:1 relationship	A user and their profile — always read together
1:few relationship	An order with 3-5 line items — small, bounded
Data is read together	A blog post and its tags — always displayed together
Child rarely changes independently	Address embedded in a customer record
Bounded growth	You know the array won’t exceed a reasonable size

{
  "id": "user-001",
  "tenantId": "tenant-abc",
  "type": "user",
  "name": "Priya Sharma",
  "email": "priya@novasaas.com",
  "address": {
    "street": "42 Cloud Lane",
    "city": "Auckland",
    "country": "New Zealand"
  },
  "roles": ["admin", "architect"]
}

One point read returns everything: about 1 RU for an item of roughly 1 KB, with no JOINs and no second query.
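A minimal sketch of that point read, assuming the .NET SDK v3 (`Microsoft.Azure.Cosmos`) and a container partitioned on /tenantId. `UserDoc` and `PointReadSketch` are hypothetical names used only for illustration; the charge you see depends on item size.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;

// Hypothetical class mirroring part of the user document above.
public class UserDoc
{
    [JsonProperty("id")] public string Id { get; set; }
    [JsonProperty("tenantId")] public string TenantId { get; set; }
    [JsonProperty("name")] public string Name { get; set; }
}

public static class PointReadSketch
{
    // Reads one item by id + partition key: a point read, not a query.
    public static async Task<UserDoc> ReadUserAsync(
        Container users, string id, string tenantId)
    {
        ItemResponse<UserDoc> response =
            await users.ReadItemAsync<UserDoc>(id, new PartitionKey(tenantId));

        // RequestCharge reports the RUs this read actually consumed.
        Console.WriteLine($"Point read cost: {response.RequestCharge} RU");
        return response.Resource;
    }
}
```

Because the address and roles are embedded, this single call returns them along with the user; nothing else has to be fetched.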

When to reference

Reference when:

Signal	Why referencing works
1:many (unbounded)	A product with 10,000 reviews: embedding them could exceed the 2 MB item size limit
Data changes independently	A user’s name changes; you don’t want to update 500 embedded copies
Data is shared	A category referenced by 10,000 products — store once, reference everywhere
Data is large	A comment with 50 KB of rich text
Different access patterns	Comments are loaded on scroll, not with the parent

// Parent document
{
  "id": "proj-001",
  "tenantId": "tenant-abc",
  "type": "project",
  "name": "Website Redesign",
  "ownerId": "user-001"
}

// Referenced document (same or different container)
{
  "id": "task-042",
  "tenantId": "tenant-abc",
  "type": "task",
  "projectId": "proj-001",
  "title": "Update hero section",
  "assigneeId": "user-001"
}

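Reading a project and its tasks takes two requests (a point read for the project and a query for its tasks), but each one targets a single logical partition because the data is partitioned on /tenantId.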
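The two requests might look like the sketch below. It assumes the .NET SDK v3 and that projects and tasks live in the same container partitioned on /tenantId; `ProjectDoc`, `TaskDoc`, and `ProjectReader` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;

// Hypothetical classes mirroring part of the documents above.
public class ProjectDoc
{
    [JsonProperty("id")] public string Id { get; set; }
    [JsonProperty("name")] public string Name { get; set; }
}

public class TaskDoc
{
    [JsonProperty("id")] public string Id { get; set; }
    [JsonProperty("projectId")] public string ProjectId { get; set; }
    [JsonProperty("title")] public string Title { get; set; }
}

public static class ProjectReader
{
    public static async Task<(ProjectDoc Project, List<TaskDoc> Tasks)> ReadProjectWithTasksAsync(
        Container container, string tenantId, string projectId)
    {
        var partitionKey = new PartitionKey(tenantId);

        // Request 1: point read of the parent document.
        ItemResponse<ProjectDoc> projectResponse =
            await container.ReadItemAsync<ProjectDoc>(projectId, partitionKey);

        // Request 2: query the referenced tasks, scoped to the same partition.
        var query = new QueryDefinition(
            "SELECT * FROM c WHERE c.type = 'task' AND c.projectId = @projectId")
            .WithParameter("@projectId", projectId);

        var tasks = new List<TaskDoc>();
        using FeedIterator<TaskDoc> feed = container.GetItemQueryIterator<TaskDoc>(
            query,
            requestOptions: new QueryRequestOptions { PartitionKey = partitionKey });

        while (feed.HasMoreResults)
        {
            // Each page is an enumerable of TaskDoc.
            tasks.AddRange(await feed.ReadNextAsync());
        }

        return (projectResponse.Resource, tasks);
    }
}
```

Setting `PartitionKey` in the request options is what keeps the query inside one logical partition instead of fanning out across the container.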

Embedding vs referencing comparison

Aspect	Embedding	Referencing
Read performance	Single point read: fast, about 1 RU for a 1 KB item	Multiple reads: more RUs, more latency
Write performance	Entire document rewritten on any change	Only the changed document is written
Data duplication	Possible — embedded copies may go stale	No duplication — single source of truth
Document size	Grows with embedded data — watch the 2 MB limit	Each document stays small
Consistency	A single-document write is atomic, but copies embedded in other documents lag until they are synced	Single source of truth, but related documents are updated in separate writes
Best for	1:1, 1:few, read-together, rarely changes	1:many, changes often, shared across parents
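To make the document-size row concrete, here is a rough sketch that estimates how large a document becomes as embedded items pile up. It uses only `System.Text.Json`; `DocumentSizeCheck` and `BuildProductWithReviews` are hypothetical helpers, and treating the 2 MB limit as 2 × 1024 × 1024 bytes of UTF-8 JSON is an assumption made for illustration.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class DocumentSizeCheck
{
    // Assumed interpretation of the 2 MB item size limit, in bytes.
    public const int MaxItemBytes = 2 * 1024 * 1024;

    // UTF-8 byte length of the document's JSON representation.
    public static int JsonSizeInBytes(object document) =>
        JsonSerializer.SerializeToUtf8Bytes(document).Length;

    // Hypothetical product with `reviewCount` embedded reviews,
    // each carrying `reviewChars` characters of text.
    public static object BuildProductWithReviews(int reviewCount, int reviewChars)
    {
        var reviews = new List<object>();
        for (int i = 0; i < reviewCount; i++)
        {
            reviews.Add(new { id = $"rev-{i}", text = new string('x', reviewChars) });
        }

        return new { id = "prod-001", tenantId = "tenant-abc", type = "product", reviews };
    }
}
```

Comparing `JsonSizeInBytes(BuildProductWithReviews(5, 250))` with `JsonSizeInBytes(BuildProductWithReviews(10000, 250))` shows the difference between a bounded 1:few array and an unbounded 1:many one: the first is a tiny fraction of the limit, the second exceeds it.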

Denormalisation and the change feed

When you embed a copy of data that lives elsewhere (e.g., a user’s name inside every task they’re assigned to), each update to the source leaves those copies stale until they are rewritten. A change feed processor can propagate the update:

1. Priya updates her display name in the "users" container
2. A change feed processor detects the update
3. The processor queries all tasks where assigneeId = "user-001"
4. It patches each task's embedded assigneeName with the new value

// Change feed processor handler: runs for each batch of changed user documents
static async Task HandleUserChanges(
    ChangeFeedProcessorContext context,
    IReadOnlyCollection<User> changes,
    CancellationToken ct)
{
    foreach (User user in changes)
    {
        // Find all tasks assigned to this user
        var query = new QueryDefinition(
            "SELECT * FROM c WHERE c.assigneeId = @userId AND c.type = 'task'")
            .WithParameter("@userId", user.Id);

        // Assumption: tasks share the user's tenantId, so the query
        // can be scoped to a single logical partition
        using FeedIterator<TaskItem> feed = tasksContainer
            .GetItemQueryIterator<TaskItem>(
                query,
                requestOptions: new QueryRequestOptions
                {
                    PartitionKey = new PartitionKey(user.TenantId)
                });

        while (feed.HasMoreResults)
        {
            foreach (TaskItem task in await feed.ReadNextAsync(ct))
            {
                // Patch only the embedded name; the rest of the task is untouched
                await tasksContainer.PatchItemAsync<TaskItem>(
                    task.Id,
                    new PartitionKey(task.TenantId),
                    new[] { PatchOperation.Set("/assigneeName", user.DisplayName) },
                    cancellationToken: ct);
            }
        }
    }
}

This gives you fast reads from the embedded name at the cost of eventual consistency: the copies lag the source until the processor catches up, typically within seconds.
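The handler only runs once a processor is built and started. A minimal wiring sketch, assuming the .NET SDK v3 and a separate lease container (typically partitioned on /id) that the processor uses to track its progress; `NameSyncProcessor` and the processor name "assigneeNameSync" are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class NameSyncProcessor
{
    // Builds and starts a change feed processor over the monitored container.
    // TUser is the application's own user document type.
    public static async Task<ChangeFeedProcessor> StartAsync<TUser>(
        Container usersContainer,
        Container leaseContainer,
        Container.ChangeFeedHandler<TUser> onChanges)
    {
        ChangeFeedProcessor processor = usersContainer
            .GetChangeFeedProcessorBuilder<TUser>("assigneeNameSync", onChanges)
            .WithInstanceName(Environment.MachineName) // identifies this host
            .WithLeaseContainer(leaseContainer)        // stores checkpoints
            .Build();

        await processor.StartAsync();
        return processor;
    }
}
```

A call such as `await NameSyncProcessor.StartAsync<User>(usersContainer, leaseContainer, HandleUserChanges)` would hook up the handler shown earlier; call `StopAsync()` on the returned processor during shutdown.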

Time to Live (TTL)

TTL automatically deletes items after a specified number of seconds. It’s configured at two levels:

Level	Setting	Behaviour
Container default	`DefaultTimeToLive`	Enables TTL for the container. Set to `-1` to enable without a default, or a positive number for a default expiry.
Per-item override	`ttl` property on the JSON document	Overrides the container default. Set to `-1` to never expire, or a positive number for a custom expiry.

// Enable TTL on container with a 90-day default
ContainerProperties props = new("audit", "/tenantId")
{
    DefaultTimeToLive = 90 * 24 * 60 * 60  // 90 days in seconds
};

// Per-item: this specific item never expires
{
    "id": "audit-critical-001",
    "tenantId": "tenant-abc",
    "type": "auditEntry",
    "ttl": -1,
    "action": "data-export",
    "details": "Full tenant export requested by admin"
}

Key rules:

Container TTL must be enabled (not null) before per-item TTL works
ttl: -1 on an item means “never expire” even if the container has a default
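TTL deletes run as a background task that uses leftover RUs from the container’s throughput, with no separate charge beyond those RUs. If the container is busy, the physical delete can be delayed, but expired items are no longer returned by reads or queries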

Exam tip: TTL hierarchy

TTL has three states: off (container DefaultTimeToLive is null — no items expire), on with default (positive number — all items expire unless overridden), on without default (set to -1 — items only expire if they have their own ttl value). Per-item ttl: -1 always means “never expire.” Per-item ttl only works when the container has TTL enabled.
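The hierarchy can be written down as a small decision function. This is an in-memory sketch of the rules above, not an SDK call; `TtlRules.EffectiveTtlSeconds` is a hypothetical helper where `null` stands for "not set" on input and "never expires" on output.

```csharp
public static class TtlRules
{
    // Returns the expiry in seconds for an item, or null if it never expires.
    // containerDefault: the container's DefaultTimeToLive (null = TTL off).
    // itemTtl: the item's own ttl property (null = property absent).
    public static int? EffectiveTtlSeconds(int? containerDefault, int? itemTtl)
    {
        // TTL is off at the container: per-item ttl is ignored.
        if (containerDefault is null)
        {
            return null;
        }

        // The item's own ttl overrides the container default; -1 means never expire.
        if (itemTtl is int own)
        {
            return own == -1 ? (int?)null : own;
        }

        // No per-item ttl: use the container default; -1 means no default expiry.
        return containerDefault == -1 ? (int?)null : containerDefault;
    }
}
```

For example, `EffectiveTtlSeconds(86400, -1)` returns `null` (the item never expires), while `EffectiveTtlSeconds(null, 3600)` also returns `null` because TTL is off at the container level.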

Unique keys

Unique keys enforce uniqueness within a logical partition — not across the entire container:

ContainerProperties props = new("users", "/tenantId")
{
    UniqueKeyPolicy = new UniqueKeyPolicy
    {
        UniqueKeys =
        {
            new UniqueKey { Paths = { "/email" } },
            new UniqueKey { Paths = { "/username" } }
        }
    }
};

Key rules:

Unique keys are scoped to a logical partition — two different tenants can have the same email
Unique keys must be defined at container creation time — you cannot add them later
The uniqueness check includes null values, and a missing value is treated as null, so only one item per logical partition can have a null or absent value for that path

Ravi’s mistake: Ravi assumed unique keys were globally unique across the container. He set /email as a unique key, then was confused when two different tenants could both register admin@company.com. Priya explained: unique keys are per-partition-key-value, not global.
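A tiny in-memory model makes the scoping rule easy to see. This is not the SDK, only a sketch of the behaviour described above; `UniqueEmailModel` is a hypothetical class in which a rejected insert stands in for the conflict error Cosmos DB would return.

```csharp
using System.Collections.Generic;

public sealed class UniqueEmailModel
{
    // Uniqueness is tracked per (partition key value, unique key value) pair.
    private readonly HashSet<(string TenantId, string Email)> _seen =
        new HashSet<(string TenantId, string Email)>();

    // Returns true if the insert is accepted, false if it would violate
    // the unique key within that tenant's logical partition.
    public bool TryInsert(string tenantId, string email) =>
        _seen.Add((tenantId, email));
}
```

Inserting admin@company.com for tenant-abc and again for tenant-xyz succeeds both times; a second insert for tenant-abc is rejected. A `null` email behaves like any other value, so each tenant can hold only one item without an email.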

Exam tip: unique key creation time

Like partition keys, unique key policies are set at container creation and cannot be changed later. Plan your uniqueness constraints upfront. If you need to add a unique key, you must create a new container and migrate data.

Flashcards

Question

When should you embed related data in Cosmos DB?

Answer

Embed when: (1) 1:1 or 1:few relationship, (2) data is always read together, (3) child rarely changes independently, (4) growth is bounded. The benefit is a single read returning everything.

Question

How do you keep denormalised (embedded) data in sync?

Answer

Use the change feed. When the source document changes, a change feed processor detects the update and patches all documents containing the embedded copy. This provides eventual consistency — updates propagate in seconds.

Question

What does ttl: -1 mean on a Cosmos DB item?

Answer

It means 'never expire' — the item lives forever, even if the container has a default TTL. The container must have TTL enabled (DefaultTimeToLive ≠ null) for per-item TTL to work at all.

Question

Are unique keys in Cosmos DB globally unique across the entire container?

Answer

No — unique keys are scoped to a logical partition (same partition key value). Two items in different logical partitions can have the same value for a unique key path. Unique key policies must be defined at container creation.

Knowledge check

Priya's 'project' document embeds the owner's display name. The owner updates their name. What's the best way to keep the embedded name current?

A container has DefaultTimeToLive set to 86400 (1 day). An item has ttl: -1. When does the item expire?

Ravi sets a unique key on /email in a container partitioned by /tenantId. Can two users in different tenants have the same email?

Next up: SDK Connectivity — learn how to create and configure the CosmosClient, choose between direct and gateway modes, and authenticate with account keys or Entra ID.