Synthetic and Hierarchical Partition Keys

When a simple partition key isn’t enough

Simple explanation

Imagine your library’s “S” shelf is overflowing. You have two options: (1) Create a combined label like “S-Fiction” and “S-History” to spread books across sub-shelves (that’s a synthetic key). (2) Build a multi-level system — Floor → Section → Shelf (that’s a hierarchical key).

Both solve the same problem: one shelf had too much stuff. Synthetic keys are a string trick you do yourself. Hierarchical keys are a Cosmos DB feature that lets you define up to 3 levels of partitioning.

Synthetic partition keys

Technique 1: Concatenation

Combine two or more fields into a single string:

// In your application code, before writing to Cosmos DB
// (assumes the container's partition key path is /partitionKey)
string tenantId = "tenant-abc";
string type = "task";
string syntheticKey = $"{tenantId}_{type}";  // "tenant-abc_task"

var item = new
{
    id = Guid.NewGuid().ToString(),
    tenantId,
    type,
    partitionKey = syntheticKey,  // synthetic key
    title = "Update hero section"
};

await container.CreateItemAsync(item, new PartitionKey(syntheticKey));

When to use: Your queries almost always filter on both fields (e.g., tenantId AND type).

Technique 2: Random suffix

Append a random number to spread writes across partitions:

// Spread a hot tenant across 10 sub-partitions
int suffix = Random.Shared.Next(0, 10);  // 0..9; Random.Shared requires .NET 6+
string syntheticKey = $"tenant-bigcorp_{suffix}";

var item = new
{
    id = Guid.NewGuid().ToString(),
    tenantId = "tenant-bigcorp",
    partitionKey = syntheticKey,  // e.g., "tenant-bigcorp_7"
    sensorData = new { temperature = 72.5 }
};

await container.CreateItemAsync(item, new PartitionKey(syntheticKey));

Trade-off: Writes are spread roughly evenly across the suffix values, but reads for the whole tenant need to fan out across all of them:

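// To read ALL data for tenant-bigcorp, you need 10 queries (one per sub-partition)
var results = new List<dynamic>();
for (int i = 0; i < 10; i++)
{
    var options = new QueryRequestOptions { PartitionKey = new PartitionKey($"tenant-bigcorp_{i}") };
    using FeedIterator<dynamic> iterator = container.GetItemQueryIterator<dynamic>("SELECT * FROM c", requestOptions: options);
    while (iterator.HasMoreResults)
        results.AddRange(await iterator.ReadNextAsync());
}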
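
The write and read sides of the random-suffix technique have to agree on the key format and the number of suffixes. A minimal, SDK-free sketch of that pairing follows; the helper names KeyForWrite and KeysForRead and the SuffixCount constant are hypothetical and not part of the original example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int SuffixCount = 10; // assumed number of sub-partitions per tenant

// Write path: pick one of the sub-partitions at random.
static string KeyForWrite(string tenantId, int suffixCount) =>
    $"{tenantId}_{Random.Shared.Next(0, suffixCount)}";

// Read path: enumerate every sub-partition the writer could have chosen.
static IReadOnlyList<string> KeysForRead(string tenantId, int suffixCount) =>
    Enumerable.Range(0, suffixCount).Select(i => $"{tenantId}_{i}").ToList();

string writeKey = KeyForWrite("tenant-bigcorp", SuffixCount);
IReadOnlyList<string> readKeys = KeysForRead("tenant-bigcorp", SuffixCount);

Console.WriteLine($"Write goes to: {writeKey}");
Console.WriteLine($"Read fans out to {readKeys.Count} keys: {string.Join(", ", readKeys)}");
```

Keeping the suffix count in one shared constant matters: if the writer later uses more suffixes than the reader enumerates, some data silently drops out of read results.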

Technique 3: Hash

Use a hash for deterministic distribution without fan-out (if you know the input):

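// string.GetHashCode() is randomized per process on modern .NET, so hash with a stable algorithm.
// Requires: using System.Buffers.Binary; using System.Security.Cryptography; using System.Text;
string input = $"{tenantId}_{projectId}";
byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(input));
uint bucket = BinaryPrimitives.ReadUInt32BigEndian(digest) % 100;
string syntheticKey = $"{tenantId}_{bucket}";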

When to use: You want even distribution AND can recompute the key at read time from known inputs.
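
Recomputing the key at read time only works if the hash gives the same result in every process and on every machine. A minimal sketch of such a helper, assuming SHA-256 as the stable hash and a hypothetical function name HashedKey (neither comes from the original example; any stable hash would serve):

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;
using System.Text;

// Maps (tenantId, projectId) to one of `bucketCount` synthetic keys.
// SHA-256 is used here only because its output is identical everywhere,
// unlike string.GetHashCode(), which varies between processes on modern .NET.
static string HashedKey(string tenantId, string projectId, int bucketCount)
{
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes($"{tenantId}_{projectId}"));
    uint value = BinaryPrimitives.ReadUInt32BigEndian(digest); // first 4 bytes, fixed byte order
    return $"{tenantId}_{value % (uint)bucketCount}";
}

// The writer and the reader compute the key independently from the same inputs.
string atWrite = HashedKey("tenant-abc", "proj-001", 100);
string atRead = HashedKey("tenant-abc", "proj-001", 100);

Console.WriteLine($"Keys match: {atWrite == atRead}");
```

Because the bucket is derived from tenantId and projectId, a reader that knows both values goes straight to one partition key value instead of fanning out.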

Hierarchical partition keys

Hierarchical keys are a Cosmos DB feature — you define up to 3 levels of partition key paths when creating the container:

// Create a container with hierarchical partition keys
ContainerProperties properties = new ContainerProperties(
    id: "workitems",
    partitionKeyPaths: new List<string> { "/tenantId", "/projectId", "/id" }
);

Database database = await cosmosClient.CreateDatabaseIfNotExistsAsync("novasaas");
Container container = await database.CreateContainerIfNotExistsAsync(properties, throughput: 10000);

How it works

Level 1: /tenantId     → "tenant-abc"
Level 2: /projectId    → "proj-001"
Level 3: /id           → "task-042"

Logical partition = combination of all 3 levels

Queries at level 1 (WHERE c.tenantId = 'tenant-abc') target only the partitions that hold that tenant's data
Queries at level 1 + 2 (WHERE c.tenantId = 'tenant-abc' AND c.projectId = 'proj-001') are more targeted
Queries that specify all 3 levels are routed to a single logical partition (with /id as the last level, that is a single item)

Breaking the 20 GB limit

With a simple /tenantId key, all data for one tenant must fit in 20 GB. With hierarchical keys:

Each unique combination of all levels is a logical partition
Tenant “abc” with 100 projects has at least 100 logical partitions; with /id as the third level, it has one per item
Each logical partition stays small, so the 20 GB limit stops being a practical constraint for that tenant
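
To make the counting concrete, the sketch below groups a handful of items first by tenant alone and then by the full three-level key. The item values are made up for illustration, and the code only models how logical partitions are counted; it does not call Cosmos DB:

```csharp
using System;
using System.Linq;

// Hypothetical items: (tenantId, projectId, id).
var items = new[]
{
    (TenantId: "tenant-abc", ProjectId: "proj-001", Id: "task-001"),
    (TenantId: "tenant-abc", ProjectId: "proj-001", Id: "task-002"),
    (TenantId: "tenant-abc", ProjectId: "proj-002", Id: "task-003"),
    (TenantId: "tenant-xyz", ProjectId: "proj-009", Id: "task-004"),
};

// Simple key /tenantId: one logical partition per tenant.
int simple = items.Select(i => i.TenantId).Distinct().Count();

// Hierarchical key /tenantId, /projectId, /id: one logical partition per distinct full key.
int hierarchical = items.Select(i => (i.TenantId, i.ProjectId, i.Id)).Distinct().Count();

Console.WriteLine($"Simple key: {simple} logical partitions");             // 2
Console.WriteLine($"Hierarchical key: {hierarchical} logical partitions"); // 4
```

With the simple key, all three tenant-abc items share one logical partition and count toward its 20 GB. With the hierarchical key, each item sits in its own.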

Query efficiency: left-to-right

Hierarchical keys enable prefix queries — you can query from left to right:

-- ✅ Uses first 2 levels — efficient, scoped
SELECT * FROM c
WHERE c.tenantId = 'tenant-abc'
  AND c.projectId = 'proj-001'

-- ✅ Uses first level only — still targeted (all of tenant's data)
SELECT * FROM c WHERE c.tenantId = 'tenant-abc'

-- ❌ Skips level 1 — cross-partition fan-out
SELECT * FROM c WHERE c.projectId = 'proj-001'
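
The left-to-right rule can be expressed as "count how many leading key levels the query filters on, stopping at the first gap". The sketch below is a simplified model of that rule for the three queries above. The function names MatchedPrefixLength and Describe are hypothetical, and this is not how the Cosmos DB query engine is implemented:

```csharp
using System;
using System.Collections.Generic;

// Partition key paths of the container, in order.
string[] levels = { "tenantId", "projectId", "id" };

// Counts how many leading levels have an equality filter; stops at the first gap.
static int MatchedPrefixLength(string[] levels, ISet<string> filteredProperties)
{
    int matched = 0;
    while (matched < levels.Length && filteredProperties.Contains(levels[matched]))
        matched++;
    return matched;
}

// Translates the matched prefix length into the routing behaviour described in the text.
static string Describe(string[] levels, ISet<string> filteredProperties)
{
    int matched = MatchedPrefixLength(levels, filteredProperties);
    if (matched == 0) return "cross-partition fan-out";
    if (matched == levels.Length) return "single logical partition";
    return $"prefix query on first {matched} level(s)";
}

Console.WriteLine(Describe(levels, new HashSet<string> { "tenantId", "projectId" })); // prefix query on first 2 level(s)
Console.WriteLine(Describe(levels, new HashSet<string> { "tenantId" }));              // prefix query on first 1 level(s)
Console.WriteLine(Describe(levels, new HashSet<string> { "projectId" }));             // cross-partition fan-out
```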

Exam tip: query left-to-right

With hierarchical partition keys, filter on levels from left to right without skipping. You can filter on level 1 alone, levels 1+2, or levels 1+2+3. A query that skips level 1 and filters only on level 2 still runs, but as a cross-partition query. Think of it like a phone book: you can look up by country → city → name, but not by city alone without a country.

Synthetic vs hierarchical keys

Aspect	Synthetic keys	Hierarchical partition keys
Where defined	In your application code (a computed property)	In the container definition (Cosmos DB feature)
Max levels	Unlimited (it's just string concatenation)	Up to 3 levels
20 GB limit	Still applies to the synthetic key value	Effectively broken — each level combination is a separate logical partition
Query support	You manage the key format in queries	Native prefix queries (left-to-right)
Write logic	You must compute the key before every write	Cosmos DB combines the levels automatically
Fan-out on reads	Random suffix requires reading all suffix values	Prefix queries are natively efficient
Best for	Simple distribution improvements, write-heavy append workloads	Multi-level hierarchical data (tenant → project → item)
Retroactive	Can be added to existing containers (new property)	Must be set at container creation

Best practice: Make the last level unique (e.g., /id). This ensures every item has its own logical partition, giving maximum distribution while still allowing efficient prefix queries at higher levels.

Exam tip: hierarchical key creation

Hierarchical partition keys must be defined at container creation time — you cannot add or change levels later. The paths must be in the document (they’re not computed). If you need to change the hierarchy, you must create a new container and migrate data.

Flashcards

Question

What are the three synthetic partition key techniques?

Answer

1) Concatenation — combine fields (e.g., 'tenant-abc_task'). 2) Random suffix — append a random number for write distribution (e.g., 'tenant-abc_7'). 3) Hash — deterministic distribution using a hash function. All are computed in your application code before writing.

Question

How many levels can a hierarchical partition key have?

Answer

Up to 3 levels (e.g., /tenantId, /projectId, /id). Each unique combination of all levels forms a separate logical partition, effectively breaking the 20 GB limit for any single top-level value.

Question

What is the query rule for hierarchical partition keys?

Answer

Queries must specify levels from left to right without skipping. You can filter on level 1 alone, levels 1+2, or levels 1+2+3. Skipping a level (e.g., querying only level 2) results in a cross-partition fan-out.

Question

What is the best practice for the last level in a hierarchical partition key?

Answer

Make it unique — typically /id. This ensures every item is its own logical partition, giving maximum distribution. Higher levels enable efficient prefix queries for group-level access patterns.

Knowledge check

Priya's enterprise tenant 'BigCorp' has 25 GB of data. She used a simple /tenantId partition key. What problem will she hit?

Ravi uses a random suffix (0-9) synthetic key for write-heavy IoT data. What's the trade-off?

Priya defines hierarchical keys as /tenantId, /projectId, /id. A query filters only on projectId (skipping tenantId). What happens?

Next up: Relationships — Embedding vs Referencing — learn when to nest data inside a document and when to keep separate documents with references.