Synthetic and Hierarchical Partition Keys
Go beyond simple partition keys with synthetic keys (concatenation, random suffixes, hashing) and hierarchical partition keys (up to 3 levels) to break the 20 GB logical partition limit and improve distribution.
When a simple partition key isn't enough
Imagine your library's "S" shelf is overflowing. You have two options: (1) Create combined labels like "S-Fiction" and "S-History" to spread books across sub-shelves (that's a synthetic key). (2) Build a multi-level system, Floor → Section → Shelf (that's a hierarchical key).
Both solve the same problem: one shelf had too much stuff. Synthetic keys are a string trick you do yourself. Hierarchical keys are a Cosmos DB feature that lets you define up to 3 levels of partitioning.
Synthetic partition keys
Technique 1: Concatenation
Combine two or more fields into a single string:
// In your application code, before writing to Cosmos DB
var item = new
{
id = Guid.NewGuid().ToString(),
tenantId = "tenant-abc",
type = "task",
partitionKey = "tenant-abc_task", // synthetic key
title = "Update hero section"
};
await container.CreateItemAsync(item, new PartitionKey("tenant-abc_task"));
When to use: Your queries almost always filter on both fields (e.g., tenantId AND type).
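A concatenated key is only useful if writers and readers build it the same way. The sketch below centralizes the format in one helper; `SyntheticKey`, `Concat`, and the `_` separator are hypothetical names chosen for illustration, not part of the Cosmos DB SDK.

```csharp
using System;

// Hypothetical helper: keeps the synthetic key format in one place so that
// write paths and read paths cannot drift apart.
public static class SyntheticKey
{
    private const char Separator = '_';

    // Joins tenantId and type into a single partition key value.
    public static string Concat(string tenantId, string type)
    {
        if (string.IsNullOrEmpty(tenantId))
            throw new ArgumentException("tenantId is required", nameof(tenantId));
        if (string.IsNullOrEmpty(type))
            throw new ArgumentException("type is required", nameof(type));

        return $"{tenantId}{Separator}{type}";
    }
}
```

Calling `SyntheticKey.Concat("tenant-abc", "task")` returns `"tenant-abc_task"`, the same value used for both the `partitionKey` property and the `PartitionKey` argument in the write above.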
Technique 2: Random suffix
Append a random number to spread writes across partitions:
// Spread a hot tenant across 10 sub-partitions
int suffix = new Random().Next(0, 10);
string syntheticKey = $"tenant-bigcorp_{suffix}";
var item = new
{
id = Guid.NewGuid().ToString(),
tenantId = "tenant-bigcorp",
partitionKey = syntheticKey, // e.g., "tenant-bigcorp_7"
sensorData = new { temperature = 72.5 }
};
Trade-off: Writes are spread roughly evenly across the 10 sub-partitions, but reads need to fan out across all suffix values:
// To read ALL data for tenant-bigcorp, you need 10 queries
for (int i = 0; i < 10; i++)
{
string pk = $"tenant-bigcorp_{i}";
// query each sub-partition
}
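The fan-out loop above depends on the reader knowing exactly how many suffixes the writer uses. One way to keep the two in sync is a small helper that enumerates every sub-partition key; `SuffixFanOut` and `AllKeys` are hypothetical names, and the sketch assumes the `{tenantId}_{suffix}` format shown earlier.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for the random-suffix pattern.
public static class SuffixFanOut
{
    // Returns every sub-partition key for a tenant,
    // e.g. "tenant-bigcorp_0" through "tenant-bigcorp_9" for suffixCount = 10.
    public static IReadOnlyList<string> AllKeys(string tenantId, int suffixCount) =>
        Enumerable.Range(0, suffixCount)
                  .Select(i => $"{tenantId}_{i}")
                  .ToList();
}
```

Each returned value would then be used as the partition key of one query, so the read cost grows linearly with the suffix count. Keep the count as small as the write load allows.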
Technique 3: Hash
Use a hash for deterministic distribution without fan-out (if you know the input):
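// string.GetHashCode() is randomized per process on .NET Core / .NET 5+, so use a stable hash instead.
// Requires: using System.Buffers.Binary; using System.Security.Cryptography; using System.Text;
string input = $"{tenantId}_{projectId}";
byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(input));
uint bucket = BinaryPrimitives.ReadUInt32LittleEndian(digest) % 100; // 0..99, same on every machine
string syntheticKey = $"{tenantId}_{bucket}";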
When to use: You want even distribution AND can recompute the key at read time from known inputs.
Hierarchical partition keys
Hierarchical keys are a Cosmos DB feature: you define up to 3 levels of partition key paths when creating the container:
// Create a container with hierarchical partition keys
ContainerProperties properties = new ContainerProperties(
id: "workitems",
partitionKeyPaths: new List<string> { "/tenantId", "/projectId", "/id" }
);
Database database = await cosmosClient.CreateDatabaseIfNotExistsAsync("novasaas");
Container container = await database.CreateContainerIfNotExistsAsync(properties, throughput: 10000);
How it works
Level 1: /tenantId → "tenant-abc"
Level 2: /projectId → "proj-001"
Level 3: /id → "task-042"
Logical partition = combination of all 3 levels
- Queries at level 1 (WHERE c.tenantId = 'abc') target only the partitions that hold that tenant's data
- Queries at level 1 + 2 (WHERE c.tenantId = 'abc' AND c.projectId = 'proj-001') are more targeted
- Queries at all 3 levels target a single logical partition, which is also what a point read needs
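On the write and point-read side, the .NET SDK (Microsoft.Azure.Cosmos 3.33 or later) expects the full key to be supplied level by level through `PartitionKeyBuilder`. The following is a minimal sketch under that assumption; `WorkItem`, `HierarchicalKeyDemo`, `FullKey`, and `WriteThenReadAsync` are hypothetical names, and the container is assumed to use the `/tenantId`, `/projectId`, `/id` paths defined above.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical document type; property names match the partition key paths.
public class WorkItem
{
    public string id { get; set; } = "";
    public string tenantId { get; set; } = "";
    public string projectId { get; set; } = "";
    public string title { get; set; } = "";
}

public static class HierarchicalKeyDemo
{
    // Builds the full three-level key in the same order as the container's paths.
    public static PartitionKey FullKey(string tenantId, string projectId, string id) =>
        new PartitionKeyBuilder()
            .Add(tenantId)
            .Add(projectId)
            .Add(id)
            .Build();

    // Writes an item, then reads it back with a point read on the full key.
    public static async Task<WorkItem> WriteThenReadAsync(Container container, WorkItem item)
    {
        PartitionKey key = FullKey(item.tenantId, item.projectId, item.id);
        await container.CreateItemAsync(item, key);

        ItemResponse<WorkItem> response =
            await container.ReadItemAsync<WorkItem>(item.id, key);
        return response.Resource;
    }
}
```

The values passed to `Add` must appear in the same order as the paths in the container definition; a key built in a different order addresses a different logical partition.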
Breaking the 20 GB limit
With a simple /tenantId key, all data for one tenant must fit in 20 GB. With hierarchical keys:
- Each unique combination of all levels is a logical partition
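- Tenant "abc" with 100 projects has at least 100 logical partitions; with /id as the last level, every item is its own logical partition
- Each logical partition stays small, so the 20 GB limit (which now applies per full key combination) effectively stops constraining that tenant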
Query efficiency: left-to-right
Hierarchical keys enable prefix queries: you can query from left to right:
-- ✅ Uses first 2 levels: efficient, scoped
SELECT * FROM c
WHERE c.tenantId = 'tenant-abc'
AND c.projectId = 'proj-001'
-- ✅ Uses first level only: still targeted (all of tenant's data)
SELECT * FROM c WHERE c.tenantId = 'tenant-abc'
-- ❌ Skips level 1: cross-partition fan-out
SELECT * FROM c WHERE c.projectId = 'proj-001'
Exam tip: query left-to-right
With hierarchical partition keys, queries must specify levels from left to right without skipping. You can use level 1 alone, levels 1+2, or levels 1+2+3. But you cannot skip level 1 and query only level 2; that's a cross-partition query. Think of it like a phone book: you can look up by country → city → name, but not city alone without a country.
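The left-to-right rule can be expressed as a small check: count how many leading levels appear in a query's equality filters. This is an illustrative sketch only; `PrefixCheck` and `CoveredLevels` are hypothetical names and do not reflect how Cosmos DB routes queries internally.

```csharp
using System.Collections.Generic;

// Hypothetical helper that models the prefix rule for hierarchical keys.
public static class PrefixCheck
{
    // Counts how many leading partition key levels appear in the filtered fields.
    // Counting stops at the first missing level, so a skipped level 1 yields 0,
    // which corresponds to a cross-partition fan-out.
    public static int CoveredLevels(IReadOnlyList<string> levels, ISet<string> filteredFields)
    {
        int covered = 0;
        foreach (string level in levels)
        {
            if (!filteredFields.Contains(level))
                break;
            covered++;
        }
        return covered;
    }
}
```

With levels `tenantId`, `projectId`, `id`, a filter on `tenantId` and `projectId` covers 2 levels, a filter on `projectId` alone covers 0, and a filter on `tenantId` and `id` covers only 1 because `projectId` is skipped.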
Synthetic vs hierarchical keys
| Aspect | Synthetic keys | Hierarchical partition keys |
|---|---|---|
| Where defined | In your application code (a computed property) | In the container definition (Cosmos DB feature) |
| Max levels | Unlimited (it's just string concatenation) | Up to 3 levels |
| 20 GB limit | Still applies to the synthetic key value | Effectively broken: each level combination is a separate logical partition |
| Query support | You manage the key format in queries | Native prefix queries (left-to-right) |
| Write logic | You must compute the key before every write | Cosmos DB combines the levels automatically |
| Fan-out on reads | Random suffix requires reading all suffix values | Prefix queries are natively efficient |
| Best for | Simple distribution improvements, write-heavy append | Multi-level hierarchical data (tenant → project → item) |
| Retroactive | Can be added to existing containers (new property) | Must be set at container creation |
Best practice: Make the last level unique (e.g., /id). This ensures every item has its own logical partition, giving maximum distribution while still allowing efficient prefix queries at higher levels.
Exam tip: hierarchical key creation
Hierarchical partition keys must be defined at container creation time; you cannot add or change levels later. The paths must be in the document (they're not computed). If you need to change the hierarchy, you must create a new container and migrate data.
Knowledge check
Priya's enterprise tenant 'BigCorp' has 25 GB of data. She used a simple /tenantId partition key. What problem will she hit?
Ravi uses a random suffix (0-9) synthetic key for write-heavy IoT data. What's the trade-off?
Priya defines hierarchical keys as /tenantId, /projectId, /id. A query filters only on projectId (skipping tenantId). What happens?
Next up: Relationships (Embedding vs Referencing). Learn when to nest data inside a document and when to keep separate documents with references.