Indexing Policies: Range, Spatial, and Composite
Tune Cosmos DB indexing policies to optimise query performance and reduce write costs β covering default indexing, include/exclude paths, range indexes, spatial indexes, composite indexes, and indexing modes.
Why indexing matters
Think of indexing like the index in the back of a textbook. Without it, finding βpartition keyβ means reading every page. With a good index, you flip straight to page 47.
Cosmos DB indexes everything by default β which is great for reads but adds cost to every write. Tuning your indexing policy means choosing exactly which βentriesβ go in the index so writes are cheaper and reads stay fast.
Amaraβs indexing challenge
π‘ Amara at SensorFlow writes 500M events per day. Each event has 20+ fields, but her queries only filter on deviceId, timestamp, and sensorType. The default βindex everythingβ policy means every write indexes all 20+ fields β wasting RU/s on indexes her queries never use.
Default indexing policy
Every new container starts with this policy:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{ "path": "/*" }
],
"excludedPaths": [
{ "path": "/\"_etag\"/?" }
]
}
/*means all properties are indexed_etagis excluded by default (system property)indexingMode: consistentmeans indexes update synchronously with writes
Include/exclude paths
The two strategies for customising:
Strategy 1: Opt-out (exclude specific paths)
Start from the default /* and exclude paths you donβt query:
{
"includedPaths": [{ "path": "/*" }],
"excludedPaths": [
{ "path": "/rawPayload/*" },
{ "path": "/metadata/*" },
{ "path": "/\"_etag\"/?" }
]
}
Strategy 2: Opt-in (include specific paths) Exclude everything, then include only what you need:
{
"includedPaths": [
{ "path": "/deviceId/?" },
{ "path": "/timestamp/?" },
{ "path": "/sensorType/?" }
],
"excludedPaths": [
{ "path": "/*" }
]
}
Exam tip: path syntax
/?β indexes the scalar value at this path (string, number, boolean)/*β indexes all descendants recursively (arrays, nested objects)/property/?β index a specific propertyβs scalar value/array/[]/?β index scalar values inside an array
For write-heavy workloads with known query patterns, the opt-in strategy (exclude /*, include specific paths) saves the most write RU/s.
Index types
Range indexes (default)
Used for equality (=), range (>, <, >=, <=, BETWEEN), and ORDER BY on a single field. Created automatically for all included paths.
Spatial indexes
For geospatial queries (ST_DISTANCE, ST_WITHIN, ST_INTERSECTS):
{
"includedPaths": [{ "path": "/*" }],
"spatialIndexes": [
{
"path": "/location/*",
"types": ["Point", "Polygon"]
}
]
}
Composite indexes
Required for ORDER BY on multiple fields or for queries that filter AND sort on different properties:
{
"compositeIndexes": [
[
{ "path": "/deviceId", "order": "ascending" },
{ "path": "/timestamp", "order": "descending" }
],
[
{ "path": "/sensorType", "order": "ascending" },
{ "path": "/value", "order": "ascending" }
]
]
}
When you need a composite index:
ORDER BY c.deviceId ASC, c.timestamp DESCβ multi-field sortWHERE c.sensorType = 'temp' ORDER BY c.timestamp DESCβ filter + sort on different fields
Exam tip: composite index order matters
The order (ascending/descending) in a composite index must exactly match the ORDER BY clause in your query. If your query does ORDER BY c.a ASC, c.b DESC, you need a composite index with [a ASC, b DESC].
A composite index with [a ASC, b ASC] will NOT serve ORDER BY a ASC, b DESC. The exam loves testing this mismatch.
Indexing modes
| Mode | Behaviour | Write Cost | Read Cost | Use Case |
|---|---|---|---|---|
| Consistent | Indexes updated synchronously with writes | Higher (index maintenance) | Low (indexes always current) | Default β most workloads |
| None | No indexing β container is a pure key-value store | Lowest (no index overhead) | High (full scan for non-point queries) | Pure point read/write workloads only |
Note: The
lazyindexing mode has been deprecated. Onlyconsistentandnoneare current options.
Exam tip: indexing mode 'none'
Setting indexing mode to none means no queries work except point reads (by id + partition key). Any SQL query will result in a full scan with massive RU cost. Only use this when your access pattern is exclusively point reads and writes β like a session store or cache.
Impact on write RU cost
Default policy (index everything):
Write 1KB document with 20 properties β ~10 RU
Optimised policy (index 3 properties):
Write 1KB document with 20 properties β ~6 RU
No indexing:
Write 1KB document β ~5 RU (minimum)
For Amaraβs 500M daily writes, reducing from 10 to 6 RU per write saves 2 billion RU/day β a significant cost reduction.
π¬ Video walkthrough
π¬ Video coming soon
Indexing Policies β DP-420 Module 18
Indexing Policies β DP-420 Module 18
~16 minFlashcards
Knowledge Check
Amara's container writes 500M events/day with 20+ properties, but queries only filter on deviceId, timestamp, and sensorType. How should she optimise the indexing policy?
Amara's query is: SELECT * FROM c WHERE c.sensorType = 'temp' ORDER BY c.timestamp DESC. The query fails with an error. What's missing?
A composite index is defined as [a ASC, b ASC]. Which query will this index serve?
Next up: Request Units and Query Cost Optimization β understanding what drives RU consumption and five techniques to reduce query costs.