Indexing Policies: Range, Spatial, and Composite

Why indexing matters

Simple explanation

Think of indexing like the index in the back of a textbook. Without it, finding “partition key” means reading every page. With a good index, you flip straight to page 47.

Cosmos DB indexes everything by default — which is great for reads but adds cost to every write. Tuning your indexing policy means choosing exactly which “entries” go in the index so writes are cheaper and reads stay fast.

Amara’s indexing challenge

📡 Amara at SensorFlow writes 500M events per day. Each event has 20+ fields, but her queries only filter on deviceId, timestamp, and sensorType. The default “index everything” policy means every write indexes all 20+ fields — wasting RU/s on indexes her queries never use.

Default indexing policy

Every new container starts with this policy:

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    { "path": "/*" }
  ],
  "excludedPaths": [
    { "path": "/\"_etag\"/?" }
  ]
}

/* means all properties are indexed
_etag is excluded by default (system property)
indexingMode: consistent means indexes update synchronously with writes

Include/exclude paths

The two strategies for customising:

Strategy 1: Opt-out (exclude specific paths). Start from the default /* and exclude the paths you never query:

{
  "includedPaths": [{ "path": "/*" }],
  "excludedPaths": [
    { "path": "/rawPayload/*" },
    { "path": "/metadata/*" },
    { "path": "/\"_etag\"/?" }
  ]
}

Strategy 2: Opt-in (include specific paths). Exclude everything with /*, then include only the paths your queries use:

{
  "includedPaths": [
    { "path": "/deviceId/?" },
    { "path": "/timestamp/?" },
    { "path": "/sensorType/?" }
  ],
  "excludedPaths": [
    { "path": "/*" }
  ]
}
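The opt-in policy is easy to generate from a list of queried properties. The sketch below assumes the queried fields are top-level scalars; `build_opt_in_policy` is a hypothetical helper, not part of any SDK. The resulting dict has the same shape as the JSON above, and it could be passed as the `indexing_policy` argument of the azure-cosmos Python SDK's `create_container` call (check the SDK reference for your version).

```python
import json


def build_opt_in_policy(properties):
    """Hypothetical helper: index only the given top-level scalar properties."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # One scalar path ("/<name>/?") per property the queries filter or sort on.
        "includedPaths": [{"path": f"/{name}/?"} for name in properties],
        # "/*" excludes every path that is not explicitly included above.
        "excludedPaths": [{"path": "/*"}],
    }


policy = build_opt_in_policy(["deviceId", "timestamp", "sensorType"])
print(json.dumps(policy, indent=2))
```

Generating the policy from one list keeps the index in step with the query patterns: when a new filter field appears, it is added in one place.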

Exam tip: path syntax

/? — indexes the scalar value at this path (string, number, boolean)
/* — indexes all descendants recursively (arrays, nested objects)
/property/? — index a specific property’s scalar value
/array/[]/? — index scalar values inside an array

For write-heavy workloads with known query patterns, the opt-in strategy (exclude /*, include specific paths) saves the most write RU/s.
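When included and excluded paths overlap, Cosmos DB applies the more precise path. The sketch below is a simplified, hypothetical model of that rule for checking a policy by hand: precision is the number of named segments, and a scalar path (/?) beats a wildcard (/*) at the same depth. It ignores quoted property names and other edge cases, so treat it as a study aid, not the service's actual matcher.

```python
def parse_pattern(pattern):
    """Split "/a/b/?" or "/a/*" into (["a", "b"], "?") or (["a"], "*")."""
    parts = pattern.strip("/").split("/")
    return parts[:-1], parts[-1]


def matches(pattern, prop_path):
    segments, kind = parse_pattern(pattern)
    target = [s for s in prop_path.strip("/").split("/") if s]
    if kind == "?":
        return target == segments          # scalar value at exactly this path
    return target[:len(segments)] == segments  # this path and all descendants


def precision(pattern):
    segments, kind = parse_pattern(pattern)
    return (len(segments), 1 if kind == "?" else 0)


def is_indexed(policy, prop_path):
    """Simplified model: the most precise matching path decides."""
    candidates = [(precision(p["path"]), True)
                  for p in policy.get("includedPaths", []) if matches(p["path"], prop_path)]
    candidates += [(precision(p["path"]), False)
                   for p in policy.get("excludedPaths", []) if matches(p["path"], prop_path)]
    if not candidates:
        return False
    # Ties should not occur in a valid policy; in this sketch an include wins a tie.
    return max(candidates)[1]


opt_out = {"includedPaths": [{"path": "/*"}],
           "excludedPaths": [{"path": "/rawPayload/*"}, {"path": "/metadata/*"}]}
opt_in = {"includedPaths": [{"path": "/deviceId/?"}, {"path": "/tags/[]/?"}],
          "excludedPaths": [{"path": "/*"}]}

print(is_indexed(opt_out, "/rawPayload/temp"))  # False: /rawPayload/* is more precise than /*
print(is_indexed(opt_in, "/deviceId"))          # True: /deviceId/? is more precise than /*
```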

Index types

Range indexes (default)

Used for equality (=), range (>, <, >=, <=, BETWEEN), and ORDER BY on a single field. Created automatically for all included paths.

Spatial indexes

For geospatial queries (ST_DISTANCE, ST_WITHIN, ST_INTERSECTS):

{
  "includedPaths": [{ "path": "/*" }],
  "spatialIndexes": [
    {
      "path": "/location/*",
      "types": ["Point", "Polygon"]
    }
  ]
}
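The spatial index on /location/* only helps if documents store location as GeoJSON. A minimal sketch, assuming a hypothetical reading document with a location property: note that GeoJSON coordinates are ordered [longitude, latitude], and that ST_DISTANCE returns metres. The query uses the {"name", "value"} parameter shape of Cosmos DB parameterized queries; passing a GeoJSON object as a parameter value is assumed here, so verify it against your SDK before relying on it.

```python
def make_reading(item_id, device_id, lon, lat):
    """Hypothetical sensor reading with a GeoJSON Point location."""
    return {
        "id": item_id,
        "deviceId": device_id,
        # GeoJSON order is [longitude, latitude], not [lat, lon].
        "location": {"type": "Point", "coordinates": [lon, lat]},
    }


def near_query(lon, lat, radius_m):
    """Query text plus parameters for readings within radius_m metres."""
    query = ("SELECT * FROM c "
             "WHERE ST_DISTANCE(c.location, @point) < @radius")
    parameters = [
        {"name": "@point", "value": {"type": "Point", "coordinates": [lon, lat]}},
        {"name": "@radius", "value": radius_m},
    ]
    return query, parameters


doc = make_reading("r-1", "sensor-1", -0.1276, 51.5072)
query, params = near_query(-0.1276, 51.5072, 5000)
```

With the azure-cosmos Python SDK, the pair could be handed to `container.query_items(query=query, parameters=params, enable_cross_partition_query=True)`.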

Composite indexes

Required for ORDER BY on multiple fields, and recommended for queries that filter on one property and sort on another:

{
  "compositeIndexes": [
    [
      { "path": "/deviceId", "order": "ascending" },
      { "path": "/timestamp", "order": "descending" }
    ],
    [
      { "path": "/sensorType", "order": "ascending" },
      { "path": "/value", "order": "ascending" }
    ]
  ]
}

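When a composite index applies:

ORDER BY c.deviceId ASC, c.timestamp DESC: a multi-field sort. Without a matching composite index, the query fails.
WHERE c.sensorType = 'temp' ORDER BY c.timestamp DESC: a filter and a sort on different fields. The query runs without one, but a composite index on (sensorType, timestamp) can make it cheaper.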
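For the multi-field sort case, each ORDER BY item maps one-to-one onto an entry of a composite index. The sketch below, with a hypothetical `composite_index_for` helper, handles only top-level properties written as c.<name> and defaults a missing direction to ASC, as the query language does.

```python
import re

# "c.<property>" optionally followed by ASC or DESC.
_ORDER_ITEM = re.compile(r"^c\.(\w+)(?:\s+(ASC|DESC))?$", re.IGNORECASE)


def composite_index_for(order_by):
    """Turn 'c.deviceId ASC, c.timestamp DESC' into a composite index entry."""
    index = []
    for item in order_by.split(","):
        m = _ORDER_ITEM.match(item.strip())
        if m is None:
            raise ValueError(f"unsupported ORDER BY item: {item!r}")
        prop = m.group(1)
        direction = (m.group(2) or "ASC").upper()  # ORDER BY defaults to ASC
        index.append({"path": f"/{prop}",
                      "order": "ascending" if direction == "ASC" else "descending"})
    return index


entry = composite_index_for("c.deviceId ASC, c.timestamp DESC")
# entry == [{"path": "/deviceId", "order": "ascending"},
#           {"path": "/timestamp", "order": "descending"}]
```

The returned list is one element of the compositeIndexes array, matching the first entry in the policy above.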

Exam tip: composite index order matters

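The properties in a composite index must appear in the same sequence as in the ORDER BY clause, and the directions (ascending/descending) must either match exactly or be the exact reverse of every direction. If your query does ORDER BY c.a ASC, c.b DESC, you need a composite index of [a ASC, b DESC] (or its full reverse, [a DESC, b ASC]).

A composite index of [a ASC, b ASC] serves ORDER BY a ASC, b ASC and ORDER BY a DESC, b DESC, but NOT ORDER BY a ASC, b DESC. The exam loves testing this mismatch.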
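The matching rule fits in a few lines: same properties in the same sequence, and directions either identical or all flipped. A minimal sketch, with a hypothetical `composite_serves` helper that represents both the index and the ORDER BY clause as (property, direction) pairs:

```python
FLIP = {"ASC": "DESC", "DESC": "ASC"}


def composite_serves(index, order_by):
    """True if a composite index can serve the ORDER BY clause."""
    if [p for p, _ in index] != [p for p, _ in order_by]:
        return False  # properties must match, in the same sequence
    pairs = list(zip(index, order_by))
    same = all(d_idx == d_q for (_, d_idx), (_, d_q) in pairs)
    inverted = all(FLIP[d_idx] == d_q for (_, d_idx), (_, d_q) in pairs)
    return same or inverted


idx = [("a", "ASC"), ("b", "ASC")]
print(composite_serves(idx, [("a", "ASC"), ("b", "ASC")]))    # True
print(composite_serves(idx, [("a", "DESC"), ("b", "DESC")]))  # True (full reverse)
print(composite_serves(idx, [("a", "ASC"), ("b", "DESC")]))   # False (mixed)
```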

Indexing modes

Mode	Behaviour	Write Cost	Read Cost	Use Case
Consistent	Indexes updated synchronously with writes	Higher (index maintenance)	Low (indexes always current)	Default — most workloads
None	No indexing — container is a pure key-value store	Lowest (no index overhead)	High (full scan for non-point queries)	Pure point read/write workloads only

Note: The lazy indexing mode has been deprecated. Only consistent and none are current options.

Exam tip: indexing mode 'none'

Setting the indexing mode to none means point reads (by id + partition key) are the only efficient operation. Any SQL query has no index to use, so it must scan the container at a very high RU cost. Only use this mode when your access pattern is exclusively point reads and writes, like a session store or cache.

Impact on write RU cost

Default policy (index everything):
  Write 1KB document with 20 properties → ~10 RU

Optimised policy (index 3 properties):
  Write 1KB document with 20 properties → ~6 RU

No indexing:
  Write 1KB document → ~5 RU (minimum)

For Amara’s 500M daily writes, reducing the cost from 10 to 6 RU per write saves 500M × 4 RU = 2 billion RU/day, a significant cost reduction.
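The same arithmetic as a small calculation, using the approximate per-write figures above. Converting to an average RU/s assumes writes are spread evenly over the day; real traffic peaks would need more provisioned throughput than the average.

```python
SECONDS_PER_DAY = 86_400


def daily_savings(writes_per_day, ru_before, ru_after):
    """RU saved per day when each write gets cheaper by (ru_before - ru_after)."""
    return writes_per_day * (ru_before - ru_after)


writes = 500_000_000
saved = daily_savings(writes, 10, 6)
print(f"RU saved per day: {saved:,}")                         # 2,000,000,000
print(f"Average RU/s saved: {saved / SECONDS_PER_DAY:,.0f}")  # 23,148
```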

Flashcards

Question

What is the default indexing policy in Cosmos DB?

Answer

All properties are indexed (includedPaths: /*) with consistent indexing mode. Only _etag is excluded by default. This ensures any query works out of the box but increases write RU cost.

Question

When do you need a composite index?

Answer

For ORDER BY on multiple fields (e.g., ORDER BY a ASC, b DESC), where it is required, or for queries that filter on one property and sort on another, where it reduces cost. The composite index directions must match the ORDER BY directions EXACTLY, or be the exact reverse of all of them.

Question

What happens if you set indexing mode to 'none'?

Answer

No indexes are maintained. Only point reads (by id + partition key) work efficiently. SQL queries trigger full scans with high RU cost. Use only for pure key-value access patterns like session stores or caches.

Question

What are spatial indexes used for?

Answer

Geospatial queries like ST_DISTANCE, ST_WITHIN, and ST_INTERSECTS. You must explicitly configure them with the geometry types (Point, Polygon, LineString, MultiPolygon) — they're not part of the default indexing policy.

Knowledge Check

Amara's container writes 500M events/day with 20+ properties, but queries only filter on deviceId, timestamp, and sensorType. How should she optimise the indexing policy?

Amara's query is: SELECT * FROM c WHERE c.sensorType = 'temp' ORDER BY c.timestamp DESC. The query runs but consumes more RU than expected. What index would help?

A composite index is defined as [a ASC, b ASC]. Which query will this index serve?

Next up: Request Units and Query Cost Optimization — understanding what drives RU consumption and five techniques to reduce query costs.