Guided by A Guide to Cloud

DP-420 Study Guide

Domain 1: Design and Implement Data Models

  • Cosmos DB: The Big Picture Free
  • Designing Your Data Model Free
  • Partition Key Strategy Free
  • Synthetic and Hierarchical Partition Keys Free
  • Relationships: Embedding vs Referencing Free
  • SDK Connectivity and Client Configuration Free
  • SDK CRUD Operations and Transactions Free
  • SQL Queries in Cosmos DB Free
  • SDK Query Pagination and LINQ Free
  • Server-Side Programming Free
  • Transactions in Practice Free

Domain 2: Design and Implement Data Distribution

  • Global Replication and Failover
  • Consistency Levels: Five Choices, Real Trade-Offs
  • Multi-Region Writes and Conflict Resolution

Domain 3: Integrate and Move Data

  • Change Feed with Azure Functions and Processors
  • Analytical Workloads: Synapse Link and Fabric Mirroring
  • Data Movement: ADF, Kafka, and Spark Connectors

Domain 4: Optimize Query and Operation Performance

  • Indexing Policies: Range, Spatial, and Composite
  • Request Units and Query Cost Optimization
  • Integrated Cache and Dedicated Gateway
  • Change Feed Patterns: Materialized Views and Estimator

Domain 5: Maintain an Azure Cosmos DB Solution

  • Monitoring: Metrics, Logs, and Alerts
  • Backup and Restore: Periodic vs Continuous
  • Network Security: Firewalls, VNets, and Private Endpoints
  • Data Security: Encryption, Keys, and RBAC
  • Cost Optimization: Throughput Modes and RU Strategy
  • DevOps: Infrastructure as Code and Deployments
  • Exam Strategy and Cross-Domain Review

Domain 4: Optimize Query and Operation Performance (Premium, ~16 min read)

Indexing Policies: Range, Spatial, and Composite

Tune Cosmos DB indexing policies to optimise query performance and reduce write costs. This module covers default indexing, include/exclude paths, range indexes, spatial indexes, composite indexes, and indexing modes.

Why indexing matters

☕ Simple explanation

Think of indexing like the index in the back of a textbook. Without it, finding "partition key" means reading every page. With a good index, you flip straight to page 47.

Cosmos DB indexes everything by default, which is great for reads but adds cost to every write. Tuning your indexing policy means choosing exactly which "entries" go in the index so writes are cheaper and reads stay fast.

Cosmos DB automatically indexes all properties in every document by default. While this ensures any query can be served efficiently, it comes at a cost:

  • Write amplification: Every write must update all indexed paths, increasing write RU cost.
  • Storage: Index data consumes storage alongside your documents.

Customising the indexing policy lets you include only the paths your queries actually use, reducing write costs without sacrificing read performance.

Amara's indexing challenge

📡 Amara at SensorFlow writes 500M events per day. Each event has 20+ fields, but her queries only filter on deviceId, timestamp, and sensorType. The default "index everything" policy means every write indexes all 20+ fields, wasting RU/s on indexes her queries never use.

Default indexing policy

Every new container starts with this policy:

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    { "path": "/*" }
  ],
  "excludedPaths": [
    { "path": "/\"_etag\"/?" }
  ]
}
  • /* means all properties are indexed
  • _etag is excluded by default (system property)
  • indexingMode: consistent means indexes update synchronously with writes

Include/exclude paths

The two strategies for customising:

Strategy 1: Opt-out (exclude specific paths)

Start from the default /* and exclude paths you don't query:

{
  "includedPaths": [{ "path": "/*" }],
  "excludedPaths": [
    { "path": "/rawPayload/*" },
    { "path": "/metadata/*" },
    { "path": "/\"_etag\"/?" }
  ]
}

Strategy 2: Opt-in (include specific paths)

Exclude everything, then include only what you need:

{
  "includedPaths": [
    { "path": "/deviceId/?" },
    { "path": "/timestamp/?" },
    { "path": "/sensorType/?" }
  ],
  "excludedPaths": [
    { "path": "/*" }
  ]
}

💡 Exam tip: path syntax
  • /? : indexes the scalar value at this path (string, number, boolean)
  • /* : indexes all descendants recursively (arrays, nested objects)
  • /property/? : indexes a specific property's scalar value
  • /array/[]/? : indexes scalar values inside an array

For write-heavy workloads with known query patterns, the opt-in strategy (exclude /*, include specific paths) saves the most write RU/s.
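
The opt-in strategy can be generated from a list of queried properties. The following is a minimal Python sketch, not an official recipe: the helper names `build_opt_in_policy` and `create_events_container`, the container name `events`, and the partition key path `/deviceId` are all assumptions made for illustration. The second helper assumes the `azure-cosmos` package and a `DatabaseProxy` object, and is not exercised here.

```python
import json


def build_opt_in_policy(properties):
    """Return an indexing policy that indexes only the given top-level properties."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # "/name/?" indexes the scalar value stored at that property.
        "includedPaths": [{"path": f"/{name}/?"} for name in properties],
        # Excluding "/*" turns off indexing for every path not listed above.
        "excludedPaths": [{"path": "/*"}],
    }


def create_events_container(database, policy):
    """Create a container with the policy (requires the azure-cosmos package).

    `database` is assumed to be an azure.cosmos DatabaseProxy. The container
    name and partition key path are hypothetical.
    """
    from azure.cosmos import PartitionKey  # imported lazily: optional dependency

    return database.create_container_if_not_exists(
        id="events",
        partition_key=PartitionKey(path="/deviceId"),
        indexing_policy=policy,
    )


# Amara's three queried properties from the scenario above.
policy = build_opt_in_policy(["deviceId", "timestamp", "sensorType"])
print(json.dumps(policy, indent=2))
```

Generating the policy from a list keeps the indexed paths next to the query patterns they serve, which makes it easier to notice when a new query needs a path that is not yet indexed.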

Index types

Range indexes (default)

Used for equality (=), range (>, <, >=, <=, BETWEEN), and ORDER BY on a single field. Created automatically for all included paths.

Spatial indexes

For geospatial queries (ST_DISTANCE, ST_WITHIN, ST_INTERSECTS):

{
  "includedPaths": [{ "path": "/*" }],
  "spatialIndexes": [
    {
      "path": "/location/*",
      "types": ["Point", "Polygon"]
    }
  ]
}
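
To make the spatial policy concrete, here is a small Python sketch of a document that such an index would cover and a distance query against it. The document values, the helper name `nearby_query`, and the parameter names `@centre` and `@radius` are hypothetical. The sketch assumes `location` holds a GeoJSON Point (longitude first, then latitude) and that ST_DISTANCE returns metres.

```python
import json

# Hypothetical event document. GeoJSON lists longitude first, then latitude.
event = {
    "id": "evt-001",
    "deviceId": "device-42",
    "location": {"type": "Point", "coordinates": [-122.12, 47.67]},
}


def nearby_query(longitude, latitude, max_metres):
    """Build a parameterised query spec for events within max_metres of a point."""
    return {
        "query": "SELECT * FROM c WHERE ST_DISTANCE(c.location, @centre) < @radius",
        "parameters": [
            {
                "name": "@centre",
                "value": {"type": "Point", "coordinates": [longitude, latitude]},
            },
            {"name": "@radius", "value": max_metres},
        ],
    }


spec = nearby_query(-122.12, 47.67, 5000)
print(json.dumps(spec, indent=2))
```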

Composite indexes

Required for ORDER BY on multiple properties. They can also lower the RU cost of queries that filter on one property and sort on another, although those queries run without one:

{
  "compositeIndexes": [
    [
      { "path": "/deviceId", "order": "ascending" },
      { "path": "/timestamp", "order": "descending" }
    ],
    [
      { "path": "/sensorType", "order": "ascending" },
      { "path": "/value", "order": "ascending" }
    ]
  ]
}

When you need a composite index:

  • ORDER BY c.deviceId ASC, c.timestamp DESC: multi-property sort (composite index required)
  • WHERE c.sensorType = 'temp' ORDER BY c.timestamp DESC: filter + sort on different properties (composite index not required, but it can reduce RU cost)

💡 Exam tip: composite index order matters

The path sequence and sort directions in a composite index must match the ORDER BY clause in your query, or be its exact inverse on every path. If your query does ORDER BY c.a ASC, c.b DESC, you need a composite index with [a ASC, b DESC], which also serves ORDER BY c.a DESC, c.b ASC.

A composite index with [a ASC, b ASC] will NOT serve ORDER BY a ASC, b DESC. The exam loves testing this mismatch.
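
The matching rule can be expressed as a short check. This Python sketch is a simplified model for ORDER BY-only queries, not the query engine's actual logic. The function names `flip` and `can_serve_order_by` are made up for illustration, and the model assumes an index serves a sort only when the paths and directions match exactly or are inverted on every path.

```python
def flip(order):
    """Return the opposite sort direction."""
    return "descending" if order == "ascending" else "ascending"


def can_serve_order_by(composite_index, order_by):
    """Check whether one composite index can serve a multi-property ORDER BY.

    composite_index: list of {"path": ..., "order": ...} entries, as in the policy JSON.
    order_by: list of (path, order) tuples in the order they appear in the query.
    """
    index = [(entry["path"], entry["order"]) for entry in composite_index]
    inverse = [(path, flip(order)) for path, order in index]
    requested = list(order_by)
    # Same paths in the same sequence, with directions all equal or all inverted.
    return requested == index or requested == inverse


index_ab = [
    {"path": "/a", "order": "ascending"},
    {"path": "/b", "order": "ascending"},
]

print(can_serve_order_by(index_ab, [("/a", "ascending"), ("/b", "ascending")]))
print(can_serve_order_by(index_ab, [("/a", "descending"), ("/b", "descending")]))
print(can_serve_order_by(index_ab, [("/a", "ascending"), ("/b", "descending")]))
print(can_serve_order_by(index_ab, [("/b", "ascending"), ("/a", "ascending")]))
```

Only the first two calls return True: the exact match and the full inverse. The mixed-direction sort and the reordered paths each need their own composite index.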

Indexing modes

| Mode | Behaviour | Write Cost | Read Cost | Use Case |
| --- | --- | --- | --- | --- |
| Consistent | Indexes updated synchronously with writes | Higher (index maintenance) | Low (indexes always current) | Default; most workloads |
| None | No indexing; container is a pure key-value store | Lowest (no index overhead) | High (full scan for non-point queries) | Pure point read/write workloads only |

Note: The lazy indexing mode has been deprecated. Only consistent and none are current options.

💡 Exam tip: indexing mode 'none'

Setting indexing mode to none means no index is maintained, so only point reads (by id + partition key) stay efficient. Any SQL query falls back to a full scan with massive RU cost. Only use this when your access pattern is exclusively point reads and writes, like a session store or cache.

Impact on write RU cost

Default policy (index everything):
  Write 1KB document with 20 properties → ~10 RU

Optimised policy (index 3 properties):
  Write 1KB document with 20 properties → ~6 RU

No indexing:
  Write 1KB document → ~5 RU (minimum)

For Amara's 500M daily writes, reducing from 10 to 6 RU per write saves 2 billion RU/day, a significant cost reduction.
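
The saving can be checked with a few lines of arithmetic. This Python sketch uses the approximate per-write figures quoted above (10 RU and 6 RU), which are illustrative rather than measured. The average RU/s figure assumes writes are spread evenly across the day, which real workloads rarely are.

```python
WRITES_PER_DAY = 500_000_000
RU_PER_WRITE_DEFAULT = 10     # approximate figure from the text
RU_PER_WRITE_OPTIMISED = 6    # approximate figure from the text
SECONDS_PER_DAY = 86_400

# Daily saving: writes per day times the RU saved on each write.
saved_per_day = WRITES_PER_DAY * (RU_PER_WRITE_DEFAULT - RU_PER_WRITE_OPTIMISED)

# Average throughput freed up, assuming an even write rate over 24 hours.
saved_per_second = saved_per_day / SECONDS_PER_DAY

print(f"RU saved per day: {saved_per_day:,}")
print(f"Average RU/s saved: {saved_per_second:,.0f}")
```

That works out to 2,000,000,000 RU per day, or roughly 23,000 RU/s on average under the even-rate assumption.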

Flashcards

Question

What is the default indexing policy in Cosmos DB?

Answer

All properties are indexed (includedPaths: /*) with consistent indexing mode. Only _etag is excluded by default. This ensures any query works out of the box but increases write RU cost.

Question

When do you need a composite index?

Answer

For ORDER BY on multiple fields (e.g., ORDER BY a ASC, b DESC). A composite index can also reduce RU cost for queries that filter on one property and sort on another. The composite index paths and directions must match the ORDER BY clause in the query, or be its exact inverse on every path.

Question

What happens if you set indexing mode to 'none'?

Answer

No indexes are maintained. Only point reads (by id + partition key) work efficiently. SQL queries trigger full scans with high RU cost. Use only for pure key-value access patterns like session stores or caches.

Question

What are spatial indexes used for?

Answer

Geospatial queries like ST_DISTANCE, ST_WITHIN, and ST_INTERSECTS. You must explicitly configure them with the geometry types (Point, Polygon, LineString, MultiPolygon); they're not part of the default indexing policy.

Knowledge Check

Amara's container writes 500M events/day with 20+ properties, but queries only filter on deviceId, timestamp, and sensorType. How should she optimise the indexing policy?

Knowledge Check

Amara's query is: SELECT * FROM c WHERE c.sensorType = 'temp' ORDER BY c.timestamp DESC. The query runs but consumes more RUs than expected. What's missing?

Knowledge Check

A composite index is defined as [a ASC, b ASC]. Which query will this index serve?


Next up: Request Units and Query Cost Optimization, covering what drives RU consumption and five techniques to reduce query costs.


© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.