Guided by A Guide to Cloud

DP-420 Study Guide

Domain 1: Design and Implement Data Models

  • Cosmos DB: The Big Picture (Free)
  • Designing Your Data Model (Free)
  • Partition Key Strategy (Free)
  • Synthetic and Hierarchical Partition Keys (Free)
  • Relationships: Embedding vs Referencing (Free)
  • SDK Connectivity and Client Configuration (Free)
  • SDK CRUD Operations and Transactions (Free)
  • SQL Queries in Cosmos DB (Free)
  • SDK Query Pagination and LINQ (Free)
  • Server-Side Programming (Free)
  • Transactions in Practice (Free)

Domain 2: Design and Implement Data Distribution

  • Global Replication and Failover
  • Consistency Levels: Five Choices, Real Trade-Offs
  • Multi-Region Writes and Conflict Resolution

Domain 3: Integrate and Move Data

  • Change Feed with Azure Functions and Processors
  • Analytical Workloads: Synapse Link and Fabric Mirroring
  • Data Movement: ADF, Kafka, and Spark Connectors

Domain 4: Optimize Query and Operation Performance

  • Indexing Policies: Range, Spatial, and Composite
  • Request Units and Query Cost Optimization
  • Integrated Cache and Dedicated Gateway
  • Change Feed Patterns: Materialized Views and Estimator

Domain 5: Maintain an Azure Cosmos DB Solution

  • Monitoring: Metrics, Logs, and Alerts
  • Backup and Restore: Periodic vs Continuous
  • Network Security: Firewalls, VNets, and Private Endpoints
  • Data Security: Encryption, Keys, and RBAC
  • Cost Optimization: Throughput Modes and RU Strategy
  • DevOps: Infrastructure as Code and Deployments
  • Exam Strategy and Cross-Domain Review

Domain 5: Maintain an Azure Cosmos DB Solution (Premium, ~16 min read)

Monitoring: Metrics, Logs, and Alerts

Monitor Cosmos DB health with key metrics (NormalizedRUConsumption, TotalRequests, ServerSideLatency), diagnostic logs, Azure Monitor workbooks, and alert rules for proactive issue detection.

Why monitoring matters

☕ Simple explanation

Think of monitoring as the dashboard gauges in your car. You need to know when the engine is overheating (throttling), when you're running low on fuel (RU/s budget), and when something unusual happens (error spikes). Without gauges, you're driving blind.

Cosmos DB exposes metrics, diagnostic logs, and insights through Azure Monitor. Effective monitoring enables:

  • Proactive alerting: Catch throttling (429s) and latency spikes before users notice.
  • Cost optimisation: Track RU consumption patterns to right-size throughput.
  • Performance tuning: Identify expensive queries and hot partitions.
  • Compliance: Audit logs for access patterns and operations.

Marcus's monitoring mission

⚙️ Marcus at FinSecure runs Cosmos DB in three production environments with SOC 2 compliance requirements. He needs to:

  • Detect 429 throttling within minutes
  • Track query performance trends
  • Maintain audit logs for compliance
  • Get alerted when latency exceeds SLA thresholds

Key metrics

| Metric | What It Tells You | Alert Threshold |
| --- | --- | --- |
| NormalizedRUConsumption | % of provisioned RU/s consumed (0-100%) | Alert at >70% sustained |
| TotalRequests | Number of requests (including 429s) | Alert on sudden spikes |
| TotalRequestUnits | Total RU consumed | Track for cost trends |
| ServerSideLatency | Backend processing time (P50, P99) | Alert at >10 ms P99 |
| AvailableStorage | Remaining storage per partition | Alert at >80% used |
| MetadataRequests | Control plane operations | Unusual spikes may indicate config issues |

💡 Exam tip: NormalizedRUConsumption is the key metric

NormalizedRUConsumption is the most important metric for detecting throughput issues. It reports the highest RU/s utilisation, as a percentage of provisioned throughput, across all partition key ranges. When it reaches 100%, at least one partition key range has used its entire RU/s budget, and further requests to that range in the same second are throttled (429 errors).

Key details:

  • It is evaluated per physical partition, so a hot partition can show 100% while the account average is 30%
  • Sustained >70% means you should consider increasing throughput or enabling autoscale
  • The exam often presents this metric in scenarios asking "what should you monitor for throttling?"
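
To make the per-partition point concrete, here is a minimal Python sketch of the arithmetic. It assumes provisioned RU/s are split evenly across physical partitions; the function name `ru_utilisation` and the sample numbers are hypothetical and only illustrate why the maximum, not the average, signals throttling.

```python
def ru_utilisation(consumed_by_range, provisioned_rus):
    """Return (normalized_pct, average_pct) for one second of traffic.

    consumed_by_range maps a partition key range id to the RU it consumed
    in that second. The provisioned RU/s are assumed to be divided evenly
    across the ranges.
    """
    if not consumed_by_range:
        raise ValueError("at least one partition key range is required")
    budget_per_range = provisioned_rus / len(consumed_by_range)
    per_range_pct = [
        min(100.0, 100.0 * used / budget_per_range)
        for used in consumed_by_range.values()
    ]
    # The normalized figure is the maximum across ranges, not the mean.
    normalized = max(per_range_pct)
    average = sum(per_range_pct) / len(per_range_pct)
    return normalized, average


# Hypothetical container: 10,000 RU/s over four physical partitions.
sample = {"0": 2500, "1": 300, "2": 100, "3": 100}
normalized, average = ru_utilisation(sample, provisioned_rus=10_000)
print(normalized, average)  # 100.0 30.0
```

One range is saturated, so the normalized value is 100% even though the container as a whole is using only 30% of its throughput.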

Diagnostic logging

Enable diagnostic logs to capture detailed operation data:

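# Send Cosmos DB diagnostic logs to a Log Analytics workspace.
# --export-to-resource-specific true writes to the resource-specific CDB* tables
# (for example CDBDataPlaneRequests) that the KQL queries in this module read.
az monitor diagnostic-settings create \
  --name "cosmos-diagnostics" \
  --resource "/subscriptions/.../providers/Microsoft.DocumentDB/databaseAccounts/finsecure-cosmos" \
  --workspace "/subscriptions/.../workspaces/finsecure-logs" \
  --export-to-resource-specific true \
  --logs '[
    {"category": "QueryRuntimeStatistics", "enabled": true},
    {"category": "DataPlaneRequests", "enabled": true},
    {"category": "PartitionKeyStatistics", "enabled": true},
    {"category": "PartitionKeyRUConsumption", "enabled": true},
    {"category": "ControlPlaneRequests", "enabled": true}
  ]'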

| Log Category | What It Captures |
| --- | --- |
| DataPlaneRequests | Every read, write, and query, with RU cost and latency |
| QueryRuntimeStatistics | Query execution details, index utilisation, scan counts |
| PartitionKeyStatistics | Estimated storage size of the largest logical partition keys (storage skew) |
| PartitionKeyRUConsumption | RU consumption broken down by partition key |
| ControlPlaneRequests | Account-level operations (create, delete, scale) |
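
When the same categories need to be enabled on several accounts, it can help to generate the JSON passed to `--logs` rather than hand-edit it. The following is a small Python sketch under that assumption; `LOG_CATEGORIES` and `build_logs_argument` are hypothetical helper names, and the category strings are the ones listed in the table above.

```python
import json

# Category names taken from the table above.
LOG_CATEGORIES = [
    "DataPlaneRequests",
    "QueryRuntimeStatistics",
    "PartitionKeyStatistics",
    "PartitionKeyRUConsumption",
    "ControlPlaneRequests",
]


def build_logs_argument(categories):
    """Return the JSON string for the --logs flag, enabling each category."""
    settings = [{"category": name, "enabled": True} for name in categories]
    return json.dumps(settings, indent=2)


if __name__ == "__main__":
    # Paste the printed value into the --logs argument of the az command.
    print(build_logs_argument(LOG_CATEGORIES))
```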

KQL query examples

Query your diagnostic logs with Kusto Query Language (KQL) in Log Analytics:

// Find the most expensive queries in the last hour.
// QueryText is logged in CDBQueryRuntimeStatistics, so join it to the
// request charge in CDBDataPlaneRequests on ActivityId.
CDBDataPlaneRequests
| where TimeGenerated > ago(1h)
| where OperationName == "Query"
| project ActivityId, RequestCharge
| join kind=inner (
    CDBQueryRuntimeStatistics
    | where TimeGenerated > ago(1h)
    | project ActivityId, QueryText
  ) on ActivityId
| summarize AvgRU = avg(RequestCharge), MaxRU = max(RequestCharge),
            Count = count() by tostring(QueryText)
| order by AvgRU desc
| take 10

// Detect 429 throttling events
CDBDataPlaneRequests
| where TimeGenerated > ago(24h)
| where StatusCode == 429
| summarize ThrottledCount = count() by bin(TimeGenerated, 5m), CollectionName
| order by TimeGenerated desc

// Identify hot partitions
CDBPartitionKeyRUConsumption
| where TimeGenerated > ago(1h)
| summarize TotalRU = sum(RequestCharge) by PartitionKey
| order by TotalRU desc
| take 10
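
To show what the throttling query computes, here is a minimal Python sketch of the same grouping over rows already exported from Log Analytics. It is an illustration only: the sample rows are made up, the dictionary keys simply mirror the column names used in the KQL, and `bucket_start` approximates `bin(TimeGenerated, 5m)` for bucket sizes that divide an hour evenly.

```python
from collections import Counter
from datetime import datetime, timezone


def bucket_start(ts, minutes=5):
    """Floor a timestamp to the start of its N-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def throttled_counts(rows, minutes=5):
    """Count 429 responses per (time bucket, collection)."""
    counts = Counter()
    for row in rows:
        if row["StatusCode"] == 429:
            key = (bucket_start(row["TimeGenerated"], minutes),
                   row["CollectionName"])
            counts[key] += 1
    return counts


def at(hour, minute):
    """Build a UTC timestamp on an arbitrary sample day."""
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# Hypothetical exported rows.
sample_rows = [
    {"TimeGenerated": at(10, 1), "StatusCode": 429, "CollectionName": "orders"},
    {"TimeGenerated": at(10, 2), "StatusCode": 200, "CollectionName": "orders"},
    {"TimeGenerated": at(10, 4), "StatusCode": 429, "CollectionName": "orders"},
    {"TimeGenerated": at(10, 6), "StatusCode": 429, "CollectionName": "orders"},
]

for (bucket, collection), count in sorted(throttled_counts(sample_rows).items()):
    print(bucket.isoformat(), collection, count)
```

Successful requests are ignored, and the three throttled requests fall into two buckets: two in the window starting at 10:00 and one in the window starting at 10:05.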

Azure Monitor workbooks

Azure provides built-in Cosmos DB insights, a set of pre-built dashboards in the Azure portal:

  • Overview: Throughput, requests, storage, availability at a glance
  • Throughput: NormalizedRUConsumption with partition-level breakdown
  • Requests: Status code distribution, latency percentiles
  • Storage: Per-partition storage usage, index size
  • Failures: 429 rates, timeout rates, error categorisation

Diana's tip: Diana, Marcus's security auditor, uses the ControlPlaneRequests log category to track who made configuration changes, which SOC 2 audit trails require.

Alert rules

# Alert when NormalizedRUConsumption exceeds 80% for 5 minutes.
# <resource-group> is a placeholder for the resource group that will hold the alert rule.
az monitor metrics alert create \
  --name "cosmos-throttle-warning" \
  --resource-group "<resource-group>" \
  --scopes "/subscriptions/.../databaseAccounts/finsecure-cosmos" \
  --condition "avg NormalizedRUConsumption > 80" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action "/subscriptions/.../actionGroups/ops-team"

| Alert | Condition | Severity |
| --- | --- | --- |
| Throttling warning | NormalizedRUConsumption > 80% for 5 min | Warning |
| Throttling critical | NormalizedRUConsumption > 95% for 5 min | Critical |
| Latency spike | ServerSideLatency P99 > 20 ms for 10 min | Warning |
| Storage warning | AvailableStorage < 20% | Warning |
| Error rate | HTTP 5xx count > 10 in 5 min | Critical |
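
The thresholds in the table can be read as a simple decision procedure. The sketch below encodes them in Python for illustration; it is not how Azure Monitor evaluates rules. It assumes the inputs have already been aggregated over the relevant window (5 or 10 minutes), and it reports only the more severe of the two throttling alerts when both thresholds are exceeded. The names `Alert` and `evaluate_alerts` are hypothetical.

```python
from typing import List, NamedTuple


class Alert(NamedTuple):
    name: str
    severity: str


def evaluate_alerts(normalized_ru_pct: float,
                    p99_latency_ms: float,
                    available_storage_pct: float,
                    http_5xx_count: int) -> List[Alert]:
    """Return the alerts from the table that the given readings would fire."""
    alerts = []
    # Throttling: report only the highest severity that applies.
    if normalized_ru_pct > 95:
        alerts.append(Alert("Throttling critical", "Critical"))
    elif normalized_ru_pct > 80:
        alerts.append(Alert("Throttling warning", "Warning"))
    if p99_latency_ms > 20:
        alerts.append(Alert("Latency spike", "Warning"))
    if available_storage_pct < 20:
        alerts.append(Alert("Storage warning", "Warning"))
    if http_5xx_count > 10:
        alerts.append(Alert("Error rate", "Critical"))
    return alerts


if __name__ == "__main__":
    # Hypothetical readings for one evaluation window.
    for alert in evaluate_alerts(85, 25, 50, 0):
        print(f"{alert.severity}: {alert.name}")
```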

🎬 Video walkthrough

🎬 Video coming soon

Monitoring Cosmos DB β€” DP-420 Module 22

Monitoring Cosmos DB β€” DP-420 Module 22

~16 min

Flashcards

Question

What is NormalizedRUConsumption and why is it the most important metric?

Answer

It shows the percentage (0-100%) of provisioned RU/s consumed across partition key ranges. At 100%, requests are throttled (429 errors). It's per-physical-partition, so a hot partition can be at 100% while the overall average is lower. Alert at >70% sustained.

Question

Which diagnostic log category captures query execution details?

Answer

QueryRuntimeStatistics. It logs query execution details including index utilisation, scan counts, and query text. DataPlaneRequests captures every operation's RU cost and latency. Both are needed for comprehensive query performance monitoring.

Question

How do you identify hot partitions in Cosmos DB?

Answer

Use the PartitionKeyRUConsumption diagnostic log to find RU skew and the PartitionKeyStatistics log category to find storage skew. Query with KQL to find the partition keys consuming the most RU/s. NormalizedRUConsumption shows per-partition metrics in Azure Monitor.

Knowledge Check

Marcus notices NormalizedRUConsumption at 95% sustained for his orders container. What should he do first?

Diana needs an audit trail of who changed the Cosmos DB account configuration for SOC 2 compliance. Which diagnostic log category should she enable?

Marcus wants to find the top 10 most expensive queries in the last hour. Which tool and approach should he use?


Next up: Backup and Restore, covering the choice between periodic and continuous backup, configuring policies, and point-in-time restore.

← Previous

Change Feed Patterns: Materialized Views and Estimator

Next β†’

Backup and Restore: Periodic vs Continuous

Guided

I learn, I simplify, I share.

A Guide to Cloud YouTube Feedback

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.