
DP-420 Study Guide

Domain 1: Design and Implement Data Models

  • Cosmos DB β€” The Big Picture Free
  • Designing Your Data Model Free
  • Partition Key Strategy Free
  • Synthetic and Hierarchical Partition Keys Free
  • Relationships β€” Embedding vs Referencing Free
  • SDK Connectivity and Client Configuration Free
  • SDK CRUD Operations and Transactions Free
  • SQL Queries in Cosmos DB Free
  • SDK Query Pagination and LINQ Free
  • Server-Side Programming Free
  • Transactions in Practice Free

Domain 2: Design and Implement Data Distribution

  • Global Replication and Failover
  • Consistency Levels: Five Choices, Real Trade-Offs
  • Multi-Region Writes and Conflict Resolution

Domain 3: Integrate and Move Data

  • Change Feed with Azure Functions and Processors
  • Analytical Workloads: Synapse Link and Fabric Mirroring
  • Data Movement: ADF, Kafka, and Spark Connectors

Domain 4: Optimize Query and Operation Performance

  • Indexing Policies: Range, Spatial, and Composite
  • Request Units and Query Cost Optimization
  • Integrated Cache and Dedicated Gateway
  • Change Feed Patterns: Materialized Views and Estimator

Domain 5: Maintain an Azure Cosmos DB Solution

  • Monitoring: Metrics, Logs, and Alerts
  • Backup and Restore: Periodic vs Continuous
  • Network Security: Firewalls, VNets, and Private Endpoints
  • Data Security: Encryption, Keys, and RBAC
  • Cost Optimization: Throughput Modes and RU Strategy
  • DevOps: Infrastructure as Code and Deployments
  • Exam Strategy and Cross-Domain Review

Domain 3: Integrate and Move Data (Premium, ~14 min read)

Data Movement: ADF, Kafka, and Spark Connectors

Choose and implement the right data movement strategy for Cosmos DB β€” Azure Data Factory copy activities, Kafka connectors, Spark connectors, bulk executor, and live migration patterns.

Moving data in and out of Cosmos DB

☕ Simple explanation

Think of Cosmos DB as a warehouse. Sometimes you need to bring stock in (migrate from an old database), ship stock out (feed analytics), or continuously transfer between warehouses (streaming). Different tools are like different trucks — you pick the one that fits the load.

Azure Data Factory is the general-purpose truck. Kafka is the conveyor belt for real-time streams. Spark is the heavy-lifter for big-data processing.

Cosmos DB supports multiple data movement patterns:

  • Batch import/export: Azure Data Factory, bulk executor library
  • Real-time streaming: Kafka connector (source and sink)
  • Big-data processing: Spark connector (read, write, streaming)
  • Live migration: Bulk copy + change feed for zero-downtime cutover

Amara's data integration challenge

📡 Amara at SensorFlow has three data movement needs:

  1. Migrate 2TB of historical sensor data from MongoDB to Cosmos DB
  2. Stream real-time sensor events from Kafka into Cosmos DB
  3. Export daily aggregates to a data lake for Tomás's ML models

Each need calls for a different tool.

Choosing the right tool

  • Azure Data Factory — Pattern: batch copy, scheduled pipelines. Throughput: high (parallel copy). Coding required: no-code/low-code. Best for: ETL/ELT, scheduled migrations, cross-service copy.
  • Kafka Connector — Pattern: real-time streaming (source + sink). Throughput: very high (continuous). Coding required: configuration + some code. Best for: event streaming, CDC, real-time integration.
  • Spark Connector — Pattern: batch + streaming read/write. Throughput: very high (distributed). Coding required: Spark code (Scala/Python). Best for: big-data processing, ML pipelines, complex transforms.
  • Bulk Executor — Pattern: high-speed batch import. Throughput: maximum (SDK-level). Coding required: C#/Java code. Best for: initial data load, one-time migration.
  • Data Migration Tool — Pattern: one-time import. Throughput: moderate. Coding required: no-code (GUI/CLI). Best for: simple one-time migrations from JSON, CSV, MongoDB, SQL.

Azure Data Factory copy activity

ADF is the go-to for scheduled, no-code data movement:

{
  "type": "Copy",
  "source": {
    "type": "MongoDbV2Source",
    "query": "{ 'timestamp': { '$gte': '2024-01-01' } }"
  },
  "sink": {
    "type": "CosmosDbSqlApiSink",
    "writeBehavior": "upsert",
    "writeBatchSize": 500
  },
  "settings": {
    "parallelCopies": 8,
    "dataIntegrationUnits": 32
  }
}

Key ADF settings for Cosmos DB:

  • writeBehavior: insert or upsert (upsert is safer for retries)
  • writeBatchSize: Number of documents per batch (default 10, max 200 for insert)
  • parallelCopies: Number of concurrent write threads
  • Throughput impact: ADF writes consume container RU/s — monitor 429 errors

💡 Exam tip: ADF and RU consumption

ADF copy activities consume your container's provisioned RU/s. If you run a large migration during peak hours, you can starve your application. Best practices:

  • Run migrations during off-peak hours
  • Temporarily increase RU/s (autoscale helps)
  • Use ADF's write batch size and parallel copies settings to throttle
  • Monitor the 429 TooManyRequests metric during migration
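The backoff behaviour the SDKs (and ADF) apply when a container returns 429 can be sketched in a few lines. This is a minimal illustration, not the SDK's actual implementation: `write_fn` and `ThrottledError` are stand-ins for a real write call and a real 429 response.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for a 429 TooManyRequests response from Cosmos DB."""


def write_with_backoff(write_fn, item, max_retries=5, base_delay=0.1):
    """Retry a throttled write with exponential backoff and jitter.

    write_fn is any callable that performs the write and raises
    ThrottledError when the container is over its RU/s budget.
    """
    for attempt in range(max_retries):
        try:
            return write_fn(item)
        except ThrottledError:
            # Double the wait each attempt, plus jitter to avoid
            # synchronized retries from parallel writers.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
    raise RuntimeError(f"write still throttled after {max_retries} retries")
```

In a real migration you would rely on the SDK's built-in retry policy rather than hand-rolling this, but the shape is the same: back off, retry, and alert if throttling persists.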

Kafka connector

The Cosmos DB Kafka connector supports both source (read from change feed) and sink (write to Cosmos DB):

# Kafka sink connector configuration
name=cosmosdb-sink
connector.class=com.azure.cosmos.kafka.connect.sink.CosmosDBSinkConnector
topics=sensor-readings
connect.cosmos.connection.endpoint=https://sensorflow-cosmos.documents.azure.com:443/
connect.cosmos.master.key=<key>
connect.cosmos.databasename=sensorflow
connect.cosmos.containers.topicmap=sensor-readings#readings

The source connector uses the change feed to stream Cosmos DB changes into Kafka topics — useful for building event-driven architectures where downstream systems subscribe to data changes.
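A source connector configuration mirrors the sink configuration above. The class and property names below follow the same naming pattern as the sink example; verify them against the connector documentation for the version you deploy.

```properties
# Kafka source connector configuration (sketch; check names against your connector version)
name=cosmosdb-source
connector.class=com.azure.cosmos.kafka.connect.source.CosmosDBSourceConnector
connect.cosmos.connection.endpoint=https://sensorflow-cosmos.documents.azure.com:443/
connect.cosmos.master.key=<key>
connect.cosmos.databasename=sensorflow
connect.cosmos.containers.topicmap=sensor-changes#readings
```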

Spark connector

For big-data scenarios, the Spark connector reads and writes directly:

# Read from Cosmos DB in PySpark
from pyspark.sql.functions import avg

df = spark.read.format("cosmos.oltp") \
    .option("spark.cosmos.accountEndpoint", endpoint) \
    .option("spark.cosmos.accountKey", key) \
    .option("spark.cosmos.database", "sensorflow") \
    .option("spark.cosmos.container", "readings") \
    .load()

# Transform and write back
daily_agg = df.groupBy("deviceId", "date").agg(avg("temperature"))

daily_agg.write.format("cosmos.oltp") \
    .option("spark.cosmos.accountEndpoint", endpoint) \
    .option("spark.cosmos.accountKey", key) \
    .option("spark.cosmos.database", "sensorflow") \
    .option("spark.cosmos.container", "daily-aggregates") \
    .option("spark.cosmos.write.strategy", "ItemOverwrite") \
    .mode("append") \
    .save()

Live migration pattern (zero downtime)

For migrating from an existing database to Cosmos DB without downtime:

Phase 1: Bulk copy (historical data)
  Source DB ──[ADF/Spark bulk copy]──→ Cosmos DB

Phase 2: Change data capture (ongoing changes)
  Source DB ──[CDC stream]──→ Cosmos DB
  (Runs in parallel with Phase 1, catches changes during copy)

Phase 3: Cutover
  - Verify data consistency between source and target
  - Switch application connection string to Cosmos DB
  - Keep CDC running briefly to catch final stragglers
  - Decommission source database

💡 Exam tip: migration validation

The exam may ask about validating a migration. Key checks:

  • Document count comparison between source and target
  • Spot-check specific documents for data integrity
  • Application testing against the new Cosmos DB instance before cutover
  • Throughput monitoring — ensure provisioned RU/s can handle production traffic
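The first check above (count comparison) reduces to a small helper. The counts themselves would come from whatever source and target clients you use; this sketch just encodes the comparison, with a tolerance for in-flight CDC changes that have not yet landed in the target.

```python
def validate_migration(source_count, target_count, tolerance=0):
    """Compare document counts between source and target databases.

    A non-zero tolerance allows for changes still in the CDC pipeline
    at the moment the two counts were taken.
    """
    drift = abs(source_count - target_count)
    return {
        "source": source_count,
        "target": target_count,
        "drift": drift,
        "ok": drift <= tolerance,
    }
```

Run it repeatedly as CDC drains; drift should trend toward zero before you cut over.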

🎬 Video walkthrough

Video coming soon: Data Movement — DP-420 Module 17 (~14 min)

Flashcards

Question

What are the two writeBehavior options in ADF's Cosmos DB sink?


Answer

'insert' (fails on duplicate id+partition key) and 'upsert' (insert or replace). Upsert is safer for retries because re-running a failed pipeline won't cause duplicate key errors.


Question

How does the Kafka source connector read data from Cosmos DB?


Answer

It uses the Cosmos DB change feed to stream inserts and updates into Kafka topics. This provides real-time CDC (change data capture) from Cosmos DB to any Kafka consumer.


Question

What are the three phases of a zero-downtime live migration to Cosmos DB?


Answer

1) Bulk copy — move historical data using ADF or Spark. 2) Change data capture — stream ongoing changes in parallel. 3) Cutover — verify consistency, switch the app, catch stragglers, decommission the source.


Knowledge Check

  1. Amara needs to migrate 2TB of historical data from MongoDB to Cosmos DB with minimal coding. Which tool is best?
  2. SensorFlow runs an ADF migration job that triggers frequent 429 errors. What should Amara do?
  3. Amara needs real-time, continuous data flow from Cosmos DB to downstream consumers. Which approach is most appropriate?


Next up: Indexing Policies — how to tune Range, Spatial, and Composite indexes to optimise query performance and reduce write costs.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.