Global Replication and Failover
Learn when and how to distribute Cosmos DB data globally, configure automatic failover policies, and perform manual failovers, including the critical detail that a recovered region returns as read-only.
Why go global?
Imagine GlobeCart has warehouses in 12 countries. If all your product data sits in one warehouse in Sydney, customers in London wait for every request to fly halfway around the world.
Global replication copies your data to multiple Azure regions so reads (and optionally writes) happen close to each customer. If Sydney burns down, London keeps serving.
Jake's scenario: taking GlobeCart global
Jake at GlobeCart has 50M products served from East US. European customers complain about 200 ms+ latency on product searches. His plan:
- Add West Europe and Southeast Asia as read replicas: product catalogue reads drop to under 10 ms locally.
- Configure automatic failover: if East US goes down, West Europe takes over writes.
- Keep a single write region (for now): simpler consistency, lower cost.
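Jake's plan maps onto Azure CLI calls roughly as follows. This is a minimal sketch rather than a production script: the account and resource group names are reused from the manual failover command later in this lesson, the `run` wrapper and the `APPLY` variable are assumptions of this sketch (added so the commands are only printed unless you opt in), and `isZoneRedundant=False` is an illustrative choice.

```shell
ACCOUNT="globecart-cosmos"      # names reused from this lesson's failover command
RESOURCE_GROUP="rg-globecart"

# Safety wrapper (an assumption of this sketch): print commands unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Step 1: keep East US as the write region (priority 0) and add two read regions.
add_read_regions() {
  run az cosmosdb update \
    --name "$ACCOUNT" --resource-group "$RESOURCE_GROUP" \
    --locations regionName=eastus failoverPriority=0 isZoneRedundant=False \
    --locations regionName=westeurope failoverPriority=1 isZoneRedundant=False \
    --locations regionName=southeastasia failoverPriority=2 isZoneRedundant=False
}

# Step 2: let Cosmos DB promote the next region in the priority list during an outage.
enable_automatic_failover() {
  run az cosmosdb update \
    --name "$ACCOUNT" --resource-group "$RESOURCE_GROUP" \
    --enable-automatic-failover true
}

add_read_regions
enable_automatic_failover
```

Run it as-is to review the commands, or with `APPLY=1` to execute them against a logged-in subscription. The third step needs no command: multi-region writes stay off unless you enable them.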
Single-write vs multi-write architecture
| Aspect | Single-Write Region | Multi-Write Regions |
|---|---|---|
| Write latency | Low in write region, high everywhere else | Low everywhere: writes go to nearest region |
| Availability SLA | 99.99% (four nines) | 99.999% (five nines) |
| Conflict resolution | Not needed (one writer) | Required: Last-Writer-Wins or custom |
| Consistency options | All five levels available | Strong NOT available; bounded staleness is allowed but discouraged |
| Cost | Standard RU/s × regions | Higher: every region serves writes |
| Complexity | Simple: one source of truth | Complex: conflict handling, eventual consistency |
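To make the Last-Writer-Wins row concrete: with the default policy, Cosmos DB keeps the version with the highest value at the conflict resolution path, which defaults to the `_ts` timestamp. The toy sketch below mimics that rule on three hypothetical conflicting versions of one item; the sample rows and the `resolve_lww` helper are invented for illustration and involve no Cosmos DB calls.

```shell
# Each input line: <region> <_ts> <value>, one conflicting version per line.
resolve_lww() {
  # Sort numerically by the timestamp column and keep the last (largest) row.
  sort -k2,2n | tail -n 1
}

printf '%s\n' \
  'eastus 1700000010 price=19.99' \
  'westeurope 1700000012 price=18.49' \
  'southeastasia 1700000011 price=20.00' | resolve_lww
# prints: westeurope 1700000012 price=18.49
```

The West Europe write wins only because its timestamp is the largest; the other two writes are discarded, which is why Last-Writer-Wins suits data where losing a concurrent update is acceptable.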
Automatic failover
Automatic failover kicks in when the write region experiences an outage. Cosmos DB promotes the next region in your priority list.
Failover priority (Jake's config):
Priority 0: East US (current write region)
Priority 1: West Europe (first to promote if East US fails)
Priority 2: Southeast Asia (second in line)
Critical exam detail (the recovered region): When East US recovers after a failover, it does not automatically become the write region again. It rejoins as a read-only replica at the lowest priority. Jake must manually reprioritize if he wants East US back as the writer.
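The promotion rule described above can be traced with a toy model. This sketch only reorders a list of region names the way this lesson describes (survivors move up, the failed region rejoins last); `promote_next` is a hypothetical helper, not an Azure command, and it makes no claim about how the service implements failover internally.

```shell
# promote_next FAILED_REGION REGION...
# Regions are passed in priority order (first = current write region).
# Prints the new order: survivors keep their relative order, the failed region goes last.
promote_next() {
  failed=$1
  shift
  new_order=""
  for region in "$@"; do
    [ "$region" = "$failed" ] || new_order="$new_order $region"
  done
  echo "${new_order# } $failed"
}

promote_next eastus eastus westeurope southeastasia
# prints: westeurope southeastasia eastus
```

The first name in the output is the new write region, so East US stays a read replica until someone changes the priorities again.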
Exam tip: automatic failover timing
Automatic failover is not instant. Cosmos DB waits for a detection period before triggering promotion. During this window, writes fail but reads from other regions continue.
The exam tests whether you know the write region goes down first, then failover promotes a read region: there's always a brief write outage.
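Because writes fail during the detection window, callers typically retry them with a growing delay until a new write region is available. The sketch below shows that generic pattern only; `retry_write` and `simulated_write` are hypothetical names, the demo makes no Azure calls, and the Cosmos DB SDKs have their own retry behaviour that this does not attempt to reproduce.

```shell
# retry_write MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the delay between attempts (exponential backoff).
retry_write() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Demo: a simulated write that fails twice, then succeeds.
# A base delay of 0 keeps the demo instant; real use would start at 1 or more.
calls=0
simulated_write() {
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]
}
retry_write 5 0 simulated_write && echo "write succeeded after $calls attempts"
# prints: write succeeded after 3 attempts
```

Capping the attempts matters: if the outage outlasts the retries, the caller should surface the failure rather than block indefinitely.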
Manual failover
Manual failover is operator-triggered and guarantees zero data loss. Use it for:
- Planned maintenance or region migration
- Disaster recovery drills
- Moving the write region closer to shifting traffic patterns
```shell
# Trigger manual failover: promote West Europe to write region
az cosmosdb failover-priority-change \
  --name globecart-cosmos \
  --resource-group rg-globecart \
  --failover-policies "westeurope=0" "eastus=1" "southeastasia=2"
```

```csharp
// Verify current write region via SDK.
// Assumes `client` is an initialized CosmosClient (using Microsoft.Azure.Cosmos).
AccountProperties account = await client.ReadAccountAsync();
foreach (AccountRegion region in account.WritableRegions)
{
    Console.WriteLine($"Write region: {region.Name}");
}
```
Service-managed vs automatic failover
| Feature | Service-Managed Failover | Automatic Failover (customer-enabled) |
|---|---|---|
| Trigger | Azure detects prolonged outage | You enable it; Azure detects outage and acts |
| Control | Microsoft decides when to failover | You set priority list, Azure executes |
| Write downtime | Longer, at Microsoft's discretion | Shorter, since detection triggers promotion |
| Data loss risk | Possible with async replication | Possible with async replication |
| Manual failover | Always zero data loss | Always zero data loss |
| Configuration | None (always on) | Enable + set region priority order |
Exam tip: SLA numbers you must know
- 99.99%: single-region accounts, and writes on multi-region accounts with a single write region
- 99.999%: reads and writes when multi-region writes are enabled
- 99.999% for reads on any multi-region account regardless of write config
- Manual failover = zero data loss guaranteed
- Automatic failover = potential data loss (unacknowledged writes during detection window)
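To put the SLA percentages in perspective, they can be converted into a maximum downtime budget per year. The helper below is plain arithmetic under the assumption of a 365-day year; `downtime_minutes` is a name invented for this sketch.

```shell
# downtime_minutes SLA_PERCENT
# Prints the maximum minutes of downtime per 365-day year allowed by that SLA.
downtime_minutes() {
  awk -v sla="$1" 'BEGIN { printf "%.2f\n", (100 - sla) / 100 * 365 * 24 * 60 }'
}

downtime_minutes 99.99    # prints 52.56
downtime_minutes 99.999   # prints 5.26
```

Moving from four nines to five nines shrinks the yearly budget from about 53 minutes to about 5 minutes.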
Global Replication and Failover: DP-420 Module 12
Knowledge Check
GlobeCart's Cosmos DB account has automatic failover enabled with priority: East US (0), West Europe (1), Southeast Asia (2). East US experiences a prolonged outage and failover occurs. After East US recovers, what is the new write region?
Jake wants to move GlobeCart's write region from East US to West Europe during a planned maintenance window. Which approach guarantees zero data loss?
Which consistency levels are NOT available when multi-region writes are enabled?
Next up: Consistency Levels, covering the five consistency choices in Cosmos DB, their real trade-offs, and why this is the most heavily tested topic on the DP-420 exam.