Global Replication and Failover | Guided by A Guide to Cloud

Why go global?

Simple explanation

Imagine GlobeCart has warehouses in 12 countries. If all your product data sits in one warehouse in Sydney, customers in London wait for every request to fly halfway around the world.

Global replication copies your data to multiple Azure regions so reads (and optionally writes) happen close to each customer. If Sydney burns down, London keeps serving.

Jake’s scenario: taking GlobeCart global

🛒 Jake at GlobeCart has 50M products served from East US. European customers complain about 200 ms+ latency on product searches. His plan:

Add West Europe and Southeast Asia as read replicas — product catalogue reads drop to under 10 ms locally.
Configure automatic failover — if East US goes down, West Europe takes over writes.
Keep a single write region (for now) — simpler consistency, lower cost.
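Jake's plan could be sketched with the Azure CLI roughly as follows. The account and resource group names (globecart-cosmos, rg-globecart) are the ones used elsewhere in this guide, and the zone-redundancy setting is an assumption. Region changes and other account property changes are issued as separate updates, because Cosmos DB does not accept both in one operation.

```shell
# Step 1: list every region the account should have, with its failover priority.
# Priority 0 is the write region; the others become read replicas.
az cosmosdb update \
  --name globecart-cosmos \
  --resource-group rg-globecart \
  --locations regionName=eastus failoverPriority=0 isZoneRedundant=False \
  --locations regionName=westeurope failoverPriority=1 isZoneRedundant=False \
  --locations regionName=southeastasia failoverPriority=2 isZoneRedundant=False

# Step 2: enable automatic failover as a separate update.
az cosmosdb update \
  --name globecart-cosmos \
  --resource-group rg-globecart \
  --enable-automatic-failover true
```

Multi-region writes are left off here, which matches the single-write-region choice in the plan.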

Single-write vs multi-write architecture

Aspect	Single-Write Region	Multi-Write Regions
Write latency	Low in write region, high everywhere else	Low everywhere — writes go to nearest region
Availability SLA	99.99% (four nines)	99.999% (five nines)
Conflict resolution	Not needed — one writer	Required — Last-Writer-Wins or custom
Consistency options	All five levels available	Strong NOT available; bounded staleness is supported but discouraged
Cost	Standard RU/s × regions	Higher — every region serves writes
Complexity	Simple — one source of truth	Complex — conflict handling, eventual consistency
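If Jake later moves to the right-hand column, the switch itself is a single account setting. A minimal sketch, assuming the same account and resource group names used elsewhere in this guide:

```shell
# Turn on multi-region writes for an existing multi-region account.
az cosmosdb update \
  --name globecart-cosmos \
  --resource-group rg-globecart \
  --enable-multiple-write-locations true
```

Conflict resolution policy is configured separately on each container and is not shown here.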

Automatic failover

Automatic failover kicks in when the write region experiences an outage. Cosmos DB promotes the next region in your priority list.

Failover priority (Jake's config):
  Priority 0: East US        ← current write region
  Priority 1: West Europe    ← first to promote if East US fails
  Priority 2: Southeast Asia ← second in line
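The promotion rule can be illustrated with a toy model: among the regions that are still healthy, the one with the lowest priority number becomes the writer. This is only a sketch of the idea, not how Cosmos DB is implemented; the function name next_write_region and the "region priority" input format are made up for the example.

```shell
# Toy model: pick the healthy region with the lowest priority number.
# stdin: one "region priority" pair per line; $1: the region that is down.
next_write_region() {
  awk -v down="$1" '
    $1 != down && (best == "" || $2 + 0 < min) { best = $1; min = $2 + 0 }
    END { print best }'
}

priorities='eastus 0
westeurope 1
southeastasia 2'

# East US fails, so the next region in the list is promoted.
echo "$priorities" | next_write_region eastus
```

With Jake's list and East US down, the model selects westeurope, matching the priority order above.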

Critical exam detail — the recovered region: When East US recovers after a failover, it does not automatically become the write region again. It rejoins as a read-only replica at the lowest priority. Jake must manually reprioritize if he wants East US back as the writer.
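One way to see where the account stands after a failover is to query it with the Azure CLI. This is a sketch assuming the account and resource group names used elsewhere in this guide; the JMESPath query simply picks out the current write region and the failover priority list.

```shell
# Show the current write region and the failover priority of every region.
az cosmosdb show \
  --name globecart-cosmos \
  --resource-group rg-globecart \
  --query "{writeRegion: writeLocations[0].locationName, priorities: failoverPolicies[].{region: locationName, priority: failoverPriority}}" \
  --output json
```

If the write region shown is not the one Jake wants, he changes the priority order manually.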

Exam tip: automatic failover timing

Automatic failover is not instant. Cosmos DB waits through a detection period before triggering promotion. During this window, writes fail but reads from other regions continue.

The exam tests whether you know the write region goes down first, then failover promotes a read region — there’s always a brief write outage.

Manual failover

Manual failover is operator-triggered and guarantees zero data loss. Use it for:

Planned maintenance or region migration
Disaster recovery drills
Moving the write region closer to shifting traffic patterns

# Trigger manual failover — promote West Europe to write region
az cosmosdb failover-priority-change \
  --name globecart-cosmos \
  --resource-group rg-globecart \
  --failover-policies "westeurope=0" "eastus=1" "southeastasia=2"

// Verify current write region via SDK
// Assumes `client` is an initialized CosmosClient (using Microsoft.Azure.Cosmos;)
AccountProperties account = await client.ReadAccountAsync();
foreach (AccountRegion region in account.WritableRegions)
{
    Console.WriteLine($"Write region: {region.Name}");
}

Service-managed vs automatic failover

Feature	Service-Managed Failover	Automatic Failover (customer-enabled)
Trigger	Azure detects prolonged outage	You enable it; Azure detects outage and acts
Control	Microsoft decides when to failover	You set priority list, Azure executes
Write downtime	Longer — at Microsoft's discretion	Shorter — detection triggers promotion
Data loss risk	Possible with async replication	Possible with async replication
Manual failover	Always zero data loss	Always zero data loss
Configuration	None — always on	Enable + set region priority order

Exam tip: SLA numbers you must know

99.99%: single-region accounts, and writes on multi-region accounts with a single write region
99.999%: reads on any multi-region account, regardless of write configuration
99.999%: reads and writes when multi-region writes are enabled
Manual failover = zero data loss guaranteed
Automatic failover = potential data loss (writes committed in the failed region but not yet replicated when the outage began)
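To get a feel for what the extra nine buys, the percentages can be converted into allowed downtime. The sketch below spreads the unavailability over a 365-day year purely for intuition; the actual SLA is measured over a billing period, and the helper name downtime_minutes is made up for the example.

```shell
# Allowed downtime, in minutes per 365-day year, for an availability percentage.
downtime_minutes() {
  awk -v sla="$1" 'BEGIN { printf "%.2f\n", (100 - sla) / 100 * 365 * 24 * 60 }'
}

for sla in 99.99 99.999; do
  echo "$sla% -> $(downtime_minutes "$sla") minutes/year"
done
```

Four nines works out to roughly 52.56 minutes a year, five nines to roughly 5.26 minutes.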

Global Replication and Failover — DP-420 Module 12

Flashcards

Question

After an automatic failover, what happens when the original write region recovers?

Answer

It rejoins as a READ-ONLY replica at the lowest failover priority. It does NOT automatically become the write region again. You must manually reprioritize to restore it as the writer.

Question

What is the availability SLA difference between single-write and multi-write Cosmos DB?

Answer

Single-write: 99.99% (four nines). Multi-write: 99.999% (five nines). The extra nine comes from eliminating the single write region as a point of failure.

Question

Does manual failover risk data loss?

Answer

No. Manual failover guarantees zero data loss. It waits for all pending writes to replicate before switching. Automatic failover CAN lose writes that had not yet replicated out of the failed region, because the outage is unplanned.

Question

Can you add or remove Azure regions from a Cosmos DB account without downtime?

Answer

Yes — adding or removing regions is a zero-downtime operation. Cosmos DB transparently handles data replication to the new region or drains data from a removed region.

Knowledge Check

GlobeCart's Cosmos DB account has automatic failover enabled with priority: East US (0), West Europe (1), Southeast Asia (2). East US experiences a prolonged outage and failover occurs. After East US recovers, what is the new write region?

Jake wants to move GlobeCart's write region from East US to West Europe during a planned maintenance window. Which approach guarantees zero data loss?

Which consistency level is NOT available when multi-region writes are enabled?

Next up: Consistency Levels — the five consistency choices in Cosmos DB, their real trade-offs, and why this is the most heavily tested topic on the DP-420 exam.