Azure Site Recovery & Disaster Recovery | Guided by A Guide to Cloud

What is Azure Site Recovery?

Simple explanation

Azure Site Recovery (ASR) is like having a complete duplicate of your office in another city. If a fire destroys the main office, everyone drives to the backup office and keeps working.

Backup protects your data (files, databases). Site Recovery protects your entire infrastructure — VMs, networking, applications. It continuously replicates your VMs to another Azure region. If the primary region goes down, you “fail over” to the secondary region. Your VMs come up there, and business continues.

The key difference: backup is about recovering data. Site Recovery is about recovering entire workloads — often within minutes.

Site Recovery vs Azure Backup

Backup and Site Recovery complement each other — most organisations need both
Feature	Azure Backup	Azure Site Recovery
Purpose	Protect data (restore files, databases, VMs)	Protect workloads (replicate and failover entire environments)
Scope	Individual resources	Entire application stacks across regions
Recovery speed	Minutes to hours (depends on data size)	Minutes (VMs already replicated)
Protection against	Data corruption, accidental deletion, ransomware	Region-wide outages, site-level disasters
Data freshness	Point-in-time (last backup)	Near real-time (continuous replication)
Key metric	RPO: hours to days (backup frequency)	RPO: seconds to minutes (replication lag)
Vault type	Recovery Services or Backup vault	Recovery Services vault only

RPO and RTO

Two critical metrics define your disaster recovery capabilities:

Metric	Definition	Example
RPO (Recovery Point Objective)	Maximum acceptable data loss (measured in time)	RPO of 15 minutes means you can lose up to 15 minutes of data
RTO (Recovery Time Objective)	Maximum acceptable downtime before recovery	RTO of 1 hour means the workload must be running within 1 hour

ASR typically achieves:

RPO: Seconds to minutes (continuous replication)
RTO: Minutes to an hour (depending on VM count and recovery plan complexity)

Exam tip: RPO vs RTO

RPO answers “how much data can we afford to lose?” and RTO answers “how long can we be down?” The exam often presents scenarios where you need to choose a solution based on these requirements. If a scenario needs RPO under 1 hour and RTO under 15 minutes, Site Recovery is the answer — not backup.
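A minimal sketch of that decision, assuming illustrative worst-case figures for each service. The `Capability` class, the `candidates` helper, and the numbers themselves are hypothetical; they are chosen to sit inside the ranges quoted in this guide and are not published Azure guarantees.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Capability:
    """Worst-case RPO/RTO a solution is assumed to deliver (illustrative only)."""
    name: str
    rpo: timedelta
    rto: timedelta


# Assumed figures for illustration, not official service limits.
AZURE_BACKUP = Capability("Azure Backup", rpo=timedelta(hours=24), rto=timedelta(hours=4))
SITE_RECOVERY = Capability("Azure Site Recovery", rpo=timedelta(minutes=5), rto=timedelta(minutes=30))


def meets(capability: Capability, required_rpo: timedelta, required_rto: timedelta) -> bool:
    """A solution qualifies only if it is at least as good on BOTH metrics."""
    return capability.rpo <= required_rpo and capability.rto <= required_rto


def candidates(required_rpo: timedelta, required_rto: timedelta) -> list:
    """Return the names of the solutions that satisfy the stated requirements."""
    options = (AZURE_BACKUP, SITE_RECOVERY)
    return [c.name for c in options if meets(c, required_rpo, required_rto)]


if __name__ == "__main__":
    # Scenario: lose at most 5 minutes of data, be back within 30 minutes.
    print(candidates(timedelta(minutes=5), timedelta(minutes=30)))
```

The point of the sketch is that both requirements must hold at once: a solution with an excellent RTO still fails the scenario if its RPO is too loose.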

Replication architecture

When you enable Site Recovery for an Azure VM, here’s what gets created:

Source region (where your VMs run):

Original VMs, disks, and networking
Cache storage account (stages replication data before sending to target)

Target region (where VMs fail over to):

Recovery Services vault (manages replication and failover)
Replica managed disks (mirrors of source disks)
Target VNet, subnets, and NSGs (can be auto-created or you pre-create them)
Availability set or zone configuration (matching source)

Replication flow: the VM writes data to disk, the Site Recovery Mobility service extension on the VM captures the changes and sends them to the cache storage account, and from there they are replicated to managed disks in the target region.
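The flow can be pictured with a toy in-memory model. This is only a sketch: the `ReplicationPipeline` class and its methods are hypothetical names invented for illustration and bear no relation to how the service is implemented internally.

```python
from collections import deque


class ReplicationPipeline:
    """Toy model: source disk -> cache storage account -> replica managed disk."""

    def __init__(self):
        self.source_disk = {}    # block id -> data on the source VM's disk
        self.cache = deque()     # changes staged in the source-region cache account
        self.replica_disk = {}   # block id -> data on the target-region replica disk

    def write(self, block, data):
        """The VM writes to its disk; the change is captured and staged in the cache."""
        self.source_disk[block] = data
        self.cache.append((block, data))

    def drain(self, max_changes=None):
        """Apply staged changes to the replica in write order; return how many were applied."""
        applied = 0
        while self.cache and (max_changes is None or applied < max_changes):
            block, data = self.cache.popleft()
            self.replica_disk[block] = data
            applied += 1
        return applied

    def lag(self):
        """Number of changes not yet on the replica: a rough stand-in for replication lag."""
        return len(self.cache)


if __name__ == "__main__":
    pipeline = ReplicationPipeline()
    pipeline.write(0, "a")
    pipeline.write(1, "b")
    pipeline.write(0, "c")          # overwrites block 0 on the source
    pipeline.drain(max_changes=2)   # only part of the backlog reaches the target
    print(pipeline.lag(), pipeline.replica_disk)
```

Whatever is still sitting in the cache when the source region fails is the data at risk, which is why replication lag and RPO are so closely tied.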

Real-world: Meridian Financial's DR setup

Meridian Financial runs their core banking application in Australia East (primary). Alex configures Site Recovery to replicate to Australia Southeast (secondary):

15 VMs across web, app, and database tiers — all replicated
A recovery plan groups VMs into tiers: databases start first, then app servers, then web servers
Custom scripts in the recovery plan update DNS records and reconfigure load balancers
Test failover runs quarterly (no production impact)
RPO: under 5 minutes. RTO: under 30 minutes.

Meridian’s compliance team signs off because ASR meets their regulatory requirement of sub-1-hour recovery.

Failover types

Failover Type	When Used	Production Impact
Test failover	DR drills and validation	None — creates VMs in an isolated network; production unaffected
Planned failover	Known event (e.g., planned maintenance in source region)	Minimal: source VMs are shut down and the final changes are synchronised before failover, so no data is lost
Unplanned failover	Actual disaster (region outage)	Some data loss possible (changes written after the latest recovery point are lost)
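To make the data loss in an unplanned failover concrete, here is a small sketch that measures the gap between the outage and the newest usable recovery point. The `data_loss_window` helper and the timestamps are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(recovery_points, outage_time) -> timedelta:
    """Time between the newest recovery point taken at or before the outage and the outage itself."""
    usable = [point for point in recovery_points if point <= outage_time]
    if not usable:
        raise ValueError("no recovery point exists before the outage")
    return outage_time - max(usable)


if __name__ == "__main__":
    # Hypothetical recovery points taken every five minutes.
    points = [
        datetime(2024, 1, 1, 10, 0),
        datetime(2024, 1, 1, 10, 5),
        datetime(2024, 1, 1, 10, 10),
    ]
    outage = datetime(2024, 1, 1, 10, 12)
    print(data_loss_window(points, outage))
```

If that window is larger than the agreed RPO, the replication setup is not meeting the requirement, regardless of how quickly the VMs start in the target region.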

Test failover

Test failover is critical — it validates your DR plan without affecting production:

Select a recovery point
Choose an isolated VNet (not your production network)
Azure creates replica VMs in the target region
Validate the application works correctly
Clean up (delete the test VMs)

Exam tip: Test failover doesn't affect production

Test failover creates VMs in an isolated virtual network in the target region. It does NOT affect production VMs, replication, or any live workloads. After testing, you clean up the test VMs. The exam expects you to know that test failovers are non-disruptive and should be performed regularly.

Failback

After failing over to the secondary region, you eventually want to return to the primary region. This process is called failback:

Re-protect — reverse replication from secondary back to primary
Wait for replication to synchronise
Planned failover — fail back to the primary region with zero data loss
Re-protect again — resume normal replication from primary to secondary
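The sequence above can be sketched as a small state model that tracks where the workload runs and which way replication flows. The `ProtectedWorkload` class is a hypothetical illustration, not part of any Azure SDK, and it skips the wait for synchronisation in step 2.

```python
class ProtectedWorkload:
    """Tracks the active region and the replication direction of a workload."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.active_region = primary      # where the VMs currently run
        self.replicating_to = secondary   # None means the workload is unprotected

    def failover(self):
        """Move the workload to the replication target; replication stops afterwards."""
        if self.replicating_to is None:
            raise RuntimeError("re-protect before failing over again")
        self.active_region = self.replicating_to
        self.replicating_to = None

    def reprotect(self):
        """Start replicating from the active region to the other region."""
        if self.replicating_to is not None:
            raise RuntimeError("workload is already protected")
        if self.active_region == self.primary:
            self.replicating_to = self.secondary
        else:
            self.replicating_to = self.primary


if __name__ == "__main__":
    workload = ProtectedWorkload("australiaeast", "australiasoutheast")
    workload.failover()    # disaster: now running in the secondary region
    workload.reprotect()   # step 1: reverse replication, secondary -> primary
    workload.failover()    # step 3: planned failover back to the primary region
    workload.reprotect()   # step 4: resume primary -> secondary replication
    print(workload.active_region, workload.replicating_to)
```

The model makes one rule explicit: after any failover the workload is unprotected until it is re-protected, so re-protect is always the first step of a failback.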

Recovery plans

Recovery plans orchestrate multi-VM failover with ordering and automation:

Features:

Group VMs into tiers (e.g., Group 1: databases, Group 2: app servers, Group 3: web servers)
Groups fail over in order — Group 1 completes before Group 2 starts
Add manual actions (pause for verification between groups)
Add scripts (Azure Automation runbooks for DNS updates, load balancer configuration)
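The ordering behaviour can be sketched in a few lines. This is a simplified illustration under stated assumptions: `Group` and `run_recovery_plan` are hypothetical names, VMs inside a group are processed one at a time for simplicity, and the post-group actions stand in for the manual actions and runbook scripts a real plan would call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Group:
    """One tier of a recovery plan plus the actions to run once the tier is up."""
    name: str
    vms: List[str]
    post_actions: List[Callable[[], str]] = field(default_factory=list)


def run_recovery_plan(groups: List[Group], fail_over_vm: Callable[[str], None]) -> List[str]:
    """Fail over groups strictly in order; a group finishes before the next one starts."""
    log = []
    for group in groups:
        for vm in group.vms:
            fail_over_vm(vm)
            log.append(f"{group.name}: failed over {vm}")
        # Scripts or manual checks run only after every VM in the group is done.
        for action in group.post_actions:
            log.append(f"{group.name}: {action()}")
    return log


if __name__ == "__main__":
    started = []
    plan = [
        Group("Group 1", ["sql-01"]),
        Group("Group 2", ["app-01", "app-02"]),
        Group("Group 3", ["web-01"], post_actions=[lambda: "updated DNS records"]),
    ]
    for line in run_recovery_plan(plan, started.append):
        print(line)
```

Passing the failover step in as a callable keeps the ordering logic separate from whatever actually starts the VMs, which makes the plan easy to dry-run.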

Question

What is the difference between RPO and RTO?

Answer

RPO (Recovery Point Objective) is the maximum acceptable data loss, measured in time: how old can your last recovery point be? RTO (Recovery Time Objective) is the maximum acceptable downtime: how quickly must services be restored? Site Recovery typically achieves an RPO of seconds to minutes and an RTO of minutes to an hour, depending on VM count and recovery plan complexity.

Question

Does a test failover affect production workloads?

Answer

No. Test failover creates replica VMs in an isolated virtual network in the target region. Production VMs continue running, and replication is not interrupted. After validation, you clean up the test VMs. Test failovers should be performed regularly to validate your DR plan.

Question

What vault type does Azure Site Recovery use?

Answer

Recovery Services vault. ASR only works with Recovery Services vaults — not Backup vaults. The vault is typically created in the target (secondary) region and manages replication, failover, and recovery point management.

Question

What is the purpose of a cache storage account in Site Recovery?

Answer

The cache storage account in the source region stages replication data before it is sent to the target region. VM disk changes are captured, written to the cache account, and then asynchronously replicated to managed disks in the target region.

Knowledge check

TechCorp Solutions needs to ensure their production web application can recover within 30 minutes if the entire Australia East region goes down. Data loss of up to 5 minutes is acceptable. Which solution should Alex implement?

Meridian Financial wants to validate their disaster recovery plan without affecting production. Which Site Recovery operation should they perform?

After a successful failover to the secondary region, Alex needs to return workloads to the primary region. What is the correct first step?