Disaster Recovery Strategy for SAP
Design disaster recovery for SAP on Azure: RPO/RTO concepts, Azure paired regions, DR approaches (HSR async, Azure Site Recovery, backup-based recovery), and cost-optimized non-production DR.
HA vs DR: different problems, different solutions
Lars draws a line on the whiteboard. "We have spent three modules on high availability, protecting against component failures within a region. Now Dr. Schmidt asks: what if the entire Azure region goes down? An earthquake, a massive outage, a datacenter fire. HA within the region will not help."
Mei nods seriously. "That is disaster recovery territory. DR is about surviving a regional catastrophe by having your systems ready to run in a completely different Azure region. The trade-offs are different: we are balancing recovery speed, data loss tolerance, and cost."
Think of HA vs DR in terms of a hospital.
High availability is having a backup surgeon in the same hospital β if the lead surgeon gets sick, the backup steps in immediately. Disaster recovery is having a partnership with a hospital in another city. If your hospital floods, patients can be transferred to the partner hospital. It takes longer and you might lose some in-transit records, but the patients survive. The partner hospital costs money to maintain even when it is not treating your patients.
RPO and RTO
These two metrics drive every DR design decision:
RPO (Recovery Point Objective): the maximum acceptable amount of data loss, measured in time. An RPO of 15 minutes means you can afford to lose the last 15 minutes of transactions. RPO determines your replication strategy.
RTO (Recovery Time Objective): the maximum acceptable downtime before the system is operational again. An RTO of 4 hours means the DR system must be serving users within 4 hours of a disaster. RTO determines how "warm" your DR environment needs to be.
Lars consults Dr. Schmidt's requirements. "GlobalPharma needs RPO of 15 minutes for ERP and RTO of 4 hours. For non-production, RPO of 24 hours and RTO of 24 hours are acceptable."
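To make the two metrics concrete, here is a minimal Python sketch that checks a measured replication lag and a rehearsed recovery time against the targets Lars quotes. The `DrObjective` class and the sample measurements are illustrative assumptions; they are not part of any Azure or SAP API.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DrObjective:
    """RPO/RTO targets for one system (hypothetical helper type)."""
    name: str
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable downtime

    def is_met(self, replication_lag: timedelta, recovery_time: timedelta) -> bool:
        # RPO holds when the DR copy is never further behind than the RPO.
        # RTO holds when a rehearsed failover completes within the RTO.
        return replication_lag <= self.rpo and recovery_time <= self.rto


# Targets taken from the GlobalPharma scenario.
erp_prod = DrObjective("ERP production", rpo=timedelta(minutes=15), rto=timedelta(hours=4))
non_prod = DrObjective("Non-production", rpo=timedelta(hours=24), rto=timedelta(hours=24))

# Sample measurements below are made up for illustration.
print(erp_prod.is_met(timedelta(minutes=2), timedelta(hours=1)))  # True
print(erp_prod.is_met(timedelta(hours=6), timedelta(hours=3)))    # False: lag exceeds the RPO
print(non_prod.is_met(timedelta(hours=6), timedelta(hours=3)))    # True
```

The same six-hour lag fails production but passes non-production, which is why the two environments end up with different DR approaches.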
DR approaches for SAP on Azure
| Approach | RPO | RTO | Cost | Best for |
|---|---|---|---|---|
| HSR async (cross-region) | Seconds to minutes (depends on replication lag) | 2-10 minutes (if pre-started) | High: full VM running in DR region | HANA database with strict RPO |
| Azure Site Recovery (ASR) | 5-15 minutes | Minutes to 1 hour | Low: replicated disks, no running VMs | Application servers, ASCS |
| Backup-based (GRS vault) | Hours (last backup interval) | Hours (restore + start) | Very low: only storage costs | Non-production, dev/test |
| ANF cross-region replication | Depends on replication schedule | Minutes to hours | Medium: replicated volumes | HANA shared and data volumes |
Architecture diagram: Open the SAP Cross-Region DR diagram in Excalidraw to see the primary + DR region layout with HSR async and ASR replication flows.
HSR async for HANA DR
For the HANA database, asynchronous HSR to a DR region is the gold standard for low RPO:
- Replication runs continuously in the background
- The secondary in the DR region is a few seconds to minutes behind
- Can be combined with HA HSR (3-node topology: primary, HA secondary, DR tertiary)
- The DR secondary can be a smaller VM that is scaled up during an actual disaster
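As a rough illustration of the registration step, the sketch below assembles the `hdbnsutil -sr_register` call that registers a DR host as an asynchronous secondary. The host name, instance number, and site name are hypothetical placeholders. The script only builds and prints the command; it does not run it. In practice the command is run as the `<sid>adm` user on the DR host while HANA is stopped there, and whether the DR site registers against the primary or the HA secondary depends on the replication topology you choose.

```python
import shlex


def build_hsr_register_command(remote_host: str, remote_instance: str, site_name: str) -> str:
    """Return the hdbnsutil call that registers this host as an async HSR secondary."""
    args = [
        "hdbnsutil",
        "-sr_register",
        f"--remoteHost={remote_host}",          # replication source host
        f"--remoteInstance={remote_instance}",  # instance number of the source
        "--replicationMode=async",              # asynchronous: source does not wait for the DR site
        "--operationMode=logreplay",            # continuous log replay on the secondary
        f"--name={site_name}",                  # site name of this DR system
    ]
    return shlex.join(args)


# Hypothetical values for illustration only.
print(build_hsr_register_command("hana-primary-01", "00", "DR_SITE"))
```

Building the command as a list and joining it with `shlex.join` keeps each argument correctly quoted if a value ever contains a space or a shell metacharacter.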
Azure Site Recovery (ASR) for application tier
ASR replicates VM disks to the DR region continuously:
- Application servers, ASCS VMs, and Web Dispatchers are excellent candidates
- ASR captures disk writes and replicates to the DR region (RPO 5-15 minutes)
- During disaster, ASR creates VMs from the replicated disks
- Recovery plans define the startup order (HANA first, then ASCS, then app servers)
- No running VMs in the DR region until failover, which keeps costs low
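The startup order in a recovery plan is essentially a dependency ordering. The sketch below models it in plain Python with `graphlib` from the standard library (Python 3.9+); it is not the ASR API. Placing the Web Dispatcher after the application servers is an assumption made for this example.

```python
from graphlib import TopologicalSorter

# Each key may start only after every system in its value set is up.
depends_on = {
    "HANA database": set(),
    "ASCS/SCS": {"HANA database"},
    "Application servers": {"ASCS/SCS"},
    "Web Dispatcher": {"Application servers"},  # assumed position, not from the source
}

# static_order() yields systems so that dependencies always come first.
startup_order = list(TopologicalSorter(depends_on).static_order())

for step, system in enumerate(startup_order, start=1):
    print(f"Group {step}: {system}")
```

Expressing the order as dependencies rather than a fixed list means a cycle, such as two systems each waiting on the other, raises `graphlib.CycleError` instead of silently producing a plan that cannot start.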
Backup-based DR
For non-production systems or systems with relaxed RTOs:
- Azure Backup with geo-redundant (GRS) vaults copies backups to the paired region; enable Cross Region Restore on the vault to restore there on demand
- During disaster, restore VMs and databases from backup
- RPO equals the last backup interval (could be hours)
- RTO is hours (restore time + system startup)
- The most cost-effective approach but slowest recovery
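The RPO and RTO arithmetic for backup-based DR can be sketched as follows. The function name, the data size, and the restore throughput are hypothetical; real restore speed depends on the vault, disk types, and database size, so measure it in a DR test before relying on an estimate.

```python
def estimate_backup_dr(backup_interval_h: float, data_gb: float,
                       restore_gb_per_h: float, startup_h: float) -> tuple[float, float]:
    """Return (worst-case RPO, estimated RTO) in hours for backup-based DR."""
    if restore_gb_per_h <= 0:
        raise ValueError("restore throughput must be positive")
    # Worst case: the disaster strikes just before the next backup would run.
    rpo_h = backup_interval_h
    # RTO is the restore time plus the time to start and validate the system.
    rto_h = data_gb / restore_gb_per_h + startup_h
    return rpo_h, rto_h


# Hypothetical inputs: daily backup, 2,000 GB to restore at 500 GB/h, 1 h startup.
rpo_h, rto_h = estimate_backup_dr(backup_interval_h=24, data_gb=2000,
                                  restore_gb_per_h=500, startup_h=1.0)
print(f"RPO up to {rpo_h} h, RTO about {rto_h} h")
```

With these inputs the estimate fits a 24-hour non-production target but misses both the 15-minute RPO and the 4-hour RTO required for production ERP.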
Choosing DR per tier
A common pattern is to use different DR approaches for different tiers:
- HANA database: HSR async to DR region (lowest RPO, fastest RTO)
- ASCS/SCS: ASR with recovery plan (moderate RPO, fast RTO)
- Application servers: ASR or rebuild from automation (moderate RPO, moderate RTO)
- Non-production systems: Backup-based recovery only (high RPO, high RTO, lowest cost)
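The per-tier choice can be expressed as "the cheapest approach that suits the tier and meets both targets". The sketch below encodes that rule. The minute values are rough worst-case readings of the comparison table, the 24-hour figures for backup are assumptions, and `cost_rank` is only a relative ordering, not a price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    name: str
    rpo_min: int       # worst-case data loss in minutes (illustrative)
    rto_min: int       # worst-case recovery time in minutes (illustrative)
    cost_rank: int     # 1 = cheapest; relative ordering only
    tiers: frozenset   # tiers the approach is suited to


APPROACHES = [
    Approach("Backup-based (GRS vault)", 24 * 60, 24 * 60, 1, frozenset({"database", "application"})),
    Approach("Azure Site Recovery", 15, 60, 2, frozenset({"application"})),
    Approach("HSR async", 5, 10, 3, frozenset({"database"})),
]


def choose_approach(tier: str, rpo_min: int, rto_min: int) -> str:
    """Pick the cheapest approach that fits the tier and meets both targets."""
    candidates = [
        a for a in APPROACHES
        if tier in a.tiers and a.rpo_min <= rpo_min and a.rto_min <= rto_min
    ]
    if not candidates:
        raise ValueError(f"no approach meets RPO {rpo_min} min / RTO {rto_min} min for {tier}")
    return min(candidates, key=lambda a: a.cost_rank).name


print(choose_approach("database", rpo_min=15, rto_min=240))           # production HANA
print(choose_approach("application", rpo_min=15, rto_min=60))         # production app servers
print(choose_approach("database", rpo_min=24 * 60, rto_min=24 * 60))  # non-production
```

The `tiers` field carries the "Best for" column of the table: without it, the cheapest approach that meets a 15-minute RPO would be ASR even for the database, which is not the pattern described here.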
Exam tip: Mixed DR strategies
The exam loves scenarios where you need to choose different DR approaches for different tiers based on RPO/RTO requirements and cost constraints. The answer is almost always a mix: HSR async for HANA, ASR for the app tier, and backup-based for non-prod. Pure backup-based for everything is too slow for production HANA. Pure HSR for everything is too expensive for app servers.
Azure paired regions
Azure paired regions are geographically separated datacenter pairs designed for DR:
- Region pairs are typically at least 300 miles (about 480 km) apart where geography allows
- Platform updates roll through pairs sequentially (never both at once)
- Data residency is maintained within the same geography
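- Some Azure services replicate to the paired region automatically (GRS storage, for example); ASR lets you choose the target region, and the paired region is the usual choice
For SAP DR, design your DR region to be the paired region of your primary region unless regulatory requirements dictate otherwise or the VM sizes you need are not available there.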
Cost considerations
DR infrastructure can be expensive. Strategies to reduce cost:
- Do not run DR VMs until needed: use ASR instead of running idle VMs
- Right-size DR VMs: the DR HANA VM can be smaller and scaled up when a disaster is declared
- Reserved Instances only pay off for DR VMs that run continuously (such as an HSR target); VMs created at failover gain nothing from them
- Non-production systems do not need hot DR: backup-based recovery is sufficient
- Regular DR testing is a cost event: plan for it in the budget
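To see why running hours dominate the DR bill, the sketch below compares an always-on DR VM with one that only runs during a DR test. All rates are placeholder numbers chosen for easy arithmetic, not Azure prices, and the model ignores items such as ASR per-instance fees and network egress.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly billing


def monthly_dr_cost(vm_rate_per_h: float, running_hours: float,
                    storage_gb: float, storage_rate_per_gb: float) -> float:
    """Monthly DR cost = compute for the hours the VM runs + replicated storage."""
    return vm_rate_per_h * running_hours + storage_gb * storage_rate_per_gb


# Placeholder rates (NOT real prices): 10 per VM hour, 0.125 per GB-month.
always_on = monthly_dr_cost(10.0, HOURS_PER_MONTH, 4000, 0.125)
test_only = monthly_dr_cost(10.0, 8, 4000, 0.125)  # VM runs only for an 8-hour DR test

print(f"Always-on DR VM: {always_on:.0f} per month")
print(f"Disks only plus one DR test: {test_only:.0f} per month")
```

Storage cost is identical in both cases; the difference comes entirely from compute hours, which is also why a DR test shows up as a visible cost event.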
Knowledge check
GlobalPharma needs RPO of 15 minutes for their production HANA database in DR. Which approach should Lars implement?
Lars wants to minimize DR costs for SAP application servers while maintaining an RTO of 1 hour. What should he use?
What is the recommended DR approach for PrecisionSteel's non-production SAP development system?
Summary
You now understand DR strategy for SAP on Azure: RPO drives replication choices, RTO drives standby readiness, and cost drives how much infrastructure you maintain in the DR region. HSR async for HANA, ASR for application servers, and backup-based for non-prod is the standard pattern. Azure paired regions provide the geographic separation.
Next, we implement these strategies: setting up HANA multi-tier replication, configuring ASR recovery plans, and testing DR procedures.