Disaster Recovery Implementation | Guided by A Guide to Cloud

Putting the DR plan into action

🛡️ Lars spreads the DR architecture diagram across the table. “We know the strategy: HSR async for HANA, ASR for the app tier, backup for non-prod. Now we need to build it. Dr. Schmidt wants a documented, tested DR solution — not just a plan on paper.”

☁️ Mei rolls up her sleeves. “Let me walk you through each implementation. The trickiest part is HANA multi-tier replication — combining HA and DR into a single replication chain.”

Simple explanation

Think of it like a chain of messengers.

The primary HANA database (Node 1) sends every message to the local backup (Node 2) instantly and waits for confirmation (synchronous — for HA). Node 2 then forwards the message to a distant backup (Node 3) without waiting for confirmation (asynchronous — for DR). If Node 1 fails, Node 2 takes over locally. If the entire city floods, Node 3 in another city has almost all the messages. Three nodes, two replication links, two different speeds.

HANA multi-tier replication (3-node topology)

The most robust HANA DR architecture uses three nodes:

Node 1 (Primary, Region A, Zone 1) — active production instance. Replicates synchronously to Node 2.

Node 2 (HA Secondary, Region A, Zone 2) — local HA standby. Receives synchronous replication from Node 1. If Node 1 fails, Node 2 becomes primary (automated via Pacemaker). Also replicates asynchronously to Node 3.

Node 3 (DR Tertiary, Region B) — remote DR standby. Receives asynchronous replication from Node 2. If Region A is lost entirely, Node 3 is manually promoted to primary.
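The chain of messengers can be modelled in a few lines of Python. This is a toy simulation, not HANA code. The class and method names are hypothetical, the records stand in for redo log entries, and `ship()` stands in for the background async transfer from Node 2 to Node 3. It shows why the HA hop loses nothing while the DR hop can lose whatever is still queued.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReplicationChain:
    """Toy model of Node 1 -> Node 2 (sync) -> Node 3 (async). Names are hypothetical."""
    node1: list = field(default_factory=list)
    node2: list = field(default_factory=list)
    node3: list = field(default_factory=list)
    # Records Node 2 has accepted but not yet shipped to Node 3.
    async_queue: deque = field(default_factory=deque)

    def commit(self, record):
        # Sync hop: the commit returns only after Node 2 holds the record.
        self.node1.append(record)
        self.node2.append(record)
        # Async hop: Node 2 queues the record for Node 3 without waiting.
        self.async_queue.append(record)

    def ship(self, n):
        # Background transfer of up to n queued records to Region B.
        for _ in range(min(n, len(self.async_queue))):
            self.node3.append(self.async_queue.popleft())

    def loss_if_node1_fails(self):
        # Node 2 already holds every committed record (RPO = 0).
        return len(self.node1) - len(self.node2)

    def loss_if_region_a_lost(self):
        # Anything still queued on Node 2 never reached Node 3.
        return len(self.node1) - len(self.node3)


chain = ReplicationChain()
for txn in range(10):
    chain.commit(txn)
chain.ship(7)  # Node 3 is now three records behind
print(chain.loss_if_node1_fails())    # 0
print(chain.loss_if_region_a_lost())  # 3
```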

HANA multi-tier replication topology
Aspect	Node 1 to Node 2 (HA)	Node 2 to Node 3 (DR)
Replication mode	Synchronous (SYNC/SYNCMEM)	Asynchronous (ASYNC)
RPO	Zero	Minutes (depends on lag)
Failover	Automatic (Pacemaker)	Manual promotion
Region	Same region (different zone)	Different region (paired)
VM sizing	Same size as primary	Can be smaller (scale up for DR)
Network	Low latency (intra-region)	Higher latency tolerated (inter-region)

Exam tip: 3-node topology is the gold standard

When the exam describes a scenario needing both HA and DR for HANA, the answer is the 3-node multi-tier topology. Remember: sync for HA (local), async for DR (remote). Treat DR failover as manual: you do not want an automatic failover to a remote region triggered by a network blip.
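If you want monitoring to support (not replace) that manual decision, one option is to alert the on-call team only when health probes against Region A have failed continuously for a sustained window. The sketch below assumes a time-ordered list of probe results from some monitoring source and a hypothetical 15-minute threshold. It raises an alert; it never promotes Node 3.

```python
from datetime import datetime, timedelta


def sustained_outage(probes, threshold=timedelta(minutes=15)):
    """Return True if the most recent probes form an unbroken run of
    failures lasting at least `threshold`.

    probes: time-ordered list of (timestamp, ok) tuples, e.g. health
    checks against the Region A primary (hypothetical input format).
    """
    run_start = None
    for ts, ok in probes:
        if ok:
            run_start = None  # a success means the earlier failures were a blip
        elif run_start is None:
            run_start = ts    # first failure of a new run
    if run_start is None:
        return False
    return probes[-1][0] - run_start >= threshold


t0 = datetime(2024, 1, 1, 2, 0)
blip = [(t0, True), (t0 + timedelta(minutes=1), False), (t0 + timedelta(minutes=2), True)]
outage = [(t0 + timedelta(minutes=m), False) for m in range(0, 21)]
print(sustained_outage(blip))    # False
print(sustained_outage(outage))  # True: 20 minutes of continuous failures
```

A True result should page a human, who then works through the regional failover checklist. The threshold is a policy choice, not an Azure or SAP default.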

After local failover

When Node 1 fails and Node 2 takes over locally:

Node 2 becomes the new primary
The async replication to Node 3 continues from Node 2
When Node 1 is recovered, it re-registers as the secondary to Node 2
No data loss, no DR impact
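The local takeover can be expressed as a change to "who replicates from whom". This is a hypothetical data model for illustration only (it does not call `hdbnsutil` or Pacemaker). It shows that the Node 2 to Node 3 link is untouched by the takeover, which is why there is no DR impact.

```python
def takeover(topology, new_primary):
    """Promote `new_primary` and drop the failed primary from the chain.

    topology maps each node name to the node it replicates from
    (None marks the primary). Returns (new_topology, failed_node).
    """
    failed = next(node for node, source in topology.items() if source is None)
    new = dict(topology)
    del new[failed]          # the failed node is out until it is repaired
    new[new_primary] = None  # Node 2 now accepts writes
    return new, failed


def reregister(topology, node, source):
    """Return a topology where `node` rejoins as a secondary of `source`."""
    new = dict(topology)
    new[node] = source
    return new


before = {"node1": None, "node2": "node1", "node3": "node2"}
after, failed = takeover(before, "node2")
print(after)     # {'node2': None, 'node3': 'node2'}  Node 3 still follows Node 2
repaired = reregister(after, failed, "node2")
print(repaired)  # {'node2': None, 'node3': 'node2', 'node1': 'node2'}
```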

After regional failover

When Region A is completely lost:

Confirm Region A is genuinely unavailable (not a transient issue)
Promote Node 3 to primary in Region B
Start application servers in Region B (via ASR or automation)
Update DNS to point to Region B
Node 3 is now the standalone primary until Region A recovers
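The checklist above can be sketched as an ordered runbook with a confirmation gate in front of it. The function and step names are hypothetical and the actions are placeholders. The point is the structure: nothing runs until a human confirms the regional loss, and a failed step stops the sequence instead of continuing with a half-built DR site.

```python
def run_regional_failover(confirm_region_lost, steps):
    """Run the regional failover steps in order after a human confirmation.

    confirm_region_lost: callable returning True only when an operator has
        confirmed that Region A is genuinely unavailable.
    steps: list of (name, action) pairs; each action returns True on success.
    Returns the names of the completed steps.
    """
    if not confirm_region_lost():
        raise RuntimeError("Region A loss not confirmed: regional failover aborted")
    completed = []
    for name, action in steps:
        if not action():
            # Stop immediately; later steps assume earlier ones succeeded.
            raise RuntimeError(f"step failed: {name} (completed: {completed})")
        completed.append(name)
    return completed


# Placeholder actions; real ones would call your own automation.
steps = [
    ("promote Node 3 to primary", lambda: True),
    ("start application servers in Region B", lambda: True),
    ("update DNS to point to Region B", lambda: True),
]
print(run_regional_failover(lambda: True, steps))
```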

ASR recovery plans for the application tier

Azure Site Recovery recovery plans define the failover sequence for the entire SAP landscape:

Group 1 — HANA database (manual step: confirm HSR failover or promote DR node)

Group 2 — ASCS/SCS (ASR failover: create ASCS VM from replicated disks)

Group 3 — Application servers (ASR failover: create app server VMs from replicated disks)

Group 4 — Web Dispatcher (ASR failover: create Web Dispatcher from replicated disks)

Each group waits for the previous group to complete before starting. Custom scripts can be added between groups (e.g., update DNS, configure load balancer, start SAP services).
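The group ordering is really a dependency graph. The sketch below uses Python's standard `graphlib` module to derive a startup order from an assumed dependency map and to check that a proposed group layout respects it. The dependency map is a simplification for illustration, and nothing here talks to the ASR API.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Which components must already be running before each one starts (assumed).
DEPENDS_ON = {
    "HANA database": set(),
    "ASCS/SCS": {"HANA database"},
    "Application servers": {"ASCS/SCS", "HANA database"},
    "Web Dispatcher": {"Application servers"},
}


def startup_order(depends_on):
    """Derive a valid startup sequence from the dependency map."""
    return list(TopologicalSorter(depends_on).static_order())


def plan_is_valid(groups, depends_on):
    """True if every component sits in a later group than all of its dependencies."""
    group_of = {component: i for i, group in enumerate(groups) for component in group}
    return all(
        group_of[dep] < group_of[component]
        for component, deps in depends_on.items()
        for dep in deps
    )


plan = [["HANA database"], ["ASCS/SCS"], ["Application servers"], ["Web Dispatcher"]]
print(startup_order(DEPENDS_ON))
print(plan_is_valid(plan, DEPENDS_ON))  # True
```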

🛡️ Lars nods. “So the recovery plan ensures HANA is up before ASCS tries to connect, and ASCS is up before the application servers start looking for it.”

☁️ Mei confirms. “Exactly. SAP has a strict startup order. Recovery plans enforce it automatically.”

Custom scripts in recovery plans

ASR recovery plans support pre-actions and post-actions on each group, implemented as Azure Automation runbooks (typically written in PowerShell) or as manual actions that pause the plan until an operator confirms them. Common SAP actions include updating DNS records, configuring Azure Load Balancer in the DR region, starting SAP services after VM boot, and running post-failover health checks. These actions are critical for a fully automated DR failover.
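As an example of the logic a post-failover health check might contain, the sketch below derives default SAP TCP ports from instance numbers and tests whether each one accepts a connection. The port patterns (3<NN>15 for the first HANA tenant's SQL port, 36<NN> for the message server, 32<NN> for the dispatcher) are the usual SAP defaults but may differ in your landscape. Host names, instance numbers and function names are hypothetical, and the Automation runbook wrapper is not shown.

```python
import socket


def sap_default_ports(hana_nr, ascs_nr, pas_nr):
    """Default ports from two-digit SAP instance numbers, e.g. "00"."""
    return {
        "HANA SQL (first tenant)": int(f"3{hana_nr}15"),
        "ASCS message server": int(f"36{ascs_nr}"),
        "PAS dispatcher": int(f"32{pas_nr}"),
    }


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failed_checks(targets):
    """targets: {name: (host, port)}. Returns the names that did not answer."""
    return [name for name, (host, port) in targets.items() if not port_open(host, port)]


# Hypothetical instance numbers and DR host names.
ports = sap_default_ports("00", "01", "02")
targets = {
    "HANA SQL (first tenant)": ("hana-dr", ports["HANA SQL (first tenant)"]),
    "ASCS message server": ("ascs-dr", ports["ASCS message server"]),
    "PAS dispatcher": ("pas-dr", ports["PAS dispatcher"]),
}
print(ports)
```

A post-action could call `failed_checks(targets)` and fail the plan step if the returned list is not empty.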

Backup-based DR

For non-production systems or as a last resort for production:

Azure Backup with GRS vaults — backups are replicated to the paired region automatically
Cross-Region Restore — restore VMs and databases in the DR region directly from the GRS vault
HANA backup via Backint can be restored to a new VM in the DR region
RPO is up to one backup interval (hours), plus any delay before the latest backup reaches the paired region
RTO is hours (restore time + system startup + configuration)
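The RPO and RTO bullets above are simple arithmetic. The sketch below makes the worst case explicit. Every duration in the example is a made-up placeholder, and the copy lag term is an assumption standing in for however long the GRS copy takes to reach the paired region.

```python
from datetime import timedelta


def backup_dr_estimate(backup_interval, copy_lag, restore, startup, configuration):
    """Rough worst-case RPO and RTO for backup-based DR.

    Worst case for RPO: the disaster hits just before a new backup reaches
    the DR region, so up to one interval plus the copy delay of data is lost.
    """
    worst_case_rpo = backup_interval + copy_lag
    rto = restore + startup + configuration
    return worst_case_rpo, rto


rpo, rto = backup_dr_estimate(
    backup_interval=timedelta(hours=4),  # hypothetical schedule
    copy_lag=timedelta(hours=1),         # hypothetical GRS copy delay
    restore=timedelta(hours=3),
    startup=timedelta(minutes=30),
    configuration=timedelta(hours=1),
)
print(rpo, rto)  # 5:00:00 4:30:00
```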

ANF cross-region replication

Azure NetApp Files supports cross-region replication for volumes:

Replicates ANF volumes to a paired region on a schedule (10-minute, hourly, or daily)
Useful for HANA shared volumes, transport directories, and backup staging
The destination volume is read-only until failover
During DR, break the replication and make the destination volume read-write
Complements HSR — use ANF replication for /hana/shared and HSR for database data
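Two properties from the list are easy to get wrong in a runbook: the schedule bounds the RPO, and the destination volume only becomes writable after the replication is broken. The sketch below models both. It is not the Azure NetApp Files API; the schedule labels, the class and the helper are hypothetical, and it ignores the transfer time of the last update when treating one interval as the worst-case RPO.

```python
from datetime import timedelta

# Schedules named in the text; labels are illustrative, not API values.
SCHEDULES = {
    "10-minute": timedelta(minutes=10),
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
}


class DestinationVolume:
    """Toy model of a cross-region replication destination volume."""

    def __init__(self):
        self.replicating = True

    @property
    def writable(self):
        # Read-only while the replication relationship is active.
        return not self.replicating

    def break_replication(self):
        # DR step: break the relationship so the volume becomes read-write.
        self.replicating = False


def pick_schedule(rpo_target):
    """Slowest schedule whose interval still fits inside the RPO target."""
    fitting = [name for name, interval in SCHEDULES.items() if interval <= rpo_target]
    if not fitting:
        raise ValueError("no replication schedule meets this RPO target")
    return max(fitting, key=SCHEDULES.get)


print(pick_schedule(timedelta(minutes=15)))  # 10-minute
dr_shared = DestinationVolume()
print(dr_shared.writable)  # False
dr_shared.break_replication()
print(dr_shared.writable)  # True
```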

DR testing

🛡️ Lars insists. “An untested DR plan is not a plan — it is a wish. How do we test without disrupting production?”

Testing approaches:

ASR test failover: creates VMs in an isolated VNet in the DR region without affecting production replication
HANA DR node test: temporarily promote the DR node, verify data consistency, then re-register it as secondary
Tabletop exercise: walk through the runbook with all stakeholders without executing it
Full DR drill: execute the complete failover procedure during a scheduled maintenance window

Document everything:

Failover time (actual RTO achieved)
Data consistency verification (actual RPO verified)
Client reconnection behavior
Issues encountered and resolutions
Update the runbook based on findings
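The first two items can be computed directly from timestamps captured during the drill. The sketch assumes you record when the disaster was declared, when service was restored, the last commit on the primary and the last commit found on the promoted DR node. All values and targets in the example are hypothetical.

```python
from datetime import datetime, timedelta


def drill_metrics(declared, restored, last_primary_commit, last_dr_commit):
    """Achieved RTO and RPO from timestamps recorded during a DR drill."""
    achieved_rto = restored - declared                    # time until users could work again
    achieved_rpo = last_primary_commit - last_dr_commit  # committed work missing on the DR side
    return achieved_rto, achieved_rpo


def meets_targets(achieved_rto, achieved_rpo, rto_target, rpo_target):
    """True if both achieved values are within their targets."""
    return achieved_rto <= rto_target and achieved_rpo <= rpo_target


rto, rpo = drill_metrics(
    declared=datetime(2024, 3, 9, 2, 0),
    restored=datetime(2024, 3, 9, 3, 40),
    last_primary_commit=datetime(2024, 3, 9, 1, 59, 30),
    last_dr_commit=datetime(2024, 3, 9, 1, 58, 0),
)
print(rto, rpo)  # 1:40:00 0:01:30
print(meets_targets(rto, rpo, timedelta(hours=4), timedelta(minutes=15)))  # True
```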

Question

What is the HANA 3-node multi-tier topology for HA + DR?

Answer

Node 1 (primary) replicates synchronously to Node 2 (HA secondary, same region). Node 2 replicates asynchronously to Node 3 (DR tertiary, remote region). HA failover is automatic via Pacemaker. DR failover is manual. This provides RPO=0 for local failures and an RPO of minutes, depending on replication lag, for regional disasters.

Question

What is the correct startup order in an ASR recovery plan for SAP?

Answer

Group 1: HANA database (manual HSR promotion). Group 2: ASCS/SCS. Group 3: Application servers. Group 4: Web Dispatcher. Each group completes before the next starts. SAP requires this order because each layer depends on the one below it.

Question

How does ASR test failover work without disrupting production?

Answer

ASR test failover creates VMs from replicated disks in an isolated VNet in the DR region. Production replication continues unaffected. The test VMs are completely isolated from production and can be deleted after testing. This allows DR validation without any production impact.

Knowledge check

GlobalPharma needs both HA (RPO=0) and DR (RPO less than 15 minutes) for HANA. What topology should Lars implement?

Mei is creating an ASR recovery plan for SAP. What should be the correct order of failover groups?

Lars wants to test DR without impacting production. Which method should he use?

Summary

You have now completed Domain 3. You can implement the full HA/DR stack: HANA 3-node multi-tier replication for combined HA and DR, ASR recovery plans with ordered startup for the application tier, backup-based DR for non-production, ANF cross-region replication for shared volumes, and DR testing procedures. Combined with the ASCS HA and HANA HSR from earlier modules, your SAP system is resilient against both component failures and regional disasters.

Next up is Domain 4: keeping the SAP system running day after day. We will cover monitoring, backup, security, cost optimization, and ongoing operations.
