HANA System Replication for HA

Protecting the HANA database

🛡️ Lars points to the architecture diagram. “We have ASCS protected. Now the database. If our HANA VM fails, GlobalPharma loses access to all data. Dr. Schmidt wants automatic failover — no manual intervention, no 30-minute restart from disk.”

☁️ Mei draws two HANA nodes. “That is exactly what HANA System Replication gives us. HSR continuously copies data from the primary HANA instance to a secondary instance on a separate VM. If the primary fails, Pacemaker triggers failover and the secondary becomes the new primary — usually within minutes.”

Simple explanation

Think of it like two identical notebooks.

You write every transaction in your primary notebook. A colleague sits next to you, copying every entry into their notebook in real time (synchronous replication). If you suddenly cannot continue, your colleague already has an identical copy and takes over immediately. With the “read-enabled” option, your colleague can even answer questions from their notebook while they are still copying — so the backup is not just sitting idle.

📐 Architecture diagram: Open the HANA HA Cluster diagram in Excalidraw to see the full Pacemaker cluster with HSR, Azure Load Balancer, and STONITH fencing.

HSR replication modes for HA

HANA System Replication supports multiple modes. For HA, you need to know:

SYNC (synchronous) — the primary waits until the secondary confirms that the write has been persisted to disk before it commits. Zero data loss on failover (RPO=0) while replication is active. Recommended for HA.

SYNCMEM (synchronous in-memory) — the primary waits until the secondary has received the data in memory, but not until it is persisted to disk. Slightly faster than SYNC, and still RPO=0 when only the primary fails. Note: in the unlikely event of a simultaneous dual-node failure, in-flight data that has not yet been persisted on the secondary could be lost.

ASYNC (asynchronous) — the primary does not wait for the secondary. Used for DR across regions where latency makes synchronous impractical. Potential data loss (RPO > 0).

HSR replication modes
Feature	SYNC	SYNCMEM	ASYNC
Primary waits for secondary	Yes — write to disk	Yes — write to memory	No — fire and forget
Data loss on failover (RPO)	Zero	Zero when only the primary fails (simultaneous dual failure could lose in-flight data)	Possible — depends on lag
Performance impact	Higher latency on writes	Moderate latency	Minimal impact on primary
Network requirement	Low latency (same region/zone)	Low latency (same region/zone)	Tolerates higher latency (cross-region)
Use case	HA within a region	HA within a region (alternative to SYNC)	DR across regions
Exam recommendation	Primary choice for HA	Know it exists as HA alternative	Know it for DR scenarios
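
The differences in the table can be captured in a small model. The following is a minimal sketch, not SAP code: the class names, the `AckPoint` enum, and the `may_lose_commits` helper are illustrative assumptions, and the failure logic is deliberately simplified to the cases the table describes.

```python
from dataclasses import dataclass
from enum import Enum


class AckPoint(Enum):
    """Point at which the primary treats a write as replicated."""
    SECONDARY_DISK = "secondary has written the log to disk"
    SECONDARY_MEMORY = "secondary has received the log in memory"
    NONE = "primary does not wait"


@dataclass(frozen=True)
class ReplicationMode:
    name: str
    ack_point: AckPoint
    use_case: str

    def may_lose_commits(self, primary_fails: bool, secondary_fails: bool) -> bool:
        """Simplified view of the RPO row in the table above."""
        if not primary_fails:
            return False  # the primary still holds every committed write
        if self.ack_point is AckPoint.NONE:
            return True  # ASYNC: the secondary may lag behind
        if self.ack_point is AckPoint.SECONDARY_MEMORY:
            return secondary_fails  # SYNCMEM: only a dual failure is risky
        return False  # SYNC: the secondary already persisted the write


MODES = {
    mode.name: mode
    for mode in (
        ReplicationMode("SYNC", AckPoint.SECONDARY_DISK, "HA within a region"),
        ReplicationMode("SYNCMEM", AckPoint.SECONDARY_MEMORY, "HA within a region"),
        ReplicationMode("ASYNC", AckPoint.NONE, "DR across regions"),
    )
}

if __name__ == "__main__":
    for mode in MODES.values():
        risky = mode.may_lose_commits(primary_fails=True, secondary_fails=False)
        print(f"{mode.name}: possible data loss if only the primary fails = {risky}")
```

Reading the model this way makes the exam distinction easy to remember: SYNC and SYNCMEM differ only in the dual-failure case, while ASYNC accepts possible loss on any primary failure.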

Exam tip: SYNC + memory preload for HA

When the exam asks about HANA HA configuration, the answer is synchronous replication (SYNC or SYNCMEM) with memory preload enabled on the secondary. Memory preload means the secondary loads data tables into memory so it can serve queries immediately after takeover, reducing RTO significantly.
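
As a rough illustration, memory preload can be verified by inspecting the secondary's configuration. The sketch below assumes that preload is controlled by the `preload_column_tables` parameter in the `[system_replication]` section of `global.ini` and that it defaults to enabled when absent; confirm both against the SAP documentation for your HANA revision. The sample file content and the `preload_enabled` helper are hypothetical.

```python
import configparser

# Hypothetical excerpt of a secondary's global.ini, for illustration only.
SAMPLE_GLOBAL_INI = """
[system_replication]
preload_column_tables = true
"""


def preload_enabled(ini_text: str) -> bool:
    """Return True if the secondary is set to preload column tables."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Assumption: a missing entry means the default (enabled) applies.
    return parser.getboolean(
        "system_replication", "preload_column_tables", fallback=True
    )


if __name__ == "__main__":
    print("Memory preload enabled:", preload_enabled(SAMPLE_GLOBAL_INI))
```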

Active/read-enabled secondary

Starting with HANA 2.0 SPS01, the secondary node in an HSR pair can serve read-only queries while actively receiving replication data. This is called the active/read-enabled secondary.

Benefits:

Offload read-heavy reporting queries to the secondary
Better utilization of the secondary VM (it is not just idle standby)
Reduces load on the primary for better write performance
No additional licensing cost for HANA Enterprise Edition

Limitations:

Read queries on the secondary may see slightly stale data during replication lag
If the secondary needs to take over as primary, read sessions are disconnected
Only available with HANA 2.0 SPS01 or later
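
Whether the secondary is read-enabled is decided when it is registered for replication. The sketch below only assembles the `hdbnsutil -sr_register` command line with the `logreplay_readaccess` operation mode; it does not run anything. The host name, instance number, and site name are placeholders, and the `build_register_command` helper is hypothetical.

```python
import shlex
from typing import List


def build_register_command(
    remote_host: str,
    remote_instance: str,
    site_name: str,
    replication_mode: str = "sync",
    read_enabled: bool = True,
) -> List[str]:
    """Assemble the registration command for the secondary node."""
    # logreplay_readaccess lets the secondary answer read-only queries;
    # plain logreplay keeps it as a non-readable standby.
    operation_mode = "logreplay_readaccess" if read_enabled else "logreplay"
    return [
        "hdbnsutil",
        "-sr_register",
        f"--remoteHost={remote_host}",
        f"--remoteInstance={remote_instance}",
        f"--replicationMode={replication_mode}",
        f"--operationMode={operation_mode}",
        f"--name={site_name}",
    ]


if __name__ == "__main__":
    # Placeholder values; replace with the real primary host and site name.
    command = build_register_command("hana-vm1", "03", "SITE2")
    print(shlex.join(command))
```

Building the command as a list keeps each argument separate, which avoids quoting problems if it is later passed to `subprocess.run`.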

🛡️ Lars considers. “So GlobalPharma’s auditors can run their compliance reports against the secondary node without slowing down production?”

☁️ Mei nods. “Exactly. And if the primary fails, the secondary drops the read connections and becomes the new primary. It is a much better use of the standby hardware than having it sit idle.”

Pacemaker for HANA HA

On Linux, Pacemaker automates HANA HSR failover using two specialized resource agents:

SAPHana — manages the HANA primary/secondary roles. It monitors HSR status and orchestrates takeover when the primary fails. This agent understands HANA-specific states and replication status.

SAPHanaTopology — monitors the HANA replication topology (which node is primary, which is secondary, replication status). It feeds information to SAPHana for decision-making.

Both agents work together:

SAPHanaTopology continuously checks replication status on both nodes
SAPHana uses this information to determine cluster health
If the primary fails, SAPHana promotes the secondary to primary
STONITH fences the failed node
Azure Load Balancer health probe detects the new primary
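
The sequence above can be pictured as a small decision function. This is a toy model, not the logic of the real resource agents: the `ClusterView` type and `next_actions` function are hypothetical, the `SOK`/`SFAIL` values stand for "replication in sync" and "replication failed", and the exact ordering of fencing and promotion is decided by Pacemaker. The returned list simply mirrors the order given above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterView:
    """What the topology agent is assumed to report (simplified)."""
    primary_healthy: bool
    secondary_healthy: bool
    sync_state: str  # "SOK" = replication in sync, "SFAIL" = not in sync


def next_actions(view: ClusterView) -> List[str]:
    """Return the cluster's next steps for a given view of the HSR pair."""
    if view.primary_healthy:
        return ["keep monitoring"]
    if not view.secondary_healthy or view.sync_state != "SOK":
        # Promoting an out-of-sync secondary would risk data loss.
        return ["no automatic takeover: secondary not ready"]
    return [
        "promote secondary to primary",
        "fence failed node (STONITH)",
        "load balancer probe detects new primary",
    ]


if __name__ == "__main__":
    failed_primary = ClusterView(False, True, "SOK")
    for step in next_actions(failed_primary):
        print(step)
```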

Hook scripts

HANA hook scripts are Python scripts that HANA calls during replication events (takeover, registration, status changes). They integrate HANA’s internal replication awareness with Pacemaker:

SAPHanaSR hook — notifies Pacemaker about HSR status changes
Runs inside the HANA process, providing faster notification than polling
Reduces the time Pacemaker needs to detect a replication issue
Must be configured on both primary and secondary nodes
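
To make the notification idea concrete, here is a simplified sketch of what such a hook does when replication status changes: it translates HANA's status into a cluster attribute update. This is not the shipped SAPHanaSR hook. The status code 15 for "in sync", the attribute naming pattern, and the SID and site values are assumptions for illustration, and the command is only built, not executed.

```python
from typing import List

# Assumption: HANA reports system status 15 when replication is in sync.
IN_SYNC_STATUS = 15


def sr_state(system_status: int) -> str:
    """Map HANA's replication status to the value stored in the cluster."""
    return "SOK" if system_status == IN_SYNC_STATUS else "SFAIL"


def crm_attribute_command(sid: str, site: str, system_status: int) -> List[str]:
    """Build the cluster attribute update a hook would issue."""
    # Assumed attribute naming pattern; check the hook shipped by your vendor.
    attribute = f"hana_{sid.lower()}_site_srHook_{site}"
    return [
        "crm_attribute",
        "-n", attribute,
        "-v", sr_state(system_status),
        "-t", "crm_config",
        "-s", "SAPHanaSR",
    ]


if __name__ == "__main__":
    # Placeholder SID and site name.
    print(" ".join(crm_attribute_command("HN1", "SITE2", 15)))
```

Because the update is pushed the moment HANA reports a change, the cluster does not have to wait for the next monitor interval to learn that replication broke.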

Hook scripts improve failover speed

Without hook scripts, Pacemaker relies on periodic polling to detect HSR status changes. With hook scripts, HANA proactively notifies Pacemaker the moment a replication event occurs. This can reduce failover detection time from 30+ seconds to near-instant. The exam may test whether you know that hook scripts complement (not replace) the Pacemaker resource agents.

Azure Load Balancer for HANA

The Load Balancer configuration for HANA HA follows the same principles as ASCS:

Standard SKU, internal — HANA cluster IPs are private
Floating IP enabled — mandatory for the HANA virtual IP
Health probe on port 625xx — where xx is the HANA instance number (e.g., 62503 for instance 03)
Backend pool — contains both HANA VMs
HA ports rule — forwards all HANA ports (3xx13, 3xx14, 3xx15, etc.) through a single rule

Applications connect to the Load Balancer’s frontend IP, which always routes to the active HANA primary. After failover, the health probe detects the new primary and traffic switches automatically.
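
The port conventions above are easy to derive from the instance number. The helpers below are a small sketch with hypothetical function names: the probe port follows the 625xx pattern and the HANA ports follow the 3xx13, 3xx14, 3xx15 pattern from the list.

```python
from typing import List


def _validate(instance_number: str) -> str:
    """HANA instance numbers are two digits, 00 to 99."""
    if len(instance_number) != 2 or not instance_number.isdigit():
        raise ValueError(f"invalid instance number: {instance_number!r}")
    return instance_number


def probe_port(instance_number: str) -> int:
    """Health probe port: 625 followed by the instance number."""
    return int("625" + _validate(instance_number))


def hana_ports(instance_number: str) -> List[int]:
    """HANA ports of the form 3xx13, 3xx14, 3xx15."""
    nn = _validate(instance_number)
    return [int(f"3{nn}{suffix}") for suffix in ("13", "14", "15")]


if __name__ == "__main__":
    print(probe_port("03"))   # 62503
    print(hana_ports("03"))   # [30313, 30314, 30315]
```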

Testing failover

🛡️ Lars insists. “We need to test this before going live. Dr. Schmidt requires documented evidence that failover works.”

Testing approaches:

Graceful takeover — run hdbnsutil -sr_takeover on the secondary node to trigger a planned failover
Simulate VM failure — stop the primary VM from the Azure portal
Kill HANA process — terminate the HANA indexserver process to test Pacemaker detection
Network isolation — block network between nodes to test fencing
Document results: failover time, data loss verification, client reconnection behavior
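
For the documented evidence, failover time can be measured from the client's point of view by polling the Load Balancer frontend while a test runs. The following is a minimal sketch under stated assumptions: `measure_outage` and `tcp_probe` are hypothetical helpers, the frontend IP is a placeholder, and a successful TCP connect is only a rough proxy for "HANA is serving requests".

```python
import socket
import time
from typing import Callable, Optional


def tcp_probe(host: str, port: int, timeout_s: float = 2.0) -> Callable[[], bool]:
    """Return a check that reports whether a TCP connect succeeds."""
    def check() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                return True
        except OSError:
            return False
    return check


def measure_outage(
    is_available: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the service goes down and comes back.

    Returns the observed outage in seconds, or None if no complete
    down-then-up cycle was seen before the timeout.
    """
    start = clock()
    down_at: Optional[float] = None
    while clock() - start < timeout_s:
        up = is_available()
        now = clock()
        if not up and down_at is None:
            down_at = now  # first failed check marks the start of the outage
        elif up and down_at is not None:
            return now - down_at  # service is back
        sleep(interval_s)
    return None


if __name__ == "__main__":
    # Placeholder frontend IP; 30315 is the 3xx15 port for instance 03.
    check = tcp_probe("10.0.0.13", 30315)
    print("Observed outage (s):", measure_outage(check, timeout_s=30.0))
```

The measured value is accurate only to within the polling interval, so record the interval alongside the result in the test report.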

Question

What HSR replication mode is recommended for HANA HA on Azure?