High Availability Concepts for SAP | Guided by A Guide to Cloud

Why HA for SAP?

🛡️ Lars Eriksson adjusts his glasses. “At GlobalPharma, SAP is our nervous system. Procurement, manufacturing, quality control, regulatory reporting — it all flows through SAP. Dr. Schmidt, our compliance officer, has one rule: ‘If SAP goes down during a regulatory submission, we have a major audit finding.’ So high availability is not optional. It is a compliance requirement.”

☁️ Mei nods. “Lars is right. SAP is business-critical for most organizations. An hour of downtime can mean millions in lost revenue, failed deliveries, or regulatory violations. Azure gives us the building blocks, but we need to assemble them correctly for SAP.”

Simple explanation

Think of HA like having a backup generator for your house.

The power grid usually works fine, but when a storm knocks out power, your backup generator kicks in automatically. High availability for SAP works the same way — you have a second system standing by that takes over immediately when the primary fails. The users might notice a brief flicker, but the lights stay on.

Single points of failure in SAP

Before designing HA, you need to identify what can fail. In a standard SAP architecture:

ASCS/SCS (critical SPOF): The message server and enqueue server are singletons. Only one instance runs at a time. If it fails, no new user sessions can be established and SAP locks are lost. This is the most critical SPOF.

HANA database (critical SPOF): A single HANA instance means a VM failure causes a complete data-tier outage. While data is persisted to disk, restarting a HANA instance from disk can take 30+ minutes for large databases.

Application servers (not a SPOF with multiple instances): If you run multiple application servers, losing one reduces capacity but does not cause an outage. The message server redirects users to surviving instances.

Web Dispatcher (potential SPOF): If you run a single Web Dispatcher and it fails, HTTP traffic has no path to the application servers. Deploy two behind a load balancer for HA.
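The classification above boils down to a simple rule: a component is a single point of failure when only one instance exists and nothing stands by to take over. The Python sketch below illustrates that rule only. The `Component` class, its fields, and the instance counts are hypothetical and not part of any SAP or Azure tooling.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical model of one SAP landscape component."""
    name: str
    instances: int           # independent instances deployed
    clustered: bool = False  # True if a standby node can take over


def is_spof(component: Component) -> bool:
    """SPOF: fewer than two instances and no cluster standby."""
    return component.instances < 2 and not component.clustered


# Example landscape before any HA design (assumed values).
landscape = [
    Component("ASCS/SCS", instances=1),
    Component("HANA database", instances=1),
    Component("Application servers", instances=3),
    Component("Web Dispatcher", instances=1),
]

spofs = [c.name for c in landscape if is_spof(c)]
print(spofs)
```

Marking ASCS/SCS and HANA as `clustered=True`, and raising the Web Dispatcher count to two, empties the list, which mirrors the design goal of this module.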

Exam tip: Know your SPOFs

The exam frequently asks “which component is a single point of failure?” The answer is ASCS/SCS and the HANA database. Application servers are NOT SPOFs if you have multiple instances. Always recommend HA clustering for ASCS and HANA, and multiple instances for the application tier.

Azure HA building blocks

Azure provides the infrastructure layer for SAP HA:

Availability zones: protect against datacenter failures. Place the primary and secondary cluster nodes in different zones.

Azure Load Balancer (Standard, internal): directs traffic to the active cluster node using floating IP and health probes. Essential for both ASCS and HANA clusters.

Shared storage: ASCS clusters need shared file systems for SAP profiles and transport directories. Options include Azure Shared Disk, Azure NetApp Files (ANF), and NFS on Azure Files.

Azure Fence Agent: integrates with Linux Pacemaker to power off or restart a failed VM (fencing) so that a split-brain scenario cannot develop.
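To make the health-probe idea concrete, here is a minimal Python simulation. It assumes that only the node currently holding the clustered resource answers the probe, so the load balancer always has exactly one healthy backend. The `ClusterNode` class, the node names, and `route_target` are hypothetical stand-ins, not Azure APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterNode:
    """Hypothetical cluster node; probe_open is True only on the active node."""
    name: str
    probe_open: bool


def route_target(nodes: List[ClusterNode]) -> Optional[str]:
    """Return the node that answers the health probe, or None if none does."""
    healthy = [n.name for n in nodes if n.probe_open]
    return healthy[0] if healthy else None


nodes = [ClusterNode("ascs-node1", True), ClusterNode("ascs-node2", False)]
before = route_target(nodes)

# Simulated failover: the cluster moves the resource and its probe listener.
nodes[0].probe_open = False
nodes[1].probe_open = True
after = route_target(nodes)

print(before, "->", after)
```

Clients keep using the same front-end address throughout. Only the backend chosen by the probe result changes.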

Shared storage options for SAP HA

Shared storage for SAP HA clusters
Feature	Azure Shared Disk	Azure NetApp Files (ANF)	NFS on Azure Files
Protocol	Block (attached to multiple VMs)	NFS	NFS
Typical SAP use	Windows ASCS (WSFC with shared disk)	Linux ASCS file shares, HANA shared	Linux ASCS file shares, transport directory
Multi-zone support	ZRS Shared Disk for cross-zone	ANF with zone placement	Zone-redundant storage (ZRS)
Management	Attach to VMs via Azure portal	Managed service — create volumes	Managed service — create file shares
Performance tier	Premium SSD or Ultra Disk	Standard/Premium/Ultra tiers	Premium tier recommended
Exam relevance	Know for Windows ASCS HA	Know for Linux ASCS and HANA scale-out	Know as an alternative to ANF for Linux

STONITH and fencing

In Linux HA clusters (Pacemaker), fencing is the mechanism that ensures a failed node is truly dead before the surviving node takes over resources. Without fencing, a split-brain scenario can occur where both nodes believe they are the primary — leading to data corruption.

STONITH (Shoot The Other Node In The Head) is the Pacemaker fencing mechanism:

It uses a fence agent to power off or reset the failed node
On Azure, the two options are Azure Fence Agent (calls Azure APIs to restart the VM) and SBD (STONITH Block Device, uses shared storage for fencing messages)
STONITH is mandatory for production Pacemaker clusters — without it, the cluster is not supported
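The ordering that fencing enforces can be sketched in a few lines of Python: the surviving node starts resources only after the peer is confirmed off. This is a conceptual illustration, not Pacemaker code. `take_over`, `FencingError`, and the `fence_peer` callable are hypothetical names, with `fence_peer` standing in for a fence agent or SBD.

```python
from typing import Callable


class FencingError(Exception):
    """Raised when the peer could not be confirmed as powered off."""


def take_over(peer_responsive: bool, fence_peer: Callable[[], bool]) -> str:
    """Hypothetical takeover routine run by the surviving node.

    fence_peer returns True only when the peer is confirmed off.
    """
    if peer_responsive:
        return "no action: peer is healthy"
    if not fence_peer():
        # Without confirmation the peer may still hold resources,
        # so starting them here would risk split-brain.
        raise FencingError("peer not confirmed off; refusing to start resources")
    return "resources started on surviving node"


# Peer is unresponsive and fencing succeeds, so takeover proceeds.
print(take_over(peer_responsive=False, fence_peer=lambda: True))
```

The important design choice is that a failed fence attempt blocks the takeover instead of being ignored. A short outage is preferable to two nodes writing to the same data.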

🛡️ Lars raises a concern. “So if one node becomes unresponsive, the surviving node tells Azure to force-restart it?”

☁️ Mei confirms. “Exactly. The Azure Fence Agent calls the Azure API to deallocate or restart the problematic VM. This guarantees the failed node is not still running and holding resources that the new primary needs.”

Fencing is non-negotiable

Every Pacemaker cluster for SAP on Azure must have STONITH configured. The exam will test this. If a question describes a Pacemaker cluster without fencing, the correct answer is always “configure STONITH.” Without it, neither SAP nor Microsoft will support the configuration.

Application server HA

Application servers do not need clustering — their HA comes from running multiple instances:

Deploy at least 2 application servers per SAP system
Register all instances with the message server for automatic load distribution
Spread across availability zones for fault isolation
If one application server fails, users are redirected to the remaining instances
Batch jobs can be configured to restart on any available server
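The effect of losing one application server can be illustrated with a small Python simulation. It assumes a simple least-sessions rule, whereas the real message server balances by logon groups and its own load metrics, and real users log on again after a failure; sessions are not migrated. All function, server, and user names here are hypothetical.

```python
from typing import Dict, List


def distribute(users: List[str], servers: List[str]) -> Dict[str, List[str]]:
    """Assign each user to the server with the fewest sessions so far."""
    load: Dict[str, List[str]] = {s: [] for s in servers}
    for user in users:
        target = min(servers, key=lambda s: len(load[s]))
        load[target].append(user)
    return load


def fail_over(load: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Reassign the failed server's users to the surviving servers."""
    survivors = {s: list(u) for s, u in load.items() if s != failed}
    if not survivors:
        raise RuntimeError("no surviving application servers")
    for user in load.get(failed, []):
        target = min(survivors, key=lambda s: len(survivors[s]))
        survivors[target].append(user)
    return survivors


load = distribute([f"u{i}" for i in range(1, 7)], ["app1", "app2", "app3"])
after = fail_over(load, "app2")
print({server: len(sessions) for server, sessions in after.items()})
```

With six users on three servers, losing one server raises the load on each survivor from two sessions to three. Capacity drops, but no user is left without a server.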

Question

What are the two critical single points of failure in an SAP architecture?

Answer

ASCS/SCS (message server + enqueue server) and the HANA database. Both require HA clustering. Application servers are NOT SPOFs when multiple instances are deployed — losing one reduces capacity but does not cause an outage.

Question

What is STONITH and why is it mandatory for Pacemaker clusters?

Answer

STONITH (Shoot The Other Node In The Head) is the Pacemaker fencing mechanism that ensures a failed node is truly powered off before the surviving node takes over. Without STONITH, split-brain scenarios can cause data corruption. It is mandatory for all production SAP Pacemaker clusters on Azure.

Question

What role does Azure Load Balancer play in SAP HA?

Answer

Azure Load Balancer (Standard, internal) with floating IP directs traffic to the active cluster node. Health probes detect which node is active. It is essential for both ASCS and HANA HA clusters to route traffic to the correct node after failover.

Question

Which shared storage option is typically used for Windows ASCS HA on Azure?

Answer

Azure Shared Disk (Premium SSD or Ultra Disk). Windows ASCS HA uses Windows Server Failover Clustering (WSFC) with a shared disk for the SAP profile and transport directories. Linux ASCS HA typically uses ANF or NFS on Azure Files instead.

Knowledge check

GlobalPharma is designing SAP HA. Lars asks which components need clustering. What should Mei recommend?

Lars is configuring a Pacemaker cluster for ASCS on SUSE Linux. What STONITH mechanism should he use on Azure?

GlobalPharma runs SAP on Linux and needs shared storage for the ASCS HA file shares. Which option should Lars choose?

Summary

You now understand why SAP needs high availability: ASCS and HANA are critical SPOFs that require clustering, application servers achieve HA through multiple instances, and Azure provides the building blocks (availability zones, Load Balancer, shared storage, fencing). STONITH is mandatory for all Pacemaker clusters.

Next, we dive into the details of ASCS/SCS high availability — clustering the message server and enqueue server with ERS and Pacemaker.
