High Availability Concepts for SAP
Understand why SAP workloads need high availability, identify single points of failure, learn Azure HA building blocks including availability zones and load balancers, and explore shared storage options for SAP clustering.
Why HA for SAP?
Lars Eriksson adjusts his glasses. “At GlobalPharma, SAP is our nervous system. Procurement, manufacturing, quality control, regulatory reporting: it all flows through SAP. Dr. Schmidt, our compliance officer, has one rule: ‘If SAP goes down during a regulatory submission, we have a major audit finding.’ So high availability is not optional. It is a compliance requirement.”
Mei nods. “Lars is right. SAP is business-critical for most organizations. An hour of downtime can mean millions in lost revenue, failed deliveries, or regulatory violations. Azure gives us the building blocks, but we need to assemble them correctly for SAP.”
Think of HA like having a backup generator for your house.
The power grid usually works fine, but when a storm knocks out power, your backup generator kicks in automatically. High availability for SAP works the same way: a second system stands by and takes over automatically when the primary fails. Users might notice a brief interruption while the failover completes, but the lights stay on.
Single points of failure in SAP
Before designing HA, you need to identify what can fail. In a standard SAP architecture:
- ASCS/SCS (critical SPOF): The message server and enqueue server are singletons; only one instance runs at a time. If it fails, no new user sessions can be established and the SAP lock table is lost. This is the most critical SPOF.
- HANA database (critical SPOF): With a single HANA instance, a VM failure causes a complete outage of the data tier. Data is persisted to disk, but restarting a large HANA database from disk can take 30 minutes or more.
- Application servers (not a SPOF with multiple instances): If you run multiple application servers, losing one reduces capacity but does not cause an outage. The message server routes new logons to the surviving instances.
- Web Dispatcher (potential SPOF): With a single Web Dispatcher, HTTP traffic has no path to the application servers when it fails. Deploy two behind a load balancer for HA.
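To make the reasoning concrete, here is a minimal Python sketch that classifies the components above. The `Component` fields and the rule in `is_spof` are illustrative assumptions, a simplified model rather than an SAP or Azure tool: a singleton (only one active instance, like ASCS or a single HANA primary) is a SPOF unless it is clustered, and a scale-out component is a SPOF when only one instance runs.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    instances: int   # number of running instances
    singleton: bool  # only one instance may be active at a time (assumed flag)
    clustered: bool  # protected by a failover cluster such as Pacemaker or WSFC


def is_spof(c: Component) -> bool:
    """True if a single failure can take the component out completely (simplified rule)."""
    if c.singleton:
        # A singleton cannot scale out; only a failover cluster removes the SPOF.
        return not c.clustered
    # Scale-out components survive one failure when at least two instances run.
    return c.instances < 2 and not c.clustered


landscape = [
    Component("ASCS/SCS", instances=1, singleton=True, clustered=False),
    Component("HANA database", instances=1, singleton=True, clustered=False),
    Component("Application servers", instances=3, singleton=False, clustered=False),
    Component("Web Dispatcher", instances=1, singleton=False, clustered=False),
]
spofs = [c.name for c in landscape if is_spof(c)]
print(spofs)  # ['ASCS/SCS', 'HANA database', 'Web Dispatcher']
```

Adding a second Web Dispatcher, or clustering ASCS and HANA, removes each entry from the list, which mirrors the recommendations in this section.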
Exam tip: Know your SPOFs
The exam frequently asks “Which component is a single point of failure?” The answers are ASCS/SCS and the HANA database. Application servers are NOT SPOFs if you have multiple instances. Always recommend HA clustering for ASCS and HANA, and multiple instances for the application tier.
Azure HA building blocks
Azure provides the infrastructure layer for SAP HA:
- Availability zones: protect against datacenter failures. Place the primary and secondary cluster nodes in different zones.
- Azure Load Balancer (Standard, internal): directs traffic to the active cluster node using floating IP and health probes. Essential for both ASCS and HANA clusters.
- Shared storage: ASCS clusters need shared file systems for SAP profiles and transport directories. Options include Azure Shared Disk, Azure NetApp Files (ANF), and NFS on Azure Files.
- Azure Fence Agent: integrates with Linux Pacemaker to power off or restart a failed VM during split-brain scenarios (fencing).
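The load balancer's decision rule can be sketched in Python. This is a local simulation, not Azure code: `probe`, `healthy_backends`, and `unused_port` are hypothetical helpers, and the ports are random local ports rather than a configured probe port. The idea it shows: Azure Load Balancer sends a TCP health probe to every backend VM, and the cluster runs a small listener (for example through a resource agent such as `azure-lb`) only on the node that currently owns the SAP resources, so only that node answers and receives traffic for the frontend IP. With floating IP enabled, packets arrive still addressed to the frontend IP, which is why the cluster can move that IP between nodes.

```python
import socket


def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCP health probe: the backend counts as healthy if a connection opens."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused or timed out -> unhealthy
        return False


def healthy_backends(backends: dict[str, tuple[str, int]]) -> list[str]:
    """Backends that would receive traffic sent to the frontend (floating) IP."""
    return [name for name, (host, port) in backends.items() if probe(host, port)]


def unused_port() -> int:
    """Reserve and release a local port so that nothing is listening on it."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


# Simulation on one machine: each "node" gets its own local port. In Azure, every
# VM uses the same probe port and only the active cluster node runs the listener.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()

backends = {
    "node-a (zone 1)": ("127.0.0.1", listener.getsockname()[1]),  # active node
    "node-b (zone 2)": ("127.0.0.1", unused_port()),              # passive node
}
print(healthy_backends(backends))  # ['node-a (zone 1)']

listener.close()  # failover begins: the cluster stops the listener on node A
print(healthy_backends(backends))  # [] until the cluster starts one on node B
```

The design choice worth noting is that the load balancer knows nothing about SAP; the cluster decides where the listener runs, and the probe simply follows it.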
Shared storage options for SAP HA
| Feature | Azure Shared Disk | Azure NetApp Files (ANF) | NFS on Azure Files |
|---|---|---|---|
| Protocol | Block (attached to multiple VMs) | NFS | NFS |
| Typical SAP use | Windows ASCS (WSFC with shared disk) | Linux ASCS file shares, HANA shared | Linux ASCS file shares, transport directory |
| Multi-zone support | ZRS Shared Disk for cross-zone | ANF with zone placement | Zone-redundant storage (ZRS) |
| Management | Create with sharing enabled (`maxShares` > 1), then attach to each cluster VM | Managed service: create volumes | Managed service: create file shares |
| Performance tier | Premium SSD or Ultra Disk | Standard/Premium/Ultra tiers | Premium (SSD) file shares, required for NFS |
| Exam relevance | Know for Windows ASCS HA | Know for Linux ASCS and HANA scale-out | Know as an alternative to ANF for Linux |
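The “Typical SAP use” row can be condensed into a lookup, for example to answer a design question like the one GlobalPharma faces. This is a hypothetical helper that encodes only that row of the table; it is not a complete support matrix (Windows ASCS, for instance, can also use an SMB file share instead of a shared disk).

```python
# Hypothetical lookup built only from the "Typical SAP use" row of the table above.
STORAGE_OPTIONS: dict[tuple[str, str], list[str]] = {
    ("windows", "ascs"): ["Azure Shared Disk"],
    ("linux", "ascs"): ["Azure NetApp Files", "NFS on Azure Files"],
    ("linux", "hana-shared"): ["Azure NetApp Files"],
}


def shared_storage_options(os_family: str, use: str) -> list[str]:
    """Return candidate shared-storage services for an OS family and SAP use."""
    key = (os_family.lower(), use.lower())
    if key not in STORAGE_OPTIONS:
        raise ValueError(f"no entry in the comparison table for {key}")
    return list(STORAGE_OPTIONS[key])  # copy so callers cannot modify the table


print(shared_storage_options("Linux", "ascs"))    # ['Azure NetApp Files', 'NFS on Azure Files']
print(shared_storage_options("Windows", "ascs"))  # ['Azure Shared Disk']
```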
STONITH and fencing
In Linux HA clusters (Pacemaker), fencing is the mechanism that ensures a failed node is truly dead before the surviving node takes over resources. Without fencing, a split-brain scenario can occur where both nodes believe they are the primary β leading to data corruption.
STONITH (Shoot The Other Node In The Head) is the Pacemaker fencing mechanism:
- It uses a fence agent to power off or reset the failed node
- On Azure, the two options are Azure Fence Agent (calls Azure APIs to restart the VM) and SBD (STONITH Block Device, uses shared storage for fencing messages)
- STONITH is mandatory for production Pacemaker clusters β without it, the cluster is not supported
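Why fencing prevents split-brain can be shown with a toy model. This is not Pacemaker: `Node`, `fence`, `fail_over`, and `split_brain` are hypothetical names, and a real cluster decides through quorum and its configured fence device (Azure Fence Agent or SBD). The scenario assumes node1 is cut off from node2 by a network problem but is still powered on and still running ASCS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    powered_on: bool = True
    resources: set[str] = field(default_factory=set)


def fence(node: Node) -> None:
    """Stand-in for a fence agent: force the node off so it releases everything."""
    node.powered_on = False
    node.resources.clear()


def fail_over(survivor: Node, lost: Node, stonith_enabled: bool) -> None:
    """The survivor takes over the resources of a peer it can no longer reach."""
    takeover = set(lost.resources)
    if stonith_enabled:
        fence(lost)  # guarantee the peer is really down before starting its resources
    survivor.resources |= takeover  # without fencing, the peer may still run these


def split_brain(nodes: list[Node]) -> set[str]:
    """Resources that are active on more than one powered-on node."""
    seen: set[str] = set()
    duplicated: set[str] = set()
    for node in nodes:
        if node.powered_on:
            duplicated |= seen & node.resources
            seen |= node.resources
    return duplicated


def scenario(stonith_enabled: bool) -> set[str]:
    # node1 is partitioned from node2 but still powered on and running ASCS.
    node1 = Node("node1", resources={"ASCS", "virtual-IP"})
    node2 = Node("node2")
    fail_over(node2, node1, stonith_enabled)
    return split_brain([node1, node2])


print(sorted(scenario(stonith_enabled=False)))  # ['ASCS', 'virtual-IP']
print(sorted(scenario(stonith_enabled=True)))   # []
```

Without fencing, both nodes end up owning the same ASCS instance and virtual IP, which is exactly the corruption risk described above.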
Lars raises a concern. “So if one node becomes unresponsive, the surviving node tells Azure to force-restart it?”
Mei confirms. “Exactly. The Azure Fence Agent calls the Azure API to deallocate or restart the problematic VM. This guarantees the failed node is no longer running or holding resources that the new primary needs.”
Fencing is non-negotiable
Every Pacemaker cluster for SAP on Azure must have STONITH configured. The exam will test this. If a question describes a Pacemaker cluster without fencing, the correct answer is always “configure STONITH.” Without it, neither SAP nor Microsoft will support the configuration.
Application server HA
Application servers do not need clustering β their HA comes from running multiple instances:
- Deploy at least 2 application servers per SAP system
- Assign the instances to logon groups (transaction SMLG) so the message server distributes user logons across them
- Spread across availability zones for fault isolation
- If one application server fails, users on it lose their sessions, but they can log on again and are routed to the remaining instances
- Batch jobs scheduled without a fixed target server (or with a job server group) can run on any available instance
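A toy model of the application tier, with hypothetical names (`AppServer`, `spans_zones`, `assign_logons`): it checks that instances are spread across zones and sends each new logon in a batch to the alive instance with the fewest users from that batch. Real SAP logon load balancing follows the logon group configuration rather than this simple counting rule; the sketch only shows why losing one instance is a capacity event, not an outage.

```python
from dataclasses import dataclass


@dataclass
class AppServer:
    name: str
    zone: int
    alive: bool = True


def spans_zones(servers: list[AppServer]) -> bool:
    """Design check: at least two instances in at least two different zones."""
    return len(servers) >= 2 and len({s.zone for s in servers}) >= 2


def assign_logons(servers: list[AppServer], users: list[str]) -> dict[str, str]:
    """Send each new logon to the alive instance with the fewest users in this batch."""
    alive = [s for s in servers if s.alive]
    if not alive:
        raise RuntimeError("no application server available: outage")
    load = {s.name: 0 for s in alive}
    assignment: dict[str, str] = {}
    for user in users:
        target = min(alive, key=lambda s: load[s.name])  # ties go to the first instance
        load[target.name] += 1
        assignment[user] = target.name
    return assignment


servers = [AppServer("app1", zone=1), AppServer("app2", zone=2)]
print(spans_zones(servers))  # True
before = assign_logons(servers, ["u1", "u2", "u3", "u4"])
print(before)  # {'u1': 'app1', 'u2': 'app2', 'u3': 'app1', 'u4': 'app2'}

servers[0].alive = False  # app1 (zone 1) fails; its users lose their sessions
relogon = [user for user, server in before.items() if server == "app1"]
print(assign_logons(servers, relogon))  # {'u1': 'app2', 'u3': 'app2'}
```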
Knowledge check
GlobalPharma is designing SAP HA. Lars asks which components need clustering. What should Mei recommend?
Lars is configuring a Pacemaker cluster for ASCS on SUSE Linux. What STONITH mechanism should he use on Azure?
GlobalPharma runs SAP on Linux and needs shared storage for the ASCS HA file shares. Which option should Lars choose?
Summary
You now understand why SAP needs high availability: ASCS and HANA are critical SPOFs that require clustering, application servers achieve HA through multiple instances, and Azure provides the building blocks (availability zones, Load Balancer, shared storage, fencing). STONITH is mandatory for all Pacemaker clusters.
Next, we dive into the details of ASCS/SCS high availability: clustering the message server and enqueue server with ERS and Pacemaker.