Availability Sets, Zones & Scale Sets
Keep your VMs running when hardware fails, datacenters go down, or traffic spikes. Learn availability sets, availability zones, and Virtual Machine Scale Sets — Azure's three pillars of VM resilience.
Why availability matters
Availability is about keeping your app running even when things break.
On-prem, you’d have two servers in a failover cluster. If one dies, the other takes over. Azure gives you three ways to do this:
- Availability Sets spread VMs across fault domains within a datacenter.
- Availability Zones spread VMs across separate datacenters in a region.
- Scale Sets automatically create and remove VMs based on demand.
Availability Sets
An availability set distributes VMs across fault domains and update domains within a single datacenter.
- Fault domain (FD) — a group of VMs sharing common power and network. Max 3 FDs per set. If a rack loses power, only VMs in that FD are affected.
- Update domain (UD) — a group of VMs that can be rebooted simultaneously during maintenance. Max 20 UDs. Azure reboots one UD at a time.
SLA: 99.95% for 2+ VMs in an availability set.
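To see why spreading VMs across domains helps, here is a minimal sketch that models placement as a simple round-robin. This is a simplification for study purposes: Azure performs the real placement itself, and the function names `place_vms` and `survivors_if_fd_fails` are hypothetical. The sketch assumes 3 fault domains and 5 update domains.

```python
def place_vms(vm_count, fault_domains=3, update_domains=5):
    """Assign each VM index to a (fault domain, update domain) pair.

    Simplified model: VMs are dealt out round-robin across both kinds
    of domain. The actual placement is decided by the Azure platform.
    """
    return [(i % fault_domains, i % update_domains) for i in range(vm_count)]


def survivors_if_fd_fails(placement, failed_fd):
    """Return the indexes of VMs that are NOT in the failed fault domain."""
    return [vm for vm, (fd, _ud) in enumerate(placement) if fd != failed_fd]


placement = place_vms(6)
for vm, (fd, ud) in enumerate(placement):
    print(f"VM{vm}: FD{fd}, UD{ud}")

# If the rack behind fault domain 0 loses power, these VMs keep running.
print("Still running:", survivors_if_fd_fails(placement, failed_fd=0))
```

With six VMs in this model, losing one fault domain takes out two VMs and leaves four running, which is why the SLA requires at least two VMs in the set.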
Availability Zones
Availability zones are physically separate locations within an Azure region, each made up of one or more datacenters. Each zone has independent power, cooling, and networking.
- Regions that support zones have at least 3 (Zone 1, 2, 3); not every region supports zones
- Deploy VMs across zones for datacenter-level protection
- SLA: 99.99% for VMs across 2+ zones (higher than availability sets)
| Feature | Availability Sets | Availability Zones |
|---|---|---|
| Protection scope | Hardware rack failure within a datacenter | Entire datacenter failure within a region |
| Physical separation | Different racks (fault domains) | Different buildings (datacenters) |
| SLA | 99.95% | 99.99% |
| Cost | No extra cost (just VM cost) | No extra cost (but cross-zone bandwidth charges) |
| Configuration | Assign VMs to the same availability set | Deploy VMs to different zones in the same region |
| Works with | VMs in the same datacenter | VMs in the same region, different datacenters |
Exam tip: Know the SLA numbers
This is heavily tested:
- Single VM with Premium SSD: 99.9% SLA
- Availability Set (2+ VMs): 99.95% SLA
- Availability Zones (2+ VMs across zones): 99.99% SLA
If a question asks for the highest availability, the answer is Availability Zones. If the question asks for protection against rack failure only, Availability Sets are sufficient.
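The SLA percentages are easier to compare when converted into permitted downtime. The sketch below does that arithmetic, assuming a 365-day year; the helper name `allowed_downtime_minutes` is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def allowed_downtime_minutes(sla_percent, period_minutes=MINUTES_PER_YEAR):
    """Return the downtime (in minutes) that an SLA percentage permits."""
    return period_minutes * (1 - sla_percent / 100)


sla_tiers = {
    "Single VM with Premium SSD": 99.9,
    "Availability Set (2+ VMs)": 99.95,
    "Availability Zones (2+ VMs across zones)": 99.99,
}

for name, sla in sla_tiers.items():
    minutes = allowed_downtime_minutes(sla)
    print(f"{name}: {sla}% allows about {minutes:.0f} minutes of downtime per year")
```

Under that assumption, 99.9% permits roughly 8.8 hours of downtime a year, 99.95% roughly 4.4 hours, and 99.99% under an hour.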
Virtual Machine Scale Sets (VMSS)
VMSS lets you create and manage a group of identical, load-balanced VMs that automatically scale based on demand.
Key features:
- Auto-scale — add VMs when demand increases, remove when it decreases
- Load balancing — built-in integration with Azure Load Balancer or Application Gateway
- Rolling updates — update VMs in batches without downtime
- Support for Availability Zones — distribute instances across zones
Scaling options:
| Type | How It Works |
|---|---|
| Manual scaling | Set fixed instance count (e.g., always 5 VMs) |
| Auto-scale (metric) | Scale based on CPU%, memory, or custom metrics |
| Auto-scale (schedule) | Scale at specific times (e.g., more VMs during business hours) |
Real-world: CloudFirst Labs auto-scales
CloudFirst Labs runs their web app on a VMSS with these rules:
- Minimum 2 instances (always running)
- Maximum 10 instances (cost cap)
- Scale out: add 1 instance when average CPU exceeds 70% for 5 minutes
- Scale in: remove 1 instance when average CPU drops below 30% for 10 minutes
- Schedule: minimum 4 instances Monday-Friday 8am-6pm (business hours)
During a product launch, the VMSS automatically scaled to 8 instances, then back to 2 overnight. Zero manual intervention.
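The CloudFirst Labs rules can be sketched as a small decision function. This is an illustrative model only, not how Azure autoscale is implemented: `AutoscalePolicy` and `next_instance_count` are hypothetical names, and `avg_cpu` is assumed to be already averaged over the rule's time window (5 minutes for scale out, 10 minutes for scale in).

```python
from dataclasses import dataclass


@dataclass
class AutoscalePolicy:
    """Thresholds taken from the CloudFirst Labs example."""
    min_instances: int = 2       # always running
    max_instances: int = 10      # cost cap
    scale_out_cpu: float = 70.0  # add 1 instance above this average CPU %
    scale_in_cpu: float = 30.0   # remove 1 instance below this average CPU %
    business_hours_min: int = 4  # higher floor Monday-Friday 8am-6pm


def next_instance_count(current, avg_cpu, policy, business_hours=False):
    """Return the instance count after evaluating the rules once.

    Assumption: avg_cpu has already been averaged over the rule's window.
    """
    floor = policy.business_hours_min if business_hours else policy.min_instances
    target = current
    if avg_cpu > policy.scale_out_cpu:
        target = current + 1
    elif avg_cpu < policy.scale_in_cpu:
        target = current - 1
    # Clamp to the configured floor and the cost cap.
    return max(floor, min(policy.max_instances, target))


policy = AutoscalePolicy()
print(next_instance_count(2, 85.0, policy))                       # busy: scale out
print(next_instance_count(10, 95.0, policy))                      # already at the cap
print(next_instance_count(3, 10.0, policy, business_hours=True))  # schedule floor applies
```

Clamping after the rule check keeps the minimum, maximum, and schedule floor in one place, so a scale-in rule can never drop the count below what the schedule requires.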
Knowledge check
Meridian Financial needs their production database VMs to survive a complete datacenter failure within their Azure region. What should Alex configure?
CloudFirst Labs wants their web tier to automatically add VMs when traffic spikes and remove them when traffic drops. What Azure resource should they use?