High Availability and Scalability
High availability and scalability are two of the most important cloud benefits: keeping your apps running when things fail, and growing your resources to match demand. Here's how Azure delivers both.
Why do availability and scalability matter?
Imagine your favourite coffee shop.
High availability = the shop is open every day, even if one barista calls in sick. There’s always someone to make your coffee because they have backup staff.
Scalability = during the morning rush, the shop opens extra registers and calls in more baristas. When it’s quiet in the afternoon, they close the extra registers. They match capacity to demand.
In cloud computing, your apps need to be “always open” (available) and able to “call in more baristas” (scale) when traffic spikes.
High availability — keeping things running
When Peak Roasters launches their online ordering system, they can’t afford downtime. Every minute the ordering page is down, they lose sales.
High availability means: even if a server crashes, the app keeps running because it’s deployed across multiple servers (or even multiple data centres).
How Azure delivers high availability
| Mechanism | What It Does | Example |
|---|---|---|
| Redundancy | Multiple copies of your app across servers | 3 VMs behind a load balancer |
| Load balancing | Distributes traffic across healthy instances | Azure Load Balancer |
| Availability zones | Separate physical locations within a region | Zone 1, Zone 2, Zone 3 |
| Region pairs | Azure matches regions for disaster recovery | Australia East + Australia Southeast |
| Auto-restart | Failed VMs automatically restart on healthy hardware | Azure fabric controller |
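To make the redundancy and load-balancing rows concrete, here is a minimal sketch of sending requests only to healthy instances. It is a simplified round-robin illustration, not how Azure Load Balancer is implemented; the instance names, the `health` dictionary, and the `route_requests` function are all hypothetical.

```python
from itertools import cycle

def route_requests(instances, health, request_count):
    """Assign each request to the next healthy instance in round-robin order."""
    # Keep only the instances whose (hypothetical) health probe passed.
    healthy = [name for name in instances if health.get(name, False)]
    if not healthy:
        raise RuntimeError("no healthy instances available")
    rotation = cycle(healthy)
    return [next(rotation) for _ in range(request_count)]

# Hypothetical setup: three VMs, one of which has failed its health probe.
instances = ["vm-1", "vm-2", "vm-3"]
health = {"vm-1": True, "vm-2": False, "vm-3": True}

assignments = route_requests(instances, health, 4)
print(assignments)  # ['vm-1', 'vm-3', 'vm-1', 'vm-3']
```

Because `vm-2` is skipped rather than causing errors, users never notice the failure. That is the core idea behind redundancy plus load balancing.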
SLAs — measuring availability
Azure measures availability using Service Level Agreements (SLAs) — guarantees expressed as uptime percentages:
| SLA | Downtime Per Year | Downtime Per Month |
|---|---|---|
| 99% | 3.65 days | 7.3 hours |
| 99.9% | 8.76 hours | 43.8 minutes |
| 99.95% | 4.38 hours | 21.9 minutes |
| 99.99% | 52.6 minutes | 4.38 minutes |
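The figures in the table follow from simple arithmetic: allowed downtime is the length of the period multiplied by the fraction of time the service is permitted to be down. A small sketch, assuming a 365-day year and a month equal to one twelfth of a year (the function name is illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

def allowed_downtime_hours(sla_percent, period_hours=HOURS_PER_YEAR):
    """Return the downtime (in hours) an SLA permits over the given period."""
    return period_hours * (1 - sla_percent / 100)

for sla in (99.0, 99.9, 99.95, 99.99):
    yearly = allowed_downtime_hours(sla)
    # Assumption: one month is treated as 1/12 of a year (730 hours).
    monthly = allowed_downtime_hours(sla, HOURS_PER_YEAR / 12)
    print(f"{sla}%: {yearly:.2f} h/year, {monthly * 60:.1f} min/month")
```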
Key exam concept: Higher SLAs require more redundancy, which costs more. A single VM might offer 99.9% SLA. Two VMs in an availability set might offer 99.95%. Two VMs across availability zones might offer 99.99%.
Exam tip: The nines matter
The exam may ask about SLA percentages and what they translate to in real downtime. Key numbers to remember:
- 99.9% (three nines) = about 8.76 hours of downtime per year
- 99.99% (four nines) = about 52.6 minutes of downtime per year
When an application depends on several services, its combined SLA is lower than any of the individual SLAs, because every service has to be up at the same time. If Service A has 99.9% and Service B has 99.9%, together they offer 99.9% × 99.9% ≈ 99.8% uptime.
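The combined figure can be checked with a short calculation. This sketch assumes the services fail independently and that the application needs all of them at once; `composite_sla` is an illustrative helper, not an Azure API.

```python
from math import prod

def composite_sla(*sla_percents):
    """Combined SLA (percent) when every listed service must be available."""
    # Convert each percentage to a probability, multiply, convert back.
    return prod(p / 100 for p in sla_percents) * 100

combined = composite_sla(99.9, 99.9)
print(f"{combined:.4f}%")  # 99.8001%
```

Each extra dependency multiplies in another factor below 1, so the combined figure can only go down as more services are chained together.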
Scalability — matching resources to demand
Scalability means you can add (or remove) resources based on demand. There are two types:
| Feature | Vertical Scaling (Scale Up/Down) | Horizontal Scaling (Scale Out/In) |
|---|---|---|
| What changes | Size of a single resource | Number of resource instances |
| Example | Upgrade a VM from 2 CPU/4 GB to 8 CPU/32 GB | Go from 1 VM to 5 VMs behind a load balancer |
| Analogy | Replacing a small truck with a bigger truck | Adding more trucks to the fleet |
| Limit | Max hardware size of the machine | Virtually unlimited (in practice bounded by quotas and app design) |
| Downtime | Usually requires a restart | No downtime — new instances are added live |
| Best for | Databases, single-instance apps | Web apps, APIs, stateless services |
Scaling in action: Summit Construction
Summit Construction’s project portal normally handles 50 users. But during quarterly reviews, 500 project managers log in simultaneously.
Without cloud: They’d need to buy servers capable of handling 500 users — even though they only need that capacity 4 times a year. Those servers sit idle 95% of the time.
With Azure: The portal runs on 2 VMs normally. During quarterly reviews, it automatically scales out to 10 VMs. After the review, it scales back to 2. They only pay for the extra VMs during those peak periods.
Elasticity — automatic scaling
Elasticity is a specific type of scalability where resources automatically increase and decrease based on demand — without human intervention.
| Concept | Definition |
|---|---|
| Scalability | The system can handle more load by adding resources |
| Elasticity | The system automatically adds and removes resources based on actual demand |
Think of a rubber band: it stretches when pulled and snaps back when released. An elastic cloud system stretches with traffic spikes and contracts when traffic drops.
Azure services that provide elasticity:
- Virtual Machine Scale Sets — automatically add/remove VMs based on CPU, memory, or custom metrics
- Azure App Service — auto-scale web apps based on request count or schedule
- Azure Functions — scale from zero to thousands of instances automatically
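The behaviour these services share can be sketched as a simple threshold rule: add an instance when load is high, remove one when load is low, and stay within a minimum and maximum. This is a toy model, not the Azure autoscale API; the function name, the CPU thresholds (70% and 30%), and the 2 to 10 instance range are assumptions chosen for illustration.

```python
def desired_instance_count(current, avg_cpu_percent, minimum=2, maximum=10,
                           scale_out_above=70, scale_in_below=30):
    """Return the instance count after applying one autoscale evaluation."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1   # scale out: add one instance
    elif avg_cpu_percent < scale_in_below:
        target = current - 1   # scale in: remove one instance
    else:
        target = current       # load is in the comfortable band
    # Never go below the minimum or above the maximum.
    return max(minimum, min(maximum, target))

# Hypothetical sequence of average CPU readings during and after a spike.
count = 2
for cpu in (85, 90, 75, 50, 20, 10, 5):
    count = desired_instance_count(count, cpu)
    print(f"CPU {cpu}% -> {count} instances")
```

The loop grows the pool from 2 to 5 instances during the spike and shrinks it back to 2 afterwards, with no human involved. That automatic stretch and snap-back is what separates elasticity from plain scalability.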
Real-world: Harbour Health during flu season
Harbour Health’s patient portal sees 10x normal traffic during flu season. With Azure’s elasticity:
- Normal: 3 VMs, handling 500 concurrent users
- Flu season peak: Auto-scales to 15 VMs, handling 5,000 concurrent users
- After flu season: Automatically scales back to 3 VMs
Total extra cost: only the additional VMs during the 6-week peak period, not year-round.
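A rough way to see the saving is to count VM-weeks instead of money. The sketch below uses the figures from the Harbour Health example and assumes a 52-week year, identical VM sizes, and cost proportional to running time; the function names are illustrative, and real bills depend on pricing details not covered here.

```python
WEEKS_PER_YEAR = 52

def vm_weeks_elastic(baseline_vms, peak_vms, peak_weeks):
    """VM-weeks used when extra VMs run only during the peak period."""
    extra = (peak_vms - baseline_vms) * peak_weeks
    return baseline_vms * WEEKS_PER_YEAR + extra

def vm_weeks_fixed(peak_vms):
    """VM-weeks used when peak capacity runs all year."""
    return peak_vms * WEEKS_PER_YEAR

elastic = vm_weeks_elastic(baseline_vms=3, peak_vms=15, peak_weeks=6)
fixed = vm_weeks_fixed(peak_vms=15)
print(elastic, fixed)  # 228 780
```

Under these assumptions, the elastic setup uses 228 VM-weeks against 780 for permanently provisioned peak capacity, which is less than a third of the compute time.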
Knowledge Check
- Summit Construction's project portal normally serves 50 users but gets 500 users during quarterly reviews. Which scaling approach is MOST appropriate?
- An Azure VM has a 99.9% SLA. What does this mean in practical terms?
- Which cloud characteristic allows resources to automatically increase and decrease based on demand without manual intervention?
Next up: More cloud benefits — reliability, predictability, security, governance, and manageability in the cloud.