High Availability and Scalability

Why do availability and scalability matter?

Simple explanation

Imagine your favourite coffee shop.

High availability = the shop is open every day, even if one barista calls in sick. There’s always someone to make your coffee because they have backup staff.

Scalability = during the morning rush, the shop opens extra registers and calls in more baristas. When it’s quiet in the afternoon, they close the extra registers. They match capacity to demand.

In cloud computing, your apps need to be “always open” (available) and able to “call in more baristas” (scale) when traffic spikes.

High availability — keeping things running

When Peak Roasters launches their online ordering system, they can’t afford downtime. Every minute the ordering page is down, they lose sales.

High availability means: even if a server crashes, the app keeps running because it’s deployed across multiple servers (or even multiple data centres).

How Azure delivers high availability

Mechanism	What It Does	Example
Redundancy	Multiple copies of your app across servers	3 VMs behind a load balancer
Load balancing	Distributes traffic across healthy instances	Azure Load Balancer
Availability zones	Separate physical locations within a region	Zone 1, Zone 2, Zone 3
Region pairs	Azure matches regions for disaster recovery	Australia East + Australia Southeast
Auto-restart	Failed VMs automatically restart on healthy hardware	Azure fabric controller
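
To make the redundancy and load-balancing rows concrete, here is a minimal Python sketch of a balancer that routes requests round-robin and skips instances marked unhealthy. It is an illustration only: the class and method names are hypothetical, and it does not reflect how Azure Load Balancer is implemented internally.

```python
class LoadBalancer:
    """Toy round-robin balancer that only routes to healthy instances."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)  # assume all instances start healthy
        self._next = 0                      # index of the next candidate

    def mark_down(self, instance):
        """Record a failed health check for an instance."""
        self.healthy.discard(instance)

    def mark_up(self, instance):
        """Record that a known instance has recovered."""
        if instance in self.instances:
            self.healthy.add(instance)

    def route(self):
        """Return the next healthy instance in round-robin order."""
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


lb = LoadBalancer(["vm1", "vm2", "vm3"])
lb.mark_down("vm2")                    # simulate one VM crashing
print([lb.route() for _ in range(4)])  # traffic alternates between vm1 and vm3
```

With three VMs behind the balancer, losing one still leaves two to serve traffic, which is the point of the redundancy row in the table.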

SLAs — measuring availability

Azure states its availability commitments in Service Level Agreements (SLAs), expressed as uptime percentages:

SLA	Downtime Per Year	Downtime Per Month
99%	3.65 days	7.3 hours
99.9%	8.76 hours	43.8 minutes
99.95%	4.38 hours	21.9 minutes
99.99%	52.6 minutes	4.38 minutes
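
The figures in the table follow from simple arithmetic: allowed downtime is the length of the period multiplied by (1 − SLA). A minimal Python sketch, assuming a 365-day year and an average month of 730 hours (the function name is illustrative, not part of any Azure tool):

```python
HOURS_PER_YEAR = 365 * 24               # 8,760 hours; leap years ignored (assumption)
HOURS_PER_MONTH = HOURS_PER_YEAR / 12   # 730 hours in an average month

def allowed_downtime_hours(sla_percent, period_hours):
    """Hours of downtime an SLA permits over a period of the given length."""
    return period_hours * (1 - sla_percent / 100)

for sla in (99.0, 99.9, 99.95, 99.99):
    per_year = allowed_downtime_hours(sla, HOURS_PER_YEAR)
    per_month_minutes = allowed_downtime_hours(sla, HOURS_PER_MONTH) * 60
    print(f"{sla}%: {per_year:.2f} h/year, {per_month_minutes:.1f} min/month")
```

For 99.9% this gives 8.76 hours per year and 43.8 minutes per month, matching the table.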

Key exam concept: Higher SLAs require more redundancy, which costs more. A single VM might offer 99.9% SLA. Two VMs in an availability set might offer 99.95%. Two VMs across availability zones might offer 99.99%.

Exam tip: The nines matter

The exam may ask about SLA percentages and what they translate to in real downtime. Key numbers to remember:

99.9% (three nines) = about 8.76 hours of downtime per year
99.99% (four nines) = about 52.6 minutes of downtime per year

When an application depends on several services, the combined (composite) SLA is lower than any of the individual SLAs. If Service A has 99.9% and Service B has 99.9%, and the app needs both to be up, together they offer 99.9% × 99.9% ≈ 99.8% uptime.
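
The multiplication can be sketched in a few lines of Python. This assumes the services fail independently and that the application needs all of them at once; the function name is illustrative.

```python
from math import prod

def composite_sla(*sla_percents):
    """Composite SLA (in percent) for services that must all be available."""
    return 100 * prod(p / 100 for p in sla_percents)

# Two services at 99.9% each, as in the example above
print(f"{composite_sla(99.9, 99.9):.4f}%")
```

The result for two 99.9% services is 99.8001%, usually rounded to 99.8%. Each extra dependency multiplies in another factor below 1, so the composite figure can only go down.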

Scalability — matching resources to demand

Scalability means you can add (or remove) resources based on demand. There are two types:

Vertical vs horizontal scaling
Feature	Vertical Scaling (Scale Up/Down)	Horizontal Scaling (Scale Out/In)
What changes	Size of a single resource	Number of resource instances
Example	Upgrade a VM from 2 CPU/4 GB to 8 CPU/32 GB	Go from 1 VM to 5 VMs behind a load balancer
Analogy	Replacing a small truck with a bigger truck	Adding more trucks to the fleet
Limit	Max hardware size of the machine	Virtually unlimited
Downtime	Usually requires a restart	No downtime — new instances are added live
Best for	Databases, single-instance apps	Web apps, APIs, stateless services

Scaling in action: Summit Construction

Summit Construction’s project portal normally handles 50 users. But during quarterly reviews, 500 project managers log in simultaneously.

Without cloud: They’d need to buy servers capable of handling 500 users — even though they only need that capacity 4 times a year. Those servers sit idle 95% of the time.

With Azure: The portal runs on 2 VMs normally. During quarterly reviews, it automatically scales out to 10 VMs. After the review, it scales back to 2. They only pay for the extra VMs during those peak periods.
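
Because Summit's peaks are predictable, one way such scaling could be driven is a simple schedule. The Python sketch below is a hypothetical illustration: the review dates are invented, and only the VM counts (2 and 10) come from the scenario.

```python
from datetime import date

BASELINE_VMS = 2   # normal load, from the scenario
PEAK_VMS = 10      # quarterly review load, from the scenario

# Hypothetical review windows (start and end dates, inclusive)
REVIEW_WINDOWS = [
    (date(2025, 3, 24), date(2025, 3, 28)),
    (date(2025, 6, 23), date(2025, 6, 27)),
    (date(2025, 9, 22), date(2025, 9, 26)),
    (date(2025, 12, 15), date(2025, 12, 19)),
]

def target_vm_count(today, windows=REVIEW_WINDOWS):
    """Return how many VMs should be running on the given date."""
    in_review = any(start <= today <= end for start, end in windows)
    return PEAK_VMS if in_review else BASELINE_VMS

print(target_vm_count(date(2025, 3, 25)))  # inside a review window
print(target_vm_count(date(2025, 4, 1)))   # ordinary day
```

A schedule suits load that arrives at known times; unpredictable spikes are usually better handled by rules that react to live metrics.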

Elasticity — automatic scaling

Elasticity is a specific type of scalability where resources automatically increase and decrease based on demand — without human intervention.

Concept	Definition
Scalability	The system can handle more load by adding resources
Elasticity	The system automatically adds and removes resources based on actual demand

Think of a rubber band: it stretches when pulled and snaps back when released. An elastic cloud system stretches with traffic spikes and contracts when traffic drops.

Azure services that provide elasticity:

Virtual Machine Scale Sets: automatically add/remove VMs based on CPU, memory, or custom metrics
Azure App Service: auto-scale web apps based on request count or schedule
Azure Functions: scale automatically from zero to many instances (the upper limit depends on the hosting plan)
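
The idea behind metric-based autoscaling can be sketched as a small decision rule: scale out when average CPU is high, scale in when it is low, and stay within fixed bounds. The Python below is a simplified illustration, not Azure's autoscale engine; the thresholds, limits, and function name are all assumptions.

```python
def next_instance_count(current, avg_cpu_percent, *,
                        min_instances=2, max_instances=10,
                        scale_out_above=70.0, scale_in_below=30.0, step=1):
    """Decide the instance count for the next interval from average CPU."""
    if avg_cpu_percent > scale_out_above:
        desired = current + step      # under pressure: add capacity
    elif avg_cpu_percent < scale_in_below:
        desired = current - step      # mostly idle: release capacity
    else:
        desired = current             # within the comfortable band
    # Never go below the floor or above the ceiling
    return max(min_instances, min(max_instances, desired))

print(next_instance_count(3, 85.0))   # busy: one more instance
print(next_instance_count(3, 20.0))   # quiet: one fewer instance
```

Keeping a gap between the two thresholds is a common design choice, since it stops the system from adding and removing instances on every small fluctuation.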

Real-world: Harbour Health during flu season

Harbour Health’s patient portal sees 10x normal traffic during flu season. With Azure’s elasticity:

Normal: 3 VMs, handling 500 concurrent users
Flu season peak: Auto-scales to 15 VMs, handling 5,000 concurrent users
After flu season: Automatically scales back to 3 VMs

Total extra cost: only the additional VMs during the 6-week peak period, not year-round.
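
The saving can be estimated in VM-hours rather than currency, since the scenario gives no prices. This Python sketch assumes all 15 VMs run for the full 6 weeks (an upper bound) and compares that with keeping the 12 extra VMs on all year; the function name is illustrative.

```python
HOURS_PER_WEEK = 7 * 24
HOURS_PER_YEAR = 365 * 24

def extra_vm_hours(baseline_vms, peak_vms, peak_hours):
    """VM-hours consumed by the instances above the baseline."""
    return (peak_vms - baseline_vms) * peak_hours

elastic = extra_vm_hours(3, 15, 6 * HOURS_PER_WEEK)   # 12 extra VMs for 6 weeks
always_on = extra_vm_hours(3, 15, HOURS_PER_YEAR)     # 12 extra VMs all year
print(elastic, always_on, round(elastic / always_on * 100, 1))
```

Under these assumptions the elastic setup uses 12,096 extra VM-hours against 105,120 for year-round provisioning, roughly 11.5% of the always-on figure.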

Flashcards

Question

What is high availability in cloud computing?

Answer

The ability of a system to remain operational and accessible even when components fail. Achieved through redundancy, load balancing, availability zones, and region pairs.

Question

What is the difference between vertical scaling and horizontal scaling?

Answer

Vertical scaling (scale up) = making a single resource bigger (more CPU, RAM). Horizontal scaling (scale out) = adding more instances of a resource. Horizontal is preferred for cloud apps because it's virtually unlimited and requires no downtime.

Question

What is the difference between scalability and elasticity?

Answer

Scalability means the system CAN handle more load by adding resources. Elasticity means the system AUTOMATICALLY adjusts resources based on demand, adding capacity during peaks and releasing it during quiet periods.

Question

If Service A has a 99.9% SLA and Service B has a 99.9% SLA, what is the combined SLA?

Answer

About 99.8%, calculated by multiplying: 0.999 × 0.999 = 0.998001. When an application depends on both services, the combined SLA is always LOWER than either individual SLA.

Knowledge Check

Summit Construction's project portal normally serves 50 users but gets 500 users during quarterly reviews. Which scaling approach is MOST appropriate?

An Azure VM has a 99.9% SLA. What does this mean in practical terms?

Which cloud characteristic allows resources to automatically increase and decrease based on demand without manual intervention?

Next up: More cloud benefits — reliability, predictability, security, governance, and manageability in the cloud.