Compute Targets: Choosing the Right Engine
Training needs horsepower. Inference needs reliability. Learn to select, configure, and optimise Azure ML compute targets — from cheap dev instances to GPU clusters.
Why compute matters in MLOps
Think of compute like a car rental.
For daily errands (exploring data, writing code), you rent a small hatchback — cheap, always available. For moving day (training a large model), you rent a truck — expensive, but you only need it for a few hours. For a taxi service (serving predictions to users), you need a reliable sedan that’s always running.
Azure ML gives you different “vehicles” for each job. Picking the wrong one means either wasting money or waiting too long.
Compute types at a glance
| Feature | Scaling | GPU Support | Cost Model | Best For |
|---|---|---|---|---|
| Compute Instance | Single VM (no scaling) | Yes | Pay while running | Dev, notebooks, debugging |
| Compute Cluster | 0 to N nodes (auto-scale) | Yes | Pay per node-minute, scale to 0 | Training jobs, sweeps, pipelines |
| Serverless Compute | On-demand (fully managed) | Yes | Pay per job | Burst training, no infra management |
| Kubernetes (AKS) | Managed by AKS | Yes | AKS pricing | Hybrid, multi-cloud, existing clusters |
| Managed Endpoint | Auto-scale instances | Yes | Pay per instance-hour | Real-time inference |
Compute instances: your personal dev machine
A compute instance is a single managed VM for interactive development. Think of it as your cloud-based workstation.
# Create a compute instance
az ml compute create \
  --name kai-dev-vm \
  --type ComputeInstance \
  --size Standard_DS3_v2 \
  --resource-group rg-ml-dev \
  --workspace-name neuralspark-dev
Key facts:
- One user per instance — assigned to a specific identity
- Doesn’t auto-scale — it’s a single VM
- Schedule auto-shutdown to save costs (critical for exam scenarios)
- Can run Jupyter notebooks, VS Code, and terminal directly
Scenario: Kai's cost-saving compute schedule
NeuralSpark’s cloud bill spiked because data scientists left compute instances running overnight. Kai’s fix:
- Enable auto-shutdown at 7 PM for all dev instances
- Enable idle shutdown after 60 minutes of inactivity
- Use Standard_DS3_v2 (14 GB RAM, no GPU) for most dev work
- Create a separate GPU instance (Standard_NC6s_v3) only for model prototyping — with a 30-minute idle shutdown
Monthly savings: ~60% reduction in dev compute costs.
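The ~60% figure falls out of simple arithmetic. Here is a rough sketch — the hourly rate is a hypothetical placeholder, not real Azure pricing; check the Azure pricing page for actual SKU rates:

```python
# Rough arithmetic behind the ~60% saving. DS3_RATE is a hypothetical
# $/hour figure for illustration only -- real rates vary by region.
DS3_RATE = 0.30  # $/hour, assumed rate for Standard_DS3_v2

def monthly_cost(hours_per_day: float, rate: float, days: int = 30) -> float:
    """Cost of one compute instance running hours_per_day each day."""
    return hours_per_day * rate * days

always_on = monthly_cost(24, DS3_RATE)  # left running 24/7
scheduled = monthly_cost(9, DS3_RATE)   # ~9 working hours/day with auto-shutdown
saving = 1 - scheduled / always_on

print(f"always on: ${always_on:.2f}/month")
print(f"scheduled: ${scheduled:.2f}/month")
print(f"saving:    {saving:.0%}")
```

Idle shutdown adds further savings on top of this, since instances stop even during working hours when nobody is using them.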
Compute clusters: scaling for training
A compute cluster scales from 0 to N nodes based on demand. When no jobs are running, it scales to zero nodes — you pay nothing.
from azure.ai.ml.entities import AmlCompute

cluster = AmlCompute(
    name="gpu-training-cluster",
    type="amlcompute",
    size="Standard_NC24ads_A100_v4",
    min_instances=0,                  # Scale to zero when idle
    max_instances=4,                  # Up to 4 GPU nodes
    idle_time_before_scale_down=300,  # 5 minutes idle before removing a node
    tier="Dedicated",                 # Dedicated (vs low-priority/spot)
)
ml_client.compute.begin_create_or_update(cluster).result()
What’s happening:
- size="Standard_NC24ads_A100_v4" — A100 GPUs, serious training hardware
- min_instances=0 is the key cost control — no idle charges
- max_instances=4 — up to 4 nodes for distributed training or parallel sweeps
- idle_time_before_scale_down=300 — nodes stay alive 5 minutes after a job ends (avoids cold-start for the next job)
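The scaling behaviour can be sketched as a toy model — this illustrates what the parameters mean, not the real Azure ML scheduler:

```python
# Toy model of cluster autoscaling (an illustration, not Azure's scheduler):
# nodes grow with the job queue up to max_instances, and an idle node is
# released idle_time_before_scale_down seconds after its last job ends.

def nodes_needed(pending_jobs: int, max_instances: int) -> int:
    """One node per queued job, capped at max_instances; 0 when idle."""
    return min(pending_jobs, max_instances)

def scale_down_time(job_end: float, idle_time_before_scale_down: float) -> float:
    """Moment an idle node is removed after its last job finishes."""
    return job_end + idle_time_before_scale_down

print(nodes_needed(0, 4))            # empty queue: scale to zero, no charges
print(nodes_needed(10, 4))           # 10 queued trials: capped at max_instances
print(scale_down_time(1000.0, 300))  # node released 5 minutes after job end
```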
Dedicated vs low-priority compute
| | Dedicated | Low-Priority (Spot) |
|---|---|---|
| Cost | Full price | 60-80% discount |
| Availability | Guaranteed | Can be pre-empted anytime |
| Best for | Production training, time-sensitive jobs | Long-running experiments, hyperparameter sweeps |
| Risk | None | Job may be interrupted and needs retry logic |
Exam tip: When to use low-priority
The exam often tests cost optimisation scenarios. Key rule:
- Low-priority is safe for: hyperparameter sweeps (many short jobs, can restart), non-urgent exploration
- Low-priority is NOT safe for: production training on a deadline, single long-running job with no checkpointing
If the question mentions “budget constraints” and “hyperparameter tuning,” low-priority is usually correct.
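The retry logic that low-priority jobs need can be sketched as follows — the 5% per-step pre-emption chance, the trial structure, and the checkpointing scheme are all illustrative assumptions, not real Azure spot behaviour:

```python
import random

# Why low-priority compute needs retry logic: a trial may be pre-empted
# mid-run, so progress is checkpointed and the trial resumes where it
# left off instead of starting over.

def run_trial(checkpoint: int, steps: int, rng: random.Random):
    """Run one trial from its checkpoint; return (finished, new_checkpoint)."""
    for step in range(checkpoint, steps):
        if rng.random() < 0.05:   # simulated spot pre-emption
            return False, step    # interrupted: progress saved up to `step`
    return True, steps

def run_sweep(n_trials: int, steps: int = 20, seed: int = 0) -> int:
    """Run every trial to completion, resubmitting pre-empted ones."""
    rng = random.Random(seed)
    retries = 0
    for _ in range(n_trials):
        done, ckpt = False, 0
        while not done:
            done, ckpt = run_trial(ckpt, steps, rng)
            if not done:
                retries += 1      # resubmit from the saved checkpoint
    return retries

print(f"200 trials finished with {run_sweep(200)} pre-emption retries")
```

Without checkpointing, each pre-emption would restart a trial from scratch — which is why a single long-running job with no checkpoints is a poor fit for low-priority compute.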
Serverless compute
Serverless compute removes infrastructure management entirely. You submit a job, Azure provisions compute automatically, and charges per job.
# In a training job YAML
compute: serverless
resources:
  instance_type: Standard_NC24ads_A100_v4
  instance_count: 2
When to choose serverless over clusters:
- Burst workloads — unpredictable training demand
- No cluster management — don’t want to manage min/max nodes
- Quick experiments — no setup needed
When to choose clusters over serverless:
- Sustained workloads — cluster stays warm between jobs
- Fine-grained control — node images, identity, networking
- Cost predictability — known budget for allocated nodes
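One way to reason about this trade-off is utilisation: a cluster bills for every hour a node is allocated (busy or idle while warm), while serverless bills only for job runtime. A rough sketch, using a hypothetical node-hour rate:

```python
# Rough cluster-vs-serverless comparison. RATE is a hypothetical
# $/node-hour figure; real pricing differs by SKU and region.
RATE = 3.40  # assumed $/node-hour for a GPU SKU

def cluster_cost(allocated_node_hours: float, rate: float) -> float:
    """Pay for every hour a node is allocated, busy or idle."""
    return allocated_node_hours * rate

def serverless_cost(job_hours: float, rate: float) -> float:
    """Pay only for the hours jobs actually run."""
    return job_hours * rate

# 100 node-hours of actual training in a month:
print(serverless_cost(100, RATE))     # serverless: pay for the 100 hours
print(cluster_cost(100 / 0.8, RATE))  # busy cluster at 80% utilisation
print(cluster_cost(100 / 0.2, RATE))  # sparse cluster at 20% utilisation
```

The higher a cluster's utilisation, the closer its cost gets to serverless — which is why sustained workloads favour clusters and bursty ones favour serverless.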
Inference compute (managed endpoints)
Models in production need dedicated compute for serving predictions. Azure ML managed online endpoints handle this:
# Create a managed online endpoint
az ml online-endpoint create \
  --name churn-predictor-endpoint \
  --resource-group rg-ml-prod \
  --workspace-name neuralspark-prod
Inference compute is covered in detail in Module 12 (Deploying Models). For now, know that:
- Real-time endpoints use dedicated VMs that auto-scale based on request volume
- Batch endpoints spin up compute clusters to process large datasets
- You choose the VM size (SKU) for the endpoint deployment
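Choosing the SKU and instance count usually starts as a back-of-envelope sizing exercise. A sketch — the per-instance throughput is an assumption you would measure with a load test, not an Azure-published figure:

```python
import math

# Back-of-envelope endpoint sizing: how many instances of a SKU are
# needed for an expected request volume. rps_per_instance is a measured
# (here: assumed) throughput figure for your model on that SKU.

def instances_needed(expected_rps: float, rps_per_instance: float,
                     headroom: float = 0.7) -> int:
    """Keep each instance below `headroom` utilisation at expected load."""
    return max(1, math.ceil(expected_rps / (rps_per_instance * headroom)))

print(instances_needed(50, 25))  # 50 req/s, each instance handles ~25 req/s
print(instances_needed(5, 25))   # light traffic still needs one instance
```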
Knowledge check
NeuralSpark's cloud bill is too high. Kai discovers that data scientists are leaving compute instances running 24/7. What is the most effective fix?
Dr. Fatima is running a hyperparameter sweep with 200 trial combinations for a fraud detection model. The deadline is flexible — it just needs to finish this week. How should she configure compute to minimise cost?
Next up: Infrastructure as Code — deploying ML infrastructure with Bicep and GitHub Actions.