AI-300 Study Guide

Domain 1: Design and Implement an MLOps Infrastructure

  • ML Workspace: Your AI Control Room Free
  • Data, Environments & Components
  • Compute Targets: Choosing the Right Engine
  • Infrastructure as Code: Provisioning at Scale
  • Git & CI/CD for ML Projects

Domain 2: Implement Machine Learning Model Lifecycle and Operations

  • MLflow: Track Every Experiment Free
  • AutoML & Hyperparameter Tuning
  • Training Pipelines: Automate Everything
  • Distributed Training: Scale to Big Data
  • Model Registration & Versioning
  • Model Approval & Responsible AI Gates
  • Deploying Models: Endpoints in Production
  • Drift, Monitoring & Retraining

Domain 3: Design and Implement a GenAIOps Infrastructure

  • Foundry: Hubs, Projects & Platform Setup Free
  • Network Security & IaC for Foundry
  • Deploying Foundation Models
  • Model Versioning & Production Strategies
  • PromptOps: Design, Compare, Version & Ship

Domain 4: Implement Generative AI Quality Assurance and Observability

  • Evaluation: Datasets, Metrics & Quality Gates Free
  • Safety Evaluations & Custom Metrics
  • Monitoring GenAI in Production
  • Cost Tracking, Logging & Debugging

Domain 5: Optimize Generative AI Systems and Model Performance

  • RAG Optimization: Better Retrieval, Better Answers Free
  • Embeddings & Hybrid Search
  • Fine-Tuning: Methods, Data & Production

Domain 1: Design and Implement an MLOps Infrastructure · ⏱ ~12 min read

Compute Targets: Choosing the Right Engine

Training needs horsepower. Inference needs reliability. Learn to select, configure, and optimise Azure ML compute targets — from cheap dev instances to GPU clusters.

Why compute matters in MLOps

☕ Simple explanation

Think of compute like a car rental.

For daily errands (exploring data, writing code), you rent a small hatchback — cheap, always available. For moving day (training a large model), you rent a truck — expensive, but you only need it for a few hours. For a taxi service (serving predictions to users), you need a reliable sedan that’s always running.

Azure ML gives you different “vehicles” for each job. Picking the wrong one means either wasting money or waiting too long.

Azure ML compute targets serve three distinct workloads:

  • Development — Interactive notebooks, debugging, small experiments (compute instances)
  • Training — Large-scale model training, hyperparameter sweeps (compute clusters, serverless compute)
  • Inference — Serving predictions in real-time or batch (managed endpoints)

Each has different cost profiles, scaling behaviour, and configuration options. The AI-300 exam tests your ability to select the right compute for each scenario.

Compute types at a glance

Azure ML compute options
| Feature | Scaling | GPU Support | Cost Model | Best For |
|---|---|---|---|---|
| Compute Instance | Single VM (no scaling) | Yes | Pay while running | Dev, notebooks, debugging |
| Compute Cluster | 0 to N nodes (auto-scale) | Yes | Pay per node-minute, scale to 0 | Training jobs, sweeps, pipelines |
| Serverless Compute | On-demand (fully managed) | Yes | Pay per job | Burst training, no infra management |
| Kubernetes (AKS) | Managed by AKS | Yes | AKS pricing | Hybrid, multi-cloud, existing clusters |
| Managed Endpoint | Auto-scale instances | Yes | Pay per instance-hour | Real-time inference |
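
The "Best For" column above boils down to a simple mapping. A small hypothetical helper (purely illustrative — no such function exists in any Azure SDK) makes the decision explicit:

```python
def pick_compute(workload: str) -> str:
    """Map a workload category to the usual Azure ML compute choice.

    Hypothetical helper encoding the comparison table above;
    the category names are made up for this sketch.
    """
    mapping = {
        "dev": "Compute Instance",               # notebooks, debugging
        "training": "Compute Cluster",           # jobs, sweeps, pipelines
        "burst-training": "Serverless Compute",  # no infra management
        "hybrid": "Kubernetes (AKS)",            # existing/multi-cloud clusters
        "real-time-inference": "Managed Endpoint",
    }
    return mapping[workload]

print(pick_compute("training"))   # Compute Cluster
```

In exam scenarios, identifying which of these five categories the question describes is usually the whole battle; the compute choice then falls out of the table.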

Compute instances: your personal dev machine

A compute instance is a single managed VM for interactive development. Think of it as your cloud-based workstation.

# Create a compute instance
az ml compute create \
  --name kai-dev-vm \
  --type ComputeInstance \
  --size Standard_DS3_v2 \
  --resource-group rg-ml-dev \
  --workspace-name neuralspark-dev

Key facts:

  • One user per instance — assigned to a specific identity
  • Doesn’t auto-scale — it’s a single VM
  • Schedule auto-shutdown to save costs (critical for exam scenarios)
  • Can run Jupyter notebooks, VS Code, and terminal directly

Scenario: Kai's cost-saving compute schedule

NeuralSpark’s cloud bill spiked because data scientists left compute instances running overnight. Kai’s fix:

  1. Enable auto-shutdown at 7 PM for all dev instances
  2. Enable idle shutdown after 60 minutes of inactivity
  3. Use Standard_DS3_v2 (4 vCPUs, 14 GB RAM, no GPU) for most dev work
  4. Create a separate GPU instance (Standard_NC6s_v3) only for model prototyping — with a 30-minute idle shutdown

Monthly savings: ~60% reduction in dev compute costs.
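
The ~60% figure is easy to sanity-check with back-of-envelope arithmetic. The hourly rate and usage pattern below are illustrative assumptions, not real Azure pricing:

```python
RATE = 0.30                  # assumed $/hour for a dev-sized VM (illustrative)

always_on = 24 * 30 * RATE   # instance left running the whole month
# With a 7 PM auto-shutdown plus a 60-minute idle shutdown, assume
# roughly 9 billed hours per weekday across 22 weekdays:
scheduled = 9 * 22 * RATE

savings = 1 - scheduled / always_on
print(f"saved {savings:.1%} of the dev-compute bill")
```

Under these assumptions the saving is above 70%, in the same ballpark as Kai's ~60% once weekend usage and the GPU prototyping instance are factored in.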

Compute clusters: scaling for training

A compute cluster scales from 0 to N nodes based on demand. When no jobs are running, it scales to zero nodes — you pay nothing.

from azure.ai.ml.entities import AmlCompute

cluster = AmlCompute(
    name="gpu-training-cluster",
    type="amlcompute",
    size="Standard_NC24ads_A100_v4",
    min_instances=0,         # Scale to zero when idle
    max_instances=4,         # Up to 4 GPU nodes
    idle_time_before_scale_down=300,  # 5 minutes idle before removing a node
    tier="Dedicated"         # Dedicated (vs low-priority/spot)
)
ml_client.compute.begin_create_or_update(cluster).result()

What’s happening:

  • size="Standard_NC24ads_A100_v4" — A100 GPUs, serious training hardware
  • min_instances=0 — the key cost control: no charges while the cluster is idle
  • max_instances=4 — up to four nodes for distributed training or parallel sweeps
  • idle_time_before_scale_down=300 — nodes stay alive 5 minutes after a job ends, avoiding a cold start for the next job
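
To see why min_instances=0 plus a short idle window is cheap, here is a toy billing model (a simplification for illustration, not Azure's actual metering) for a single-node cluster:

```python
def billed_node_minutes(job_minutes, idle_scale_down=5):
    """Toy model: each job is billed for its runtime plus the idle
    window before the node scales back to zero. Assumes jobs are far
    enough apart that the node de-allocates between them."""
    return sum(m + idle_scale_down for m in job_minutes)

# Three training jobs run during the day; the cluster is idle otherwise.
print(billed_node_minutes([30, 45, 20]))   # 110 -- vs 1440 for an always-on node
```

The idle window is a small surcharge per job in exchange for skipping a cold start whenever jobs arrive back to back.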

Dedicated vs low-priority compute

| | Dedicated | Low-Priority (Spot) |
|---|---|---|
| Cost | Full price | 60-80% discount |
| Availability | Guaranteed | Can be pre-empted anytime |
| Best for | Production training, time-sensitive jobs | Long-running experiments, hyperparameter sweeps |
| Risk | None | Job may be interrupted and needs retry logic |

💡 Exam tip: When to use low-priority

The exam often tests cost optimisation scenarios. Key rule:

  • Low-priority is safe for: hyperparameter sweeps (many short jobs, can restart), non-urgent exploration
  • Low-priority is NOT safe for: production training on a deadline, single long-running job with no checkpointing

If the question mentions “budget constraints” and “hyperparameter tuning,” low-priority is usually correct.
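
The economics hold even when pre-emptions force reruns. The discount, pre-emption rate, and per-trial cost below are assumed, illustrative numbers:

```python
# Expected cost of a 200-trial sweep: spot vs dedicated (normalised units).
trials = 200
cost_per_trial = 1.0          # normalised dedicated cost per trial
spot_discount = 0.75          # within the 60-80% band above (assumed)
preemption_rate = 0.25        # fraction of trials interrupted and rerun once

dedicated = trials * cost_per_trial
spot = trials * cost_per_trial * (1 - spot_discount) * (1 + preemption_rate)

print(dedicated, spot)        # 200.0 62.5 -- spot wins despite the retries
```

A 25% retry overhead barely dents a 75% discount, which is why sweeps of many short, restartable trials are the canonical spot workload.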

Serverless compute

Serverless compute removes infrastructure management entirely. You submit a job, Azure provisions compute automatically, and charges per job.

# In a training job YAML
compute: serverless
resources:
  instance_type: Standard_NC24ads_A100_v4
  instance_count: 2

When to choose serverless over clusters:

  • Burst workloads — unpredictable training demand
  • No cluster management — don’t want to manage min/max nodes
  • Quick experiments — no setup needed

When to choose clusters over serverless:

  • Sustained workloads — cluster stays warm between jobs
  • Fine-grained control — node images, identity, networking
  • Cost predictability — known budget for allocated nodes
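
One way to quantify the "stays warm" advantage is to count cold starts. The provisioning time and job volume below are assumed figures for illustration only:

```python
COLD_START_MIN = 6       # assumed VM provision + environment pull time per job
jobs_per_day = 20        # steady stream of back-to-back training jobs

# Serverless provisions fresh compute for each job submission:
serverless_wait = jobs_per_day * COLD_START_MIN
# A cluster whose idle_time_before_scale_down outlasts the gaps between
# jobs cold-starts only once, then stays warm:
cluster_wait = 1 * COLD_START_MIN

print(serverless_wait, cluster_wait)   # 120 6 (minutes of waiting per day)
```

For sustained workloads that queueing overhead compounds daily, which is the practical case for keeping a cluster; for occasional bursts it is negligible and serverless wins on simplicity.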

Inference compute (managed endpoints)

Models in production need dedicated compute for serving predictions. Azure ML managed online endpoints handle this:

# Create a managed online endpoint
az ml online-endpoint create \
  --name churn-predictor-endpoint \
  --resource-group rg-ml-prod \
  --workspace-name neuralspark-prod

Inference compute is covered in detail in Module 12 (Deploying Models). For now, know that:

  • Real-time endpoints use dedicated VMs that auto-scale based on request volume
  • Batch endpoints spin up compute clusters to process large datasets
  • You choose the VM size (SKU) for the endpoint deployment
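
Choosing the SKU and instance count for a real-time deployment usually starts from a measured per-instance throughput. The numbers below are hypothetical benchmark figures, not Azure-published values:

```python
peak_rps = 450            # peak requests per second the endpoint must serve
per_instance_rps = 60     # measured capacity of one VM of the chosen SKU

# Add 20% headroom, then round up; integer math avoids float surprises.
target = peak_rps * 12 // 10
instances = -(-target // per_instance_rps)   # ceiling division

print(instances)          # 9 instances needed at peak
```

The same arithmetic sets sensible autoscale bounds: minimum instances sized for baseline traffic, maximum sized for peak plus headroom.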

Key terms flashcards

Q: Compute instance vs compute cluster?
A: Instance: single VM for dev/notebooks, one user, no auto-scaling. Cluster: 0 to N nodes for training, auto-scales to zero when idle, shared by many jobs.

Q: What does min_instances=0 do on a compute cluster?
A: It enables scale-to-zero — when no jobs are queued, all nodes are de-allocated and you pay nothing. This is the most important cost control for training clusters.

Q: When should you use low-priority (spot) compute?
A: For cost-sensitive, interruptible workloads like hyperparameter sweeps. NOT for time-critical production training or long-running jobs without checkpointing.

Q: Serverless compute vs compute clusters?
A: Serverless: zero management, pay per job, good for burst. Clusters: more control, stay warm between jobs, better for sustained workloads and cost predictability.

Knowledge check

  1. NeuralSpark's cloud bill is too high. Kai discovers that data scientists are leaving compute instances running 24/7. What is the most effective fix?
  2. Dr. Fatima is running a hyperparameter sweep with 200 trial combinations for a fraud detection model. The deadline is flexible — it just needs to finish this week. How should she configure compute to minimise cost?

Next up: Infrastructure as Code — deploying ML infrastructure with Bicep and GitHub Actions.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.