AI-300 Study Guide

Domain 1: Design and Implement an MLOps Infrastructure

  • ML Workspace: Your AI Control Room Free
  • Data, Environments & Components
  • Compute Targets: Choosing the Right Engine
  • Infrastructure as Code: Provisioning at Scale
  • Git & CI/CD for ML Projects

Domain 2: Implement Machine Learning Model Lifecycle and Operations

  • MLflow: Track Every Experiment Free
  • AutoML & Hyperparameter Tuning
  • Training Pipelines: Automate Everything
  • Distributed Training: Scale to Big Data
  • Model Registration & Versioning
  • Model Approval & Responsible AI Gates
  • Deploying Models: Endpoints in Production
  • Drift, Monitoring & Retraining

Domain 3: Design and Implement a GenAIOps Infrastructure

  • Foundry: Hubs, Projects & Platform Setup Free
  • Network Security & IaC for Foundry
  • Deploying Foundation Models
  • Model Versioning & Production Strategies
  • PromptOps: Design, Compare, Version & Ship

Domain 4: Implement Generative AI Quality Assurance and Observability

  • Evaluation: Datasets, Metrics & Quality Gates Free
  • Safety Evaluations & Custom Metrics
  • Monitoring GenAI in Production
  • Cost Tracking, Logging & Debugging

Domain 5: Optimize Generative AI Systems and Model Performance

  • RAG Optimization: Better Retrieval, Better Answers Free
  • Embeddings & Hybrid Search
  • Fine-Tuning: Methods, Data & Production

Domain 2: Implement Machine Learning Model Lifecycle and Operations Premium ⏱ ~14 min read

Training Pipelines: Automate Everything

Stop running scripts manually. Build Azure ML pipelines that chain data prep, training, evaluation, and registration into reproducible, automated workflows.

Why pipelines?

☕ Simple explanation

Imagine a car assembly line vs building a car by hand.

Building by hand: one person does everything — welding, painting, engine, interior. If they're sick, nothing happens. If a step fails, you start over.

Assembly line: each station does one job. Raw metal goes in, a finished car comes out. If the painting station fails, you fix just that station. You can run the line 24/7.

ML pipelines are the assembly line for model training. Data prep → feature engineering → training → evaluation → registration. Each step is a reusable component. The whole pipeline runs automatically, logs everything, and can be triggered by GitHub Actions.

An Azure ML pipeline is a workflow of connected steps (components) that automates the ML training lifecycle:

  • Modularity — each step is a component with defined inputs/outputs
  • Reusability — components can be shared across pipelines via registries
  • Caching — unchanged steps are skipped on re-run (saves time and compute)
  • Parallelism — independent steps run simultaneously
  • Traceability — every step logs to MLflow, creating an end-to-end audit trail
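The execution-order and parallelism points follow from a pipeline being a directed acyclic graph (DAG). As a toy sketch in plain Python (not the Azure ML SDK; the step names are illustrative), here is how an engine can derive run order purely from which outputs feed which inputs:

```python
# Toy sketch: deriving pipeline execution order from step wiring alone,
# the way a pipeline engine builds a DAG from output-to-input connections.
# (Illustrative only; this is not Azure ML code.)
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each step maps to the set of steps whose outputs it consumes.
dependencies = {
    "prepare_data": set(),                              # reads only the raw input
    "train_model": {"prepare_data"},                    # needs cleaned_data
    "evaluate_model": {"prepare_data", "train_model"},  # needs test_data + model
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['prepare_data', 'train_model', 'evaluate_model']
```

Steps with no unmet dependencies are ready to run; if two steps were both ready at the same time, they could run in parallel.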

Notebooks vs scripts vs pipelines

| Feature | Reproducible | Automatable | Production-Ready | Best For |
|---|---|---|---|---|
| Notebooks (.ipynb) | Low — cell order matters | Hard — requires conversion | No | Exploration, EDA, prototyping |
| Scripts (.py) | Medium — deterministic | Yes — CLI/SDK submission | Partial | Single training jobs, simple workflows |
| Pipelines | High — defined DAG | Yes — CI/CD triggers | Yes | Production training, multi-step workflows |
💡 Exam tip: Notebooks in production

The exam recognises notebooks for exploration and experimentation but NOT for production training. If a question asks “what should a team use for production model training,” the answer is pipelines (or scripts submitted as jobs), never notebooks.

Notebooks are great for:

  • Exploratory data analysis (EDA)
  • Rapid prototyping
  • Sharing results with stakeholders (visual outputs)

But they fail in production because:

  • Cell execution order is fragile
  • Hard to parameterise for different datasets
  • Difficult to test and version reliably
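For the middle option in the table, a script submitted as a job, a minimal command-job YAML might look like the sketch below. The schema URL is Azure ML's real CLI v2 one; the script, data asset, environment, and compute names are placeholders:

```yaml
# jobs/train-job.yaml -- a single script submitted as a job (names are placeholders)
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
type: command
code: ./src                                            # folder containing train.py
command: python train.py --data ${{inputs.training_data}}
inputs:
  training_data:
    type: uri_folder
    path: azureml:churn-data:2
environment: azureml:sklearn-env:1
compute: azureml:cpu-cluster
experiment_name: churn-single-jobs
```

This gets you automation and tracking for a single step, but not the multi-step dependency chain, caching, or gates that a pipeline provides.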

Building a pipeline with Python SDK v2

from azure.ai.ml import load_component, Input
from azure.ai.ml.dsl import pipeline

# Load reusable components from YAML definitions
prepare_data = load_component(source="components/prepare/component.yaml")
train_model = load_component(source="components/train/component.yaml")
evaluate_model = load_component(source="components/evaluate/component.yaml")

@pipeline(
    display_name="churn-training-pipeline",
    compute="gpu-training-cluster",
    experiment_name="churn-pipeline-runs"
)
def churn_pipeline(raw_data: Input, target_metric: float = 0.90):
    # Step 1: Data preparation
    prep_step = prepare_data(input_data=raw_data)

    # Step 2: Training (uses output from step 1)
    train_step = train_model(
        training_data=prep_step.outputs.cleaned_data,
        target_column="churned"
    )

    # Step 3: Evaluation (uses output from step 2)
    eval_step = evaluate_model(
        model=train_step.outputs.trained_model,
        test_data=prep_step.outputs.test_data,
        threshold=target_metric
    )

    return eval_step.outputs

# Create and submit the pipeline
# (assumes ml_client is an authenticated azure.ai.ml.MLClient)
pipeline_job = churn_pipeline(
    raw_data=Input(type="uri_folder", path="azureml:churn-data:2")
)
returned_job = ml_client.jobs.create_or_update(pipeline_job)

What's happening:

  • The load_component calls load each step from its YAML definition — each is a reusable building block
  • The @pipeline decorator defines the workflow metadata (display name, compute target, experiment)
  • The pipeline function accepts inputs — parameterised for different datasets
  • Steps are chained by connecting outputs to inputs — Azure ML derives the execution order from these connections
  • ml_client.jobs.create_or_update submits the entire pipeline to the cloud in one call

Pipeline YAML definition (alternative)

You can also define pipelines in YAML (often preferred for CI/CD):

# pipelines/training-pipeline.yaml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: churn-training-pipeline
experiment_name: churn-pipeline-runs
compute: azureml:gpu-training-cluster

inputs:
  raw_data:
    type: uri_folder
    path: azureml:churn-data:2
  target_metric: 0.90

jobs:
  prepare:
    type: command
    component: file:components/prepare/component.yaml
    inputs:
      input_data: ${{parent.inputs.raw_data}}

  train:
    type: command
    component: file:components/train/component.yaml
    inputs:
      training_data: ${{parent.jobs.prepare.outputs.cleaned_data}}
      target_column: churned

  evaluate:
    type: command
    component: file:components/evaluate/component.yaml
    inputs:
      model: ${{parent.jobs.train.outputs.trained_model}}
      test_data: ${{parent.jobs.prepare.outputs.test_data}}
      threshold: ${{parent.inputs.target_metric}}

What's happening:

  • The prepare job references the pipeline input via ${{parent.inputs.raw_data}}
  • The train job references prepare's output (${{parent.jobs.prepare.outputs.cleaned_data}}) — creating the dependency chain
  • The evaluate job uses outputs from both previous jobs, plus the target_metric pipeline input as its threshold
💡 Exam tip: Python DSL vs YAML pipelines

Both approaches create identical pipelines. The exam may test when to use each:

  • YAML pipelines: better for CI/CD (GitHub Actions can submit them directly), version-controlled, easy to review in PRs
  • Python DSL (@pipeline): better for complex logic, conditional steps, dynamic parameterisation

Most production MLOps teams use YAML for CI/CD pipelines and Python DSL for experimentation.
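As a sketch of the CI/CD side, a GitHub Actions workflow that submits the YAML pipeline with the az ml CLI might look like this (the workflow name, schedule, secret, resource group, and workspace names are all placeholders):

```yaml
# .github/workflows/train.yml -- illustrative; names and secrets are placeholders
name: monthly-retraining
on:
  schedule:
    - cron: "0 6 1 * *"    # 06:00 UTC on the 1st of each month
  workflow_dispatch: {}     # allow manual runs too

jobs:
  submit-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Submit the training pipeline
        run: |
          az extension add --name ml
          az ml job create --file pipelines/training-pipeline.yaml \
            --resource-group my-rg --workspace-name my-workspace
```

Because the pipeline definition lives in the repo, the same PR review that changes a component also changes what CI/CD will run.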

Scenario: Kai's automated retraining pipeline

NeuralSpark's churn model needs monthly retraining on fresh data. Kai builds a pipeline triggered by GitHub Actions on the 1st of each month:

  1. Data prep — pulls latest customer data, cleans, splits
  2. Training — trains on fresh data with the same hyperparameters
  3. Evaluation — compares new model against production baseline
  4. Gate — if new model beats baseline by more than 1%, proceed
  5. Registration — registers the new model in the registry

The pipeline runs unattended. If the new model isn't better, it stops at the gate and alerts the team.
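The gate in step 4 is ordinary comparison logic inside that step's script. A minimal sketch (hypothetical helper, not an Azure ML API, assuming AUC as the metric and an absolute one-point improvement threshold):

```python
# Sketch of the gate step's decision logic (hypothetical helper, not an
# Azure ML API). Assumes AUC as the metric and an absolute threshold.
def should_register(new_auc: float, baseline_auc: float,
                    min_gain: float = 0.01) -> bool:
    """True only if the candidate beats the baseline by more than min_gain."""
    return (new_auc - baseline_auc) > min_gain

print(should_register(0.93, 0.90))   # True  -- 3-point gain clears the gate
print(should_register(0.905, 0.90))  # False -- 0.5-point gain does not
```

In a real pipeline the gate step would read both metrics from MLflow, and a False result would end the run before registration.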

Step caching

Azure ML caches pipeline step outputs. If a step's inputs and code haven't changed, Azure ML reuses the previous output instead of re-running.

This means:

  • Changing only the training script re-runs training and evaluation, but skips data prep
  • Changing the dataset re-runs everything from prep onwards
  • Changing the evaluation threshold re-runs only evaluation
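Caching behaviour can be controlled per component: the component YAML's is_deterministic field (which defaults to true) tells Azure ML whether a step's outputs are safe to reuse. A sketch with illustrative names:

```yaml
# components/prepare/component.yaml (fragment; names are illustrative)
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
name: prepare_data
version: 1
is_deterministic: true   # default -- allows cached outputs to be reused
inputs:
  input_data:
    type: uri_folder
outputs:
  cleaned_data:
    type: uri_folder
  test_data:
    type: uri_folder
command: >-
  python prep.py
  --input ${{inputs.input_data}}
  --cleaned ${{outputs.cleaned_data}}
  --test ${{outputs.test_data}}
environment: azureml:sklearn-env:1
```

Set is_deterministic: false for steps that must always re-run, such as one that pulls live data from an external source.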

Key terms flashcards

Question

What is an Azure ML pipeline?


Answer

A workflow of connected components (steps) that automates the ML training lifecycle — from data prep through training to evaluation. Each step has defined inputs/outputs, logs to MLflow, and can be cached.


Question

YAML pipeline vs Python DSL pipeline?


Answer

YAML: better for CI/CD (GitHub Actions), easy to review in PRs, version-controlled. Python DSL (@pipeline): better for complex logic, conditional steps, dynamic parameters. Both create identical pipelines.


Question

What is step caching in pipelines?


Answer

Azure ML reuses output from unchanged steps instead of re-running them. Only steps whose inputs or code changed are re-executed. Saves time and compute.


Question

Why not use notebooks for production training?


Answer

Notebooks have fragile cell execution order, are hard to parameterise, difficult to test, and don't integrate well with CI/CD. Use pipelines or scripts submitted as jobs for production.


Knowledge check


NeuralSpark's training pipeline has 3 steps: data prep, training, and evaluation. Kai changes only the training script. What happens when the pipeline re-runs?

Knowledge Check

Dr. Fatima's compliance team requires that every production model training workflow is fully traceable and can be triggered automatically from CI/CD. What should she use?



Next up: Distributed Training — scaling to datasets and models that don't fit on a single machine.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.