
AI-300 Study Guide

Domain 1: Design and Implement an MLOps Infrastructure

  • ML Workspace: Your AI Control Room
  • Data, Environments & Components
  • Compute Targets: Choosing the Right Engine
  • Infrastructure as Code: Provisioning at Scale
  • Git & CI/CD for ML Projects

Domain 2: Implement Machine Learning Model Lifecycle and Operations

  • MLflow: Track Every Experiment
  • AutoML & Hyperparameter Tuning
  • Training Pipelines: Automate Everything
  • Distributed Training: Scale to Big Data
  • Model Registration & Versioning
  • Model Approval & Responsible AI Gates
  • Deploying Models: Endpoints in Production
  • Drift, Monitoring & Retraining

Domain 3: Design and Implement a GenAIOps Infrastructure

  • Foundry: Hubs, Projects & Platform Setup
  • Network Security & IaC for Foundry
  • Deploying Foundation Models
  • Model Versioning & Production Strategies
  • PromptOps: Design, Compare, Version & Ship

Domain 4: Implement Generative AI Quality Assurance and Observability

  • Evaluation: Datasets, Metrics & Quality Gates
  • Safety Evaluations & Custom Metrics
  • Monitoring GenAI in Production
  • Cost Tracking, Logging & Debugging

Domain 5: Optimize Generative AI Systems and Model Performance

  • RAG Optimization: Better Retrieval, Better Answers
  • Embeddings & Hybrid Search
  • Fine-Tuning: Methods, Data & Production

Domain 1: Design and Implement an MLOps Infrastructure

Data, Environments & Components

Reproducibility is the backbone of MLOps. Master datastores, data assets, environments, components, and registries — the building blocks that make every experiment repeatable.

Making ML reproducible

☕ Simple explanation

Imagine baking a cake but never writing down the recipe.

You used “some flour” from “that bag in the pantry” and baked it in “whichever oven was free.” Next week you try again — different flour, different oven — and the cake tastes completely different. Was it the flour? The oven? Both?

MLOps has the same problem. If you don’t lock down your data (the ingredients), your environment (the oven), and your components (the recipe steps), you can’t reproduce results or debug failures. Azure ML gives you tools to version and manage all three.

Reproducibility in ML requires controlling three dimensions:

  • Data — Datastores (where data lives) and Data assets (what data is, versioned)
  • Software — Environments (conda/pip/Docker definitions that pin every dependency)
  • Logic — Components (reusable pipeline steps with defined inputs/outputs/code)

Azure ML registries let you share all of these across workspaces — so your dev team’s validated component can be promoted to production without rebuilding.

Datastores: where your data lives

A datastore is a reference to an existing Azure storage service. It doesn’t copy data — it stores connection information so your experiments can access it.

| Datastore Type | Backed By | Use Case |
|---|---|---|
| Azure Blob Storage | Blob containers | Unstructured data: images, text files, logs |
| Azure Data Lake Gen2 | ADLS Gen2 | Large-scale structured/semi-structured data, analytics |
| Azure File Share | Azure Files | Shared file systems, legacy file-based workflows |
| Azure SQL / PostgreSQL | Databases | Tabular data, feature stores |

```python
from azure.ai.ml.entities import AzureBlobDatastore

# Register a blob datastore
blob_store = AzureBlobDatastore(
    name="training_data",
    account_name="neuralsparkstorage",
    container_name="datasets",
    description="Training datasets for all projects"
)
ml_client.datastores.create_or_update(blob_store)
```

What’s happening:

  • Line 1: Import the datastore entity class
  • Lines 4-9: Define a reference to an existing blob container — no data is copied
  • Line 10: Register it in the workspace so experiments can reference it by name
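
Once registered, jobs and data assets refer to the datastore through an `azureml://` URI rather than a raw storage URL. A minimal sketch of how that URI is composed (pure Python for illustration, not part of the SDK; the `training_data` name comes from the registration example above):

```python
# Sketch: how an azureml:// data URI is built from a registered datastore name
# and a relative path. Illustrative only — Azure ML constructs these itself.
def datastore_uri(datastore: str, path: str) -> str:
    """Build the azureml:// URI jobs use to reference data in a datastore."""
    return f"azureml://datastores/{datastore}/paths/{path.lstrip('/')}"

print(datastore_uri("training_data", "churn/2026-04/"))
# azureml://datastores/training_data/paths/churn/2026-04/
```

The same format appears again below as the `path` of the versioned data asset.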
💡 Exam tip: Credential-less datastores

Azure ML supports credential-less datastores using the workspace’s managed identity. This means the datastore doesn’t store any keys or connection strings — it relies on RBAC.

The exam favours this approach. If asked “what is the most secure way to connect a workspace to a storage account,” the answer is: managed identity + RBAC role assignment (e.g., “Storage Blob Data Reader” role).
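
In YAML form (for `az ml datastore create`), a credential-less registration simply omits the credentials section. A hedged sketch reusing the names from the example above — verify the schema URL against current Microsoft Learn docs:

```yaml
# blob-datastore.yml — no credentials block, so access flows through the
# workspace managed identity plus an RBAC role on the storage account.
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: training_data
type: azure_blob
account_name: neuralsparkstorage
container_name: datasets
```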

Data assets: what your data is

A data asset is a versioned reference to specific data. Unlike a datastore (which points to a location), a data asset points to specific files or folders and tracks versions.

Three types of data assets in Azure ML
| Type | Points To | Versioned | Best For |
|---|---|---|---|
| URI File | A single file | Yes | A specific CSV, parquet, or image file |
| URI Folder | A directory | Yes | A folder of images, a dataset partition |
| MLTable | Tabular data with schema | Yes | Structured data that needs column types, transforms |

```python
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Create a versioned data asset
training_data = Data(
    name="customer-churn-v2",
    version="2",
    path="azureml://datastores/training_data/paths/churn/2026-04/",
    type=AssetTypes.URI_FOLDER,
    description="April 2026 customer churn dataset (cleaned)"
)
ml_client.data.create_or_update(training_data)
```

What’s happening:

  • Line 8: Points to a specific path in a registered datastore — versioning means you can always trace which data trained which model
  • Line 9: URI_FOLDER because it’s a directory of files, not a single file
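
For the MLTable type from the table above, the asset points at a folder containing an `MLTable` file that describes how to read the data. A hedged sketch (the path pattern and read options are illustrative):

```yaml
# MLTable — describes how to load the tabular data stored alongside it
$schema: https://azuremlschemas.azureedge.net/latest/MLTable.schema.json
type: mltable
paths:
  - pattern: ./*.csv
transformations:
  - read_delimited:
      delimiter: ","
      header: all_files_same_headers
```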

Environments: your software stack

An environment defines the software dependencies for training and inference. It ensures that every run uses exactly the same Python packages, system libraries, and OS.

| Environment Type | Defined By | When To Use |
|---|---|---|
| Curated | Microsoft-managed | Quick start, common frameworks (PyTorch, TensorFlow, sklearn) |
| Custom (conda) | conda.yaml file | When you need specific package versions |
| Custom (Docker) | Dockerfile | When you need system-level dependencies or custom base images |

```yaml
# conda.yaml — defines the software stack
name: churn-training-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - scikit-learn=1.4
  - pandas=2.2
  - pip:
    - azure-ai-ml==1.15.0
    - mlflow==2.12.0
```
```python
from azure.ai.ml.entities import Environment

env = Environment(
    name="churn-training",
    version="3",
    conda_file="conda.yaml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
    description="Churn prediction training environment v3"
)
ml_client.environments.create_or_update(env)
```

What’s happening:

  • The conda YAML pins every dependency to exact versions — no surprises between runs
  • The Docker image provides the OS base, and conda installs Python packages on top
  • Version “3” means you can roll back to v1 or v2 if something breaks
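
Pinning only works if nothing slips through. A sketch of the kind of check you might run in CI to catch unpinned dependencies in a conda spec — an illustrative helper, not part of the Azure ML SDK; the list mirrors the conda.yaml above, plus an unpinned `requests` to show a failure:

```python
# Flag conda/pip dependency strings that lack an exact version pin.
def unpinned(deps):
    """Return dependency strings without an '=' or '==' version pin."""
    flat = []
    for dep in deps:
        if isinstance(dep, dict):            # nested pip section: {"pip": [...]}
            for sub in dep.values():
                flat.extend(sub)
        else:
            flat.append(dep)
    return [d for d in flat if "=" not in d]

deps = [
    "python=3.10",
    "scikit-learn=1.4",
    "pandas=2.2",
    {"pip": ["azure-ai-ml==1.15.0", "mlflow==2.12.0", "requests"]},
]
print(unpinned(deps))  # ['requests']
```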
Scenario: Dr. Luca pins his genomics environment

Dr. Luca Bianchi at GenomeVault ran into a nightmare: a scikit-learn update changed how a model handled missing values, silently changing prediction accuracy by 2%. The model passed validation but produced different results in production.

His fix: pin every package version in conda.yaml and version the environment. Now every experiment references genomics-env:v7, and if Prof. Sarah Lin asks “can you reproduce last month’s results?” — the answer is always yes.

Lesson: Curated environments are great for prototyping, but production workloads need custom environments with pinned versions.

Components: reusable pipeline building blocks

A component is a self-contained piece of ML code with defined inputs, outputs, and an environment. Think of it as a function that can be plugged into different pipelines.

```yaml
# component.yaml — a data preparation step
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: prepare_churn_data
version: "1"
display_name: Prepare Churn Data
type: command
inputs:
  raw_data:
    type: uri_folder
outputs:
  cleaned_data:
    type: uri_folder
code: ./src
environment: azureml:churn-training:3
command: >-
  python prepare.py
  --input ${{inputs.raw_data}}
  --output ${{outputs.cleaned_data}}
```

What’s happening:

  • Lines 7-12: Declares typed inputs and outputs — the pipeline knows what this component consumes and produces
  • Line 13: Points to the source code directory
  • Line 14: References a specific environment version
  • Lines 15-18: The actual command, with input/output paths injected by Azure ML
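
At runtime, Azure ML substitutes the `${{inputs.*}}` and `${{outputs.*}}` placeholders in the command with concrete mount paths. A rough sketch of that substitution in pure Python — illustrative only, the platform does this itself:

```python
import re

# Resolve ${{inputs.name}} / ${{outputs.name}} placeholders against two
# lookup tables, the way a component command gets its real paths injected.
def resolve(command, inputs, outputs):
    def sub(match):
        kind, name = match.group(1), match.group(2)
        table = inputs if kind == "inputs" else outputs
        return table[name]
    return re.sub(r"\$\{\{(inputs|outputs)\.(\w+)\}\}", sub, command)

cmd = "python prepare.py --input ${{inputs.raw_data}} --output ${{outputs.cleaned_data}}"
print(resolve(cmd, {"raw_data": "/mnt/raw"}, {"cleaned_data": "/mnt/clean"}))
# python prepare.py --input /mnt/raw --output /mnt/clean
```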

Registries: share across workspaces

An Azure ML registry is a central catalog for sharing assets (models, environments, components, data) across multiple workspaces. This is how you promote a validated component from dev to production without rebuilding it.

```bash
# Create a registry
az ml registry create \
  --name neuralspark-registry \
  --resource-group rg-ml-shared \
  --location eastus

# Share a component to the registry
az ml component create \
  --file component.yaml \
  --registry-name neuralspark-registry
```
💡 Exam tip: Registry vs workspace

A common exam scenario: “How do you share a trained model between the dev and production workspaces?”

Answer: Register the model in an Azure ML registry, then reference it from the production workspace. Registries are workspace-independent — they provide cross-workspace asset sharing with RBAC control.

Don’t confuse this with the workspace model registry (local to one workspace) vs the Azure ML registry (shared across workspaces).
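
Assets shared through a registry are referenced with an `azureml://registries/...` URI rather than a workspace-local name. A sketch of that format (illustrative helper; the registry and component names come from the CLI example above):

```python
# Build the azureml:// reference for an asset shared through a registry.
def registry_ref(registry: str, asset_type: str, name: str, version: str) -> str:
    return f"azureml://registries/{registry}/{asset_type}/{name}/versions/{version}"

print(registry_ref("neuralspark-registry", "components", "prepare_churn_data", "1"))
# azureml://registries/neuralspark-registry/components/prepare_churn_data/versions/1
```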

Key terms flashcards

Q: Datastore vs data asset — what's the difference?
A: A datastore is a connection reference to a storage service (where). A data asset is a versioned pointer to specific data within a datastore (what). Datastores are reused across many data assets.

Q: What are the three types of data assets?
A: URI File (single file), URI Folder (directory), and MLTable (tabular data with schema). All are versioned.

Q: What is an Azure ML registry?
A: A central catalog for sharing models, environments, components, and data across multiple workspaces. Enables promotion from dev to prod without rebuilding assets.

Q: Curated vs custom environment — when to use each?
A: Curated: quick prototyping with common frameworks. Custom (conda/Docker): production workloads where you need pinned versions for reproducibility.

Knowledge check

1. Dr. Luca needs to ensure that his genomics training pipeline uses exactly the same Python packages every time it runs, even months later. What should he use?

2. Kai's NeuralSpark team has a data preparation component that works perfectly in the dev workspace. He needs to use the same component in the production workspace without rebuilding it. What should he use?

3. Dr. Fatima at Meridian Financial needs to connect a workspace to a storage account without storing any credentials. What is the recommended approach?

🎬 Video coming soon


Next up: Compute Targets — choosing the right engine for training vs inference.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.