ALM for Microsoft Foundry Agents
Design code-first ALM for Foundry agents and custom AI models: Git-based version control, CI/CD pipelines, model registries, and automated evaluation.
Foundry ALM is code-first
If Copilot Studio ALM is like shipping sealed containers, Foundry ALM is like managing a software factory. Everything lives in Git: your agent code, your prompt flows, your model training scripts. Deployments happen through CI/CD pipelines, just like traditional software.
The big addition: models are deployable artefacts with their own lifecycle. You version them, test them, stage them, and promote them, just as you would with application code.
Copilot Studio vs Foundry ALM
| Feature | Copilot Studio ALM | Foundry ALM | When to Use |
|---|---|---|---|
| Artefact storage | Power Platform solutions in Dataverse | Code in Git repositories | Copilot Studio for low-code agents. Foundry for code-first agents and custom models. |
| Version control | Solution versioning (major.minor.build.revision) | Git commits, branches, tags | Copilot Studio versions solutions as packages. Foundry versions everything as code. |
| Deployment tool | Power Platform Pipelines or Azure DevOps with solution tasks | GitHub Actions, Azure DevOps Pipelines, or Azure CLI | Copilot Studio uses solution import. Foundry uses standard deployment tooling. |
| Environment config | Environment variables and connection references | Infrastructure as Code parameters, environment files, Key Vault references | Same concept, different mechanisms. |
| Model management | Not applicable; Microsoft manages the models | Model registry with versioning, staging, and promotion | Foundry gives you full control over model lifecycle. |
| Testing approach | Manual testing plus solution checker | Automated evaluation pipelines with quality gates | Foundry supports automated quality gates in CI/CD. |
Model registry and lifecycle
The model registry is central to Foundry ALM. It tracks every model version and its metadata:
| Stage | What Happens | Key Artefacts |
|---|---|---|
| Training | Model trained on prepared data using training scripts | Training script, hyperparameters, training data version |
| Evaluation | Model tested against evaluation dataset | Evaluation metrics (accuracy, precision, recall, F1), evaluation dataset version |
| Registration | Model registered in the registry with version and metadata | Model artefact, model card (description, intended use, limitations) |
| Staging | Model deployed to a staging endpoint for integration testing | Staging endpoint URL, integration test results |
| Production | Model promoted to production endpoint | Production endpoint URL, traffic routing configuration |
| Monitoring | Model performance tracked in production | Performance metrics, data drift alerts, feedback data |
| Retraining | Model retrained when performance degrades | New training data, updated training script, retraining trigger |
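The registration, staging, and promotion stages above can be sketched as a minimal in-memory registry. This is an illustration of the semantics, not the Foundry SDK: the `ModelRegistry` class, its method names, and the stage names are hypothetical, but they mirror the lifecycle rules the table describes, including the fact that a registered version is immutable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: a registered version's metadata cannot be mutated
class ModelVersion:
    name: str
    version: int
    metrics: dict
    stage: str = "registered"  # registered -> staging -> production

class ModelRegistry:
    """Minimal sketch of registry semantics: auto-incremented versions, staged promotion."""

    def __init__(self):
        self._versions = {}  # (name, version) -> ModelVersion

    def register(self, name, metrics):
        # Next version number for this model name; versions are never overwritten.
        version = 1 + max((v for (n, v) in self._versions if n == name), default=0)
        mv = ModelVersion(name, version, dict(metrics))
        self._versions[(name, version)] = mv
        return mv

    def promote(self, name, version, stage):
        if stage not in ("staging", "production"):
            raise ValueError(f"unknown stage: {stage!r}")
        old = self._versions[(name, version)]
        # Promotion changes only the stage; artefacts and metrics stay immutable.
        updated = ModelVersion(old.name, old.version, old.metrics, stage)
        self._versions[(name, version)] = updated
        return updated

registry = ModelRegistry()
v1 = registry.register("credit-risk", {"accuracy": 0.91, "f1": 0.88})
v2 = registry.register("credit-risk", {"accuracy": 0.93, "f1": 0.90})
promoted = registry.promote("credit-risk", 2, "staging")
print(v1.version, v2.version, promoted.stage)  # 1 2 staging
```

A real registry would also store the model artefact itself and a model card; the point of the sketch is that promotion is a stage transition, never an in-place edit of a version.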
Prompt flow versioning
Prompt flows in Foundry (classic) are stored as YAML and Python files, making them fully version-controllable. Note that prompt flow is associated with the classic Foundry experience; current Foundry capabilities are evolving, but the ALM principles remain the same:
- Flow definition (YAML) – defines the steps, inputs, outputs, and connections
- Node implementations (Python) – custom logic for each step in the flow
- Environment parameters – connection strings, model endpoints, API keys stored in environment-specific config
- Evaluation flows – separate flows that test the quality of the main flow's outputs
All of these live in Git. Every change creates a commit. Every deployment references a specific commit SHA.
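Because the flow definition is plain structured text, a CI pipeline can lint it before deployment. The sketch below uses a deliberately simplified, hypothetical flow shape (real prompt flow YAML has more fields) to show one such check: every node input must resolve to a flow input or an earlier node's output.

```python
# Hypothetical, simplified flow definition (shown as a dict; in Git it would be YAML).
flow = {
    "inputs": ["question"],
    "outputs": ["answer"],
    "nodes": [
        {"name": "retrieve", "source": "retrieve.py", "inputs": ["question"]},
        {"name": "generate", "source": "generate.py", "inputs": ["question", "retrieve"]},
    ],
}

def validate_flow(flow):
    """Check that every node input resolves to a flow input or an earlier node's output."""
    available = set(flow["inputs"])
    for node in flow["nodes"]:
        missing = [i for i in node["inputs"] if i not in available]
        if missing:
            raise ValueError(f"node {node['name']!r} has unresolved inputs: {missing}")
        available.add(node["name"])  # later nodes may consume this node's output
    return True

print(validate_flow(flow))  # True
```

Running a check like this in the pull-request pipeline catches wiring mistakes before the flow ever reaches a deployment stage.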
Scenario: Ravi builds a CI/CD pipeline for Vanguard's credit risk model
Ravi Krishnan at Cloudbridge Partners sets up automated ALM for Vanguard's credit risk model:
Git repository structure:
- /models/credit-risk/ – training scripts, evaluation scripts, model configuration
- /flows/credit-assessment/ – prompt flow YAML and Python nodes
- /infra/ – Bicep templates for model endpoints and compute
- /tests/ – integration tests and evaluation datasets
GitHub Actions pipeline (runs monthly):
- Data preparation – pull latest financial data, apply transformations, version the dataset
- Training – run the training script on GPU compute with the new data
- Evaluation – run the evaluation flow against a held-out test set
- Quality gate – if accuracy is below 90% or fairness metrics fail, the pipeline stops and alerts the team
- Registration – register the new model version in the Foundry model registry
- Canary deployment – deploy to staging, route 10% of traffic to the new model
- A/B comparison – compare new model performance against the production baseline for 48 hours
- Promotion or rollback – if A/B results pass thresholds, promote to 100%. Otherwise, roll back to the baseline.
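The two decision points in Ravi's pipeline, the quality gate and the promotion-or-rollback step, reduce to small pure functions that the pipeline can call and act on. This is a sketch under the thresholds stated above (90% accuracy floor); the function names and metric dictionary shape are assumptions for illustration.

```python
ACCURACY_FLOOR = 0.90  # from the quality gate step above

def quality_gate(eval_metrics, fairness_passed):
    """Quality gate: stop the pipeline unless evaluation clears the bar."""
    if eval_metrics["accuracy"] < ACCURACY_FLOOR:
        return False, f"accuracy {eval_metrics['accuracy']:.2f} below floor"
    if not fairness_passed:
        return False, "fairness metrics failed"
    return True, "passed"

def promote_or_rollback(candidate, baseline, min_uplift=0.0):
    """Promotion decision: promote only if the A/B comparison beats the baseline."""
    if candidate["accuracy"] >= baseline["accuracy"] + min_uplift:
        return "promote"
    return "rollback"

ok, reason = quality_gate({"accuracy": 0.93}, fairness_passed=True)
decision = promote_or_rollback({"accuracy": 0.93}, {"accuracy": 0.91})
print(ok, reason, decision)  # True passed promote
```

Keeping the gate logic in a versioned function (rather than inline pipeline YAML) means the thresholds themselves go through code review, which matters for a regulated workload like credit risk.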
Key design decision: Ravi parameterises the pipeline so it works across environments. Dev uses a smaller dataset and cheaper compute. Production uses the full dataset and production-grade compute. Same pipeline code, different parameters.
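That design decision can be sketched as a single pipeline entry point driven by per-environment parameters. The dataset and compute names below are hypothetical placeholders, not real resource names; the point is that only the parameter table differs between dev and production.

```python
# Per-environment parameters; in practice these would live in environment files
# or pipeline variables, with secrets resolved from Key Vault.
ENV_PARAMS = {
    "dev":  {"dataset": "financial-sample-10k", "compute": "cpu-cluster-small", "canary_traffic": 0},
    "prod": {"dataset": "financial-full",       "compute": "gpu-cluster",       "canary_traffic": 10},
}

def run_pipeline(env):
    """Same pipeline code for every environment; only the parameters change."""
    p = ENV_PARAMS[env]
    return (f"train on {p['dataset']} using {p['compute']}, "
            f"canary at {p['canary_traffic']}% traffic")

print(run_pipeline("dev"))
print(run_pipeline("prod"))
```

This is the Foundry counterpart of Copilot Studio's environment variables and connection references: one deployable definition, environment-specific configuration injected at run time.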
Exam tip: Foundry treats models as first-class deployable artefacts
The exam expects you to understand that in Foundry:
- Models have their own CI/CD – separate from application code. Model training, evaluation, and deployment form a pipeline, not a manual process.
- Model versions are immutable – once registered, a model version cannot be modified. You create a new version instead.
- A/B testing is expected – canary deployments that compare new models against baselines are a standard pattern, not an advanced technique.
- Prompt flows are code – they live in Git, have commit history, and deploy through pipelines. Do not confuse them with Copilot Studio topics (which are solution components).
- Infrastructure as Code – model endpoints, compute resources, and networking are provisioned through Bicep or Terraform, not manual portal configuration.
Flashcards
Knowledge check
Dev Patel needs to deploy a retrained credit risk model to production. The model was trained on new data and shows improved accuracy in evaluation. What is the recommended deployment approach?
An architect proposes storing Foundry prompt flows in a SharePoint document library for version control. What is wrong with this approach?
Next up: ALM for D365 AI Features – managing AI feature rollouts in Dynamics 365 Finance, Supply Chain, Customer Service, and Sales.