ALM for Microsoft Foundry Agents
Design code-first ALM for Foundry agents and custom AI models: Git-based version control, CI/CD pipelines, model registries, and automated evaluation.
Foundry ALM is code-first
If Copilot Studio ALM is like shipping sealed containers, Foundry ALM is like managing a software factory. Everything lives in Git: your agent code, your prompt flows, your model training scripts. Deployments happen through CI/CD pipelines, just like traditional software.
The big addition: models are deployable artefacts with their own lifecycle. You version them, test them, stage them, and promote them, just as you would with application code.
Copilot Studio vs Foundry ALM
| Feature | Copilot Studio ALM | Foundry ALM | When to Use |
|---|---|---|---|
| Artefact storage | Power Platform solutions in Dataverse | Code in Git repositories | Copilot Studio for low-code agents. Foundry for code-first agents and custom models. |
| Version control | Solution versioning (major.minor.build.revision) | Git commits, branches, tags | Copilot Studio versions solutions as packages. Foundry versions everything as code. |
| Deployment tool | Power Platform Pipelines or Azure DevOps with solution tasks | GitHub Actions, Azure DevOps Pipelines, or Azure CLI | Copilot Studio uses solution import. Foundry uses standard deployment tooling. |
| Environment config | Environment variables and connection references | Infrastructure as Code parameters, environment files, Key Vault references | Same concept, different mechanisms. |
| Model management | Not applicable; Microsoft manages the models | Model registry with versioning, staging, and promotion | Foundry gives you full control over model lifecycle. |
| Testing approach | Manual testing plus solution checker | Automated evaluation pipelines with quality gates | Foundry supports automated quality gates in CI/CD. |
Model registry and lifecycle
The model registry is central to Foundry ALM. It tracks every model version and its metadata:
| Stage | What Happens | Key Artefacts |
|---|---|---|
| Training | Model trained on prepared data using training scripts | Training script, hyperparameters, training data version |
| Evaluation | Model tested against evaluation dataset | Evaluation metrics (accuracy, precision, recall, F1), evaluation dataset version |
| Registration | Model registered in the registry with version and metadata | Model artefact, model card (description, intended use, limitations) |
| Staging | Model deployed to a staging endpoint for integration testing | Staging endpoint URL, integration test results |
| Production | Model promoted to production endpoint | Production endpoint URL, traffic routing configuration |
| Monitoring | Model performance tracked in production | Performance metrics, data drift alerts, feedback data |
| Retraining | Model retrained when performance degrades | New training data, updated training script, retraining trigger |
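The registration, staging, and promotion stages above can be sketched as a minimal in-memory registry. This is an illustration of the semantics, not the Foundry SDK: the `ModelRegistry` class, its method names, and the stage names are hypothetical, but they mirror the lifecycle rules the table describes, including the fact that a registered version is immutable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: a registered version's metadata cannot be mutated
class ModelVersion:
    name: str
    version: int
    metrics: dict
    stage: str = "registered"  # registered -> staging -> production

class ModelRegistry:
    """Minimal sketch of registry semantics: auto-incremented versions, staged promotion."""

    def __init__(self):
        self._versions = {}  # (name, version) -> ModelVersion

    def register(self, name, metrics):
        # Next version number for this model name; versions are never overwritten.
        version = 1 + max((v for (n, v) in self._versions if n == name), default=0)
        mv = ModelVersion(name, version, dict(metrics))
        self._versions[(name, version)] = mv
        return mv

    def promote(self, name, version, stage):
        if stage not in ("staging", "production"):
            raise ValueError(f"unknown stage: {stage!r}")
        old = self._versions[(name, version)]
        # Promotion changes only the stage; artefacts and metrics stay immutable.
        updated = ModelVersion(old.name, old.version, old.metrics, stage)
        self._versions[(name, version)] = updated
        return updated

registry = ModelRegistry()
v1 = registry.register("credit-risk", {"accuracy": 0.91, "f1": 0.88})
v2 = registry.register("credit-risk", {"accuracy": 0.93, "f1": 0.90})
promoted = registry.promote("credit-risk", 2, "staging")
print(v1.version, v2.version, promoted.stage)  # 1 2 staging
```

A real registry would also store the model artefact itself and a model card; the point of the sketch is that promotion is a stage transition, never an in-place edit of a version.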
Prompt flow versioning
Prompt flows in Foundry (classic) are stored as YAML and Python files, making them fully version-controllable. Note that prompt flow is associated with the classic Foundry experience; current Foundry capabilities are evolving, but the ALM principles remain the same:
- Flow definition (YAML) – defines the steps, inputs, outputs, and connections
- Node implementations (Python) – custom logic for each step in the flow
- Environment parameters – connection strings, model endpoints, API keys stored in environment-specific config
- Evaluation flows – separate flows that test the quality of the main flow's outputs
All of these live in Git. Every change creates a commit. Every deployment references a specific commit SHA.
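Because the flow definition is plain structured text, a CI pipeline can lint it before deployment. The sketch below uses a deliberately simplified, hypothetical flow shape (real prompt flow YAML has more fields) to show one such check: every node input must resolve to a flow input or an earlier node's output.

```python
# Hypothetical, simplified flow definition (shown as a dict; in Git it would be YAML).
flow = {
    "inputs": ["question"],
    "outputs": ["answer"],
    "nodes": [
        {"name": "retrieve", "source": "retrieve.py", "inputs": ["question"]},
        {"name": "generate", "source": "generate.py", "inputs": ["question", "retrieve"]},
    ],
}

def validate_flow(flow):
    """Check that every node input resolves to a flow input or an earlier node's output."""
    available = set(flow["inputs"])
    for node in flow["nodes"]:
        missing = [i for i in node["inputs"] if i not in available]
        if missing:
            raise ValueError(f"node {node['name']!r} has unresolved inputs: {missing}")
        available.add(node["name"])  # later nodes may consume this node's output
    return True

print(validate_flow(flow))  # True
```

Running a check like this in the pull-request pipeline catches wiring mistakes before the flow ever reaches a deployment stage.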
Scenario: Ravi builds a CI/CD pipeline for Vanguard's credit risk model
Ravi Krishnan at Cloudbridge Partners sets up automated ALM for Vanguard's credit risk model:
Git repository structure:
- /models/credit-risk/ – training scripts, evaluation scripts, model configuration
- /flows/credit-assessment/ – prompt flow YAML and Python nodes
- /infra/ – Bicep templates for model endpoints and compute
- /tests/ – integration tests and evaluation datasets
GitHub Actions pipeline (runs monthly):
- Data preparation – pull latest financial data, apply transformations, version the dataset
- Training – run the training script on GPU compute with the new data
- Evaluation – run the evaluation flow against a held-out test set
- Quality gate – if accuracy is below 90% or fairness metrics fail, the pipeline stops and alerts the team
- Registration – register the new model version in the Foundry model registry
- Canary deployment – deploy to staging, route 10% of traffic to the new model
- A/B comparison – compare new model performance against the production baseline for 48 hours
- Promotion or rollback – if A/B results pass thresholds, promote to 100%. Otherwise, roll back to the baseline.
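The two decision points in Ravi's pipeline, the quality gate and the promotion-or-rollback step, reduce to small pure functions that the pipeline can call and act on. This is a sketch under the thresholds stated above (90% accuracy floor); the function names and metric dictionary shape are assumptions for illustration.

```python
ACCURACY_FLOOR = 0.90  # from the quality gate step above

def quality_gate(eval_metrics, fairness_passed):
    """Quality gate: stop the pipeline unless evaluation clears the bar."""
    if eval_metrics["accuracy"] < ACCURACY_FLOOR:
        return False, f"accuracy {eval_metrics['accuracy']:.2f} below floor"
    if not fairness_passed:
        return False, "fairness metrics failed"
    return True, "passed"

def promote_or_rollback(candidate, baseline, min_uplift=0.0):
    """Promotion decision: promote only if the A/B comparison beats the baseline."""
    if candidate["accuracy"] >= baseline["accuracy"] + min_uplift:
        return "promote"
    return "rollback"

ok, reason = quality_gate({"accuracy": 0.93}, fairness_passed=True)
decision = promote_or_rollback({"accuracy": 0.93}, {"accuracy": 0.91})
print(ok, reason, decision)  # True passed promote
```

Keeping the gate logic in a versioned function (rather than inline pipeline YAML) means the thresholds themselves go through code review, which matters for a regulated workload like credit risk.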
Key design decision: Ravi parameterises the pipeline so it works across environments. Dev uses a smaller dataset and cheaper compute. Production uses the full dataset and production-grade compute. Same pipeline code, different parameters.
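That design decision can be sketched as a single pipeline entry point driven by per-environment parameters. The dataset and compute names below are hypothetical placeholders, not real resource names; the point is that only the parameter table differs between dev and production.

```python
# Per-environment parameters; in practice these would live in environment files
# or pipeline variables, with secrets resolved from Key Vault.
ENV_PARAMS = {
    "dev":  {"dataset": "financial-sample-10k", "compute": "cpu-cluster-small", "canary_traffic": 0},
    "prod": {"dataset": "financial-full",       "compute": "gpu-cluster",       "canary_traffic": 10},
}

def run_pipeline(env):
    """Same pipeline code for every environment; only the parameters change."""
    p = ENV_PARAMS[env]
    return (f"train on {p['dataset']} using {p['compute']}, "
            f"canary at {p['canary_traffic']}% traffic")

print(run_pipeline("dev"))
print(run_pipeline("prod"))
```

This is the Foundry counterpart of Copilot Studio's environment variables and connection references: one deployable definition, environment-specific configuration injected at run time.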
Exam tip: Foundry treats models as first-class deployable artefacts
The exam expects you to understand that in Foundry:
- Models have their own CI/CD – separate from application code. Model training, evaluation, and deployment form a pipeline, not a manual process.
- Model versions are immutable – once registered, a model version cannot be modified. You create a new version instead.
- A/B testing is expected – canary deployments that compare new models against baselines are a standard pattern, not an advanced technique.
- Prompt flows are code – they live in Git, have commit history, and deploy through pipelines. Do not confuse them with Copilot Studio topics (which are solution components).
- Infrastructure as Code – model endpoints, compute resources, and networking are provisioned through Bicep or Terraform, not manual portal configuration.
Flashcards
Knowledge check
Dev Patel needs to deploy a retrained credit risk model to production. The model was trained on new data and shows improved accuracy in evaluation. What is the recommended deployment approach?
An architect proposes storing Foundry prompt flows in a SharePoint document library for version control. What is wrong with this approach?
Next up: ALM for D365 AI Features – managing AI feature rollouts in Dynamics 365 Finance, Supply Chain, Customer Service, and Sales.