Deploying Models & CI/CD

Deploying models and agents

Simple explanation

Deploying a model is like installing an app on a server — you pick the version, configure the settings, and make it available to users.

In Foundry, you choose a model from the catalog, give it a deployment name, and set capacity limits; the deployment then gets an API endpoint. Agents work the same way: you define the agent, deploy it, and it gets an endpoint your app can call.

CI/CD (continuous integration and continuous delivery) means automating this process so that every code change is tested and deployed by a pipeline, with no manual clicking in the portal.

Model deployment configuration

When deploying a model in Foundry, you configure:

Setting	What It Controls	Example
Deployment name	The identifier your app uses to call this model	“gpt4o-prod”, “phi4-staging”
Model version	Which version of the model to use	GPT-4o 2024-11-20
Deployment type	Serverless or provisioned throughput	Provisioned for production
Rate limit / TPM	Maximum tokens per minute	80,000 TPM for prod
Content filter	Which safety filters to apply	Default or custom configuration
Region	Where the model runs	East US 2
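
One way to keep these settings reviewable is to hold them in a small, version-controlled record. The sketch below is only an illustration of the table above, not a Foundry API: the `DeploymentConfig` class, its field names, and the validation rules are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical record of the settings in the table above.
# Field names and allowed values are illustrative, not Foundry SDK names.
ALLOWED_TYPES = {"serverless", "provisioned"}


@dataclass(frozen=True)
class DeploymentConfig:
    deployment_name: str      # identifier the app uses to call the model
    model_version: str        # e.g. "GPT-4o 2024-11-20"
    deployment_type: str      # "serverless" or "provisioned"
    tokens_per_minute: int    # rate limit (TPM)
    content_filter: str       # "default" or the name of a custom configuration
    region: str               # e.g. "East US 2"

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the config looks sane."""
        problems = []
        if not self.deployment_name:
            problems.append("deployment_name must not be empty")
        if self.deployment_type not in ALLOWED_TYPES:
            problems.append(f"unknown deployment_type: {self.deployment_type!r}")
        if self.tokens_per_minute <= 0:
            problems.append("tokens_per_minute must be positive")
        return problems


# The production example from the table.
prod = DeploymentConfig(
    deployment_name="gpt4o-prod",
    model_version="GPT-4o 2024-11-20",
    deployment_type="provisioned",
    tokens_per_minute=80_000,
    content_filter="default",
    region="East US 2",
)
print(prod.validate())
```

Checking a record like this in a pipeline step catches typos in the configuration before anything is deployed.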

Exam tip: Deployment names matter

Your application code references the deployment name, not the model name. This means you can swap model versions (GPT-4o to GPT-4.1) without changing application code — just update the deployment to point to the new model version.

The exam may test this pattern: “How can you upgrade a model version without modifying application code?” Answer: Update the model version on the existing deployment name.
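
The indirection in this tip can be shown with a toy model of the platform. In the sketch below, the `deployments` dictionary and the `call_model` function are hypothetical stand-ins for Foundry and its API endpoint. The point is only that `APP_DEPLOYMENT` and the application function stay unchanged when the version behind the name changes.

```python
# Toy stand-in for the platform: deployment name -> model version.
# Neither this dict nor call_model() is a real Foundry API.
deployments = {"gpt4o-prod": "GPT-4o 2024-11-20"}

# The application only ever knows the deployment name.
APP_DEPLOYMENT = "gpt4o-prod"


def call_model(deployment_name: str, prompt: str) -> str:
    """Pretend endpoint: resolve the deployment name, then 'answer'."""
    model_version = deployments[deployment_name]
    return f"[{model_version}] reply to: {prompt}"


def summarize(text: str) -> str:
    """Application code: references the deployment name, never the model."""
    return call_model(APP_DEPLOYMENT, f"Summarize: {text}")


before = summarize("shipment delayed")

# Platform-side change: point the existing deployment at a new version.
deployments["gpt4o-prod"] = "GPT-4.1"

after = summarize("shipment delayed")
print(before)
print(after)
```

The two printed lines differ only in the model tag, while `summarize` was not edited between the calls.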

Agent deployment

Agents deploy differently from raw models. An agent deployment includes:

Component	What Gets Deployed
Agent definition	Instructions, model reference, tool schemas
Tool connections	API endpoints, function definitions, knowledge sources
Configuration	Temperature, max tokens, safety settings
Version	Agent version for rollback capability
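
To see why the version component matters for rollback, consider this minimal in-memory sketch. `AgentDefinition` and `AgentRegistry` are hypothetical names invented for illustration. A real Foundry project stores agent versions for you, and its API will look different.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentDefinition:
    """Hypothetical bundle of the components listed in the table."""
    instructions: str
    model_deployment: str             # model reference (a deployment name)
    tools: tuple[str, ...] = ()       # names of tool connections
    temperature: float = 0.2
    max_tokens: int = 1024


@dataclass
class AgentRegistry:
    """In-memory version history; a real system would persist this."""
    versions: list[AgentDefinition] = field(default_factory=list)
    active: int = -1                  # index of the live version, -1 = none

    def deploy(self, definition: AgentDefinition) -> int:
        self.versions.append(definition)
        self.active = len(self.versions) - 1
        return self.active + 1        # human-friendly version number

    def rollback(self) -> int:
        if self.active <= 0:
            raise RuntimeError("no earlier version to roll back to")
        self.active -= 1
        return self.active + 1

    def current(self) -> AgentDefinition:
        if self.active < 0:
            raise RuntimeError("nothing deployed")
        return self.versions[self.active]


registry = AgentRegistry()
v1 = AgentDefinition("Answer shipping questions.", "gpt4o-prod", ("tracking_api",))
v2 = AgentDefinition("Answer shipping and billing questions.", "gpt4o-prod",
                     ("tracking_api", "billing_api"))
registry.deploy(v1)
registry.deploy(v2)
registry.rollback()                   # v2 misbehaves, go back to v1
print(registry.current().tools)
```

Because every deployed definition is kept, rolling back is just a matter of pointing at an earlier entry rather than rebuilding the agent by hand.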

CI/CD for AI solutions

AI development with and without CI/CD
Feature	Without CI/CD	With CI/CD
Deploy process	Manual portal clicks	Automated pipeline triggered by git push
Testing	Hope it works	Automated evaluation (quality, safety, groundedness)
Consistency	Different every time	Identical across environments
Rollback	Manually redeploy old version	One-click or automatic on failure
Audit trail	Who changed what? Good luck	Full git history + pipeline logs

CI/CD pipeline stages for AI

Stage	What Happens	Tools
Build	Package application code and agent definitions	GitHub Actions, Azure DevOps
Evaluate	Run automated quality, safety, and groundedness tests	Foundry Evaluation SDK
Deploy to staging	Push to staging Foundry Project	Azure CLI, Foundry SDK
Integration test	Verify end-to-end with real API calls	pytest, custom test suites
Promote to production	Deploy to prod after approval	Manual gate or auto-promote
Monitor	Watch for drift, errors, safety events	Azure Monitor, Foundry tracing
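
The ordering of the stages above, and the fact that a failure stops everything after it, can be sketched in a few lines. This is a simplified illustration and not how GitHub Actions or Azure DevOps are configured. The `run_pipeline` function, the stage names, and the placeholder steps are assumptions.

```python
from typing import Callable

# Each stage is a (name, step) pair; a step returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[str]:
    """Run stages in order and stop at the first failure.

    Returns a log of "<stage>: ok" / "<stage>: failed" lines.
    """
    log = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            break
    return log


# Placeholder steps; real ones would call the tools in the table.
stages = [
    ("build", lambda: True),
    ("evaluate", lambda: True),
    ("deploy-staging", lambda: True),
    ("integration-test", lambda: False),   # simulate a failing test
    ("promote-production", lambda: True),
    ("monitor", lambda: True),
]

for line in run_pipeline(stages):
    print(line)
```

In this run the simulated integration-test failure means the promote and monitor stages are never reached, which is the behavior you want from a production gate.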

Real-world example: Kai's CI/CD pipeline

Kai sets up a GitHub Actions pipeline for the logistics AI platform:

On pull request: Run Foundry evaluations against test scenarios (20 predefined questions + expected answers)
On merge to main: Deploy to staging Foundry Project, run integration tests
Manual approval: Team lead reviews evaluation scores before production
On approval: Deploy to production, update model deployment, run smoke tests
On failure: Auto-rollback to previous deployment version

The whole process runs in under 15 minutes. No portal clicking required.
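
The evaluation step on pull request can be pictured as scoring the agent against predefined questions and comparing the score to a threshold. The sketch below is an assumption-heavy stand-in: keyword matching replaces the Foundry evaluators, and the scenarios, the `fake_agent` function, and the 0.9 threshold are invented for illustration.

```python
from typing import Callable

# Hypothetical test scenarios: (question, keyword expected in the answer).
# Kai's real suite has 20; three are enough to show the shape.
SCENARIOS = [
    ("Where is order 1001?", "in transit"),
    ("When will order 1002 arrive?", "friday"),
    ("Can I change the delivery address?", "yes"),
]


def pass_rate(agent: Callable[[str], str]) -> float:
    """Fraction of scenarios whose answer contains the expected keyword."""
    passed = sum(
        1 for question, expected in SCENARIOS
        if expected in agent(question).lower()
    )
    return passed / len(SCENARIOS)


def evaluation_passes(agent: Callable[[str], str], threshold: float = 0.9) -> bool:
    """True if the agent's score is high enough for the pipeline to continue."""
    return pass_rate(agent) >= threshold


def fake_agent(question: str) -> str:
    """Stand-in for a call to the deployed agent's endpoint."""
    canned = {
        "Where is order 1001?": "Order 1001 is in transit.",
        "When will order 1002 arrive?": "It should arrive on Friday.",
    }
    return canned.get(question, "I am not sure.")


print(pass_rate(fake_agent), evaluation_passes(fake_agent))
```

A pipeline step would exit with a non-zero status when `evaluation_passes` returns `False`, which blocks the merge or triggers the rollback described above.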

Key terms

Question

What is a model deployment name?

Answer

The identifier your application uses to call a deployed model. By referencing the deployment name (not the model name), you can swap model versions without changing application code.

Question

What is provisioned throughput measured in?

Answer

Provisioned Throughput Units (PTU). You reserve a fixed number of PTUs, which guarantee a model-specific Tokens Per Minute (TPM) rate and consistent latency for production workloads.

Question

How does CI/CD work for AI solutions?

Answer

Automated pipelines build the solution, evaluate it (quality, safety, groundedness), deploy it to staging, run integration tests, and promote it to production. The pipeline uses the Foundry SDK and evaluation framework for AI-specific testing.

Knowledge check

MediaForge wants to upgrade their content generation model from GPT-4o to GPT-4.1 without changing any application code. What should they do?

Atlas Financial's compliance team requires that every AI model change is auditable and can be rolled back within 5 minutes. Which practice best supports this?