Deploying Models & CI/CD
Models don't deploy themselves. Learn how to configure model and agent deployments in Foundry, and integrate your AI projects into CI/CD pipelines for repeatable, reliable releases.
Deploying models and agents
Deploying a model is like installing an app on a server: you pick the version, configure the settings, and make it available to users.
In Foundry, you choose a model from the catalog, give it a deployment name, set capacity limits, and it gets an API endpoint. The same goes for agents: you define the agent, deploy it, and it gets an endpoint your app can call.
CI/CD means automating this process so every code change is tested and deployed automatically, with no manual clicking in the portal.
Model deployment configuration
When deploying a model in Foundry, you configure:
| Setting | What It Controls | Example |
|---|---|---|
| Deployment name | The identifier your app uses to call this model | "gpt4o-prod", "phi4-staging" |
| Model version | Which version of the model to use | GPT-4o 2024-11-20 |
| Deployment type | Serverless or provisioned throughput | Provisioned for production |
| Rate limit / TPM | Maximum tokens per minute | 80,000 TPM for prod |
| Content filter | Which safety filters to apply | Default or custom configuration |
| Region | Where the model runs | East US 2 |
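To make the table concrete, here is a minimal sketch that models those settings as a plain Python data structure with a few sanity checks. `DeploymentConfig`, its field names, and the `validate` method are hypothetical illustrations, not part of any Foundry SDK; the values come from the Example column.

```python
from dataclasses import dataclass


# Hypothetical container for the settings in the table above.
# It is NOT a Foundry SDK type; the field names are illustrative only.
@dataclass(frozen=True)
class DeploymentConfig:
    deployment_name: str    # identifier the app calls, e.g. "gpt4o-prod"
    model_version: str      # e.g. "GPT-4o 2024-11-20"
    deployment_type: str    # "serverless" or "provisioned"
    tokens_per_minute: int  # rate limit (TPM)
    content_filter: str     # "default" or a custom configuration name
    region: str             # e.g. "East US 2"

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means no issue was found."""
        problems = []
        if not self.deployment_name:
            problems.append("deployment_name must not be empty")
        if self.deployment_type not in ("serverless", "provisioned"):
            problems.append(f"unknown deployment_type: {self.deployment_type}")
        if self.tokens_per_minute <= 0:
            problems.append("tokens_per_minute must be positive")
        return problems


prod = DeploymentConfig(
    deployment_name="gpt4o-prod",
    model_version="GPT-4o 2024-11-20",
    deployment_type="provisioned",
    tokens_per_minute=80_000,
    content_filter="default",
    region="East US 2",
)
print(prod.validate())
```

Keeping a record like this in version control is one way to make staging and production deployments comparable field by field.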
Exam tip: Deployment names matter
Your application code references the deployment name, not the model name. This means you can swap model versions (for example, GPT-4o to GPT-4.1) without changing application code. Just update the deployment to point to the new model version.
The exam may test this pattern: "How can you upgrade a model version without modifying application code?" Answer: Update the model version on the existing deployment name.
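The indirection behind this tip can be sketched in a few lines of plain Python. The `deployments` dictionary and `call_model` function below are hypothetical stand-ins for the platform's deployment table and a model call; no real endpoint is contacted.

```python
# Hypothetical in-memory stand-in for the platform's deployment table:
# deployment name -> model version currently served under that name.
deployments = {"gpt4o-prod": "GPT-4o 2024-11-20"}


def call_model(deployment_name: str, prompt: str) -> str:
    """Simulated call: the caller passes a deployment name, never a model name."""
    model = deployments[deployment_name]
    return f"[{model}] response to: {prompt}"


def app_handler(prompt: str) -> str:
    # Application code hard-codes only the deployment name.
    return call_model("gpt4o-prod", prompt)


before = app_handler("hello")

# Platform-side change: the deployment now points at a newer model.
# app_handler is not edited.
deployments["gpt4o-prod"] = "GPT-4.1"

after = app_handler("hello")
print(before)
print(after)
```

The two printed lines differ only in the model tag, while `app_handler` stays the same. One design observation: a name such as "gpt4o-prod" becomes misleading once it serves a different model, so a model-neutral deployment name may age better.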
Agent deployment
Agents deploy differently from raw models. An agent deployment includes:
| Component | What Gets Deployed |
|---|---|
| Agent definition | Instructions, model reference, tool schemas |
| Tool connections | API endpoints, function definitions, knowledge sources |
| Configuration | Temperature, max tokens, safety settings |
| Version | Agent version for rollback capability |
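As a rough illustration of how these components and the version history fit together, the sketch below stores each agent definition as an immutable record and rolls back by moving a pointer. `AgentDefinition` and `AgentRegistry` are hypothetical names invented for this example; a real Foundry agent deployment is managed by the platform, not by code like this.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentDefinition:
    instructions: str
    model_deployment: str        # a deployment name, not a model name
    tools: tuple[str, ...] = ()  # tool names only; schemas omitted for brevity
    temperature: float = 0.2
    max_tokens: int = 1024


@dataclass
class AgentRegistry:
    """Hypothetical version history for a single agent."""
    versions: list[AgentDefinition] = field(default_factory=list)
    active: int = -1  # index of the live version; -1 means nothing deployed

    def deploy(self, definition: AgentDefinition) -> int:
        """Append a new version, make it live, and return its 1-based number."""
        self.versions.append(definition)
        self.active = len(self.versions) - 1
        return self.active + 1

    def rollback(self) -> int:
        """Make the previous version live and return its 1-based number."""
        if self.active <= 0:
            raise RuntimeError("no earlier version to roll back to")
        self.active -= 1
        return self.active + 1

    def current(self) -> AgentDefinition:
        if self.active < 0:
            raise RuntimeError("nothing deployed yet")
        return self.versions[self.active]


registry = AgentRegistry()
v1 = registry.deploy(AgentDefinition("Answer shipping questions.", "gpt4o-prod"))
v2 = registry.deploy(
    AgentDefinition("Answer shipping questions briefly.", "gpt4o-prod", temperature=0.0)
)
live = registry.rollback()
print(live, registry.current().instructions)
```

Because old versions are never overwritten, rolling back is a pointer change and does not require rebuilding the agent.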
CI/CD for AI solutions
| Feature | Without CI/CD | With CI/CD |
|---|---|---|
| Deploy process | Manual portal clicks | Automated pipeline triggered by git push |
| Testing | Hope it works | Automated evaluation (quality, safety, groundedness) |
| Consistency | Different every time | Identical across environments |
| Rollback | Manually redeploy old version | One-click or automatic on failure |
| Audit trail | Who changed what? Good luck | Full git history + pipeline logs |
CI/CD pipeline stages for AI
| Stage | What Happens | Tools |
|---|---|---|
| Build | Package application code and agent definitions | GitHub Actions, Azure DevOps |
| Evaluate | Run automated quality, safety, and groundedness tests | Foundry Evaluation SDK |
| Deploy to staging | Push to staging Foundry Project | Azure CLI, Foundry SDK |
| Integration test | Verify end-to-end with real API calls | pytest, custom test suites |
| Promote to production | Deploy to prod after approval | Manual gate or auto-promote |
| Monitor | Watch for drift, errors, safety events | Azure Monitor, Foundry tracing |
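The stage ordering in the table can be sketched as a small runner that executes stages in sequence and stops at the first failure. This is a simplified model, not a GitHub Actions or Azure DevOps definition; `run_pipeline` and the lambda stages are hypothetical placeholders for real build, evaluation, and deployment steps.

```python
from typing import Callable

# A stage is a name plus a zero-argument step that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[str]:
    """Run stages in order and stop at the first failure; return a result log."""
    log = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'passed' if ok else 'FAILED'}")
        if not ok:
            break  # later stages (including production) never run
    return log


stages = [
    ("build", lambda: True),
    ("evaluate", lambda: True),
    ("deploy-to-staging", lambda: True),
    ("integration-test", lambda: False),  # simulated failure
    ("promote-to-production", lambda: True),
    ("monitor", lambda: True),
]

log = run_pipeline(stages)
for line in log:
    print(line)
```

In this run the log ends at the failed integration test, so the production stage is never reached. That gating behavior is the main thing the stage order buys you.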
Real-world example: Kai's CI/CD pipeline
Kai sets up a GitHub Actions pipeline for the logistics AI platform:
- On pull request: Run Foundry evaluations against test scenarios (20 predefined questions + expected answers)
- On merge to main: Deploy to staging Foundry Project, run integration tests
- Manual approval: Team lead reviews evaluation scores before production
- On approval: Deploy to production, update model deployment, run smoke tests
- On failure: Auto-rollback to previous deployment version
The whole process runs in under 15 minutes. No portal clicking required.
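Two pieces of Kai's pipeline, the evaluation gate on pull requests and the auto-rollback on failure, can be sketched as below. Everything here is an assumption for illustration: the `evaluate` and `release` helpers, the 0.9 threshold, the substring matching (a crude stand-in for real quality and groundedness evaluators), and the two sample questions.

```python
def evaluate(answer_fn, cases, threshold=0.9):
    """Return (score, passed). The score is the fraction of cases whose
    answer contains the expected text, compared case-insensitively."""
    hits = sum(
        1 for question, expected in cases
        if expected.lower() in answer_fn(question).lower()
    )
    score = hits / len(cases)
    return score, score >= threshold


def release(candidate, previous, smoke_test):
    """Return the version left live: the candidate if its smoke test passes,
    otherwise the previous version (auto-rollback)."""
    return candidate if smoke_test(candidate) else previous


# Hypothetical test scenarios: (question, text the answer must contain).
cases = [
    ("Where is order 42?", "in transit"),
    ("What is the cutoff time?", "5 pm"),
]

canned_answers = {
    "Where is order 42?": "Order 42 is in transit.",
    "What is the cutoff time?": "The cutoff is 5 PM.",
}


def fake_agent(question: str) -> str:
    # Stand-in for a call to the deployed agent.
    return canned_answers[question]


score, passed = evaluate(fake_agent, cases)
live = release("v2", "v1", smoke_test=lambda version: version != "v2")
print(score, passed, live)
```

Here both sample cases match, so the gate passes, while the simulated smoke test rejects "v2" and the release falls back to "v1".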
Knowledge check
MediaForge wants to upgrade their content generation model from GPT-4o to GPT-4.1 without changing any application code. What should they do?
Atlas Financial's compliance team requires that every AI model change is auditable and can be rolled back within 5 minutes. Which practice best supports this?