Foundry Model Catalog and Application Insights
Choose the right AI model from the Foundry model catalog for custom prompts, and monitor your agents with Application Insights telemetry.
Part 1: The Foundry Model Catalog
Think of the model catalog like a car dealership.
You would not buy a sports car to deliver furniture, and you would not buy a delivery van for a race. AI models are the same — some are fast and cheap (great for simple FAQs), others are powerful and expensive (needed for complex reasoning). The Foundry model catalog is your dealership: hundreds of models from Microsoft, OpenAI, Meta, and others, each with different strengths.
For Copilot Studio, the model catalog matters when you create custom prompts — special AI instructions that run inside your agent’s topics. Instead of using the default model, you pick the model that best fits each task.
Application Insights is your agent’s dashboard — it tracks every conversation, measures response times, catches errors, and shows you how your agent performs in the real world.
Choosing the right model
Not all AI models are equal. The exam expects you to match model characteristics to use cases.
| Feature | Cost | Speed | Capability | Best for |
|---|---|---|---|---|
| GPT-4o | Higher — premium per-token pricing | Moderate — 10-30 seconds for complex reasoning | Highest — complex reasoning, nuanced understanding, multi-step analysis, medical/legal/financial domains | High-stakes tasks: clinical decision support, contract analysis, complex troubleshooting |
| GPT-4o mini | Low — fraction of GPT-4o cost | Fast — typically under 5 seconds | Good — handles most business tasks well, strong at summarization and classification | High-volume tasks: FAQ answers, ticket classification, simple summarization, routing logic |
| Phi (small language model) | Lowest — designed for cost efficiency at scale | Fastest — sub-second for simple tasks | Moderate — strong for structured tasks, weaker on open-ended reasoning | Edge deployment, high-throughput classification, cost-sensitive scenarios with thousands of daily calls |
| Llama (Meta) | Varies by size — competitive with GPT-4o mini | Varies — depends on model size (8B, 70B, 405B) | Strong — open-source, good at general reasoning, code generation | Teams preferring open-source models, specific compliance requirements, code-heavy tasks |
Custom prompts with model selection
A custom prompt in Copilot Studio is a prompt node within a topic that sends a specific instruction to an AI model and returns the result. By connecting to the Foundry model catalog, you can choose which model processes each prompt.
Configuration steps:
- Connect your Copilot Studio environment to Foundry — this enables model catalog access
- In a topic, add a Prompt node (also called a “Create prompt” or “AI Builder prompt” action)
- Select the Foundry model — choose from the catalog based on your task requirements
- Write the prompt instruction — what the model should do with the input (e.g., “Classify this ticket as urgent, normal, or low priority”)
- Map input variables — pass conversation variables into the prompt
- Map output variables — capture the model’s response for use in subsequent nodes
- Test with sample inputs — verify the model produces expected outputs
Why not just use the default model for everything?
Custom prompts with specific models give three advantages:
- Cost optimization: use GPT-4o mini for simple tasks and reserve GPT-4o for complex reasoning; this routing can cut AI costs by 60-80%
- Task-specific accuracy: some models excel at certain task types, such as classification or code generation
- Compliance: choose models deployed in specific Azure regions to meet data residency requirements
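To see why routing cuts costs, here is a back-of-the-envelope estimate in Python. The per-million-token prices are placeholder assumptions for illustration, not published rates; substitute current pricing for your region and models.

```python
# Rough cost comparison: routing simple tasks to a cheaper model.
# Prices below are PLACEHOLDER assumptions (USD per 1M tokens), not real rates.
PRICE_PER_M_TOKENS = {
    "gpt-4o": 5.00,       # assumed premium rate
    "gpt-4o-mini": 0.30,  # assumed budget rate
}

def monthly_cost(calls: int, tokens_per_call: int, model: str) -> float:
    """Estimated monthly spend for one workload on one model."""
    total_tokens = calls * tokens_per_call
    return total_tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]

# Hypothetical workload: 100,000 calls per month, ~800 tokens each.
calls, tokens = 100_000, 800

# Baseline: every call goes to the premium model.
everything_on_gpt4o = monthly_cost(calls, tokens, "gpt-4o")

# Routed: 80% of calls are simple and go to the mini model.
routed = (monthly_cost(int(calls * 0.8), tokens, "gpt-4o-mini")
          + monthly_cost(int(calls * 0.2), tokens, "gpt-4o"))

savings = 1 - routed / everything_on_gpt4o
print(f"All GPT-4o: ${everything_on_gpt4o:,.2f}")
print(f"Routed:     ${routed:,.2f}")
print(f"Savings:    {savings:.0%}")
```

With these assumed prices the routed setup saves roughly 75%, which is why the 60-80% range is plausible whenever most traffic is simple.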
Part 2: Monitoring with Application Insights
Building an agent is half the job. The other half is knowing whether it actually works in production. Application Insights provides the observability layer.
Connecting Application Insights to your agent:
- Create an Application Insights resource in Azure (or use an existing one)
- In Copilot Studio, go to Settings > Agent settings > Application Insights
- Paste the connection string from your Application Insights resource
- Save and publish — telemetry starts flowing within minutes
Key telemetry captured:
| Metric | What it tells you | Why it matters |
|---|---|---|
| Session count | How many conversations happen per day/week | Adoption tracking — is the agent being used? |
| Topic completion rate | Percentage of topic starts that reach the end node | Quality signal — incomplete topics suggest confusion or errors |
| Resolution rate | Percentage of sessions resolved without human escalation | Effectiveness — the agent’s core success metric |
| Escalation rate | How often conversations transfer to a human | Capacity planning — high escalation means the agent needs improvement |
| Average response time | How long the agent takes to respond | User experience — slow responses increase abandonment |
| Error rate | Failed connector calls, timeout errors, unhandled exceptions | Reliability — errors need immediate investigation |
KQL query examples for agent monitoring
Application Insights data is queried using KQL (Kusto Query Language). Common queries include sessions per day (`summarize dcount(session_Id) by bin(timestamp, 1d)`), top escalated topics, and average response latency.
You do not need to memorize KQL syntax for the exam — but knowing that Application Insights enables query-driven monitoring is testable.
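Written out in full, the sessions-per-day query mentioned above looks like this, with a second sketch for most-triggered topics. The `customEvents` table and `session_Id` column follow the standard Application Insights schema, but the `TopicName` custom dimension is an assumption about how the agent's telemetry is shaped — inspect a sample event in your workspace before relying on it:

```kusto
// Sessions per day over the last 30 days
customEvents
| where timestamp > ago(30d)
| summarize sessions = dcount(session_Id) by bin(timestamp, 1d)
| order by timestamp asc

// Most-triggered topics (TopicName is an assumed custom dimension;
// confirm the field name against a real event first)
customEvents
| where timestamp > ago(30d)
| extend topic = tostring(customDimensions.TopicName)
| where isnotempty(topic)
| summarize hits = count() by topic
| top 10 by hits
```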
Scenario: Lena picks models and sets up monitoring
Lena’s hospital agent handles two workloads: clinical decision support (complex medical questions requiring accuracy) and general FAQ (parking, cafeteria, IT password resets — speed and cost matter more).
For clinical custom prompts she selects GPT-4o — its reasoning accuracy on medical terminology is worth the premium. For FAQ prompts she picks GPT-4o mini — simple Q&A at a fraction of the cost, under 3 seconds.
She connects Application Insights and builds dashboards tracking clinical accuracy, daily usage, and cost per model. After the first week, data reveals 40% of “clinical” queries are simple medication lookups. She adds a classification prompt (GPT-4o mini) that routes simple lookups to the cheaper model. Cost drops 35% with no accuracy impact. Data-driven model optimization in action.
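Lena's two-tier routing is ordinary dispatch logic: a cheap classification pass decides which model handles the full query. In the sketch below, `classify_query` is a keyword-matching stand-in for the GPT-4o mini classification prompt (in Copilot Studio this would be a Prompt node, not local code), and the model names and keywords are illustrative.

```python
# Two-tier routing sketch: a cheap classifier picks the model per query.
# classify_query stands in for the GPT-4o mini classification prompt;
# keywords and model names are illustrative, not from a real deployment.

SIMPLE_KEYWORDS = ("dosage", "medication lookup", "refill", "parking", "password")

def classify_query(text: str) -> str:
    """Toy classifier: keyword match instead of a real model call."""
    lowered = text.lower()
    return "simple" if any(k in lowered for k in SIMPLE_KEYWORDS) else "complex"

def pick_model(text: str) -> str:
    """Route simple lookups to the cheap model, everything else to GPT-4o."""
    return "gpt-4o-mini" if classify_query(text) == "simple" else "gpt-4o"

print(pick_model("Standard dosage for amoxicillin?"))          # a simple lookup
print(pick_model("Differential diagnosis for these symptoms")) # complex reasoning
```

The design point is that the classifier only ever sees enough of the query to label it, so its own cost stays negligible next to the premium calls it avoids.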
Exam tip: model selection is about matching cost and capability to the task
The exam will describe scenarios and ask which model to choose. The decision framework is simple:
- Complex reasoning, high stakes → GPT-4o (or the most capable model available)
- Simple tasks, high volume → GPT-4o mini (good balance of cost and capability)
- Maximum cost efficiency, structured tasks → Phi (smallest, cheapest, fastest)
- Open-source requirement → Llama
If the scenario mentions “thousands of daily requests” and “simple classification,” the answer is almost always GPT-4o mini or Phi — not GPT-4o.
Check your understanding
Lena’s agent handles thousands of simple FAQ questions daily and a few dozen complex clinical queries. How should she configure model selection?
After connecting Application Insights, which metric best indicates that the agent is failing to help users?
What is the primary benefit of using the Foundry model catalog with custom prompts instead of the default Copilot Studio model?