Foundry Model Catalog and Application Insights
Choose the right AI model from the Foundry model catalog for custom prompts, and monitor your agents with Application Insights telemetry.
Part 1: The Foundry Model Catalog
Think of the model catalog like a car dealership.
You would not buy a sports car to deliver furniture, and you would not buy a delivery van for a race. AI models are the same — some are fast and cheap (great for simple FAQs), others are powerful and expensive (needed for complex reasoning). The Foundry model catalog is your dealership: hundreds of models from Microsoft, OpenAI, Meta, and others, each with different strengths.
For Copilot Studio, the model catalog matters when you create custom prompts — special AI instructions that run inside your agent’s topics. Instead of using the default model, you pick the model that best fits each task.
Application Insights is your agent’s dashboard — it tracks every conversation, measures response times, catches errors, and shows you how your agent performs in the real world.
Choosing the right model
Not all AI models are equal. The exam expects you to match model characteristics to use cases.
| Feature | Cost | Speed | Capability | Best for |
|---|---|---|---|---|
| GPT-4o | Higher — premium per-token pricing | Moderate — 10-30 seconds for complex reasoning | Highest — complex reasoning, nuanced understanding, multi-step analysis, medical/legal/financial domains | High-stakes tasks: clinical decision support, contract analysis, complex troubleshooting |
| GPT-4o mini | Low — fraction of GPT-4o cost | Fast — typically under 5 seconds | Good — handles most business tasks well, strong at summarization and classification | High-volume tasks: FAQ answers, ticket classification, simple summarization, routing logic |
| Phi (small language model) | Lowest — designed for cost efficiency at scale | Fastest — sub-second for simple tasks | Moderate — strong for structured tasks, weaker on open-ended reasoning | Edge deployment, high-throughput classification, cost-sensitive scenarios with thousands of daily calls |
| Llama (Meta) | Varies by size — competitive with GPT-4o mini | Varies — depends on model size (8B, 70B, 405B) | Strong — open-source, good at general reasoning, code generation | Teams preferring open-source models, specific compliance requirements, code-heavy tasks |
Custom prompts with model selection
A custom prompt in Copilot Studio is a prompt node within a topic that sends a specific instruction to an AI model and returns the result. By connecting to the Foundry model catalog, you can choose which model processes each prompt.
Configuration steps:
- Connect your Copilot Studio environment to Foundry — this enables model catalog access
- In a topic, add a Prompt node (also called a “Create prompt” or “AI Builder prompt” action)
- Select the Foundry model — choose from the catalog based on your task requirements
- Write the prompt instruction — what the model should do with the input (e.g., “Classify this ticket as urgent, normal, or low priority”)
- Map input variables — pass conversation variables into the prompt
- Map output variables — capture the model’s response for use in subsequent nodes
- Test with sample inputs — verify the model produces expected outputs
Why not just use the default model for everything?
Custom prompts with specific models give three advantages:
- Cost optimization: use GPT-4o mini for simple tasks and reserve GPT-4o for complex reasoning; this routing can cut AI costs by 60-80%
- Task-specific accuracy: some models excel at certain task types, such as classification or code generation
- Compliance: choose models deployed in specific Azure regions to meet data residency requirements
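To see why routing cuts costs, here is a back-of-the-envelope estimate in Python. The per-million-token prices are placeholder assumptions for illustration, not published rates; substitute current pricing for your region and models.

```python
# Rough cost comparison: routing simple tasks to a cheaper model.
# Prices below are PLACEHOLDER assumptions (USD per 1M tokens), not real rates.
PRICE_PER_M_TOKENS = {
    "gpt-4o": 5.00,       # assumed premium rate
    "gpt-4o-mini": 0.30,  # assumed budget rate
}

def monthly_cost(calls: int, tokens_per_call: int, model: str) -> float:
    """Estimated monthly spend for one workload on one model."""
    total_tokens = calls * tokens_per_call
    return total_tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]

# Hypothetical workload: 100,000 calls per month, ~800 tokens each.
calls, tokens = 100_000, 800

# Baseline: every call goes to the premium model.
everything_on_gpt4o = monthly_cost(calls, tokens, "gpt-4o")

# Routed: 80% of calls are simple and go to the mini model.
routed = (monthly_cost(int(calls * 0.8), tokens, "gpt-4o-mini")
          + monthly_cost(int(calls * 0.2), tokens, "gpt-4o"))

savings = 1 - routed / everything_on_gpt4o
print(f"All GPT-4o: ${everything_on_gpt4o:,.2f}")
print(f"Routed:     ${routed:,.2f}")
print(f"Savings:    {savings:.0%}")
```

With these assumed prices the routed setup saves roughly 75%, which is why the 60-80% range is plausible whenever most traffic is simple.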
Part 2: Monitoring with Application Insights
Building an agent is half the job. The other half is knowing whether it actually works in production. Application Insights provides the observability layer.
Connecting Application Insights to your agent:
- Create an Application Insights resource in Azure (or use an existing one)
- In Copilot Studio, go to Settings > Agent settings > Application Insights
- Paste the connection string from your Application Insights resource
- Save and publish — telemetry starts flowing within minutes
Key telemetry captured:
| Metric | What it tells you | Why it matters |
|---|---|---|
| Session count | How many conversations happen per day/week | Adoption tracking — is the agent being used? |
| Topic completion rate | Percentage of topic starts that reach the end node | Quality signal — incomplete topics suggest confusion or errors |
| Resolution rate | Percentage of sessions resolved without human escalation | Effectiveness — the agent’s core success metric |
| Escalation rate | How often conversations transfer to a human | Capacity planning — high escalation means the agent needs improvement |
| Average response time | How long the agent takes to respond | User experience — slow responses increase abandonment |
| Error rate | Failed connector calls, timeout errors, unhandled exceptions | Reliability — errors need immediate investigation |
KQL query examples for agent monitoring
Application Insights data is queried using KQL (Kusto Query Language). Common queries include sessions per day (`summarize dcount(session_Id) by bin(timestamp, 1d)`), top escalated topics, and average response latency.
You do not need to memorize KQL syntax for the exam — but knowing that Application Insights enables query-driven monitoring is testable.
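Written out in full, the sessions-per-day query mentioned above looks like this, with a second sketch for most-triggered topics. The `customEvents` table and `session_Id` column follow the standard Application Insights schema, but the `TopicName` custom dimension is an assumption about how the agent's telemetry is shaped — inspect a sample event in your workspace before relying on it:

```kusto
// Sessions per day over the last 30 days
customEvents
| where timestamp > ago(30d)
| summarize sessions = dcount(session_Id) by bin(timestamp, 1d)
| order by timestamp asc

// Most-triggered topics (TopicName is an assumed custom dimension;
// confirm the field name against a real event first)
customEvents
| where timestamp > ago(30d)
| extend topic = tostring(customDimensions.TopicName)
| where isnotempty(topic)
| summarize hits = count() by topic
| top 10 by hits
```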
Scenario: Lena picks models and sets up monitoring
Lena’s hospital agent handles two workloads: clinical decision support (complex medical questions requiring accuracy) and general FAQ (parking, cafeteria, IT password resets — speed and cost matter more).
For clinical custom prompts she selects GPT-4o — its reasoning accuracy on medical terminology is worth the premium. For FAQ prompts she picks GPT-4o mini — simple Q&A at a fraction of the cost, under 3 seconds.
She connects Application Insights and builds dashboards tracking clinical accuracy, daily usage, and cost per model. After the first week, data reveals 40% of “clinical” queries are simple medication lookups. She adds a classification prompt (GPT-4o mini) that routes simple lookups to the cheaper model. Cost drops 35% with no accuracy impact. Data-driven model optimization in action.
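Lena's two-tier routing is ordinary dispatch logic: a cheap classification pass decides which model handles the full query. In the sketch below, `classify_query` is a keyword-matching stand-in for the GPT-4o mini classification prompt (in Copilot Studio this would be a Prompt node, not local code), and the model names and keywords are illustrative.

```python
# Two-tier routing sketch: a cheap classifier picks the model per query.
# classify_query stands in for the GPT-4o mini classification prompt;
# keywords and model names are illustrative, not from a real deployment.

SIMPLE_KEYWORDS = ("dosage", "medication lookup", "refill", "parking", "password")

def classify_query(text: str) -> str:
    """Toy classifier: keyword match instead of a real model call."""
    lowered = text.lower()
    return "simple" if any(k in lowered for k in SIMPLE_KEYWORDS) else "complex"

def pick_model(text: str) -> str:
    """Route simple lookups to the cheap model, everything else to GPT-4o."""
    return "gpt-4o-mini" if classify_query(text) == "simple" else "gpt-4o"

print(pick_model("Standard dosage for amoxicillin?"))          # a simple lookup
print(pick_model("Differential diagnosis for these symptoms")) # complex reasoning
```

The design point is that the classifier only ever sees enough of the query to label it, so its own cost stays negligible next to the premium calls it avoids.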
Exam tip: model selection is about matching cost and capability to the task
The exam will describe scenarios and ask which model to choose. The decision framework is simple:
- Complex reasoning, high stakes → GPT-4o (or the most capable model available)
- Simple tasks, high volume → GPT-4o mini (good balance of cost and capability)
- Maximum cost efficiency, structured tasks → Phi (smallest, cheapest, fastest)
- Open-source requirement → Llama
If the scenario mentions “thousands of daily requests” and “simple classification,” the answer is almost always GPT-4o mini or Phi — not GPT-4o.
Check your understanding
Lena’s agent handles thousands of simple FAQ questions daily and a few dozen complex clinical queries. How should she configure model selection?
After connecting Application Insights, which metric best indicates that the agent is failing to help users?
What is the primary benefit of using the Foundry model catalog with custom prompts instead of the default Copilot Studio model?