AB-100 Study Guide

Domain 1: Plan AI-Powered Business Solutions

  • Agent Requirements & Data Readiness
  • AI Strategy & the Cloud Adoption Framework
  • Multi-Agent Solution Design
  • Build, Buy, or Extend
  • Generative AI, Knowledge Sources & Prompt Engineering
  • Small Language Models & Model Selection
  • ROI, TCO & Business Case Analysis

Domain 2: Design AI-Powered Business Solutions

  • Copilot in D365 Customer Experience & Service
  • Agent Types: Task, Autonomous & Prompt/Response
  • Foundry Tools & Code-First Solutions
  • Copilot Studio: Topics, Flows & Prompt Actions
  • Power Apps, WAF & Data Processing
  • Extensibility: Custom Models, M365 Agents & Copilot Studio
  • MCP, Computer Use & Agent Behaviours
  • M365 Agents: Teams, SharePoint & Sales/Service in M365 Copilot
  • D365 AI Orchestration: Finance, SCM & Customer Experience

Domain 3: Deploy AI-Powered Business Solutions

  • Agent Monitoring: Tools, Metrics, and Processes
  • Telemetry Interpretation and Agent Tuning
  • Testing Strategy for AI Agents
  • Custom Model Validation and Prompt Best Practices
  • End-to-End Testing for Multi-App AI Solutions
  • ALM Foundations & Data Lifecycle for AI
  • ALM for Copilot Studio Agents
  • ALM for Microsoft Foundry Agents
  • ALM for D365 AI Features
  • Agent Security
  • Governance for AI Agents
  • Prompt Security & AI Vulnerabilities
  • Responsible AI & Audit Trails

Domain 3: Deploy AI-Powered Business Solutions

Telemetry Interpretation and Agent Tuning

Interpret telemetry data, analyse user feedback backlogs, and apply AI-based tools to identify issues and tune agent performance.


☕ Simple explanation

Imagine you own a restaurant. You’ve got security cameras, receipt data, and comment cards. The cameras show you what’s happening — long queues, empty tables, confused waiters. The receipts show you what’s selling. The comment cards tell you how people feel.

But raw footage and receipts don’t fix anything. You need to interpret them. Why are queues forming at 6 PM? Because the kitchen is slow on pasta orders. Now you can fix it — retrain the pasta chef, simplify the menu, or add a second station.

Telemetry is your agent’s security camera, receipts, and comment cards. Tuning is the fix you apply after you understand the problem.

Telemetry interpretation sits between monitoring (collecting data) and improvement (applying changes). The skill tested on the exam is the analytical process: how to read conversation logs, correlate latency traces with error spikes, segment user feedback into actionable categories, and choose the right tuning lever.

There are four distinct tuning levers: prompt tuning (changing instructions), knowledge source tuning (updating grounding data), model fine-tuning (retraining with custom data), and flow redesign (restructuring the conversation logic). Each addresses different failure modes. The exam expects you to match the failure pattern to the correct tuning approach.

The Scenario

🤖 Jordan Reeves reviews the monitoring dashboards Sam built last sprint. The patient scheduling agent has a 72 percent resolution rate — below the 75 percent target. Worse, Jordan notices a pattern: conversations containing the word “reschedule” fail 30 percent of the time, compared to 8 percent for new appointments.

Jordan needs to dig into telemetry, figure out why, and tune the agent.
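Jordan's keyword segmentation is easy to reproduce once conversation logs are exported. A minimal Python sketch, assuming an illustrative record shape (the "transcript" and "resolved" fields are not Copilot Studio's actual export schema):

```python
# Segment conversation logs by keyword and compare failure rates.
# Field names ("transcript", "resolved") are illustrative, not the
# real Copilot Studio export schema.

def failure_rate(logs, keyword=None):
    """Fraction of conversations that failed, optionally filtered by keyword."""
    if keyword is not None:
        logs = [c for c in logs if keyword in c["transcript"].lower()]
    if not logs:
        return 0.0
    return sum(1 for c in logs if not c["resolved"]) / len(logs)

logs = [
    {"transcript": "I need to reschedule my appointment", "resolved": False},
    {"transcript": "Please reschedule me to Friday", "resolved": False},
    {"transcript": "Reschedule for next week", "resolved": True},
    {"transcript": "Book a new appointment", "resolved": True},
    {"transcript": "New appointment for Monday", "resolved": True},
]

print(failure_rate(logs, "reschedule"))  # failure rate for reschedule conversations
print(failure_rate(logs))                # overall failure rate
```

Comparing the segmented rate against the overall rate is exactly how the 30 percent vs 8 percent gap in the scenario surfaces.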

Telemetry Data Types

Every agent generates multiple telemetry streams. Understanding what each one tells you is the foundation of effective tuning:

| Data Type | What It Contains | What It Reveals | Where to Find It |
|---|---|---|---|
| Conversation logs | Full transcript of user-agent exchanges | Exact failure points, misunderstood intents, topic gaps | Copilot Studio Analytics |
| Completion metrics | Token usage, response confidence scores, topic match scores | Whether the agent is confident in its responses or guessing | Application Insights custom events |
| Latency traces | End-to-end timing for each conversation turn | Bottlenecks in knowledge retrieval, API calls, or model inference | Application Insights dependency tracking |
| Error logs | System errors, connector failures, timeout events | Infrastructure issues vs logic issues | Application Insights exceptions |
| User satisfaction scores | CSAT ratings, thumbs up/down, free-text feedback | How users feel about the experience, independent of resolution | Copilot Studio Analytics and custom surveys |

The Four Tuning Levers

When telemetry reveals a problem, you need to pick the right fix. Using the wrong lever wastes time and may not solve the issue:

| Aspect | Prompt Tuning | Knowledge Source Tuning | Model Fine-Tuning | Flow Redesign |
|---|---|---|---|---|
| What You Change | System instructions, few-shot examples, guardrails | Documents, data sources, grounding content | The underlying model, with custom training data | Conversation flow structure, branching logic, escalation rules |
| When to Use | Agent misunderstands intent or gives wrong tone | Agent lacks information or cites outdated data | Agent consistently fails on domain-specific language | Agent follows the wrong path or loops in conversation |
| Effort Level | Low — minutes to hours | Low to Medium — update and re-index | High — requires labelled data and compute | Medium — requires flow testing after changes |
| Risk Level | Low — easy to revert | Low — content swap | Medium — may affect other scenarios | Medium — may break existing paths |
| Example Fix | Add instruction: treat reschedule as a modification, not a cancellation | Add rescheduling policy document to knowledge base | Fine-tune on 5,000 healthcare scheduling conversations | Add explicit reschedule branch before the general booking flow |
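The "When to Use" row is effectively a lookup from diagnosed failure mode to lever. A toy sketch of that mapping, using this guide's own category labels rather than any official taxonomy:

```python
# Map a diagnosed failure mode to the tuning lever the table recommends.
# The category names mirror this guide's taxonomy, not an official API.

TUNING_LEVERS = {
    "intent_misunderstanding": "prompt tuning",
    "wrong_tone": "prompt tuning",
    "missing_or_outdated_information": "knowledge source tuning",
    "domain_language_failures": "model fine-tuning",
    "wrong_path_or_looping": "flow redesign",
}

def pick_lever(failure_mode):
    # Default mirrors the exam guidance: when unsure, analyse first.
    return TUNING_LEVERS.get(failure_mode, "analyse telemetry further")

print(pick_lever("missing_or_outdated_information"))  # knowledge source tuning
```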

Feedback Backlog Analysis

User feedback is gold — but only if you mine it systematically. Here’s how Jordan processes the feedback backlog:

Step 1: Collect and Centralise

Pull feedback from all sources into one place: CSAT scores from Copilot Studio, thumbs-down transcripts, support tickets that mention the agent, and direct user comments. Jordan uses a Power Automate flow to pipe all of these into a Dataverse table.

Step 2: Categorise by Issue Type

Jordan tags each piece of feedback:

  • Intent misunderstanding — the agent didn’t understand what the user wanted
  • Incorrect information — the agent gave a wrong answer
  • Slow response — the agent took too long
  • Tone or style — the response was technically correct but felt robotic or unhelpful
  • Missing capability — the user wanted something the agent can’t do yet
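A first pass over the backlog can be automated with simple keyword rules before manual review. A sketch, with illustrative rules and category names taken from the list above:

```python
# First-pass auto-tagging of feedback comments by keyword rules.
# The rules are illustrative only; ambiguous items fall through
# to manual review rather than being forced into a category.

RULES = [
    ("slow_response", ["slow", "waiting", "took forever"]),
    ("incorrect_information", ["wrong", "incorrect", "outdated"]),
    ("intent_misunderstanding", ["didn't understand", "misunderstood"]),
    ("tone_or_style", ["robotic", "rude", "unhelpful"]),
]

def tag(comment):
    text = comment.lower()
    for category, keywords in RULES:
        if any(k in text for k in keywords):
            return category
    return "needs_manual_review"

print(tag("The bot gave me outdated clinic hours"))  # incorrect_information
print(tag("It misunderstood my request entirely"))   # intent_misunderstanding
```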

Step 3: Identify Patterns

Sorting by category, Jordan sees that 45 percent of negative feedback falls under “intent misunderstanding,” and 60 percent of those involve rescheduling. The pattern is clear.

Step 4: Prioritise Fixes

Not every issue is worth fixing immediately. Jordan uses a simple impact matrix: frequency of the issue multiplied by severity. Rescheduling failures are high-frequency and high-severity (patients miss appointments), so they go to the top of the list.
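The impact matrix is just a sort key. A sketch with invented backlog items and a 1 to 3 severity scale:

```python
# Prioritise backlog items by frequency x severity.
# Items, counts, and severity scores are made up for illustration.

def prioritise(items):
    return sorted(items, key=lambda i: i["frequency"] * i["severity"], reverse=True)

backlog = [
    {"issue": "reschedule failures",  "frequency": 120, "severity": 3},
    {"issue": "robotic tone",         "frequency": 200, "severity": 1},
    {"issue": "slow weekend replies", "frequency": 15,  "severity": 2},
]

for item in prioritise(backlog):
    print(item["issue"], item["frequency"] * item["severity"])
```

Note that the highest-frequency issue (robotic tone, 200 reports) still ranks below the reschedule failures once severity is factored in.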

💡

Exam Tip: The exam tests your ability to describe the process for using telemetry, not just name the tools. A common question pattern: “A solution architect notices that agent accuracy has declined. What should they do FIRST?” The answer is almost always “analyse telemetry to identify the root cause” — not “retrain the model” or “rewrite the prompt.”

AI-Based Analysis Tools

You don’t have to do all the analysis manually. Several AI-powered tools can accelerate the process:

Azure AI Foundry Evaluation — Run evaluation pipelines against conversation logs. Define metrics like groundedness (did the response stick to source material?), relevance (did it answer the question asked?), and coherence (did it make sense?). Foundry scores each conversation and flags outliers.

Automated Regression Testing — After every tuning change, run the same set of test conversations through the agent. Compare results to the baseline. If the rescheduling fix improved rescheduling but broke new appointments, you catch it immediately.
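A minimal version of that baseline comparison, with pass rates and scenario group names invented for illustration:

```python
# Compare per-scenario pass rates against a baseline after a tuning
# change, flagging any group that regressed beyond a tolerance.
# Group names and rates are illustrative.

def regressions(baseline, current, tolerance=0.02):
    """Return scenario groups whose pass rate dropped by more than tolerance."""
    return [
        group for group, rate in current.items()
        if baseline.get(group, 0.0) - rate > tolerance
    ]

baseline = {"reschedule": 0.70, "new_appointment": 0.92}
current  = {"reschedule": 0.93, "new_appointment": 0.85}

print(regressions(baseline, current))  # ['new_appointment']
```

Here the reschedule fix improved its own group but regressed new appointments, which is exactly the failure mode regression testing exists to catch.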

Anomaly Detection on Metrics — Azure Monitor can detect unusual patterns in time-series data. If error rate suddenly doubles at 2 AM, anomaly detection flags it even if it’s below your static alert threshold. This catches gradual degradation that static thresholds miss.
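Azure Monitor's anomaly detection is a managed feature, but the underlying idea can be sketched as a rolling z-score over a metric series. The window size and threshold below are arbitrary choices, not Azure Monitor's actual algorithm:

```python
import statistics

# Flag points that deviate from a trailing window by more than
# `threshold` standard deviations: a toy version of time-series
# anomaly detection on an error-rate metric.

def anomalies(series, window=6, threshold=3.0):
    flagged = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mean = statistics.mean(recent)
        stdev = statistics.pstdev(recent)
        if stdev > 0 and abs(series[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged

# Hourly error rates: stable around 2 percent, then a sudden doubling.
error_rate = [0.02, 0.021, 0.019, 0.02, 0.022, 0.018, 0.02, 0.045]
print(anomalies(error_rate))  # [7]
```

The doubled rate at index 7 is flagged even though 4.5 percent might sit below a static alert threshold, which is the advantage the paragraph above describes.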

💡

Deep Dive: Foundry evaluation uses a “judge” model to score responses. The judge model compares the agent’s response against the grounding source and the user’s question. Scores range from 1 to 5 for each dimension. A common tuning workflow: run evaluation, filter for scores below 3, review those conversations manually, then apply the appropriate tuning lever.
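That filter-below-3 step is straightforward to script over exported evaluation results. A sketch, assuming an illustrative record shape rather than the actual Foundry evaluation output format:

```python
# Filter evaluated conversations for manual review: any dimension
# scored below the cutoff on the 1-5 scale. The record shape is an
# assumption, not the real Foundry evaluation output.

def needs_review(results, cutoff=3):
    return [
        r["conversation_id"] for r in results
        if any(r["scores"][dim] < cutoff
               for dim in ("groundedness", "relevance", "coherence"))
    ]

results = [
    {"conversation_id": "c-101", "scores": {"groundedness": 5, "relevance": 4, "coherence": 5}},
    {"conversation_id": "c-102", "scores": {"groundedness": 2, "relevance": 4, "coherence": 4}},
    {"conversation_id": "c-103", "scores": {"groundedness": 4, "relevance": 3, "coherence": 2}},
]

print(needs_review(results))  # ['c-102', 'c-103']
```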

Applying the Fix

Jordan determines the rescheduling issue is an intent misunderstanding — the agent treats “reschedule” as a synonym for “cancel.” The fix requires two tuning levers:

  1. Prompt tuning — Add an explicit instruction: “When a user says reschedule, modify, change, or move their appointment, treat this as a modification request, not a cancellation. Confirm the existing appointment details before offering new time slots.”

  2. Flow redesign — Add a dedicated “Reschedule” topic in Copilot Studio that triggers on reschedule-related phrases, separate from the cancellation flow.

After deploying the fix, Jordan runs a regression test with 50 rescheduling scenarios and 50 new-appointment scenarios. Rescheduling accuracy jumps from 70 percent to 93 percent. New appointments remain at 92 percent. The fix is validated.

Flashcards

Question

What are the four tuning levers for improving agent performance?


Answer

1. Prompt tuning — change system instructions and examples. 2. Knowledge source tuning — update grounding documents and data. 3. Model fine-tuning — retrain with custom labelled data. 4. Flow redesign — restructure conversation branching and logic. Choose based on the type of failure the telemetry reveals.


Question

What is the difference between telemetry interpretation and monitoring?


Answer

Monitoring is the continuous collection and display of metrics (dashboards, alerts). Telemetry interpretation is the analytical step where you examine the data to understand WHY a metric changed. Monitoring answers 'what happened.' Interpretation answers 'why it happened' and 'what to do about it.'


Question

How should you prioritise items in a user feedback backlog?


Answer

Use an impact matrix: frequency multiplied by severity. High-frequency and high-severity issues go first. Low-frequency and low-severity issues go to the backlog. This ensures you fix the problems that affect the most users the most severely.


Knowledge Check

1. Jordan discovers that the scheduling agent gives outdated clinic hours — it still shows pre-COVID hours from 2019. Conversation logs confirm the agent is confident in its responses but the information is wrong. Which tuning lever should Jordan use?

2. An agent's resolution rate has been steadily declining over 6 weeks. No code or prompt changes have been made. Which analysis approach is MOST appropriate?

3. What is the PRIMARY purpose of running automated regression tests after tuning an agent?



Next up: Testing Strategy — build a comprehensive test framework for AI agents, including how to use Copilot to generate test cases.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.