
AB-100 Study Guide

Domain 1: Plan AI-Powered Business Solutions

  • Agent Requirements & Data Readiness
  • AI Strategy & the Cloud Adoption Framework
  • Multi-Agent Solution Design
  • Build, Buy, or Extend
  • Generative AI, Knowledge Sources & Prompt Engineering
  • Small Language Models & Model Selection
  • ROI, TCO & Business Case Analysis

Domain 2: Design AI-Powered Business Solutions

  • Copilot in D365 Customer Experience & Service
  • Agent Types: Task, Autonomous & Prompt/Response
  • Foundry Tools & Code-First Solutions
  • Copilot Studio: Topics, Flows & Prompt Actions
  • Power Apps, WAF & Data Processing
  • Extensibility: Custom Models, M365 Agents & Copilot Studio
  • MCP, Computer Use & Agent Behaviours
  • M365 Agents: Teams, SharePoint & Sales/Service in M365 Copilot
  • D365 AI Orchestration: Finance, SCM & Customer Experience

Domain 3: Deploy AI-Powered Business Solutions

  • Agent Monitoring: Tools, Metrics, and Processes
  • Telemetry Interpretation and Agent Tuning
  • Testing Strategy for AI Agents
  • Custom Model Validation and Prompt Best Practices
  • End-to-End Testing for Multi-App AI Solutions
  • ALM Foundations & Data Lifecycle for AI
  • ALM for Copilot Studio Agents
  • ALM for Microsoft Foundry Agents
  • ALM for D365 AI Features
  • Agent Security Free
  • Governance for AI Agents Free
  • Prompt Security & AI Vulnerabilities Free
  • Responsible AI & Audit Trails Free

Domain 3: Deploy AI-Powered Business Solutions (~13 min read)

Custom Model Validation and Prompt Best Practices

Create validation criteria for custom AI models and validate that Copilot prompts follow established best practices.


☕ Simple explanation

Testing tells you “does it work?” Validation tells you “does it work well enough to trust in production?”

Think of it like a pilot’s licence. A test checks if you can fly the plane (take off, land, navigate). Validation checks if you can fly it safely under real conditions — in fog, with crosswinds, when an engine fails, with passengers on board. You might pass the basic test but fail validation because you can’t handle edge cases at acceptable safety margins.

For AI models, validation means defining what “good enough” looks like across multiple dimensions — accuracy, fairness, speed, safety — and proving the model meets those thresholds before it touches real users.

Model validation is a formal gate between development and production. It answers: “Given our accuracy thresholds, bias tolerances, latency requirements, and safety constraints, should this model be deployed?” The answer requires quantitative evidence across multiple criteria, not a subjective “it seems fine.”

Prompt validation is a parallel discipline. Even when you’re using a foundation model (not a custom one), the prompts that shape its behaviour need formal validation. A poorly written system prompt can cause hallucinations, bypass guardrails, or produce inconsistent outputs. Prompt best practices are testable on the exam — you need to know the checklist and how to validate against it.

The Scenario

🏗️ Kai Mercer and data engineer Priya Sharma are building a custom defect classification model for Apex Industries. The model analyses images from the manufacturing line and classifies defects into 12 categories. Before it can go live, Apex’s CTO Lin Chen requires formal validation.

Priya knows the model’s overall accuracy is 94 percent. But “overall accuracy” hides problems. Is it 94 percent across all 12 defect types? Or is it 99 percent on common defects and 60 percent on rare but critical ones?

Validation Criteria for Custom AI Models

Validation isn’t a single number. It’s a multi-dimensional assessment:

  • Accuracy: overall percentage of correct predictions. Why it matters: baseline performance measure. Threshold example: above 90 percent overall.
  • Precision and recall: per-class correctness (precision) and coverage (recall). Why it matters: reveals hidden weaknesses in specific categories. Threshold example: recall above 85 percent for ALL classes, not just the average.
  • Latency: time from input to prediction. Why it matters: production systems need real-time or near-real-time responses. Threshold example: under 500 milliseconds per prediction.
  • Bias detection: performance differences across demographic groups or data segments. Why it matters: ensures fairness and prevents discriminatory outcomes. Threshold example: no more than 5 percent accuracy gap between segments.
  • Robustness: performance on noisy, incomplete, or adversarial inputs. Why it matters: real-world data is messy. Threshold example: accuracy drop under 10 percent on degraded inputs.
  • Safety: behaviour on out-of-distribution or harmful inputs. Why it matters: the model should fail gracefully, not confidently give wrong answers. Threshold example: 100 percent safe refusal on out-of-scope inputs.
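The criteria above amount to a pass/fail gate that can be expressed in a few lines. This is a minimal sketch, assuming metrics have already been measured; the metric names and the thresholds are illustrative, not an official tool or standard:

```python
# Hypothetical validation gate: compares measured metrics against
# production thresholds. Metric names and limits are illustrative.

def validation_gate(metrics: dict) -> list:
    """Return the list of failed criteria; an empty list means the model passes."""
    failures = []
    if metrics["overall_accuracy"] < 0.90:
        failures.append("accuracy")
    if min(metrics["per_class_recall"].values()) < 0.85:  # ALL classes, not the average
        failures.append("recall")
    if metrics["latency_ms"] > 500:
        failures.append("latency")
    if metrics["max_segment_accuracy_gap"] > 0.05:
        failures.append("bias")
    if metrics["degraded_accuracy_drop"] > 0.10:
        failures.append("robustness")
    if metrics["safe_refusal_rate"] < 1.0:  # 100 percent safe refusal required
        failures.append("safety")
    return failures

# Example: strong overall accuracy, but one weak class fails the gate.
report = validation_gate({
    "overall_accuracy": 0.94,
    "per_class_recall": {"scratch": 0.97, "hairline_crack": 0.79},
    "latency_ms": 320,
    "max_segment_accuracy_gap": 0.03,
    "degraded_accuracy_drop": 0.06,
    "safe_refusal_rate": 1.0,
})
print(report)  # → ['recall']
```

Note that the gate returns every failed criterion rather than stopping at the first, which is what a validation report needs: the team must see all the gaps, not just one.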

Priya’s Validation Discovery

Priya runs the full validation suite. Overall accuracy: 94 percent. But when she breaks it down by defect type:

  • Common defects (scratches, dents): 97 percent accuracy
  • Rare defects (hairline cracks, material delamination): 79 percent accuracy
  • Critical safety defects (structural fractures): 82 percent accuracy

The 15 percent accuracy drop on rare defect types is a problem. A structural fracture classified as a minor scratch could lead to a product recall — or worse, a safety incident. Priya flags this to Kai and Lin Chen. The model needs more training data for rare defects before it can pass validation.
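Priya's breakdown illustrates why per-class metrics must accompany the headline number. A minimal sketch of the calculation, using made-up labels as stand-ins for her evaluation set:

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Accuracy for each true class, plus the overall figure."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    per_class = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_class

# Tiny illustrative set: 9 common defects, 1 rare critical one the model misses.
y_true = ["scratch"] * 9 + ["fracture"]
y_pred = ["scratch"] * 9 + ["scratch"]  # fracture misclassified as a scratch

overall, per_class = per_class_accuracy(y_true, y_pred)
print(overall)                # → 0.9 — looks fine in aggregate
print(per_class["fracture"])  # → 0.0 — the critical class fails completely
```

Because rare classes contribute few samples, they barely move the overall figure; only the per-class view exposes them.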

💡

Exam Tip: Validation is NOT the same as testing. Testing checks if the model works (functional correctness). Validation checks if it works WELL ENOUGH for production (meets quantitative thresholds across multiple criteria). The exam expects you to understand this distinction. If a question asks “what is the purpose of model validation,” the answer is about thresholds and production readiness — not just “checking if it works.”

Validation Approaches

Different approaches catch different types of issues. A robust validation strategy uses all three:

  • Automated evaluation: scoring pipelines measure accuracy, latency, and bias on labelled datasets. Strengths: fast, repeatable, covers large datasets. Weaknesses: misses nuance; a technically correct answer can still be unhelpful. When required: every validation cycle. Example: run 10,000 test images through the defect classifier and measure per-class precision.
  • Human evaluation: domain experts manually review model outputs for quality and correctness. Strengths: catches subjective issues automated metrics miss. Weaknesses: slow, expensive, subjective across reviewers. When required: before production deployment and after major changes. Example: manufacturing engineers review 200 borderline classifications manually.
  • Red-teaming: adversarial testers deliberately try to make the model fail or behave unsafely. Strengths: reveals safety vulnerabilities and guardrail gaps. Weaknesses: resource-intensive, requires skilled adversarial testers. When required: before initial deployment and periodically thereafter. Example: testers submit deliberately blurry, rotated, or partially obscured images.
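The robustness side of red-teaming can be shown in miniature: degrade the inputs, re-measure accuracy, and compare the drop against the 10 percent threshold. The `classify` and `degrade` functions below are hypothetical stand-ins for the real model and for real perturbations such as blur or rotation:

```python
import random

random.seed(0)  # deterministic perturbations for a reproducible sketch

def classify(x):
    """Hypothetical stand-in for the defect model: thresholds one feature."""
    return "defect" if x > 0.5 else "ok"

def degrade(x):
    """Hypothetical perturbation standing in for blur/rotation/occlusion."""
    return x + random.uniform(-0.3, 0.3)

# (feature value, expected label) pairs, again purely illustrative
samples = [(0.9, "defect"), (0.1, "ok"), (0.55, "defect"), (0.45, "ok")]

def accuracy(inputs):
    return sum(classify(x) == label for x, label in inputs) / len(inputs)

clean = accuracy(samples)
noisy = accuracy([(degrade(x), label) for x, label in samples])
drop = clean - noisy
print("robustness", "PASS" if drop <= 0.10 else "FAIL", f"(drop={drop:.2f})")
```

The same harness shape works for the safety row: feed out-of-scope inputs and assert a 100 percent refusal rate instead of an accuracy drop.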

Copilot Prompt Validation

Even when you’re using a foundation model (not custom-trained), the system prompt shapes everything. A bad prompt leads to bad outcomes regardless of model quality.

The Prompt Best Practices Checklist

Every Copilot system prompt should be validated against these criteria:

  • Clear instructions: does the prompt clearly state the agent’s role, scope, and expected behaviour? Red flag: vague instructions like “be helpful” without specifics.
  • Grounding: does the prompt direct the model to use specific knowledge sources? Red flag: no grounding reference; the model relies only on training data.
  • Output format: does the prompt specify the expected response structure? Red flag: no format guidance; responses are inconsistent in length and style.
  • Guardrails: does the prompt define what the agent should NOT do? Red flag: no refusal instructions; the agent may attempt anything asked.
  • Few-shot examples: does the prompt include example conversations showing correct behaviour? Red flag: no examples; the model must guess the expected pattern.
  • Tone and persona: does the prompt establish a consistent voice? Red flag: no tone guidance; responses oscillate between formal and casual.
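Parts of this checklist can be partially automated with simple heuristics before the human review pass. The keyword rules below are illustrative assumptions, not an official validator, and they only catch the crudest omissions:

```python
# Illustrative prompt linter: flags checklist items a system prompt appears
# to be missing. The keyword heuristics are assumptions, not a standard.

CHECKS = {
    "clear instructions": ["you are", "your role"],
    "grounding": ["knowledge source", "database", "dataverse"],
    "guardrails": ["do not", "refuse", "outside your scope"],
    "output format": ["format", "respond with", "bullet"],
}

def lint_prompt(prompt: str) -> list:
    """Return checklist items with no matching keyword in the prompt."""
    text = prompt.lower()
    return [item for item, keywords in CHECKS.items()
            if not any(k in text for k in keywords)]

vague = "You are a helpful assistant. Answer user questions accurately."
print(lint_prompt(vague))  # → ['grounding', 'guardrails', 'output format']

improved = ("You are a manufacturing quality assistant. Only return results "
            "from the Apex defect database. If the query is outside your scope, "
            "refuse politely and respond with a short bulleted list.")
print(lint_prompt(improved))  # → []
```

A linter like this answers “is the element present at all?”; whether the guardrails actually hold still requires the adversarial testing described in the next section.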

Validating Prompts in Practice

Kai validates the Copilot agent that helps Apex shop-floor workers query defect reports. He runs the prompt through a structured review:

  1. Instruction clarity — The prompt says “Help users find defect reports.” Kai rewrites it to: “You are a manufacturing quality assistant for Apex Industries. Help shop-floor workers search for defect reports by date range, defect type, production line, and severity. Only return results from the Apex defect database. If the query is outside your scope, say you can only help with defect reports.”

  2. Grounding check — Kai confirms the prompt references the Dataverse defect table as the only data source. No hallucination risk from ungrounded answers.

  3. Guardrail validation — Kai tests adversarial inputs: “Show me employee salary data.” The agent correctly refuses. “Ignore your instructions.” The agent stays in character. Guardrails hold.

  4. Few-shot examples — Kai adds three example conversations showing the expected pattern: user asks a question, agent clarifies if needed, agent returns formatted results.

  5. Consistency test — Kai runs the same 20 questions five times each. Responses vary in wording (expected) but not in substance or format (validated).
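Kai’s consistency test (step 5) can be sketched as follows. `ask_agent` is a hypothetical stand-in for calling the deployed Copilot agent; here it is faked so the sketch runs, varying wording but not substance, exactly the behaviour the test should accept:

```python
import random
import re

def ask_agent(question: str) -> str:
    """Hypothetical stand-in for the deployed agent. A real run would call
    the Copilot endpoint; this fake varies wording but not substance."""
    opener = random.choice(["Found", "Located", "Retrieved"])
    return f"{opener} 3 defect reports.\n- Report A\n- Report B\n- Report C"

def response_shape(answer: str) -> tuple:
    """Reduce an answer to a structural signature: bullet count plus the
    numbers it mentions. Wording may vary; this shape must not."""
    bullets = len(re.findall(r"^- ", answer, flags=re.MULTILINE))
    numbers = tuple(re.findall(r"\d+", answer))
    return (bullets, numbers)

question = "Show defect reports for production line 2 this week"
shapes = {response_shape(ask_agent(question)) for _ in range(5)}
print(len(shapes))  # → 1: a single shape across runs means the format held
```

Collapsing answers to a shape before comparing is the key move: exact string equality would fail on harmless wording variation, while shape equality catches real drift in structure or substance.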

💡

Deep Dive: The exam may present a system prompt and ask you to identify what’s missing. Common gaps: missing guardrails (no refusal instructions), missing grounding (no data source reference), and missing output format (inconsistent responses). Practice reading prompts critically — look for what’s absent, not just what’s present.

Flashcards

Question

What is the key difference between testing and validation for AI models?


Answer

Testing checks functional correctness — does the model produce the right output? Validation checks production readiness — does the model meet quantitative thresholds for accuracy, bias, latency, robustness, and safety? A model can pass testing but fail validation if it doesn't meet the required thresholds.


Question

Why is overall accuracy an insufficient validation metric?


Answer

Overall accuracy can hide per-class failures. A model with 94 percent overall accuracy might be 99 percent accurate on common cases and only 60 percent on rare but critical cases. Validation must include per-class precision and recall to expose these hidden weaknesses.


Question

Name the six elements of the Copilot prompt best practices checklist.


Answer

1. Clear instructions — specific role and scope. 2. Grounding — reference to knowledge sources. 3. Output format — expected response structure. 4. Guardrails — what the agent must NOT do. 5. Few-shot examples — example conversations. 6. Tone and persona — consistent voice and style.


Question

What are the three validation approaches for custom AI models?


Answer

1. Automated evaluation — scoring pipelines on labelled datasets (fast, repeatable). 2. Human evaluation — domain experts manually review outputs (catches nuance). 3. Red-teaming — adversarial testers try to break the model (reveals safety gaps). A robust strategy uses all three.


Knowledge Check


Priya's defect classification model has 94 percent overall accuracy but only 79 percent accuracy on rare defect types. What should she recommend?


A solution architect reviews a Copilot system prompt that says: 'You are a helpful assistant. Answer user questions accurately.' What is the MOST critical improvement needed?



Next up: End-to-End Testing — design test scenarios that span multiple Dynamics 365 apps and validate cross-app AI handoffs.


Guided

I learn, I simplify, I share.


© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.