
AI-103 Study Guide

Domain 1: Plan and Manage an Azure AI Solution

  • Choosing the Right AI Model Free
  • Foundry Services: Your AI Toolkit Free
  • Retrieval, Indexing & Agent Memory
  • Designing AI Infrastructure
  • Deploying Models & CI/CD
  • Quotas, Scaling & Cost
  • Monitoring & Security
  • Responsible AI: Filters, Auditing & Governance

Domain 2: Implement Generative AI and Agentic Solutions

  • Connecting Your App to Foundry Free
  • Building RAG Applications
  • Workflows & Reasoning Pipelines
  • Evaluating AI Models & Apps
  • Agent Fundamentals: Roles, Goals & Tools Free
  • Building Agents with Retrieval & Memory
  • Agent Tools & Knowledge Integration
  • Multi-Agent Orchestration & Safeguards
  • Agent Monitoring & Error Analysis
  • Prompt Engineering & Model Tuning
  • Observability & Production Operations

Domain 3: Implement Computer Vision Solutions

  • Image & Video Generation
  • Multimodal Visual Understanding
  • Responsible AI for Visual Content

Domain 4: Implement Text Analysis Solutions

  • Text Analysis with Language Models
  • Speech, Translation & Voice Agents

Domain 5: Implement Information Extraction Solutions

  • Ingestion, Indexing & Grounding Pipelines
  • Extracting Content with Content Understanding
  • Exam Prep: Putting It All Together

Domain 2: Implement Generative AI and Agentic Solutions Premium ⏱ ~14 min read

Building RAG Applications

Retrieval-Augmented Generation is the most important pattern in enterprise AI. Learn how to build RAG apps that ground model responses in your actual data — with citations, relevance, and accuracy.

What is RAG?

☕ Simple explanation

RAG is like an open-book exam for AI — instead of answering from memory (which might be wrong), the AI first looks up the answer in your company’s documents, then writes a response based on what it found.

Without RAG, an AI model can only use what it learned during training — which might be outdated or wrong for your specific domain. With RAG, the model searches your actual data before answering, so responses are grounded in facts you control.

Retrieval-Augmented Generation (RAG) is an architecture pattern that combines information retrieval with language model generation. The flow is:

  1. Query — user asks a question
  2. Retrieve — search an index for relevant documents
  3. Augment — inject retrieved documents into the model’s prompt as context
  4. Generate — model produces a response grounded in the retrieved data

RAG is the dominant pattern for enterprise AI because it keeps the model’s knowledge current, reduces hallucinations, and allows organisations to control which data the model has access to.
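
The four steps above can be sketched end to end in a few lines. This is a toy, self-contained illustration: the word-overlap retriever stands in for a real search index, and the final prompt is what would be sent to the model in the generate step (all names and documents here are invented for illustration).

```python
import re

# A toy corpus standing in for an indexed document store.
DOCS = [
    {"title": "Refund Policy", "text": "Damaged goods may be returned within 30 days for a full refund."},
    {"title": "Shipping Guide", "text": "Standard shipping takes 3 to 5 business days."},
]

def tokens(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def retrieve(query: str, k: int = 1) -> list[dict]:
    # Step 2: score documents by word overlap (a stand-in for real search).
    q = tokens(query)
    ranked = sorted(DOCS, key=lambda d: len(q & tokens(d["text"])), reverse=True)
    return ranked[:k]

def augment(query: str, docs: list[dict]) -> str:
    # Step 3: inject the retrieved documents into the prompt as context.
    context = "\n".join(f"[{d['title']}] {d['text']}" for d in docs)
    return (
        "Answer ONLY from the provided context. Cite the source title.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

query = "What is the refund policy for damaged goods?"
prompt = augment(query, retrieve(query))
# Step 4 would send 'prompt' to the model to generate a grounded answer.
```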

The RAG flow

| Step | What Happens | Service Used |
| --- | --- | --- |
| 1. User query | User asks “What’s our refund policy for damaged goods?” | Your application |
| 2. Search | Query is sent to the search index | Azure AI Search |
| 3. Retrieve | Top relevant documents are returned | Azure AI Search |
| 4. Augment prompt | Retrieved docs are injected into the system prompt | Your application |
| 5. Generate | LLM generates a grounded response with citations | GPT-4o (Foundry) |
| 6. Return | Response with answer + source references | Your application |
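
Mapped to code, the six rows above might look like the sketch below. The clients are passed in so the shape is clear without cloud credentials; in a real app they would be an Azure AI Search client and a chat-model client, and the method names used here (`search`, `complete`) are illustrative, not actual SDK signatures.

```python
def rag_answer(search_client, chat_client, query: str, k: int = 5) -> dict:
    # Steps 2-3: query the index and keep the top-k hits.
    hits = list(search_client.search(query, top=k))
    context = "\n".join(f"[{h['title']}] {h['content']}" for h in hits)

    # Step 4: inject the retrieved documents into the system prompt.
    messages = [
        {"role": "system",
         "content": "Answer ONLY from the provided context. Cite sources.\n\n"
                    f"Context:\n{context}"},
        {"role": "user", "content": query},
    ]

    # Step 5: ask the model for a grounded response.
    answer = chat_client.complete(messages)

    # Step 6: return the answer together with its source references.
    return {"answer": answer, "sources": [h["title"] for h in hits]}

# Tiny in-memory stand-ins so the sketch runs without any cloud resources:
class ToySearch:
    def search(self, query, top):
        return [{"title": "Refund Policy", "content": "Full refund within 30 days for damaged goods."}][:top]

class ToyChat:
    def complete(self, messages):
        return "Damaged goods qualify for a full refund within 30 days [Refund Policy]."

result = rag_answer(ToySearch(), ToyChat(), "What is the refund policy for damaged goods?")
```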

Building a RAG app — key decisions

| Decision | Options | Recommendation |
| --- | --- | --- |
| Search type | Keyword, semantic, vector, hybrid | Hybrid (best recall + precision) |
| Context window | How many retrieved chunks to include | 3-5 chunks (balance relevance vs token cost) |
| System prompt | Instructions for grounding behaviour | “Answer ONLY from provided context. Cite sources.” |
| Citation format | How to reference sources | Inline references with document title and section |
| Fallback | What to do when no relevant docs are found | “I don’t have information about that” (never hallucinate) |

💡 Exam tip: The grounding instruction in system prompts

The exam tests whether you know how to instruct the model to stay grounded. A common pattern:

“Answer the user’s question using ONLY the information in the provided context. If the context doesn’t contain the answer, say ‘I don’t have information about that.’ Always cite the source document.”

Without this instruction, the model may use its training data instead of your retrieved documents — defeating the purpose of RAG.
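
One way to wire that instruction in, including the fallback for when retrieval returns nothing, is a small prompt builder like this (a hypothetical helper; the instruction text is the one quoted above):

```python
GROUNDING_INSTRUCTION = (
    "Answer the user's question using ONLY the information in the provided context. "
    "If the context doesn't contain the answer, say 'I don't have information about that.' "
    "Always cite the source document."
)

def build_system_prompt(chunks: list[str]) -> str:
    if not chunks:
        # Fallback path: with no retrieved context, force the safe answer
        # rather than letting the model fall back on its training data.
        return "Reply exactly: I don't have information about that."
    return GROUNDING_INSTRUCTION + "\n\nContext:\n" + "\n---\n".join(chunks)
```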

RAG quality factors

| Factor | What It Affects | How to Improve |
| --- | --- | --- |
| Chunking strategy | Whether the right information is in a retrievable unit | Align chunks with natural document boundaries |
| Embedding quality | Whether similar content maps to similar vectors | Use latest embedding models, consistent pipeline |
| Search configuration | Whether the most relevant chunks are returned | Tune hybrid search weights, add semantic ranker |
| Prompt engineering | Whether the model uses context correctly | Strong grounding instructions, few-shot examples |
| Context window size | Balance between relevance and noise | Include top 3-5 chunks, not 20 |

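
The first factor, aligning chunks with natural boundaries, can be as simple as splitting on blank lines and packing paragraphs up to a size budget while carrying the source title as metadata. A hypothetical sketch:

```python
import re

def chunk_by_paragraph(doc_title: str, text: str, max_chars: int = 1000) -> list[dict]:
    """Split on blank lines so chunks follow natural paragraph boundaries,
    packing adjacent paragraphs up to max_chars and keeping the source
    title as metadata so answers can cite it later."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[dict] = []
    current = ""
    for p in paragraphs:
        if current and len(current) + len(p) > max_chars:
            chunks.append({"title": doc_title, "content": current})
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append({"title": doc_title, "content": current})
    return chunks

chunks = chunk_by_paragraph("Refund Policy", "Para one.\n\nPara two.\n\nPara three.", max_chars=12)
```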
ℹ️ Real-world example: NeuralMed's RAG patient chatbot

NeuralMed builds a patient information chatbot grounded in 10,000 medical articles:

  • Index: Azure AI Search with hybrid search (keyword for drug names + vector for symptoms)
  • Chunking: Paragraph-level, preserving article title and section as metadata
  • Context: Top 5 retrieved chunks injected into the prompt
  • Grounding prompt: “Answer using ONLY the provided medical articles. Cite the article title. If unsure, direct the patient to consult their doctor.”
  • Fallback: “I don’t have specific information about that. Please consult your healthcare provider.”
  • Evaluation: Groundedness score monitored in CI/CD — must stay above 0.85
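
A gate like NeuralMed’s could be sketched as below. Note this is a toy stand-in: real groundedness scoring (e.g. the evaluators in Azure AI Foundry) uses an evaluator model rather than string matching, and the function names here are invented for illustration.

```python
def cites_retrieved_source(answer: str, retrieved_titles: list[str]) -> bool:
    """Cheap sanity check: a grounded answer should name at least one
    of the article titles that were actually retrieved."""
    return any(title.lower() in answer.lower() for title in retrieved_titles)

def groundedness_gate(scores: list[float], threshold: float = 0.85) -> bool:
    """CI/CD gate from the example: mean groundedness must stay above 0.85."""
    return sum(scores) / len(scores) > threshold
```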

Common RAG pitfalls

| Pitfall | Symptom | Fix |
| --- | --- | --- |
| Over-chunking | Model gets 20 small fragments, none with enough context | Use larger chunks or include surrounding context |
| Under-chunking | Each chunk is an entire document — too much noise | Split into paragraphs or sections |
| No grounding instruction | Model uses training data instead of retrieved docs | Add explicit grounding instruction to system prompt |
| Stale index | Responses contain outdated information | Monitor indexer health, schedule regular refreshes |
| Wrong search type | Natural-language questions miss exact-term matches | Use hybrid search combining vector + keyword |
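
On the last row: a hybrid query runs keyword and vector search in parallel and fuses the two ranked lists. Azure AI Search performs this fusion with Reciprocal Rank Fusion (RRF), and the core idea fits in a few lines (document IDs below are invented):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs: each document scores
    sum(1 / (k + rank)) across every list it appears in, so documents
    ranked well by BOTH keyword and vector search rise to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc-ibuprofen", "doc-dosage", "doc-history"]   # exact-term matches
vector_hits = ["doc-dosage", "doc-symptoms", "doc-ibuprofen"]   # semantic matches
fused = rrf([keyword_hits, vector_hits])
```

Documents that appear high in both lists (like `doc-dosage` here) outrank documents found by only one search mode.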

Key terms

Question

What is RAG (Retrieval-Augmented Generation)?

Answer

An architecture pattern where a user's query first retrieves relevant documents from a search index, then those documents are injected into the LLM's prompt as context, producing a grounded response based on actual data.

Question

What is grounding in the context of RAG?

Answer

Anchoring the model's response in retrieved source data rather than letting it generate from training data alone. Grounded responses are factually based on documents you control, reducing hallucinations.

Question

What is the context window in RAG?

Answer

The number of retrieved document chunks included in the model's prompt. Too few = missing information. Too many = noise and higher token cost. Typical: 3-5 chunks for most applications.

Question

What is the grounding instruction?

Answer

A directive in the system prompt that tells the model to answer ONLY from provided context and to say 'I don't know' if the context doesn't contain the answer. Critical for preventing hallucinations in RAG.

Knowledge check

Atlas Financial's compliance chatbot occasionally cites regulations that don't exist — fabricated references that look plausible. What is the most likely cause?

NeuralMed's RAG chatbot returns accurate information for common conditions but fails to answer questions about rare diseases. The articles exist in the search index. What should they investigate?

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.