Grounded Answers: Azure AI Search with Foundry
Configure a production-grade RAG pipeline using Azure AI Search and Foundry to ground your agent's generative answers in trusted enterprise documents.
What is grounding, and why does it matter?
Imagine an AI agent answering medical questions from memory alone.
Sometimes it gets things right. Sometimes it confidently makes things up, and in healthcare that is dangerous. Grounding is like giving the agent a textbook and saying "only answer from this book." Before the agent responds, it searches your trusted documents, finds the relevant passages, and uses those to craft its answer. If the textbook does not cover the topic, the agent says "I do not know" instead of guessing.
Azure AI Search is the search engine that indexes your documents. Foundry is the AI brain that reads the search results and generates an answer. Together, they create a RAG (Retrieval-Augmented Generation) pipeline, the gold standard for trustworthy AI answers.
The RAG pipeline explained
Understanding the four stages is critical for the exam. Each stage has configuration decisions that affect answer quality.
| Stage | What happens | Key decisions |
|---|---|---|
| 1. Index | Documents are chunked into segments, converted to vector embeddings, and stored in Azure AI Search | Chunk size (smaller = precise, larger = more context), embedding model choice, which documents to include |
| 2. Retrieve | User's question is converted to an embedding and matched against the index using hybrid search (vector + keyword) | Number of results to return (top-k), search mode (vector, keyword, or hybrid), filters (metadata, date, category) |
| 3. Augment | Retrieved chunks are injected into the LLM prompt as context: "Answer the user's question using ONLY the following sources…" | Prompt template design, how many chunks to include, whether to include metadata |
| 4. Generate | The Foundry model reads the augmented prompt and generates a grounded answer with citations | Model choice (GPT-4o for accuracy, GPT-4o mini for speed/cost), temperature setting, citation format |
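The four stages above can be sketched end to end in plain Python. This is a toy illustration only: in a real pipeline, Azure AI Search handles stages 1 and 2 (with vector embeddings and hybrid ranking rather than the naive keyword overlap used here), and a Foundry model handles stage 4. All document names and text below are invented for the example.

```python
# Toy sketch of the four RAG stages. The keyword-overlap scoring is a
# stand-in for Azure AI Search's hybrid (vector + keyword) retrieval.

# Stage 1: Index -- chunk documents into small passages.
documents = {
    "diabetes-guideline.pdf": (
        "First-line treatment for Type 2 diabetes is metformin. "
        "Elderly patients may need dose adjustment."
    ),
}

def index_documents(docs, chunk_words=8):
    index = []
    for source, text in docs.items():
        words = text.split()
        for i in range(0, len(words), chunk_words):
            index.append({"source": source,
                          "text": " ".join(words[i:i + chunk_words])})
    return index

# Stage 2: Retrieve -- score chunks against the question and keep top-k.
def retrieve(index, question, top_k=2):
    q_terms = set(question.lower().split())
    scored = sorted(index,
                    key=lambda c: len(q_terms & set(c["text"].lower().split())),
                    reverse=True)
    return scored[:top_k]

# Stage 3: Augment -- inject retrieved chunks into the prompt as context.
def augment(question, chunks):
    sources = "\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return ("Answer using ONLY the sources below. "
            "If they do not cover the topic, say you do not know.\n"
            f"Sources:\n{sources}\n\nQuestion: {question}")

question = "What is first-line treatment for Type 2 diabetes?"
index = index_documents(documents)
chunks = retrieve(index, question)
prompt = augment(question, chunks)
# Stage 4: Generate -- in production, this prompt is sent to the Foundry
# model, which writes the grounded answer with citations.
```

Notice that every configuration decision from the table has a home in the sketch: chunk size in stage 1 (`chunk_words`), top-k in stage 2, and the prompt template in stage 3.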
Why RAG instead of fine-tuning?
Fine-tuning changes the model itself: you train it on your data. RAG keeps the model unchanged and feeds it relevant documents at query time. For enterprise knowledge that changes frequently (policies, procedures, product docs), RAG is almost always better because:
- No retraining needed when documents update; just re-index
- Citations are traceable: the agent can point to the exact document and passage
- Security is enforced at the index level: you control which documents each user can access
- Cost is lower: indexing is cheaper than fine-tuning runs
The exam assumes RAG for enterprise grounding scenarios. Fine-tuning appears in Foundry model catalog contexts (Module 22).
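The "re-index instead of retrain" point can be made concrete with a minimal sketch. The in-memory dictionary below stands in for the search index; the key observation is that when a policy changes, only the index is touched and the generation model never is. The document name and policy text are invented for the example.

```python
# Minimal sketch: updating enterprise knowledge in RAG is a re-index
# operation, not a model retraining run.

index = {
    "leave-policy.md": "Employees receive 20 days of annual leave.",
}

def reindex(index, source, new_text):
    """Replace a document's indexed content (chunking/embedding omitted)."""
    index[source] = new_text

def retrieve(index, question):
    """Return the document text sharing the most words with the question."""
    q = set(question.lower().split())
    return max(index.values(),
               key=lambda text: len(q & set(text.lower().split())))

# The policy changes: one re-index call, and the next retrieval
# already reflects the update. The model is untouched.
reindex(index, "leave-policy.md",
        "Employees receive 25 days of annual leave.")
answer_context = retrieve(index, "How many days of annual leave?")
```

With fine-tuning, the equivalent update would require preparing training data and running a new training job for every policy change.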
How this differs from basic generative answers
You already learned about generative answers in the enterprise knowledge module: SharePoint, Dataverse, website knowledge sources. The Foundry + Azure AI Search pattern is the production-grade version.
| Feature | Control level | Knowledge sources | Best for |
|---|---|---|---|
| Basic generative answers | Low: Copilot Studio manages everything (chunking, search, generation) behind the scenes | SharePoint, Dataverse, public websites, uploaded files; configured directly in Copilot Studio | Quick setup, internal docs, FAQ-style answers where default behavior is good enough |
| Azure AI Search + Foundry | Full: you control the index schema, chunking, embedding model, search strategy, generation model, and prompt template | Any documents indexed in Azure AI Search: PDFs, databases, APIs, custom data pipelines | Production workloads needing precision, custom ranking, citations, compliance, or domain-specific models |
Configuration steps
Setting up the Foundry + Azure AI Search pipeline involves both Azure and Copilot Studio:
Azure side (your AI engineer sets this up):
- Create an Azure AI Search resource: choose a tier based on data volume and query load
- Create a search index: define the schema (fields, types, searchable/filterable attributes)
- Index your documents: use an indexer to chunk documents, generate embeddings, and populate the index
- Test search quality: run sample queries to verify relevant results are returned
- Create a Foundry project: connect it to the Azure AI Search index
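To make the "define the schema" step concrete, here is an illustrative index definition in the general shape of an Azure AI Search REST index (a JSON document with a `name` and a list of `fields`). The field names (`content`, `sourcefile`, `publication_date`) are assumptions for this example, and the attribute set shown is partial; consult the Azure AI Search documentation for the full schema, including vector field configuration.

```python
import json

# Illustrative index schema. Field names and the choice of attributes
# (searchable, filterable, sortable) are example decisions, not a
# prescribed layout.
index_definition = {
    "name": "clinical-docs",
    "fields": [
        # Every index needs exactly one key field.
        {"name": "id", "type": "Edm.String", "key": True},
        # The chunk text that full-text search runs over.
        {"name": "content", "type": "Edm.String", "searchable": True},
        # Metadata used for filters and ranking decisions.
        {"name": "sourcefile", "type": "Edm.String", "filterable": True},
        {"name": "publication_date", "type": "Edm.DateTimeOffset",
         "filterable": True, "sortable": True},
    ],
}

payload = json.dumps(index_definition, indent=2)
```

The attribute choices map directly to the retrieval decisions in the pipeline table: `searchable` fields feed the search itself, while `filterable` fields enable metadata filters such as date or category.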
Copilot Studio side (the agent developer connects it):
- In the agent, navigate to Knowledge > Add knowledge > Azure AI Search with Foundry
- Provide the Foundry project details: endpoint, API key, and index name
- Configure the generative answers node: set it up in the topic where grounded answers are needed
- Set citation behavior: choose whether to show source document names, URLs, or passage excerpts
- Test with real questions: verify answers are grounded in the indexed documents, not hallucinated
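The citation-behavior decision above can be illustrated with a small sketch: the same retrieved chunk can be rendered as a document name, a URL, or a passage excerpt. The chunk fields and URL below are hypothetical examples, not a Copilot Studio schema.

```python
# Sketch of the three citation styles. The chunk structure is an
# invented example for illustration.
chunk = {
    "title": "Type 2 Diabetes Treatment Guideline",
    "url": "https://example.org/guidelines/t2d",
    "text": "Metformin remains the recommended first-line therapy.",
}

def cite(chunk, style="name"):
    if style == "name":
        return f"Source: {chunk['title']}"
    if style == "url":
        return f"Source: {chunk['url']}"
    if style == "excerpt":
        return f'Source: "{chunk["text"][:60]}..." ({chunk["title"]})'
    raise ValueError(f"unknown citation style: {style}")

name_citation = cite(chunk, "name")
url_citation = cite(chunk, "url")
excerpt_citation = cite(chunk, "excerpt")
```

Which style to pick depends on the audience: excerpts let readers verify claims without leaving the chat, while URLs suit users who will open the source document.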
Key concepts to remember for the exam
The exam tests configuration, not Azure infrastructure setup. Focus on:
- RAG = Retrieval-Augmented Generation: the four-stage pipeline (index, retrieve, augment, generate)
- Grounding = constraining answers to retrieved sources: reduces hallucination, enables citations
- Azure AI Search + Foundry gives full control; basic generative answers use defaults you cannot customize
- Citations are a feature of grounding: the agent can reference the exact document that informed its answer
- This is different from Module 13: Module 13 covers basic knowledge source connections; this module covers the production-grade Foundry pipeline
Scenario: Lena configures clinical answers grounded in research papers
Lena's healthcare firm has a library of 12,000 peer-reviewed medical research papers and 3,000 internal clinical guidelines. Doctors need an agent that answers questions like "What are the current treatment protocols for Type 2 diabetes in patients over 65?", and every answer must cite the specific paper or guideline it draws from.
Basic generative answers will not work here: Lena needs control over chunking (medical papers have complex structures), search ranking (recent papers should rank higher), and citations (doctors must verify every recommendation).
She builds the pipeline: Azure AI Search indexes all 15,000 documents with metadata (publication date, specialty, evidence level). The chunking strategy splits papers by section (abstract, methods, results, discussion) so retrieval is precise. She selects GPT-4o in Foundry for generation; it handles medical terminology better than smaller models.
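Lena's section-based chunking strategy can be sketched as follows. This assumes each paper's sections begin with a recognizable heading line; real medical PDFs would need a proper document parser, and the heading list, metadata fields, and sample text are all invented for the example.

```python
# Sketch of section-based chunking: one chunk per paper section, each
# carrying the paper's metadata so it can be filtered and ranked later.
SECTION_HEADINGS = ("Abstract", "Methods", "Results", "Discussion")

def chunk_by_section(paper_text, metadata):
    """Split a paper into one chunk per section, attaching metadata."""
    chunks, current_heading, current_lines = [], None, []
    for line in paper_text.splitlines():
        if line.strip() in SECTION_HEADINGS:
            if current_heading:  # flush the previous section
                chunks.append({"section": current_heading,
                               "text": " ".join(current_lines),
                               **metadata})
            current_heading, current_lines = line.strip(), []
        elif current_heading:
            current_lines.append(line.strip())
    if current_heading:  # flush the final section
        chunks.append({"section": current_heading,
                       "text": " ".join(current_lines),
                       **metadata})
    return chunks

paper = "Abstract\nWe study metformin.\nResults\nHbA1c improved."
paper_chunks = chunk_by_section(paper, {"publication_date": "2024-03-01",
                                        "specialty": "endocrinology"})
```

Because each chunk carries the publication date, the search layer can boost recent papers, which is exactly the ranking control Lena needs.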
The Copilot Studio developer connects this pipeline as a knowledge source in the hospital agent. When a doctor asks about diabetes treatment, the agent searches the index, retrieves the five most relevant paper sections, and generates an answer with inline citations: "According to Smith et al. (2024) in the Journal of Endocrinology, current first-line treatment for Type 2 diabetes in elderly patients is…"
Every answer is traceable. Every citation is verifiable. This is grounding in action.
Lena needs her medical agent to cite specific research papers when answering clinical questions. Which approach should she use?
In a RAG pipeline, what happens during the 'Augment' stage?
Why is RAG preferred over fine-tuning for enterprise knowledge grounding?
🎬 Video coming soon