Grounded Answers: Azure AI Search with Foundry
Configure a production-grade RAG pipeline using Azure AI Search and Foundry to ground your agent's generative answers in trusted enterprise documents.
What is grounding, and why does it matter?
Imagine an AI agent answering medical questions from memory alone.
Sometimes it gets things right. Sometimes it confidently makes things up, and in healthcare that is dangerous. Grounding is like giving the agent a textbook and saying "only answer from this book." Before the agent responds, it searches your trusted documents, finds the relevant passages, and uses those to craft its answer. If the textbook does not cover the topic, the agent says "I do not know" instead of guessing.
Azure AI Search is the search engine that indexes your documents. Foundry is the AI brain that reads the search results and generates an answer. Together, they create a RAG (Retrieval-Augmented Generation) pipeline, the gold standard for trustworthy AI answers.
The RAG pipeline explained
Understanding the four stages is critical for the exam. Each stage has configuration decisions that affect answer quality.
| Stage | What happens | Key decisions |
|---|---|---|
| 1. Index | Documents are chunked into segments, converted to vector embeddings, and stored in Azure AI Search | Chunk size (smaller = precise, larger = more context), embedding model choice, which documents to include |
| 2. Retrieve | User's question is converted to an embedding and matched against the index using hybrid search (vector + keyword) | Number of results to return (top-k), search mode (vector, keyword, or hybrid), filters (metadata, date, category) |
| 3. Augment | Retrieved chunks are injected into the LLM prompt as context: "Answer the user's question using ONLY the following sources…" | Prompt template design, how many chunks to include, whether to include metadata |
| 4. Generate | The Foundry model reads the augmented prompt and generates a grounded answer with citations | Model choice (GPT-4o for accuracy, GPT-4o mini for speed/cost), temperature setting, citation format |
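The four stages above can be sketched end to end in plain Python. This is a toy illustration only: in a real pipeline, Azure AI Search handles stages 1 and 2 (with vector embeddings and hybrid ranking rather than the naive keyword overlap used here), and a Foundry model handles stage 4. All document names and text below are invented for the example.

```python
# Toy sketch of the four RAG stages. The keyword-overlap scoring is a
# stand-in for Azure AI Search's hybrid (vector + keyword) retrieval.

# Stage 1: Index -- chunk documents into small passages.
documents = {
    "diabetes-guideline.pdf": (
        "First-line treatment for Type 2 diabetes is metformin. "
        "Elderly patients may need dose adjustment."
    ),
}

def index_documents(docs, chunk_words=8):
    index = []
    for source, text in docs.items():
        words = text.split()
        for i in range(0, len(words), chunk_words):
            index.append({"source": source,
                          "text": " ".join(words[i:i + chunk_words])})
    return index

# Stage 2: Retrieve -- score chunks against the question and keep top-k.
def retrieve(index, question, top_k=2):
    q_terms = set(question.lower().split())
    scored = sorted(index,
                    key=lambda c: len(q_terms & set(c["text"].lower().split())),
                    reverse=True)
    return scored[:top_k]

# Stage 3: Augment -- inject retrieved chunks into the prompt as context.
def augment(question, chunks):
    sources = "\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return ("Answer using ONLY the sources below. "
            "If they do not cover the topic, say you do not know.\n"
            f"Sources:\n{sources}\n\nQuestion: {question}")

question = "What is first-line treatment for Type 2 diabetes?"
index = index_documents(documents)
chunks = retrieve(index, question)
prompt = augment(question, chunks)
# Stage 4: Generate -- in production, this prompt is sent to the Foundry
# model, which writes the grounded answer with citations.
```

Notice that every configuration decision from the table has a home in the sketch: chunk size in stage 1 (`chunk_words`), top-k in stage 2, and the prompt template in stage 3.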
Why RAG instead of fine-tuning?
Fine-tuning changes the model itself: you train it on your data. RAG keeps the model unchanged and feeds it relevant documents at query time. For enterprise knowledge that changes frequently (policies, procedures, product docs), RAG is almost always better because:
- No retraining needed when documents update; just re-index
- Citations are traceable: the agent can point to the exact document and passage
- Security is enforced at the index level: you control which documents each user can access
- Cost is lower: indexing is cheaper than fine-tuning runs
The exam assumes RAG for enterprise grounding scenarios. Fine-tuning appears in Foundry model catalog contexts (Module 22).
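The "re-index instead of retrain" point can be made concrete with a minimal sketch. The in-memory dictionary below stands in for the search index; the key observation is that when a policy changes, only the index is touched and the generation model never is. The document name and policy text are invented for the example.

```python
# Minimal sketch: updating enterprise knowledge in RAG is a re-index
# operation, not a model retraining run.

index = {
    "leave-policy.md": "Employees receive 20 days of annual leave.",
}

def reindex(index, source, new_text):
    """Replace a document's indexed content (chunking/embedding omitted)."""
    index[source] = new_text

def retrieve(index, question):
    """Return the document text sharing the most words with the question."""
    q = set(question.lower().split())
    return max(index.values(),
               key=lambda text: len(q & set(text.lower().split())))

# The policy changes: one re-index call, and the next retrieval
# already reflects the update. The model is untouched.
reindex(index, "leave-policy.md",
        "Employees receive 25 days of annual leave.")
answer_context = retrieve(index, "How many days of annual leave?")
```

With fine-tuning, the equivalent update would require preparing training data and running a new training job for every policy change.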
How this differs from basic generative answers
You already learned about generative answers in the enterprise knowledge module: SharePoint, Dataverse, website knowledge sources. The Foundry + Azure AI Search pattern is the production-grade version.
| Feature | Control level | Knowledge sources | Best for |
|---|---|---|---|
| Basic generative answers | Low: Copilot Studio manages everything (chunking, search, generation) behind the scenes | SharePoint, Dataverse, public websites, uploaded files; configured directly in Copilot Studio | Quick setup, internal docs, FAQ-style answers where default behavior is good enough |
| Azure AI Search + Foundry | Full: you control the index schema, chunking, embedding model, search strategy, generation model, and prompt template | Any documents indexed in Azure AI Search: PDFs, databases, APIs, custom data pipelines | Production workloads needing precision, custom ranking, citations, compliance, or domain-specific models |
Configuration steps
Setting up the Foundry + Azure AI Search pipeline involves both Azure and Copilot Studio:
Azure side (your AI engineer sets this up):
- Create an Azure AI Search resource: choose a tier based on data volume and query load
- Create a search index: define the schema (fields, types, searchable/filterable attributes)
- Index your documents: use an indexer to chunk documents, generate embeddings, and populate the index
- Test search quality: run sample queries to verify relevant results are returned
- Create a Foundry project: connect it to the Azure AI Search index
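To make the "define the schema" step concrete, here is an illustrative index definition in the general shape of an Azure AI Search REST index (a JSON document with a `name` and a list of `fields`). The field names (`content`, `sourcefile`, `publication_date`) are assumptions for this example, and the attribute set shown is partial; consult the Azure AI Search documentation for the full schema, including vector field configuration.

```python
import json

# Illustrative index schema. Field names and the choice of attributes
# (searchable, filterable, sortable) are example decisions, not a
# prescribed layout.
index_definition = {
    "name": "clinical-docs",
    "fields": [
        # Every index needs exactly one key field.
        {"name": "id", "type": "Edm.String", "key": True},
        # The chunk text that full-text search runs over.
        {"name": "content", "type": "Edm.String", "searchable": True},
        # Metadata used for filters and ranking decisions.
        {"name": "sourcefile", "type": "Edm.String", "filterable": True},
        {"name": "publication_date", "type": "Edm.DateTimeOffset",
         "filterable": True, "sortable": True},
    ],
}

payload = json.dumps(index_definition, indent=2)
```

The attribute choices map directly to the retrieval decisions in the pipeline table: `searchable` fields feed the search itself, while `filterable` fields enable metadata filters such as date or category.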
Copilot Studio side (the agent developer connects it):
- In the agent, navigate to Knowledge > Add knowledge > Azure AI Search with Foundry
- Provide the Foundry project details: endpoint, API key, and index name
- Configure the generative answers node: set it up in the topic where grounded answers are needed
- Set citation behavior: choose whether to show source document names, URLs, or passage excerpts
- Test with real questions: verify answers are grounded in the indexed documents, not hallucinated
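The citation-behavior decision above can be illustrated with a small sketch: the same retrieved chunk can be rendered as a document name, a URL, or a passage excerpt. The chunk fields and URL below are hypothetical examples, not a Copilot Studio schema.

```python
# Sketch of the three citation styles. The chunk structure is an
# invented example for illustration.
chunk = {
    "title": "Type 2 Diabetes Treatment Guideline",
    "url": "https://example.org/guidelines/t2d",
    "text": "Metformin remains the recommended first-line therapy.",
}

def cite(chunk, style="name"):
    if style == "name":
        return f"Source: {chunk['title']}"
    if style == "url":
        return f"Source: {chunk['url']}"
    if style == "excerpt":
        return f'Source: "{chunk["text"][:60]}..." ({chunk["title"]})'
    raise ValueError(f"unknown citation style: {style}")

name_citation = cite(chunk, "name")
url_citation = cite(chunk, "url")
excerpt_citation = cite(chunk, "excerpt")
```

Which style to pick depends on the audience: excerpts let readers verify claims without leaving the chat, while URLs suit users who will open the source document.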
Key concepts to remember for the exam
The exam tests configuration, not Azure infrastructure setup. Focus on:
- RAG = Retrieval-Augmented Generation: the four-stage pipeline (index, retrieve, augment, generate)
- Grounding = constraining answers to retrieved sources: reduces hallucination, enables citations
- Azure AI Search + Foundry gives full control; basic generative answers use defaults you cannot customize
- Citations are a feature of grounding: the agent can reference the exact document that informed its answer
- This is different from Module 13: Module 13 covers basic knowledge source connections; this module covers the production-grade Foundry pipeline
Scenario: Lena configures clinical answers grounded in research papers
Lena's healthcare firm has a library of 12,000 peer-reviewed medical research papers and 3,000 internal clinical guidelines. Doctors need an agent that answers questions like "What are the current treatment protocols for Type 2 diabetes in patients over 65?", and every answer must cite the specific paper or guideline it draws from.
Basic generative answers will not work here: Lena needs control over chunking (medical papers have complex structures), search ranking (recent papers should rank higher), and citations (doctors must verify every recommendation).
She builds the pipeline: Azure AI Search indexes all 15,000 documents with metadata (publication date, specialty, evidence level). The chunking strategy splits papers by section (abstract, methods, results, discussion) so retrieval is precise. She selects GPT-4o in Foundry for generation; it handles medical terminology better than smaller models.
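Lena's section-based chunking strategy can be sketched as follows. This assumes each paper's sections begin with a recognizable heading line; real medical PDFs would need a proper document parser, and the heading list, metadata fields, and sample text are all invented for the example.

```python
# Sketch of section-based chunking: one chunk per paper section, each
# carrying the paper's metadata so it can be filtered and ranked later.
SECTION_HEADINGS = ("Abstract", "Methods", "Results", "Discussion")

def chunk_by_section(paper_text, metadata):
    """Split a paper into one chunk per section, attaching metadata."""
    chunks, current_heading, current_lines = [], None, []
    for line in paper_text.splitlines():
        if line.strip() in SECTION_HEADINGS:
            if current_heading:  # flush the previous section
                chunks.append({"section": current_heading,
                               "text": " ".join(current_lines),
                               **metadata})
            current_heading, current_lines = line.strip(), []
        elif current_heading:
            current_lines.append(line.strip())
    if current_heading:  # flush the final section
        chunks.append({"section": current_heading,
                       "text": " ".join(current_lines),
                       **metadata})
    return chunks

paper = "Abstract\nWe study metformin.\nResults\nHbA1c improved."
paper_chunks = chunk_by_section(paper, {"publication_date": "2024-03-01",
                                        "specialty": "endocrinology"})
```

Because each chunk carries the publication date, the search layer can boost recent papers, which is exactly the ranking control Lena needs.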
The Copilot Studio developer connects this pipeline as a knowledge source in the hospital agent. When a doctor asks about diabetes treatment, the agent searches the index, retrieves the five most relevant paper sections, and generates an answer with inline citations: "According to Smith et al. (2024) in the Journal of Endocrinology, current first-line treatment for Type 2 diabetes in elderly patients is…"
Every answer is traceable. Every citation is verifiable. This is grounding in action.
Lena needs her medical agent to cite specific research papers when answering clinical questions. Which approach should she use?
In a RAG pipeline, what happens during the 'Augment' stage?
Why is RAG preferred over fine-tuning for enterprise knowledge grounding?
🎬 Video coming soon