Building RAG Applications
Retrieval-Augmented Generation (RAG) is one of the most widely used patterns in enterprise AI. Learn how to build RAG apps that ground model responses in your actual data, with citations, relevance, and accuracy.
What is RAG?
RAG is like an open-book exam for AI — instead of answering from memory (which might be wrong), the AI first looks up the answer in your company’s documents, then writes a response based on what it found.
Without RAG, an AI model can only use what it learned during training — which might be outdated or wrong for your specific domain. With RAG, the model searches your actual data before answering, so responses are grounded in facts you control.
The RAG flow
| Step | What Happens | Service Used |
|---|---|---|
| 1. User query | User asks “What’s our refund policy for damaged goods?” | Your application |
| 2. Search | Query is sent to the search index | Azure AI Search |
| 3. Retrieve | Top relevant documents are returned | Azure AI Search |
| 4. Augment prompt | Retrieved docs are injected into the system prompt | Your application |
| 5. Generate | LLM generates a grounded response with citations | GPT-4o (Foundry) |
| 6. Return | Response with answer + source references | Your application |
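The six steps above can be sketched end to end. This is a minimal illustration, not Azure SDK code: the document list, the word-overlap retriever, and `fake_llm` are all invented stand-ins (a real app would query Azure AI Search and call the chat completions API):

```python
# Minimal RAG flow sketch: retrieve -> augment -> generate (stubbed).

DOCS = [
    {"title": "Refund Policy",
     "text": "Damaged goods may be returned within 30 days for a full refund."},
    {"title": "Shipping Policy",
     "text": "Orders ship within 2 business days."},
]

def retrieve(query, docs, top_k=3):
    """Toy retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(d["text"].lower().split())), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:top_k] if score > 0]

def build_prompt(query, retrieved):
    """Step 4: inject retrieved chunks into a grounded system prompt."""
    context = "\n\n".join(f"[{d['title']}]\n{d['text']}" for d in retrieved)
    return (
        "Answer ONLY from the provided context. Cite sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

def fake_llm(prompt):
    """Stand-in for the model call (step 5); a real app calls GPT-4o here."""
    return ("Damaged goods may be returned within 30 days for a full refund. "
            "[Refund Policy]")

query = "What's our refund policy for damaged goods?"
retrieved = retrieve(query, DOCS)
answer = fake_llm(build_prompt(query, retrieved))
```

Note that steps 2–3 (search and retrieval) happen before the model is ever called: the model only sees what the retriever hands it.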
Building a RAG app — key decisions
| Decision | Options | Recommendation |
|---|---|---|
| Search type | Keyword, semantic, vector, hybrid | Hybrid (best recall + precision) |
| Context window | How many retrieved chunks to include | 3-5 chunks (balance relevance vs token cost) |
| System prompt | Instructions for grounding behaviour | “Answer ONLY from provided context. Cite sources.” |
| Citation format | How to reference sources | Inline references with document title and section |
| Fallback | What to do when no relevant docs found | “I don’t have information about that” (never hallucinate an answer) |
Exam tip: The grounding instruction in system prompts
The exam tests whether you know how to instruct the model to stay grounded. A common pattern:
“Answer the user’s question using ONLY the information in the provided context. If the context doesn’t contain the answer, say ‘I don’t have information about that.’ Always cite the source document.”
Without this instruction, the model may use its training data instead of your retrieved documents — defeating the purpose of RAG.
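The grounding instruction and the fallback belong together: if retrieval returns nothing, the app should short-circuit to the fallback rather than let the model improvise from training data. A minimal sketch, where `call_llm` is a placeholder for whatever chat-completion client the app uses:

```python
FALLBACK = "I don't have information about that."

GROUNDING_INSTRUCTION = (
    "Answer the user's question using ONLY the information in the provided "
    "context. If the context doesn't contain the answer, say "
    f"'{FALLBACK}' Always cite the source document."
)

def answer_with_grounding(query, retrieved_chunks, call_llm):
    """Return the fallback when retrieval finds nothing; otherwise build
    a grounded system message and delegate to the model."""
    if not retrieved_chunks:
        return FALLBACK
    context = "\n\n".join(retrieved_chunks)
    messages = [
        {"role": "system",
         "content": f"{GROUNDING_INSTRUCTION}\n\nContext:\n{context}"},
        {"role": "user", "content": query},
    ]
    return call_llm(messages)
```

Handling the empty-retrieval case in application code is cheaper and more reliable than hoping the model honours the instruction on an empty context.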
RAG quality factors
| Factor | What It Affects | How to Improve |
|---|---|---|
| Chunking strategy | Whether the right information is in a retrievable unit | Align chunks with natural document boundaries |
| Embedding quality | Whether similar content maps to similar vectors | Use latest embedding models, consistent pipeline |
| Search configuration | Whether the most relevant chunks are returned | Tune hybrid search weights, add semantic ranker |
| Prompt engineering | Whether the model uses context correctly | Strong grounding instructions, few-shot examples |
| Context window size | Balance between relevance and noise | Include top 3-5 chunks, not 20 |
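Chunking is the factor most often gotten wrong, so here is an illustrative paragraph-level chunker that keeps the document title as metadata (so citations can point back to the source) and merges tiny fragments into their predecessor to avoid the over-chunking pitfall. The `min_chars` threshold is an arbitrary value for the example:

```python
def chunk_by_paragraph(title, text, min_chars=40):
    """Split a document on blank lines into chunks carrying the source
    title and a section index as metadata."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    for i, para in enumerate(paragraphs):
        # Merge very short fragments into the previous chunk so no
        # chunk lacks enough context to be useful on its own.
        if chunks and len(para) < min_chars:
            chunks[-1]["text"] += "\n\n" + para
        else:
            chunks.append({"title": title, "section": i, "text": para})
    return chunks
```

Aligning splits with the document's own boundaries (paragraphs, sections) is what the table means by "natural document boundaries"; fixed character windows tend to cut sentences in half.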
Real-world example: NeuralMed's RAG patient chatbot
NeuralMed builds a patient information chatbot grounded in 10,000 medical articles:
- Index: Azure AI Search with hybrid search (keyword for drug names + vector for symptoms)
- Chunking: Paragraph-level, preserving article title and section as metadata
- Context: Top 5 retrieved chunks injected into the prompt
- Grounding prompt: “Answer using ONLY the provided medical articles. Cite the article title. If unsure, direct the patient to consult their doctor.”
- Fallback: “I don’t have specific information about that. Please consult your healthcare provider.”
- Evaluation: Groundedness score monitored in CI/CD — must stay above 0.85
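NeuralMed's CI/CD groundedness gate could be sketched as a simple threshold check. The scores themselves would come from an evaluation run (for example an LLM-as-judge groundedness metric over a test set); `gate_release` and its return shape are invented for the illustration:

```python
GROUNDEDNESS_THRESHOLD = 0.85  # the release gate from the example above

def gate_release(eval_scores, threshold=GROUNDEDNESS_THRESHOLD):
    """Fail the pipeline when mean groundedness over the evaluation
    set drops below the threshold; also report how many individual
    test cases fell below it."""
    mean = sum(eval_scores) / len(eval_scores)
    failing = [s for s in eval_scores if s < threshold]
    return {
        "mean": round(mean, 3),
        "passed": mean >= threshold,
        "below_threshold": len(failing),
    }
```

Gating on an aggregate score catches regressions from prompt or index changes before they reach patients, which is the point of running the evaluation in CI/CD rather than only in production monitoring.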
Common RAG pitfalls
| Pitfall | Symptom | Fix |
|---|---|---|
| Over-chunking | Model gets 20 small fragments, none with enough context | Use larger chunks or include surrounding context |
| Under-chunking | Each chunk is an entire document — too much noise | Split into paragraphs or sections |
| No grounding instruction | Model uses training data instead of retrieved docs | Add explicit grounding instruction to system prompt |
| Stale index | Responses contain outdated information | Monitor indexer health, schedule regular refreshes |
| Wrong search type | Natural-language questions miss exact-term matches | Use hybrid search combining vector + keyword |
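The "wrong search type" fix can be made concrete with a toy hybrid scorer that blends exact-term overlap (which catches drug names and IDs) with vector similarity (which catches paraphrased symptoms). Real engines such as Azure AI Search fuse the two rankings differently (Reciprocal Rank Fusion), so the weighted blend below is only a sketch of the idea; the documents and weights are invented:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(query_terms, query_vec, doc, keyword_weight=0.5):
    """Blend keyword overlap with vector similarity into one score."""
    keyword = len(set(query_terms) & set(doc["terms"])) / max(len(query_terms), 1)
    vector = cosine(query_vec, doc["vec"])
    return keyword_weight * keyword + (1 - keyword_weight) * vector
```

A document that matches both signals outranks one that matches only one, which is why the decisions table above recommends hybrid search for recall plus precision.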
Key terms
- RAG (Retrieval-Augmented Generation): retrieving relevant documents and injecting them into the prompt so the model answers from your data, not just its training data
- Grounding: instructing the model to answer only from the provided context and to cite sources
- Chunking: splitting documents into retrievable units, ideally along natural boundaries such as paragraphs or sections
- Embedding: mapping text to vectors so that similar content has similar vectors
- Hybrid search: combining keyword and vector search for the best recall and precision
- Groundedness: an evaluation metric measuring how faithfully a response sticks to the retrieved context
Knowledge check
Atlas Financial's compliance chatbot occasionally cites regulations that don't exist — fabricated references that look plausible. What is the most likely cause?
NeuralMed's RAG chatbot returns accurate information for common conditions but fails to answer questions about rare diseases. The articles exist in the search index. What should they investigate?
🎬 Video coming soon