Retrieval, Indexing & Agent Memory
Vector search, hybrid search, semantic search — and how agents remember conversations. Learn the architecture decisions behind retrieval and memory before you build anything.
Choosing a retrieval strategy
Retrieval is how your AI finds the right information before answering a question.
Imagine you’re studying for an exam with 500 pages of notes. You could:
- search for exact words (keyword search),
- search by meaning (semantic search),
- search by “vibes”, finding notes that feel similar even if the words are different (vector search), or
- combine all three (hybrid search).
Each method has trade-offs. The exam tests whether you know when to pick which one.
The four search methods
| Feature | Keyword | Semantic | Vector | Hybrid |
|---|---|---|---|---|
| How it works | Matches exact words (BM25) | Understands meaning using a re-ranker | Compares embeddings in vector space | Combines keyword + vector + re-ranker |
| Strengths | Fast, precise for exact terms | Understands synonyms and intent | Finds conceptually similar content | Combines the strengths of all three |
| Weaknesses | Misses synonyms ('car' won't find 'vehicle') | Slower than keyword alone | Needs embedding pipeline | Most complex to configure |
| Best for | Product codes, error IDs, exact names | Natural language questions | Finding similar documents | Production RAG applications |
| Azure AI Search feature | Full-text search (default) | Semantic ranker (add-on) | Vector search (configure embeddings) | Hybrid search (combine all) |
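The core operation behind the vector column, comparing embeddings, reduces to cosine similarity. A minimal sketch with toy three-dimensional vectors (real embedding models such as text-embedding-3-small output over a thousand dimensions; the vectors here are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Vector search ranks documents by the angle between embeddings:
    1.0 means same direction (same meaning), near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings: a query about cars, one related doc, one unrelated doc.
query = [0.9, 0.1, 0.0]
doc_vehicle = [0.8, 0.2, 0.1]   # "vehicle" lands near "car" in vector space
doc_recipe = [0.0, 0.1, 0.9]    # "soup recipe" points elsewhere
```

This is why vector search finds “vehicle” for a “car” query even though the words never match: the embeddings, not the tokens, are compared.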
Exam tip: Hybrid search is usually the right answer
When the exam asks “which search method should you use for a RAG application?” and the scenario doesn’t have a specific constraint, hybrid search is almost always correct. It combines the precision of keyword search with the semantic understanding of vector search, plus re-ranking for relevance.
Only pick a single method when the scenario explicitly constrains you (e.g., “exact product SKU lookup” = keyword, “find conceptually similar research papers” = vector).
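Under the hood, Azure AI Search’s hybrid mode merges the keyword and vector result lists with Reciprocal Rank Fusion (RRF). A minimal sketch of the idea; the document IDs are made up, and k=60 is the conventional damping constant:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank) per
    document, so items ranked highly by BOTH lists rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Keyword and vector search each return their own ranked list of doc IDs.
keyword_hits = ["doc-7", "doc-2", "doc-9"]   # exact-term matches first
vector_hits = ["doc-2", "doc-5", "doc-7"]    # semantically similar first
fused = rrf_fuse([keyword_hits, vector_hits])
```

Note that doc-2 wins the fused ranking: it appears in both lists, which is exactly the “best of both” behaviour the exam tip describes.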
Indexing strategies
Before you can search, you need to index your content. Key decisions:
| Decision | Options | Impact |
|---|---|---|
| Chunking strategy | Fixed-size, paragraph, semantic, document | Affects retrieval precision — too big and you get noise, too small and you lose context |
| Embedding model | text-embedding-3-small, text-embedding-ada-002, custom | Affects vector search quality and cost |
| Metadata | Title, source URL, date, section headings | Enables filtering and improves citation quality |
| Refresh frequency | Real-time, scheduled, on-change | Balances freshness against indexing cost |
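The first two chunking options in the table can be sketched in a few lines. Sizes here are in characters for simplicity; production pipelines usually count tokens, and the overlap value is illustrative:

```python
def chunk_fixed(text, size=200, overlap=40):
    """Fixed-size chunking with overlap. Overlapping windows preserve
    context that would otherwise be cut at a chunk boundary."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def chunk_paragraphs(text):
    """Paragraph-level chunking: split on blank lines, keeping each
    paragraph as one retrievable unit."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```

The trade-off from the table shows up directly in the parameters: a larger `size` pulls in noise, a smaller one strips away context.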
Real-world example: NeuralMed’s indexing strategy
NeuralMed indexes 10,000 medical articles for their patient chatbot:
- Chunking: Paragraph-level (medical information needs context — a sentence alone is often meaningless)
- Embedding: text-embedding-3-small (good accuracy, lower cost than large)
- Metadata: Article title, publication date, medical specialty, source journal
- Refresh: Weekly batch (medical literature doesn’t change hourly)
- Search type: Hybrid (patients ask natural-language questions, but drug names need exact match)
Agent memory and knowledge integration
Agents need three types of memory:
| Memory Type | What It Stores | Scope |
|---|---|---|
| Conversation memory | Chat history within a session | Per-thread (one conversation) |
| Persistent memory | Facts learned across conversations | Per-user or per-agent |
| Knowledge | External data sources the agent can search | Shared across all conversations |
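The three scopes can be sketched as a toy in-memory store. The `AgentMemory` class and its method names are illustrative, not a real Foundry API; the point is which key each kind of data lives under:

```python
from collections import defaultdict

class AgentMemory:
    """Toy illustration of the three scopes: conversation memory keyed
    by thread, persistent memory keyed by user, knowledge shared by all."""
    def __init__(self):
        self.conversations = defaultdict(list)   # thread_id -> messages
        self.persistent = defaultdict(dict)      # user_id -> learned facts
        self.knowledge = []                      # shared, searchable docs

    def add_message(self, thread_id, role, text):
        self.conversations[thread_id].append((role, text))

    def remember(self, user_id, key, value):
        self.persistent[user_id][key] = value

    def add_knowledge(self, doc):
        self.knowledge.append(doc)

memory = AgentMemory()
memory.add_message("thread-1", "user", "I prefer metric units")  # dies with the thread
memory.remember("user-42", "units", "metric")                    # survives across sessions
memory.add_knowledge("Refund policy: 30 days")                   # visible to every conversation
```

The scoping is the exam-relevant part: deleting `thread-1` loses the chat history but not the user’s stored preference or the shared knowledge.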
Tool and knowledge integration for agents
| Integration Type | Service | Use Case |
|---|---|---|
| Knowledge stores | Foundry IQ, Azure AI Search | Agent searches company docs to answer questions |
| Function calling | Custom functions, APIs | Agent calls external systems (CRM, database, calendar) |
| Code interpreter | Built-in Foundry tool | Agent writes and runs Python code to analyse data |
| Web search | Bing grounding | Agent searches the web for current information |
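Function calling in the table above follows a common pattern: the model emits a tool name plus JSON-encoded arguments, and the agent runtime dispatches to a registered function. A minimal sketch; the registry, decorator, and `lookup_customer` stand-in are hypothetical, not a real SDK:

```python
import json

TOOLS = {}

def tool(fn):
    """Register a function the agent is allowed to call by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def lookup_customer(customer_id: str) -> dict:
    # Stand-in for a real CRM call.
    return {"id": customer_id, "tier": "gold"}

def dispatch(tool_call: dict):
    """Execute a model-requested call of the shape
    {"name": ..., "arguments": "<JSON string>"}."""
    fn = TOOLS[tool_call["name"]]
    return fn(**json.loads(tool_call["arguments"]))

# Simulate the model asking to look up a customer.
result = dispatch({"name": "lookup_customer",
                   "arguments": json.dumps({"customer_id": "c-1"})})
```

The same dispatch shape covers the CRM, database, and calendar examples in the table: only the registered functions change.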
Exam tip: Memory vs knowledge
The exam distinguishes between memory (what the agent remembers from conversations) and knowledge (external data the agent can search). A common trap:
- “The agent needs to remember the user’s preferences across sessions” → Persistent memory
- “The agent needs to answer questions about company policies” → Knowledge integration (Foundry IQ or Search)
Memory is about the conversation. Knowledge is about the data.
Knowledge check
Atlas Financial needs to search 50,000 regulatory documents. Compliance officers type natural-language questions like “What are the capital requirements for commercial lending?” but also search for specific regulation numbers like “Basel III Section 4.2”. Which search method should they use?
MediaForge's content agent needs to remember each client's brand guidelines across multiple conversations over weeks. Which type of memory should they implement?