RAG with SQL: Prompt, Process, Respond
Build retrieval-augmented generation directly in your database: retrieve relevant data with search, construct prompts in T-SQL, call language models, and extract structured responses.
Making AI accurate with your data
Imagine a smart assistant who can answer any question, but makes things up when it does not know the answer.
That is a language model without context. It sounds confident but may be completely wrong about YOUR data. RAG fixes this: before the assistant answers, you first search your database for relevant facts, hand those facts to the assistant, and say "answer ONLY based on these facts." Now the assistant gives accurate, grounded answers about your actual data.
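The flow can be sketched in a few lines. This toy Python version uses a keyword-overlap retriever and a hypothetical in-memory knowledge base as stand-ins for the real vector search and model call shown later; only the prompt-building step mirrors the real pipeline:

```python
# Toy RAG flow: retrieve relevant facts, then build a grounded prompt.
# The keyword-overlap retriever is a stand-in for real vector search.

def retrieve(question, knowledge_base, top_k=2):
    """Rank documents by naive word overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(
        knowledge_base,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:top_k]

def build_prompt(question, context):
    """Tell the model to answer ONLY from the retrieved facts."""
    facts = "\n".join("- " + c for c in context)
    return "Answer ONLY based on these facts:\n" + facts + "\n\nQuestion: " + question

kb = [
    "The premium account has no monthly fee.",
    "Wire transfers complete within 24 hours.",
    "Branch opening hours are 9am to 5pm.",
]
question = "Is there a monthly fee on the premium account?"
prompt = build_prompt(question, retrieve(question, kb))
```

The prompt now contains the fee document, so a model that follows the instruction cannot invent an answer from outside the knowledge base.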
The remarkable thing about DP-800: you can do this entire pipeline in T-SQL. Search your data, build the prompt, call the AI model, and extract the response, all without leaving the database.
When to use RAG
| Use Case | Why RAG? | Alternative Without RAG |
|---|---|---|
| Customer support chatbot | Answers grounded in YOUR knowledge base, not generic | Fine-tuned model (expensive, stale quickly) |
| Internal FAQ system | Responses cite specific company documents | Static FAQ page (does not handle variations) |
| Product recommendation | Explains WHY based on actual product data | Simple collaborative filtering (no explanations) |
| Report summarisation | Summarises YOUR data, not imaginary data | Template-based reports (rigid, no natural language) |
| Code review assistant | Reviews against YOUR coding standards | Generic code review (misses team conventions) |
When NOT to use RAG
- Simple lookups: "What is customer 42's email?" calls for SELECT, not RAG
- Aggregations: "Total sales this month" calls for SUM, not RAG
- Real-time decisions: on millisecond-critical paths, RAG adds model latency
- No relevant data: if your database does not have the answer, RAG cannot help
The RAG pipeline in T-SQL
Step 1: Retrieve relevant context
-- Vector search for semantically relevant documents
DECLARE @queryVector VECTOR(1536);
SET @queryVector = AI_GENERATE_EMBEDDINGS(@userQuestion USE MODEL EmbeddingModel);

SELECT kb.DocumentId, kb.Title, kb.Content, s.distance
INTO #Context
FROM VECTOR_SEARCH(
    TABLE = dbo.KnowledgeBase AS kb,
    COLUMN = ContentEmbedding,
    SIMILAR_TO = @queryVector,
    METRIC = 'cosine',
    TOP_N = 5
) AS s
ORDER BY s.distance;
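The 'cosine' metric ranks documents by cosine distance (1 minus cosine similarity), nearest first. A small Python sketch of that calculation, using made-up two-dimensional vectors in place of real 1536-dimensional embeddings:

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; lower means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
docs = {"fees_faq": [0.9, 0.1], "hours_faq": [0.0, 1.0]}

# Rank documents nearest-first, as a cosine-metric vector search does
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
```

Because cosine distance depends only on direction, not magnitude, two embeddings pointing the same way score as identical regardless of their lengths.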
Step 2: Convert to JSON for the model
-- Build context JSON from search results
DECLARE @contextJson NVARCHAR(MAX);
SELECT @contextJson = JSON_ARRAYAGG(
JSON_OBJECT('title': Title, 'content': Content)
)
FROM #Context;
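The shape that JSON_ARRAYAGG over JSON_OBJECT produces is simply an array of objects, one per row. A Python sketch of that same shape, with hypothetical row values standing in for real #Context rows:

```python
import json

# Hypothetical rows as they might come back from the #Context temp table
rows = [
    {"Title": "Fees", "Content": "The premium account has no monthly fee."},
    {"Title": "Transfers", "Content": "Wire transfers complete within 24 hours."},
]

# Same shape JSON_ARRAYAGG(JSON_OBJECT(...)) produces: an array of objects
context_json = json.dumps(
    [{"title": r["Title"], "content": r["Content"]} for r in rows]
)
```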
Step 3: Construct the prompt and call the model
-- Build the API request body
DECLARE @requestBody NVARCHAR(MAX) = JSON_OBJECT(
'messages': JSON_ARRAY(
JSON_OBJECT(
'role': 'system',
'content': 'You are a helpful assistant. Answer questions ONLY based on the provided context. If the context does not contain the answer, say "I do not have enough information."'
),
JSON_OBJECT(
'role': 'user',
'content': 'Context: ' + @contextJson + CHAR(10) + CHAR(10) + 'Question: ' + @userQuestion
)
),
'temperature': 0.1,
'max_tokens': 500
);
-- Call the model
DECLARE @response NVARCHAR(MAX);
DECLARE @statusCode INT;
EXEC sp_invoke_external_rest_endpoint
@url = 'https://vaultbank-ai.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-10-21',
@method = 'POST',
@headers = '{"Content-Type": "application/json"}',
@credential = [AzureOpenAICredential],
@payload = @requestBody,
@response = @response OUTPUT;
Step 4: Extract the response
-- Parse the model's response from JSON
SELECT
JSON_VALUE(@response, '$.result.choices[0].message.content') AS Answer,
JSON_VALUE(@response, '$.result.usage.total_tokens') AS TokensUsed;
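To make those JSON paths concrete: sp_invoke_external_rest_endpoint wraps the HTTP body returned by the chat-completions API under a result key, which is why the paths above start with $.result. A Python sketch of parsing that envelope, with illustrative (not real) response values:

```python
import json

# Illustrative (not real) response envelope: the HTTP body of the
# chat-completions call sits under the "result" key.
response = json.dumps({
    "result": {
        "choices": [
            {"message": {"role": "assistant",
                         "content": "The premium account has no monthly fee."}}
        ],
        "usage": {"total_tokens": 187},
    }
})

parsed = json.loads(response)
answer = parsed["result"]["choices"][0]["message"]["content"]  # $.result.choices[0].message.content
tokens = parsed["result"]["usage"]["total_tokens"]             # $.result.usage.total_tokens
```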
sp_invoke_external_rest_endpoint
This system stored procedure is the bridge between T-SQL and external REST APIs:
EXEC sp_invoke_external_rest_endpoint
@url = N'https://...', -- Required: endpoint URL
@method = N'POST', -- GET, POST, PUT, PATCH, DELETE
@headers = N'{"key": "value"}', -- HTTP headers as JSON
@payload = @jsonBody, -- Request body (for POST/PUT)
@credential = [CredentialName], -- Database-scoped credential
@timeout = 120, -- Timeout in seconds
@response = @responseVar OUTPUT; -- Response as NVARCHAR(MAX)
Exam tip: Security for sp_invoke_external_rest_endpoint
The stored procedure can only call HTTPS endpoints, and by default only endpoints on a Microsoft-maintained allow-list of Azure services. Authentication uses database-scoped credentials:
CREATE DATABASE SCOPED CREDENTIAL [AzureOpenAICredential]
WITH IDENTITY = 'HTTPEndpointHeaders',
SECRET = '{"api-key": "your-api-key-here"}';

For production, use managed identity instead of API keys:
CREATE DATABASE SCOPED CREDENTIAL [ManagedIdentityCred]
WITH IDENTITY = 'Managed Identity';

The exam expects you to know both authentication patterns and prefer managed identity.
Scenario: Priya builds a RAG assistant at Vault Bank
Vault Bankβs customer service receives thousands of questions about banking products. Priya builds a RAG system entirely in T-SQL:
- Knowledge base: 5,000 FAQ documents with embeddings stored in a KnowledgeBase table
- Search: vector search finds the 5 most relevant documents for each question
- Prompt: the system message instructs the model to cite document titles in its answer
- Model: GPT-4o-mini for fast, cost-effective responses
- Response: extracted and stored in a ChatHistory table for audit compliance
Total pipeline time: ~800ms (50ms search + 700ms model response). The entire pipeline runs in a single stored procedure; no external application needed.
Ingrid at Nordic Shield wants to build a system where claims adjusters can ask natural language questions about policy coverage. The system must answer ONLY based on actual policy documents (never make up coverage that does not exist). Which approach is correct?
Congratulations! You have completed all 28 modules of the DP-800 study guide. You now have a comprehensive understanding of designing, developing, securing, optimising, and AI-enabling database solutions across SQL Server, Azure SQL, and SQL in Microsoft Fabric.
Ready to test your knowledge? Head to the practice questions to simulate the exam experience.