RAG with SQL: Prompt, Process, Respond
Build retrieval-augmented generation directly in your database: retrieve relevant data with search, construct prompts in T-SQL, call language models, and extract structured responses.
Making AI accurate with your data
Imagine a smart assistant who can answer any question, but makes things up when it does not know the answer.
That is a language model without context. It sounds confident but may be completely wrong about YOUR data. RAG fixes this: before the assistant answers, you first search your database for relevant facts, hand those facts to the assistant, and say "answer ONLY based on these facts." Now the assistant gives accurate, grounded answers about your actual data.
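The flow can be sketched in a few lines. This toy Python version uses a keyword-overlap retriever and a hypothetical in-memory knowledge base as stand-ins for the real vector search and model call shown later; only the prompt-building step mirrors the real pipeline:

```python
# Toy RAG flow: retrieve relevant facts, then build a grounded prompt.
# The keyword-overlap retriever is a stand-in for real vector search.

def retrieve(question, knowledge_base, top_k=2):
    """Rank documents by naive word overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(
        knowledge_base,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:top_k]

def build_prompt(question, context):
    """Tell the model to answer ONLY from the retrieved facts."""
    facts = "\n".join("- " + c for c in context)
    return "Answer ONLY based on these facts:\n" + facts + "\n\nQuestion: " + question

kb = [
    "The premium account has no monthly fee.",
    "Wire transfers complete within 24 hours.",
    "Branch opening hours are 9am to 5pm.",
]
question = "Is there a monthly fee on the premium account?"
prompt = build_prompt(question, retrieve(question, kb))
```

The prompt now contains the fee document, so a model that follows the instruction cannot invent an answer from outside the knowledge base.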
The remarkable thing about DP-800: you can do this entire pipeline in T-SQL. Search your data, build the prompt, call the AI model, and extract the response, all without leaving the database.
When to use RAG
| Use Case | Why RAG? | Alternative Without RAG |
|---|---|---|
| Customer support chatbot | Answers grounded in YOUR knowledge base, not generic | Fine-tuned model (expensive, stale quickly) |
| Internal FAQ system | Responses cite specific company documents | Static FAQ page (does not handle variations) |
| Product recommendation | Explains WHY based on actual product data | Simple collaborative filtering (no explanations) |
| Report summarisation | Summarises YOUR data, not imaginary data | Template-based reports (rigid, no natural language) |
| Code review assistant | Reviews against YOUR coding standards | Generic code review (misses team conventions) |
When NOT to use RAG
- Simple lookups: "What is customer 42's email?" calls for SELECT, not RAG
- Aggregations: "Total sales this month" calls for SUM, not RAG
- Real-time decisions: on millisecond-critical paths, RAG adds model latency
- No relevant data: if your database does not have the answer, RAG cannot help
The RAG pipeline in T-SQL
Step 1: Retrieve relevant context
-- Vector search for semantically relevant documents
DECLARE @queryVector VECTOR(1536);
SET @queryVector = AI_GENERATE_EMBEDDINGS(@userQuestion USE MODEL EmbeddingModel);

SELECT kb.DocumentId, kb.Title, kb.Content, s.distance
INTO #Context
FROM VECTOR_SEARCH(
    TABLE = dbo.KnowledgeBase AS kb,
    COLUMN = ContentEmbedding,
    SIMILAR_TO = @queryVector,
    METRIC = 'cosine',
    TOP_N = 5
) AS s
ORDER BY s.distance;
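The 'cosine' metric ranks documents by cosine distance (1 minus cosine similarity), nearest first. A small Python sketch of that calculation, using made-up two-dimensional vectors in place of real 1536-dimensional embeddings:

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; lower means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
docs = {"fees_faq": [0.9, 0.1], "hours_faq": [0.0, 1.0]}

# Rank documents nearest-first, as a cosine-metric vector search does
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
```

Because cosine distance depends only on direction, not magnitude, two embeddings pointing the same way score as identical regardless of their lengths.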
Step 2: Convert to JSON for the model
-- Build context JSON from search results
DECLARE @contextJson NVARCHAR(MAX);
SELECT @contextJson = JSON_ARRAYAGG(
JSON_OBJECT('title': Title, 'content': Content)
)
FROM #Context;
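The shape that JSON_ARRAYAGG over JSON_OBJECT produces is simply an array of objects, one per row. A Python sketch of that same shape, with hypothetical row values standing in for real #Context rows:

```python
import json

# Hypothetical rows as they might come back from the #Context temp table
rows = [
    {"Title": "Fees", "Content": "The premium account has no monthly fee."},
    {"Title": "Transfers", "Content": "Wire transfers complete within 24 hours."},
]

# Same shape JSON_ARRAYAGG(JSON_OBJECT(...)) produces: an array of objects
context_json = json.dumps(
    [{"title": r["Title"], "content": r["Content"]} for r in rows]
)
```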
Step 3: Construct the prompt and call the model
-- Build the API request body
DECLARE @requestBody NVARCHAR(MAX) = JSON_OBJECT(
'messages': JSON_ARRAY(
JSON_OBJECT(
'role': 'system',
'content': 'You are a helpful assistant. Answer questions ONLY based on the provided context. If the context does not contain the answer, say "I do not have enough information."'
),
JSON_OBJECT(
'role': 'user',
'content': 'Context: ' + @contextJson + CHAR(10) + CHAR(10) + 'Question: ' + @userQuestion
)
),
'temperature': 0.1,
'max_tokens': 500
);
-- Call the model
DECLARE @response NVARCHAR(MAX);
DECLARE @statusCode INT;
EXEC sp_invoke_external_rest_endpoint
@url = 'https://vaultbank-ai.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-10-21',
@method = 'POST',
@headers = '{"Content-Type": "application/json"}',
@credential = [AzureOpenAICredential],
@payload = @requestBody,
@response = @response OUTPUT;
Step 4: Extract the response
-- Parse the model's response from JSON
SELECT
JSON_VALUE(@response, '$.result.choices[0].message.content') AS Answer,
JSON_VALUE(@response, '$.result.usage.total_tokens') AS TokensUsed;
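To make those JSON paths concrete: sp_invoke_external_rest_endpoint wraps the HTTP body returned by the chat-completions API under a result key, which is why the paths above start with $.result. A Python sketch of parsing that envelope, with illustrative (not real) response values:

```python
import json

# Illustrative (not real) response envelope: the HTTP body of the
# chat-completions call sits under the "result" key.
response = json.dumps({
    "result": {
        "choices": [
            {"message": {"role": "assistant",
                         "content": "The premium account has no monthly fee."}}
        ],
        "usage": {"total_tokens": 187},
    }
})

parsed = json.loads(response)
answer = parsed["result"]["choices"][0]["message"]["content"]  # $.result.choices[0].message.content
tokens = parsed["result"]["usage"]["total_tokens"]             # $.result.usage.total_tokens
```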
sp_invoke_external_rest_endpoint
This system stored procedure is the bridge between T-SQL and external REST APIs:
EXEC sp_invoke_external_rest_endpoint
@url = N'https://...', -- Required: endpoint URL
@method = N'POST', -- GET, POST, PUT, PATCH, DELETE
@headers = N'{"key": "value"}', -- HTTP headers as JSON
@payload = @jsonBody, -- Request body (for POST/PUT)
@credential = [CredentialName], -- Database-scoped credential
@timeout = 120, -- Timeout in seconds
@response = @responseVar OUTPUT; -- Response as NVARCHAR(MAX)
Exam tip: Security for sp_invoke_external_rest_endpoint
The stored procedure can only call HTTPS endpoints, and by default only endpoints on a Microsoft-maintained allow-list of Azure services. Authentication uses database-scoped credentials:
CREATE DATABASE SCOPED CREDENTIAL [AzureOpenAICredential]
WITH IDENTITY = 'HTTPEndpointHeaders',
SECRET = '{"api-key": "your-api-key-here"}';

For production, use managed identity instead of API keys:
CREATE DATABASE SCOPED CREDENTIAL [ManagedIdentityCred]
WITH IDENTITY = 'Managed Identity';

The exam expects you to know both authentication patterns and prefer managed identity.
Scenario: Priya builds a RAG assistant at Vault Bank
Vault Bankβs customer service receives thousands of questions about banking products. Priya builds a RAG system entirely in T-SQL:
- Knowledge base: 5,000 FAQ documents with embeddings stored in a KnowledgeBase table
- Search: vector search finds the 5 most relevant documents for each question
- Prompt: the system message instructs the model to cite document titles in its answer
- Model: GPT-4o-mini for fast, cost-effective responses
- Response: extracted and stored in a ChatHistory table for audit compliance
Total pipeline time: ~800ms (50ms search + 700ms model response). The entire pipeline runs in a single stored procedure; no external application needed.
Ingrid at Nordic Shield wants to build a system where claims adjusters can ask natural language questions about policy coverage. The system must answer ONLY based on actual policy documents (never make up coverage that does not exist). Which approach is correct?
Congratulations! You have completed all 28 modules of the DP-800 study guide. You now have a comprehensive understanding of designing, developing, securing, optimising, and AI-enabling database solutions across SQL Server, Azure SQL, and SQL in Microsoft Fabric.
Ready to test your knowledge? Head to the practice questions to simulate the exam experience.