SQL Queries in Cosmos DB

SQL in Cosmos DB

Simple explanation

It looks like SQL, but it’s not the SQL you know. Cosmos DB’s query language borrows familiar syntax — SELECT, WHERE, ORDER BY — but works on JSON documents instead of table rows. The biggest surprise? JOIN doesn’t join two tables. It joins a document with its own nested arrays.

Think of it as “JSON-flavoured SQL” — powerful for querying documents, but different enough from SQL Server that Sophie needs to rewire her brain.

Basic queries

SELECT and WHERE

-- Get all active projects for a tenant
SELECT c.id, c.name, c.status
FROM c
WHERE c.tenantId = 'tenant-abc'
  AND c.type = 'project'
  AND c.status = 'active'

The c alias refers to items in the container. You can also use * to return the full document:

SELECT * FROM c WHERE c.tenantId = 'tenant-abc' AND c.type = 'task'

ORDER BY

-- Tasks ordered by creation date (newest first)
SELECT c.title, c.createdAt
FROM c
WHERE c.tenantId = 'tenant-abc' AND c.type = 'task'
ORDER BY c.createdAt DESC

Important: ORDER BY on a single property uses the default range index. Multi-field ORDER BY requires a composite index (covered below).

Aggregations

-- Count tasks per status
SELECT c.status, COUNT(1) AS taskCount
FROM c
WHERE c.tenantId = 'tenant-abc' AND c.type = 'task'
GROUP BY c.status

-- Average estimated hours
SELECT AVG(c.estimatedHours) AS avgHours
FROM c
WHERE c.tenantId = 'tenant-abc' AND c.type = 'task'

Available aggregations: COUNT, SUM, AVG, MIN, MAX.

Intra-document JOIN

Critical concept: JOIN in Cosmos DB is NOT a cross-document join like SQL Server. It’s a self-join that flattens nested arrays within a single document.

{
  "id": "proj-001",
  "tenantId": "tenant-abc",
  "name": "Website Redesign",
  "tags": ["frontend", "priority", "q1"],
  "milestones": [
    { "name": "Design", "dueDate": "2025-03-01" },
    { "name": "Build", "dueDate": "2025-06-01" }
  ]
}

-- Flatten milestones array
SELECT p.name AS project, m.name AS milestone, m.dueDate
FROM p
JOIN m IN p.milestones
WHERE p.tenantId = 'tenant-abc' AND p.type = 'project'

Result:

project	milestone	dueDate
Website Redesign	Design	2025-03-01
Website Redesign	Build	2025-06-01

Sophie’s confusion: Sophie tried SELECT * FROM projects p JOIN tasks t ON p.id = t.projectId — this doesn’t work. There’s no cross-document JOIN. To get a project and its tasks (stored as separate documents), you run two queries.

Single-partition vs cross-partition queries

Aspect	Single-partition query	Cross-partition query
Filter	Includes the partition key in WHERE	Doesn't include the partition key
RU cost	Low — hits one partition only	High — fans out to all physical partitions
Latency	Fast — single network hop	Slow — parallel fan-out, waits for slowest partition
Example	WHERE c.tenantId = 'abc' AND c.type = 'task'	WHERE c.type = 'task' AND c.status = 'overdue'
When to use	Always when possible — design for this	Rarely — analytics, reports, admin queries
SDK header	x-ms-documentdb-query-enablecrosspartition: false	x-ms-documentdb-query-enablecrosspartition: true

// Single-partition query (fast, cheap)
var query = new QueryDefinition(
    "SELECT * FROM c WHERE c.tenantId = @tenant AND c.type = 'task'")
    .WithParameter("@tenant", "tenant-abc");

// The SDK automatically routes to the correct partition
using FeedIterator<TaskItem> feed = container.GetItemQueryIterator<TaskItem>(
    query,
    requestOptions: new QueryRequestOptions
    {
        PartitionKey = new PartitionKey("tenant-abc")  // explicit routing
    });

Query metrics

Enable query metrics to understand performance:

var options = new QueryRequestOptions
{
    PopulateIndexMetrics = true
};

using FeedIterator<TaskItem> feed = container.GetItemQueryIterator<TaskItem>(query, requestOptions: options);

while (feed.HasMoreResults)
{
    FeedResponse<TaskItem> response = await feed.ReadNextAsync();
    Console.WriteLine($"RU charge: {response.RequestCharge}");
    Console.WriteLine($"Index metrics: {response.IndexMetrics}");
    // Shows which indexes were used and suggests composite indexes
}

Parameterised queries

Always use parameters — never concatenate user input into queries:

// ✅ Parameterised — safe from injection, cached execution plan
var query = new QueryDefinition(
    "SELECT * FROM c WHERE c.tenantId = @tenant AND c.status = @status")
    .WithParameter("@tenant", tenantId)
    .WithParameter("@status", "active");

// ❌ String concatenation — injection risk, no plan caching
var unsafeQuery = $"SELECT * FROM c WHERE c.tenantId = '{tenantId}'";

Ravi’s mistake: Ravi concatenated user input directly into queries. A tenant named tenant' OR '1'='1 returned all tenants’ data. Parameterised queries prevent this.

Exam tip: JOIN is intra-document only

This is one of the most tested concepts. Cosmos DB JOIN flattens nested arrays within a single document. It does NOT join across documents or containers like SQL Server’s JOIN. If a question asks how to “join” two separate document types, the answer involves two separate queries or embedding the data.

Composite indexes

By default, Cosmos DB creates range indexes on each property individually. For multi-field ORDER BY, you need a composite index:

{
  "indexingPolicy": {
    "compositeIndexes": [
      [
        { "path": "/tenantId", "order": "ascending" },
        { "path": "/createdAt", "order": "descending" }
      ],
      [
        { "path": "/status", "order": "ascending" },
        { "path": "/priority", "order": "descending" }
      ]
    ]
  }
}

When you need a composite index:

ORDER BY c.status ASC, c.priority DESC → needs composite index on [status ASC, priority DESC]
ORDER BY c.createdAt DESC → single field, default index works
Filters with ORDER BY on a different property → composite improves performance

Exam tip: ORDER BY multiple fields

A query with ORDER BY c.field1, c.field2 requires a composite index that matches the exact fields and sort directions. Without it, the query fails with an error — it doesn’t silently fall back to a scan. The exam tests this: if you see multi-field ORDER BY, check for a composite index.

🎬 Video walkthrough

Flashcards

Question

What does JOIN do in Cosmos DB SQL?

Click or press Enter to reveal answer

Answer

It's an intra-document self-join that flattens nested arrays within a single document. It does NOT join across documents or containers. Example: JOIN m IN c.milestones flattens each milestone into a separate row.

Click to flip back

Question

What happens if you use ORDER BY on two fields without a composite index?

Click or press Enter to reveal answer

Answer

The query fails with an error. Multi-field ORDER BY requires a composite index that matches the exact fields and sort directions. A single-field range index is not sufficient.

Click to flip back

Question

Why should you use parameterised queries instead of string concatenation?

Click or press Enter to reveal answer

Answer

Two reasons: (1) Security — prevents SQL injection attacks. (2) Performance — parameterised queries reuse cached execution plans, while concatenated strings create a new plan each time.

Click to flip back

Question

What makes a cross-partition query expensive?

Click or press Enter to reveal answer

Answer

It fans out to ALL physical partitions in parallel, each consuming RU/s independently. The total cost is the sum of all partition costs. Latency equals the slowest partition's response. It's typically 5-10× more expensive than a single-partition query.

Click to flip back

Knowledge check

Knowledge Check

Sophie writes: SELECT * FROM projects p JOIN tasks t ON p.id = t.projectId. What happens?

Knowledge Check

Priya's top query is: SELECT * FROM c WHERE c.tenantId = 'abc' AND c.type = 'task' ORDER BY c.createdAt DESC. What index does she need?

Knowledge Check

A query runs without a partition key filter on a container with 8 physical partitions. Each partition charges 5 RU. What's the total cost?

Next up: SDK Query Pagination — learn how to use LINQ, FeedIterator, continuation tokens, and MaxItemCount for efficient query pagination.