Cosmos DB & Semi-Structured Data
Consistency models, partitioning strategies, and API choices: design a globally distributed NoSQL solution that balances performance, consistency, and cost.
Why Cosmos DB design matters
Cosmos DB is like a global postal service that guarantees delivery speed, but you choose how "fresh" the letter needs to be.
It's Azure's globally distributed NoSQL database. The three big design decisions are: which API (NoSQL, MongoDB, Cassandra, Gremlin, Table), which consistency level (strong to eventual, a tradeoff between correctness and speed), and how to partition (the partition key determines performance and cost).
Cosmos DB consistency models
This is one of the most exam-tested topics in AZ-305. The five consistency levels form a spectrum:
| Level | Guarantee | Latency | Throughput | Best For |
|---|---|---|---|---|
| Strong | Reads always return most recent committed write | Highest (cross-region round-trip) | Lowest | Financial transactions, inventory counts |
| Bounded Staleness | Reads lag behind writes by at most K versions or T time | High | Medium-low | Global apps needing near-strong consistency with better perf |
| Session (default) | Reads are consistent within a session (your own writes) | Medium | Medium | Most applications: users see their own updates immediately |
| Consistent Prefix | Reads never see out-of-order writes | Low | Medium-high | Social feeds, activity logs (order matters, staleness is OK) |
| Eventual | Reads may return older data, eventually converges | Lowest | Highest | View counters, likes, non-critical telemetry |
Elena's consistency decision: FinSecure's trading platform needs Strong consistency for account balance reads, because a trade must see the latest balance. Their customer activity feed uses Session consistency: each user sees their own activity immediately, and seeing other users' activity with a slight delay is acceptable.
Exam tip: Session consistency is the default and most common answer
If the exam scenario doesn't mention a specific consistency requirement, Session is almost always correct. It provides the "read your own writes" guarantee that most applications need, with good performance. Choose Strong only when the scenario explicitly mentions "must always read the latest data across all users/regions", and be ready for the performance/cost tradeoff.
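The spectrum and the exam tip can be read as a short decision ladder. The sketch below is one possible encoding of it, not an official rule set: the function name `recommend_consistency` and its four flags are hypothetical, chosen only to mirror the "Guarantee" column of the table.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "Strong"
    BOUNDED_STALENESS = "Bounded Staleness"
    SESSION = "Session"
    CONSISTENT_PREFIX = "Consistent Prefix"
    EVENTUAL = "Eventual"


def recommend_consistency(
    must_read_latest_everywhere: bool = False,
    bounded_lag_acceptable: bool = False,
    needs_read_your_own_writes: bool = True,
    order_matters: bool = False,
) -> Consistency:
    """Walk the spectrum from strongest to weakest requirement.

    The flags are illustrative assumptions, not Cosmos DB settings.
    """
    if must_read_latest_everywhere:
        return Consistency.STRONG
    if bounded_lag_acceptable:
        return Consistency.BOUNDED_STALENESS
    if needs_read_your_own_writes:
        return Consistency.SESSION
    if order_matters:
        return Consistency.CONSISTENT_PREFIX
    return Consistency.EVENTUAL


# No stated requirement: falls through to the default, Session.
default_choice = recommend_consistency()
# Account balance reads that must be current across regions: Strong.
balance_choice = recommend_consistency(must_read_latest_everywhere=True)
```

With no arguments the helper returns Session, which matches the tip above: pick something else only when the scenario states a requirement that Session does not cover.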
API selection
| API | Data Model | Query Language | Best For |
|---|---|---|---|
| NoSQL | JSON documents | SQL-like queries | New cloud-native apps, most common choice |
| MongoDB | BSON documents | MongoDB query language | Migrating existing MongoDB applications |
| Cassandra | Wide-column (tables) | CQL (Cassandra Query Language) | Migrating Cassandra workloads, high-write IoT |
| Gremlin | Graph (vertices + edges) | Gremlin traversal language | Social networks, recommendation engines, fraud detection |
| Table | Key-value pairs | OData queries | Migrating Azure Table Storage (better perf + global distribution) |
Design rule: Choose the API based on existing application investment, not personal preference. If the app is already built on MongoDB, use the MongoDB API; don't rewrite it for the NoSQL API. For new apps, the NoSQL API is the recommended default.
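As a rough illustration of the design rule, the lookup below maps an existing technology to the matching API and falls back to NoSQL for new apps. The mapping keys and the function name `recommend_api` are assumptions made for this sketch; they simply restate the "Best For" column.

```python
from typing import Optional

# Existing investment -> Cosmos DB API, restating the table above.
API_FOR_EXISTING_STACK = {
    "mongodb": "MongoDB",
    "cassandra": "Cassandra",
    "gremlin": "Gremlin",
    "azure table storage": "Table",
}


def recommend_api(existing_stack: Optional[str] = None) -> str:
    """Return the API that preserves existing investment, else NoSQL."""
    if existing_stack is None:
        return "NoSQL"  # new cloud-native app: recommended default
    return API_FOR_EXISTING_STACK.get(existing_stack.strip().lower(), "NoSQL")
```

For example, `recommend_api("MongoDB")` returns `"MongoDB"`, while `recommend_api()` returns `"NoSQL"`.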
Partition key design
The partition key is the most important Cosmos DB design decision. A bad partition key causes hot partitions, poor query performance, and wasted RUs.
| Principle | Good Example | Bad Example |
|---|---|---|
| High cardinality | userId (millions of unique values) | country (only ~200 values, which leads to hot partitions) |
| Even distribution | tenantId (uniform data per tenant) | createdDate (all of today's data in one partition) |
| Query alignment | Key used in most WHERE clauses | Key rarely used in queries (forces cross-partition queries) |
Marcus's partition strategy: NovaSaaS uses tenantId as the partition key for most containers. Queries are always scoped to a tenant, data is evenly distributed, and cross-partition queries are rare.
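The cardinality and distribution principles can be made concrete with a toy model. The sketch below hashes partition key values onto a fixed number of buckets and reports the share of documents that lands on the busiest one. It is an illustration only: Cosmos DB uses its own hashing and range-based placement, and the bucket count, the sample data, and the names `physical_partition` and `hottest_share` are assumptions.

```python
import hashlib
from collections import Counter


def physical_partition(key_value: str, partition_count: int) -> int:
    """Map a partition key value to a bucket (toy stand-in for real placement)."""
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def hottest_share(key_values: list, partition_count: int = 10) -> float:
    """Fraction of all documents stored on the single busiest bucket."""
    counts = Counter(physical_partition(v, partition_count) for v in key_values)
    return max(counts.values()) / len(key_values)


# 10,000 documents keyed by a distinct userId each (high cardinality).
user_keys = [f"user-{i}" for i in range(10_000)]
# 10,000 documents keyed by country, with most traffic from one country.
country_keys = ["US"] * 7_000 + ["DE"] * 2_000 + ["NZ"] * 1_000

user_skew = hottest_share(user_keys)        # close to an even 1/10 split
country_skew = hottest_share(country_keys)  # at least 0.7: one hot bucket
```

Every document with the same key value must land in the same place, so a low-cardinality, skewed key such as country cannot be spread out no matter how many partitions exist.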
Design decision: Hierarchical partition keys
For very large datasets, Cosmos DB supports hierarchical partition keys (up to 3 levels). Example: /tenantId/userId/sessionId. This allows:
- Queries filtered by tenantId are scoped to that tenant's partitions
- Sub-filtering by userId narrows the scope further
- Even data distribution across the hierarchy
Use hierarchical partition keys when a single partition key would create logical partitions that exceed the 20 GB limit.
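To see why the extra levels help with the 20 GB limit, the sketch below groups data by a key path and flags any logical partition that is too large. The sample records are hypothetical, and each one stands in for a batch of documents rather than a single item. It relies on the logical partition being identified by the full key path once hierarchical keys are used.

```python
from collections import defaultdict

LOGICAL_PARTITION_LIMIT_BYTES = 20 * 1024**3  # 20 GB limit from the text
GIB = 1024**3


def logical_partition_sizes(records: list, key_paths: tuple) -> dict:
    """Sum stored bytes per logical partition for the given key path."""
    sizes = defaultdict(int)
    for record in records:
        key = tuple(record[path] for path in key_paths)
        sizes[key] += record["size_bytes"]
    return dict(sizes)


def oversized(sizes: dict, limit: int = LOGICAL_PARTITION_LIMIT_BYTES) -> list:
    """Return the keys whose logical partition exceeds the limit."""
    return sorted(key for key, size in sizes.items() if size > limit)


# Hypothetical data: one large tenant with three users at 10 GiB each.
records = [
    {"tenantId": "big", "userId": "u1", "size_bytes": 10 * GIB},
    {"tenantId": "big", "userId": "u2", "size_bytes": 10 * GIB},
    {"tenantId": "big", "userId": "u3", "size_bytes": 10 * GIB},
    {"tenantId": "small", "userId": "u9", "size_bytes": 1 * GIB},
]

single_key = logical_partition_sizes(records, ("tenantId",))
hierarchical = logical_partition_sizes(records, ("tenantId", "userId"))

too_big_single = oversized(single_key)          # [("big",)]: 30 GiB in one key
too_big_hierarchical = oversized(hierarchical)  # []: each key holds 10 GiB
```

Keyed on tenantId alone, the large tenant's 30 GiB falls into one logical partition. Adding userId as a second level splits it into three partitions of 10 GiB each, and a tenantId filter still scopes queries to that tenant.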
Throughput models
| Factor | Provisioned (Manual) | Provisioned (Autoscale) | Serverless |
|---|---|---|---|
| RU allocation | Fixed RU/s you set | Scales between 10% of max and max RU/s | No pre-allocation; pay per request |
| Billing | Per-hour for allocated RU/s | Per-hour for the highest RU/s the system scaled to within that hour | Per-RU consumed |
| Minimum cost | 400 RU/s minimum | 10% of max (e.g., 400 RU/s if max is 4,000 RU/s) | Zero when idle |
| Best for | Predictable, steady workloads | Variable workloads with known peaks | Dev/test, infrequent access, spiky traffic |
| Max throughput | Unlimited (manual scaling) | Unlimited (set max) | 5,000 RU/s per container, 1 TB storage per container |
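The billing rows can be expressed as three small functions that return the billable quantity for one hour under each model. This is a sketch of the rules as stated in the table, with hypothetical function names. It deliberately leaves out prices: per-unit rates differ between models and change over time, so the outputs are not directly comparable as costs.

```python
def manual_billed_rus(provisioned_rus: int) -> int:
    """Manual: billed for the fixed RU/s you set, used or not."""
    if provisioned_rus < 400:
        raise ValueError("manual throughput has a 400 RU/s minimum")
    return provisioned_rus


def autoscale_billed_rus(max_rus: int, peak_rus_in_hour: int) -> int:
    """Autoscale: billed for the highest RU/s reached in the hour,
    never below 10% of the configured max and never above the max."""
    floor = max_rus // 10
    return max(floor, min(peak_rus_in_hour, max_rus))


def serverless_billed_rus(request_charges: list) -> float:
    """Serverless: billed for the total RUs the requests consumed."""
    return sum(request_charges)


quiet_hour = autoscale_billed_rus(max_rus=4000, peak_rus_in_hour=150)   # 400
busy_hour = autoscale_billed_rus(max_rus=4000, peak_rus_in_hour=2500)   # 2500
idle_serverless = serverless_billed_rus([])                             # 0
```

An autoscale container with a 4,000 RU/s max is billed for 400 RU/s in a quiet hour and 2,500 RU/s in an hour that peaked at 2,500. An idle serverless container is billed for nothing.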
Azure Table Storage vs Cosmos DB Table API
| Factor | Azure Table Storage | Cosmos DB Table API |
|---|---|---|
| Performance | Variable latency | Single-digit ms guaranteed |
| Global distribution | Single region (GRS for DR only) | Multi-region active-active |
| Throughput | Per-partition limits | Provisioned or serverless RU/s |
| Secondary indexes | Primary key only | Automatic indexing on all properties |
| Cost | Very low | Higher (premium performance) |
| Best for | Simple key-value, cost-sensitive, low traffic | High-performance global key-value |
Knowledge check
NovaSaaS is designing a Cosmos DB container for user session data. Each tenant has 100-10,000 users. Most queries filter by tenant first, then by user. Session data per user is typically 2-5 KB. Which partition key should Marcus recommend?
Elena needs Cosmos DB for a global trading platform. Account balance reads must always return the latest committed write, even across regions. Which consistency level should she recommend?
Next up: Semi-structured data is designed. Now let's handle unstructured data: Blob, Data Lake & Azure Files.