Cost Optimization: Throughput Modes and RU Strategy
Choose between serverless, provisioned manual, and autoscale throughput, then apply cost reduction strategies including TTL cleanup, reserved capacity, consistency choices, and indexing optimisation.
The cost equation
Think of throughput modes like phone plans. Serverless is pay-as-you-go (great for light use, expensive at scale). Provisioned is a fixed plan (predictable cost, wasted if you don't use it). Autoscale is a plan that flexes between a minimum and maximum (best of both worlds, slightly more expensive at peak).
Marcus's cost challenge
Marcus at FinSecure manages three environments with very different patterns:
- Production: Predictable traffic with a 2× spike during market hours (9am-4pm)
- Staging: Used 8 hours/day, idle 16 hours
- Dev/test: Sporadic use, often idle for days
Each environment needs a different throughput strategy.
Throughput modes comparison
| Aspect | Serverless | Provisioned (Manual) | Autoscale |
|---|---|---|---|
| Billing | Per RU consumed | Per RU/s provisioned (billed hourly) | Per hour, at the highest RU/s scaled to in that hour |
| Throughput range | Up to 5,000 RU/s per physical partition | Fixed (400 RU/s minimum, in 100 RU/s increments) | 10% to 100% of configured max |
| Scaling | Automatic burst | Manual adjustment | Automatic within range |
| Minimum cost | $0 for throughput when idle (storage is still billed) | Always pay for provisioned RU/s | Pay for 10% of max when idle |
| Regions | Single region only | Multi-region supported | Multi-region supported |
| SLA | SLA with availability zones (single-region) | 99.99% / 99.999% | 99.99% / 99.999% |
| Storage limit | 50 GB per logical partition (same as provisioned) | Unlimited | Unlimited |
| Best for | Dev/test, low/sporadic traffic | Predictable, steady workloads | Variable but somewhat predictable traffic |
Serverless deep dive
Serverless pricing:
- Pay per RU consumed (not provisioned)
- 5,000 RU/s per physical partition (scales with partitions)
- Single region only
- SLA with availability zones in designated regions
- 50 GB per logical partition (standard Cosmos DB limit)
- Can convert to provisioned throughput (NoSQL API)
Marcus's choice: Serverless for dev/test. Zero throughput cost when idle, and no SLA needed.
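To judge when serverless stops being the cheaper option, it helps to compute the monthly consumption at which it costs the same as a small manually provisioned container. The sketch below is a rough estimate only: the two prices are assumed list prices that do not come from this article (check the Azure pricing page for your region), and the 400 RU/s container is a hypothetical example.

```csharp
using System;

// ASSUMPTION: illustrative list prices, not taken from the article.
const double serverlessUsdPerMillionRu = 0.25;   // USD per 1,000,000 RUs consumed
const double manualUsdPer100RuPerHour = 0.008;   // USD per 100 RU/s per hour
const double hoursPerMonth = 730;

// Serverless: pay only for the RUs actually consumed.
double ServerlessMonthlyCost(double rusConsumed) =>
    rusConsumed / 1_000_000 * serverlessUsdPerMillionRu;

// Manual provisioned: pay for the provisioned RU/s every hour, used or not.
double ManualMonthlyCost(int provisionedRuPerSec) =>
    provisionedRuPerSec / 100.0 * manualUsdPer100RuPerHour * hoursPerMonth;

// Monthly RU consumption at which serverless costs the same as the manual container.
double BreakEvenRus(int provisionedRuPerSec) =>
    ManualMonthlyCost(provisionedRuPerSec) / serverlessUsdPerMillionRu * 1_000_000;

int devContainerRu = 400; // hypothetical dev/test container
Console.WriteLine($"Manual {devContainerRu} RU/s: {ManualMonthlyCost(devContainerRu):F2} USD/month");
Console.WriteLine($"Break-even: {BreakEvenRus(devContainerRu):F0} RUs/month");
```

Below the break-even consumption serverless is cheaper under these assumed prices; above it, provisioned throughput wins.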
Autoscale deep dive
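Autoscale example:
- Max RU/s configured: 10,000
- Minimum (10%): 1,000 RU/s
- Idle hours: billed at 1,000 RU/s
- Peak hours: scales up to 10,000 RU/s as needed

Cost savings vs manual 10,000 RU/s: if traffic sits at 10% for 16 hours/day and at the maximum for the remaining 8, autoscale bills 96,000 RU/s-hours per day instead of 240,000. Autoscale RU/s are priced at 1.5× the manual rate in single-write-region accounts, so the net result is roughly a 40% cost reduction.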
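The saving from autoscale depends on how many hours are spent near the maximum and on the autoscale price premium, so it is worth working one pattern through. The sketch below is an estimate only: it assumes autoscale RU/s are billed at 1.5× the manual rate (single-write-region accounts) and that each hour is billed at the highest RU/s reached in that hour, never below 10% of the maximum. The function names are illustrative.

```csharp
using System;

// ASSUMPTION: autoscale RU/s cost 1.5x the manual rate (single-write-region accounts).
const double autoscaleRateMultiplier = 1.5;

// Billable RU/s for one hour: the peak reached, kept within 10%..100% of the maximum.
int AutoscaleBilledRu(int maxRu, int peakRuInHour) =>
    Math.Clamp(peakRuInHour, maxRu / 10, maxRu);

// Cost of autoscale relative to manual provisioning at maxRu (1.0 means equal cost).
double AutoscaleCostRatio(int maxRu, int[] hourlyPeakRu)
{
    double autoscaleUnits = 0;
    foreach (int peak in hourlyPeakRu)
        autoscaleUnits += AutoscaleBilledRu(maxRu, peak) * autoscaleRateMultiplier;
    double manualUnits = (double)maxRu * hourlyPeakRu.Length;
    return autoscaleUnits / manualUnits;
}

// 16 hours at 10% of max, then 8 hours at the full 10,000 RU/s.
int[] day = new int[24];
for (int h = 0; h < 24; h++) day[h] = h < 16 ? 1_000 : 10_000;

double ratio = AutoscaleCostRatio(10_000, day);
Console.WriteLine($"Autoscale / manual cost ratio: {ratio:F2}"); // 0.60, i.e. a 40% reduction
```

If the workload spends most hours near the maximum, the ratio rises above 1.0 and manual provisioning becomes the cheaper choice.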
Marcus's choice: Autoscale for production. It handles the 2× market-hours spike automatically.
Exam tip: autoscale minimum is 10%
Autoscale always provisions at least 10% of the maximum. If you set max = 10,000 RU/s, the minimum is 1,000 RU/s, so you always pay for at least 1,000 even when completely idle. This is why serverless is cheaper for truly sporadic workloads.
The exam tests this: "A developer sets autoscale max to 100,000 RU/s. What's the minimum billed throughput?" Answer: 10,000 RU/s.
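The 10% rule is simple enough to express as a one-line calculation. The helper below is a minimal sketch with an illustrative name. The input check assumes autoscale maximums start at 1,000 RU/s and are set in 1,000 RU/s steps.

```csharp
using System;

// Lowest throughput autoscale scales down to, and therefore bills for: 10% of the maximum.
int AutoscaleMinimumRu(int maxRu)
{
    // ASSUMPTION: autoscale maximums start at 1,000 RU/s and move in 1,000 RU/s steps.
    if (maxRu < 1_000 || maxRu % 1_000 != 0)
        throw new ArgumentOutOfRangeException(nameof(maxRu), "Expected a multiple of 1,000 RU/s.");
    return maxRu / 10;
}

foreach (int max in new[] { 4_000, 10_000, 100_000 })
    Console.WriteLine($"max {max} RU/s -> minimum billed {AutoscaleMinimumRu(max)} RU/s");
// max 4000 RU/s -> minimum billed 400 RU/s
// max 10000 RU/s -> minimum billed 1000 RU/s
// max 100000 RU/s -> minimum billed 10000 RU/s
```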
Cost factors beyond throughput
| Factor | Cost Impact | Optimisation |
|---|---|---|
| Multi-region | RU/s cost is multiplied by the number of regions; multi-region writes are billed at a higher rate | Add regions only where needed; prefer read regions over multi-region writes |
| Consistency | Strong/Bounded Staleness = 2× read RU | Use Session for most workloads |
| Indexing | More indexed paths = higher write RU | Exclude unused paths |
| Document size | Larger docs = more RU per operation | Keep documents lean |
| Cross-partition queries | Fan-out multiplies cost | Design for single-partition queries |
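Two of these factors are plain multipliers, which makes them easy to estimate before changing a setting. The sketch below is a rough model with illustrative function names: it doubles read charges for Strong and Bounded Staleness, and it assumes the same provisioned RU/s is billed in every region the account replicates to.

```csharp
using System;

// Read charge under a given consistency level: Strong and Bounded Staleness reads
// cost twice as much as Session, Consistent Prefix, or Eventual reads.
double ReadRuCharge(double baseReadRu, string consistency) =>
    consistency is "Strong" or "BoundedStaleness" ? baseReadRu * 2 : baseReadRu;

// ASSUMPTION: every region of the account is billed for the same provisioned RU/s.
int TotalBilledRuPerSec(int provisionedRuPerSec, int regionCount) =>
    provisionedRuPerSec * regionCount;

// A 1 RU point read under two consistency levels.
Console.WriteLine($"Session read: {ReadRuCharge(1.0, "Session")} RU");
Console.WriteLine($"Strong read:  {ReadRuCharge(1.0, "Strong")} RU");

// 10,000 RU/s replicated to three regions.
Console.WriteLine($"Billed across 3 regions: {TotalBilledRuPerSec(10_000, 3)} RU/s");
```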
TTL for automatic cleanup
TTL (Time to Live) automatically deletes expired documents, so no background jobs are needed:
    using Microsoft.Azure.Cosmos;

    // Enable TTL on the container without a default expiry,
    // so only items that carry their own ttl value expire.
    ContainerProperties props = new ContainerProperties("sessions", "/userId")
    {
        DefaultTimeToLive = -1 // container TTL enabled, no default expiry
    };

    // Set TTL per item (in seconds). The property must serialize as lowercase "ttl".
    var session = new
    {
        id = "session-123",
        userId = "user-456",
        data = "...",
        ttl = 3600 // expire 1 hour after the item's last write
    };
| Container TTL | Item TTL | Behaviour |
|---|---|---|
| Not set | Any | TTL disabled for entire container |
| -1 | Not set | Items never expire (opt-in per item) |
| -1 | 3600 | Item expires after 1 hour |
| 86400 | Not set | Items expire after 1 day (container default) |
| 86400 | 3600 | Item expires after 1 hour (item overrides container) |
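The precedence rules in the table can be written as a small decision function. This is only a local model of the behaviour for reasoning about configurations, not SDK code, and the function names are illustrative. It also covers one case the table does not list: an item `ttl` of -1 means the item never expires, even when the container has a default.

```csharp
using System;

// Returns the seconds until expiry (counted from the item's last write),
// or null when the item never expires.
// containerTtl: null = not set, -1 = enabled with no default, n > 0 = default seconds.
// itemTtl:      null = not set, -1 = never expire,            n > 0 = seconds.
int? EffectiveTtl(int? containerTtl, int? itemTtl)
{
    if (containerTtl is null) return null;            // TTL disabled for the whole container
    if (itemTtl is int t) return t == -1 ? null : t;  // item value overrides the container default
    return containerTtl == -1 ? null : containerTtl;  // fall back to the container default
}

string Describe(int? ttl) => ttl is int s ? $"expires after {s} s" : "never expires";

Console.WriteLine(Describe(EffectiveTtl(null, 3600)));   // never expires
Console.WriteLine(Describe(EffectiveTtl(-1, null)));     // never expires
Console.WriteLine(Describe(EffectiveTtl(-1, 3600)));     // expires after 3600 s
Console.WriteLine(Describe(EffectiveTtl(86400, null)));  // expires after 86400 s
Console.WriteLine(Describe(EffectiveTtl(86400, 3600)));  // expires after 3600 s
```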
Cost benefit: Expired documents free storage and reduce backup costs. For provisioned throughput accounts, TTL deletes use leftover RUs not consumed by user requests (no extra billing). For serverless accounts, TTL deletes are charged at the same RU rate as explicit delete operations.
Reserved capacity
For long-term predictable workloads, reserved capacity offers significant discounts:
| Term | Discount |
|---|---|
| 1 year | ~20% off pay-as-you-go |
| 3 years | ~30% off pay-as-you-go |
Marcus's choice: 3-year reservation for production (predictable baseline), autoscale for the spike portion.
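To see what a reservation is worth, apply the approximate discount to the baseline spend over the full term. The sketch below uses the ~20% and ~30% figures from the table. The 1,000 USD/month baseline is a hypothetical number chosen for illustration, and actual discounts vary with the amount reserved.

```csharp
using System;

// Approximate discounts from the table: ~20% for 1 year, ~30% for 3 years.
double DiscountFor(int termYears) => termYears switch
{
    1 => 0.20,
    3 => 0.30,
    _ => throw new ArgumentOutOfRangeException(nameof(termYears), "Only 1- and 3-year terms are modelled.")
};

// Monthly cost of the baseline once the reservation discount is applied.
double ReservedMonthlyCost(double payAsYouGoMonthlyUsd, int termYears) =>
    payAsYouGoMonthlyUsd * (1 - DiscountFor(termYears));

// Total saving over the whole reservation term.
double SavingsOverTerm(double payAsYouGoMonthlyUsd, int termYears) =>
    payAsYouGoMonthlyUsd * DiscountFor(termYears) * 12 * termYears;

double baseline = 1_000; // hypothetical pay-as-you-go baseline in USD/month
foreach (int term in new[] { 1, 3 })
    Console.WriteLine($"{term}-year term: {ReservedMonthlyCost(baseline, term):F2} USD/month, " +
                      $"saves {SavingsOverTerm(baseline, term):F2} USD over the term");
```

Only the predictable baseline should be reserved; throughput above it is still billed at the normal rate.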
Cost Optimization: DP-420 Module 26
Knowledge Check
Marcus's staging environment is used 8 hours/day and completely idle for 16 hours. Currently provisioned at 5,000 RU/s (manual). What's the most cost-effective change?
A developer configures autoscale with max 50,000 RU/s. During off-peak hours, traffic drops to near zero. What throughput is billed?
Marcus wants to reduce storage costs for his session data that's only relevant for 24 hours. What should he configure?
Next up: DevOps, covering Infrastructure as Code with Bicep/ARM, deployment patterns, and CI/CD pipelines for Cosmos DB.