Cost Optimization: Throughput Modes and RU Strategy

The cost equation

Simple explanation

Think of throughput modes like phone plans. Serverless is pay-as-you-go (great for light use, expensive at scale). Provisioned is a fixed plan (predictable cost, wasted if you don’t use it). Autoscale is a plan that flexes between a minimum and a maximum (best of both worlds, but on single-write-region accounts each RU/s is billed at 1.5× the manual rate).

Marcus’s cost challenge

⚙️ Marcus at FinSecure manages three environments with very different patterns:

Production: Predictable traffic with a 2× spike during market hours (9am-4pm)
Staging: Used 8 hours/day, idle 16 hours
Dev/test: Sporadic use, often idle for days

Each environment needs a different throughput strategy.

Throughput modes comparison

Aspect	Serverless	Provisioned (Manual)	Autoscale
Billing	Per RU consumed	Per RU/s provisioned (hourly)	Per highest RU/s reached in each hour
Throughput range	5,000 RU/s per physical partition	Fixed (minimum 400 RU/s)	10%–100% of configured max
Scaling	Automatic burst	Manual adjustment	Automatic within range
Minimum cost	$0 throughput cost when idle (storage still billed)	Always pay for provisioned RU/s	Pay for 10% of max when idle
Regions	Single region only	Multi-region supported	Multi-region supported
SLA	SLA with availability zones (single-region)	99.99% / 99.999%	99.99% / 99.999%
Storage limit	1 TB per container (20 GB per logical partition, as in every mode)	Unlimited	Unlimited
Best for	Dev/test, low/sporadic traffic	Predictable, steady workloads	Variable but somewhat predictable traffic

Serverless deep dive

Serverless characteristics:
  - Pay per RU consumed (not provisioned), plus storage
  - 5,000 RU/s per physical partition (scales with partitions)
  - Single region only
  - SLA with availability zones in designated regions
  - Up to 1 TB per container; 20 GB per logical partition (the standard Cosmos DB limit)
  - Can convert to provisioned throughput (NoSQL API)
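To make the pay-per-RU versus pay-per-hour trade-off concrete, here is a minimal C# sketch. The rates are illustrative placeholders (assumptions, not quoted prices), the dev/test usage profile is hypothetical, and `ServerlessMonthlyCost` and `ManualMonthlyCost` are hypothetical helpers rather than SDK calls.

```csharp
using System;

// Illustrative placeholder rates (assumptions, not quoted prices).
const double ServerlessPricePerMillionRu = 0.25;  // per 1,000,000 RUs consumed
const double ManualPricePer100RuSHour = 0.008;    // per 100 RU/s per hour

// Serverless bills only the request units actually consumed.
double ServerlessMonthlyCost(double rusConsumedPerMonth) =>
    rusConsumedPerMonth / 1_000_000 * ServerlessPricePerMillionRu;

// Manual provisioned throughput bills the provisioned RU/s every hour, used or not.
double ManualMonthlyCost(int provisionedRuPerSecond, int hoursPerMonth) =>
    provisionedRuPerSecond / 100.0 * ManualPricePer100RuSHour * hoursPerMonth;

// Hypothetical dev/test month: 20 busy hours averaging 200 RU/s, idle otherwise.
double devTestRus = 20 * 3600 * 200.0;  // hours * seconds/hour * RU/s = RUs consumed

Console.WriteLine($"Serverless:               {ServerlessMonthlyCost(devTestRus):F2}");
Console.WriteLine($"Manual 400 RU/s, 730 h:   {ManualMonthlyCost(400, 730):F2}");
```

With these placeholder rates the sporadic month costs about 3.60 on serverless versus 23.36 for 400 RU/s provisioned around the clock. Sustained traffic reverses the comparison, because serverless charges for every RU while provisioned throughput spreads a fixed hourly price across however many requests you send.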

Marcus’s choice: Serverless for dev/test. There are no throughput charges while it sits idle (only storage is billed), and dev/test needs no SLA.

Autoscale deep dive

Autoscale example:
  Max RU/s configured: 10,000
  Minimum (10%): 1,000 RU/s
  
  Idle hours: billed at 1,000 RU/s
  Peak hours: scales up to 10,000 RU/s as needed
  
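  Cost vs manual 10,000 RU/s (assuming the other 8 hours run at the full max):
    16 h × 1,000 + 8 h × 10,000 = 96,000 RU/s-hours vs 240,000 → 60% fewer billed RU/s-hours
    At the 1.5× autoscale rate (single write region) → ~40% cost reduction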
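The arithmetic behind this example can be sketched in C#. It is a simplified model assuming a single-write-region account (autoscale billed at 1.5× the manual rate), a placeholder manual rate, and the hypothetical 16-quiet/8-busy-hour profile; `AutoscaleDailyCost` and `ManualDailyCost` are illustrative helpers.

```csharp
using System;
using System.Linq;

// Placeholder manual rate per 100 RU/s per hour (assumption, not a quoted price).
const double ManualRatePer100 = 0.008;
// Assumed single-write-region account: autoscale RU/s cost 1.5x the manual rate.
const double AutoscaleMultiplier = 1.5;

// Each hour, autoscale bills the highest RU/s it scaled to, clamped to [10% of max, max].
double AutoscaleDailyCost(int maxRu, int[] peakRuPerHour) =>
    peakRuPerHour
        .Select(ru => Math.Clamp(ru, maxRu / 10, maxRu))
        .Sum(billed => billed / 100.0 * ManualRatePer100 * AutoscaleMultiplier);

// Manual throughput bills the same provisioned RU/s for all 24 hours.
double ManualDailyCost(int provisionedRu) => provisionedRu / 100.0 * ManualRatePer100 * 24;

// Profile from the example: 16 quiet hours at the 1,000 RU/s floor, 8 busy hours at max.
int[] profile = Enumerable.Repeat(1_000, 16).Concat(Enumerable.Repeat(10_000, 8)).ToArray();

double manual = ManualDailyCost(10_000);
double autoscale = AutoscaleDailyCost(10_000, profile);
Console.WriteLine($"Manual: {manual:F2}  Autoscale: {autoscale:F2}  Saving: {1 - autoscale / manual:P0}");
```

For this profile the sketch gives 19.20 per day for manual and 11.52 for autoscale, a 40% saving. Add busy hours or raise the quiet-hour load and the saving shrinks; once a workload sits near its max for most of the day, manual throughput becomes the cheaper option.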

Marcus’s choice: Autoscale for production — handles the 2× market-hours spike automatically.

Exam tip: autoscale minimum is 10%

Autoscale always provisions at least 10% of the maximum. If you set max = 10,000 RU/s, the minimum is 1,000 RU/s, so you pay for at least 1,000 RU/s every hour, even when completely idle. This is why serverless is cheaper for truly sporadic workloads.

The exam tests this: “A developer sets autoscale max to 100,000 RU/s. What’s the minimum billed throughput?” → 10,000 RU/s.
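The floor rule is a single division, but a sketch also makes the input constraint explicit. `AutoscaleFloor` is a hypothetical helper, and its validation (a max of at least 1,000 RU/s, set in 1,000 RU/s increments) is an assumption of this sketch based on the autoscale configuration rules.

```csharp
using System;

// Autoscale scales between 10% of the configured max and the max itself.
// Assumption for this sketch: the max must be at least 1,000 RU/s and a multiple of 1,000.
int AutoscaleFloor(int maxRuPerSecond)
{
    if (maxRuPerSecond < 1_000 || maxRuPerSecond % 1_000 != 0)
        throw new ArgumentOutOfRangeException(nameof(maxRuPerSecond),
            "Autoscale max RU/s must be a multiple of 1,000 and at least 1,000.");
    return maxRuPerSecond / 10;  // billed floor per hour, even with zero traffic
}

foreach (int max in new[] { 1_000, 10_000, 100_000 })
    Console.WriteLine($"max {max} RU/s -> never billed below {AutoscaleFloor(max)} RU/s per hour");
```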

Cost factors beyond throughput

Factor	Cost Impact	Optimisation
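Multi-region	Provisioned RU/s are billed in every region (read and write), so cost scales with region count; multi-region writes also raise the per-RU/s rate	Add only the regions you need; enable multi-region writes only when required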
Consistency	Strong/Bounded = 2× read RU	Use Session for most workloads
Indexing	More indexed paths = higher write RU	Exclude unused paths
Document size	Larger docs = more RU per operation	Keep documents lean
Cross-partition queries	Fan-out multiplies cost	Design for single-partition queries
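A rough C# estimator for the first two rows of this table, assuming a placeholder manual rate and 730 billable hours per month. It deliberately ignores the higher per-RU/s rate of multi-region-write accounts, and `MonthlyThroughputCost` and `ReadCharge` are hypothetical helpers.

```csharp
using System;

// Placeholder manual rate per 100 RU/s per hour (assumption, not a quoted price).
const double RatePer100RuHour = 0.008;
const int HoursPerMonth = 730;

// Provisioned RU/s are billed in every region attached to the account.
// Simplification: ignores the higher per-RU/s rate of multi-region-write accounts.
double MonthlyThroughputCost(int ruPerSecond, int regionCount) =>
    ruPerSecond / 100.0 * RatePer100RuHour * HoursPerMonth * regionCount;

// Strong and Bounded Staleness reads cost roughly 2x the RU of weaker levels.
double ReadCharge(double baseReadRu, string consistency) =>
    consistency is "Strong" or "BoundedStaleness" ? baseReadRu * 2 : baseReadRu;

Console.WriteLine($"10,000 RU/s, 1 region:  {MonthlyThroughputCost(10_000, 1):F2}");
Console.WriteLine($"10,000 RU/s, 3 regions: {MonthlyThroughputCost(10_000, 3):F2}");
Console.WriteLine($"1 KB point read: Session {ReadCharge(1, "Session")} RU, Strong {ReadCharge(1, "Strong")} RU");
```

Three regions triple the throughput bill (584 to 1,752 per month at the placeholder rate), while a 1 KB point read costs 1 RU at Session and about 2 RU at Strong.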

TTL for automatic cleanup

TTL (Time to Live) automatically deletes expired documents — no background jobs needed:

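// Requires the Microsoft.Azure.Cosmos package; assumes an existing Database instance named database
using Microsoft.Azure.Cosmos;

// Enable TTL on the container without a default expiry (items opt in per item)
ContainerProperties props = new ContainerProperties("sessions", "/userId")
{
    DefaultTimeToLive = -1  // container TTL enabled, no default expiry
};
ContainerResponse response = await database.CreateContainerIfNotExistsAsync(props);
Container container = response.Container;

// Set TTL per item (in seconds, counted from the item's last write)
var session = new
{
    id = "session-123",
    userId = "user-456",
    data = "...",
    ttl = 3600  // expire 1 hour after the last write
};
await container.CreateItemAsync(session, new PartitionKey(session.userId));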

Container TTL	Item TTL	Behaviour
Not set	Any	TTL disabled for entire container
`-1`	Not set	Items never expire (opt-in per item)
`-1`	`3600`	Item expires after 1 hour
`86400`	Not set	Items expire after 1 day (container default)
`86400`	`3600`	Item expires after 1 hour (item overrides container)
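The table’s precedence rules fit in a few lines of C#. `EffectiveTtl` is a hypothetical helper, not an SDK method; it also covers one case the table omits: an item-level `ttl` of `-1` keeps that item forever even when the container has a default.

```csharp
using System;

// Hypothetical helper mirroring the TTL table: returns the effective time-to-live
// in seconds, or null when the item never expires.
//   containerTtl: null = TTL not set, -1 = enabled with no default, n > 0 = default seconds
//   itemTtl:      null = no ttl property, -1 = never expire, n > 0 = seconds
int? EffectiveTtl(int? containerTtl, int? itemTtl)
{
    if (containerTtl is null) return null;                  // TTL off: any item ttl is ignored
    if (itemTtl is int t) return t == -1 ? (int?)null : t;  // item value overrides container
    return containerTtl == -1 ? (int?)null : containerTtl;  // fall back to container default
}

Console.WriteLine(EffectiveTtl(86_400, 3_600));     // item overrides container default
Console.WriteLine(EffectiveTtl(-1, null) is null);  // opt-in container, item without ttl
```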

Cost benefit: Expired documents free storage and reduce backup costs. For provisioned throughput accounts, TTL deletes use leftover RUs not consumed by user requests (no extra billing). For serverless accounts, TTL deletes are charged at the same RU rate as explicit delete operations.

Reserved capacity

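For long-term predictable workloads, reserved capacity offers significant discounts. The figures below are approximate and apply to the smallest reservation sizes; larger reservations earn deeper discounts: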

Term	Discount
1 year	~20% off pay-as-you-go
3 years	~30% off pay-as-you-go
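A quick C# sketch of what those discounts mean for a steady baseline. The pay-as-you-go rate is a placeholder (an assumption, not a quoted price), the 20% and 30% figures come from the table above, and `YearlyCost` is a hypothetical helper.

```csharp
using System;

// Placeholder pay-as-you-go rate per 100 RU/s per hour (assumption, not a quoted price).
const double PaygRatePer100RuHour = 0.008;
const int HoursPerYear = 8_760;

// Yearly cost of a steady baseline with an optional reservation discount (0.20 = 20%).
double YearlyCost(int baselineRuPerSecond, double discount) =>
    baselineRuPerSecond / 100.0 * PaygRatePer100RuHour * HoursPerYear * (1 - discount);

int baseline = 10_000;
Console.WriteLine($"Pay-as-you-go:          {YearlyCost(baseline, 0):F0}");
Console.WriteLine($"1-year (~20% discount): {YearlyCost(baseline, 0.20):F0}");
Console.WriteLine($"3-year (~30% discount): {YearlyCost(baseline, 0.30):F0}");
```

At 10,000 RU/s the placeholder bill drops from 7,008 to about 5,606 (1 year) or 4,906 (3 years) per year. A reservation is billed whether or not the throughput is used, so size it to the always-on baseline rather than the peak.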

Marcus’s choice: 3-year reservation for production (predictable baseline), autoscale for the spike portion.

Flashcards

Question

What is the minimum billing for autoscale at max 10,000 RU/s?

Answer

1,000 RU/s (10% of the maximum). Autoscale always provisions at least 10% of the configured max. Even when completely idle, you pay for the 10% floor.

Question

What are key characteristics of serverless Cosmos DB?

Answer

1) Single region only. 2) 5,000 RU/s per physical partition (scales as partitions are added). 3) SLA available with availability zones. 4) Can be converted to provisioned throughput (NoSQL API). 5) No dedicated gateway or integrated cache.

Question

Do TTL-expired document deletions consume RU/s?

Answer

For provisioned throughput, TTL deletions use leftover RUs not consumed by user requests — no extra billing, but they compete for capacity. For serverless, TTL deletes are charged at the same RU rate as explicit delete operations.

Question

Which consistency levels increase read cost?

Answer

Strong and Bounded Staleness — both cost 2× RU for reads. Session, Consistent Prefix, and Eventual cost 1× (standard). Choosing a weaker consistency level saves money on read-heavy workloads.

Knowledge Check

1. Marcus's staging environment is used 8 hours/day and completely idle for 16 hours. It is currently provisioned at 5,000 RU/s (manual). What's the most cost-effective change?

2. A developer configures autoscale with max 50,000 RU/s. During off-peak hours, traffic drops to near zero. What throughput is billed?

3. Marcus wants to reduce storage costs for his session data that's only relevant for 24 hours. What should he configure?

Next up: DevOps — Infrastructure as Code with Bicep/ARM, deployment patterns, and CI/CD pipelines for Cosmos DB.