Optimize Streaming: Real-Time Performance
Tune Eventstreams, Eventhouses, and streaming queries for maximum throughput. Optimize KQL queries, manage retention, and scale real-time workloads.
Streaming optimization
Think of a highway during rush hour.
Traffic jams happen when too many cars enter too fast (ingestion overload), when lanes merge poorly (processing bottleneck), or when everyone exits at the same ramp (hot partition). Optimizing streaming is about widening the highway, smoothing the merges, and distributing exits.
Optimizing Eventstreams
| Technique | Impact | How |
|---|---|---|
| Add Event Hub partitions | More parallel consumers | Increase partition count on the source Event Hub (cannot decrease after creation) |
| Simplify transformations | Reduce processing overhead | Move complex transforms to downstream notebooks, keep Eventstream transforms light |
| Filter early | Less data to process and route | Drop unnecessary events at the Eventstream level before routing to destinations |
| Multiple destinations | Fan-out without duplication | One Eventstream → KQL DB + lakehouse + derived stream (single read from source) |
| Monitor lag | Catch bottlenecks early | If processing lag grows, the stream can't keep up → scale or simplify |
Optimizing Eventhouses
Ingestion optimization
| Technique | Effect |
|---|---|
| Batch ingestion | Combine small events into larger batches before ingestion (reduces overhead per event) |
| Streaming ingestion | For lowest latency → events available for query within seconds (higher resource cost) |
| Ingestion mapping | Define explicit column mappings to avoid schema inference overhead |
| Partitioning | Distribute data across partitions for parallel ingestion |
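Ingestion mapping and batching are both configured with KQL management commands. A minimal sketch, assuming JSON events; the mapping name, JSON paths, and batching limits here are illustrative, not from the source:

```kql
// Explicit JSON ingestion mapping: avoids per-batch schema inference.
// Column names match PlaybackEvents; the source JSON paths are assumed.
.create table PlaybackEvents ingestion json mapping "PlaybackMapping"
'[{"column":"Timestamp","path":"$.ts","datatype":"datetime"},{"column":"VideoId","path":"$.videoId","datatype":"string"},{"column":"EventType","path":"$.eventType","datatype":"string"},{"column":"WatchDuration","path":"$.watchSec","datatype":"real"}]'

// Batching policy: accept up to 30 seconds of extra latency in exchange
// for fewer, larger ingestion operations (less overhead per event).
.alter table PlaybackEvents policy ingestionbatching
'{"MaximumBatchingTimeSpan":"00:00:30","MaximumNumberOfItems":500,"MaximumRawDataSizeMB":1024}'
```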
Retention policies
// Set retention to 30 days on a table
.alter table PlaybackEvents policy retention
'{ "SoftDeletePeriod": "30.00:00:00" }'
Trade-off: Longer retention = more storage + slower full-table scans. Shorter retention = less data available for historical queries.
Strategy: Set hot data retention (30-90 days) on the Eventhouse. Archive older data to a lakehouse via an export pipeline for long-term storage.
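The archive step can be implemented with a continuous export. A hedged sketch, assuming an external table (here called `PlaybackArchive`, a name chosen for illustration) has already been created over lakehouse storage:

```kql
// Runs the query on a schedule (hourly here) and appends
// the new rows to the external table for long-term storage.
.create-or-alter continuous-export ArchivePlaybacks
    over (PlaybackEvents)
    to table PlaybackArchive
    with (intervalBetweenRuns = 1h)
<| PlaybackEvents
```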
Materialized views (pre-aggregation)
.create materialized-view HourlyPlaybacks on table PlaybackEvents
{
PlaybackEvents
| summarize ViewCount = count(), AvgDuration = avg(WatchDuration)
by bin(Timestamp, 1h), VideoId
}
Materialized views pre-compute aggregations incrementally. Dashboard queries that would scan billions of rows instead read pre-aggregated results → instant response.
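Queries read the view through the `materialized_view()` function, which returns the pre-aggregated results (plus any not-yet-materialized tail). For example, a top-videos query against the view defined above (the 24-hour window and top-10 cutoff are illustrative):

```kql
// Reads pre-aggregated hourly rows instead of scanning raw PlaybackEvents
materialized_view('HourlyPlaybacks')
| where Timestamp > ago(24h)
| summarize TotalViews = sum(ViewCount) by VideoId
| top 10 by TotalViews
```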
KQL query optimization
Time filters first
// Good: time filter first → engine prunes data immediately
PlaybackEvents
| where Timestamp > ago(1h) // Prune to last hour first
| where EventType == "video_play" // Then filter by type
| summarize count() by VideoTitle
// Bad: filter by type first → scans more data before time pruning
PlaybackEvents
| where EventType == "video_play"
| where Timestamp > ago(1h)
| summarize count() by VideoTitle
KQL's storage engine organizes data by ingestion time, so a leading time filter lets it skip entire data shards. Always put time filters first.
Query optimization patterns
| Pattern | Slow | Fast |
|---|---|---|
| Time filter position | After other filters | First filter in the query |
| Columns selected | project * (all columns) | project only needed columns |
| Repeated aggregations | Compute same aggregation in every query | Use materialized views for common patterns |
| String operations | contains (substring search) | has (word-boundary match → uses index) |
| Join direction | Large table \| join SmallTable | SmallTable \| join LargeTable (small on left) |
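The string-operator and join-direction patterns in query form (the search term "tutorial" and the DimVideos dimension table are illustrative assumptions):

```kql
// Slow: 'contains' does a substring scan and cannot use the term index
PlaybackEvents
| where Timestamp > ago(1h)
| where VideoTitle contains "tutorial"

// Fast: 'has' matches whole terms and is served by the term index
PlaybackEvents
| where Timestamp > ago(1h)
| where VideoTitle has "tutorial"

// Join direction: put the smaller table on the left side of the join
DimVideos
| join kind=inner (
    PlaybackEvents
    | where Timestamp > ago(1h)
  ) on VideoId
```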
Scenario: Zoe optimizes WaveMedia's dashboard
Zoe's real-time dashboard runs 4 KQL queries every 10 seconds. With 500M events in the Eventhouse, each query takes 3-5 seconds → too slow for a real-time experience.
Optimizations:
- Materialized views for the 3 most common aggregation patterns → queries drop from seconds to milliseconds
- Time filter first on the 4th query (it was filtering by VideoTitle first) → 4x faster
- Retention policy set to 7 days (the dashboard only needs recent data; historical data is archived to a lakehouse) → reduces scan range
- project only needed columns (was using project *) → 30% less data transfer
Total effect: all 4 queries complete in under 200ms. Dashboard feels instant.
End-to-end optimization strategy
| Layer | Optimize | Key Metric |
|---|---|---|
| Source | Event Hub partitions, producer batching | Events/second at source |
| Eventstream | Simple transforms, early filtering | Processing lag |
| Eventhouse | Ingestion mapping, retention, materialized views | Query latency |
| KQL queries | Time filters first, project needed columns, use has not contains | Query duration |
| Dashboard | Refresh interval, query count | End-user response time |
Zoe's Eventhouse dashboard queries take 5+ seconds on a table with 2 billion rows. The most common query aggregates hourly view counts. What single optimization would have the most impact?
A KQL query uses 'contains' to search for video titles. Switching to 'has' improves performance by 8x. Why?
🎉 Congratulations! You've completed all 26 modules of the DP-700 study guide. Head back to the DP-700 landing page to review your progress and start the practice questions.