Troubleshoot Streaming & Shortcuts
Identify and resolve Eventhouse, Eventstream, and OneLake shortcut errors — ingestion failures, processing lag, missing data, and connectivity issues.
Streaming and shortcut errors
Think of a live TV broadcast that stutters, freezes, or shows a blank screen.
Streaming errors in Fabric are similar: events stop arriving (Eventstream errors), queries against real-time data fail (Eventhouse errors), or data behind a shortcut becomes unreachable (shortcut errors). Each has different causes and fixes.
Eventstream errors
| Error Pattern | Cause | Resolution |
|---|---|---|
| Processing lag increasing | Events arriving faster than processing capacity | Scale up Eventstream capacity, simplify transformations, add partitions to Event Hub |
| Events not arriving | Source disconnected, Event Hub key expired, consumer group full | Check source connectivity, refresh credentials, verify consumer group isn't at max readers |
| Schema validation errors | Incoming events don't match expected schema | Update schema in Eventstream, add error handling for malformed events |
| Destination write failures | Target KQL database or lakehouse is unavailable or full | Check destination status, verify permissions, check capacity |
| Duplicate events | At-least-once delivery combined with source retries | Implement deduplication at the destination (MERGE or distinct on event ID) |
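The last row suggests deduplicating at the destination. A minimal sketch of that idea in Python, assuming each event carries a unique `event_id` field (the field name is illustrative; in a KQL or SQL destination the same effect comes from a MERGE or `distinct` keyed on the event ID):

```python
def deduplicate(events, seen_ids=None):
    """Drop events whose 'event_id' has already been processed.

    At-least-once delivery plus source retries means the same event
    can arrive more than once; tracking seen IDs makes the pipeline
    effectively exactly-once from the destination's point of view.
    """
    seen_ids = set() if seen_ids is None else seen_ids
    unique = []
    for event in events:
        if event["event_id"] not in seen_ids:
            seen_ids.add(event["event_id"])
            unique.append(event)
    return unique
```

Passing the same `seen_ids` set across batches extends the dedup window beyond a single batch, at the cost of keeping those IDs in memory.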
Eventhouse errors
| Error | Cause | Resolution |
|---|---|---|
| Ingestion failure | Data format mismatch, column mapping error | Check ingestion mapping, verify source schema matches target table |
| Query timeout | Query scanning too much data, missing materialized views | Add time filters, create materialized views for common queries |
| Table hot limit | Too many concurrent ingestions to one table | Spread ingestion across multiple tables or increase capacity |
| Retention policy conflict | Data deleted by retention before queries expect it | Extend retention period or archive data to lakehouse before deletion |
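For the query-timeout row, the fix is to bound the scan with a time filter. A small sketch that builds such a KQL query as a string, assuming a hypothetical table and timestamp column (names are illustrative, not from the source):

```python
def bounded_query(table, lookback_hours=1):
    """Build a KQL query that scans only a recent time window
    instead of the whole table, which is the usual fix for
    timeouts on large tables. Table/column names are examples."""
    return (
        f"{table}\n"
        f"| where ingestion_time() > ago({lookback_hours}h)\n"
        f"| summarize count() by bin(Timestamp, 5m)"
    )
```

For queries that still need long ranges, a materialized view over the same aggregation keeps the per-query scan small.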
OneLake shortcut errors
| Error | Cause | Resolution |
|---|---|---|
| Access denied | Source credentials expired or user lacks source permissions | Re-authenticate the shortcut connection, verify source-side permissions |
| Source unavailable | External storage (S3, ADLS, GCS) is down or unreachable | Check source service health, verify network connectivity |
| Schema change | Source Delta table schema changed (columns added/removed) | Refresh the shortcut metadata, verify downstream queries handle new schema |
| Performance degradation | Large cross-cloud reads (S3/GCS latency) | Enable query acceleration, or consider mirroring for frequently accessed data |
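Shortcut sources in another cloud can be transiently unreachable, so reads benefit from retry with backoff. A minimal sketch, assuming `fetch` is any callable that raises `ConnectionError` on a failed read (both names are illustrative):

```python
import time

def read_with_retry(fetch, attempts=4, base_delay=1.0):
    """Retry a shortcut read with exponential backoff, since
    cross-cloud sources (S3, GCS, ADLS) can fail transiently.
    Re-raises after the final attempt so real outages surface."""
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Backoff handles transient blips; a sustained outage still fails fast enough to show up in monitoring rather than hanging indefinitely.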
Scenario: Zoe troubleshoots a streaming gap
WaveMedia’s real-time dashboard shows a 10-minute gap in playback data. Zoe investigates:
- Eventstream health: Processing lag spiked to 8 minutes at 2:15 PM, then events stopped
- Source check: Event Hub shows events are still being produced (sender metrics normal)
- Consumer group: The Eventstream’s consumer group shows “connection closed” at 2:15 PM
- Root cause: The Event Hub access key was rotated at 2:15 PM as part of a scheduled security rotation
- Fix: Update the Eventstream’s Event Hub connection with the new key → events resume flowing
Lesson: Coordinate key rotations with downstream consumers. Better still, use a managed identity instead of shared access keys, so a rotation can't silently break the connection.
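Zoe's fix can be sketched as a "retry once with a fresh credential" pattern. A hedged sketch, assuming `send` raises `PermissionError` when the broker rejects the current key and `refresh_credential` fetches the rotated key (both callables are hypothetical stand-ins, not a real Fabric API):

```python
def send_with_refresh(send, refresh_credential):
    """On an auth rejection, refresh the credential once and retry,
    so a scheduled key rotation does not stop the stream for long.
    A persistent PermissionError after refresh still propagates."""
    try:
        return send()
    except PermissionError:
        refresh_credential()  # e.g. pull the rotated key from a vault
        return send()
```

This only papers over rotations; the managed-identity approach in the lesson above removes the key from the picture entirely.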
Knowledge check
- Zoe's Eventstream suddenly stops ingesting events. The Event Hub shows events are still being produced. What should she check first?
- A KQL query against an Eventhouse table returns 'query timeout' on a table with 2 billion rows. The query has no time filter. What is the best fix?
🎬 Video coming soon
Next up: Optimize Lakehouse Tables: Delta Tuning — use OPTIMIZE, VACUUM, Z-ordering, and V-ordering to make your Delta tables fast.