Batch vs Streaming: Two Speeds of Data
Some data arrives in scheduled batches. Other data flows in continuously. Understanding the difference is key to designing the right analytics solution.
Two ways data arrives
Batch data is like the morning newspaper. Streaming data is like a live news ticker.
The newspaper arrives once a day with yesterday’s news — that’s batch. You read it over breakfast. The news ticker scrolls constantly with updates as they happen — that’s streaming. You glance at it throughout the day.
Both deliver news. The difference is timing: batch comes in chunks on a schedule; streaming flows continuously in real time.
Batch processing
Batch processing collects data over time and processes it all at once on a schedule.
Priya’s FreshMart example: Sales data from 50 stores is collected throughout the day. At 2 AM, a pipeline extracts the day’s data, transforms it, and loads it into the data warehouse. By morning, yesterday’s dashboard is ready.
Characteristics:
- Data processed in scheduled chunks (hourly, daily, weekly)
- Higher latency (minutes to hours between event and insight)
- Can handle very large volumes efficiently
- Simpler to implement and debug
- Lower cost per unit of data at high volumes
Common batch scenarios:
- Nightly sales reports
- Monthly billing calculations
- Weekly inventory reconciliation
- Historical trend analysis
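The FreshMart nightly job can be sketched in a few lines of Python. This is a minimal illustration, not a real pipeline: the `SALES` records, the store IDs, the dates, and the `run_nightly_batch` function are all made up for the example, and the "load" step simply returns a dictionary instead of writing to a data warehouse.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical sale events collected from stores (illustrative data only).
SALES = [
    {"store": "S01", "amount": 12.50, "ts": datetime(2024, 5, 1, 9, 15)},
    {"store": "S02", "amount": 40.00, "ts": datetime(2024, 5, 1, 13, 40)},
    {"store": "S01", "amount": 7.25, "ts": datetime(2024, 5, 1, 18, 5)},
    {"store": "S01", "amount": 99.00, "ts": datetime(2024, 5, 2, 8, 0)},  # next day
]


def run_nightly_batch(sales, business_day):
    """Process one whole day of sales in a single pass."""
    totals = defaultdict(float)
    for sale in sales:
        if sale["ts"].date() == business_day:        # extract: only that day
            totals[sale["store"]] += sale["amount"]  # transform: total per store
    return dict(totals)  # load: a real job would write this to the warehouse


business_day = date(2024, 5, 1)
daily_totals = run_nightly_batch(SALES, business_day)
print(daily_totals)

# Latency: the job runs at 2 AM the next day, so the earliest sale of the
# day waits many hours before it shows up on a dashboard.
run_time = datetime(2024, 5, 2, 2, 0)
first_sale = min(s["ts"] for s in SALES if s["ts"].date() == business_day)
longest_wait = run_time - first_sale
print(longest_wait)
```

Nothing is computed until the scheduled run, which is why the 9:15 AM sale in this sample waits more than 16 hours to appear in a report.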
Stream processing
Stream processing handles data as it arrives, event by event or in small time windows, in real time or near real time.
Tom’s Pacific Freight example: GPS data from 200 delivery trucks streams in every 10 seconds. A real-time engine processes each update immediately — tracking live positions, detecting delays, and alerting dispatchers.
Characteristics:
- Data processed event-by-event as it arrives
- Very low latency (milliseconds to seconds)
- Handles continuous data flows (IoT, logs, social media)
- More complex to build and maintain
- Higher cost per event than batch
Common streaming scenarios:
- Live vehicle tracking
- Fraud detection (flag suspicious transactions instantly)
- Social media sentiment monitoring
- IoT sensor alerts (temperature exceeds threshold)
- Real-time dashboards (live website traffic)
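The Pacific Freight scenario can be sketched the same way. This is a simplified illustration under stated assumptions: `gps_feed`, `process_stream`, the `minutes_behind` field, and the 15-minute threshold are hypothetical, and a Python generator stands in for a real event source such as a message broker.

```python
TRUCKS = 200
UPDATE_INTERVAL_SECONDS = 10

# Volume implied by the example: every truck reports once per interval.
events_per_day = TRUCKS * (24 * 60 * 60 // UPDATE_INTERVAL_SECONDS)
print(events_per_day)

DELAY_THRESHOLD_MIN = 15  # hypothetical alerting rule


def gps_feed():
    """Stand-in for a live feed; yields one event at a time."""
    yield {"truck": "T-017", "minutes_behind": 2}
    yield {"truck": "T-101", "minutes_behind": 22}
    yield {"truck": "T-017", "minutes_behind": 18}


def process_stream(events):
    """Handle each event as soon as it arrives, without waiting for a batch."""
    latest = {}
    for event in events:
        latest[event["truck"]] = event  # keep the live state up to date
        if event["minutes_behind"] > DELAY_THRESHOLD_MIN:
            yield f"ALERT {event['truck']} is {event['minutes_behind']} min behind schedule"


alerts = list(process_stream(gps_feed()))
for alert in alerts:
    print(alert)
```

Each alert is produced as soon as the offending event is read, rather than after a scheduled run. The extra complexity of real streaming systems (out-of-order events, retries, duplicates) is deliberately left out of this sketch.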
| Feature | Batch Processing | Stream Processing |
|---|---|---|
| Data arrival | Collected over time, processed together | Continuous flow, processed immediately |
| Latency | Minutes to hours | Milliseconds to seconds |
| Data per run | Large chunks | Individual events or small time windows |
| Complexity | Simpler | More complex (event ordering, exactly-once processing) |
| Cost | Lower per unit of data | Higher per event |
| Azure services | Azure Data Factory, Microsoft Fabric pipelines, Azure Databricks | Azure Stream Analytics, Microsoft Fabric Real-Time Intelligence, Azure Event Hubs (ingestion) |
| Example | Nightly sales report | Live GPS tracking |
Lambda and Kappa architectures
Some systems need both batch and streaming. Two common patterns:
- Lambda architecture: Two parallel paths: a batch layer for historical accuracy and a speed layer for real-time results. A serving layer merges the two outputs for queries. More complex to run, but it covers both needs.
- Kappa architecture: A single streaming pipeline handles everything. This simplifies the architecture, but historical results can only be recomputed by replaying stored events through the same pipeline, so the events must be retained.
For DP-900, just know these exist — the exam tests concepts, not architecture patterns in detail.
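For readers who find code easier to picture than diagrams, the two patterns can be contrasted in a rough sketch. All names and numbers here (`batch_view`, `speed_view`, `query_total`, `event_log`, `kappa_totals`) are invented for illustration and do not correspond to any Azure API.

```python
# Lambda: two separately maintained views, merged when a query arrives.
batch_view = {"S01": 1200.0, "S02": 950.0}  # from the last nightly run (accurate, stale)
speed_view = {"S01": 35.5, "S03": 10.0}     # from events that arrived since that run


def query_total(store):
    """Serving layer: combine the batch result with the real-time delta."""
    return batch_view.get(store, 0.0) + speed_view.get(store, 0.0)


# Kappa: one code path. History is rebuilt by replaying the retained events.
event_log = [("S01", 10.0), ("S02", 5.0), ("S01", 2.5)]


def kappa_totals(log):
    """The same streaming logic serves live events and replayed history."""
    totals = {}
    for store, amount in log:
        totals[store] = totals.get(store, 0.0) + amount
    return totals


print(query_total("S01"))
print(kappa_totals(event_log))
```

The Lambda half needs two pipelines to stay in agreement; the Kappa half needs only one, but it depends on the event log being kept so it can be replayed.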
Exam tip: batch vs streaming recognition
The exam describes a scenario and asks which processing type is appropriate:
- “Report on yesterday’s sales” → Batch
- “Alert when a truck deviates from its route” → Streaming
- “Process data every night at 2 AM” → Batch
- “Show live website visitor count” → Streaming
- “Monthly billing calculation” → Batch
- “Detect credit card fraud in real time” → Streaming
Knowledge check
FreshMart wants to alert store managers immediately when a product's stock drops below the reorder threshold. Inventory sensors update every 30 seconds. Which processing approach?
Pacific Freight generates a monthly performance report comparing delivery times across all 200 drivers over the past 30 days. Which processing approach?
Next up: Real-Time Analytics on Azure — which Azure services handle streaming data?