Choosing the Right Data Store
Lakehouse, warehouse, or Eventhouse? Learn the decision framework for picking the right Fabric storage item for every scenario: an exam favourite.
Which data store should I use?
Think of it like choosing transport.
Need to haul containers across the ocean? Use a cargo ship (lakehouse): it handles any type of cargo, in bulk, and you can sort it when it arrives. Need to deliver precisely packaged goods to a retail store? Use a delivery truck (warehouse): everything is labelled, sorted, and ready to shelve. Need to stream live sports to millions of viewers? Use a satellite broadcast (Eventhouse): optimised for real-time, high-speed data.
The exam tests your ability to match the right transport to the right cargo. Get comfortable with the decision framework: it appears in nearly every scenario question.
The three data stores compared
| Feature | Lakehouse | Warehouse | Eventhouse |
|---|---|---|---|
| Query language | Spark (PySpark/Spark SQL) + T-SQL (read-only via SQL analytics endpoint) | T-SQL (full DML) | KQL (Kusto Query Language) |
| Write method | Spark notebooks, pipelines, Dataflows Gen2 | SQL INSERT, UPDATE, DELETE, MERGE, COPY INTO | Streaming ingestion, batch ingestion, connectors |
| Best data types | Semi-structured (JSON, CSV), raw files, Delta tables | Structured (relational tables, star schemas) | Time-series, events, logs, telemetry |
| Schema | Schema-on-read (flexible) with Delta enforcement | Schema-on-write (strict, defined upfront) | Schema-on-write (strict, optimised for append) |
| Stored procedures | Not available | Full T-SQL stored procedures | Not available (KQL functions instead) |
| Power BI connection | SQL analytics endpoint (auto-generated) | Default semantic model (auto-generated) | KQL queryset or DirectQuery |
| Real-time ingestion | Not designed for real-time | Not designed for real-time | Built for real-time: sub-second ingestion |
| Typical users | Data engineers, data scientists | BI professionals, SQL developers, analysts | IoT engineers, DevOps, security analysts |
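The DML distinction in the table above is the one the exam leans on hardest. A minimal T-SQL sketch of the difference (schema and table names are hypothetical):

```sql
-- Warehouse: full T-SQL DML is supported
UPDATE dbo.FactSales
SET Amount = Amount * 1.02
WHERE SaleDate = '2025-01-15';

MERGE dbo.DimCustomer AS tgt
USING stg.Customer AS src
    ON tgt.CustomerKey = src.CustomerKey
WHEN MATCHED THEN
    UPDATE SET tgt.City = src.City
WHEN NOT MATCHED THEN
    INSERT (CustomerKey, City) VALUES (src.CustomerKey, src.City);

-- Lakehouse SQL analytics endpoint: SELECT works fine...
SELECT TOP 10 * FROM dbo.FactSales;
-- ...but the UPDATE above would fail there, because the endpoint is read-only.
```

If a scenario mentions UPDATE, MERGE, or stored procedures, that requirement alone rules out the lakehouse's SQL analytics endpoint.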
Decision framework
Use this flowchart when answering exam questions:
Step 1: Is the data streaming or batch?
- Streaming (real-time events, IoT, logs) → Eventhouse
- Batch → go to Step 2
Step 2: Does the team primarily use SQL with DML needs?
- Yes (stored procedures, MERGE, UPDATE) → Warehouse
- No → go to Step 3
Step 3: Is the data semi-structured, unstructured, or does it need Spark?
- Yes (JSON, CSV, Parquet, Python/PySpark transformations) → Lakehouse
- No (structured, SQL-focused but no DML) → Lakehouse (the SQL analytics endpoint covers read-only SQL needs)
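As a sketch of the Step 3 outcome: semi-structured files landed in a lakehouse's Files area can be registered as a Delta table with Spark SQL in a notebook (the path and table name here are hypothetical):

```sql
-- Spark SQL in a Fabric notebook: register raw CSV files from the
-- lakehouse Files area as a managed Delta table (Bronze layer)
CREATE TABLE IF NOT EXISTS bronze_pos_sales
USING DELTA
AS SELECT * FROM csv.`Files/raw/pos_sales/`;

-- The table is then queryable (read-only) through the SQL analytics endpoint
SELECT COUNT(*) FROM bronze_pos_sales;
```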
Exam tip: The 'both' answer
Many exam scenarios have answers where you use both a lakehouse and a warehouse. This is often correct. A common pattern:
- Land raw data in a lakehouse (Bronze/Silver layers via Spark)
- Create business-ready tables in a warehouse (Gold layer with stored procedures)
- Connect Power BI to the warehouse (or create a custom semantic model)
Do not assume you must pick only one. Cross-database queries make multi-store architectures seamless.
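A sketch of what such a cross-database query can look like from the warehouse's SQL endpoint, using three-part naming (the item and table names are hypothetical):

```sql
-- Join a Gold-layer warehouse table to a Silver-layer lakehouse table
-- in a single T-SQL statement via three-part naming
SELECT g.ProductKey,
       g.MonthlyRevenue,
       s.EventCount
FROM   SalesWarehouse.dbo.GoldRevenue  AS g
JOIN   SalesLakehouse.dbo.SilverEvents AS s
  ON   g.ProductKey = s.ProductKey;
```

Because both items sit in OneLake, no data movement or linked-server setup is needed for this join.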
When each character picks what
| Character | Primary Data Store | Why |
|---|---|---|
| Anita (FreshCart) | Lakehouse + Warehouse | Lakehouse for raw POS data ingestion (CSV, 15M rows/day via Spark); Warehouse for Gold-layer star schema with stored procedures |
| James (Summit) | Warehouse | SQL-first team, stored procedures for client transforms, cross-database queries to access shared reference data |
| Dr. Sarah (Pacific Health) | Lakehouse | Semi-structured clinical data (HL7/FHIR JSON), PySpark transformations, sensitivity labels on raw data |
| Raj (Atlas Capital) | Warehouse + Eventhouse | Warehouse for financial reporting; Eventhouse for real-time market data feeds and trade monitoring |
Eventhouse: The real-time option
The Eventhouse is less heavily tested in DP-600 than lakehouses and warehouses, but you need to know when to choose it.
What makes Eventhouse different?
| Capability | Details |
|---|---|
| Ingestion speed | Sub-second ingestion of streaming data |
| Query language | KQL (Kusto Query Language), optimised for time-series queries |
| Data pattern | Append-optimised: data arrives constantly and is rarely updated |
| Integration | Connects to Event Hubs, IoT Hub, Change Data Capture |
| OneLake integration | Can expose data to OneLake via the OneLake availability setting; other Fabric items can then access it through shortcuts |
| Use cases | IoT telemetry, application logs, security events, real-time dashboards |
Scenario: Raj monitors real-time trades
Raj at Atlas Capital receives 50,000 trade events per second from the trading platform. He needs to:
- Detect anomalous trading patterns within seconds
- Query the last 24 hours of trades interactively
- Feed alerts into a real-time Power BI dashboard
This is a perfect Eventhouse scenario: streaming ingestion, time-series queries with KQL, and real-time dashboards. The warehouse handles the nightly P&L calculations; the Eventhouse handles the live monitoring.
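A KQL sketch of the interactive piece, assuming a hypothetical `Trades` table with `Timestamp`, `Symbol`, and `Price` columns:

```kusto
// Trades over the last 24 hours, bucketed per minute per symbol
Trades
| where Timestamp > ago(24h)
| summarize TradeCount = count(), AvgPrice = avg(Price)
          by Symbol, bin(Timestamp, 1m)
// Simple threshold flag; production anomaly detection could instead
// apply series_decompose_anomalies() over the per-symbol time series
| where TradeCount > 10000
```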
OneLake integration for Eventhouse and semantic models
The exam specifically tests OneLake integration for Eventhouse and semantic models (a bullet under "Get data"):
- Eventhouse → OneLake: Eventhouse data can be made available in OneLake using OneLake availability (a setting per database/table). This lets other Fabric items (lakehouses, warehouses, semantic models) access Eventhouse data via shortcuts.
- Semantic model → OneLake: Semantic models in Direct Lake mode read directly from Delta tables in OneLake. This is covered in detail in Domain 3.
Dr. Sarah at Pacific Health Network receives patient data in HL7 FHIR format (nested JSON documents). She needs to parse, flatten, and clean this data before building patient outcome reports. Which data store should she land the raw data in?
Raj at Atlas Capital has two workloads: (1) nightly financial reporting with complex stored procedures, and (2) real-time trade monitoring with 50,000 events per second. What is the best Fabric architecture?
Anita at FreshCart stores daily POS data in a lakehouse. Her finance team wants to run UPDATE statements to correct accounting entries. The SQL analytics endpoint does not support UPDATE. What is the simplest solution?
Video coming soon
Next up: Data Connections & OneLake Catalog. Discover data across your organisation and connect to external sources.