Choosing the Right Data Store
Lakehouse, warehouse, or Eventhouse? Learn the decision framework for picking the right Fabric storage item for every scenario: an exam favourite.
Which data store should I use?
Think of it like choosing transport.
Need to haul containers across the ocean? Use a cargo ship (lakehouse): it handles any type of cargo, in bulk, and you can sort it when it arrives. Need to deliver precisely packaged goods to a retail store? Use a delivery truck (warehouse): everything is labelled, sorted, and ready to shelve. Need to stream live sports to millions of viewers? Use a satellite broadcast (Eventhouse): optimised for real-time, high-speed data.
The exam tests your ability to match the right transport to the right cargo. Get comfortable with the decision framework: it appears in nearly every scenario question.
The three data stores compared
| Feature | Lakehouse | Warehouse | Eventhouse |
|---|---|---|---|
| Query language | Spark (PySpark/Spark SQL) + T-SQL (read-only via SQL analytics endpoint) | T-SQL (full DML) | KQL (Kusto Query Language) |
| Write method | Spark notebooks, pipelines, Dataflows Gen2 | SQL INSERT, UPDATE, DELETE, MERGE, COPY INTO | Streaming ingestion, batch ingestion, connectors |
| Best data types | Semi-structured (JSON, CSV), raw files, Delta tables | Structured (relational tables, star schemas) | Time-series, events, logs, telemetry |
| Schema | Schema-on-read (flexible) with Delta enforcement | Schema-on-write (strict, defined upfront) | Schema-on-write (strict, optimised for append) |
| Stored procedures | Not available | Full T-SQL stored procedures | Not available (KQL functions instead) |
| Power BI connection | SQL analytics endpoint (auto-generated) | Default semantic model (auto-generated) | KQL queryset or DirectQuery |
| Real-time ingestion | Not designed for real-time | Not designed for real-time | Built for real-time: sub-second ingestion |
| Typical users | Data engineers, data scientists | BI professionals, SQL developers, analysts | IoT engineers, DevOps, security analysts |
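The DML distinction in the table above is the one the exam leans on hardest. A minimal T-SQL sketch of the difference (schema and table names are hypothetical):

```sql
-- Warehouse: full T-SQL DML is supported
UPDATE dbo.FactSales
SET Amount = Amount * 1.02
WHERE SaleDate = '2025-01-15';

MERGE dbo.DimCustomer AS tgt
USING stg.Customer AS src
    ON tgt.CustomerKey = src.CustomerKey
WHEN MATCHED THEN
    UPDATE SET tgt.City = src.City
WHEN NOT MATCHED THEN
    INSERT (CustomerKey, City) VALUES (src.CustomerKey, src.City);

-- Lakehouse SQL analytics endpoint: SELECT works fine...
SELECT TOP 10 * FROM dbo.FactSales;
-- ...but the UPDATE above would fail there, because the endpoint is read-only.
```

If a scenario mentions UPDATE, MERGE, or stored procedures, that requirement alone rules out the lakehouse's SQL analytics endpoint.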
Decision framework
Use this flowchart when answering exam questions:
Step 1: Is the data streaming or batch?
- Streaming (real-time events, IoT, logs) → Eventhouse
- Batch → go to Step 2
Step 2: Does the team primarily use SQL with DML needs?
- Yes (stored procedures, MERGE, UPDATE) → Warehouse
- No → go to Step 3
Step 3: Is the data semi-structured, unstructured, or does it need Spark?
- Yes (JSON, CSV, Parquet, Python/PySpark transformations) → Lakehouse
- No (structured, SQL-focused but no DML) → Lakehouse (the SQL analytics endpoint covers read-only SQL needs)
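As a sketch of the Step 3 outcome: semi-structured files landed in a lakehouse's Files area can be registered as a Delta table with Spark SQL in a notebook (the path and table name here are hypothetical):

```sql
-- Spark SQL in a Fabric notebook: register raw CSV files from the
-- lakehouse Files area as a managed Delta table (Bronze layer)
CREATE TABLE IF NOT EXISTS bronze_pos_sales
USING DELTA
AS SELECT * FROM csv.`Files/raw/pos_sales/`;

-- The table is then queryable (read-only) through the SQL analytics endpoint
SELECT COUNT(*) FROM bronze_pos_sales;
```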
Exam tip: The 'both' answer
Many exam scenarios have answers where you use both a lakehouse and a warehouse. This is often correct. A common pattern:
- Land raw data in a lakehouse (Bronze/Silver layers via Spark)
- Create business-ready tables in a warehouse (Gold layer with stored procedures)
- Connect Power BI to the warehouse (or create a custom semantic model)
Do not assume you must pick only one. Cross-database queries make multi-store architectures seamless.
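A sketch of what such a cross-database query can look like from the warehouse's SQL endpoint, using three-part naming (the item and table names are hypothetical):

```sql
-- Join a Gold-layer warehouse table to a Silver-layer lakehouse table
-- in a single T-SQL statement via three-part naming
SELECT g.ProductKey,
       g.MonthlyRevenue,
       s.EventCount
FROM   SalesWarehouse.dbo.GoldRevenue  AS g
JOIN   SalesLakehouse.dbo.SilverEvents AS s
  ON   g.ProductKey = s.ProductKey;
```

Because both items sit in OneLake, no data movement or linked-server setup is needed for this join.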
When each character picks what
| Character | Primary Data Store | Why |
|---|---|---|
| Anita (FreshCart) | Lakehouse + Warehouse | Lakehouse for raw POS data ingestion (CSV, 15M rows/day via Spark); Warehouse for Gold-layer star schema with stored procedures |
| James (Summit) | Warehouse | SQL-first team, stored procedures for client transforms, cross-database queries to access shared reference data |
| Dr. Sarah (Pacific Health) | Lakehouse | Semi-structured clinical data (HL7/FHIR JSON), PySpark transformations, sensitivity labels on raw data |
| Raj (Atlas Capital) | Warehouse + Eventhouse | Warehouse for financial reporting; Eventhouse for real-time market data feeds and trade monitoring |
Eventhouse: The real-time option
The Eventhouse is less heavily tested in DP-600 than lakehouses and warehouses, but you need to know when to choose it.
What makes Eventhouse different?
| Capability | Details |
|---|---|
| Ingestion speed | Sub-second ingestion of streaming data |
| Query language | KQL (Kusto Query Language), optimised for time-series queries |
| Data pattern | Append-optimised: data arrives constantly and is rarely updated |
| Integration | Connects to Event Hubs, IoT Hub, Change Data Capture |
| OneLake integration | Can expose data to OneLake via the OneLake availability setting; other Fabric items can then access it through shortcuts |
| Use cases | IoT telemetry, application logs, security events, real-time dashboards |
Scenario: Raj monitors real-time trades
Raj at Atlas Capital receives 50,000 trade events per second from the trading platform. He needs to:
- Detect anomalous trading patterns within seconds
- Query the last 24 hours of trades interactively
- Feed alerts into a real-time Power BI dashboard
This is a perfect Eventhouse scenario: streaming ingestion, time-series queries with KQL, and real-time dashboards. The warehouse handles the nightly P&L calculations; the Eventhouse handles the live monitoring.
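A KQL sketch of the interactive piece, assuming a hypothetical `Trades` table with `Timestamp`, `Symbol`, and `Price` columns:

```kusto
// Trades over the last 24 hours, bucketed per minute per symbol
Trades
| where Timestamp > ago(24h)
| summarize TradeCount = count(), AvgPrice = avg(Price)
          by Symbol, bin(Timestamp, 1m)
// Simple threshold flag; production anomaly detection could instead
// apply series_decompose_anomalies() over the per-symbol time series
| where TradeCount > 10000
```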
OneLake integration for Eventhouse and semantic models
The exam specifically tests OneLake integration for Eventhouse and semantic models (a bullet under "Get data"):
- Eventhouse → OneLake: Eventhouse data can be made available in OneLake using OneLake availability (a setting per database/table). This lets other Fabric items (lakehouses, warehouses, semantic models) access Eventhouse data via shortcuts.
- Semantic model → OneLake: Semantic models in Direct Lake mode read directly from Delta tables in OneLake. This is covered in detail in Domain 3.
Dr. Sarah at Pacific Health Network receives patient data in HL7 FHIR format (nested JSON documents). She needs to parse, flatten, and clean this data before building patient outcome reports. Which data store should she land the raw data in?
Raj at Atlas Capital has two workloads: (1) nightly financial reporting with complex stored procedures, and (2) real-time trade monitoring with 50,000 events per second. What is the best Fabric architecture?
Anita at FreshCart stores daily POS data in a lakehouse. Her finance team wants to run UPDATE statements to correct accounting entries. The SQL analytics endpoint does not support UPDATE. What is the simplest solution?
Video coming soon
Next up: Data Connections & OneLake Catalog. Discover data across your organisation and connect to external sources.