
DP-700 Study Guide

Domain 1: Implement and Manage an Analytics Solution

  • Workspace Settings: Your Fabric Foundation
  • Version Control: Git in Fabric
  • Deployment Pipelines: Dev to Production
  • Access Controls: Who Gets In
  • Data Security: Control Who Sees What
  • Governance: Labels, Endorsement & Audit
  • Orchestration: Pick the Right Tool
  • Pipeline Patterns: Parameters & Expressions

Domain 2: Ingest and Transform Data

  • Delta Lake: The Heart of Fabric
  • Loading Patterns: Full, Incremental & Streaming
  • Dimensional Modeling: Prep for Analytics
  • Data Stores & Tools: Make the Right Choice
  • OneLake Shortcuts: Data Without Duplication
  • Mirroring: Real-Time Database Replication
  • PySpark Transformations: Code Your Pipeline
  • Transform Data with SQL & KQL
  • Eventstreams & Spark Streaming: Real-Time Ingestion
  • Real-Time Intelligence: KQL & Windowing

Domain 3: Monitor and Optimize an Analytics Solution

  • Monitoring & Alerts: Catch Problems Early
  • Troubleshoot Pipelines & Dataflows
  • Troubleshoot Notebooks & SQL
  • Troubleshoot Streaming & Shortcuts
  • Optimize Lakehouse Tables: Delta Tuning
  • Optimize Spark: Speed Up Your Code
  • Optimize Pipelines & Warehouses
  • Optimize Streaming: Real-Time Performance

Domain 1: Implement and Manage an Analytics Solution

Workspace Settings: Your Fabric Foundation

Configure Spark, domains, OneLake, and Dataflows Gen2 settings to build a workspace that works the way your team needs.

What are workspace settings?

☕ Simple explanation

Think of a workspace like a shared office floor.

Before your team moves in, someone decides: How many desks? What software on each computer? Who has a key? What’s the Wi-Fi password?

A Fabric workspace is that shared floor — but for data. You configure how much compute power Spark gets, which departments own the workspace, where data lives in OneLake, and how Dataflows Gen2 behave. Get these settings right, and your team works smoothly. Get them wrong, and you’ll spend your days firefighting capacity issues and permission errors.

A Microsoft Fabric workspace is a logical container for items — lakehouses, warehouses, notebooks, pipelines, dataflows, reports, and more. Workspace settings control the behaviour of compute engines, data storage, governance boundaries, and integration tools.

Four categories of workspace settings appear on the DP-700 exam: Spark settings (runtime, libraries, pools), domain settings (organisational grouping and governance), OneLake settings (storage, caching, BCDR), and Dataflows Gen2 settings (compute engine, staging lakehouse). Each affects how data engineers build, run, and manage analytics solutions.

Spark workspace settings

Apache Spark is the distributed compute engine behind Fabric notebooks and Spark jobs. Workspace-level Spark settings control what happens before a single line of PySpark runs.

What you configure

| Setting | What It Controls | Why It Matters |
| --- | --- | --- |
| Runtime version | Spark version, pre-installed libraries (Delta Lake, pandas, etc.) | Newer runtimes have performance improvements and security patches. Pinning a version prevents surprise breaking changes. |
| Spark environment | Custom library sets (PyPI, Conda, JARs) | Your team needs great_expectations or azure-storage-blob? Define them once in an environment, apply to all notebooks. |
| Default pool | Starter pool, custom pool, or workspace default | Controls how many nodes spin up and how quickly. Starter pools are shared and free; custom pools are dedicated. |
| High concurrency mode | Whether multiple users share a single Spark session | Saves capacity when many analysts run small queries. Bad for heavy ETL — one long job blocks everyone. |
| Automatic log publishing | Sends Spark logs to a lakehouse for analysis | Essential for troubleshooting failed jobs (covered in Domain 3). |
💡 Scenario: Ibrahim's capacity crisis

Ibrahim Al-Rashid is the Data Platform Lead at Nexus Financial Group, a 15,000-person financial services firm. His team of 12 data engineers shares one Fabric workspace on an F64 capacity.

On Monday morning, three engineers run heavy PySpark jobs simultaneously. Spark spins up three separate sessions, each requesting the default pool size. Capacity usage spikes to 95%, and everyone else’s notebooks time out.

Ibrahim’s fix: he enables high concurrency mode for the analytics team’s workspace (they run lightweight queries), but keeps it disabled for the ETL workspace (heavy jobs need dedicated resources). He also switches the analytics workspace to the starter pool (shared, auto-scaled) and gives the ETL workspace a custom pool with a 10-node minimum.
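Back-of-the-envelope arithmetic makes Ibrahim's fix concrete. The sketch below uses invented numbers and a made-up `capacity_units` helper, not Fabric's real billing model; it only shows why sharing one session saves capacity for lightweight workloads.

```python
# Illustrative capacity arithmetic (numbers and function are invented,
# not Fabric's real billing model). Dedicated sessions each claim a
# full pool's worth of resources; high concurrency mode shares one.

def capacity_units(sessions: int, units_per_session: int,
                   high_concurrency: bool) -> int:
    """Rough estimate of capacity consumed by concurrent Spark users."""
    if high_concurrency:
        # All users share a single Spark session's resources.
        return units_per_session
    # Each user spins up their own session.
    return sessions * units_per_session

# Three analysts running lightweight queries at the same time:
print(capacity_units(3, 20, high_concurrency=False))  # 60 units
print(capacity_units(3, 20, high_concurrency=True))   # 20 units
```

The same arithmetic explains why high concurrency is wrong for ETL: the saving comes from sharing one session, and a long-running heavy job monopolises that shared session for everyone.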

Spark environments vs pools

Environments control software; pools control hardware
| Feature | Spark Environment | Spark Pool |
| --- | --- | --- |
| What it controls | Libraries and dependencies (Python packages, JARs, R libraries) | Compute resources (node count, size, timeout) |
| Scope | Attached to a workspace or individual notebook | Attached to a workspace — all Spark jobs use it |
| Example | Install pandas 2.2, great_expectations, custom ML library | Set min 4 / max 20 nodes, auto-scale, 30-min timeout |
| When to change | When your code needs new libraries or a different Spark version | When jobs are too slow (scale up) or too expensive (scale down) |
| Exam focus | Knowing which libraries come pre-installed vs which need an environment | Understanding capacity implications of pool sizing |
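The software/hardware split can be sketched as two plain data structures. These classes and field names are invented for illustration; they are not the Fabric API, but they capture what each setting actually holds.

```python
# Conceptual sketch of the environment/pool split (illustrative classes,
# not the Fabric API): an environment names the software a notebook
# sees, a pool names the hardware it runs on.
from dataclasses import dataclass, field

@dataclass
class SparkEnvironment:              # software: runtime + libraries
    runtime_version: str
    libraries: list = field(default_factory=list)

@dataclass
class SparkPool:                     # hardware: nodes, scaling, timeout
    min_nodes: int
    max_nodes: int
    autoscale: bool = True
    timeout_minutes: int = 30

# Ibrahim's ETL workspace: heavy libraries, dedicated 10-node minimum.
etl_env = SparkEnvironment("1.3", ["great_expectations", "azure-storage-blob"])
etl_pool = SparkPool(min_nodes=10, max_nodes=20)

# Multiple environments can target the same pool:
ml_env = SparkEnvironment("1.3", ["scikit-learn"])
```

Changing `libraries` never touches compute cost; changing `min_nodes`/`max_nodes` never touches which packages a notebook can import. That separation is exactly what the exam tests.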

Domain workspace settings

Domains are an organisational layer above workspaces. They let you group workspaces by business area — Finance, Marketing, Operations — and apply governance policies at scale.

| Domain Setting | Purpose |
| --- | --- |
| Domain assignment | Tag a workspace as belonging to “Finance” or “Supply Chain” |
| Override sensitivity labels | Domain admins can enforce a minimum label (e.g., all Finance workspaces must be at least “Confidential”) |
| Trusted workspace delegation | Allow certified workspaces in the domain to access data across OneLake without per-item permissions |
| Default domain | New workspaces auto-inherit this domain unless overridden |
💡 Exam tip: Domains vs workspaces

The exam tests whether you understand the hierarchy: Tenant → Domain → Workspace → Items, with capacities supplying compute to workspaces independently of domain membership. Domains are a governance boundary, not a security boundary. You still set permissions at the workspace and item level. Domains add consistent labelling, override policies, and admin delegation.
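A minimal sketch of how a domain-level minimum label could override a workspace label. The `LABEL_RANK` ordering and `effective_label` function are invented for illustration; in Fabric this is enforced declaratively through domain settings, not user code.

```python
# Hypothetical model of a domain's "override sensitivity labels" policy.
# The label ordering below is an assumption for illustration.
LABEL_RANK = {
    "Public": 0,
    "General": 1,
    "Confidential": 2,
    "Highly Confidential": 3,
}

def effective_label(workspace_label: str, domain_minimum: str) -> str:
    """A workspace never ends up below its domain's minimum label."""
    if LABEL_RANK[workspace_label] < LABEL_RANK[domain_minimum]:
        return domain_minimum     # domain policy wins
    return workspace_label        # workspace already strict enough

# Finance domain enforces a "Confidential" floor:
print(effective_label("General", "Confidential"))             # Confidential
print(effective_label("Highly Confidential", "Confidential")) # Highly Confidential
```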

OneLake workspace settings

OneLake is Fabric’s unified storage layer — think of it as a single data lake that every workspace writes to. Workspace-level OneLake settings control caching, access, and data residency.

| Setting | What It Controls |
| --- | --- |
| OneLake data access | Whether items in this workspace can be accessed via OneLake APIs, Azure Storage APIs, or shortcuts from other workspaces |
| OneLake caching | Local SSD caching for frequently accessed Delta tables — speeds up reads but consumes capacity storage |
| OneLake folder structure | How items are organised in OneLake (lakehouse files/, tables/ structure) |
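The caching trade-off can be sketched as a toy read path: a hit is a fast local read, but every cached table costs storage. `CachedReader` and its behaviour are a hypothetical illustration, not a OneLake API; Fabric manages this cache for you.

```python
# Toy illustration of the OneLake caching trade-off (invented class,
# not a real API): repeated reads of a hot table hit local SSD, at the
# cost of consuming capacity storage for the cached copy.

class CachedReader:
    def __init__(self):
        self.cache = {}                  # table name -> data (uses storage)

    def read(self, table, fetch):
        if table in self.cache:
            return self.cache[table], "cache hit (fast local read)"
        data = fetch(table)              # slower remote read
        self.cache[table] = data         # caching consumes storage
        return data, "cache miss (remote read)"

reader = CachedReader()
_, status1 = reader.read("sales", lambda t: [1, 2, 3])
_, status2 = reader.read("sales", lambda t: [1, 2, 3])
print(status1)  # cache miss (remote read)
print(status2)  # cache hit (fast local read)
```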
💡 Scenario: Ibrahim locks down trading data

The trading desk at Nexus Financial processes real-time market data. Ibrahim disables OneLake data access for the trading workspace so that no external shortcuts, ADLS connections, or third-party tools can read the data. Only items within the workspace can access it.

For the marketing analytics workspace, he enables OneLake data access — marketing analysts use Power BI Desktop to connect directly to OneLake tables via the ADLS Gen2 endpoint.
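Ibrahim's two configurations boil down to one access check. The `can_read` function and the dictionaries below are invented for illustration; the real control is a single workspace toggle, not code you write.

```python
# Toy model of the OneLake data access switch (names invented for
# illustration). When the flag is off, only items inside the same
# workspace may read its data; external shortcuts and tools are blocked.

def can_read(workspace: dict, caller_workspace: str) -> bool:
    same_workspace = caller_workspace == workspace["name"]
    return same_workspace or workspace["onelake_data_access"]

trading = {"name": "trading", "onelake_data_access": False}
marketing = {"name": "marketing", "onelake_data_access": True}

print(can_read(trading, "trading"))    # True  - items in-workspace still work
print(can_read(trading, "analytics"))  # False - external shortcut blocked
print(can_read(marketing, "powerbi"))  # True  - external tools allowed
```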

Dataflows Gen2 workspace settings

Dataflows Gen2 bring Power Query’s visual ETL into Fabric. Workspace settings control how they execute.

| Setting | Purpose |
| --- | --- |
| Compute engine | Choose between the standard engine and the enhanced compute engine (faster for large datasets) |
| Staging lakehouse | Dataflows Gen2 can stage intermediate data in a lakehouse for better performance. You set the default staging lakehouse per workspace. |
| Data destination defaults | Set default output destinations (lakehouse, warehouse, KQL database) |
| Refresh settings | Default refresh schedules and retry policies |
ℹ️ Why staging matters for performance

Without a staging lakehouse, Dataflows Gen2 load data directly from source to destination. For large datasets, this means the entire dataset sits in memory during transformation.

With a staging lakehouse, Dataflows Gen2 first write intermediate results to Delta tables, then transform from there. This enables the enhanced compute engine to process large datasets efficiently using the staged data, instead of pulling everything into the Power Query mashup engine’s memory.

Exam pattern: If a question mentions slow Dataflows Gen2 performance, look for “staging lakehouse” in the answer options.
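A rough sketch of the memory difference, assuming a trivial doubling transformation: without staging the whole dataset is materialised at once; with staging, rows flow through in chunks via an intermediate store (a simple list standing in for staged Delta tables). Both functions are invented for illustration and produce identical results; only the memory profile differs.

```python
# Simplified contrast of unstaged vs staged processing (illustrative
# only; the real mechanism is the enhanced compute engine reading
# staged Delta tables rather than a Python list).

def transform_without_staging(source_rows):
    data = list(source_rows)          # entire dataset held in memory
    return [r * 2 for r in data]

def transform_with_staging(source_rows, chunk_size=1000):
    staged = []                       # stand-in for staged Delta tables
    chunk = []
    for row in source_rows:
        chunk.append(row * 2)         # transform incrementally
        if len(chunk) == chunk_size:
            staged.extend(chunk)      # "write" the chunk to staging
            chunk = []
    staged.extend(chunk)              # flush the final partial chunk
    return staged

rows = range(2500)
assert transform_without_staging(rows) == transform_with_staging(rows)
```

At no point does the staged version hold more than one chunk of in-flight rows, which is why staging matters once datasets outgrow the mashup engine's memory.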


Question

What is the difference between a Spark environment and a Spark pool?

Answer

A Spark environment controls software (libraries, runtime version). A Spark pool controls hardware (node count, size, auto-scaling, timeout). You can have multiple environments on the same pool.

Question

What is a Fabric domain?

Answer

An organisational grouping layer above workspaces. Domains let you tag workspaces by business area (Finance, Marketing) and apply governance policies like minimum sensitivity labels and admin delegation.

Question

What does disabling OneLake data access do?

Answer

It prevents items in other workspaces from creating shortcuts to this workspace's data. It also blocks external tools (ADLS APIs, Azure Storage Explorer) from reading OneLake files. Only items within the workspace can access the data.

Question

What is a staging lakehouse in Dataflows Gen2?

Answer

An intermediate storage location where Dataflows Gen2 write temporary Delta tables during transformation. This enables query folding at scale and improves performance for large datasets.


Knowledge Check

Ibrahim needs to ensure all workspaces owned by the Finance division have a minimum sensitivity label of 'Confidential'. Where does he configure this?

Knowledge Check

A data engineer's PySpark notebook is failing because it cannot import the `great_expectations` library. The library is not pre-installed in the Fabric runtime. What should the engineer configure?

Knowledge Check

Dataflows Gen2 in a workspace are running slowly on large datasets. The admin investigates and finds no staging lakehouse is configured. What will adding a staging lakehouse improve?

Next up: Version Control: Git in Fabric — connect your workspace to a Git repo and track every change.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.