Workspace Settings: Your Fabric Foundation
Configure Spark, domains, OneLake, and Dataflows Gen2 settings to build a workspace that works the way your team needs.
What are workspace settings?
Think of a workspace like a shared office floor.
Before your team moves in, someone decides: How many desks? What software on each computer? Who has a key? What’s the Wi-Fi password?
A Fabric workspace is that shared floor — but for data. You configure how much compute power Spark gets, which departments own the workspace, where data lives in OneLake, and how Dataflows Gen2 behave. Get these settings right, and your team works smoothly. Get them wrong, and you’ll spend your days firefighting capacity issues and permission errors.
Spark workspace settings
Apache Spark is the distributed compute engine behind Fabric notebooks and Spark jobs. Workspace-level Spark settings control what happens before a single line of PySpark runs.
What you configure
| Setting | What It Controls | Why It Matters |
|---|---|---|
| Runtime version | Spark version, pre-installed libraries (Delta Lake, pandas, etc.) | Newer runtimes have performance improvements and security patches. Pinning a version prevents surprise breaking changes. |
| Spark environment | Custom library sets (PyPI, Conda, JARs) | Your team needs great_expectations or azure-storage-blob? Define them once in an environment, apply to all notebooks. |
| Default pool | Starter pool, custom pool, or workspace default | Controls how many nodes spin up and how quickly. Starter pools are shared and free; custom pools are dedicated. |
| High concurrency mode | Whether multiple users share a single Spark session | Saves capacity when many analysts run small queries. Bad for heavy ETL — one long job blocks everyone. |
| Automatic log publishing | Sends Spark logs to a lakehouse for analysis | Essential for troubleshooting failed jobs (covered in Domain 3). |
Scenario: Ibrahim’s capacity crisis
Ibrahim Al-Rashid is the Data Platform Lead at Nexus Financial Group, a 15,000-person financial services firm. His team of 12 data engineers shares one Fabric workspace on an F64 capacity.
On Monday morning, three engineers run heavy PySpark jobs simultaneously. Spark spins up three separate sessions, each requesting the default pool size. Capacity usage spikes to 95%, and everyone else’s notebooks time out.
Ibrahim’s fix: he enables high concurrency mode for the analytics team’s workspace (they run lightweight queries), but keeps it disabled for the ETL workspace (heavy jobs need dedicated resources). He also switches the analytics workspace to the starter pool (shared, auto-scaled) and gives the ETL workspace a custom pool with a 10-node minimum.
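The arithmetic behind Ibrahim’s Monday spike can be sketched in a few lines. The numbers below are illustrative assumptions (an 8-vCore node size, a 6-node default pool, and the commonly documented ratio of 2 Spark vCores per capacity unit) — check your SKU’s actual limits before sizing pools.

```python
# Illustrative capacity arithmetic for the scenario above.
# Node size, pool size, and the vCores-per-CU ratio are assumptions
# for this sketch -- verify them against your capacity SKU.

CAPACITY_UNITS = 64                      # F64 SKU
SPARK_VCORES = CAPACITY_UNITS * 2        # assumed ratio: 2 Spark vCores per CU
VCORES_PER_NODE = 8                      # assumed medium node size

def session_vcores(nodes: int) -> int:
    """vCores one Spark session requests for a given node count."""
    return nodes * VCORES_PER_NODE

# Three engineers, each spinning up a separate session on a 6-node default pool:
concurrent_sessions = 3
demand = concurrent_sessions * session_vcores(6)
print(f"Demand: {demand} vCores vs {SPARK_VCORES} base vCores")

# With high concurrency mode, lightweight users can share ONE session,
# so the same three users would only hold one pool's worth of vCores:
shared_demand = session_vcores(6)
print(f"Shared-session demand: {shared_demand} vCores")
```

Three separate sessions oversubscribe the base vCores of the capacity, which is exactly the 95% spike Ibrahim saw; a single shared session for lightweight users stays well inside it.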
Spark environments vs pools
| Feature | Spark Environment | Spark Pool |
|---|---|---|
| What it controls | Libraries and dependencies (Python packages, JARs, R libraries) | Compute resources (node count, size, timeout) |
| Scope | Attached to a workspace or individual notebook | Attached to a workspace — all Spark jobs use it |
| Example | Install pandas 2.2, great_expectations, custom ML library | Set min 4 / max 20 nodes, auto-scale, 30-min timeout |
| When to change | When your code needs new libraries or a different Spark version | When jobs are too slow (scale up) or too expensive (scale down) |
| Exam focus | Knowing which libraries come pre-installed vs which need an environment | Understanding capacity implications of pool sizing |
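A quick way to answer the “pre-installed vs needs an environment” question from inside a notebook is to probe for each package. This is a plain-Python sketch using only the standard library; the package names are examples, and note that a pip package name can differ from its import name (e.g., `azure-storage-blob` imports as `azure.storage.blob`).

```python
# Check whether the libraries your code needs are already importable in the
# current runtime, or must be added to a Spark environment first.
from importlib import util

def needs_environment(import_names: list[str]) -> list[str]:
    """Return the import names that are NOT available in the current runtime."""
    return [name for name in import_names if util.find_spec(name) is None]

# "json" ships with every Python runtime; "great_expectations" typically
# does not, so it would have to be defined in a Spark environment.
missing = needs_environment(["json", "great_expectations"])
print("Add to a Spark environment:", missing)
```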
Domain workspace settings
Domains are an organisational layer above workspaces. They let you group workspaces by business area — Finance, Marketing, Operations — and apply governance policies at scale.
| Domain Setting | Purpose |
|---|---|
| Domain assignment | Tag a workspace as belonging to “Finance” or “Supply Chain” |
| Override sensitivity labels | Domain admins can enforce a minimum label (e.g., all Finance workspaces must be at least “Confidential”) |
| Trusted workspace delegation | Allow certified workspaces in the domain to access data across OneLake without per-item permissions |
| Default domain | New workspaces auto-inherit this domain unless overridden |
Exam tip: Domains vs workspaces
The exam tests whether you understand the hierarchy: Tenant → Capacity → Domain → Workspace → Items. Domains are a governance boundary, not a security boundary. You still set permissions at the workspace and item level. Domains add consistent labelling, override policies, and admin delegation.
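The “governance boundary, not a security boundary” distinction can be made concrete with a toy model. Everything here (class shape, names, the one-line permission check) is hypothetical; the point is that the domain is just a tag on the workspace, while access is decided by workspace or item permissions.

```python
# Toy model of the hierarchy: Tenant -> Capacity -> Domain -> Workspace -> Items.
# The domain tags a workspace for governance; it grants no access by itself.
from dataclasses import dataclass, field

@dataclass
class Workspace:
    name: str
    domain: str                              # governance tag, NOT a security boundary
    members: set[str] = field(default_factory=set)
    items: list[str] = field(default_factory=list)

    def can_access(self, user: str) -> bool:
        # Access comes from workspace (or item) permissions only.
        return user in self.members

finance_ws = Workspace("GL-Reporting", domain="Finance",
                       members={"ibrahim"}, items=["ledger_lakehouse"])

print(finance_ws.can_access("ibrahim"))   # workspace member -> allowed
print(finance_ws.can_access("analyst"))   # same-domain colleague, no membership -> denied
```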
OneLake workspace settings
OneLake is Fabric’s unified storage layer — think of it as a single data lake that every workspace writes to. Workspace-level OneLake settings control caching, access, and data residency.
| Setting | What It Controls |
|---|---|
| OneLake data access | Whether items in this workspace can be accessed via OneLake APIs, Azure Storage APIs, or shortcuts from other workspaces |
| OneLake caching | Local SSD caching for frequently accessed Delta tables — speeds up reads but consumes capacity storage |
| OneLake folder structure | How items are organised in OneLake (lakehouse files/, tables/ structure) |
Scenario: Ibrahim locks down trading data
The trading desk at Nexus Financial processes real-time market data. Ibrahim disables OneLake data access for the trading workspace so that no external shortcuts, ADLS connections, or third-party tools can read the data. Only items within the workspace can access it.
For the marketing analytics workspace, he enables OneLake data access — marketing analysts use Power BI Desktop to connect directly to OneLake tables via the ADLS Gen2 endpoint.
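What the analysts actually paste into Power BI Desktop is an ADLS Gen2-style path to the OneLake endpoint. The workspace, lakehouse, and table names below are made up; the URI shape follows the commonly documented OneLake addressing scheme — verify it against your tenant.

```python
# Sketch of building the ABFSS URI an analyst would use to reach a Delta
# table through OneLake's ADLS Gen2 endpoint. All names are placeholders.

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"

def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """ABFSS URI for a Delta table in a lakehouse, via the OneLake endpoint."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table}"

uri = onelake_table_uri("MarketingAnalytics", "CampaignData", "web_sessions")
print(uri)
```

If Ibrahim had disabled OneLake data access on this workspace, the same URI would be refused — the setting, not the path, decides whether external tools can read the data.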
Dataflows Gen2 workspace settings
Dataflows Gen2 bring Power Query’s visual ETL into Fabric. Workspace settings control how they execute.
| Setting | Purpose |
|---|---|
| Compute engine | Choose between the standard engine and the enhanced compute engine (faster for large datasets) |
| Staging lakehouse | Dataflows Gen2 can stage intermediate data in a lakehouse for better performance. You set the default staging lakehouse per workspace. |
| Data destination defaults | Set default output destinations (lakehouse, warehouse, KQL database) |
| Refresh settings | Default refresh schedules and retry policies |
Why staging matters for performance
Without a staging lakehouse, Dataflows Gen2 load data directly from source to destination. For large datasets, this means the entire dataset sits in memory during transformation.
With a staging lakehouse, Dataflows Gen2 first write intermediate results to Delta tables, then transform from there. The enhanced compute engine can then process the staged data efficiently instead of pulling everything into the Power Query mashup engine’s memory.
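The memory trade-off can be illustrated in plain Python (this is a conceptual stand-in, not the actual Dataflows engine — a temp file plays the role of the staging Delta tables):

```python
# Staged processing: land rows in intermediate storage, then stream them back
# one at a time. Peak memory is a single row, not the whole dataset.
import os
import tempfile

source = (f"record-{i}" for i in range(100_000))   # pretend large source

# Write pass: source -> "staging" (stand-in for the staging lakehouse).
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".staged") as f:
    for row in source:
        f.write(row + "\n")
    staging_path = f.name

# Transform pass: stream from staging instead of materialising everything.
count = 0
with open(staging_path) as staged:
    for line in staged:
        _ = line.strip().upper()          # stand-in transformation
        count += 1

os.remove(staging_path)
print(f"Transformed {count} rows with constant per-row memory")
```

Without the staging step, the equivalent would be `[transform(r) for r in source]` — the whole dataset held in memory at once, which is what makes unstaged Dataflows Gen2 struggle on large inputs.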
Exam pattern: If a question mentions slow Dataflows Gen2 performance, look for “staging lakehouse” in the answer options.
Check your understanding

1. Ibrahim needs to ensure all workspaces owned by the Finance division have a minimum sensitivity label of “Confidential”. Where does he configure this?
2. A data engineer’s PySpark notebook is failing because it cannot import the `great_expectations` library. The library is not pre-installed in the Fabric runtime. What should the engineer configure?
3. Dataflows Gen2 in a workspace are running slowly on large datasets. The admin investigates and finds no staging lakehouse is configured. What will adding a staging lakehouse improve?
Next up: Version Control: Git in Fabric — connect your workspace to a Git repo and track every change.