Workspace Settings: Your Fabric Foundation
Configure Spark, domains, OneLake, and Dataflows Gen2 settings to build a workspace that works the way your team needs.
What are workspace settings?
Think of a workspace like a shared office floor.
Before your team moves in, someone decides: How many desks? What software on each computer? Who has a key? What’s the Wi-Fi password?
A Fabric workspace is that shared floor — but for data. You configure how much compute power Spark gets, which departments own the workspace, where data lives in OneLake, and how Dataflows Gen2 behave. Get these settings right, and your team works smoothly. Get them wrong, and you’ll spend your days firefighting capacity issues and permission errors.
Spark workspace settings
Apache Spark is the distributed compute engine behind Fabric notebooks and Spark jobs. Workspace-level Spark settings control what happens before a single line of PySpark runs.
What you configure
| Setting | What It Controls | Why It Matters |
|---|---|---|
| Runtime version | Spark version, pre-installed libraries (Delta Lake, pandas, etc.) | Newer runtimes have performance improvements and security patches. Pinning a version prevents surprise breaking changes. |
| Spark environment | Custom library sets (PyPI, Conda, JARs) | Your team needs great_expectations or azure-storage-blob? Define them once in an environment, apply to all notebooks. |
| Default pool | Starter pool, custom pool, or workspace default | Controls how many nodes spin up and how quickly. Starter pools are shared and free; custom pools are dedicated. |
| High concurrency mode | Whether multiple users share a single Spark session | Saves capacity when many analysts run small queries. Bad for heavy ETL — one long job blocks everyone. |
| Automatic log publishing | Sends Spark logs to a lakehouse for analysis | Essential for troubleshooting failed jobs (covered in Domain 3). |
Scenario: Ibrahim’s capacity crisis
Ibrahim Al-Rashid is the Data Platform Lead at Nexus Financial Group, a 15,000-person financial services firm. His team of 12 data engineers shares one Fabric workspace on an F64 capacity.
On Monday morning, three engineers run heavy PySpark jobs simultaneously. Spark spins up three separate sessions, each requesting the default pool size. Capacity usage spikes to 95%, and everyone else’s notebooks time out.
Ibrahim’s fix: he enables high concurrency mode for the analytics team’s workspace (they run lightweight queries), but keeps it disabled for the ETL workspace (heavy jobs need dedicated resources). He also switches the analytics workspace to the starter pool (shared, auto-scaled) and gives the ETL workspace a custom pool with a 10-node minimum.
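The arithmetic behind Ibrahim’s Monday spike can be sketched in a few lines. The numbers below are illustrative assumptions (an 8-vCore node size, a 6-node default pool, and the commonly documented ratio of 2 Spark vCores per capacity unit) — check your SKU’s actual limits before sizing pools.

```python
# Illustrative capacity arithmetic for the scenario above.
# Node size, pool size, and the vCores-per-CU ratio are assumptions
# for this sketch -- verify them against your capacity SKU.

CAPACITY_UNITS = 64                      # F64 SKU
SPARK_VCORES = CAPACITY_UNITS * 2        # assumed ratio: 2 Spark vCores per CU
VCORES_PER_NODE = 8                      # assumed medium node size

def session_vcores(nodes: int) -> int:
    """vCores one Spark session requests for a given node count."""
    return nodes * VCORES_PER_NODE

# Three engineers, each spinning up a separate session on a 6-node default pool:
concurrent_sessions = 3
demand = concurrent_sessions * session_vcores(6)
print(f"Demand: {demand} vCores vs {SPARK_VCORES} base vCores")

# With high concurrency mode, lightweight users can share ONE session,
# so the same three users would only hold one pool's worth of vCores:
shared_demand = session_vcores(6)
print(f"Shared-session demand: {shared_demand} vCores")
```

Three separate sessions oversubscribe the base vCores of the capacity, which is exactly the 95% spike Ibrahim saw; a single shared session for lightweight users stays well inside it.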
Spark environments vs pools
| Feature | Spark Environment | Spark Pool |
|---|---|---|
| What it controls | Libraries and dependencies (Python packages, JARs, R libraries) | Compute resources (node count, size, timeout) |
| Scope | Attached to a workspace or individual notebook | Attached to a workspace — all Spark jobs use it |
| Example | Install pandas 2.2, great_expectations, custom ML library | Set min 4 / max 20 nodes, auto-scale, 30-min timeout |
| When to change | When your code needs new libraries or a different Spark version | When jobs are too slow (scale up) or too expensive (scale down) |
| Exam focus | Knowing which libraries come pre-installed vs which need an environment | Understanding capacity implications of pool sizing |
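A quick way to answer the “pre-installed vs needs an environment” question from inside a notebook is to probe for each package. This is a plain-Python sketch using only the standard library; the package names are examples, and note that a pip package name can differ from its import name (e.g., `azure-storage-blob` imports as `azure.storage.blob`).

```python
# Check whether the libraries your code needs are already importable in the
# current runtime, or must be added to a Spark environment first.
from importlib import util

def needs_environment(import_names: list[str]) -> list[str]:
    """Return the import names that are NOT available in the current runtime."""
    return [name for name in import_names if util.find_spec(name) is None]

# "json" ships with every Python runtime; "great_expectations" typically
# does not, so it would have to be defined in a Spark environment.
missing = needs_environment(["json", "great_expectations"])
print("Add to a Spark environment:", missing)
```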
Domain workspace settings
Domains are an organisational layer above workspaces. They let you group workspaces by business area — Finance, Marketing, Operations — and apply governance policies at scale.
| Domain Setting | Purpose |
|---|---|
| Domain assignment | Tag a workspace as belonging to “Finance” or “Supply Chain” |
| Override sensitivity labels | Domain admins can enforce a minimum label (e.g., all Finance workspaces must be at least “Confidential”) |
| Trusted workspace delegation | Allow certified workspaces in the domain to access data across OneLake without per-item permissions |
| Default domain | New workspaces auto-inherit this domain unless overridden |
Exam tip: Domains vs workspaces
The exam tests whether you understand the hierarchy: Tenant → Capacity → Domain → Workspace → Items. Domains are a governance boundary, not a security boundary. You still set permissions at the workspace and item level. Domains add consistent labelling, override policies, and admin delegation.
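The “governance boundary, not a security boundary” distinction can be made concrete with a toy model. Everything here (class shape, names, the one-line permission check) is hypothetical; the point is that the domain is just a tag on the workspace, while access is decided by workspace or item permissions.

```python
# Toy model of the hierarchy: Tenant -> Capacity -> Domain -> Workspace -> Items.
# The domain tags a workspace for governance; it grants no access by itself.
from dataclasses import dataclass, field

@dataclass
class Workspace:
    name: str
    domain: str                              # governance tag, NOT a security boundary
    members: set[str] = field(default_factory=set)
    items: list[str] = field(default_factory=list)

    def can_access(self, user: str) -> bool:
        # Access comes from workspace (or item) permissions only.
        return user in self.members

finance_ws = Workspace("GL-Reporting", domain="Finance",
                       members={"ibrahim"}, items=["ledger_lakehouse"])

print(finance_ws.can_access("ibrahim"))   # workspace member -> allowed
print(finance_ws.can_access("analyst"))   # same-domain colleague, no membership -> denied
```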
OneLake workspace settings
OneLake is Fabric’s unified storage layer — think of it as a single data lake that every workspace writes to. Workspace-level OneLake settings control caching, access, and data residency.
| Setting | What It Controls |
|---|---|
| OneLake data access | Whether items in this workspace can be accessed via OneLake APIs, Azure Storage APIs, or shortcuts from other workspaces |
| OneLake caching | Local SSD caching for frequently accessed Delta tables — speeds up reads but consumes capacity storage |
| OneLake folder structure | How items are organised in OneLake (lakehouse files/, tables/ structure) |
Scenario: Ibrahim locks down trading data
The trading desk at Nexus Financial processes real-time market data. Ibrahim disables OneLake data access for the trading workspace so that no external shortcuts, ADLS connections, or third-party tools can read the data. Only items within the workspace can access it.
For the marketing analytics workspace, he enables OneLake data access — marketing analysts use Power BI Desktop to connect directly to OneLake tables via the ADLS Gen2 endpoint.
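What the analysts actually paste into Power BI Desktop is an ADLS Gen2-style path to the OneLake endpoint. The workspace, lakehouse, and table names below are made up; the URI shape follows the commonly documented OneLake addressing scheme — verify it against your tenant.

```python
# Sketch of building the ABFSS URI an analyst would use to reach a Delta
# table through OneLake's ADLS Gen2 endpoint. All names are placeholders.

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"

def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """ABFSS URI for a Delta table in a lakehouse, via the OneLake endpoint."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table}"

uri = onelake_table_uri("MarketingAnalytics", "CampaignData", "web_sessions")
print(uri)
```

If Ibrahim had disabled OneLake data access on this workspace, the same URI would be refused — the setting, not the path, decides whether external tools can read the data.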
Dataflows Gen2 workspace settings
Dataflows Gen2 bring Power Query’s visual ETL into Fabric. Workspace settings control how they execute.
| Setting | Purpose |
|---|---|
| Compute engine | Choose between the standard engine and the enhanced compute engine (faster for large datasets) |
| Staging lakehouse | Dataflows Gen2 can stage intermediate data in a lakehouse for better performance. You set the default staging lakehouse per workspace. |
| Data destination defaults | Set default output destinations (lakehouse, warehouse, KQL database) |
| Refresh settings | Default refresh schedules and retry policies |
Why staging matters for performance
Without a staging lakehouse, Dataflows Gen2 load data directly from source to destination. For large datasets, this means the entire dataset sits in memory during transformation.
With a staging lakehouse, Dataflows Gen2 first write intermediate results to Delta tables, then transform from there. The enhanced compute engine can then process the staged data efficiently instead of pulling everything into the Power Query mashup engine’s memory.
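The memory trade-off can be illustrated in plain Python (this is a conceptual stand-in, not the actual Dataflows engine — a temp file plays the role of the staging Delta tables):

```python
# Staged processing: land rows in intermediate storage, then stream them back
# one at a time. Peak memory is a single row, not the whole dataset.
import os
import tempfile

source = (f"record-{i}" for i in range(100_000))   # pretend large source

# Write pass: source -> "staging" (stand-in for the staging lakehouse).
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".staged") as f:
    for row in source:
        f.write(row + "\n")
    staging_path = f.name

# Transform pass: stream from staging instead of materialising everything.
count = 0
with open(staging_path) as staged:
    for line in staged:
        _ = line.strip().upper()          # stand-in transformation
        count += 1

os.remove(staging_path)
print(f"Transformed {count} rows with constant per-row memory")
```

Without the staging step, the equivalent would be `[transform(r) for r in source]` — the whole dataset held in memory at once, which is what makes unstaged Dataflows Gen2 struggle on large inputs.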
Exam pattern: If a question mentions slow Dataflows Gen2 performance, look for “staging lakehouse” in the answer options.
Check your understanding

1. Ibrahim needs to ensure all workspaces owned by the Finance division have a minimum sensitivity label of “Confidential”. Where does he configure this?
2. A data engineer’s PySpark notebook is failing because it cannot import the `great_expectations` library. The library is not pre-installed in the Fabric runtime. What should the engineer configure?
3. Dataflows Gen2 in a workspace are running slowly on large datasets. The admin investigates and finds no staging lakehouse is configured. What will adding a staging lakehouse improve?
Next up: Version Control: Git in Fabric — connect your workspace to a Git repo and track every change.