Unity Catalog: The Three-Level Namespace
Unity Catalog organises every data asset in a three-level hierarchy: catalog > schema > object. Master the naming conventions, structure, and governance that underpin every other domain in DP-750.
What is Unity Catalog?
Unity Catalog is the filing system for your entire lakehouse.
Imagine a massive office building. Unity Catalog is the building directory. It tells you:
- Which floor (catalog) — e.g., “Sales Department,” “Finance Department”
- Which room (schema) — e.g., “Sales Raw Data,” “Sales Reports”
- Which file cabinet (table, view, volume) — the actual data
Without Unity Catalog, table metadata is scattered across workspaces, each with its own Hive metastore and no central directory. With it, every table, view, and file has a registered address that any authorised user can find.
The three-level namespace
Every data object in Unity Catalog has a fully qualified name:
catalog.schema.object
| Level | Purpose | Analogy | Example |
|---|---|---|---|
| Catalog | Top-level container — typically one per business domain or environment | Floor in a building | sales, finance, dev_sandbox |
| Schema (database) | Groups related objects within a catalog | Room on a floor | sales.raw, sales.curated, sales.reports |
| Object | The actual data — tables, views, volumes, functions | File cabinet in a room | sales.curated.daily_revenue |
-- Fully qualified reference
SELECT * FROM sales.curated.daily_revenue;
-- Set default catalog and schema to avoid typing the full path
USE CATALOG sales;
USE SCHEMA curated;
SELECT * FROM daily_revenue; -- now just the table name
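To confirm which defaults are active after USE CATALOG and USE SCHEMA, the built-in current_catalog() and current_schema() functions can be queried. The following is a minimal sketch that reuses the sales.curated names from the example above; the raw.orders table is a hypothetical name used only for illustration.

```sql
-- Show the session defaults that unqualified names resolve against
SELECT current_catalog() AS active_catalog,
       current_schema()  AS active_schema;

-- One-part name: resolved against the default catalog AND default schema
SELECT * FROM daily_revenue;

-- Two-part name: resolved against the default catalog only
-- (raw.orders is a hypothetical table in the sales catalog)
SELECT * FROM raw.orders;
```

Fully qualified three-part names are unaffected by the session defaults, which makes them the safer choice in scheduled jobs and shared notebooks.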
Naming conventions
The exam tests your ability to design naming conventions for different requirements:
By environment (isolation)
| Pattern | Example | Use Case |
|---|---|---|
| Separate catalogs per environment | dev_sales, staging_sales, prod_sales | Full data isolation between dev/staging/prod |
| Separate schemas per environment | sales.dev_raw, sales.staging_raw, sales.prod_raw | Shared catalog, environment isolation at schema level |
| Separate catalogs per team | data_engineering, data_science, analytics | Team-based isolation |
Dr. Sarah Okafor at Athena Group chooses separate catalogs per environment (dev, staging, prod) because her security team requires complete isolation — developers must never accidentally query production data.
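A catalog-per-environment layout only isolates production if the grants follow the same boundary. The sketch below assumes a hypothetical account group named `developers` and the dev_sales and prod_sales catalogs from the table above; it is one possible setup, not a prescribed one.

```sql
-- Assumption: `developers` is a hypothetical account-level group
-- Give developers read/write access to the development catalog only
GRANT USE CATALOG, USE SCHEMA, SELECT, MODIFY
ON CATALOG dev_sales TO `developers`;

-- No grant is issued on prod_sales. Without USE CATALOG on it,
-- members of `developers` cannot reach any schema or table inside.

-- Check what the group actually holds on production
SHOW GRANTS `developers` ON CATALOG prod_sales;
```

Users can still pick up access through other groups they belong to or through ownership, so the check is worth repeating for each relevant principal.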
By external sharing
When sharing data with external partners, create a dedicated sharing catalog:
-- Catalog specifically for data shared via Delta Sharing
CREATE CATALOG IF NOT EXISTS shared_external;
CREATE SCHEMA IF NOT EXISTS shared_external.partner_freshmart;
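Once the sharing catalog exists, its tables can be published through a Delta Sharing share. The following sketch assumes a table named weekly_orders already exists in the partner schema; that table, the share name freshmart_share, and the recipient name freshmart_recipient are all hypothetical.

```sql
-- Assumption: shared_external.partner_freshmart.weekly_orders already exists
CREATE SHARE IF NOT EXISTS freshmart_share
COMMENT 'Data shared with the Freshmart partner';

-- Publish one table from the dedicated sharing catalog
ALTER SHARE freshmart_share
ADD TABLE shared_external.partner_freshmart.weekly_orders;

-- Register the external party and let it read the share
CREATE RECIPIENT IF NOT EXISTS freshmart_recipient;
GRANT SELECT ON SHARE freshmart_share TO RECIPIENT freshmart_recipient;
```

Keeping shared tables in their own catalog makes it easy to audit exactly what leaves the organisation.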
Exam tip: Naming conventions are tested via scenarios. If the question mentions “isolation” or “prevent accidental access” — think separate catalogs. If it mentions “sharing” — think dedicated sharing catalog.
Real-world naming patterns
Common patterns seen in production:
| Pattern | Structure | Pros | Cons |
|---|---|---|---|
| env_domain | prod_sales.raw.orders | Clear environment + domain | Lots of catalogs |
| domain with env schemas | sales.prod_raw.orders | Fewer catalogs | Less isolation |
| domain with env catalogs | prod.sales.orders | Clean, hierarchical | Requires strict permissions |
Many enterprise teams use environment-first catalogs (prod_sales, dev_sales) because Unity Catalog privileges are inherited: a privilege granted on a catalog applies to every current and future schema and table inside it, so a single catalog-level grant controls access to a whole environment.
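The inheritance described above can be seen by comparing a catalog-level grant with a schema-level one. This is a sketch only; the groups `sales_analysts` and `finance_readers` are hypothetical, and prod_sales.aggregated is assumed to exist.

```sql
-- Catalog-level grant: inherited by every current and future
-- schema and table in prod_sales
GRANT USE CATALOG, USE SCHEMA, SELECT
ON CATALOG prod_sales TO `sales_analysts`;

-- Schema-level grant: the group can enter the catalog,
-- but can read only one schema inside it
GRANT USE CATALOG ON CATALOG prod_sales TO `finance_readers`;
GRANT USE SCHEMA, SELECT
ON SCHEMA prod_sales.aggregated TO `finance_readers`;

-- Inspect the privileges that affect the schema
SHOW GRANTS ON SCHEMA prod_sales.aggregated;
```

Granting at the highest level that matches the access requirement keeps the number of grants small; narrower grants are the exception for sensitive schemas.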
Creating catalogs
-- Create a catalog
CREATE CATALOG IF NOT EXISTS prod_sales
COMMENT 'Production sales data for all regions';
-- View all catalogs
SHOW CATALOGS;
-- Describe a catalog
DESCRIBE CATALOG prod_sales;
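A catalog is bound to a storage location, which is where its managed tables and managed volumes physically store data. By default, a catalog uses the metastore’s root storage if one is configured; if the metastore has no root storage, you must specify a managed location when you create the catalog. You can also set one explicitly to override the default: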
-- Catalog with custom storage
CREATE CATALOG prod_sales
MANAGED LOCATION 'abfss://sales-container@adlsaccount.dfs.core.windows.net/prod';
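The path given in MANAGED LOCATION has to fall inside an external location registered in Unity Catalog, and whoever creates the catalog needs the CREATE MANAGED STORAGE privilege on that external location. The sketch below shows that prerequisite under the assumption that a storage credential already exists; the names sales_storage_cred, sales_prod_location, and `platform_admins` are hypothetical.

```sql
-- Assumption: the storage credential sales_storage_cred already exists
CREATE EXTERNAL LOCATION IF NOT EXISTS sales_prod_location
URL 'abfss://sales-container@adlsaccount.dfs.core.windows.net/prod'
WITH (STORAGE CREDENTIAL sales_storage_cred)
COMMENT 'Backs the managed location of the prod_sales catalog';

-- Allow the (hypothetical) admin group to use it for managed storage
GRANT CREATE MANAGED STORAGE
ON EXTERNAL LOCATION sales_prod_location TO `platform_admins`;

-- After the catalog is created, inspect its properties
DESCRIBE CATALOG EXTENDED prod_sales;
```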
Creating schemas
-- Create a schema within a catalog
CREATE SCHEMA IF NOT EXISTS prod_sales.raw
COMMENT 'Raw ingested data before any transformation';
CREATE SCHEMA IF NOT EXISTS prod_sales.curated
COMMENT 'Cleaned, validated, and conformed data';
CREATE SCHEMA IF NOT EXISTS prod_sales.aggregated
COMMENT 'Business-level aggregates for reporting';
Schemas can also have a managed location that overrides the catalog’s default:
CREATE SCHEMA prod_sales.sensitive
MANAGED LOCATION 'abfss://secure-container@adlsaccount.dfs.core.windows.net/sensitive';
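To check the result, the schemas in a catalog can be listed and an individual schema inspected. A short sketch, assuming the prod_sales schemas above have been created:

```sql
-- List every schema in the catalog
SHOW SCHEMAS IN prod_sales;

-- Show the details of one schema, including its properties
DESCRIBE SCHEMA EXTENDED prod_sales.sensitive;
```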
Volumes: the file storage layer
Volumes are Unity Catalog’s way of managing files (not tables). Think of volumes as governed file directories:
| Volume Type | Description | Use Case |
|---|---|---|
| Managed volume | Files stored in Unity Catalog’s managed storage | Internal data files, staging area |
| External volume | Points to existing ADLS/S3 location | Landing zone for incoming data files |
-- Create a managed volume
CREATE VOLUME IF NOT EXISTS prod_sales.raw.landing_files
COMMENT 'Landing zone for CSV/JSON files from partners';
-- Create an external volume pointing to existing storage
CREATE EXTERNAL VOLUME prod_sales.raw.partner_uploads
LOCATION 'abfss://partner-landing@adlsaccount.dfs.core.windows.net/uploads';
Ravi uses volumes at DataPulse to manage the CSV files that clients upload before they’re ingested into Delta tables.
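Files in a volume are addressed by a path of the form /Volumes/<catalog>/<schema>/<volume>/<path>, so the three-level namespace carries over to files. The sketch below uses the two volumes created above and assumes that CSV files with a header row have already been uploaded; the group `ingestion_engineers` is hypothetical.

```sql
-- List the files currently in the managed volume
LIST '/Volumes/prod_sales/raw/landing_files/';

-- Preview CSV files in the external volume without loading them
SELECT *
FROM read_files(
  '/Volumes/prod_sales/raw/partner_uploads/',
  format => 'csv',
  header => true
)
LIMIT 10;

-- Volumes are securables too: grant read access to a hypothetical group
GRANT READ VOLUME
ON VOLUME prod_sales.raw.partner_uploads TO `ingestion_engineers`;
```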
Volumes vs. tables: when to use which
| Capability | Tables | Volumes |
|---|---|---|
| Structured data (rows and columns) | ✅ | ❌ |
| Files (CSV, JSON, images, PDFs) | ❌ | ✅ |
| Queryable with SQL | ✅ | Query files with read_files(); load them into a table with COPY INTO |
| Schema enforcement | ✅ (Delta) | ❌ (files are as-is) |
| Governed by Unity Catalog | ✅ | ✅ |
Exam tip: Volumes are for files; tables are for tabular data. If the question mentions “landing zone for raw files” or “file-based ingestion”, think volumes.
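File-based ingestion typically ends with loading the landed files into a Delta table. The following is a minimal sketch using COPY INTO, assuming CSV files with a header row in the partner_uploads volume; the target table prod_sales.raw.partner_orders is a hypothetical name.

```sql
-- Hypothetical target table; created empty so COPY INTO can
-- derive the columns from the files
CREATE TABLE IF NOT EXISTS prod_sales.raw.partner_orders;

-- Load new files from the volume into the Delta table
COPY INTO prod_sales.raw.partner_orders
FROM '/Volumes/prod_sales/raw/partner_uploads/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
```

COPY INTO skips files it has already loaded, so the statement can be rerun as new files arrive in the volume.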
Knowledge check
Dr. Sarah Okafor is designing Athena Group's Unity Catalog structure. She needs to ensure developers can never accidentally query or modify production data. Which naming strategy should she recommend?
Mei Lin needs to set up a landing zone at Freshmart where external suppliers upload CSV files before they're ingested into Delta tables. The files should be governed by Unity Catalog. What should she create?
Tomás is creating the data catalog structure for NovaPay's fraud detection system. He has three data layers: raw transactions, enriched data, and fraud alerts. Which Unity Catalog structure follows best practices?
Next up: Tables, Views & External Catalogs — managed vs external tables, views, materialized views, foreign catalogs, DDL, and AI/BI Genie.