Tables, Views & External Catalogs
Managed tables, external tables, views, materialized views, foreign catalogs, DDL operations, and AI/BI Genie — the objects that live inside your Unity Catalog namespace.
Managed vs external tables
Think of managed tables as company-owned furniture and external tables as rented furniture.
A managed table is like furniture the company bought. When you delete the table, Unity Catalog throws the furniture (data) away too — it owns everything.
An external table is like rented furniture. Unity Catalog knows where it is and who can use it, but when you delete the table registration, the actual data stays put — it belongs to someone else (your storage account).
| Feature | Managed Table | External Table |
|---|---|---|
| Data location | UC managed storage | Your ADLS/S3 path |
| DROP behaviour | Deletes metadata AND data | Deletes metadata ONLY |
| Lifecycle owner | Unity Catalog | You (external storage) |
| Default format | Delta Lake | Any (Delta, Parquet, CSV, JSON) |
| Best for | Standard lakehouse tables | Shared data, legacy integration |
| VACUUM / OPTIMIZE | Fully supported | Supported for Delta external tables |
| Cost | Storage in UC-managed account | Storage in your account |
-- Create a managed table (default)
CREATE TABLE prod_sales.curated.daily_revenue (
sale_date DATE,
region STRING,
revenue DECIMAL(12,2)
)
COMMENT 'Daily revenue aggregated by region';
-- Create an external table pointing to existing data
CREATE TABLE prod_sales.raw.partner_transactions
LOCATION 'abfss://raw-data@adlsaccount.dfs.core.windows.net/partner/transactions'
COMMENT 'Partner transaction files — data owned by partner team';
Exam decision tree: If you own the data and it’s part of your lakehouse → managed table. If the data is shared, pre-existing, or owned by another team/system → external table.
Views and materialized views
Standard views
A view is a saved SQL query. It doesn’t store data — it runs the query each time you access it:
CREATE VIEW prod_sales.reports.top_regions AS
SELECT region, SUM(revenue) AS total_revenue
FROM prod_sales.curated.daily_revenue
GROUP BY region
ORDER BY total_revenue DESC
LIMIT 10;
Materialized views
A materialized view stores the query result physically and refreshes it periodically:
CREATE MATERIALIZED VIEW prod_sales.reports.monthly_summary AS
SELECT
DATE_TRUNC('month', sale_date) AS month,
region,
SUM(revenue) AS monthly_revenue
FROM prod_sales.curated.daily_revenue
GROUP BY 1, 2;
| Feature | Standard View | Materialized View |
|---|---|---|
| Stores data? | No — runs query each time | Yes — stores precomputed results |
| Query speed | Depends on underlying tables | Fast (precomputed) |
| Data freshness | Always current | Stale until refreshed |
| Refresh | N/A | Automatic or manual |
| Cost | Only compute at query time | Storage + compute for refresh |
| Use case | Simple abstractions, security layers | Dashboards, repeated heavy queries |
| Supported in | All compute types | Lakeflow Spark Declarative Pipelines, SQL warehouses |
Mei Lin uses materialized views at Freshmart for the store manager dashboard — it queries pre-aggregated monthly sales instead of scanning millions of raw transactions each time.
Exam tip: Materialized views in Declarative Pipelines
Materialized views are first-class objects in Lakeflow Spark Declarative Pipelines. You define them declaratively and the pipeline handles incremental refresh automatically. This is a key exam concept — if a question asks about “automatically refreshing precomputed results in a pipeline,” materialized views in Declarative Pipelines are the answer.
DDL operations on tables
The exam tests your knowledge of DDL (Data Definition Language) for both managed and external tables:
Common DDL operations
-- Add a column
ALTER TABLE prod_sales.curated.daily_revenue
ADD COLUMN discount_pct DECIMAL(5,2);
-- Rename a column
ALTER TABLE prod_sales.curated.daily_revenue
RENAME COLUMN discount_pct TO discount_percentage;
-- Drop a column
ALTER TABLE prod_sales.curated.daily_revenue
DROP COLUMN discount_percentage;
-- Change column type (only widening allowed, e.g., INT to BIGINT)
ALTER TABLE prod_sales.curated.daily_revenue
ALTER COLUMN revenue TYPE DECIMAL(15,2);
-- Add table comment
COMMENT ON TABLE prod_sales.curated.daily_revenue
IS 'Daily revenue by region — refreshed nightly at 3 AM';
-- Drop a managed table (DELETES DATA)
DROP TABLE prod_sales.curated.daily_revenue;
-- Drop an external table (data files remain)
DROP TABLE prod_sales.raw.partner_transactions;
Key exam fact: Schema evolution in Delta Lake supports adding columns and widening types. Narrowing types (BIGINT to INT) or renaming columns requires enabling column mapping.
Schema evolution and column mapping
Delta Lake supports two levels of schema evolution:
| Operation | Default Behaviour | With Column Mapping |
|---|---|---|
| Add column | ✅ Supported | ✅ Supported |
| Widen type (INT → BIGINT) | ✅ Supported | ✅ Supported |
| Rename column | ❌ Not supported | ✅ Supported |
| Drop column | ❌ Not supported | ✅ Supported |
| Reorder columns | ❌ Not supported | ✅ Supported |
Enable column mapping:
ALTER TABLE my_table SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name');Exam tip: If the question involves renaming or dropping columns, you need column mapping enabled. This is a common trap.
Foreign catalogs
A foreign catalog connects Unity Catalog to external database systems, making their tables visible in the three-level namespace:
-- Step 1: Create a connection to an external system
CREATE CONNECTION sqlserver_erp
TYPE SQLSERVER
OPTIONS (
host 'erp-server.database.windows.net',
port '1433',
user SECRET('kv-scope', 'erp-user'),
password SECRET('kv-scope', 'erp-password')
);
-- Step 2: Create a foreign catalog using that connection
CREATE FOREIGN CATALOG erp_data
USING CONNECTION sqlserver_erp;
Once created, Ravi can query the ERP system’s tables as if they were local:
-- Query a foreign catalog table like any other UC table
SELECT * FROM erp_data.dbo.customers WHERE region = 'APAC';
Supported connection types include SQL Server, MySQL, PostgreSQL, Snowflake, BigQuery, and others.
Exam scenario: “Ravi needs to join DataPulse’s lakehouse data with customer records in an on-premises SQL Server.” → Create a connection + foreign catalog.
AI/BI Genie
AI/BI Genie is a natural-language interface for data exploration. Business users type questions in plain English and Genie generates SQL queries against your lakehouse tables.
As a data engineer, your job is to configure Genie instructions — metadata that helps Genie understand your data:
| Configuration | Purpose | Example |
|---|---|---|
| Table descriptions | Tell Genie what each table contains | ”Daily revenue by region, refreshed nightly” |
| Column descriptions | Explain non-obvious columns | ”region_code: ISO 3166-1 alpha-2 country code” |
| Sample questions | Teach Genie expected query patterns | ”What was last month’s revenue in APAC?” |
| Trusted assets | Mark verified tables that Genie should prefer | Flag prod_sales.reports.* as trusted |
-- Add descriptions that help Genie
COMMENT ON TABLE prod_sales.curated.daily_revenue
IS 'Daily revenue aggregated by region. Updated nightly at 3 AM NZST. Use for revenue reporting queries.';
COMMENT ON COLUMN prod_sales.curated.daily_revenue.region
IS 'Business region: APAC, EMEA, Americas';
Exam tip: Genie is about metadata, not coding
The exam won’t ask you to build Genie from scratch. It tests whether you understand that data engineers enable Genie by writing good descriptions, sample questions, and marking trusted assets. Think of it as documenting your data for AI consumption.
🎬 Video coming soon
Knowledge check
Dr. Sarah Okafor is migrating Athena Group's legacy data warehouse to the lakehouse. She wants to register existing Parquet files in ADLS Gen2 without moving them. If the team later removes the table from Unity Catalog, the files must remain intact. Which table type should she use?
Mei Lin wants to improve dashboard performance at Freshmart. The store manager dashboard runs a heavy aggregation query that scans 200 million rows. The same query runs 50 times per day with identical results (data refreshes nightly). What should she create?
Tomás needs NovaPay's fraud detection system to join lakehouse data with customer records stored in an Azure SQL Database. The SQL Database is managed by a separate team and cannot be migrated. How should Tomás make the SQL Database tables queryable in Unity Catalog?
Next up: Securing Unity Catalog: Who Gets What — privileges, table and column-level access control, and row-level security.