Tables, Views & External Catalogs

Managed vs external tables

Simple explanation

Think of managed tables as company-owned furniture and external tables as rented furniture.

A managed table is like furniture the company bought. When you delete the table, Unity Catalog throws the furniture (data) away too — it owns everything.

An external table is like rented furniture. Unity Catalog knows where it is and who can use it, but when you delete the table registration, the actual data stays put — it belongs to someone else (your storage account).

Feature	Managed Table	External Table
Data location	UC managed storage	Your ADLS/S3 path
DROP behaviour	Deletes metadata AND data	Deletes metadata ONLY
Lifecycle owner	Unity Catalog	You (external storage)
Default format	Delta Lake	Any (Delta, Parquet, CSV, JSON)
Best for	Standard lakehouse tables	Shared data, legacy integration
VACUUM / OPTIMIZE	Fully supported	Supported for Delta external tables
Cost	Storage in UC-managed account	Storage in your account

-- Create a managed table (default)
CREATE TABLE prod_sales.curated.daily_revenue (
  sale_date DATE,
  region STRING,
  revenue DECIMAL(12,2)
)
COMMENT 'Daily revenue aggregated by region';

-- Create an external table pointing to existing data
CREATE TABLE prod_sales.raw.partner_transactions
  USING DELTA
  LOCATION 'abfss://raw-data@adlsaccount.dfs.core.windows.net/partner/transactions'
  COMMENT 'Partner transaction files — data owned by partner team';

Important: External table paths must be covered by a preconfigured external location in Unity Catalog, backed by a storage credential (typically an Access Connector with managed identity). Without this, the CREATE TABLE statement fails with an access denied error.

Exam decision tree: If you own the data and it’s part of your lakehouse → managed table. If the data is shared, pre-existing, or owned by another team/system → external table.

Views and materialized views

Standard views

A view is a saved SQL query. It doesn’t store data — it runs the query each time you access it:

CREATE VIEW prod_sales.reports.top_regions AS
SELECT region, SUM(revenue) AS total_revenue
FROM prod_sales.curated.daily_revenue
GROUP BY region
ORDER BY total_revenue DESC
LIMIT 10;

Materialized views

A materialized view stores the query result physically and refreshes it periodically:

CREATE MATERIALIZED VIEW prod_sales.reports.monthly_summary AS
SELECT
  DATE_TRUNC('month', sale_date) AS month,
  region,
  SUM(revenue) AS monthly_revenue
FROM prod_sales.curated.daily_revenue
GROUP BY 1, 2;

Feature	Standard View	Materialized View
Stores data?	No — runs query each time	Yes — stores precomputed results
Query speed	Depends on underlying tables	Fast (precomputed)
Data freshness	Always current	Stale until refreshed
Refresh	N/A	Automatic or manual
Cost	Only compute at query time	Storage + compute for refresh
Use case	Simple abstractions, security layers	Dashboards, repeated heavy queries
Supported in	All compute types	Lakeflow Spark Declarative Pipelines, SQL warehouses

Mei Lin uses materialized views at Freshmart for the store manager dashboard — it queries pre-aggregated monthly sales instead of scanning millions of raw transactions each time.

Exam tip: Materialized views in Declarative Pipelines

Materialized views are first-class objects in Lakeflow Spark Declarative Pipelines. You define them declaratively and the pipeline handles incremental refresh automatically. This is a key exam concept — if a question asks about “automatically refreshing precomputed results in a pipeline,” materialized views in Declarative Pipelines are the answer.

DDL operations on tables

The exam tests your knowledge of DDL (Data Definition Language) for both managed and external tables:

Common DDL operations

-- Add a column
ALTER TABLE prod_sales.curated.daily_revenue
  ADD COLUMN discount_pct DECIMAL(5,2);

-- Rename a column
ALTER TABLE prod_sales.curated.daily_revenue
  RENAME COLUMN discount_pct TO discount_percentage;

-- Drop a column
ALTER TABLE prod_sales.curated.daily_revenue
  DROP COLUMN discount_percentage;

-- Change column type (only widening allowed, e.g., INT to BIGINT)
ALTER TABLE prod_sales.curated.daily_revenue
  ALTER COLUMN revenue TYPE DECIMAL(15,2);

-- Add table comment
COMMENT ON TABLE prod_sales.curated.daily_revenue
  IS 'Daily revenue by region — refreshed nightly at 3 AM';

-- Drop a managed table (DELETES DATA)
DROP TABLE prod_sales.curated.daily_revenue;

-- Drop an external table (data files remain)
DROP TABLE prod_sales.raw.partner_transactions;

Key exam fact: Schema evolution in Delta Lake supports adding columns and widening types. Narrowing types (BIGINT to INT) or renaming columns requires enabling column mapping.

Schema evolution and column mapping

Delta Lake supports two levels of schema evolution:

Operation	Default Behaviour	With Column Mapping
Add column	✅ Supported	✅ Supported
Widen type (INT → BIGINT)	✅ Supported	✅ Supported
Rename column	❌ Not supported	✅ Supported
Drop column	❌ Not supported	✅ Supported
Reorder columns	❌ Not supported	✅ Supported

Enable column mapping:

ALTER TABLE my_table SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name');

Exam tip: If the question involves renaming or dropping columns, you need column mapping enabled. This is a common trap.

Foreign catalogs

A foreign catalog connects Unity Catalog to external database systems, making their tables visible in the three-level namespace:

-- Step 1: Create a connection to an external system
CREATE CONNECTION sqlserver_erp
  TYPE SQLSERVER
  OPTIONS (
    host 'erp-server.database.windows.net',
    port '1433',
    user SECRET('kv-scope', 'erp-user'),
    password SECRET('kv-scope', 'erp-password')
  );

-- Step 2: Create a foreign catalog using that connection
CREATE FOREIGN CATALOG erp_data
  USING CONNECTION sqlserver_erp;

Once created, Ravi can query the ERP system’s tables as if they were local:

-- Query a foreign catalog table like any other UC table
SELECT * FROM erp_data.dbo.customers WHERE region = 'APAC';

Supported connection types include SQL Server, MySQL, PostgreSQL, Snowflake, BigQuery, and others.

Exam scenario: “Ravi needs to join DataPulse’s lakehouse data with customer records in an on-premises SQL Server.” → Create a connection + foreign catalog.

AI/BI Genie

AI/BI Genie is a natural-language interface for data exploration. Business users type questions in plain English and Genie generates SQL queries against your lakehouse tables.

As a data engineer, your job is to configure Genie instructions — metadata that helps Genie understand your data:

Configuration	Purpose	Example
Table descriptions	Tell Genie what each table contains	”Daily revenue by region, refreshed nightly”
Column descriptions	Explain non-obvious columns	”region_code: ISO 3166-1 alpha-2 country code”
Sample questions	Teach Genie expected query patterns	”What was last month’s revenue in APAC?”
Trusted assets	Mark verified tables that Genie should prefer	Flag `prod_sales.reports.*` as trusted

-- Add descriptions that help Genie
COMMENT ON TABLE prod_sales.curated.daily_revenue
  IS 'Daily revenue aggregated by region. Updated nightly at 3 AM NZST. Use for revenue reporting queries.';

COMMENT ON COLUMN prod_sales.curated.daily_revenue.region
  IS 'Business region: APAC, EMEA, Americas';

Exam tip: Genie is about metadata, not coding

The exam won’t ask you to build Genie from scratch. It tests whether you understand that data engineers enable Genie by writing good descriptions, sample questions, and marking trusted assets. Think of it as documenting your data for AI consumption.

Question

What happens when you DROP a managed table vs. an external table?

Click or press Enter to reveal answer

Answer

Managed table: both metadata AND underlying data files are deleted. External table: only the metadata in Unity Catalog is removed — the data files in your storage location remain untouched.

Click to flip back

Question

What is a materialized view and where is it used?

Click or press Enter to reveal answer

Answer

A materialized view stores precomputed query results physically (unlike a standard view, which re-runs the query each time). Used in Lakeflow Spark Declarative Pipelines and SQL warehouses for dashboards and repeated heavy queries. Must be refreshed to stay current.

Click to flip back

Question

What is a foreign catalog in Unity Catalog?

Click or press Enter to reveal answer

Answer

A foreign catalog connects UC to an external database system (SQL Server, MySQL, PostgreSQL, Snowflake, etc.) via a CONNECTION object. External tables appear in the UC namespace and can be queried like local tables.

Click to flip back

Question

How do data engineers configure AI/BI Genie for data discovery?

Click or press Enter to reveal answer

Answer

Write descriptive COMMENT ON TABLE/COLUMN statements, provide sample questions, and mark trusted assets. Genie uses this metadata to generate accurate SQL from natural-language questions.

Click to flip back

Knowledge check

Knowledge Check

Dr. Sarah Okafor is migrating Athena Group's legacy data warehouse to the lakehouse. She wants to register existing Parquet files in ADLS Gen2 without moving them. If the team later removes the table from Unity Catalog, the files must remain intact. Which table type should she use?

Knowledge Check

Mei Lin wants to improve dashboard performance at Freshmart. The store manager dashboard runs a heavy aggregation query that scans 200 million rows. The same query runs 50 times per day with identical results (data refreshes nightly). What should she create?

Knowledge Check

Tomás needs NovaPay's fraud detection system to join lakehouse data with customer records stored in an Azure SQL Database. The SQL Database is managed by a separate team and cannot be migrated. How should Tomás make the SQL Database tables queryable in Unity Catalog?

Next up: Securing Unity Catalog: Who Gets What — privileges, table and column-level access control, and row-level security.