Lineage, Audit Logs & Delta Sharing
Track how data flows through your lakehouse, audit who accessed what, and securely share data across organisations — the governance capstone.
Why lineage and audit matter
Lineage is like a food supply chain label. Audit is like a security camera.
Lineage tells you where your data came from and where it went. If the daily revenue report shows wrong numbers, lineage lets you trace back: “This table was built from that table, which was loaded from that CSV file.” You can find the problem source quickly.
Audit logs record who did what and when. “Ravi queried the customer PII table at 3:15 PM.” If there’s a data breach investigation, audit logs are your evidence.
Delta Sharing is like a secure read-only portal. You can share specific tables with an external partner without giving them access to your entire lakehouse — and they don’t even need Databricks.
Data lineage
What lineage tracks
Unity Catalog automatically captures lineage when notebooks, jobs, or pipelines read from and write to tables:
| Lineage Element | What It Shows |
|---|---|
| Table-level lineage | Which tables feed into which tables |
| Column-level lineage | Which source columns map to which destination columns |
| Notebook/job lineage | Which notebook or job created/modified a table |
| Owner | Who owns the table (user or group) |
| History | When the table was created, last modified, version history |
| Dependencies | Upstream tables this table depends on |
Viewing lineage in Catalog Explorer
In Catalog Explorer, select any table and click the Lineage tab to see:
- Upstream — tables and columns that feed into this table
- Downstream — tables and dashboards that consume this table
- Notebooks/jobs — the code that created the relationship
When Mei Lin investigates a data quality issue in Freshmart’s daily revenue report, she traces lineage back through:
gold.daily_revenue ← silver.cleaned_transactions ← bronze.raw_pos_data
The issue is in the bronze layer — a partner changed their CSV format.
Querying lineage via system tables
```sql
-- View table lineage
SELECT * FROM system.access.table_lineage
WHERE target_table_full_name = 'prod_sales.curated.daily_revenue';

-- View column lineage
SELECT * FROM system.access.column_lineage
WHERE target_table_full_name = 'prod_sales.curated.daily_revenue';
```
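Mei Lin's multi-step trace can also be approximated in SQL. The following is a minimal sketch, assuming the lineage system table exposes `source_table_full_name` and `event_date` alongside `target_table_full_name`; the 30-day window is an arbitrary choice for illustration. It joins the lineage table to itself to walk two hops upstream from the report table.

```sql
-- Sketch: list the direct sources of the report table (upstream_1)
-- and the sources of those sources (upstream_2).
SELECT DISTINCT
  hop1.target_table_full_name AS report_table,
  hop1.source_table_full_name AS upstream_1,
  hop2.source_table_full_name AS upstream_2
FROM system.access.table_lineage AS hop1
LEFT JOIN system.access.table_lineage AS hop2
  ON hop2.target_table_full_name = hop1.source_table_full_name
 AND hop2.source_table_full_name IS NOT NULL
WHERE hop1.target_table_full_name = 'prod_sales.curated.daily_revenue'
  AND hop1.source_table_full_name IS NOT NULL
  -- Assumed window: only lineage events from the last 30 days
  AND hop1.event_date >= CURRENT_DATE() - INTERVAL 30 DAYS;
```

A `NULL` in `upstream_2` means no further upstream table was recorded for that source, which is usually where a file-based load begins.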
Exam tip: Lineage is automatic but not universal
Unity Catalog captures lineage automatically for:
- ✅ Spark SQL queries (CREATE TABLE AS SELECT, INSERT INTO, MERGE)
- ✅ DataFrame operations (read → transform → write)
- ✅ Lakeflow Spark Declarative Pipeline dependencies
- ✅ Notebook-driven ETL
Lineage is NOT captured for:
- ❌ External tools that bypass Spark (direct file access)
- ❌ Legacy Hive metastore tables (not registered in Unity Catalog)
If the exam asks “how to ensure lineage is tracked,” the answer is: use Unity Catalog tables and run transformations through Databricks compute.
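As an illustration of a transformation that should produce lineage, here is a small sketch. The source table `prod_sales.curated.cleaned_transactions` and the columns `transaction_date` and `amount` are hypothetical; the point is only that both tables are Unity Catalog tables and the statement runs on Databricks compute.

```sql
-- Sketch: a CTAS that reads one Unity Catalog table and writes another.
-- Table and column names other than daily_revenue are hypothetical.
CREATE OR REPLACE TABLE prod_sales.curated.daily_revenue AS
SELECT
  transaction_date,
  SUM(amount) AS revenue
FROM prod_sales.curated.cleaned_transactions
GROUP BY transaction_date;

-- Afterwards, the new relationship should be visible in the lineage table.
SELECT source_table_full_name, target_table_full_name, event_time
FROM system.access.table_lineage
WHERE target_table_full_name = 'prod_sales.curated.daily_revenue'
ORDER BY event_time DESC;
```

Lineage records can take some time to appear after the statement finishes, so an empty result immediately after the run does not necessarily mean lineage was missed.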
Audit logging
What gets logged
Databricks records every significant action as an audit event, which you can query through the `system.access.audit` system table:
| Event Type | Examples |
|---|---|
| Data access | SELECT queries, table reads |
| Data modification | INSERT, UPDATE, DELETE, MERGE |
| Permission changes | GRANT, REVOKE |
| Schema changes | CREATE TABLE, ALTER TABLE, DROP TABLE |
| Authentication | Login events, service principal access |
| Admin actions | Cluster creation, job scheduling |
Querying audit logs
```sql
-- Who accessed the customers table in the last 7 days?
SELECT
  event_time,
  user_identity.email AS user_email,
  action_name,
  request_params.full_name_arg AS table_name
FROM system.access.audit
WHERE request_params.full_name_arg = 'prod_sales.curated.customers'
  AND event_time > CURRENT_TIMESTAMP() - INTERVAL 7 DAYS
ORDER BY event_time DESC;
```
Dr. Sarah Okafor runs audit queries weekly at Athena Group to verify that only authorised users accessed sensitive financial tables.
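A weekly review like this usually needs a per-user summary and a view of permission changes as well. The sketch below assumes that Unity Catalog events carry `service_name = 'unityCatalog'`, that permission changes are logged with `action_name = 'updatePermissions'`, and that the `securable_full_name` and `changes` keys exist in `request_params`. Treat these values as assumptions to verify against the audit log reference for your workspace.

```sql
-- Sketch 1: per-user activity on a sensitive table over the last 30 days
SELECT
  user_identity.email AS user_email,
  action_name,
  COUNT(*)            AS event_count,
  MAX(event_time)     AS last_event
FROM system.access.audit
WHERE service_name = 'unityCatalog'          -- assumed service name
  AND request_params.full_name_arg = 'prod_sales.curated.customers'
  AND event_date >= CURRENT_DATE() - INTERVAL 30 DAYS
GROUP BY user_identity.email, action_name
ORDER BY event_count DESC;

-- Sketch 2: recent GRANT/REVOKE activity
SELECT
  event_time,
  user_identity.email                AS changed_by,
  request_params.securable_full_name AS securable,  -- assumed key
  request_params.changes             AS changes     -- assumed key
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name = 'updatePermissions'      -- assumed action name
  AND event_date >= CURRENT_DATE() - INTERVAL 30 DAYS
ORDER BY event_time DESC;
```

Comparing the user list from the first query against the approved access list is what turns the raw log into evidence for an auditor.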
Audit log retention and export
System tables retain audit events only for a limited, platform-defined retention period, so they are not a permanent archive. For long-term compliance:
- Export audit logs to a Delta table in your own storage for permanent retention
- Stream to Azure Monitor for real-time alerting on suspicious activity
- Integrate with SIEM tools for security operations
Exam tip: If the question mentions “long-term audit retention” or “compliance archive,” the answer involves exporting audit logs to your own managed storage — not relying on the system table defaults.
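One way to build such an archive is a scheduled job that copies new audit events into a Delta table you control. This is a minimal sketch, not a production design: the target `compliance.archive.audit_log` is a hypothetical table, the two-day lookback is an arbitrary choice, and it assumes each audit row has a unique `event_id`.

```sql
-- Sketch: create an empty archive table with the same columns as the
-- system table (hypothetical catalog and schema names).
CREATE TABLE IF NOT EXISTS compliance.archive.audit_log AS
SELECT * FROM system.access.audit WHERE 1 = 0;

-- Run on a schedule: copy recent events that are not yet archived.
-- Matching on event_id keeps reruns from inserting duplicates.
MERGE INTO compliance.archive.audit_log AS archive
USING (
  SELECT *
  FROM system.access.audit
  WHERE event_date >= CURRENT_DATE() - INTERVAL 2 DAYS
) AS recent
ON archive.event_id = recent.event_id
WHEN NOT MATCHED THEN INSERT *;
```

If the system table gains new columns later, `INSERT *` may fail until the archive table's schema is updated, so a real job would need to handle schema changes explicitly.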
Delta Sharing
What is Delta Sharing?
Delta Sharing is an open protocol for secure, live data sharing:
- Provider — the organisation sharing data (creates shares, adds tables, generates recipient tokens)
- Recipient — the organisation receiving data (connects using a credential file or Databricks-to-Databricks sharing)
- Share — a named collection of tables made available to recipients
Two sharing modes
| Feature | Open Sharing | Databricks-to-Databricks |
|---|---|---|
| Recipient needs Databricks? | No: any client that supports the protocol (Spark, pandas, Power BI) | Yes: a Databricks workspace with Unity Catalog |
| Authentication | Bearer token (credential file) | Unity Catalog identity |
| Governance | Read-only access to shared tables | Full UC governance (lineage, audit) |
| Best for | External partners, customers | Internal cross-workspace, Databricks partners |
| Live data? | Yes — reads current version | Yes — reads current version |
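The authentication difference in the table shows up when the recipient is created. As a rough sketch of the Databricks-to-Databricks case: the recipient name `athena_logistics_dbx` and the catalog name `shared_products` are hypothetical, and the placeholders in angle brackets must be replaced with values from the recipient's and provider's metastores.

```sql
-- Provider side: a Databricks-to-Databricks recipient is identified by the
-- sharing identifier of the recipient's Unity Catalog metastore,
-- not by a bearer token.
CREATE RECIPIENT IF NOT EXISTS athena_logistics_dbx
  USING ID '<sharing-identifier>'
  COMMENT 'Hypothetical recipient on a Databricks workspace';

-- Recipient side: mount the share as a read-only catalog.
CREATE CATALOG IF NOT EXISTS shared_products
  USING SHARE `<provider-name>`.partner_freshmart;
```

Omitting `USING ID` creates an open sharing recipient instead, which authenticates with a credential file.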
Setting up Delta Sharing
```sql
-- Step 1: Create a share
CREATE SHARE IF NOT EXISTS partner_freshmart
  COMMENT 'Product catalog shared with Freshmart suppliers';

-- Step 2: Add tables to the share
ALTER SHARE partner_freshmart
  ADD TABLE prod_sales.curated.product_catalog;

-- Step 3: Create a recipient
CREATE RECIPIENT IF NOT EXISTS freshmart_supplier_a
  COMMENT 'Supplier A needs product catalog for inventory planning';

-- Step 4: Grant the recipient access to the share
GRANT SELECT ON SHARE partner_freshmart TO RECIPIENT freshmart_supplier_a;
```
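After the four steps, it is worth confirming what the recipient can actually see. The statements below are a short verification sketch that reuses the share and recipient names from the steps above.

```sql
-- List every object currently included in the share
SHOW ALL IN SHARE partner_freshmart;

-- List which recipients hold privileges on the share
SHOW GRANTS ON SHARE partner_freshmart;

-- Inspect the recipient; for an open sharing recipient this is also
-- where the provider finds the activation details to send to the partner
DESCRIBE RECIPIENT freshmart_supplier_a;
```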
Secure sharing strategy
When designing a Delta Sharing strategy, consider:
| Decision | Recommendation |
|---|---|
| What to share | Only curated/gold tables — never raw data |
| Granularity | One share per partner or use case |
| Column filtering | Share views that exclude sensitive columns |
| Audit | Monitor share access via audit logs |
| Revocation | Remove recipient access immediately when partnership ends |
Tomás shares anonymised fraud pattern data with NovaPay’s banking partners via Delta Sharing. Partners get a read-only view of fraud trends without accessing NovaPay’s customer data.
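The column filtering, audit, and revocation rows can be sketched in SQL as follows. The view name `product_catalog_public` and its columns are hypothetical, and the `deltaSharing%` action-name prefix is an assumption to check against the audit log reference. View sharing may also have additional requirements depending on the sharing mode.

```sql
-- Column filtering: expose only non-sensitive columns through a view
-- (view and column names are hypothetical).
CREATE OR REPLACE VIEW prod_sales.curated.product_catalog_public AS
SELECT product_id, product_name, category, list_price
FROM prod_sales.curated.product_catalog;

ALTER SHARE partner_freshmart
  ADD VIEW prod_sales.curated.product_catalog_public;

-- Audit: recent Delta Sharing activity (assumed action-name prefix)
SELECT event_time, action_name, request_params
FROM system.access.audit
WHERE action_name LIKE 'deltaSharing%'
  AND event_date >= CURRENT_DATE() - INTERVAL 7 DAYS
ORDER BY event_time DESC;

-- Revocation: remove access when the partnership ends
REVOKE SELECT ON SHARE partner_freshmart FROM RECIPIENT freshmart_supplier_a;
DROP RECIPIENT IF EXISTS freshmart_supplier_a;
```

Revoking the grant stops access to the share; dropping the recipient additionally invalidates its credentials.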
Knowledge check
Mei Lin discovers incorrect revenue figures in Freshmart's executive dashboard. She needs to trace the data back to its source to find where the error was introduced. Which Unity Catalog feature should she use?
Dr. Sarah Okafor needs to share product inventory data with Athena Group's logistics partner. The partner uses Snowflake, not Databricks. The data must be live (not a copy) and read-only. Which approach should she use?
Tomás needs to prove to NovaPay's compliance auditor that no unauthorised users accessed the fraud_alerts table in the last 30 days. Which Unity Catalog feature provides this evidence?
Next up: Data Modeling: Ingestion Design — choosing ingestion tools, loading methods, table formats, and managed vs external tables.