Mirroring: Real-Time Database Replication
Replicate operational databases into Fabric using mirroring: continuous CDC-based replication from Azure SQL, Cosmos DB, Snowflake, and more, with zero ETL code.
What is mirroring?
Think of a live TV broadcast of a sports match.
The match is happening at the stadium (your operational database). The TV broadcast (mirroring) shows everything happening in near real-time on your screen (Fabric). You don't need to go to the stadium; the broadcast brings the action to you, with just a few seconds of delay.
Mirroring in Fabric does exactly this for databases. It continuously replicates data from an operational database (Azure SQL, Cosmos DB, Snowflake, etc.) into OneLake as Delta tables. No pipelines to build, no scheduling to configure; it just stays in sync.
Supported mirroring sources
| Source | CDC Method | Key Consideration |
|---|---|---|
| Azure SQL Database | Change Data Capture (CDC) | Requires CDC to be enabled on source tables |
| Azure Cosmos DB | Cosmos DB change feed | Replicates all containers in a database; supports NoSQL API |
| Snowflake | Snowflake streams | Cross-cloud replication; egress costs may apply from Snowflake |
| Azure Database for PostgreSQL | Logical replication (pgoutput) | Requires wal_level = logical on the server |
| Azure Database for MySQL | Binary log (binlog) replication | Requires binlog_format = ROW |
| SQL Server (on-premises) | CDC or Fabric change feed (SQL Server 2025) | Supports SQL Server 2016-2025; requires on-prem data gateway |
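For the Azure SQL Database row above, CDC is enabled with the standard SQL Server system procedures `sys.sp_cdc_enable_db` (once per database) and `sys.sp_cdc_enable_table` (once per table). A minimal Python sketch that builds those statements; the schema and table names are illustrative, and you would execute the output with any SQL client of your choice:

```python
def cdc_enable_statements(schema: str, table: str) -> list[str]:
    """Build the T-SQL needed to turn on CDC for one source table.

    sys.sp_cdc_enable_db must run once per database before any
    table-level enablement; both procedures are standard SQL Server.
    """
    return [
        "EXEC sys.sp_cdc_enable_db;",
        (
            "EXEC sys.sp_cdc_enable_table "
            f"@source_schema = N'{schema}', "
            f"@source_name = N'{table}', "
            "@role_name = NULL;"
        ),
    ]

# Example: statements for a hypothetical production table
stmts = cdc_enable_statements("dbo", "ProductionOrders")
for s in stmts:
    print(s)
```

Run the database-level statement first; the table-level call fails if CDC is not yet enabled on the database.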
How mirroring works
The flow
Source Database (Azure SQL)
    ↓
CDC captures changes (inserts, updates, deletes)
    ↓
Fabric reads the CDC stream
    ↓
Changes applied to OneLake as Delta Lake tables
    ↓
Data accessible via SQL endpoint + Spark
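The apply step in this flow amounts to replaying an ordered stream of change events onto a keyed table. A toy Python sketch of that idea, using a dict in place of a Delta table; the event shape here is invented for illustration, not the actual CDC wire format:

```python
def apply_cdc(table: dict, events: list[dict]) -> dict:
    """Replay CDC events (insert/update/delete) onto a keyed table.

    `table` maps primary key -> row. Inserts and updates both
    upsert the row; deletes remove it. This mimics how mirroring
    applies source changes to the Delta replica in OneLake.
    """
    for ev in events:
        key = ev["key"]
        if ev["op"] == "delete":
            table.pop(key, None)
        else:  # "insert" or "update"
            table[key] = ev["row"]
    return table

orders = {1: {"status": "open"}}
events = [
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"status": "open"}},
    {"op": "delete", "key": 1},
]
orders = apply_cdc(orders, events)
print(orders)  # {2: {'status': 'open'}}
```

Because events are applied in commit order, the replica converges to the same state as the source table, which is why the mirror stays consistent without any ETL logic.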
What you configure
- Create a mirrored database in Fabric
- Connect to the source: provide the connection string and credentials
- Select tables: choose which tables to mirror (or all)
- Mirroring starts: an initial snapshot loads all data, then CDC keeps it in sync
What you get
| Feature | Detail |
|---|---|
| Delta tables in OneLake | Each source table becomes a Delta table, queryable via Spark and the SQL endpoint |
| Near real-time sync | Changes typically appear within minutes |
| Automatic schema sync | New columns in the source are automatically added to the mirror |
| Read-only | The mirror is a read-only replica; you cannot write back to the source |
| No ETL code | Zero pipelines, zero notebooks, zero scheduling; mirroring is fully managed |
Scenario: Carlos mirrors SAP data
Precision Manufacturing runs SAP on Azure SQL Database. Carlos's team used to maintain a nightly ETL pipeline with 14 activities to copy production data into the lakehouse. Pipeline failures were common, and data was always at least 12 hours stale.
Carlos replaces the entire pipeline with mirroring:
- Creates a mirrored database in Fabric, connected to the Azure SQL Database
- Selects the 8 production tables he needs
- Mirroring starts: the initial snapshot takes 20 minutes for 500M rows
- After that, changes appear in Fabric within 3-5 minutes
He retires 14 pipeline activities, saves 4 hours of maintenance per month, and production managers see data that's minutes old instead of 12 hours old.
Mirroring vs other ingestion methods
| Factor | Mirroring | Pipeline (Copy Activity) | OneLake Shortcut |
|---|---|---|---|
| Data copied? | Yes, replicated to OneLake | Yes, copied to target | No, reads from source |
| Latency | Minutes (near real-time CDC) | Depends on schedule (hourly/daily) | Real-time (reads source directly) |
| ETL code needed? | None (fully managed) | Yes (pipeline activities, expressions) | None |
| Supported sources | Databases only (SQL, Cosmos, Snowflake, etc.) | Any supported connector (150+) | Storage only (ADLS, S3, GCS, Fabric) |
| Offline access? | Yes, replica in OneLake | Yes, copied data | No, depends on source availability |
| Write to source? | No (read-only) | No (one-way copy) | No (read-only) |
| Maintenance | Low (Fabric manages replication) | High (monitor and fix pipeline failures) | Low (no moving parts) |
Exam tip: When to choose mirroring
Exam questions about mirroring typically describe a scenario where:
- The source is a relational database (not files)
- The requirement is near real-time or continuous replication
- The team wants to reduce pipeline maintenance
- The data needs to be available in OneLake (not just accessible via shortcut)
If the scenario describes a file-based source → shortcut. If it describes a database and wants zero-code, near real-time → mirroring. If it needs complex transformation during ingestion → pipeline.
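These heuristics can be sketched as a small decision helper; the function and its parameters are illustrative, not a Fabric API, and the branch order mirrors the rules above:

```python
def choose_ingestion(source_kind: str, near_real_time: bool,
                     needs_transformation: bool) -> str:
    """Pick an ingestion method using the exam heuristics.

    source_kind is 'files' or 'database'. File sources point to a
    shortcut, complex in-flight transformation points to a pipeline,
    and zero-code near-real-time database replication points to
    mirroring; a scheduled pipeline copy is the general fallback.
    """
    if source_kind == "files":
        return "shortcut"
    if needs_transformation:
        return "pipeline"
    if source_kind == "database" and near_real_time:
        return "mirroring"
    return "pipeline"

print(choose_ingestion("database", near_real_time=True,
                       needs_transformation=False))  # mirroring
```

Note that transformation is checked before latency: even a near-real-time database scenario points to a pipeline if data must be reshaped during ingestion, because mirroring replicates tables as-is.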
Mirroring considerations
| Consideration | Detail |
|---|---|
| Source requirements | CDC must be enabled on the source (varies by database type) |
| Initial load | The first sync loads all data; this can take minutes to hours depending on volume |
| Schema changes | New columns are automatically added; column removals may require re-sync |
| Cost | OneLake storage for the replica + source egress charges (especially for Snowflake) |
| Limits | Table count limits per mirrored database (check current documentation) |
| Monitoring | Use the Monitoring Hub to track replication lag and errors |
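To complement the Monitoring Hub, a freshness check can be scripted around commit timestamps. A minimal Python sketch, assuming you can obtain the latest source and mirror commit times from your own audit columns or table metadata; the 5-minute default matches the freshness target in the scenarios below:

```python
from datetime import datetime, timedelta, timezone

def replication_lag(last_source_commit: datetime,
                    last_mirror_commit: datetime) -> timedelta:
    """Lag = how far the mirror trails the source's latest commit."""
    return last_source_commit - last_mirror_commit

def is_stale(lag: timedelta, threshold_minutes: int = 5) -> bool:
    """Flag the mirror as stale once lag exceeds the threshold."""
    return lag > timedelta(minutes=threshold_minutes)

# Illustrative timestamps: the mirror is 8 minutes behind the source
source_ts = datetime(2024, 1, 15, 12, 10, tzinfo=timezone.utc)
mirror_ts = datetime(2024, 1, 15, 12, 2, tzinfo=timezone.utc)
lag = replication_lag(source_ts, mirror_ts)
print(is_stale(lag))  # True
```

A check like this can run on a schedule and raise an alert, which catches the silent failure mode where replication stops (for example, when CDC is disabled on the source) while the mirror keeps serving stale data.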
Carlos wants to replace his nightly ETL pipeline that copies data from Azure SQL Database to Fabric. He needs data freshness within 5 minutes and wants minimal maintenance. Which approach is best?
A data engineer creates a mirrored database for an Azure SQL source. The initial sync completes, but after a few hours, no new changes appear in Fabric. What is the most likely cause?
🎬 Video coming soon
Next up: PySpark Transformations: Code Your Pipeline, where you'll write PySpark to clean, shape, denormalize, and aggregate data at scale.