Mirroring: Real-Time Database Replication
Replicate operational databases into Fabric using mirroring: continuous CDC-based replication from Azure SQL, Cosmos DB, Snowflake, and more, with zero ETL code.
What is mirroring?
Think of a live TV broadcast of a sports match.
The match is happening at the stadium (your operational database). The TV broadcast (mirroring) shows everything happening in near real-time on your screen (Fabric). You don't need to go to the stadium; the broadcast brings the action to you, with just a few seconds of delay.
Mirroring in Fabric does exactly this for databases. It continuously replicates data from an operational database (Azure SQL, Cosmos DB, Snowflake, etc.) into OneLake as Delta tables. No pipelines to build, no scheduling to configure; it just stays in sync.
Supported mirroring sources
| Source | CDC Method | Key Consideration |
|---|---|---|
| Azure SQL Database | Change Data Capture (CDC) | Requires CDC to be enabled on source tables |
| Azure Cosmos DB | Cosmos DB change feed | Replicates all containers in a database; supports NoSQL API |
| Snowflake | Snowflake streams | Cross-cloud replication; egress costs may apply from Snowflake |
| Azure Database for PostgreSQL | Logical replication (pgoutput) | Requires wal_level = logical on the server |
| Azure Database for MySQL | Binary log (binlog) replication | Requires binlog_format = ROW |
| SQL Server (on-premises) | CDC or Fabric change feed (SQL Server 2025) | Supports SQL Server 2016-2025; requires on-prem data gateway |
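For the Azure SQL Database row above, CDC is enabled with the standard SQL Server system procedures `sys.sp_cdc_enable_db` (once per database) and `sys.sp_cdc_enable_table` (once per table). A minimal Python sketch that builds those statements; the schema and table names are illustrative, and you would execute the output with any SQL client of your choice:

```python
def cdc_enable_statements(schema: str, table: str) -> list[str]:
    """Build the T-SQL needed to turn on CDC for one source table.

    sys.sp_cdc_enable_db must run once per database before any
    table-level enablement; both procedures are standard SQL Server.
    """
    return [
        "EXEC sys.sp_cdc_enable_db;",
        (
            "EXEC sys.sp_cdc_enable_table "
            f"@source_schema = N'{schema}', "
            f"@source_name = N'{table}', "
            "@role_name = NULL;"
        ),
    ]

# Example: statements for a hypothetical production table
stmts = cdc_enable_statements("dbo", "ProductionOrders")
for s in stmts:
    print(s)
```

Run the database-level statement first; the table-level call fails if CDC is not yet enabled on the database.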
How mirroring works
The flow
Source Database (Azure SQL)
    ↓
CDC captures changes (inserts, updates, deletes)
    ↓
Fabric reads the CDC stream
    ↓
Changes applied to OneLake as Delta Lake tables
    ↓
Data accessible via SQL endpoint + Spark
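The apply step in this flow amounts to replaying an ordered stream of change events onto a keyed table. A toy Python sketch of that idea, using a dict in place of a Delta table; the event shape here is invented for illustration, not the actual CDC wire format:

```python
def apply_cdc(table: dict, events: list[dict]) -> dict:
    """Replay CDC events (insert/update/delete) onto a keyed table.

    `table` maps primary key -> row. Inserts and updates both
    upsert the row; deletes remove it. This mimics how mirroring
    applies source changes to the Delta replica in OneLake.
    """
    for ev in events:
        key = ev["key"]
        if ev["op"] == "delete":
            table.pop(key, None)
        else:  # "insert" or "update"
            table[key] = ev["row"]
    return table

orders = {1: {"status": "open"}}
events = [
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"status": "open"}},
    {"op": "delete", "key": 1},
]
orders = apply_cdc(orders, events)
print(orders)  # {2: {'status': 'open'}}
```

Because events are applied in commit order, the replica converges to the same state as the source table, which is why the mirror stays consistent without any ETL logic.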
What you configure
- Create a mirrored database in Fabric
- Connect to the source: provide the connection string and credentials
- Select tables: choose which tables to mirror (or all)
- Mirroring starts: an initial snapshot loads all data, then CDC keeps it in sync
What you get
| Feature | Detail |
|---|---|
| Delta tables in OneLake | Each source table becomes a Delta table, queryable via Spark and the SQL endpoint |
| Near real-time sync | Changes typically appear within minutes |
| Automatic schema sync | New columns in the source are automatically added to the mirror |
| Read-only | The mirror is a read-only replica; you cannot write back to the source |
| No ETL code | Zero pipelines, zero notebooks, zero scheduling; mirroring is fully managed |
Scenario: Carlos mirrors SAP data
Precision Manufacturing runs SAP on Azure SQL Database. Carlos's team used to maintain a nightly ETL pipeline with 14 activities to copy production data into the lakehouse. Pipeline failures were common, and data was always at least 12 hours stale.
Carlos replaces the entire pipeline with mirroring:
- Creates a mirrored database in Fabric, connected to the Azure SQL Database
- Selects the 8 production tables he needs
- Mirroring starts: the initial snapshot takes 20 minutes for 500M rows
- After that, changes appear in Fabric within 3-5 minutes
He retires 14 pipeline activities, saves 4 hours of maintenance per month, and production managers see data that's minutes old instead of 12 hours old.
Mirroring vs other ingestion methods
| Factor | Mirroring | Pipeline (Copy Activity) | OneLake Shortcut |
|---|---|---|---|
| Data copied? | Yes, replicated to OneLake | Yes, copied to target | No, reads from source |
| Latency | Minutes (near real-time CDC) | Depends on schedule (hourly/daily) | Real-time (reads source directly) |
| ETL code needed? | None (fully managed) | Yes (pipeline activities, expressions) | None |
| Supported sources | Databases only (SQL, Cosmos, Snowflake, etc.) | Any supported connector (150+) | Storage only (ADLS, S3, GCS, Fabric) |
| Offline access? | Yes, replica in OneLake | Yes, copied data | No, depends on source availability |
| Write to source? | No (read-only) | No (one-way copy) | No (read-only) |
| Maintenance | Low (Fabric manages replication) | High (monitor and fix pipeline failures) | Low (no moving parts) |
Exam tip: When to choose mirroring
Exam questions about mirroring typically describe a scenario where:
- The source is a relational database (not files)
- The requirement is near real-time or continuous replication
- The team wants to reduce pipeline maintenance
- The data needs to be available in OneLake (not just accessible via shortcut)
If the scenario describes a file-based source → shortcut. If it describes a database and wants zero-code, near real-time → mirroring. If it needs complex transformation during ingestion → pipeline.
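These heuristics can be sketched as a small decision helper; the function and its parameters are illustrative, not a Fabric API, and the branch order mirrors the rules above:

```python
def choose_ingestion(source_kind: str, near_real_time: bool,
                     needs_transformation: bool) -> str:
    """Pick an ingestion method using the exam heuristics.

    source_kind is 'files' or 'database'. File sources point to a
    shortcut, complex in-flight transformation points to a pipeline,
    and zero-code near-real-time database replication points to
    mirroring; a scheduled pipeline copy is the general fallback.
    """
    if source_kind == "files":
        return "shortcut"
    if needs_transformation:
        return "pipeline"
    if source_kind == "database" and near_real_time:
        return "mirroring"
    return "pipeline"

print(choose_ingestion("database", near_real_time=True,
                       needs_transformation=False))  # mirroring
```

Note that transformation is checked before latency: even a near-real-time database scenario points to a pipeline if data must be reshaped during ingestion, because mirroring replicates tables as-is.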
Mirroring considerations
| Consideration | Detail |
|---|---|
| Source requirements | CDC must be enabled on the source (varies by database type) |
| Initial load | The first sync loads all data; this can take minutes to hours depending on volume |
| Schema changes | New columns are automatically added; column removals may require re-sync |
| Cost | OneLake storage for the replica + source egress charges (especially for Snowflake) |
| Limits | Table count limits per mirrored database (check current documentation) |
| Monitoring | Use the Monitoring Hub to track replication lag and errors |
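To complement the Monitoring Hub, a freshness check can be scripted around commit timestamps. A minimal Python sketch, assuming you can obtain the latest source and mirror commit times from your own audit columns or table metadata; the 5-minute default matches the freshness target in the scenarios below:

```python
from datetime import datetime, timedelta, timezone

def replication_lag(last_source_commit: datetime,
                    last_mirror_commit: datetime) -> timedelta:
    """Lag = how far the mirror trails the source's latest commit."""
    return last_source_commit - last_mirror_commit

def is_stale(lag: timedelta, threshold_minutes: int = 5) -> bool:
    """Flag the mirror as stale once lag exceeds the threshold."""
    return lag > timedelta(minutes=threshold_minutes)

# Illustrative timestamps: the mirror is 8 minutes behind the source
source_ts = datetime(2024, 1, 15, 12, 10, tzinfo=timezone.utc)
mirror_ts = datetime(2024, 1, 15, 12, 2, tzinfo=timezone.utc)
lag = replication_lag(source_ts, mirror_ts)
print(is_stale(lag))  # True
```

A check like this can run on a schedule and raise an alert, which catches the silent failure mode where replication stops (for example, when CDC is disabled on the source) while the mirror keeps serving stale data.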
Carlos wants to replace his nightly ETL pipeline that copies data from Azure SQL Database to Fabric. He needs data freshness within 5 minutes and wants minimal maintenance. Which approach is best?
A data engineer creates a mirrored database for an Azure SQL source. The initial sync completes, but after a few hours, no new changes appear in Fabric. What is the most likely cause?
🎬 Video coming soon
Next up: PySpark Transformations: Code Your Pipeline, where you'll write PySpark to clean, shape, denormalize, and aggregate data at scale.