Analytical Workloads: Synapse Link and Fabric Mirroring
Implement HTAP analytics on Cosmos DB data using Azure Synapse Link's auto-synced analytical store and Microsoft Fabric mirroring, with no ETL pipelines required.
The problem: analytics vs transactions
Running analytics on your live database is like doing a full inventory count while the store is open. Customers bump into counters, shelves get blocked, everything slows down.
Synapse Link creates a second copy of your data in a format optimised for analytics (columns instead of documents). This copy auto-syncs from your live data, so analysts query the copy while your app runs at full speed.
Amara's analytics challenge
💡 Amara at SensorFlow ingests 500M sensor events per day. Her data scientist Tomás wants to run daily aggregations: average temperature per device, anomaly detection, trend analysis. But running these queries against the transactional store would consume massive RU/s and slow down real-time ingestion.
Synapse Link is the answer: Tomás queries the analytical store via Synapse or Fabric while Amara's ingestion runs unaffected.
Enabling Synapse Link
⚠️ Important (2025): Azure Synapse Link for Cosmos DB is no longer supported for new projects. Microsoft recommends Azure Cosmos DB Mirroring for Microsoft Fabric instead, which is now GA and provides the same zero-ETL benefits. The exam may still test Synapse Link concepts, but know that Fabric Mirroring is the recommended replacement.
Step 1: Enable Synapse Link on the Cosmos DB account (one-time, irreversible):
az cosmosdb update --name sensorflow-cosmos \
    --resource-group rg-sensorflow \
    --enable-analytical-storage true
Step 2: Enable the analytical store on each container:
ContainerProperties props = new ContainerProperties("readings", "/deviceId")
{
    AnalyticalStoreTimeToLiveInSeconds = -1 // -1 = retain analytical data indefinitely
};
await database.CreateContainerAsync(props);
| TTL value | Behaviour |
|---|---|
| -1 | Analytical store enabled; data retained indefinitely |
| 0 or null | Analytical store disabled |
| N (positive) | Data retained in the analytical store for N seconds |
Exam tip: analytical store TTL is independent
The analytical store TTL is separate from the transactional store TTL. You can keep data in the analytical store longer than in the transactional store:
- Transactional TTL = 30 days (keep recent data for the app)
- Analytical TTL = -1 (keep all data forever for analytics)
When a document expires from the transactional store, it persists in the analytical store until its own TTL expires. This is a common exam scenario.
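The interplay between the two TTLs can be sketched in plain Python. This is purely illustrative: the `expiry` helper and the dates are made up for this sketch, not part of any Cosmos DB SDK, but the TTL semantics (-1 = keep forever, positive N = expire N seconds after the last write) follow the table above.

```python
from datetime import datetime, timedelta

def expiry(ingested_at, ttl_seconds):
    """Return when a document expires for a given TTL, or None if it never does.

    Follows Cosmos DB TTL semantics: -1 means retain indefinitely,
    a positive N means expire N seconds after ingestion.
    """
    if ttl_seconds == -1:
        return None  # retained indefinitely
    return ingested_at + timedelta(seconds=ttl_seconds)

monday = datetime(2025, 6, 2)          # hypothetical ingestion time
transactional_ttl = 30 * 24 * 3600     # 30 days, in seconds
analytical_ttl = -1                    # keep forever

# Document disappears from the transactional store 30 days after ingestion...
print(expiry(monday, transactional_ttl))
# ...but never expires from the analytical store.
print(expiry(monday, analytical_ttl))
```

With Amara's 7-day transactional TTL from the knowledge check below, the same logic applies: after a week the document is gone from the transactional store yet still queryable in the analytical store.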
How the analytical store works
Transactional store (row-oriented)            Analytical store (column-oriented)
+------------------------------+              +------------------------------+
| { id, deviceId, temp, ts }   |  auto-sync   | id | deviceId | temp | ts    |
| { id, deviceId, temp, ts }   | -----------> |----|----------|------|-------|
| { id, deviceId, temp, ts }   |   (~2 min)   | .. | ..       | ..   | ..    |
+------------------------------+              +------------------------------+
  App reads/writes                              Synapse / Fabric queries
  (RU budget)                                   (no RU impact)
- Auto-sync latency: Typically under 2 minutes; can be up to 5 minutes
- Schema: Auto-inferred from the transactional store (well-defined schema representation by default)
- Nested properties: Flattened into columns automatically
- No RU consumption: Analytical sync and queries don't consume transactional RU/s
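The flattening of nested properties can be illustrated with a small sketch. The `flatten` function below is not part of any SDK; it only mimics the idea of deriving dotted column names from nested JSON (the real inference engine additionally handles arrays, type conflicts, and schema representation rules):

```python
def flatten(doc, prefix=""):
    """Flatten a nested document into column-name -> value pairs,
    mimicking how nested JSON properties become dotted column names
    in a columnar store. Illustrative only; arrays and type conflicts
    are out of scope for this sketch."""
    cols = {}
    for key, value in doc.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            cols.update(flatten(value, name))  # recurse into nested objects
        else:
            cols[name] = value
    return cols

# A hypothetical sensor event with a nested reading object:
event = {"id": "r1", "deviceId": "d-42",
         "reading": {"temp": 81.2, "unit": "F"}}
print(flatten(event))
# {'id': 'r1', 'deviceId': 'd-42', 'reading.temp': 81.2, 'reading.unit': 'F'}
```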
Querying via Synapse
Tomás queries the analytical store using Synapse serverless SQL or Spark:
-- Synapse serverless SQL pool
SELECT deviceId,
       AVG(temperature) AS avg_temp,
       MAX(temperature) AS max_temp,
       COUNT(*) AS reading_count
FROM OPENROWSET(
    'CosmosDB',
    'Account=sensorflow-cosmos;Database=sensorflow;Key=...',
    readings
) WITH (
    deviceId VARCHAR(50),
    temperature FLOAT,
    _ts BIGINT
) AS readings
GROUP BY deviceId
HAVING AVG(temperature) > 80
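To make the aggregation concrete, here is a pure-Python equivalent of what that query computes: group readings by device, then keep only devices whose average temperature exceeds the threshold. The function name, sample rows, and threshold default are illustrative, not from any Cosmos DB or Synapse API:

```python
from collections import defaultdict

def hot_devices(rows, threshold=80.0):
    """Group readings by deviceId and keep devices whose average
    temperature exceeds the threshold -- the same GROUP BY / HAVING
    shape as the serverless SQL query above."""
    temps = defaultdict(list)
    for row in rows:
        temps[row["deviceId"]].append(row["temperature"])
    return {
        device: {
            "avg_temp": sum(t) / len(t),
            "max_temp": max(t),
            "reading_count": len(t),
        }
        for device, t in temps.items()
        if sum(t) / len(t) > threshold  # HAVING AVG(temperature) > threshold
    }

sample = [
    {"deviceId": "d-1", "temperature": 85.0},
    {"deviceId": "d-1", "temperature": 79.0},
    {"deviceId": "d-2", "temperature": 70.0},
]
print(hot_devices(sample))
# d-1 qualifies (avg 82.0 > 80); d-2 is filtered out
```

The key point for the exam is not the aggregation itself but where it runs: against the analytical store, so none of this work touches the transactional RU budget.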
Fabric mirroring (recommended for new projects)
Microsoft Fabric mirroring is the recommended replacement for Synapse Link. It is GA and continuously replicates Cosmos DB data into Fabric OneLake:
| Feature | Synapse Link | Fabric Mirroring |
|---|---|---|
| Data location | Cosmos DB analytical store | Fabric OneLake (Delta tables) |
| Query engines | Synapse SQL/Spark pools | Fabric Spark, SQL endpoint, Power BI |
| Latency | ~2 min auto-sync | Near real-time continuous sync |
| Schema handling | Auto-inferred columns | Delta tables with schema evolution |
| Cost model | Synapse compute + analytical store | Fabric capacity units |
| Setup complexity | Enable on account + container | Configure in Fabric workspace |
| Best for | Synapse-centric architectures | Fabric/Power BI-centric architectures |
Exam tip: Synapse Link is the DP-420 focus
The DP-420 exam focuses primarily on Synapse Link and the analytical store. Fabric mirroring may appear in "which tool to choose" questions, but deep configuration questions will be about Synapse Link. Know how to enable it, set TTL, and understand the auto-sync behaviour.
🎬 Video walkthrough
🎬 Video coming soon
Analytical Workloads (DP-420 Module 16)
~14 min
Knowledge Check
Tomás needs to run daily aggregation queries on 500M sensor events. Running these directly on the transactional store would consume 50,000+ RU/s. What's the best approach?
Amara sets transactional store TTL to 7 days and analytical store TTL to -1. A document is ingested on Monday. What happens after the following Monday?
Which is required before you can enable the analytical store on a container?
Next up: Data Movement, covering Azure Data Factory, Kafka connectors, and Spark connectors for moving data into and out of Cosmos DB.