Blob, Data Lake & Azure Files
Blob Storage, Azure Data Lake Storage, and Azure Files: choose the right unstructured storage service based on access patterns, performance needs, and cost constraints.
Choosing unstructured storage
Blob Storage is a filing cabinet. Data Lake is a warehouse. Azure Files is a shared network drive.
All three store unstructured data, but they serve different purposes: Blob for application data (images, documents, backups), Data Lake for big data analytics (Hadoop, Spark, Synapse), and Azure Files for lift-and-shift file shares (SMB/NFS) that replace on-prem file servers.
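That rule of thumb can be captured as a toy decision helper. This is an illustrative sketch only (the function name and parameters are made up for this example); real designs also weigh protocols, redundancy, and cost:

```python
def choose_storage_service(needs_smb_share: bool, needs_analytics: bool) -> str:
    """Toy helper mirroring the guidance above -- not an official API."""
    if needs_smb_share:
        return "Azure Files"      # lift-and-shift SMB/NFS file shares
    if needs_analytics:
        return "ADLS Gen2"        # hierarchical namespace + Spark/Synapse
    return "Blob Storage"         # general application data, media, backups

print(choose_storage_service(needs_smb_share=False, needs_analytics=True))
# prints "ADLS Gen2"
```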
Service comparison
| Factor | Blob Storage | Data Lake Storage Gen2 | Azure Files |
|---|---|---|---|
| Namespace | Flat (container/blob) | Hierarchical (directories/files) | Hierarchical (shares/directories/files) |
| Protocol | REST API, SDKs | REST API, ABFS driver (Hadoop) | SMB 3.0, NFS 4.1, REST API |
| Access tiers | Hot, Cool, Cold, Archive | Hot, Cool, Cold, Archive | Premium, Transaction Optimised, Hot, Cool |
| Analytics | Basic: needs external compute | Optimised: native Synapse/Spark/Databricks integration | Not designed for analytics |
| POSIX ACLs | No | Yes: fine-grained directory/file-level permissions | Yes (NFS shares) |
| Windows mapping | No: API access only | No: API access only | Yes: map as drive letter (SMB) |
| Best for | App data, media, backups, static websites | Big data analytics, data lake patterns | Lift-and-shift file shares, shared config |
Priya's storage architecture:
- Blob Storage: application documents, user uploads, backup archives
- ADLS Gen2: data lake for analytics, flowing raw data → curated data → reporting (medallion architecture)
- Azure Files: migrated 15 on-prem file shares (SMB), mapped as network drives for Windows users
Exam tip: ADLS Gen2 IS Blob Storage with hierarchical namespace
ADLS Gen2 is not a separate service: it's a storage account with the hierarchical namespace feature enabled. This means you get all Blob Storage features (tiers, lifecycle management, redundancy) PLUS directory-level operations and POSIX ACLs. If the scenario needs analytics AND Blob features, recommend ADLS Gen2.
Storage redundancy
| Option | Copies | Region Scope | Durability | Best For |
|---|---|---|---|---|
| LRS | 3 copies in one data centre | Single region, single zone | 11 nines (99.999999999%) | Dev/test, non-critical data |
| ZRS | 3 copies across 3 availability zones | Single region, three zones | 12 nines | Production β survives data centre failure |
| GRS | 6 copies: 3 local (LRS) + 3 in paired region (LRS) | Two regions | 16 nines | DR β survives regional disaster |
| GZRS | 6 copies: 3 across zones (ZRS) + 3 in paired region (LRS) | Two regions, primary zone-redundant | 16 nines | Maximum durability β zone + region protection |
| RA-GRS/RA-GZRS | Same as GRS/GZRS + read access to secondary | Two regions, secondary readable | 16 nines | Read offloading + DR readiness |
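The table above reduces to three questions: must the data survive a zone failure, must it survive a regional disaster, and do you need to read from the secondary? A minimal sketch of that decision (the function is hypothetical, but the option names are the real Azure redundancy SKUs):

```python
def pick_redundancy(zone_ha: bool, geo_dr: bool, read_secondary: bool) -> str:
    """Map availability requirements to a storage redundancy option."""
    if geo_dr:
        base = "GZRS" if zone_ha else "GRS"   # geo-replication to paired region
        return "RA-" + base if read_secondary else base
    return "ZRS" if zone_ha else "LRS"        # single-region options

print(pick_redundancy(zone_ha=True, geo_dr=True, read_secondary=True))
# prints "RA-GZRS"
```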
Elena's redundancy choice: FinSecure Bank uses GZRS for all production storage, which survives both a single data centre failure (zone redundancy) and a regional disaster (geo-redundancy). Customer-facing reports are served from the RA-GZRS secondary endpoint for read offloading; this is acceptable for reports that tolerate replication lag, since reads from the secondary are eventually consistent.
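With RA-GRS/RA-GZRS, the read-only secondary is exposed at a predictable hostname: the account name with a `-secondary` suffix. A small sketch (the account name is hypothetical; the hostname pattern applies to the public Azure cloud):

```python
def secondary_blob_endpoint(account: str) -> str:
    """Read-only secondary endpoint for an RA-GRS/RA-GZRS account
    (public cloud hostname pattern)."""
    return f"https://{account}-secondary.blob.core.windows.net"

print(secondary_blob_endpoint("finsecurereports"))
# prints "https://finsecurereports-secondary.blob.core.windows.net"
```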
Access tiers and lifecycle management
| Tier | Storage Cost | Access Cost | Min Retention | Best For |
|---|---|---|---|---|
| Hot | Highest | Lowest | None | Frequently accessed data |
| Cool | Lower | Higher | 30 days | Infrequent access (monthly) |
| Cold | Even lower | Even higher | 90 days | Rare access (quarterly) |
| Archive | Lowest | Highest (rehydration to an online tier can take hours) | 180 days | Compliance archive, rarely if ever accessed |
Lifecycle management rules
Automate tier transitions to optimise cost:
Rule: "age-based-tiering"
- Move to Cool after 30 days without access
- Move to Cold after 90 days
- Move to Archive after 180 days
- Delete after 2,555 days (7 years, for compliance)
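On a real storage account, these rules are expressed as a lifecycle management policy document. The sketch below builds the equivalent policy as a Python dict; the key names follow the Azure lifecycle policy schema as I understand it, and note the 30-day Cool transition keys off last access time, which requires last-access tracking to be enabled on the account (the other transitions key off modification time):

```python
import json

# Sketch of a lifecycle management policy matching the rule above.
# "daysAfterLastAccessTimeGreaterThan" needs last-access tracking enabled.
policy = {
    "rules": [{
        "name": "age-based-tiering",
        "enabled": True,
        "type": "Lifecycle",
        "definition": {
            "filters": {"blobTypes": ["blockBlob"]},
            "actions": {
                "baseBlob": {
                    "tierToCool":    {"daysAfterLastAccessTimeGreaterThan": 30},
                    "tierToCold":    {"daysAfterModificationGreaterThan": 90},
                    "tierToArchive": {"daysAfterModificationGreaterThan": 180},
                    "delete":        {"daysAfterModificationGreaterThan": 2555},
                },
            },
        },
    }]
}

print(json.dumps(policy, indent=2))
```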
Well-Architected Framework connection
Cost Optimisation: Storage is one of the easiest places to save money. Lifecycle management rules can reduce storage costs by 50-80% by automatically moving data to cheaper tiers.
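A back-of-envelope check of that savings claim, using illustrative per-GB-month prices (these are assumed for the sketch, NOT current Azure pricing, and they ignore access and transaction charges):

```python
# Illustrative per-GB-month storage prices -- NOT current Azure pricing.
PRICE = {"hot": 0.018, "cool": 0.010, "cold": 0.0036, "archive": 0.002}

def monthly_cost(gb_by_tier: dict) -> float:
    """Storage-only monthly cost for a given GB-per-tier distribution."""
    return sum(gb * PRICE[tier] for tier, gb in gb_by_tier.items())

all_hot = monthly_cost({"hot": 10_000})
tiered  = monthly_cost({"hot": 1_000, "cool": 2_000,
                        "cold": 3_000, "archive": 4_000})
saving = 1 - tiered / all_hot
print(f"tiered layout saves {saving:.0%}")   # prints "tiered layout saves 68%"
```

With these assumed prices, ageing most data down the tiers lands squarely in the 50-80% range the text cites, which is why lifecycle rules are usually the first cost lever to pull.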
Reliability: Choose redundancy based on RPO requirements. GRS provides ~15 minutes RPO for regional failover. ZRS provides zone-level HA with zero RPO within the region.
Security: Immutable blobs (WORM storage) prevent deletion or modification, which is required for SEC 17a-4, FINRA, and similar regulations.
Data protection features
| Feature | What It Does | Use Case |
|---|---|---|
| Soft delete | Recovers deleted blobs/containers within retention period | Accidental deletion recovery |
| Blob versioning | Keeps previous versions of blobs automatically | Track changes, recover previous versions |
| Immutable storage (WORM) | Prevents modification or deletion for a set period | Compliance: SEC 17a-4, FINRA, legal hold |
| Point-in-time restore | Restores block blobs to a previous state | Recover from corruption or accidental overwrite |
Knowledge check
GlobalTech is migrating their data analytics platform. They need a hierarchical directory structure, POSIX ACLs for team-level permissions, and native Spark/Synapse integration. Data also needs lifecycle tiering from Hot to Archive. Which service should Priya recommend?
Elena must store financial audit logs that cannot be modified or deleted for 7 years (SEC 17a-4 compliance). The logs are written once and rarely read. Which storage design should she recommend?
Video coming soon
Next up: storage is designed; now let's connect the data together in Data Integration & Analytics.