Backup, Recovery, and Business Continuity
Design backup strategies, data retention policies, and recovery architectures that keep organisations running when the worst happens. Covers secure backup configurations, archival strategy, and RTO/RPO design.
Why backup is a security architecture decision
Backup is your safety net — but only if nobody can cut the ropes.
Think of a trapeze act. The performers practise dangerous moves because there’s a safety net below. But what if someone could sneak in and remove the net before the show? That’s exactly what ransomware attackers do: they target your backups first, then encrypt everything.
A cybersecurity architect doesn’t just say “back up your data.” They design the entire safety system: where the backups live (somewhere attackers can’t reach), who can access them (not the same people who manage production), how long they’re kept (long enough to recover from slow-moving attacks), and how often they’re tested (regularly, not just “we assume it works”).
Designing a backup strategy
RTO and RPO — the architect’s starting point
Every backup strategy begins with two business requirements:
| Metric | What It Means | Example |
|---|---|---|
| RTO (Recovery Time Objective) | Maximum acceptable downtime after an incident | “Payment systems must be back within 4 hours” |
| RPO (Recovery Point Objective) | Maximum acceptable data loss measured in time | “We can afford to lose at most 1 hour of transactions” |
The architect maps these to backup frequency and recovery methods:
| RTO/RPO Requirement | Backup Approach | Microsoft Technology |
|---|---|---|
| RTO ≤ 1 hour, RPO ≤ 15 min | Continuous replication, instant failover | Azure Site Recovery, SQL Always On, Cosmos DB multi-region |
| RTO 1-4 hours, RPO ≤ 1 hour | Frequent snapshots, automated recovery | Azure Backup (hourly), VM snapshots, managed disk snapshots |
| RTO 4-24 hours, RPO ≤ 24 hours | Daily backups, documented recovery procedures | Azure Backup (daily), Azure Files backup |
| RTO > 24 hours, RPO = last full backup | Weekly full + daily incremental | Azure Backup vault, offline media for air-gapped recovery |
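The mapping from requirements to approach can be sketched as a small lookup. This is a minimal illustration, not a sizing tool: the `Approach` class, the `cheapest_approach` function, and the assumption that cost rises with capability are all hypothetical, and the achieved RTO/RPO figures are simply the upper bounds of the first three rows of the table. The open-ended last row is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    name: str
    rto_hours: float  # recovery time this approach is assumed to achieve
    rpo_hours: float  # worst-case data loss this approach is assumed to allow


# Ordered from least to most expensive (assumption: cost rises with capability).
APPROACHES = [
    Approach("Daily backups", 24, 24),
    Approach("Frequent snapshots", 4, 1),
    Approach("Continuous replication", 1, 0.25),
]


def cheapest_approach(required_rto_hours, required_rpo_hours):
    """Return the first (cheapest) approach that meets both requirements."""
    for approach in APPROACHES:
        meets_rto = approach.rto_hours <= required_rto_hours
        meets_rpo = approach.rpo_hours <= required_rpo_hours
        if meets_rto and meets_rpo:
            return approach.name
    raise ValueError("No listed approach meets this RTO/RPO")


# A payment system that must be back within 4 hours, losing at most 1 hour of data.
print(cheapest_approach(4, 1))
```

Searching cheapest-first reflects the cost argument: a system is only placed on a more expensive approach when its RTO or RPO rules out the cheaper ones.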
💰 Scenario: Ingrid's tiered recovery design
Ingrid Svensson at Nordic Capital Partners classifies systems into three tiers:
- Tier 1 (trading platform): RTO 1 hour, RPO 15 minutes → Azure Site Recovery with continuous replication + SQL Always On
- Tier 2 (client portal): RTO 4 hours, RPO 1 hour → Azure Backup with hourly snapshots
- Tier 3 (internal tools): RTO 24 hours, RPO 24 hours → Daily Azure Backup
Each tier has a different cost profile. Ingrid justifies the investment by mapping RTO/RPO to business impact — trading platform downtime costs millions per hour, while internal tools can wait.
Yuki Tanaka, the IAM lead, asks: “Why don’t we just put everything on Tier 1?” Ingrid explains: “Continuous replication for 500 systems costs more than the downtime risk for Tier 3 systems. We spend money where the business risk demands it.”
What to back up — the architect’s scope
Backup strategy goes beyond just data. The architect considers:
| Category | What Needs Backup | Why |
|---|---|---|
| Data | Databases, files, blob storage | Core business data — the obvious target |
| Configuration | Azure Policy, NSG rules, Conditional Access policies | Rebuilding configuration from scratch takes days |
| Identity | Entra ID configuration, Conditional Access, PIM settings | Identity is the control plane — losing it means losing access control |
| Secrets | Key Vault contents, certificates, connection strings | Without secrets, applications can’t authenticate to anything |
| Infrastructure as Code | ARM templates, Bicep files, Terraform state | Enables rapid redeployment of entire environments |
Exam tip: Configuration backup is often overlooked
Exam questions may describe a scenario where data is recovered successfully but the organisation still can’t operate because configuration was lost — Conditional Access policies, network rules, application settings.
The architect’s answer: treat configuration as code (IaC), store it in version-controlled repositories, and include it in the recovery plan. Azure Resource Graph and Azure Policy export support this approach. For Entra ID, export configurations using IaC tools (e.g., Terraform, Microsoft365DSC) and store them in source control — native Entra backup/recovery capabilities are limited and evolving, so version-controlled IaC remains the primary recovery mechanism for identity configuration.
Secure backup configurations
The 3-2-1-1-0 backup rule
Modern backup best practice extends the classic 3-2-1 rule:
| Rule Component | What It Means |
|---|---|
| 3 copies | Production data + 2 backup copies |
| 2 different media | At least two different storage types (disk + tape, disk + cloud object storage, or local backup appliance + cloud repository) |
| 1 offsite | At least one copy in a different physical location |
| 1 offline or immutable | At least one copy that attackers cannot modify or delete |
| 0 errors | Verified — recovery tested with zero errors |
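The rule lends itself to an automated check against a backup inventory. The sketch below is a hypothetical model, not an Azure API: `BackupCopy` and `check_3_2_1_1_0` are invented names, and it assumes that "0 errors" means every backup copy has been restore-tested with zero errors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackupCopy:
    media: str                            # e.g. "disk", "tape", "cloud-object"
    location: str                         # physical site or region
    is_production: bool = False
    offline_or_immutable: bool = False
    restore_errors: Optional[int] = None  # None means never restore-tested


def check_3_2_1_1_0(copies):
    """Return a pass/fail result for each component of the rule."""
    backups = [c for c in copies if not c.is_production]
    production_sites = {c.location for c in copies if c.is_production}
    return {
        "3 copies": any(c.is_production for c in copies) and len(backups) >= 2,
        "2 different media": len({c.media for c in copies}) >= 2,
        "1 offsite": any(c.location not in production_sites for c in backups),
        "1 offline or immutable": any(c.offline_or_immutable for c in backups),
        # Assumption: every backup copy must be tested, with zero errors.
        "0 errors": bool(backups) and all(c.restore_errors == 0 for c in backups),
    }


estate = [
    BackupCopy("disk", "site-a", is_production=True),
    BackupCopy("disk", "site-a", restore_errors=0),
    BackupCopy("cloud-object", "region-b",
               offline_or_immutable=True, restore_errors=0),
]
result = check_3_2_1_1_0(estate)
for rule, passed in result.items():
    print(f"{rule}: {'pass' if passed else 'FAIL'}")
```

Reporting each component separately, rather than a single yes/no, shows which part of the design needs attention.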
Immutable backup vaults
Immutable storage is the architect’s strongest protection against ransomware-targeted backup deletion:
| Feature | What It Does |
|---|---|
| Azure Backup immutable vault | Once enabled, backup data cannot be deleted before retention period expires — even by the backup admin |
| Soft delete | Deleted backup data is retained for 14 additional days by default (configurable up to 180 days with enhanced soft delete) before permanent removal |
| Multi-user authorisation | Critical operations (disable immutability, reduce retention) require approval from a second security admin |
| Azure Blob immutability policies | Time-based retention or legal hold prevents modification of blob data |
Standard and immutable backup vaults compared:
| Feature | Standard Backup Vault | Immutable Backup Vault |
|---|---|---|
| Admin can delete backups early? | Yes — any vault admin can delete | No — data retained until retention period expires |
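| Attacker with admin creds can destroy backups? | Yes, full control if credentials are compromised | No, backups cannot be deleted before their retention period expires, even with admin credentials |
| Can immutability be disabled? | N/A | Only while the setting is unlocked, and then only with multi-user authorisation (requires second approver); once locked, it cannot be reversed |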
| Soft delete protection | Optional — can be turned off | Mandatory — always enabled, cannot be disabled |
| Best for | General purpose backup with lower cost | Ransomware-resilient, compliance-required backup |
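The decision logic behind the comparison can be expressed as a toy model. This is not how Azure Backup is implemented or called: `VaultSettings`, `can_delete_recovery_point`, and `can_disable_immutability` are hypothetical names, used only to show why immutability and multi-user authorisation defeat a single compromised admin account.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class VaultSettings:
    immutable: bool
    multi_user_authorisation: bool


def can_delete_recovery_point(vault, retention_expires, today):
    """On an immutable vault, deletion is blocked until retention expires."""
    if vault.immutable and today < retention_expires:
        return False
    return True


def can_disable_immutability(vault, requester, approver: Optional[str] = None):
    """With multi-user authorisation, a distinct second person must approve."""
    if not vault.multi_user_authorisation:
        return True
    return approver is not None and approver != requester


standard = VaultSettings(immutable=False, multi_user_authorisation=False)
hardened = VaultSettings(immutable=True, multi_user_authorisation=True)
expires = date(2030, 1, 1)   # illustrative retention expiry
today = date(2025, 6, 1)     # illustrative current date

print(can_delete_recovery_point(standard, expires, today))
print(can_delete_recovery_point(hardened, expires, today))
print(can_disable_immutability(hardened, requester="backup-admin"))
```

In this model an attacker holding only the backup admin's credentials fails both checks on the hardened vault: the recovery point is still inside its retention period, and no second approver exists.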
Data retention and archival strategy
Retention isn’t just about backup — it’s about how long data must be kept for regulatory, legal, and operational purposes.
Retention tiers
| Tier | Retention Period | Purpose | Storage |
|---|---|---|---|
| Operational | 1-30 days | Quick recovery from accidental deletion or corruption | Hot/cool storage, Azure Backup daily |
| Compliance | 1-7 years | Meet regulatory requirements (SOX, GDPR, PCI DSS) | Cool/archive storage, retention policies |
| Legal hold | Indefinite until released | Preserve data for litigation or investigation | Legal hold policies, immutable storage |
| Archival | 7+ years | Long-term historical records, audit trails | Archive storage tier, offline media |
Designing retention with Microsoft tools
| Requirement | Microsoft Solution |
|---|---|
| M365 data retention | Purview retention labels and policies (Exchange, SharePoint, Teams, OneDrive) |
| Azure resource backup retention | Azure Backup vault retention policies (daily, weekly, monthly, yearly) |
| Azure storage data archival | Blob storage lifecycle management policies (hot → cool → archive → delete) |
| Legal hold | Purview eDiscovery holds, blob storage legal hold |
| Audit log retention | Microsoft Purview Audit (standard: 180 days, premium: up to 10 years) |
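As an illustration of the hot → cool → archive → delete progression, the sketch below builds a blob lifecycle management policy document in Python. The field names follow the lifecycle policy JSON schema as commonly documented, so verify them against current Microsoft documentation before use. The `lifecycle_policy` helper, the rule name, and the day thresholds (2555 days is roughly 7 years) are assumptions chosen for the example.

```python
import json


def lifecycle_policy(cool_after_days, archive_after_days, delete_after_days):
    """Build a policy that tiers block blobs down over time, then deletes them."""
    if not cool_after_days < archive_after_days < delete_after_days:
        raise ValueError("thresholds must increase: cool < archive < delete")
    base_blob_actions = {
        "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
        "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
        "delete": {"daysAfterModificationGreaterThan": delete_after_days},
    }
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-then-delete",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"]},
                    "actions": {"baseBlob": base_blob_actions},
                },
            }
        ]
    }


policy = lifecycle_policy(cool_after_days=30,
                          archive_after_days=90,
                          delete_after_days=2555)
print(json.dumps(policy, indent=2))
```

Generating the policy from three parameters, and rejecting thresholds that are out of order, keeps the retention decision reviewable in source control alongside the rest of the configuration.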
Exam tip: Retention conflicts
The exam may present scenarios where different regulations require different retention periods for the same data — for example, GDPR’s data minimisation principle (delete when no longer needed) vs. financial regulations requiring 7-year retention.
The architect’s approach: the longest mandatory retention period wins, but you must also implement data minimisation for non-regulated attributes. Classify data by regulation, apply the strictest requirement, and document the justification.
Also watch for: litigation hold overrides everything. If data is under legal hold, you cannot delete it regardless of what any retention policy says.
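The two rules in this tip (longest mandatory period wins, legal hold overrides everything) can be written down as a short decision function. This is a minimal sketch: `RetentionInput`, `applied_retention_years`, and `may_delete` are hypothetical names, and the second regulation in the example is invented purely to create a conflict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionInput:
    mandatory_years: dict      # regulation name -> minimum retention in years
    legal_hold: bool = False


def applied_retention_years(record):
    """Longest mandatory period wins; None if nothing mandates retention."""
    return max(record.mandatory_years.values(), default=None)


def may_delete(record, age_years):
    """Deletion is allowed only after retention ends and with no legal hold."""
    if record.legal_hold:
        return False  # legal hold overrides every retention policy
    required = applied_retention_years(record)
    return required is None or age_years >= required


# SOX's 7 years against a hypothetical 5-year local rule for the same data.
financial = RetentionInput({"SOX": 7, "Hypothetical local rule": 5})
on_hold = RetentionInput({"SOX": 7}, legal_hold=True)

print(applied_retention_years(financial))
print(may_delete(financial, age_years=6))
print(may_delete(on_hold, age_years=10))
```

Data with no mandatory retention returns `None`, which is where data minimisation applies: nothing obliges the organisation to keep it, so it should be deleted once it is no longer needed.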
🌐 Scenario: Elena's retention matrix
Elena builds a retention matrix for Meridian Global:
| Data Type | GDPR | Industry Regulation | Meridian Policy | Applied Retention |
|---|---|---|---|---|
| Customer PII | Delete when consent withdrawn | N/A | 3 years after last interaction | 3 years after last interaction, or sooner if consent is withdrawn |
| Financial transactions | N/A | 7 years (SOX) | 7 years | 7 years |
| Employee records | 6 years post-employment (UK) | N/A | 7 years | 7 years |
| Audit logs | N/A | 5 years | 10 years | 10 years |
| Manufacturing IP | N/A | N/A | Indefinite | Indefinite (company asset) |
She implements this through Purview retention labels (M365 data) and Azure Backup policies (infrastructure data), with legal hold capability for any data that enters litigation.
Li Wei asks: “Can’t we just keep everything forever?” Elena explains: “Keeping data longer than required increases our exposure surface. Under GDPR, holding customer PII beyond the retention period is itself a compliance violation.”
Business continuity architecture
Business continuity extends beyond backup. The architect designs for continued operations during incidents:
| Component | What It Covers | Design Consideration |
|---|---|---|
| Disaster recovery | Failover to secondary site/region | Azure Site Recovery, paired regions, multi-region deployment |
| High availability | Redundancy within a region | Availability zones, load balancers, zone-redundant storage |
| Backup and restore | Data and configuration recovery | Backup strategy, immutable vaults, configuration as code |
| Communication plan | Stakeholder notification during incidents | Out-of-band communication channels (not dependent on affected systems) |
| Recovery runbooks | Step-by-step recovery procedures | Documented, tested, accessible offline |
💰 Scenario: Ingrid's communication plan gap
During a tabletop exercise at Nordic Capital Partners, Ingrid’s team discovers a critical gap: the incident communication plan relies on Microsoft Teams. If Azure or M365 is the affected system, they can’t communicate.
Ingrid designs an out-of-band communication plan:
- Primary: Microsoft Teams (normal operations)
- Secondary: SMS-based group notification (vendor-hosted, not Azure-dependent)
- Tertiary: Personal mobile phone tree for executive team
Harald Eriksen asks: “Isn’t this over-engineering?” Ingrid replies: “In our last tabletop, the simulated Azure outage knocked out our entire communication chain. The board couldn’t reach IT for 45 minutes. That’s not a cost we can accept.”
Exam tip: Business continuity is more than DR
Watch for exam questions that conflate backup, DR, and business continuity. They’re related but distinct:
- Backup = data recovery (can I get my data back?)
- DR = system recovery (can I bring my systems back online in another location?)
- Business continuity = operational recovery (can the business keep running?) — includes communication, runbooks, alternate processes, and stakeholder management
The architect designs all three. An answer that only addresses backup or DR is incomplete.
Knowledge check
Ingrid's trading platform has an RTO of 1 hour and RPO of 15 minutes. Which backup and recovery approach meets these requirements?
Elena discovers that Meridian's Azure Backup vault does not have immutability enabled. An attacker who compromises the backup admin account could delete all backups. Which combination of controls should Elena implement?
During a tabletop exercise, Nordic Capital Partners discovers their incident communication plan relies entirely on Microsoft Teams. Why is this a business continuity risk, and what should Ingrid recommend?
Next up: Evaluating Security Architecture Decisions — the capstone module where you practise making and justifying architecture trade-offs across all the frameworks.