AZ-400 Study Guide

Domain 1: Design and Implement Processes and Communications

  • Work Item Tracking: Boards, GitHub & Flow
  • DevOps Metrics: Dashboards That Drive Decisions
  • Collaboration: Wikis, Teams & Release Notes

Domain 2: Design and Implement a Source Control Strategy

  • Branching Strategies: Trunk-Based, Feature & Release
  • Pull Requests: Policies, Protections & Merge Rules
  • Repository Management: LFS, Permissions & Recovery

Domain 3: Design and Implement Build and Release Pipelines

  • Package Management: Feeds, Versioning & Upstream
  • Testing Strategy: Quality Gates & Release Gates
  • Test Implementation: Code Coverage & Pipeline Tests
  • Azure Pipelines: YAML from Scratch
  • GitHub Actions: Workflows from Scratch
  • Pipeline Agents: Self-Hosted, Hybrid & VM Templates
  • Multi-Stage Pipelines: Templates, Variables & Approvals
  • Deployment Strategies: Blue-Green, Canary & Ring
  • Safe Rollouts: Slots, Dependencies & Hotfix Paths
  • Deployment Implementations: Containers, Scripts & Databases
  • Infrastructure as Code: ARM vs Bicep vs Terraform
  • IaC in Practice: Desired State & Deployment Environments
  • Pipeline Maintenance: Health, Migration & Retention

Domain 4: Develop a Security and Compliance Plan

  • Pipeline Identity: Service Principals, Managed IDs & OIDC
  • Authorization & Access: GitHub Roles & Azure DevOps Security
  • Secrets & Secure Pipelines: Key Vault & Workload Federation
  • Security Scanning: GHAS, Defender & Dependabot

Domain 5: Implement an Instrumentation Strategy

  • Monitoring for DevOps: Azure Monitor & App Insights
  • Metrics & KQL: Analysing Telemetry & Traces

Domain 2: Design and Implement a Source Control Strategy

Repository Management: LFS, Permissions & Recovery

Manage large files with Git LFS, scale repositories with Scalar, configure permissions and tags, and recover or purge data using Git commands.

Why Repository Management Matters

☕ Simple explanation

Think of a warehouse.

A small shop keeps everything on shelves — easy to find, quick to access. But when the shop grows into a massive warehouse, you need systems: large items go in special storage (LFS), access badges control who enters which area (permissions), labels on shelves help you find things (tags), and there’s a process for recovering dropped items or disposing of expired stock (recovery and purging).

Repository management is warehouse logistics for your code. As repositories grow in size, contributors, and history, you need strategies to keep them fast, secure, and organised.

Git repositories were designed for text files — source code, configuration, documentation. When you add large binary files (images, videos, compiled binaries, ML models), Git’s performance degrades because it stores every version of every file in the repository history. A 100MB model file changed 50 times means 5GB of history.

The AZ-400 exam tests your ability to design strategies for large file management (Git LFS, git-fat), repository scaling (Scalar, cross-repository sharing), permissions (Azure Repos and GitHub), tagging (lightweight vs annotated), data recovery (git reflog, cherry-pick), and data removal (git filter-repo, BFG Repo-Cleaner).

Git Large File Storage (LFS)

Git LFS replaces large files in your repository with small pointer files while storing the actual file content on a separate LFS server. When you clone or checkout, Git LFS downloads only the large files you need for your current branch.

How Git LFS Works

Without LFS:
  repo (5GB) = code (50MB) + large files full history (4.95GB)
  Every clone downloads 5GB

With LFS:
  repo (50MB) = code (50MB) + pointer files (few KB)
  LFS server stores actual large files
  Clone downloads 50MB + only current version of needed large files

Setup:

  1. Install Git LFS: git lfs install
  2. Track file patterns: git lfs track "*.psd" (updates .gitattributes)
  3. Commit the .gitattributes file
  4. Add and commit large files normally — Git LFS intercepts and replaces them with pointers
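Under the hood, step 2 simply writes an attribute line into .gitattributes. A minimal sketch with plain Git (no git-lfs installation needed) that reproduces the line `git lfs track "*.psd"` would add, then verifies it with `git check-attr`:

```shell
# Scratch repo; the echo line reproduces what `git lfs track "*.psd"` writes.
d=$(mktemp -d) && cd "$d"
git init -q
echo '*.psd filter=lfs diff=lfs merge=lfs -text' > .gitattributes
# Any .psd path now resolves to the lfs filter:
git check-attr filter -- design.psd
# -> design.psd: filter: lfs
```

Because the attribute lives in a committed file, every collaborator who clones the repo gets the same LFS behaviour automatically.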

What a pointer file looks like:

version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345678

When to Use Git LFS

| File Type | Use LFS? | Why |
|---|---|---|
| PSD/AI design files | Yes | Large, binary, change frequently |
| Video/audio files | Yes | Large, binary |
| Compiled binaries (DLLs, JARs) | Yes | Binary, shouldn’t be in source anyway — consider packages instead |
| ML model files | Yes | Often 100MB+ |
| SQLite database files | Yes | Binary format, large |
| Images for documentation | Maybe | Small PNGs are fine in Git; large PSDs need LFS |
| Source code | No | Text files are what Git does best |
| Configuration files (JSON, YAML) | No | Small text files |

☁️ Jordan’s LFS Strategy

Jordan at Cloudstream Media manages a repo with video processing pipelines. The repo contains test video files (500MB each) for integration testing.

Jordan configures LFS tracking:

git lfs track "*.mp4"
git lfs track "*.mov"
git lfs track "*.psd"
git lfs track "models/*.bin"

Clone time dropped from 45 minutes to 3 minutes. Developers only download the video files they need for their current branch.

Question

What file does 'git lfs track' modify, and what does it do?

Answer

It modifies the .gitattributes file, adding patterns that tell Git LFS to manage matching files. For example, 'git lfs track *.psd' adds '*.psd filter=lfs diff=lfs merge=lfs -text' to .gitattributes. This file must be committed to the repository so all collaborators use LFS for the same file types.

git-fat: A Lightweight Alternative

git-fat is a simpler alternative to Git LFS that stores large files in any rsync-accessible location (including S3, network drives, or cloud storage).

| Aspect | Git LFS | git-fat |
|---|---|---|
| Server requirement | Dedicated LFS server (GitHub, Azure Repos, GitLab include one) | Any rsync-accessible storage |
| Protocol | Custom LFS API over HTTP | rsync |
| Hosting support | GitHub, Azure Repos, GitLab, Bitbucket | Self-hosted storage only |
| Maintenance | Managed by hosting platform | Self-managed |
| Best for | Teams using hosted Git platforms | Teams needing custom storage backends |

Exam Tip: git-fat on the Exam

git-fat appears in the AZ-400 objectives but is rarely the correct answer. The exam typically tests whether you know it exists as an alternative to Git LFS for scenarios where you need custom storage backends. If the question mentions GitHub or Azure Repos, Git LFS is always the answer. git-fat is the answer only when the scenario requires self-managed storage or rsync-based transfer.

Scalar: Scaling Massive Repositories

Scalar is a tool from Microsoft (originally developed for the Windows OS repository — 300GB, 3.5 million files) that makes Git faster on large repositories without changing your workflow.

What Scalar does:

  • Partial clone — clone without downloading all file contents (blobs downloaded on demand)
  • Sparse checkout — only materialise the files and folders you need in your working directory
  • Background maintenance — prefetch commits and run git maintenance automatically
  • File system monitor — uses OS-level file watching instead of scanning all files for changes
  • Commit graph — pre-computes commit relationships for faster log and blame operations

Setup:

scalar clone https://dev.azure.com/org/project/_git/huge-repo

Scalar wraps a normal git clone but enables all the optimisations automatically.
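One of these optimisations can be tried with plain Git, since the commit-graph file is an ordinary Git feature that Scalar simply turns on for you. A small sketch in a scratch repo (paths and identities illustrative):

```shell
d=$(mktemp -d) && cd "$d"
git init -q
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "initial"
# Pre-compute commit relationships (Scalar enables this automatically):
git commit-graph write --reachable
ls .git/objects/info/commit-graph   # the serialised graph file git log/blame can use
```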

Question

What is Scalar and when should you use it?

Answer

Scalar is a Microsoft tool that optimises Git for very large repositories. It enables partial clone, sparse checkout, background maintenance, and file system monitoring. Use it when your repository is large enough that normal git operations (clone, status, checkout) are slow — typically repositories with 100K+ files or 10GB+ history. It was built for the Windows OS repo (300GB, 3.5M files).

Cross-Repository Sharing

When multiple repositories need shared code, you have several options:

Cross-Repository Sharing Strategies
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Git Submodules | Embeds one repo inside another as a pointer to a specific commit | Exact version pinning; independent repos | Complex update workflow; nested clone issues; confusing for beginners |
| Git Subtrees | Copies another repo's content into a subdirectory with merged history | Simpler than submodules; works with normal Git commands | History pollution; manual sync required |
| Package managers (NuGet, npm) | Publish shared code as a package; consume via dependency | Clean separation; semantic versioning; standard tooling | Requires package registry; more setup; release process needed |
| Monorepo | All code in one repository with build system managing projects | Single source of truth; atomic cross-project changes | Requires Scalar-level tooling at scale; long CI times without optimisation |
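A minimal sketch of the submodule workflow using two local scratch repos (names are illustrative; the `protocol.file.allow` override is needed on recent Git versions, which block file-protocol submodules by default):

```shell
base=$(mktemp -d) && cd "$base"
# A shared library repo to embed:
git init -q lib
git -C lib -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "lib initial"
# The consuming app pins lib at a specific commit under vendor/lib:
git init -q app && cd app
git -c protocol.file.allow=always submodule add -q "$base/lib" vendor/lib
cat .gitmodules            # records the path-to-URL mapping
git submodule status       # shows the pinned commit SHA
```

Note that `.gitmodules` records only the path and URL; the pinned commit itself is stored as a "gitlink" entry in the parent repo's tree.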

☁️ Jordan’s Recommendation

Jordan recommends package managers for most teams: “Submodules are a footgun for anyone who doesn’t live in the terminal. Publish shared libraries as packages — NuGet for .NET, npm for Node, PyPI for Python. Pin versions, test independently, update deliberately.”

For Cloudstream’s internal Bicep modules, Jordan uses Azure Container Registry as a Bicep module registry — each module is versioned and consumed by reference.

Question

What is the difference between Git submodules and Git subtrees?

Answer

Submodules embed a reference (pointer) to a specific commit in another repository — the content stays in the external repo and is cloned separately. Subtrees copy the external repo's content directly into a subdirectory with merged history — the content lives in your repository. Submodules are precise but complex. Subtrees are simpler but pollute your history. For most teams, package managers are preferred over both.

Repository Permissions

Azure Repos Permissions

Azure Repos uses a granular permission model at multiple levels:

| Level | Permissions Available |
|---|---|
| Organisation | Create repositories, manage repository policies |
| Project | Read, contribute, create branches, manage permissions |
| Repository | Read, contribute, create branch, create tag, manage notes, bypass policies, force push, edit policies |
| Branch | Per-branch permissions (contribute, force push, bypass policies) |

Key groups: Project Administrators, Contributors, Readers, Build Service (pipeline identity)

Important: The Build Service account needs explicit contribute permissions to push tags or update branches from pipelines.

GitHub Repository Roles

| Role | Capabilities |
|---|---|
| Read | View code, open issues, comment |
| Triage | Manage issues and PRs (label, assign, close) without code access |
| Write | Push code, manage branches, merge PRs |
| Maintain | Manage repository settings (except destructive actions) |
| Admin | Full access including settings, secrets, branch protection, delete |

GitHub Teams: Organise users into teams with role assignments. Teams can be nested (parent/child) for hierarchical access.

CODEOWNERS: Adds per-path reviewer requirements (covered in Module 5) — not technically permissions but functionally enforces who must review changes.
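As a taste of the syntax, a small illustrative CODEOWNERS fragment (the `@acme/...` team names are hypothetical; later rules win where patterns overlap):

```
# pattern -> required reviewers
*           @acme/platform-team
/src/api/   @acme/backend-team
*.bicep     @acme/infra-team
```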

Exam Tip: Least Privilege Principle

The exam frequently tests the principle of least privilege. When asked which permission level to grant:

  • Developers who push code: Write (GitHub) or Contributor (Azure Repos)
  • CI/CD pipeline service accounts: Contributor with specific branch permissions
  • QA team that only manages issues: Triage (GitHub) — a commonly missed role
  • Project managers who view dashboards: Read (both platforms)

Never grant Admin when Write or Maintain would suffice. The exam penalises over-permissioning.

Tags: Organising the Repository

Tags mark specific commits as significant — typically releases.

Lightweight vs Annotated Tags

| Type | Command | What It Stores | Use When |
|---|---|---|---|
| Lightweight | git tag v1.0 | Just a pointer to a commit (like a branch that doesn’t move) | Quick, informal markers |
| Annotated | git tag -a v1.0 -m "Release 1.0" | Full Git object with tagger name, email, date, and message | Production releases — includes metadata for auditing |

Annotated tags are recommended for releases because they include:

  • Who created the tag
  • When it was created
  • A message explaining the release
  • An optional GPG signature for verification
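The difference is visible in the object database itself: `git cat-file -t` reports the object type each tag ref resolves to. A scratch-repo sketch (identities illustrative):

```shell
d=$(mktemp -d) && cd "$d"
git init -q
g() { git -c user.email=dev@example.com -c user.name=dev "$@"; }
g commit -q --allow-empty -m "release candidate"
git tag v1.0-light                # lightweight: a bare ref, no tag object
g tag -a v1.0 -m "Release 1.0"    # annotated: creates a real tag object
git cat-file -t v1.0-light        # -> commit
git cat-file -t v1.0              # -> tag
git for-each-ref refs/tags        # both refs, with their object types
```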

Tag Naming Conventions

  • Semantic versioning: v1.2.3 (major.minor.patch)
  • Pre-release: v2.0.0-beta.1, v2.0.0-rc.1
  • Date-based: release-2026-04-15 (for teams without semver)
  • Environment-based: avoid — tags should mark versions, not environments

Question

What is the difference between a lightweight tag and an annotated tag in Git?

Answer

A lightweight tag is just a pointer to a commit — like a bookmark. It stores no metadata. An annotated tag is a full Git object that stores the tagger's name, email, date, and a message. Annotated tags can be GPG-signed. Always use annotated tags for releases (git tag -a v1.0 -m 'message') because they provide an audit trail of who tagged what and when.

Data Recovery with Git Commands

Git’s reflog is your safety net. It records every HEAD movement — even ones that don’t appear in git log.

git reflog

git reflog shows a log of where HEAD has pointed. Even if you force-push, reset, or rebase away commits, the reflog remembers.

git reflog
# a1b2c3d HEAD@{0}: reset: moving to HEAD~3
# e4f5a6b HEAD@{1}: commit: Add feature X
# c7d8e9f HEAD@{2}: commit: Fix bug Y

# Recover the lost commits:
git checkout e4f5a6b
# or
git cherry-pick e4f5a6b
# or
git reset --hard e4f5a6b

Important: The reflog is local only — it’s not pushed to remotes. Entries expire after 90 days for reachable refs and 30 days for unreachable refs (both configurable).
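The expiry windows are ordinary config keys; for example, to extend them (the values here are illustrative):

```shell
d=$(mktemp -d) && cd "$d" && git init -q
# Extend the reflog retention windows for this repo:
git config gc.reflogExpire "180 days"              # reachable entries
git config gc.reflogExpireUnreachable "90 days"    # unreachable entries
git config gc.reflogExpire   # -> 180 days
```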

Common Recovery Scenarios

| Scenario | Recovery Command |
|---|---|
| Accidentally reset to wrong commit | git reflog then git reset --hard HEAD@{N} |
| Deleted a branch with unmerged work | git reflog then git checkout -b recovered-branch COMMIT_SHA |
| Need a specific commit from another branch | git cherry-pick COMMIT_SHA |
| Reverted a merge and need to undo the revert | git revert REVERT_COMMIT_SHA (revert the revert) |
| Lost stashed changes | git stash list then git stash apply stash@{N} |

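The first scenario can be walked through end to end in a scratch repo. A sketch (identities and messages illustrative):

```shell
d=$(mktemp -d) && cd "$d"
git init -q
g() { git -c user.email=dev@example.com -c user.name=dev "$@"; }
g commit -q --allow-empty -m "base"
g commit -q --allow-empty -m "feature work"
lost=$(git rev-parse HEAD)
git reset -q --hard HEAD~1        # oops: the feature commit vanishes from `git log`
git reflog | head -n 2            # ...but the reflog still records it
git reset -q --hard 'HEAD@{1}'    # HEAD@{1} = where HEAD pointed before the reset
[ "$(git rev-parse HEAD)" = "$lost" ] && echo recovered
```
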
Question

How do you recover a commit that was lost after a git reset --hard?

Answer

Use 'git reflog' to find the SHA of the lost commit — reflog records every HEAD movement including resets. Then either: 'git reset --hard SHA' to move HEAD back, 'git cherry-pick SHA' to apply just that commit, or 'git checkout -b recovery-branch SHA' to create a new branch at that point. Note: reflog is local only and entries expire after 90 days.

Removing Data from Source Control

Sometimes you need to permanently remove data — leaked credentials, accidentally committed large files, or sensitive data that should never have been pushed.

git filter-repo (Recommended)

git filter-repo is the modern, officially recommended tool for rewriting Git history. It replaced the older git filter-branch.

# Remove a specific file from all history
git filter-repo --path secrets.json --invert-paths

# Remove files larger than 10MB from all history
git filter-repo --strip-blobs-bigger-than 10M

# Replace text in all files across all history
git filter-repo --replace-text expressions.txt
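Before purging, you usually need to find the offenders first. A plain-Git recipe (no extra tooling assumed) that lists the largest blobs anywhere in history; the scratch repo and `video.bin` are just stand-ins for a real repository:

```shell
d=$(mktemp -d) && cd "$d" && git init -q
head -c 5000000 /dev/zero > video.bin   # stand-in for an accidentally committed large file
echo 'code' > app.py
git add . && git -c user.email=dev@example.com -c user.name=dev commit -qm "add files"
# Walk every object reachable from any ref, keep blobs, sort by uncompressed size:
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" {print $2, $3}' |
  sort -rn | head -n 10
```

The sizes and paths this prints are exactly the inputs you need for `--strip-blobs-bigger-than` or `--path ... --invert-paths`.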

BFG Repo-Cleaner

BFG is an older but still popular alternative — faster than git filter-branch but less flexible than git filter-repo.

# Remove files larger than 100MB from history
bfg --strip-blobs-bigger-than 100M

# Remove a specific file from all history
bfg --delete-files secrets.json

# Replace passwords in all history
bfg --replace-text passwords.txt

After rewriting history:

  1. Force-push to the remote: git push --force --all
  2. Force-push tags: git push --force --tags
  3. All collaborators must re-clone (their local history is now divergent)
  4. If credentials were leaked, rotate them immediately — rewriting history doesn’t revoke access

Exam Tip: Leaked Credentials

If the exam asks what to do when credentials are accidentally committed to a public repository:

  1. Rotate the credentials immediately — this is step one, before any history cleanup
  2. Remove the file from the working directory and commit
  3. Use git filter-repo or BFG to purge the file from all history
  4. Force-push to overwrite remote history
  5. Contact GitHub support to clear cached views (if public repo)
  6. Enable secret scanning to prevent future leaks

The key insight: rewriting history removes the file from Git but anyone who already cloned still has it. The credential must be rotated regardless.

Knowledge Check

Jordan's repository has grown to 8GB because developers committed large video test files directly (without LFS) over the past year. Clone times are unacceptable. What should Jordan do?

Knowledge Check

A developer accidentally committed an API key to a public GitHub repository 3 hours ago. Multiple people have already cloned the repository. What is the FIRST action to take?

Knowledge Check

Chen (SRE at Cloudstream) needs to mark a specific commit as the v3.0 production release with metadata including who approved it and a GPG signature. Which Git command should Chen use?


Next up: Design and Implement Build and Release Pipelines — Domain 3 starts with testing strategies and pipeline fundamentals.


© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.