Guided by A Guide to Cloud

SC-401 Study Guide

Domain 1: Implement Information Protection

  • Know Your Data: Sensitive Info Types Free
  • Custom Sensitive Info Types: Build Your Own Free
  • EDM & Fingerprinting: Detect Exact Data
  • Trainable Classifiers: AI-Powered Detection Free
  • Sensitivity Labels: Create & Protect Free
  • Sensitivity Labels: Publish & Auto-Apply
  • Email Encryption: Lock Down Messages
  • Purview IP Client: Classify Files at Scale

Domain 2: Implement DLP and Retention

  • DLP Foundations: Stop Data Leaks
  • DLP Policies: Build, Manage & Extend
  • DLP: Precedence & Adaptive Protection
  • Endpoint DLP: Setup & Configuration
  • Endpoint DLP: Advanced Rules & Monitoring
  • Retention: Plan Your Data Lifecycle
  • Retention Labels: Publish & Auto-Apply
  • Retention: Policies, Precedence & Recovery

Domain 3: Manage Risks, Alerts, and Activities

  • Insider Risk: Foundations & Setup
  • Insider Risk: Policies & Indicators
  • Insider Risk: Investigate & Close Cases
  • Adaptive Protection: Risk Levels Meet DLP
  • Purview Audit: Investigate & Retain
  • Activity Explorer & Content Search
  • Alert Response: Purview, XDR & Cloud Apps
  • DSPM for AI: Setup & Controls
  • DSPM for AI: Policies & Monitoring


EDM & Fingerprinting: Detect Exact Data

Exact data match classifiers compare content against your actual database values. Document fingerprinting turns a standard form into a template that Purview can detect. Together with OCR, they catch what regex cannot.

Beyond patterns — matching real data

☕ Simple explanation

Think about the difference between spotting a passport and spotting YOUR passport.

A regex-based SIT is like a border guard who knows what a passport looks like — the right size, the right colour, a photo, machine-readable text. It can spot any passport. But it cannot tell if a specific passport belongs to a specific person on a watchlist.

Exact data match (EDM) is like a guard with a database of everyone on the watchlist. When they scan a passport, they compare it directly against the database. If the name and number match an entry, they know exactly who it is — not just “this is a passport” but “this is John Smith’s passport.”

Document fingerprinting is different again — it learns the template of a specific form (like a tax return or patent application) and detects any document that matches that form’s structure, even if the content is different.

Exact data match (EDM) classifiers detect sensitive data by comparing document content against hashed values from a structured data source (such as a database export or CSV table). Instead of matching patterns, EDM matches actual values: specific patient names, real account numbers, exact employee records. This greatly reduces the false positives that pattern-only detection produces.

Document fingerprinting converts a standard form or template into a SIT by analysing its text structure. Any document that matches the template’s structure (even with different content) triggers detection. It works for tax forms, patent applications, insurance claims, and any other standardised document.

Optical character recognition (OCR) extends all classification methods to images and scanned PDFs by extracting text before SIT evaluation.

Exact data match (EDM) — detect your actual data

EDM is the highest-accuracy classification method in Purview. Instead of “find anything that looks like a credit card number,” EDM says “find these specific credit card numbers from our customer database.”

How EDM works

Step | What Happens
1. Define the schema | Create an EDM schema that describes your data table columns (e.g., Name, SSN, Account Number)
2. Prepare the data | Export your sensitive data to a CSV/TSV file (the “sensitive information source table”)
3. Hash and upload | The EDM Upload Agent hashes the data locally (it never sends plaintext to the cloud) and uploads the hashes
4. Create the EDM SIT | Define which columns are primary (must match) and which are corroborative (supporting evidence)
5. Detection | When content is scanned, Purview hashes values in documents and compares against your uploaded hashes
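
The hash-then-compare idea in steps 3 and 5 can be sketched in a few lines of Python. This is only an illustration of the principle, not the EDM Upload Agent's actual implementation: the function names, the normalisation step, and the salt value are assumptions made for the example.

```python
import hashlib


def hash_value(value: str, salt: str) -> str:
    """Normalise a value and return its salted SHA-256 hex digest."""
    normalised = value.strip().lower()
    return hashlib.sha256((salt + normalised).encode("utf-8")).hexdigest()


def build_hash_table(rows, salt):
    """Hash every cell locally; only these digests would leave the server."""
    return [{col: hash_value(val, salt) for col, val in row.items()} for row in rows]


def find_matches(candidates, hashed_rows, column, salt):
    """Hash candidate strings found in a document and look them up in one column."""
    known = {row[column] for row in hashed_rows}
    return [c for c in candidates if hash_value(c, salt) in known]


# Hypothetical source table and salt, for illustration only.
rows = [{"ClientAccountNumber": "MF-12345678", "ClientName": "John Smith"}]
salt = "example-salt"

hashed = build_hash_table(rows, salt)
hits = find_matches(["MF-12345678", "MF-00000000"], hashed, "ClientAccountNumber", salt)
print(hits)
```

Only the digests in `hashed` would be uploaded in this sketch; the plaintext `rows` never leave the machine that runs it, which mirrors the local-hashing behaviour described in step 3.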

Primary vs corroborative elements

Element Type | Role | Example
Primary | Must match for detection to trigger | Patient ID, Account Number, SSN
Corroborative | Supporting evidence; increases confidence | Patient Name, Date of Birth, Address

A match requires at least one primary element. Corroborative elements boost confidence and reduce false positives.
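
As a rough sketch of that rule, the function below returns no match unless a primary column matches, and raises confidence when corroborative columns also match. The confidence labels, the `min_corroborative` threshold, and the use of plaintext values (instead of hashes) are simplifying assumptions for readability, not Purview's actual scoring.

```python
def evaluate_row(found_values, row, primary, corroborative, min_corroborative=1):
    """Return 'none', 'low', or 'high' for one row of the source table.

    found_values: set of strings extracted from the scanned content.
    """
    # Without a primary match there is no detection at all.
    if not any(row[col] in found_values for col in primary):
        return "none"
    # Corroborative matches only raise confidence; they never trigger alone.
    support = sum(1 for col in corroborative if row[col] in found_values)
    return "high" if support >= min_corroborative else "low"


# Hypothetical row and column roles.
row = {"AccountNumber": "MF-12345678", "ClientName": "John Smith", "DateOfBirth": "1980-01-01"}
primary = ["AccountNumber"]
corroborative = ["ClientName", "DateOfBirth"]

print(evaluate_row({"MF-12345678", "John Smith"}, row, primary, corroborative))
print(evaluate_row({"MF-12345678"}, row, primary, corroborative))
print(evaluate_row({"John Smith"}, row, primary, corroborative))
```

The third call shows why a name on its own never fires: a corroborative value without a primary match is treated as no detection.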

💡 Scenario: Priya deploys EDM for client accounts

Meridian Financial has 45,000 client accounts. Priya wants DLP to detect when any actual client’s data appears in an email or document — not just any 8-digit number, but specifically numbers that belong to real clients.

She configures EDM:

  • Schema: ClientAccountNumber (primary), TaxID (primary), ClientName (corroborative)
  • Data source: Nightly export from the client management system (45,000 rows)
  • Hash schedule: Daily refresh via the EDM Upload Agent
  • Result: DLP now catches “John Smith, account MF-12345678” with near-zero false positives — because it matches against the real client database, not just the pattern.

EDM requirements and limitations

Requirement | Detail
Maximum rows | Up to 100 million rows per data table
Maximum columns | Up to 32 columns per schema
Maximum table size | Uncompressed data up to 32 GB
Refresh frequency | Can refresh up to twice per day
Hash algorithm | SHA-256; data is hashed locally before upload
Upload tool | EDM Upload Agent installed on a Windows server with access to the data source
Licensing | Requires Microsoft 365 E5, E5 Compliance, or E5 Information Protection
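
Before handing an export to the upload tool, it can help to sanity-check it against the row and column limits listed above. The following is a minimal sketch of such a pre-check; `check_source_table` is a hypothetical helper written for this guide, not part of any Microsoft tooling.

```python
import csv
import io

MAX_ROWS = 100_000_000  # row limit quoted in the table above
MAX_COLUMNS = 32        # column limit quoted in the table above


def check_source_table(csv_file, max_rows=MAX_ROWS, max_columns=MAX_COLUMNS):
    """Return a list of problems found in a CSV export (empty list means none)."""
    reader = csv.reader(csv_file)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = []
    if len(header) > max_columns:
        problems.append(f"{len(header)} columns exceeds limit of {max_columns}")
    row_count = 0
    for line_no, row in enumerate(reader, start=2):
        row_count += 1
        # Ragged rows usually mean an unescaped delimiter in the export.
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, found {len(row)}")
    if row_count > max_rows:
        problems.append(f"{row_count} rows exceeds limit of {max_rows}")
    return problems


# Hypothetical one-row export.
sample = io.StringIO("ClientAccountNumber,ClientName,TaxID\nMF-12345678,John Smith,123-45-6789\n")
print(check_source_table(sample))
```

For a real export, pass an open file handle instead of the in-memory sample.
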

💡 Exam tip: EDM hashing is done locally

A common exam question tests whether you understand where hashing happens. The EDM Upload Agent hashes your sensitive data on-premises (or on your designated server) before uploading. Plaintext sensitive data is never sent to the Microsoft cloud. Only SHA-256 hashes are uploaded.

This is a critical data residency and privacy feature. If a question asks about the security of EDM, remember: hashing is local, only hashes are stored in the cloud.

Document fingerprinting — detect forms by structure

Document fingerprinting converts a blank form or template into a SIT based on its text structure (the “word pattern”). Any document that matches the template’s structure triggers detection.

How it works

  1. Upload a blank template — e.g., a blank patent application, tax form, or new hire form
  2. Purview analyses the word pattern — it identifies the unique combination of text elements that define the form’s structure
  3. Creates a SIT — the fingerprint becomes a SIT you can use in DLP policies
  4. Detection — when any document matches the word pattern, it’s flagged
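
To build intuition for word-pattern matching, here is a deliberately simplified sketch: it treats the template's set of words as the "fingerprint" and measures how much of it reappears in a candidate document. Purview's real fingerprinting algorithm is not described in this guide, so the set-overlap approach, the function names, and the 0.8 threshold are all assumptions for illustration.

```python
import re


def word_pattern(text):
    """Return the set of lowercase alphabetic words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def template_coverage(template_text, document_text):
    """Fraction of the template's words that also appear in the document."""
    template_words = word_pattern(template_text)
    if not template_words:
        return 0.0
    shared = template_words & word_pattern(document_text)
    return len(shared) / len(template_words)


def matches_template(template_text, document_text, threshold=0.8):
    """True when enough of the template's wording is present in the document."""
    return template_coverage(template_text, document_text) >= threshold


# Hypothetical blank form, completed form, and unrelated email.
template = "Patient Intake Form\nFull name:\nDate of birth:\nInsurance provider:\nMedical history:"
filled = ("Patient Intake Form\nFull name: Ana Ruiz\nDate of birth: 1990-04-12\n"
          "Insurance provider: Acme Health\nMedical history: asthma")
email = "Hi team, lunch is at noon on Friday"

print(matches_template(template, filled))
print(matches_template(template, email))
```

The completed form keeps every field label from the blank template, so it matches even though the answers differ; the free-form email shares none of the labels.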

Good candidates for fingerprinting

Document Type | Why It Works
Tax forms | Standardised structure with consistent field labels
Patent applications | Specific sections, headers, and legal language
Insurance claim forms | Predictable layout and terminology
New hire paperwork | Standard fields across all employees
Regulatory filings | Mandated structure and section headings

Not good candidates

Document Type | Why It Doesn’t Work
Free-form emails | No consistent structure to fingerprint
Meeting notes | Highly variable format
Source code | Structure varies too much between projects
Presentations | Slides have inconsistent layouts

💡 Scenario: Dr. Liam fingerprints patient intake forms

St. Harbour Health uses a standardised patient intake form across all clinics. Dr. Liam wants DLP to detect when completed intake forms are emailed externally — because they contain patient identifiers, medical history, and insurance details.

He uploads the blank intake form template as a document fingerprint. Now, whenever a completed version of the form is attached to an external email, DLP blocks it with a policy tip: “This appears to be a patient intake form. External sharing of patient data requires approval.”

OCR — see text in images

Optical character recognition extracts text from images and scanned PDFs so SITs can evaluate them.

What OCR enables

Without OCR, a credit card number photographed on a desk is invisible to DLP. With OCR enabled, Purview extracts the text from the image and runs SIT evaluation on it.
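
The "extract text, then evaluate SITs" flow can be sketched as a two-stage pipeline. In the example below the OCR stage is a stand-in function that returns canned text, and the SIT stage is a simple credit card check (digit pattern plus Luhn checksum); both are assumptions chosen for illustration and do not reflect how Purview implements either stage.

```python
import re


def luhn_valid(number: str) -> bool:
    """Validate a string of digits with the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str):
    """Return digit strings of 13 to 19 digits that pass the Luhn check."""
    hits = []
    for candidate in re.findall(r"\d(?:[ -]?\d){12,18}", text):
        digits = re.sub(r"\D", "", candidate)
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def scan_image(image_bytes, extract_text, ocr_enabled):
    """Run SIT evaluation on an image only when OCR is switched on."""
    if not ocr_enabled:
        return []  # the image is opaque to detection without OCR
    return find_card_numbers(extract_text(image_bytes))


def fake_ocr(image_bytes):
    """Hypothetical stand-in for a real OCR engine."""
    return "Card on desk: 4111 1111 1111 1111 exp 12/29"


print(scan_image(b"", fake_ocr, ocr_enabled=True))
print(scan_image(b"", fake_ocr, ocr_enabled=False))
```

With `ocr_enabled=False` the same image yields nothing, which is the gap OCR is meant to close.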

Where OCR works

Location | OCR Available?
Exchange Online | Yes: images in email bodies and attachments
SharePoint Online | Yes: images and scanned PDFs in document libraries
OneDrive for Business | Yes: personal file storage
Teams | Yes: images shared in chats and channels
Endpoints | Yes: scanned documents on devices

How to enable OCR

OCR is configured in the Microsoft Purview portal under Settings → Optical character recognition (OCR).

Key configuration points:

  • OCR must be explicitly enabled — it is not on by default
  • Specify which locations should use OCR scanning
  • OCR consumes additional processing — enable only where needed
  • Supports 149+ languages for text extraction
  • Requires Azure AI Services (billed separately for high volumes)

💡 Exam tip: OCR prerequisites

The exam may ask about OCR prerequisites. Key facts:

  • OCR requires an Azure subscription with Azure AI Services
  • There is a monthly free tier (5,000 images) before billing starts
  • OCR must be enabled in Purview settings — it is NOT on by default
  • OCR applies to DLP, auto-labeling, and SIT evaluation
  • If OCR is not enabled and sensitive data is only in images, DLP will NOT detect it
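
The free-tier arithmetic is simple enough to check in code. This sketch uses the 5,000-image figure quoted above and reports only how many images would fall outside it; no per-image price is assumed, and `billable_images` is a hypothetical helper.

```python
FREE_IMAGES_PER_MONTH = 5_000  # free-tier figure quoted in the exam tip above


def billable_images(scanned: int, free: int = FREE_IMAGES_PER_MONTH) -> int:
    """Number of images in a month that exceed the free tier."""
    if scanned < 0:
        raise ValueError("scanned must be non-negative")
    return max(0, scanned - free)


print(billable_images(3_200))
print(billable_images(12_500))
```
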

Three advanced detection methods that fill gaps left by regex-based SITs

Feature | Exact Data Match (EDM) | Document Fingerprinting | OCR
What it detects | Specific values from your database | Documents matching a template structure | Text within images and scanned PDFs
Best for | Known data (customer lists, employee records) | Standardised forms (tax, HR, insurance) | Sensitive data captured as images or scans
Detection method | Hash comparison against uploaded data | Word pattern matching against template | Text extraction then SIT evaluation
False positive rate | Very low: matches exact values | Low: form structure is unique | Depends on image quality and SIT accuracy
Setup effort | Medium: schema + data + upload agent | Low: upload a blank template | Low: enable in Purview settings
Maintenance | Regular data refresh (daily/weekly) | Update template if form changes | Minimal: runs automatically once enabled

Question

In exact data match, what is the difference between a primary element and a corroborative element?

Answer

A primary element must match for detection (e.g., Account Number, SSN). A corroborative element is supporting evidence that increases confidence (e.g., Name, Date of Birth). At least one primary element must match for an EDM detection to trigger.

Question

Where does EDM hashing happen — in the cloud or on-premises?

Answer

On-premises (or on your designated server). The EDM Upload Agent hashes sensitive data locally using SHA-256 before uploading. Plaintext sensitive data is never sent to the Microsoft cloud.

Question

What type of document is a good candidate for document fingerprinting?

Answer

Standardised forms with a consistent structure — tax forms, patent applications, insurance claims, new hire paperwork, regulatory filings. Documents with highly variable formats (free-form emails, meeting notes, source code) are NOT good candidates.

Question

What prerequisite must be met before OCR can scan images for sensitive information types?

Answer

OCR must be explicitly enabled in Microsoft Purview settings (it is NOT on by default). It also requires an Azure subscription with Azure AI Services for processing. A free tier covers 5,000 images per month before billing starts.

Knowledge Check

Priya at Meridian Financial wants to ensure that DLP detects when actual client names and account numbers appear in documents — not just any 8-digit number that matches a pattern. She has a client database with 45,000 records. What classification method should she use?

Knowledge Check

Dr. Liam at St. Harbour Health discovers that nurses are photographing whiteboards containing patient information and sharing the photos via Teams. Current DLP policies do not detect this. What should Dr. Liam configure?

Next up: Trainable Classifiers: AI-Powered Detection — teach Microsoft Purview to recognise content by example.

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.