Guided by A Guide to Cloud

SC-401 Study Guide

Domain 1: Implement Information Protection

  • Know Your Data: Sensitive Info Types Free
  • Custom Sensitive Info Types: Build Your Own Free
  • EDM & Fingerprinting: Detect Exact Data
  • Trainable Classifiers: AI-Powered Detection Free
  • Sensitivity Labels: Create & Protect Free
  • Sensitivity Labels: Publish & Auto-Apply
  • Email Encryption: Lock Down Messages
  • Purview IP Client: Classify Files at Scale

Domain 2: Implement DLP and Retention

  • DLP Foundations: Stop Data Leaks
  • DLP Policies: Build, Manage & Extend
  • DLP: Precedence & Adaptive Protection
  • Endpoint DLP: Setup & Configuration
  • Endpoint DLP: Advanced Rules & Monitoring
  • Retention: Plan Your Data Lifecycle
  • Retention Labels: Publish & Auto-Apply
  • Retention: Policies, Precedence & Recovery

Domain 3: Manage Risks, Alerts, and Activities

  • Insider Risk: Foundations & Setup
  • Insider Risk: Policies & Indicators
  • Insider Risk: Investigate & Close Cases
  • Adaptive Protection: Risk Levels Meet DLP
  • Purview Audit: Investigate & Retain
  • Activity Explorer & Content Search
  • Alert Response: Purview, XDR & Cloud Apps
  • DSPM for AI: Setup & Controls
  • DSPM for AI: Policies & Monitoring

Domain 1: Implement Information Protection Free ⏱ ~12 min read

Know Your Data: Sensitive Info Types

Before you can protect sensitive data, you need to find it. Learn how Microsoft Purview uses sensitive information types to detect credit card numbers, patient IDs, tax file numbers, and anything else that matters to your organisation.

What are sensitive information types?

☕ Simple explanation

Think of a sniffer dog at an airport.

The dog does not read every bag tag or scan every passport. It sniffs for specific chemical signatures: explosives, drugs, currency. It knows exactly what pattern to look for, and when it detects a match, it alerts the handler.

Sensitive information types (SITs) work the same way for your data. They scan emails, documents, chats, and files looking for specific patterns: credit card numbers, tax IDs, patient records, passport numbers. When a SIT finds a match, it triggers a policy action: block it, warn the user, or log the event.

SITs are the foundation of everything in SC-401. Without them, DLP policies, sensitivity labels, and auto-labeling have nothing to detect.

Sensitive information types (SITs) are pattern-based classifiers in Microsoft Purview that identify sensitive content across Microsoft 365 workloads. They use a combination of regular expressions, keyword lists, checksum validation, proximity rules, and confidence levels to detect data like financial account numbers, government IDs, health records, and proprietary information.

SITs serve as the detection engine for multiple Purview features: DLP policies, sensitivity labels (auto-labeling), retention labels (auto-apply), Insider Risk Management, and Data Security Posture Management for AI. They work across Exchange Online, SharePoint, OneDrive, Teams, endpoints, and Power BI.

Microsoft provides 300+ built-in SITs covering common patterns across dozens of countries and industries. When built-in types don’t fit, you can create custom SITs, exact data match (EDM) classifiers, or trainable classifiers.

Why classification comes first

Every protection feature in Microsoft Purview follows the same sequence:

Know → Detect → Protect → Monitor

Step | What Happens | Purview Feature
Know | Understand what sensitive data your org handles | Risk assessment, data inventory
Detect | Find that data wherever it lives | Sensitive information types, classifiers
Protect | Apply controls: labels, encryption, DLP | Sensitivity labels, DLP policies
Monitor | Track what’s happening to sensitive data | Activity Explorer, Content Explorer, Audit

SITs handle step 2. Without detection, protection is guesswork.

💡 Scenario: Priya's classification challenge

Priya Kapoor is the CISO at Meridian Financial, a 3,000-person investment bank. A recent audit found that trading floor analysts were emailing spreadsheets containing client account numbers and tax IDs to personal email addresses.

Before she can create DLP policies to block this, Priya needs to answer: what exactly counts as sensitive data at Meridian?

Her list includes: client account numbers (custom 8-digit format), tax file numbers (country-specific), credit card numbers, SWIFT codes, and internal deal codes. Some are covered by Microsoft’s built-in SITs. Others need custom definitions.

Built-in vs custom sensitive info types

Microsoft ships over 300 built-in SITs that cover common patterns worldwide. But most organisations also have unique data formats.

Built-in SITs cover common patterns; custom SITs fill the gaps

Feature | Built-in SITs | Custom SITs
Created by | Microsoft (shipped with every tenant) | Your admin team (you define the pattern)
Examples | Credit card number, SSN, passport number, IBAN, tax ID | Employee ID (EMP-XXXXX), internal project codes, custom account numbers
Detection method | Regex + keyword + checksum + proximity | Regex + keyword (you define the pattern)
Editable? | No: you cannot modify built-in definitions | Yes: full control over patterns, keywords, confidence
Country-specific? | Yes: many SITs are region-specific (e.g., Australia Tax File Number) | You decide: create for any region or format
Confidence levels | Pre-configured (low, medium, high) | You define confidence levels based on supporting evidence

How a SIT detects sensitive data

Every SIT uses a combination of techniques to reduce false positives:

1. Primary pattern (regex)

The main pattern that identifies the data. For a credit card number, this is typically a 16-digit number with specific spacing rules (some card brands use other lengths).
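
As a rough illustration of what a primary pattern looks like, here is a minimal Python sketch. The regex and the function name `find_card_candidates` are assumptions made for this example; it covers only the 16-digit case and is not the pattern Microsoft Purview actually uses.

```python
import re

# Illustrative primary pattern: four groups of four digits, separated
# consistently by nothing, a space, or a hyphen. The backreference \1
# forces the same separator between every group.
CARD_PATTERN = re.compile(r"\b\d{4}([ -]?)\d{4}\1\d{4}\1\d{4}\b")


def find_card_candidates(text: str) -> list[str]:
    """Return every substring that looks like a 16-digit card number."""
    return [match.group(0) for match in CARD_PATTERN.finditer(text)]


print(find_card_candidates("Card: 4532 0123 4567 8901 exp 12/27"))
```

A match here only means the text has the right shape. The remaining techniques decide how much to trust it.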

2. Supporting evidence (keywords)

Keywords near the pattern that increase confidence. Finding “4532 0123 4567 8901” near the word “Visa” or “card number” is stronger evidence than the number alone.

3. Checksum validation

Mathematical checks that confirm the number is structurally valid. Credit card numbers use the Luhn algorithm, so not every 16-digit number is a valid card number.
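
The Luhn check itself is short enough to sketch. The function name `luhn_valid` is chosen for this example, and the sample input is a widely used test card number, not real card data.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore spaces, hyphens, and any other non-digit characters.
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```

Changing a single digit breaks the checksum, which is why this step filters out many random 16-digit numbers such as order IDs.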

4. Proximity rules

How close the supporting evidence must be to the primary pattern. Keywords within 300 characters of the number score higher than keywords 1,000 characters away.
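
A proximity rule can be pictured as a character window around the match. The sketch below is a simplified yes/no check; the function name `keyword_within` and its parameters are assumptions for this example, and the real engine's scoring is more nuanced than this.

```python
def keyword_within(text: str, match_start: int, match_end: int,
                   keywords: list[str], window: int = 300) -> bool:
    """Return True if any keyword appears within `window` characters
    before or after the matched span."""
    lo = max(0, match_start - window)
    hi = min(len(text), match_end + window)
    context = text[lo:hi].lower()
    return any(keyword.lower() in context for keyword in keywords)


near = "Visa card number: 4111 1111 1111 1111"
start = near.index("4111")
print(keyword_within(near, start, len(near), ["visa", "card number"]))
```

With the default window of 300, a keyword sitting 1,000 characters away from the number falls outside the slice and does not count as supporting evidence.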

5. Confidence levels

Confidence | What It Means | Example
High (85-100%) | Strong match: multiple evidence elements found | 16-digit number + Luhn checksum + “Visa” keyword within 300 chars
Medium (75-84%) | Moderate match: some evidence present | 16-digit number + Luhn checksum, but no keywords nearby
Low (65-74%) | Weak match: pattern found but minimal context | 16-digit number alone, no checksum validation
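
To see how the techniques combine into the three rows of this table, here is a minimal sketch. Everything in it is an assumption for illustration: the regex, the keyword list, the function names, and the choice to score a match that fails the checksum as low even when a keyword is nearby. It mirrors the table, not Purview's actual engine.

```python
import re

CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")
KEYWORDS = ("visa", "card number")  # hypothetical supporting keywords


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits in `number`."""
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit = digit * 2 - 9 if digit > 4 else digit * 2
        total += digit
    return bool(digits) and total % 10 == 0


def score_matches(text: str, window: int = 300) -> list[tuple[str, int]]:
    """Return (match, confidence) pairs using the table's three levels."""
    results = []
    for match in CARD_PATTERN.finditer(text):
        if not luhn_valid(match.group(0)):
            level = 65  # pattern only
        else:
            start = max(0, match.start() - window)
            context = text[start:match.end() + window].lower()
            # Checksum passed: a nearby keyword lifts medium to high.
            level = 85 if any(k in context for k in KEYWORDS) else 75
        results.append((match.group(0), level))
    return results


print(score_matches("Visa card number: 4111 1111 1111 1111"))
```
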
💡 Exam tip: confidence levels and DLP

Confidence levels matter for DLP policy configuration. A DLP rule can trigger on high confidence only (fewer false positives, may miss some real data) or on medium and above (catches more, but more false alerts).

The exam tests whether you understand this trade-off. If a question asks how to reduce false positives in a DLP policy, increasing the required confidence level is often the answer.
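
The trade-off can be made concrete with a toy example. The detections below are invented for illustration (including the `is_real` ground-truth flag, which a real policy never has), and the names `Detection` and `summarise` are assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: int  # 85 = high, 75 = medium, 65 = low
    is_real: bool    # ground truth, known only in this toy example


def summarise(detections: list[Detection], min_confidence: int) -> dict[str, int]:
    """Count alerts, false positives, and missed real data at a threshold."""
    hits = [d for d in detections if d.confidence >= min_confidence]
    return {
        "alerts": len(hits),
        "false_positives": sum(1 for d in hits if not d.is_real),
        "missed": sum(1 for d in detections
                      if d.is_real and d.confidence < min_confidence),
    }


SAMPLE = [
    Detection("card with 'Visa' keyword", 85, True),
    Detection("card with no nearby keyword", 75, True),
    Detection("order number that passes Luhn", 75, False),
]

print(summarise(SAMPLE, 75))  # medium and above
print(summarise(SAMPLE, 85))  # high only
```

Raising the threshold from 75 to 85 removes the false positive in this sample, but it also stops alerting on the real card number that had no keyword nearby.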

Identifying sensitive information requirements

Before you touch Purview, you need to map your organisation’s data landscape:

Step 1: Inventory your sensitive data

Work with legal, compliance, HR, and business units to identify:

  • Regulatory requirements: GDPR personal data, HIPAA PHI, PCI-DSS cardholder data, SOX financial data
  • Industry standards: banking account formats, medical record numbers, insurance claim IDs
  • Internal policies: employee IDs, project codes, deal names, salary data

Step 2: Map to built-in SITs

For each data type, check if Microsoft already provides a built-in SIT:

  • Go to Microsoft Purview portal → Data classification → Sensitive info types
  • Search by name or country
  • Review the pattern definition and test against sample data

Step 3: Identify gaps

Any data type not covered by built-in SITs needs one of:

  • Custom SIT: for pattern-based data (Module 2)
  • EDM classifier: for exact matches against a database (Module 3)
  • Trainable classifier: for content that’s hard to define by pattern, like contracts or resumes (Module 4)

💡 Scenario: Dr. Liam's healthcare classification

Dr. Liam Chen is the IT Security Manager at St. Harbour Health, a 5,000-person healthcare network. His classification needs include:

  • Protected health information (PHI): covered by built-in SITs for many countries
  • Medicare numbers: built-in SIT available (country-specific)
  • Internal Medical Record Numbers (MRN-XXXXXXX): NOT covered. Needs a custom SIT.
  • Clinical trial data: too varied for regex. Needs a trainable classifier.
  • Prescription data: combination of drug names + patient info. Needs EDM matching.

Liam creates a classification plan that uses all four approaches: built-in SITs for standard patterns, custom SITs for internal formats, EDM for prescription data, and trainable classifiers for unstructured clinical content.
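
The gap analysis from Step 3 can be sketched as a small decision helper applied to Liam's list. The function `choose_method`, its three yes/no inputs, and the order in which they are checked are assumptions for this example; in practice the choice involves more judgment than three flags.

```python
from enum import Enum


class Method(Enum):
    BUILT_IN_SIT = "built-in SIT"
    CUSTOM_SIT = "custom SIT"
    EDM = "EDM classifier"
    TRAINABLE = "trainable classifier"


def choose_method(has_built_in: bool, exact_records: bool,
                  pattern_based: bool) -> Method:
    """Pick a classification method for one data type."""
    if has_built_in:
        return Method.BUILT_IN_SIT   # Microsoft already ships a SIT
    if exact_records:
        return Method.EDM            # match against a known database
    if pattern_based:
        return Method.CUSTOM_SIT     # a regex can describe the format
    return Method.TRAINABLE          # too varied to define by pattern


# Liam's data types: (has_built_in, exact_records, pattern_based)
plan = {
    "Medicare numbers": choose_method(True, False, True),
    "Medical Record Numbers": choose_method(False, False, True),
    "Clinical trial data": choose_method(False, False, False),
    "Prescription data": choose_method(False, True, False),
}

for data_type, method in plan.items():
    print(f"{data_type}: {method.value}")
```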

Where SITs are used across Purview

SITs don’t work alone. They’re the shared detection engine across multiple features:

Purview Feature | How It Uses SITs
DLP policies | Conditions that trigger block/warn/audit actions
Sensitivity labels (auto-labeling) | Automatically apply labels when SITs are detected
Retention labels (auto-apply) | Automatically retain or dispose content containing SITs
Insider Risk Management | Detect when users interact with SIT-matching content
Content Explorer | Browse and inspect documents that match SITs
DSPM for AI | Monitor what sensitive data AI services can access

Question

What are the three main techniques a SIT uses to detect sensitive data?

Answer

1. Primary pattern (regex): matches the data format. 2. Supporting evidence (keywords): context near the pattern. 3. Checksum validation: mathematical verification that the data is structurally valid. Together with proximity rules and confidence levels, these reduce false positives.

Question

What is the difference between a built-in SIT and a custom SIT?

Answer

Built-in SITs are pre-configured by Microsoft (300+ types) and cannot be modified. Custom SITs are created by your admin team with your own regex patterns, keywords, and confidence levels, and are used for organisation-specific data formats.

Question

If a 16-digit number is found in a document but no keywords like 'Visa' or 'card number' appear nearby, what confidence level would a credit card SIT typically assign?

Answer

Medium confidence. The pattern matches and the Luhn checksum passes, but the absence of supporting keywords reduces confidence from high to medium.

Knowledge Check

Priya at Meridian Financial discovers that trading analysts are emailing client account numbers (a custom 8-digit format: MF-XXXXXX) to personal addresses. She wants DLP to detect these. Which approach should she take?

Knowledge Check

Dr. Liam at St. Harbour Health needs to classify three types of data: standard Medicare numbers, internal Medical Record Numbers (MRN-XXXXXXX), and unstructured clinical trial documents. Which combination of classification methods should he use?

Knowledge Check

A DLP policy at Meridian Financial is generating too many false positive alerts for credit card numbers. The policy currently triggers on medium confidence matches. What should Priya do to reduce false positives?


Next up: Custom Sensitive Info Types: Build Your Own. Create your own detection patterns for organisation-specific data.


© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.