Guided by A Guide to Cloud

MS-102 Study Guide

Domain 1: Deploy and Manage a Microsoft 365 Tenant

  • Establish and Configure Your M365 Tenant
  • Monitor Tenant Health and Network Readiness
  • Adoption Tracking and Microsoft 365 Backup
  • Manage Users, Contacts and External Identities
  • Groups, Shared Mailboxes and Licensing at Scale
  • Automate with PowerShell: Bulk User Operations
  • Roles, Role Groups and Workload Permissions
  • Delegate with Administrative Units and PIM

Domain 2: Implement and Manage Microsoft Entra Identity and Access

  • Prepare for Identity Synchronization
  • Implement Connect Sync and Cloud Sync
  • Monitor and Troubleshoot Identity Sync
  • Authentication Methods and Self-Service Password Reset
  • Password Protection and Authentication Troubleshooting
  • Entra Identity Protection and Risk Policies
  • Conditional Access and MFA Enforcement

Domain 3: Manage Security and Threats by Using Microsoft Defender XDR

  • Defender XDR: Security Posture and Threat Intelligence
  • Investigate Incidents with Advanced Hunting
  • Defender for Office 365: Threat Policies
  • Email Threats, Attack Simulation and Restricted Entities
  • Defender for Endpoint: Onboard and Protect
  • Vulnerability Management
  • Defender for Cloud Apps: Connect and Govern
  • Cloud App Discovery and Activity Monitoring

Domain 4: Manage Compliance by Using Microsoft Purview

  • Sensitive Information Types and Data Classification
  • Retention Labels and Data Lifecycle
  • Sensitivity Labels and Monitoring
  • DLP Policies Across M365 Workloads
  • Endpoint DLP and Alert Response

Domain 4: Manage Compliance by Using Microsoft Purview ⏱ ~14 min read

Sensitive Information Types and Data Classification

Create and manage sensitive information types using keywords, keyword lists, and regular expressions to automatically identify and classify sensitive data.

The foundation of data protection

☕ Simple explanation

Before you can protect sensitive data, you need to find it. Sensitive information types (SITs) are the search patterns that tell Microsoft Purview what to look for.

Think of SITs like the scanners at airport customs. You teach the scanner to recognise passport numbers, credit card numbers, and medical records. Once it knows what sensitive data looks like, it can flag that data automatically, whether it’s in an email, a SharePoint document, or a Teams message.

Microsoft provides 300+ built-in SITs. For industry-specific data (patient IDs, internal codes), you create custom SITs.

Sensitive information types (SITs) are pattern-based classifiers in Microsoft Purview that identify sensitive data across M365 workloads. They power DLP policies, automatic application of sensitivity and retention labels, and data classification reporting.

Each SIT defines:

  • Primary pattern — a regular expression, keyword list, or function that matches the data format
  • Corroborative evidence — supporting keywords or patterns that increase confidence (e.g., “credit card” near a 16-digit number)
  • Confidence level — Low, Medium, or High based on how many evidence elements match
  • Proximity — how close the supporting evidence must be to the primary pattern (default: 300 characters)
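To make proximity concrete, here is a minimal Python sketch of how corroborative evidence near a primary match might be collected. It assumes the window runs `proximity` characters before the match start and after its end, with case-insensitive substring matching; the function name `evidence_in_proximity` is hypothetical, and Purview's internal window logic may differ.

```python
import re

def evidence_in_proximity(text: str, match: re.Match, keywords: list[str],
                          proximity: int = 300) -> set[str]:
    """Return the supporting keywords found within `proximity` characters of a match."""
    # Assumed window: `proximity` chars before the match start and after its end.
    lo = max(0, match.start() - proximity)
    hi = match.end() + proximity
    window = text[lo:hi].lower()
    return {kw for kw in keywords if kw.lower() in window}

doc = "Patient MG-12345678 admitted." + " " * 400 + "MedGuard"
m = re.search(r"MG-\d{8}", doc)
print(evidence_in_proximity(doc, m, ["patient", "MedGuard"]))  # {'patient'}
```

“MedGuard” sits more than 300 characters after the ID, so only “patient” counts as evidence. Widening the window would pick it up, at the cost of accepting looser context.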

Built-in vs custom SITs

Microsoft provides 300+ built-in SITs covering common data types globally:

| Category | Examples | Detection method |
|---|---|---|
| Financial | Credit card numbers, bank account numbers, SWIFT codes | Pattern (plus Luhn checksum for card numbers) + keywords |
| Health | Medical record numbers, drug names, ICD codes | Pattern + medical keyword lists |
| Identity | SSN, passport numbers, driver’s licence | Country-specific patterns + keywords |
| IT | Azure storage keys, connection strings | Pattern matching |
| Regional | NZ IRD numbers, Australian TFN, UK NINO | Country-specific formats |
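The Financial row relies on a checksum as well as a pattern: a 16-digit string that fails the Luhn check is unlikely to be a real card number, so the pattern can stay broad without flooding DLP with false positives. The sketch below is a standalone Luhn implementation for illustration, not Purview's internal code.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn (mod 10) checksum."""
    digits = [int(c) for c in number if c.isdigit()]  # ignore spaces and dashes
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit and subtracting 9 if it exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111 1111 1111 1111"))  # True (a well-known test card number)
print(luhn_valid("4111 1111 1111 1112"))  # False
```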

When you need custom SITs

Elena needs to detect MedGuard Health-specific data that no built-in SIT covers:

| Data type | Format | Why custom |
|---|---|---|
| Patient ID | MG- followed by 8 digits (e.g., MG-12345678) | Company-specific format |
| Internal drug codes | 3 letters + 4 digits (e.g., ASP1234) | Internal classification system |
| Referring doctor codes | DR- + 6 digits | Internal referral system |
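Before building anything in the portal, it can help to test candidate regexes locally. This Python sketch encodes the three MedGuard formats from the table as hypothetical patterns. It assumes drug codes use uppercase letters, and it adds `\b` word boundaries so an ID embedded in a longer token is not matched.

```python
import re

# Hypothetical regexes for the three custom formats (uppercase drug codes assumed).
# \b word boundaries stop matches inside longer tokens such as "MG-1234567890".
PATTERNS = {
    "patient_id": re.compile(r"\bMG-\d{8}\b"),
    "drug_code": re.compile(r"\b[A-Z]{3}\d{4}\b"),
    "referring_doctor": re.compile(r"\bDR-\d{6}\b"),
}

def find_custom_data(text: str) -> dict[str, list[str]]:
    """Return every match for each custom format found in `text`."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}

sample = "Referral DR-123456 for MG-12345678, prescribed ASP1234. Ignore MG-1234567890."
print(find_custom_data(sample))
# {'patient_id': ['MG-12345678'], 'drug_code': ['ASP1234'], 'referring_doctor': ['DR-123456']}
```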

Creating custom SITs

Custom SITs are created in the Microsoft Purview portal (purview.microsoft.com) under Information Protection > Classifiers > Sensitive info types.

Method 1: Keyword-based SIT

For simple text matching:

| Component | Example |
|---|---|
| Keyword list | “patient record”, “medical history”, “diagnosis report”, “treatment plan” |
| Case sensitive | No (recommended for most scenarios) |
| Word match | Whole word (prevents false positives from partial matches) |
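The word-match setting is easiest to see side by side. This sketch (the `keyword_hits` helper is hypothetical) compares case-insensitive whole-word matching with plain substring matching on the keyword list above.

```python
import re

KEYWORDS = ["patient record", "medical history", "diagnosis report", "treatment plan"]

def keyword_hits(text: str, keywords: list[str], whole_word: bool = True) -> list[str]:
    """Return the keywords found in `text`, ignoring case."""
    hits = []
    for kw in keywords:
        pattern = re.escape(kw)
        if whole_word:
            # Word boundaries stop "treatment plan" matching inside "treatment planning".
            pattern = r"\b" + pattern + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(kw)
    return hits

note = "Draft for Treatment Planning committee; see Medical History attached."
print(keyword_hits(note, KEYWORDS))                    # ['medical history']
print(keyword_hits(note, KEYWORDS, whole_word=False))  # ['medical history', 'treatment plan']
```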

Method 2: Regular expression SIT

For structured data patterns:

| Component | Example |
|---|---|
| Primary pattern | MG-\d{8} (matches MG- followed by exactly 8 digits) |
| Supporting keywords | “patient”, “record”, “MedGuard” (within 300 characters) |
| Confidence levels | High: pattern + 2 keywords. Medium: pattern + 1 keyword. Low: pattern only. |
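The confidence tiers in the table can be sketched as a small classifier. This is an illustrative model only: it reuses the table's pattern (with `\b` word boundaries added), keywords, and 300-character proximity, and it treats “2 keywords” as “2 or more distinct keywords”. Real SITs define the required evidence per pattern inside Purview itself.

```python
import re

PRIMARY = re.compile(r"\bMG-\d{8}\b")           # primary pattern, word boundaries added
SUPPORTING = ["patient", "record", "medguard"]  # compared case-insensitively
PROXIMITY = 300                                 # characters either side of the match

def classify(text: str) -> list[tuple[str, str]]:
    """Return (match, confidence) pairs: High = pattern + 2 or more keywords,
    Medium = pattern + 1 keyword, Low = pattern only."""
    results = []
    for m in PRIMARY.finditer(text):
        window = text[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY].lower()
        found = sum(1 for kw in SUPPORTING if kw in window)
        if found >= 2:
            level = "High"
        elif found == 1:
            level = "Medium"
        else:
            level = "Low"
        results.append((m.group(), level))
    return results

print(classify("MedGuard patient MG-12345678"))   # [('MG-12345678', 'High')]
print(classify("Ref MG-87654321 in the record"))  # [('MG-87654321', 'Medium')]
print(classify("MG-11112222"))                    # [('MG-11112222', 'Low')]
```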

Method 3: Keyword dictionary

For large keyword lists (up to 1 MB post-compression):

  • Import from a file (one term per line)
  • Useful for lists of drug names, medical terms, internal project codes
  • More efficient than keyword lists for large volumes
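A rough sketch of why dictionaries scale: load the file once into a set, then match content with hash lookups instead of scanning the text once per term. The file name and helpers are hypothetical, this is not how Purview stores dictionaries internally, and for brevity the matching handles single-word terms only.

```python
import tempfile
from pathlib import Path

def load_dictionary(path: Path) -> set[str]:
    """Read one term per line, lower-case each term, and skip blank lines."""
    with path.open(encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def dictionary_hits(text: str, terms: set[str]) -> set[str]:
    """Single-word terms only: a set intersection uses hash lookups, not a scan per term."""
    tokens = {t.strip(".,;:()").lower() for t in text.split()}
    return tokens & terms

with tempfile.TemporaryDirectory() as tmp:
    dict_file = Path(tmp) / "drug_names.txt"  # hypothetical file name
    dict_file.write_text("Aspirin\nIbuprofen\n\nParacetamol\n", encoding="utf-8")
    terms = load_dictionary(dict_file)

print(sorted(dictionary_hits("Patient given ibuprofen, then paracetamol.", terms)))
# ['ibuprofen', 'paracetamol']
```

Because the terms live in a file, a quarterly update means replacing the file and re-importing it, rather than editing an inline list.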
💡 Exam tip: Confidence levels and false positives

The exam tests your understanding of confidence levels and their impact on DLP:

  • High confidence: primary pattern + several pieces of supporting evidence. Few false positives, but may miss some real data.
  • Medium confidence: primary pattern + some supporting evidence. A balance between the two.
  • Low confidence: primary pattern alone. Catches the most data, but with more false positives.

DLP policies can be configured to act on different confidence levels. For example: high confidence → block, medium confidence → warn, low confidence → log only. If the exam asks “Elena’s DLP policy is blocking too many legitimate emails,” the likely answer is to raise the confidence level the blocking rule requires.
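The tiered policy in the exam tip can be modelled as a function from confidence to action. In this hypothetical sketch, `block_at` stands in for the confidence level a blocking rule requires; raising it from Medium to High turns medium-confidence matches into warnings instead of blocks, which is the fix the exam scenario is looking for.

```python
from enum import IntEnum

class Confidence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def dlp_action(conf: Confidence, block_at: Confidence = Confidence.HIGH) -> str:
    """Map a match's confidence to an action (hypothetical three-tier policy)."""
    if conf >= block_at:
        return "block"
    if conf >= Confidence.MEDIUM:
        return "warn"
    return "log"

# Too strict: blocking already fires at Medium, so borderline matches are blocked.
print(dlp_action(Confidence.MEDIUM, block_at=Confidence.MEDIUM))  # block
# Fixed: blocking requires High, so the same match only warns.
print(dlp_action(Confidence.MEDIUM))                              # warn
```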

Exact Data Match (EDM)

For the highest accuracy, EDM-based SITs match against your actual data:

  1. Upload a hashed version of your sensitive data (e.g., actual patient IDs from your database)
  2. Purview matches content against the hashed data
  3. Far fewer false positives: it only flags values that actually exist in your database, not lookalike test data

Elena uses EDM for patient IDs — instead of matching the pattern MG-\d{8} (which might match test data or random numbers), EDM matches only actual patient IDs from MedGuard’s patient database.
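A conceptual sketch of the EDM idea in Python: hash the real values with a salt, keep only the hashes, then hash each candidate found by the pattern and look it up. In practice Microsoft's EDM Upload Agent handles hashing and upload, and the schema and matching rules live in Purview; the salt handling, normalisation, and sample IDs here are assumptions for illustration.

```python
import hashlib
import secrets

def hash_value(value: str, salt: bytes) -> str:
    """Normalise a value (trim, upper-case) and return its salted SHA-256 hex digest."""
    return hashlib.sha256(salt + value.strip().upper().encode("utf-8")).hexdigest()

# Step 1: hash the real IDs locally; only the hashes are kept (sample IDs are made up).
salt = secrets.token_bytes(16)
patient_db = ["MG-12345678", "MG-87654321"]
hashed_table = {hash_value(pid, salt) for pid in patient_db}

# Step 2: hash each candidate the same way and check membership.
def edm_match(candidate: str) -> bool:
    """True only if the candidate is an ID that exists in the hashed table."""
    return hash_value(candidate, salt) in hashed_table

print(edm_match("MG-12345678"))  # True: a real patient ID
print(edm_match("MG-00000000"))  # False: fits MG-\d{8} but is not in the database
```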

ℹ️ Deep dive: Trainable classifiers

Beyond pattern-based SITs, Microsoft Purview also offers trainable classifiers — machine learning models trained to recognise content types:

  • Pre-trained classifiers — resumes, source code, financial statements, legal documents
  • Custom trainable classifiers — trained with your own sample documents

Trainable classifiers work on content understanding (not just patterns) and are useful for unstructured data. The exam may ask about the difference: SITs match patterns, trainable classifiers match content types.

Key concepts to remember

Question

What three components make up a sensitive information type?


Answer

1. Primary pattern (regex, keyword list, or function). 2. Corroborative evidence (supporting keywords within proximity). 3. Confidence level (Low/Medium/High based on how many evidence elements match). Higher confidence = fewer false positives.


Question

What is the difference between a keyword list and a keyword dictionary in Purview?


Answer

Keyword lists are small, inline collections of terms defined directly in the SIT. Keyword dictionaries are large, file-based collections (up to 1 MB post-compression) imported from a text file. Use dictionaries for drug names, medical terms, or other large reference lists.


Question

What is Exact Data Match (EDM) and when should you use it?


Answer

EDM-based SITs match content against a hashed copy of your actual sensitive data (e.g., real patient IDs from your database). This virtually eliminates false positives because it only matches values that exist in your records. Use it for high-value data where false positives are unacceptable.


Knowledge check

Knowledge Check

Elena creates a custom SIT using a regex pattern to match MedGuard patient IDs (format: MG- followed by 8 digits). The DLP policy using this SIT generates many false positives from test documents containing similar patterns. What should Elena do to reduce false positives?

Knowledge Check

Dev needs to create a SIT that detects drug names for a pharmaceutical client. The client has a list of 15,000 drug names that changes quarterly. What is the most efficient approach?



Next up: Retention Labels and Data Lifecycle — keeping data for as long as you need it, and disposing of it when you don’t.


Guided

I learn, I simplify, I share.

A Guide to Cloud YouTube Feedback

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.