
SC-401 Study Guide

Domain 1: Implement Information Protection

  • Know Your Data: Sensitive Info Types Free
  • Custom Sensitive Info Types: Build Your Own Free
  • EDM & Fingerprinting: Detect Exact Data
  • Trainable Classifiers: AI-Powered Detection Free
  • Sensitivity Labels: Create & Protect Free
  • Sensitivity Labels: Publish & Auto-Apply
  • Email Encryption: Lock Down Messages
  • Purview IP Client: Classify Files at Scale

Domain 2: Implement DLP and Retention

  • DLP Foundations: Stop Data Leaks
  • DLP Policies: Build, Manage & Extend
  • DLP: Precedence & Adaptive Protection
  • Endpoint DLP: Setup & Configuration
  • Endpoint DLP: Advanced Rules & Monitoring
  • Retention: Plan Your Data Lifecycle
  • Retention Labels: Publish & Auto-Apply
  • Retention: Policies, Precedence & Recovery

Domain 3: Manage Risks, Alerts, and Activities

  • Insider Risk: Foundations & Setup
  • Insider Risk: Policies & Indicators
  • Insider Risk: Investigate & Close Cases
  • Adaptive Protection: Risk Levels Meet DLP
  • Purview Audit: Investigate & Retain
  • Activity Explorer & Content Search
  • Alert Response: Purview, XDR & Cloud Apps
  • DSPM for AI: Setup & Controls
  • DSPM for AI: Policies & Monitoring

Domain 1: Implement Information Protection (Free, ~13 min read)

Trainable Classifiers: AI-Powered Detection

When no regex can describe the content and no database of exact values exists to match against, trainable classifiers learn from examples to recognise contracts, resumes, source code, and other unstructured content.

What are trainable classifiers?

β˜• Simple explanation

Think about training a new security guard.

You cannot give them a checklist for every single threat β€” threats are too varied. Instead, you show them 50 examples of suspicious behaviour: β€œThis is what tailgating looks like. This is what a stolen badge scan looks like. This is what a social engineering attempt sounds like.”

After seeing enough examples, the guard learns to recognise the pattern β€” not by a fixed rule, but by understanding what these situations have in common.

Trainable classifiers work the same way. You feed Microsoft Purview dozens of example documents β€” contracts, resumes, financial statements β€” and the AI learns to recognise new documents that look similar. No regex needed.

Trainable classifiers use machine learning to classify content based on example documents rather than explicit patterns. They work for unstructured content where no single regex or keyword set can reliably identify the document type β€” contracts, resumes, source code, financial statements, harassment complaints, and intellectual property.

Microsoft provides pre-trained classifiers for common document types. You can also create custom trainable classifiers by providing positive examples (documents that ARE the target type) and optionally negative examples (documents that are NOT). After training and testing, the classifier can be used in DLP policies, auto-labeling, retention, and other Purview features.
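
To make "learn from examples rather than patterns" concrete, here is a toy Python sketch of example-based classification: a bag-of-words centroid built from positive examples, with cosine similarity as the match test. This is purely illustrative; Purview's actual model is far more sophisticated, and the threshold, seed texts, and function names here are invented for the demo.

```python
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words term frequencies for a document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(positive_examples):
    """'Training' here just sums the example vectors into one centroid."""
    centroid = Counter()
    for doc in positive_examples:
        centroid.update(vectorize(doc))
    return centroid

def matches(centroid, text, threshold=0.3):
    """A document 'matches' if it is similar enough to the centroid."""
    return cosine(centroid, vectorize(text)) >= threshold

# Invented mini seed set -- a real classifier needs 50+ documents.
contracts = [
    "this agreement is entered into by the parties hereto",
    "the parties agree to the terms of this agreement",
    "this contract binds both parties to the agreement terms",
]
model = train(contracts)
print(matches(model, "the parties hereto agree to this agreement"))  # True
print(matches(model, "quarterly sales figures rose in march"))       # False
```

The second document shares no vocabulary with the seed set, so it scores 0 and is rejected; no rule ever named the words that mattered, which is the point of the technique.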

Pre-trained vs custom trainable classifiers

Pre-trained classifiers are instant; custom classifiers are tailored
| Feature | Pre-trained Classifiers | Custom Trainable Classifiers |
| --- | --- | --- |
| Created by | Microsoft β€” ships with your tenant | Your admin team β€” trained on your examples |
| Examples | Resumes, source code, harassment, threats, profanity, financial statements, agreements | Whatever you train β€” clinical trial docs, board minutes, internal memos, R&D reports |
| Training needed? | No β€” ready to use immediately | Yes β€” you provide 50+ positive examples and test |
| Customisable? | No β€” you cannot modify their training | Yes β€” retrain if accuracy drops or content types evolve |
| Accuracy | Good for common types, may vary for niche content | Depends on training quality and example diversity |
| Use case | Quick classification of common document types | Organisation-specific content that no built-in classifier covers |

Key pre-trained classifiers

| Classifier | What It Detects |
| --- | --- |
| Agreements/Contracts | Legal agreements, NDAs, contracts |
| Resumes/CVs | Job applications and curriculum vitae |
| Source Code | Programming code in various languages |
| Financial Statements | Balance sheets, income statements, cash flow statements |
| Harassment | Offensive or harassing language |
| Threats | Threatening language toward people or property |
| Profanity | Vulgar or offensive language |
| Discrimination | Discriminatory language |
| Targeted Harassment | Offensive content directed at specific individuals |
| Customer Complaints | Content expressing dissatisfaction with products or services |

Creating a custom trainable classifier

When no pre-trained classifier fits, you build your own. The process has four stages:

Stage 1: Seed content (positive examples)

Collect at least 50 documents (ideally 200+) that ARE the target type. These must be representative examples β€” diverse in content but consistent in type.

| Requirement | Detail |
| --- | --- |
| Minimum count | 50 positive examples (200+ recommended for better accuracy) |
| Format | Must be uploaded to a SharePoint Online site |
| Quality | Must genuinely represent the content type β€” not just any random documents |
| Diversity | Include variety within the type (different authors, dates, topics) |
| Language | Examples should reflect the languages used in your organisation |
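
The count requirements above can be sketched as a simple pre-flight check. This is a hypothetical helper, not a Purview API; the real validation happens when you point the classifier at your SharePoint Online seed folder.

```python
MIN_SEED = 50           # documented minimum number of positive examples
RECOMMENDED_SEED = 200  # recommended for better accuracy

def check_seed_content(docs):
    """Hypothetical pre-flight check for a candidate seed set.

    `docs` stands in for the files you would upload to SharePoint Online;
    Purview performs the real validation, not your own code.
    """
    if len(docs) < MIN_SEED:
        return f"Need at least {MIN_SEED} examples, have {len(docs)}"
    if len(docs) < RECOMMENDED_SEED:
        return f"OK, but {RECOMMENDED_SEED}+ examples is recommended"
    return "OK"

print(check_seed_content(["protocol_%d.docx" % i for i in range(30)]))
# Need at least 50 examples, have 30
print(check_seed_content(["protocol_%d.docx" % i for i in range(250)]))
# OK
```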

Stage 2: Processing

After you submit the seed content, the classifier processes the examples and builds a prediction model. This takes 24-72 hours β€” there is no way to speed it up.

Stage 3: Testing

Provide both positive examples (more of the target type) and negative examples (documents that are NOT the target type). The classifier evaluates each and you review the results.

| Test Result | What It Means | Action |
| --- | --- | --- |
| True positive | Correctly identified as the target type | Good β€” no action needed |
| True negative | Correctly identified as NOT the target type | Good β€” no action needed |
| False positive | Incorrectly flagged as the target type | Mark as β€œNot a match” to improve the model |
| False negative | Missed a real example of the target type | Mark as β€œMatch” to improve the model |
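
The four outcomes above combine into the standard accuracy, precision, and recall metrics you can use to judge whether a classifier is ready to publish. A quick Python sketch; the counts are illustrative and happen to reproduce a 94% accuracy like the scenario below.

```python
def classifier_metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from the four test outcomes."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,                      # share of all verdicts that were correct
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,  # of items flagged, how many were real
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,     # of real items, how many were caught
    }

# Illustrative counts: 50 positive + 50 negative test docs,
# with 3 false positives and 3 false negatives.
m = classifier_metrics(tp=47, tn=47, fp=3, fn=3)
print(m["accuracy"], m["precision"], m["recall"])  # 0.94 0.94 0.94
```

Precision tells you how many false-positive cleanups to expect; recall tells you how much sensitive content slips through. A classifier can score well on one and poorly on the other, so test with both positive and negative examples.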

Stage 4: Publish

Once testing accuracy is acceptable, publish the classifier. It becomes available as a condition in DLP policies, auto-labeling rules, and retention labels β€” just like any SIT.

πŸ’‘ Scenario: Dr. Liam builds a clinical trial classifier

St. Harbour Health runs clinical trials. Trial documents vary widely β€” protocols, consent forms, adverse event reports, data collection forms β€” but they share characteristics: medical terminology, trial phase references, patient cohort language, regulatory citations.

No regex pattern can describe β€œa clinical trial document.” Dr. Liam collects 200 examples from the clinical research team, uploads them to SharePoint, creates a custom classifier, waits 48 hours for processing, then tests with 50 positive and 50 negative examples.

Results: 94% accuracy. He publishes the classifier and uses it in a DLP policy to prevent clinical trial documents from being shared externally without approval.

Retraining classifiers

Over time, document formats evolve. A classifier trained on 2024-era contracts may not recognise 2026-era contracts with AI-generated clauses. Microsoft Purview allows retraining:

  1. Add new positive and negative examples
  2. Submit for reprocessing (another 24-72 hours)
  3. Re-test and validate accuracy
  4. Republish

πŸ’‘ Exam tip: the 50-document minimum

The exam frequently tests the minimum requirements for trainable classifiers. Key numbers:

  • Seed content: minimum 50 positive examples (200+ recommended)
  • Testing: provide both positive AND negative examples
  • Processing time: 24-72 hours (not instant)
  • Location: seed content must be in SharePoint Online (not OneDrive, not local files)

If a question asks β€œwhat is the minimum number of positive examples for a custom trainable classifier?” the answer is 50.

Monitor classification: Data Explorer and Content Explorer

Once your SITs and classifiers are running, you need visibility into what they’re finding.

Content Explorer

Content Explorer lets you browse individual items that match a SIT or classifier. You can see exactly which documents contain sensitive data, where they live, and what was detected.

| Capability | What You Can Do |
| --- | --- |
| Browse by SIT/label | See all items matching a specific SIT or sensitivity label |
| View content | Open and inspect the actual document (with appropriate permissions) |
| Filter by location | Narrow to Exchange, SharePoint, OneDrive, or endpoints |
| Verify accuracy | Confirm that SITs are detecting the right content |

Who can access: Content Explorer Viewer role or Content Explorer List Viewer role (list viewer sees item count but cannot open content).

Data Explorer (Activity Explorer)

Data Explorer (also called Activity Explorer) shows what users are doing with classified data β€” a timeline of activities:

| Activity Type | What It Shows |
| --- | --- |
| Label applied | A sensitivity label was added to a document |
| Label changed | A sensitivity label was changed or removed |
| DLP policy matched | Content triggered a DLP rule |
| File copied to USB | Endpoint DLP detected a file copy to removable media |
| File uploaded to cloud | Content was uploaded to a cloud service |

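
If you export activity events for offline analysis, a quick tally shows which activity types dominate. The records and field names below are invented for illustration, not the real Purview export schema.

```python
from collections import Counter

# Hypothetical export of Activity Explorer-style events.
events = [
    {"activity": "Label applied", "user": "zara@atlasglobal.test"},
    {"activity": "DLP policy matched", "user": "marcus@atlasglobal.test"},
    {"activity": "File copied to USB", "user": "marcus@atlasglobal.test"},
    {"activity": "File copied to USB", "user": "liam@atlasglobal.test"},
]

# Count events per activity type to spot the riskiest behaviour.
by_activity = Counter(e["activity"] for e in events)
print(by_activity.most_common(1))  # [('File copied to USB', 2)]
```
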
πŸ’‘ Scenario: Zara audits Atlas Global's classification

Zara Okonkwo at Atlas Global just rolled out new SITs for employee data and project codes. After two weeks, she opens Content Explorer to check:

  • 65,000 items match the employee data SIT across SharePoint and OneDrive
  • 12,000 items match the project code SIT β€” but 2,000 are in a public SharePoint site
  • She drills into the public site items and finds project proposals that should be labelled Confidential

She then checks Activity Explorer and finds 45 instances where employees downloaded project documents to personal OneDrive β€” confirming she needs an Endpoint DLP policy next.

Question

What is the minimum number of positive examples required to create a custom trainable classifier?

Answer

50 positive examples minimum, though 200+ is recommended for better accuracy. Examples must be uploaded to a SharePoint Online site. Processing takes 24-72 hours.

Question

What is the difference between Content Explorer and Activity Explorer?

Answer

Content Explorer lets you browse individual items that match a SIT or sensitivity label β€” you can view the actual documents. Activity Explorer shows what users are doing with classified data β€” a timeline of label changes, DLP matches, file copies, and other activities.

Question

When should you use a trainable classifier instead of a custom SIT?

Answer

Use a trainable classifier when the content cannot be described by a regex pattern β€” contracts, resumes, financial statements, clinical documents. Custom SITs work for structured, pattern-based data (IDs, account numbers). Trainable classifiers work for unstructured content defined by shared characteristics.

Question

What Content Explorer role allows you to see item counts but NOT view actual document content?

Answer

Content Explorer List Viewer. This role can see how many items match a SIT or label, and which locations contain them, but cannot open and read the actual documents. The full Content Explorer Viewer role is required to see content.

Knowledge Check

Marcus at NovaTech needs to classify internal R&D documents. These documents vary widely β€” some are research papers, some are experiment logs, some are patent drafts β€” but they all share technical language patterns specific to NovaTech's AI products. No built-in classifier covers this. What should Marcus do?

Knowledge Check

Zara at Atlas Global created a custom trainable classifier for employee performance reviews six months ago. Recently, Atlas Global switched to a new review format with AI-generated summary sections. The classifier is now missing 20% of new reviews. What should Zara do?

Next up: Sensitivity Labels: Create & Protect β€” now that you can find sensitive data, learn how to protect it with labels.

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.