Data Classification & Sensitivity Labels

You can’t protect what you don’t know about

Simple explanation

Think of a hospital filing room.

Imagine thousands of patient files scattered across desks, drawers, and shared folders — some confidential, some routine, all mixed together. You can’t lock up the sensitive ones if you don’t know which files contain patient records.

Data classification is the process of finding sensitive data and labelling it, so you know what needs protection. Microsoft Purview can do this automatically by scanning documents for credit card numbers, medical records, passport numbers, and more.

How Purview classifies data

Microsoft Purview uses three complementary tools to find, classify, and protect sensitive information:

1. Sensitive information types (SITs)

SITs use pattern matching to detect specific data formats. Think of them as smart templates that recognise data patterns.

SIT Category	Examples	How It Detects
Financial	Credit card numbers, bank account numbers	Number patterns + checksum validation
Personal ID	Social Security numbers, passport numbers	Format patterns + context keywords nearby
Health	Medical record numbers, drug names	Patterns + proximity to health-related terms
Custom	Your organisation’s patient ID format, internal codes	You define the pattern and keywords

Key exam concept: SITs use patterns and keywords, not AI. They look for specific formats (like 16 digits starting with 4 for Visa cards) plus supporting evidence (like the word “Visa” nearby). Microsoft includes 300+ built-in SITs and you can create custom ones.
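To make the "pattern + checksum + nearby keyword" idea concrete, here is a minimal Python sketch of how a SIT-style detector could work. It is a toy illustration, not Purview's actual rule definition: the regular expression, the keyword list, and the 50-character proximity window are all assumptions chosen for the example.

```python
import re

# Candidate pattern: 16 digits beginning with 4, optionally split by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b4\d{3}(?:[ -]?\d{4}){3}\b")
KEYWORDS = ("visa", "credit card", "card number")  # illustrative supporting evidence
PROXIMITY = 50  # characters searched on each side of a match (assumed window)


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return card-like numbers that pass the checksum and have a keyword nearby."""
    hits = []
    lowered = text.lower()
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        if luhn_valid(digits) and any(k in window for k in KEYWORDS):
            hits.append(digits)
    return hits
```

With this sketch, find_card_numbers("Visa 4111 1111 1111 1111") returns ["4111111111111111"], while the same number without a keyword nearby, or a 16-digit number that fails the checksum, returns an empty list. No machine learning is involved: every step is a fixed rule.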

2. Trainable classifiers

When patterns aren’t enough, trainable classifiers use machine learning to recognise types of documents based on their content — not just specific data formats.

Examples of built-in trainable classifiers:

Legal documents — contracts, NDAs, settlement agreements
Financial statements — balance sheets, income statements
Resumes/CVs — candidate applications
Source code — programming files
Healthcare — clinical trial documentation, discharge summaries

Key exam concept: Trainable classifiers identify document types, not data patterns. A SIT finds a credit card number; a classifier recognises an entire document as a “financial statement.”

3. Sensitivity labels (classification + protection)

We’ll cover labels in detail below, but the key point: labels are how you classify AND protect data. SITs and classifiers find the data; labels tell the world what it is and enforce rules.

Three complementary approaches: detect patterns, recognise documents, classify and protect
Feature	Sensitive Information Types	Trainable Classifiers	Sensitivity Labels
What it does	Detects specific data patterns	Recognises document types using ML	Classifies AND protects data
How it works	Pattern matching + keywords	Machine learning models trained on examples	Metadata tags with enforced protection
Example	Finds a credit card number in a spreadsheet	Identifies a document as a legal contract	Marks a file as Confidential and encrypts it
Learns from examples?	No — uses fixed patterns	Yes — trained on sample documents	No — applied by users, policies, or auto-labelling

Content Explorer and Activity Explorer

Once Purview classifies your data, you need visibility into what was found and what’s happening to it. That’s where the two Explorers come in.

Content Explorer

Content Explorer lets you browse the actual documents that contain sensitive data. Think of it as a search engine for classified content.

See exactly which files contain credit card numbers, patient IDs, or other SIT matches
Drill into a document to see the specific sensitive data detected
Requires the Content Explorer List Viewer role to see the list of items and their locations, and the Content Explorer Content Viewer role to open the contents (not everyone should see the actual data)

Activity Explorer

Activity Explorer shows what actions are happening with classified and labelled content:

A sensitivity label was applied or changed
A file was shared externally
A DLP policy was matched
A labelled document was printed or copied to USB

Content Explorer shows what's there; Activity Explorer shows what's happening
Feature	Content Explorer	Activity Explorer
What it answers	What sensitive data exists and where is it?	What are people doing with sensitive data?
Shows	Documents, the sensitive data found inside them, location	Actions: label applied, file shared, DLP match, download, print
Use case	Nadia needs to know how many files contain patient SSNs	Nadia needs to know if anyone shared labelled files externally this week
Permissions	Requires specific Content Explorer roles	Requires membership in a role group that grants Activity Explorer access (for example, Information Protection Analysts)

Scenario: Nadia investigates a data concern

MedGuard’s IT Director, Liam, reports that users have been sharing spreadsheets externally. Nadia investigates:

Content Explorer — she searches for files containing “patient SSN” SIT matches. She finds 340 files across SharePoint and OneDrive
Activity Explorer — she filters for “shared externally” actions on files with sensitivity labels. She spots 12 files shared with external partners in the last 30 days

Nadia now has the evidence to create a DLP policy targeting external sharing of patient data — and she knows exactly the scope of the problem.
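Nadia's Activity Explorer step is essentially a filter over activity records. The Python sketch below models it with a hypothetical, simplified Activity record; the field names and action strings are assumptions for illustration and are not Purview's actual schema or API.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Activity:
    """Hypothetical, simplified activity record (not Purview's real schema)."""
    file: str
    action: str            # e.g. "label_applied", "shared_externally", "printed"
    label: Optional[str]   # sensitivity label on the file, if any
    when: date


def labelled_external_shares(events, today, days=30):
    """Return external-sharing events on labelled files within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return [
        e for e in events
        if e.action == "shared_externally"
        and e.label is not None
        and e.when >= cutoff
    ]
```

The filter combines the same three conditions Nadia used: the action type, the presence of a sensitivity label, and a 30-day window.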

Sensitivity labels

Sensitivity labels are the action part of data classification. They don’t just tag data — they enforce protection.

What can a sensitivity label do?

Protection	How It Works
Encrypt	Only authorised users can open the document, even if it’s leaked
Restrict access	Grant specific users or groups only the permissions they need (for example, view but not print or forward)
Visual markings	Add headers, footers, or watermarks (e.g., “CONFIDENTIAL” watermark)
Protect containers	Control settings on Teams, SharePoint sites, and Microsoft 365 Groups

Key exam concept: Sensitivity labels travel with the document. If you email a labelled file to someone outside your organisation, the encryption and restrictions still apply. The label is embedded in the file’s metadata.

Label policies

Creating a label is only half the job. Label policies control how labels are published and enforced:

Policy Setting	What It Does
Publish to users	Choose which users and groups see which labels
Default label	Automatically apply a label to new documents (e.g., “General” by default)
Mandatory labelling	Users must choose a label before saving — no unlabelled documents allowed
Auto-labelling	Automatically apply labels based on SIT matches (e.g., if a document contains 5+ credit card numbers, label it “Confidential”)
Justification for downgrade	If a user tries to lower a label from “Highly Confidential” to “General,” they must explain why

Label priority

Labels have a priority order (set by the admin). Higher-priority labels override lower ones.

Example priority:

Public (lowest)
General
Confidential
Highly Confidential (highest)

If auto-labelling detects patient data and wants to apply “Highly Confidential” to a file that already carries a lower-priority label that was itself applied automatically, the higher-priority label replaces it. However, if a user manually applied “General,” auto-labelling will not override it by default; an admin must explicitly enable this in the auto-labelling policy settings.
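The override behaviour can be sketched as a small decision function. This is one reading of the rule above, written in Python for illustration: the override_manual flag is a hypothetical stand-in for the admin opt-in, and the function is not a Purview API.

```python
# Priority order from the example above, lowest to highest.
PRIORITY = ["Public", "General", "Confidential", "Highly Confidential"]


def resolve_label(current, applied_manually, proposed, override_manual=False):
    """Return the label a file ends up with after auto-labelling proposes one.

    current: existing label, or None if the file is unlabelled.
    applied_manually: True if a user chose the current label.
    override_manual: hypothetical flag standing in for the admin opt-in.
    """
    if current is None:
        return proposed
    if PRIORITY.index(proposed) <= PRIORITY.index(current):
        return current  # auto-labelling never lowers an existing label
    if applied_manually and not override_manual:
        return current  # manual choices are kept by default
    return proposed
```

For example, resolve_label("General", True, "Highly Confidential") returns "General", whereas passing override_manual=True, or applied_manually=False, returns "Highly Confidential".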

Scenario: Nadia sets up labelling for MedGuard

Nadia configures MedGuard’s labelling strategy:

Labels created: Public, General, Confidential, Highly Confidential - Patient Data
Default label: “General” applied to all new documents
Mandatory labelling: Enabled — staff must label every document before saving
Auto-labelling: If a document contains 3+ patient SSN matches, automatically apply “Highly Confidential - Patient Data” (which adds encryption + “PATIENT DATA” watermark)
Justification required: Anyone downgrading from “Highly Confidential” must provide a reason

Now every document at MedGuard is classified, and patient data gets automatic protection without relying on staff to remember.
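Nadia's auto-labelling rule is a threshold on SIT matches. The Python sketch below mimics it with a deliberately simplified SSN-style pattern. The regular expression and the choice to count distinct values rather than every occurrence are assumptions made for the example; in Purview the real rule is configured in the portal, not written as code.

```python
import re

# Simplified stand-in for an SSN-style SIT: three digits, two digits, four digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
THRESHOLD = 3  # from the scenario: 3+ matches triggers the label
DEFAULT_LABEL = "General"
PATIENT_LABEL = "Highly Confidential - Patient Data"


def label_for(text: str) -> str:
    """Apply the scenario's rule: 3+ distinct SSN-like matches, else the default label."""
    matches = set(SSN_PATTERN.findall(text))  # assumption: count distinct values
    return PATIENT_LABEL if len(matches) >= THRESHOLD else DEFAULT_LABEL
```

A document containing "123-45-6789, 234-56-7890, 345-67-8901" is labelled "Highly Confidential - Patient Data", while a document that repeats a single value three times stays at "General" under this sketch's distinct-count assumption.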

Exam tip: Labels vs SITs — the exam tests both

The exam often presents a scenario and asks whether to use a SIT, a classifier, or a label. Here’s the decision tree:

“We need to detect credit card numbers in documents” → SIT (pattern detection)
“We need to identify which documents are legal contracts” → Trainable classifier (ML-based)
“We need to encrypt documents containing patient data” → Sensitivity label (protection)
“We need to find credit card numbers AND encrypt the files” → SIT (to detect) + auto-labelling with a sensitivity label (to protect)
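As a memory aid, the four mappings above can be written as a tiny Python helper. The function and parameter names are made up for this illustration; the returned strings simply name the features discussed in this section.

```python
def choose_tools(detect_pattern=False, identify_doc_type=False, protect=False):
    """Map a scenario's requirements to the Purview features named above."""
    tools = []
    if detect_pattern:
        tools.append("sensitive information type")
    if identify_doc_type:
        tools.append("trainable classifier")
    if protect:
        tools.append("sensitivity label")
    # Mirrors the fourth mapping: detection plus protection is joined by auto-labelling.
    if detect_pattern and protect:
        tools.append("auto-labelling")
    return tools
```

Calling choose_tools(detect_pattern=True, protect=True) returns ["sensitive information type", "sensitivity label", "auto-labelling"], which matches the last scenario in the list.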