Guided

Guided by A Guide to Cloud

AI-901 Study Guide

Domain 1: AI Concepts and Capabilities

  • What is AI? Your First 10 Minutes (Free)
  • Responsible AI: The Six Principles (Free)
  • How Generative AI Actually Works (Free)
  • Choosing the Right AI Model (Free)
  • Deploying AI Models: Options & Settings
  • AI Workloads at a Glance
  • Text Analysis: Keywords, Entities & Sentiment
  • Speech: Recognition & Synthesis
  • Computer Vision: Seeing the World
  • Image Generation: Creating with AI
  • Information Extraction: From Chaos to Structure

Domain 2: Implement AI Solutions Using Foundry

  • Prompting Fundamentals: System & User Prompts
  • Microsoft Foundry: Your AI Command Center (Free)
  • Building a Chat App with the Foundry SDK
  • Agents in Foundry: Create & Test
  • Building an Agent Client App
  • Building a Text Analysis App
  • Multimodal: Responding to Speech
  • Azure Speech in Foundry Tools
  • Visual Prompts: Images as Input
  • Generating Images with AI
  • Building a Vision App
  • Content Understanding: Documents & Forms
  • Multimodal Extraction: Images, Audio & Video
  • Building an Extraction App
  • Exam Prep: Putting It All Together

Domain 1: AI Concepts and Capabilities | Premium | ⏱ ~12 min read

Information Extraction: From Chaos to Structure

Documents, images, audio, and video are all full of valuable data locked in unstructured formats. Information extraction AI turns that chaos into clean, structured, searchable data.

What is information extraction?

☕ Simple explanation

Information extraction is AI reading a messy document and pulling out exactly what you need, like a really efficient assistant.

Suppose you hand a human assistant a stack of 500 invoices and say: “Get me the supplier name, total amount, and due date from each one.” They would take weeks. An extraction AI does it in minutes.

But it’s not just documents: AI can also extract information from images (photos of receipts), audio (recorded meetings), and video (presentation slides in a webinar).

Information extraction transforms unstructured content (documents, images, audio, and video) into structured, machine-readable data. Unlike text analysis (which understands meaning), extraction focuses on identifying and pulling out specific data fields with their values.

In Azure, the primary service for this is Azure Content Understanding (part of Foundry Tools), which provides multimodal extraction capabilities across documents, images, audio, and video.

Extraction across modalities

Information extraction across four modalities
| Modality | What AI Extracts | Example |
|---|---|---|
| 📄 From text/documents | Specific fields from forms, invoices, contracts, reports | Invoice number, supplier name, line items, total amount |
| 🖼️ From images | Text, objects, labels, and metadata from photos | Product label info, building permit numbers, medical chart readings |
| 🎙️ From audio | Spoken content, speaker identity, key phrases, topics | Meeting action items, interview highlights, customer complaints |
| 🎬 From video | Visual content, spoken words, on-screen text, scenes | Presentation slide text, training video topics, security footage events |
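As a rough illustration of the four modalities above, here is a minimal Python sketch that guesses which modality a file belongs to from its file type. It uses only the standard library `mimetypes` module; the function name `guess_modality` and the sample filenames are hypothetical, and this is not how Content Understanding itself decides. Note the simplification: a photographed form is an image file, even though you would usually treat it as a document.

```python
import mimetypes


def guess_modality(filename: str) -> str:
    """Guess the extraction modality from a filename (illustrative only)."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "unknown"
    top_level = mime.split("/", 1)[0]
    # image/*, audio/* and video/* map directly onto three of the modalities.
    if top_level in {"image", "audio", "video"}:
        return top_level
    # PDFs and plain text files are treated as documents in this sketch.
    if mime == "application/pdf" or top_level == "text":
        return "document"
    return "unknown"


for name in ["invoice.pdf", "receipt.jpg", "call.wav", "webinar.mp4"]:
    print(name, "->", guess_modality(name))
```

A real pipeline would inspect the content as well as the file type, for example to recognise that a phone photo contains a form.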

Document extraction

Document extraction is the most common extraction scenario: AI reads structured and semi-structured documents and extracts specific fields.

| Document Type | Fields Extracted |
|---|---|
| Invoices | Invoice number, vendor, date, line items, total, tax |
| Receipts | Store name, items, prices, total, date |
| ID documents | Name, date of birth, document number, nationality |
| Health records | Patient name, diagnosis, medications, dates |
| Contracts | Parties, dates, terms, obligations, amounts |

GreenLeaf scenario: GreenLeaf receives hundreds of supplier invoices per month in different formats: some printed, some scanned, some handwritten. Content Understanding reads each one and extracts the vendor name, amounts, and payment terms so they can be loaded into GreenLeaf's accounting system.
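To make "extracting fields into structured data" concrete, here is a minimal sketch in plain Python. It does not call Content Understanding; simple regular expressions stand in for the trained extraction model, and the sample invoice text, the field names, and the `extract_invoice_fields` function are all hypothetical.

```python
import re
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class InvoiceFields:
    invoice_number: Optional[str]
    vendor: Optional[str]
    total: Optional[float]
    payment_terms: Optional[str]


# Hypothetical patterns standing in for a trained extraction model.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*#\s*(\w+)", re.IGNORECASE),
    "vendor": re.compile(r"Vendor:\s*(.+)", re.IGNORECASE),
    "total": re.compile(r"Total:\s*\$?([\d,]+(?:\.\d+)?)", re.IGNORECASE),
    "payment_terms": re.compile(r"Terms:\s*(.+)", re.IGNORECASE),
}


def _first_match(field: str, text: str) -> Optional[str]:
    """Return the captured value for a field, or None if it is missing."""
    match = PATTERNS[field].search(text)
    return match.group(1).strip() if match else None


def extract_invoice_fields(text: str) -> InvoiceFields:
    """Turn free-form invoice text into a structured record."""
    total_raw = _first_match("total", text)
    return InvoiceFields(
        invoice_number=_first_match("invoice_number", text),
        vendor=_first_match("vendor", text),
        total=float(total_raw.replace(",", "")) if total_raw else None,
        payment_terms=_first_match("payment_terms", text),
    )


# Made-up text, as it might look after a scanned invoice has been read.
SAMPLE_TEXT = """Invoice #4521
Vendor: Acme Seeds Ltd
Total: $3,400.00
Terms: Net 30
"""

record = asdict(extract_invoice_fields(SAMPLE_TEXT))
print(record)
```

The point is the shape of the result: named fields with values that an accounting system can consume. Fixed patterns like these would break on the varied and handwritten formats in the scenario, which is the gap an extraction service is meant to fill.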

How extraction differs from text analysis

Text analysis vs information extraction
| Feature | Text Analysis | Information Extraction |
|---|---|---|
| Goal | Understand meaning and sentiment | Pull out specific data fields and values |
| Input | Usually clean text | Documents, images, audio, video (messy/varied) |
| Output | Sentiment scores, keywords, entities, summaries | Structured data: { field: value } pairs |
| Example | 'This review is 85% positive' | 'Invoice #4521, Total: $3,400, Due: 15 May 2026' |
| Azure service | Azure AI Language | Azure Content Understanding |
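The difference in output is easier to see side by side. The sketch below runs one made-up customer message through a toy "analysis" function and a toy "extraction" function. Both are hypothetical stand-ins written for illustration (a tiny word list and two regular expressions), not calls to Azure AI Language or Azure Content Understanding.

```python
import re

# Tiny hand-picked word lists; a real sentiment model learns these signals.
POSITIVE = {"great", "fast", "friendly"}
NEGATIVE = {"late", "broken", "rude"}


def analyze_text(text: str) -> dict:
    """Toy text analysis: describes the text as a whole."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    scored = pos + neg
    return {
        "positive_share": pos / scored if scored else 0.5,
        "keywords": sorted({w for w in words if w in POSITIVE | NEGATIVE}),
    }


def extract_fields(text: str) -> dict:
    """Toy extraction: pulls named fields and their values out of the text."""
    order = re.search(r"order\s+#(\d+)", text, re.IGNORECASE)
    amount = re.search(r"\$([\d,]+(?:\.\d{2})?)", text)
    return {
        "order_number": order.group(1) if order else None,
        "amount": float(amount.group(1).replace(",", "")) if amount else None,
    }


message = (
    "Order #778 arrived late, but support was friendly "
    "and the $1,250.00 refund was fast."
)

print(analyze_text(message))    # a judgement about the message
print(extract_fields(message))  # specific values found in the message
```

Analysis answers "what is this text like?", while extraction answers "what are the values of these fields?".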

Azure Content Understanding

Azure Content Understanding is the Azure service for multimodal information extraction. It’s part of Foundry Tools and can process:

  • Documents and forms (PDF, images of forms)
  • Images (photos, screenshots)
  • Audio (recordings, calls)
  • Video (presentations, training content)

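You’ll work hands-on with Content Understanding in Domain 2, in the modules “Content Understanding: Documents & Forms”, “Multimodal Extraction: Images, Audio & Video”, and “Building an Extraction App”.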

ℹ️ How Content Understanding works under the hood

Content Understanding combines multiple AI capabilities:

  1. OCR: reads text from the document or image
  2. Layout analysis: understands tables, headers, paragraphs, and document structure
  3. Field extraction: maps specific regions to named fields
  4. Validation: checks extracted data against expected formats (dates, numbers, etc.)

For audio and video, it adds:

  5. Speech recognition: transcribes spoken content
  6. Scene detection: identifies key moments in video
  7. Slide extraction: captures on-screen text and slides

This multimodal approach means you can build one extraction pipeline that handles documents, images, audio, AND video.
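The first four stages can be sketched as a chain of small functions. This is a conceptual illustration only: the OCR stage is a stub that decodes bytes, the layout rules and field names are invented for the example, and nothing here reflects how Content Understanding is actually implemented.

```python
from datetime import datetime


def ocr(scan: bytes) -> str:
    """Stage 1 (stub): a real OCR engine reads pixels; here we just decode."""
    return scan.decode("utf-8")


def analyze_layout(text: str) -> dict:
    """Stage 2: separate 'key: value' lines from pipe-separated table rows."""
    pairs, rows = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "|" in line:
            rows.append([cell.strip() for cell in line.split("|")])
        elif ":" in line:
            key, value = line.split(":", 1)
            pairs[key.strip().lower()] = value.strip()
    return {"pairs": pairs, "rows": rows}


def extract_fields(layout: dict) -> dict:
    """Stage 3: map layout regions to named fields (names are hypothetical)."""
    pairs = layout["pairs"]
    return {
        "invoice_number": pairs.get("invoice no"),
        "due_date": pairs.get("due"),
        "total": pairs.get("total"),
        "line_items": layout["rows"],
    }


def validate(fields: dict) -> dict:
    """Stage 4: check that the date and the amount have the expected format."""
    errors = []
    try:
        datetime.strptime(fields["due_date"] or "", "%d %B %Y")
    except ValueError:
        errors.append("due_date")
    try:
        float((fields["total"] or "").replace("$", "").replace(",", ""))
    except ValueError:
        errors.append("total")
    return {**fields, "errors": errors}


def run_pipeline(scan: bytes) -> dict:
    return validate(extract_fields(analyze_layout(ocr(scan))))


scan = b"Invoice No: 4521\nDue: 15 May 2026\nTotal: $3,400\nWidget | 2 | $1,700\n"
result = run_pipeline(scan)
print(result)
```

Keeping each stage as its own function mirrors the list above: an audio or video path could add its own stages (such as speech recognition) in front and still reuse the field extraction and validation steps.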

Flashcards

Question

What is the difference between text analysis and information extraction?

Answer

Text analysis understands meaning (sentiment, keywords, entities). Information extraction pulls out specific data fields and values from unstructured content (documents, images, audio, video). Analysis = understanding. Extraction = structured output.

Question

What is Azure Content Understanding?

Answer

A Foundry Tools service for multimodal information extraction. It can process documents, forms, images, audio, and video, extracting structured data fields from unstructured content.

Question

What four modalities can information extraction work with?

Answer

Text/documents (invoices, forms), images (photos, labels), audio (recordings, calls), and video (presentations, security footage).

Question

How does Content Understanding process a scanned invoice?

Answer

1) OCR reads the text, 2) Layout analysis understands tables and structure, 3) Field extraction maps regions to named fields (invoice number, total), 4) Validation checks data formats.

Knowledge Check

MediSpark receives patient intake forms in multiple formats: some typed PDFs, some scanned handwritten forms, some photographed with phones. They need to extract patient name, DOB, and insurance number from all of them. Which Azure service is best suited?

DataFlow Corp records all customer support calls. They want to extract: the customer's account number (spoken), the issue category, and the resolution provided. Which modality of information extraction is this?


🎉 You’ve completed Domain 1! You now understand AI concepts, responsible AI, model types, deployment, and all six workload categories. Domain 2 takes you hands-on, building real AI solutions in Microsoft Foundry.

Next up: Prompting Fundamentals, where you craft effective system and user prompts for generative AI models.


© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.