Guided by A Guide to Cloud
AI-901 Study Guide

Domain 1: AI Concepts and Capabilities

  • What is AI? Your First 10 Minutes
  • Responsible AI: The Six Principles
  • How Generative AI Actually Works
  • Choosing the Right AI Model
  • Deploying AI Models: Options & Settings
  • AI Workloads at a Glance
  • Text Analysis: Keywords, Entities & Sentiment
  • Speech: Recognition & Synthesis
  • Computer Vision: Seeing the World
  • Image Generation: Creating with AI
  • Information Extraction: From Chaos to Structure

Domain 2: Implement AI Solutions Using Foundry

  • Prompting Fundamentals: System & User Prompts
  • Microsoft Foundry: Your AI Command Center
  • Building a Chat App with the Foundry SDK
  • Agents in Foundry: Create & Test
  • Building an Agent Client App
  • Building a Text Analysis App
  • Multimodal: Responding to Speech
  • Azure Speech in Foundry Tools
  • Visual Prompts: Images as Input
  • Generating Images with AI
  • Building a Vision App
  • Content Understanding: Documents & Forms
  • Multimodal Extraction: Images, Audio & Video
  • Building an Extraction App
  • Exam Prep: Putting It All Together

Domain 2: Implement AI Solutions Using Foundry (~14 min read)

Building a Vision App

Combine image analysis capabilities into a complete application. Use Azure AI Vision to classify images, detect objects, and read text, all from Python.

Building with Azure AI Vision

☕ Simple explanation

Module 20 used GPT-4o to answer questions about images. This module uses Azure AI Vision, a dedicated service that's faster and cheaper for specific vision tasks.

Think of the difference: GPT-4o is like a brilliant friend who can discuss anything about an image. Azure AI Vision is like a specialist tool: it's optimised for reading text (OCR), detecting objects, and classifying images with high speed and accuracy.

Azure AI Vision (part of Foundry Tools) provides dedicated computer vision APIs for image analysis, OCR, object detection, and people detection. For advanced face analysis, there's a separate Face API. Vision is optimised for production workloads with lower per-transaction costs than multimodal LLMs.

Azure AI Vision vs GPT-4o for image tasks

| Feature | Azure AI Vision | GPT-4o Visual Prompts |
|---|---|---|
| Best for | High-volume classification, OCR, object detection | Complex visual reasoning, open-ended questions |
| Output | Structured JSON (tags, objects, text) | Natural language response |
| Cost | Lower per-transaction | Higher per-token |
| Custom models | Yes: Custom Vision service for your own classifiers | No: uses general knowledge |
| Speed | Fast: optimised for vision | Slower: processes full LLM pipeline |
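
As a rough illustration of the comparison, the rule of thumb can be written as a tiny routing function. This is only a sketch: the task names, the `choose_service` helper, and the returned labels are hypothetical and chosen for this example, not part of any Azure SDK.

```python
# Rule-of-thumb router based on the comparison in this module.
# The task categories and return labels are illustrative assumptions.
STRUCTURED_TASKS = {"classification", "ocr", "object_detection"}


def choose_service(task: str, needs_open_ended_answer: bool = False) -> str:
    """Return 'azure-ai-vision' or 'gpt-4o' for a given image task."""
    if needs_open_ended_answer:
        # Free-form questions about an image suit a multimodal LLM.
        return "gpt-4o"
    if task.lower() in STRUCTURED_TASKS:
        # High-volume, structured-output work suits the dedicated service.
        return "azure-ai-vision"
    # Anything else is treated as visual reasoning in this sketch.
    return "gpt-4o"


print(choose_service("ocr"))
print(choose_service("explain_damage", needs_open_ended_answer=True))
```

A real decision would also weigh volume, latency targets, and actual pricing, which this sketch does not model.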

Image analysis with the SDK

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint="https://your-vision-resource.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("your-key")
)

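# Analyse an image from a public URL
# (for local files, pass the bytes to client.analyze(image_data=...) instead)
result = client.analyze_from_url(
    image_url="https://example.com/field-photo.jpg",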
    visual_features=[
        VisualFeatures.CAPTION,
        VisualFeatures.TAGS,
        VisualFeatures.OBJECTS,
        VisualFeatures.READ
    ]
)

# Caption
print(f"Caption: {result.caption.text} (confidence: {result.caption.confidence:.2f})")

# Tags
for tag in result.tags.list:
    print(f"Tag: {tag.name} ({tag.confidence:.2f})")

# Objects detected
for obj in result.objects.list:
    print(f"Object: {obj.tags[0].name} at [{obj.bounding_box}]")

# Text (OCR)
for block in result.read.blocks:
    for line in block.lines:
        print(f"Text: {line.text}")
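
In practice, a feature that was not requested (or not returned) is likely to come back as `None`, so it may be safer to guard each access before printing. The sketch below flattens a result into a plain dictionary. The `summarise` helper is hypothetical, and the `SimpleNamespace` object is a stand-in that only mimics the attribute names used in the example above, so the block runs without an Azure resource.

```python
from types import SimpleNamespace


def summarise(result) -> dict:
    """Flatten an analysis result into a plain dict, skipping features that are None."""
    summary = {"caption": None, "tags": [], "objects": [], "text": []}
    if result.caption is not None:
        summary["caption"] = (result.caption.text, round(result.caption.confidence, 2))
    if result.tags is not None:
        summary["tags"] = [tag.name for tag in result.tags.list]
    if result.objects is not None:
        # An object may carry no tags, so guard the tags[0] lookup.
        summary["objects"] = [obj.tags[0].name for obj in result.objects.list if obj.tags]
    if result.read is not None:
        summary["text"] = [line.text for block in result.read.blocks for line in block.lines]
    return summary


# Stand-in result for illustration only; a real one comes from the client call.
fake = SimpleNamespace(
    caption=SimpleNamespace(text="a field of maize", confidence=0.91),
    tags=SimpleNamespace(list=[SimpleNamespace(name="plant", confidence=0.98)]),
    objects=None,  # simulates a feature that was not requested
    read=SimpleNamespace(blocks=[SimpleNamespace(lines=[SimpleNamespace(text="PLOT 7")])]),
)
print(summarise(fake))
```

Returning a plain dictionary keeps the rest of the app independent of the SDK's result classes, which also makes the logic easy to unit test.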

Visual features explained

| Feature | What It Returns | Use Case |
|---|---|---|
| CAPTION | A natural language description of the image | Accessibility, image cataloguing |
| TAGS | List of keywords describing the content | Search indexing, content tagging |
| OBJECTS | Detected objects with bounding boxes | Quality control, inventory counting |
| READ | Extracted text (OCR) | Document processing, sign reading |
| PEOPLE | Detected people with positions | Crowd analysis, security |
| SMART_CROPS | Suggested crop regions for thumbnails | Social media, responsive images |
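
One way to keep feature selection explicit is to map each application need to the feature name from the table. The mapping keys and the `features_for` helper below are hypothetical names invented for this sketch; only the feature names themselves come from the table.

```python
# Hypothetical mapping from an app requirement to a feature name in the table.
NEED_TO_FEATURE = {
    "describe": "CAPTION",
    "keywords": "TAGS",
    "locate_objects": "OBJECTS",
    "extract_text": "READ",
    "find_people": "PEOPLE",
    "thumbnails": "SMART_CROPS",
}


def features_for(needs: list[str]) -> list[str]:
    """Return the deduplicated feature names for the given needs, in request order."""
    unknown = [need for need in needs if need not in NEED_TO_FEATURE]
    if unknown:
        raise ValueError(f"Unknown needs: {unknown}")
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(NEED_TO_FEATURE[need] for need in needs))


names = features_for(["extract_text", "locate_objects", "describe"])
print(names)

# With the SDK installed, the names could plausibly be turned into enum
# members (an assumption, not verified here):
# visual_features = [VisualFeatures[name] for name in names]
```

Requesting only the features an app needs keeps responses smaller and makes it clearer which result fields can be expected.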

GreenLeaf scenario: GreenLeaf builds a crop health monitoring app:

  1. Farmer uploads field photo via mobile app
  2. TAGS: identifies plant types, soil conditions
  3. OBJECTS: counts individual plants, locates problem areas
  4. CAPTION: generates a description for the report
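
The steps above can be sketched as a small report builder. This assumes a result shaped like the one in the SDK example (caption, tags, and objects all requested). The `build_field_report` helper, the 0.5 confidence threshold, and the sample data are hypothetical. General-purpose tags may not name specific crop species, so treat the labels as illustrative.

```python
from collections import Counter
from types import SimpleNamespace


def build_field_report(result, min_confidence: float = 0.5) -> dict:
    """Combine tags, object counts, and the caption into one report dict."""
    # Step 2: keep only tags the service is reasonably confident about.
    tags = [t.name for t in result.tags.list if t.confidence >= min_confidence]
    # Step 3: count detected objects by their top label.
    counts = Counter(
        obj.tags[0].name
        for obj in result.objects.list
        if obj.tags and obj.tags[0].confidence >= min_confidence
    )
    # Step 4: use the caption as the human-readable description.
    return {
        "description": result.caption.text,
        "tags": tags,
        "object_counts": dict(counts),
    }


def _tag(name: str, confidence: float) -> SimpleNamespace:
    return SimpleNamespace(name=name, confidence=confidence)


# Stand-in result for illustration only; real data comes from the Vision client.
sample = SimpleNamespace(
    caption=SimpleNamespace(text="rows of maize in a field", confidence=0.88),
    tags=SimpleNamespace(list=[_tag("maize", 0.95), _tag("soil", 0.40)]),
    objects=SimpleNamespace(list=[
        SimpleNamespace(tags=[_tag("plant", 0.90)]),
        SimpleNamespace(tags=[_tag("plant", 0.80)]),
        SimpleNamespace(tags=[_tag("weed", 0.30)]),
    ]),
)
print(build_field_report(sample))
```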

Flashcards

Question

What Python package provides the Azure AI Vision image analysis SDK?

Answer

azure-ai-vision-imageanalysis. It provides ImageAnalysisClient, whose analyze() and analyze_from_url() methods accept visual features such as CAPTION, TAGS, OBJECTS, and READ.

Question

What visual features can you request from Azure AI Vision?

Answer

CAPTION (description), TAGS (keywords), OBJECTS (with bounding boxes), READ (OCR text), PEOPLE (detected persons), and SMART_CROPS (thumbnail suggestions).

Question

When should you use Azure AI Vision instead of GPT-4o for image tasks?

Answer

When you need high-volume processing, structured JSON output, custom classifiers, or lower per-transaction cost. GPT-4o is better for complex visual reasoning and open-ended questions about images.

Knowledge Check

GreenLeaf wants to build an app that processes 5,000 field photos daily, tagging each with the type of crop visible. Which approach is most cost-effective?

DataFlow Corp receives scanned business documents. They need to: 1) extract all text, 2) identify what objects appear in any embedded photos, and 3) generate a description of each page. Which visual features do they request?


Next up: Content Understanding: Documents & Forms (extracting structured data from invoices, receipts, and forms).

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.