AI-901 Study Guide

Domain 1: AI Concepts and Capabilities

  • What is AI? Your First 10 Minutes Free
  • Responsible AI: The Six Principles Free
  • How Generative AI Actually Works Free
  • Choosing the Right AI Model Free
  • Deploying AI Models: Options & Settings
  • AI Workloads at a Glance
  • Text Analysis: Keywords, Entities & Sentiment
  • Speech: Recognition & Synthesis
  • Computer Vision: Seeing the World
  • Image Generation: Creating with AI
  • Information Extraction: From Chaos to Structure

Domain 2: Implement AI Solutions Using Foundry

  • Prompting Fundamentals: System & User Prompts
  • Microsoft Foundry: Your AI Command Center Free
  • Building a Chat App with the Foundry SDK
  • Agents in Foundry: Create & Test
  • Building an Agent Client App
  • Building a Text Analysis App
  • Multimodal: Responding to Speech
  • Azure Speech in Foundry Tools
  • Visual Prompts: Images as Input
  • Generating Images with AI
  • Building a Vision App
  • Content Understanding: Documents & Forms
  • Multimodal Extraction: Images, Audio & Video
  • Building an Extraction App
  • Exam Prep: Putting It All Together

Domain 2: Implement AI Solutions Using Foundry (Premium, ~12 min read)

Visual Prompts: Images as Input

Modern AI can see. Send an image alongside your text prompt, and the AI analyses what's in it. Learn how to use visual input with multimodal models in Foundry.

Sending images to AI

☕ Simple explanation

You can show a picture to AI and ask questions about it — just like showing a photo to a friend.

“What’s in this image?” “Is there anything unusual?” “Read the text on this sign.” “How many people are in this photo?” The AI looks at the image and gives you an intelligent answer.

This works because multimodal models like GPT-4o can process both text AND images simultaneously.

Multimodal models accept images as input alongside text prompts. The model processes both modalities together, enabling tasks like image description, visual question answering, document reading, and diagram analysis. In Azure, GPT-4o’s vision capabilities are available through the standard chat completions API.

Sending an image with your prompt

import base64
import os

# Client setup is assumed here: the azure-ai-inference SDK with an
# endpoint and key supplied via environment variables (names illustrative).
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

chat = ChatCompletionsClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
)

# Read the local image file and base64-encode it for embedding in the request
with open("xray.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = chat.complete(
    model="gpt4o-deployment",  # the name of your GPT-4o deployment
    messages=[
        {"role": "system", "content": "You are a medical image analysis assistant. Describe what you observe but never provide diagnoses."},
        {"role": "user", "content": [
            {"type": "text", "text": "What do you observe in this chest X-ray?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_data}"}}
        ]}
    ]
)

print(response.choices[0].message.content)

What’s happening:

  • The user message contains BOTH text and an image
  • The image is base64-encoded and embedded in the message
  • GPT-4o processes both together, understanding the question AND the visual content

Image input methods

Method          | How It Works                                  | Best For
Base64 encoding | Embed the image data directly in the API call | Local files, private images
URL reference   | Provide a public URL to the image             | Publicly accessible images, web content

# Method 2: URL reference
{"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}}
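The two methods differ only in the URL they put in the message, so both can be handled by one small helper that builds the user-message content. A minimal sketch — the `make_image_message` helper is illustrative, not part of any SDK, and PNG input is assumed for local files:

```python
import base64

def make_image_message(question: str, image: str) -> dict:
    """Build a chat user message containing text plus one image.

    `image` may be a local file path or an http(s) URL; local files
    are base64-encoded into a data URL (PNG assumed here).
    """
    if image.startswith(("http://", "https://")):
        url = image  # Method 2: public URL reference
    else:
        # Method 1: base64-embed the local file as a data URL
        with open(image, "rb") as f:
            data = base64.b64encode(f.read()).decode()
        url = f"data:image/png;base64,{data}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": url}},
        ],
    }

msg = make_image_message("What trends do you see?", "https://example.com/chart.png")
```

The returned dictionary slots straight into the `messages` list of a chat completion call.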

What you can do with visual prompts

Task      | Example Prompt                               | Use Case
Describe  | “What’s in this image?”                      | Accessibility, cataloguing
Analyse   | “What trends do you see in this chart?”      | Business intelligence, reporting
Read text | “Read all the text in this document”         | OCR alternative, document processing
Compare   | “What’s different between these two images?” | Quality control, before/after analysis
Count     | “How many people are in this photo?”         | Event monitoring, crowd analysis
Classify  | “Is this a defective or normal product?”     | Manufacturing quality control
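Comparison tasks send more than one image in the same user message: the content array simply carries several `image_url` items alongside the text. A sketch of the message shape — the helper name and example URLs are illustrative:

```python
def make_comparison_message(question: str, url_a: str, url_b: str) -> dict:
    """User message carrying a question plus two images to compare."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": url_a}},
            {"type": "image_url", "image_url": {"url": url_b}},
        ],
    }

msg = make_comparison_message(
    "What's different between these two images?",
    "https://example.com/before.png",
    "https://example.com/after.png",
)
```

The model sees both images in order, so the prompt can refer to “the first image” and “the second image”.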

GreenLeaf scenario: GreenLeaf farmers photograph their crops and ask the AI:

  • “Are there signs of disease in this tomato plant?”
  • “What type of pest damage do you see?”
  • “Compare this week’s growth to last week’s photo”
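A GreenLeaf-style request can use the system prompt to pin the answer to a fixed label set, so every photo yields a one-word verdict that is easy to process downstream. A sketch assuming a client and deployment set up as in the earlier example; the prompt wording and image URL are illustrative:

```python
# The system prompt constrains the model to a fixed label set so the
# response can be parsed programmatically.
messages = [
    {"role": "system", "content": (
        "You are a crop-health assistant. Answer with exactly one word: "
        "'healthy' or 'diseased'."
    )},
    {"role": "user", "content": [
        {"type": "text", "text": "Are there signs of disease in this tomato plant?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/tomato.jpg"}},
    ]},
]
```

Constraining the output format this way is what makes bulk runs practical: the reply can be matched against the two labels without any further parsing.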

💡 Limitations of visual prompts

Visual prompts are powerful but have limitations:

  • Not a medical diagnostic tool — the model can describe what it sees, but shouldn’t make diagnoses
  • May misidentify fine details — small text, distant objects, or subtle differences may be missed
  • No real-time video — processes individual images, not live video streams
  • Token cost — images consume tokens, with higher-resolution images using more tokens
  • Content filtering — harmful or sensitive images are blocked

Exam tip: The exam may test your understanding of when visual prompts are appropriate versus when a dedicated vision service (Azure AI Vision) is the better choice.

🎬 Video walkthrough

Video coming soon: Visual Prompts — AI-901 Module 20 (~12 min)

Flashcards

Question

How do you send an image to GPT-4o for analysis?


Answer

Include it in the user message as a content array item with type 'image_url'. The image can be base64-encoded (for local files) or referenced by URL (for public images). The model processes both text and image together.


Question

What are the two methods for providing images to a multimodal model?


Answer

Base64 encoding (embed image data directly in the API call, best for local/private images) and URL reference (provide a public URL, best for web-accessible images).


Question

What types of tasks can visual prompts handle?


Answer

Image description, chart/diagram analysis, text reading (OCR), image comparison, object counting, classification, and visual question answering.


Knowledge Check

1. MediSpark wants doctors to upload X-ray images and get a description of what the AI observes. The system prompt should ensure the AI never provides diagnoses. Which implementation is correct?

2. GreenLeaf wants to process 10,000 field photos per day to detect crop disease. The analysis needs to be fast and cost-effective with a simple 'healthy/diseased' classification. What's the best approach?


Next up: Generating Images with AI — creating new visual content from text descriptions using GPT-image.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.