Guided by A Guide to Cloud

AI-103 Study Guide

Domain 1: Plan and Manage an Azure AI Solution

  • Choosing the Right AI Model
  • Foundry Services: Your AI Toolkit
  • Retrieval, Indexing & Agent Memory
  • Designing AI Infrastructure
  • Deploying Models & CI/CD
  • Quotas, Scaling & Cost
  • Monitoring & Security
  • Responsible AI: Filters, Auditing & Governance

Domain 2: Implement Generative AI and Agentic Solutions

  • Connecting Your App to Foundry
  • Building RAG Applications
  • Workflows & Reasoning Pipelines
  • Evaluating AI Models & Apps
  • Agent Fundamentals: Roles, Goals & Tools
  • Building Agents with Retrieval & Memory
  • Agent Tools & Knowledge Integration
  • Multi-Agent Orchestration & Safeguards
  • Agent Monitoring & Error Analysis
  • Prompt Engineering & Model Tuning
  • Observability & Production Operations

Domain 3: Implement Computer Vision Solutions

  • Image & Video Generation
  • Multimodal Visual Understanding
  • Responsible AI for Visual Content

Domain 4: Implement Text Analysis Solutions

  • Text Analysis with Language Models
  • Speech, Translation & Voice Agents

Domain 5: Implement Information Extraction Solutions

  • Ingestion, Indexing & Grounding Pipelines
  • Extracting Content with Content Understanding
  • Exam Prep: Putting It All Together

Domain 3: Implement Computer Vision Solutions

Image & Video Generation

From text prompts to stunning visuals. Learn how to generate images and videos, edit with inpainting and masks, and apply the right generation controls for quality and safety.

Creating visual content with AI

☕ Simple explanation

Image generation is like describing a painting to an artist — you write what you want, and the AI creates it. Video generation does the same but with moving pictures.

Beyond creating from scratch, you can also edit existing images: fill in removed areas (inpainting), change specific parts using masks, or modify elements with text instructions. The AI handles the pixel-level work.

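Microsoft Foundry provides image and video generation through models such as DALL-E 3 and Azure’s video generation APIs. Support for each capability varies by model, so confirm it in the model’s documentation before relying on it. Key capabilities: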

  • Text-to-image — generate images from text descriptions
  • Image-to-image — modify existing images using reference media and prompts
  • Inpainting — fill in masked areas of an image with AI-generated content
  • Text-to-video — generate video clips from text descriptions
  • Video editing — modify generated video segments with text instructions
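
A minimal sketch of how a text-to-image request might be validated and assembled before it is sent to a deployed model. The function name `build_image_request`, the field names, and the allowed sizes and quality values are assumptions modelled on DALL-E 3 style options, not a confirmed Foundry API; check the reference for the model you actually deploy.

```python
# Assumed option values, modelled on DALL-E 3 style parameters.
ALLOWED_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
ALLOWED_QUALITY = {"standard", "hd"}


def build_image_request(prompt: str, size: str = "1024x1024",
                        quality: str = "standard", n: int = 1) -> dict:
    """Validate the inputs and return a request body as a plain dict."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    if quality not in ALLOWED_QUALITY:
        raise ValueError(f"unsupported quality: {quality}")
    if n < 1:
        raise ValueError("n must be at least 1")
    return {"prompt": prompt.strip(), "size": size, "quality": quality, "n": n}


request = build_image_request(
    "A professional office meeting with diverse team members",
    size="1792x1024",
    quality="hd",
)
print(request)
```

Validating on the client side catches unsupported sizes or an empty prompt before a billable call is made.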

Image generation capabilities

| Capability | What It Does | Use Case |
| --- | --- | --- |
| Text-to-image | Generate an image from a text prompt | “A professional office meeting with diverse team members” |
| Image variation | Generate variations of a reference image | Create 5 alternatives of a product photo concept |
| Inpainting | Replace masked areas with new generated content | Remove background objects, change clothing colour |
| Mask-based editing | Extend or modify composition via masks on a larger canvas | Expand a portrait to include more background |
| Style-directed generation | Prompt the model for a specific visual style | “A product photo in watercolour style” (achieved through prompt wording, not a separate API) |
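
To make the "larger canvas" idea in mask-based editing concrete, here is a toy sketch that places a small image on a padded canvas and builds the matching mask. Images are plain 2D lists of pixel values, and `expand_canvas` is a hypothetical helper, not part of any SDK; a real workflow would use an image library and the mask format the chosen model expects.

```python
def expand_canvas(image, pad, fill=0):
    """Centre `image` on a canvas that is `pad` pixels larger on every side.

    Returns (canvas, mask). In the mask, 1 marks new area the model should
    generate and 0 marks original pixels that must be kept.
    """
    height, width = len(image), len(image[0])
    new_height, new_width = height + 2 * pad, width + 2 * pad
    canvas = [[fill] * new_width for _ in range(new_height)]
    mask = [[1] * new_width for _ in range(new_height)]
    for y in range(height):
        for x in range(width):
            canvas[y + pad][x + pad] = image[y][x]
            mask[y + pad][x + pad] = 0
    return canvas, mask


portrait = [[5, 6],
            [7, 8]]
canvas, mask = expand_canvas(portrait, pad=1)
for row in mask:
    print(row)
```

The original pixels sit untouched in the centre, and only the border is offered to the model for generation.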

Image editing with masks

| Edit Type | How It Works | Example |
| --- | --- | --- |
| Mask-based inpainting | Define an area (mask), AI fills it with new content | Mask the sky, generate a sunset instead of grey clouds |
| Prompt-driven modification | Describe what to change, AI modifies the image | “Change the car colour from red to blue” |
| Object removal | Mask an object, AI fills with matching background | Remove a person from a product photo |
| Object replacement | Mask an object, describe replacement | “Replace the chair with a modern standing desk” |
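
The sketch below illustrates the principle behind every row in the table: the mask decides which pixels come from newly generated content and which are kept from the original. It uses labelled strings in place of pixels, and `rect_mask` and `composite` are hypothetical helpers. Real edit APIs define their own mask format (some, for example, expect a PNG whose transparent pixels mark the editable area), so treat the boolean mask here as an assumption.

```python
def rect_mask(width, height, box):
    """Return a 2D boolean mask that is True inside `box`.

    `box` is (left, top, right, bottom) with right and bottom exclusive.
    """
    left, top, right, bottom = box
    return [[left <= x < right and top <= y < bottom for x in range(width)]
            for y in range(height)]


def composite(original, generated, mask):
    """Take generated pixels where the mask is True, original pixels elsewhere."""
    return [[g if m else o for o, g, m in zip(o_row, g_row, m_row)]
            for o_row, g_row, m_row in zip(original, generated, mask)]


original = [["sky", "sky", "sky"],
            ["car", "car", "road"]]
generated = [["sunset"] * 3,
             ["sunset"] * 3]

sky_mask = rect_mask(width=3, height=2, box=(0, 0, 3, 1))  # top row only
result = composite(original, generated, sky_mask)
print(result)
```

Only the masked top row changes; the car and road in the second row are preserved exactly.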

Video generation and editing

| Feature | Description | Control Options |
| --- | --- | --- |
| Text-to-video | Generate video clips from text prompts | Duration, resolution, aspect ratio |
| Reference-based | Generate video matching a reference image or clip | Style, motion, subject consistency |
| Video editing | Modify specific segments of generated video | Text instructions for changes |
| Generation controls | Platform-provided quality and safety settings | Content filters, watermarks, resolution limits |
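
As a rough illustration of the duration, resolution, and aspect ratio controls, the sketch below derives frame dimensions from an aspect ratio and checks the requested duration. The function names, the request fields, and the `max_seconds` limit are all hypothetical; real video generation APIs publish their own parameter names and limits.

```python
def frame_size(aspect_ratio: str, height: int) -> tuple:
    """Derive (width, height) in pixels from a ratio such as "16:9"."""
    w_ratio, h_ratio = (int(part) for part in aspect_ratio.split(":"))
    width = round(height * w_ratio / h_ratio)
    # Round up to an even width, which common video codecs expect.
    width += width % 2
    return width, height


def build_video_request(prompt: str, seconds: int, aspect_ratio: str = "16:9",
                        height: int = 720, max_seconds: int = 20) -> dict:
    """Return a request body as a dict; max_seconds is a made-up limit."""
    if not 1 <= seconds <= max_seconds:
        raise ValueError(f"seconds must be between 1 and {max_seconds}")
    width, height = frame_size(aspect_ratio, height)
    return {"prompt": prompt, "seconds": seconds,
            "width": width, "height": height}


clip = build_video_request("Waves rolling onto a beach at dawn", seconds=8)
print(clip)
```
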
ℹ️ Real-world example: MediaForge's content pipeline

MediaForge uses image generation for client marketing campaigns:

  1. Brief → concept images: Client brief describes “modern tech office, diverse team, warm lighting” → generate 10 concept images
  2. Selection → variations: Client picks favourite → generate 5 variations with different compositions
  3. Refinement → inpainting: Client wants the window view changed → mask the window, prompt “city skyline at sunset”
  4. Final → style application: Apply brand-consistent colour grading to the final image

Total time: 20 minutes, compared with 2 days and $5,000 for a traditional photo shoot.
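
The four steps above can be sketched as a small pipeline. `fake_generate` is a stand-in that returns labelled placeholder strings so the flow can run offline; in a real pipeline each call would go to an image generation or editing endpoint, and every name here is hypothetical.

```python
def fake_generate(prompt, n):
    """Stand-in for a real generation call; returns n labelled placeholders."""
    return [f"{prompt} #{i}" for i in range(1, n + 1)]


def run_campaign(brief, pick=0):
    # 1. Brief -> concept images
    concepts = fake_generate(brief, 10)
    # 2. Selection -> variations of the client's favourite
    favourite = concepts[pick]
    variations = fake_generate(f"variation of [{favourite}]", 5)
    # 3. Refinement -> inpaint the masked window region
    refined = fake_generate(
        f"inpaint window of [{variations[0]}] with city skyline at sunset", 1
    )[0]
    # 4. Final -> brand colour grading (represented here as a label)
    final = f"graded({refined})"
    return {"concepts": concepts, "variations": variations, "final": final}


result = run_campaign("modern tech office, diverse team, warm lighting")
print(len(result["concepts"]), len(result["variations"]))
print(result["final"])
```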

Generation controls

| Control | What It Does | When to Use |
| --- | --- | --- |
| Content filters | Block generation of unsafe content | Always enabled; add custom filters for brand safety |
| Watermarks | Add invisible or visible watermarks to generated content | Compliance with AI content disclosure requirements |
| Resolution | Set output image/video dimensions | Match target platform requirements (social, print, web) |
| Seed | Reproduce similar results from the same prompt | A/B testing, consistent brand imagery |
| Quality settings | Standard vs HD generation | Standard for prototyping, HD for final production |
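
A toy analogy for the seed control, using Python's standard `random` module rather than a real image model. Many generators start from random noise, and fixing the seed fixes that starting noise; `sample_noise` is a hypothetical helper that only demonstrates this principle, and whether a seed parameter is exposed at all depends on the model.

```python
import random


def sample_noise(seed, size=4):
    """Return a short list of pseudo-random values derived from `seed`."""
    rng = random.Random(seed)
    return [round(rng.random(), 3) for _ in range(size)]


first = sample_noise(42)
repeat = sample_noise(42)
other = sample_noise(7)

print(first == repeat)  # the same seed reproduces the same starting noise
print(first == other)   # a different seed gives different noise
```
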
💡 Exam tip: Generation controls are about safety AND quality

The exam tests both:

  • Safety controls: content filters, watermarks, prohibited content detection
  • Quality controls: resolution, seed for reproducibility, style parameters

When a question asks about “appropriate generation controls,” consider both dimensions.
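
One way to keep both dimensions in view is to review a configuration against a safety checklist and a quality checklist together. The `GenerationControls` class and `review` function below are a hypothetical sketch; the field names are illustrative and do not correspond to a specific API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GenerationControls:
    # Safety dimension
    content_filters: bool = True
    watermark: bool = True
    # Quality dimension
    resolution: str = "1024x1024"
    quality: str = "standard"  # "standard" or "hd"
    seed: Optional[int] = None


def review(controls: GenerationControls, production: bool = False) -> List[str]:
    """Return a list of issues covering both safety and quality."""
    issues = []
    if not controls.content_filters:
        issues.append("safety: content filters are disabled")
    if not controls.watermark:
        issues.append("safety: no watermark for AI content disclosure")
    if controls.quality not in ("standard", "hd"):
        issues.append(f"quality: unknown quality setting {controls.quality!r}")
    if production and controls.quality != "hd":
        issues.append("quality: final production output should use hd")
    if production and controls.seed is None:
        issues.append("quality: set a seed for reproducible brand imagery")
    return issues


draft = GenerationControls()
final = GenerationControls(quality="hd", seed=1234)
print(review(draft, production=True))
print(review(final, production=True))
```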

Key terms

Question

What is inpainting?

Answer

An image editing technique where you mask (select) an area of an image, and AI generates new content to fill that area. Used for object removal, background replacement, or targeted edits.

Question

What is a generation seed?

Answer

A numerical value that makes image generation reproducible. Using the same prompt + seed produces very similar images each time. Useful for A/B testing and maintaining visual consistency.

Question

What is text-to-video generation?

Answer

Creating video clips from text descriptions. The AI generates frames and motion based on the prompt, with controls for duration, resolution, and style. Can use reference images for visual consistency.

Knowledge check

MediaForge needs to replace the background in a product photo — keeping the product but changing the background from a studio to a beach scene. Which technique should they use?

NeuralMed generates anatomical diagrams for patient education materials. Which generation control is MOST important to configure?

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.