Computer Vision: Seeing the World
AI can look at a photo and tell you what's in it, read text from images, detect objects, and classify scenes. This module covers all the vision capabilities the exam tests.
How does AI see?
Computer vision lets AI look at images and understand what’s in them — just like you do, but at scale.
When you look at a photo, your brain instantly recognises faces, reads signs, notices objects. Computer vision does the same thing using AI models trained on millions of labelled images.
The difference? Speed. A human quality inspector checks maybe 60 items per hour. A vision AI checking 60 items per second gets through 3,600 times as many in the same hour, and large systems can process thousands of images per second.
Computer vision capabilities
| Feature | What It Does | Example |
|---|---|---|
| Image classification | Assigns labels/categories to an image | 'This is a photo of a cat' or 'This X-ray shows pneumonia' |
| Object detection | Finds and locates specific objects within an image with bounding boxes | Counting people in a room, detecting products on a shelf |
| Image description | Generates a natural language description of what's in the image | 'A woman in a white coat examining an X-ray on a lightbox' |
| OCR (Optical Character Recognition) | Reads and extracts text from images | Reading a licence plate, extracting text from a scanned document |
| Face detection | Detects human faces and attributes (head pose, glasses, blur) | Security cameras, photo organisation, accessibility |
| Spatial analysis | Analyses movement and positioning in video | Counting foot traffic in a store, social distancing monitoring |
Image classification: is this a cat or a dog?
The simplest vision task — the model looks at an image and assigns it to a category.
| Use Case | Input | Output |
|---|---|---|
| Medical imaging | X-ray of a lung | “Pneumonia detected” or “Normal” |
| Quality control | Photo of a product | “Pass” or “Defect detected” |
| Content moderation | Uploaded image | “Safe” or “Contains violence” |
MediSpark scenario: MediSpark trains a classification model to sort dermatology images into categories: benign, monitor, urgent referral. Each category triggers a different workflow in their patient management system.
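To make the "category triggers a workflow" idea concrete, here is a minimal sketch in plain Python. It assumes the classifier returns one confidence score per category; the workflow names and the example scores are hypothetical and are not taken from any real MediSpark system or Azure API.

```python
# Hypothetical mapping from predicted category to a workflow name.
WORKFLOWS = {
    "benign": "send_reassurance_letter",
    "monitor": "schedule_follow_up",
    "urgent referral": "notify_specialist",
}


def pick_category(scores: dict[str, float]) -> str:
    """Return the category with the highest confidence score."""
    return max(scores, key=scores.get)


def route(scores: dict[str, float]) -> str:
    """Map the top-scoring category to its (hypothetical) workflow."""
    return WORKFLOWS[pick_category(scores)]


# Made-up scores standing in for a classifier's output on one image.
example_scores = {"benign": 0.08, "monitor": 0.21, "urgent referral": 0.71}
print(pick_category(example_scores), "->", route(example_scores))
```

The point to take away is that classification gives one label for the whole image, so the downstream logic can be a simple lookup.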
Object detection: what’s in the picture and where?
Object detection goes beyond classification: it identifies specific objects and marks the location of each one with a bounding box.
GreenLeaf scenario: GreenLeaf uses object detection on drone photos of their fields:
- Detects individual plants
- Identifies weeds vs crops
- Counts healthy vs diseased plants
- Maps problem areas for targeted treatment
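The list above can be sketched in code. This example assumes an object detection model returns, for each object, a label, a confidence score, and a bounding box in pixels. The `Detection` class, the 0.5 confidence threshold, and the sample values are all illustrative assumptions, not the output format of any particular service.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: a label, a confidence, and a bounding box."""
    label: str
    confidence: float
    x: int       # left edge of the box, in pixels
    y: int       # top edge of the box, in pixels
    width: int
    height: int


def count_by_label(detections, min_confidence=0.5):
    """Count detections per label, ignoring low-confidence ones."""
    return Counter(d.label for d in detections if d.confidence >= min_confidence)


def box_centre(d):
    """Return the centre point of a bounding box."""
    return (d.x + d.width / 2, d.y + d.height / 2)


# Made-up detections standing in for a model's output on one drone photo.
detections = [
    Detection("crop", 0.94, 10, 20, 40, 40),
    Detection("weed", 0.88, 120, 35, 30, 30),
    Detection("crop", 0.91, 200, 22, 40, 40),
    Detection("weed", 0.32, 300, 80, 20, 20),  # below threshold, ignored
]

counts = count_by_label(detections)
weed_centres = [
    box_centre(d) for d in detections
    if d.label == "weed" and d.confidence >= 0.5
]
print(counts)
print(weed_centres)
```

Unlike classification, each result carries a location, which is what makes counting and mapping problem areas possible.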
OCR: reading text from images
Optical Character Recognition (OCR) extracts text from images — printed text, handwriting, signs, documents.
| Source | What OCR Reads |
|---|---|
| Scanned documents | Full page text, tables, headers |
| Business cards | Name, phone, email, company |
| Street signs | Road names, directions |
| Handwritten notes | Handwriting (with varying accuracy) |
| Receipts | Items, prices, totals |
Key exam concept: OCR is the bridge between the physical and digital world. It’s a computer vision capability, but its output feeds into text analysis and information extraction workflows.
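As a rough illustration of OCR output feeding a text workflow, the sketch below takes lines of text that an OCR step might return for a receipt and pulls out items and a total. The sample lines, the "label then amount" line layout, and the `parse_receipt` helper are assumptions made for this example; real OCR output is messier.

```python
import re
from decimal import Decimal

# Assumed layout: a label, whitespace, then an amount with two decimals.
LINE_PATTERN = re.compile(r"^(?P<label>.+?)\s+(?P<amount>\d+\.\d{2})$")


def parse_receipt(lines):
    """Split OCR text lines into (item, price) pairs and a total."""
    items = []
    total = None
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # not an item or total line, e.g. the shop name
        amount = Decimal(match.group("amount"))
        if match.group("label").upper() == "TOTAL":
            total = amount
        else:
            items.append((match.group("label"), amount))
    return items, total


# Made-up text lines standing in for OCR output from a receipt photo.
ocr_lines = ["Corner Cafe", "Coffee 3.50", "Bagel 2.25", "TOTAL 5.75"]

items, total = parse_receipt(ocr_lines)
items_sum = sum(price for _, price in items)
print(items, total, items_sum == total)
```

The OCR step only supplies the text. Everything after that is ordinary text processing.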
OCR vs Content Understanding
OCR and Content Understanding (Module 11) are related but different:
| OCR | Content Understanding |
|---|---|
| Extracts raw text from images | Extracts structured fields from documents |
| Output: “Dr. Sarah Chen, DOB 15/03/1985” | Output: structured JSON with named fields like name, dob |
| Doesn’t understand what the text means | Understands document structure and field meanings |
| General-purpose | Trained for specific document types |
The exam may test whether you know when to use simple OCR vs full Content Understanding.
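The difference in output shape can be shown with a toy example. The sketch below starts from the raw OCR string in the table and turns it into named fields. It uses a hand-written regular expression purely as a stand-in: Content Understanding relies on trained models rather than a pattern like this, and the `to_structured` helper is hypothetical.

```python
import json
import re

# What plain OCR gives you: a string with no notion of fields.
raw_ocr_text = "Dr. Sarah Chen, DOB 15/03/1985"

# Toy stand-in for field extraction; assumes a "name, DOB dd/mm/yyyy" layout.
FIELD_PATTERN = re.compile(r"^(?P<name>.+?),\s*DOB\s+(?P<dob>\d{2}/\d{2}/\d{4})$")


def to_structured(text):
    """Return named fields from the text, or None if the layout differs."""
    match = FIELD_PATTERN.match(text.strip())
    if match is None:
        return None
    return {"name": match.group("name"), "dob": match.group("dob")}


structured = to_structured(raw_ocr_text)
print(raw_ocr_text)            # OCR-style output: raw text
print(json.dumps(structured))  # Content Understanding-style output: named fields
```

If all you need is searchable text, the first output is enough. If a downstream system needs a `name` or `dob` field it can rely on, you need the second.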
Azure AI Vision capabilities
Azure AI Vision (Foundry Tools) provides:
| Capability | API or service |
|---|---|
| Image analysis (tags, description, objects, people) | Image Analysis 4.0 |
| OCR | Read API |
| Face detection | Face API |
| Custom models | Custom Vision (train your own classifier) |
| Spatial analysis | Video analysis for movement patterns |
Computer Vision — AI-901 Module 9
Knowledge Check
GreenLeaf uses a drone to photograph their fields. They need AI to count individual plants, identify which are weeds, and mark their exact location in the image. Which computer vision capability is this?
DataFlow Corp receives thousands of business cards at conferences. They want to read all the text from photos of each card so they can search and filter it later. Which computer vision capability is most appropriate?
Next up: Image Generation — how AI creates entirely new images from text descriptions.