Domain 2, Module 6 of 12 (16 of 28 overall)

AB-620 Study Guide

Domain 1: Plan and Configure Agent Solutions

  • Getting Started: Copilot Studio for Developers Free
  • Planning Enterprise Integration and Reusable Components Free
  • Identity Strategy for Agents Free
  • Channels, Deployment and Audience Design Free
  • Responsible AI and Security Governance Free
  • Agent Flows: Build, Monitor and Handle Errors Free
  • Human-in-the-Loop Agent Flows Free
  • Topics, Tools and Variables Free
  • Advanced Responses: Custom Prompts and Generative Answers Free
  • API Calls, HTTP Requests and Adaptive Cards Free

Domain 2: Integrate and Extend Agents in Copilot Studio

  • Enterprise Knowledge Sources: The Big Picture
  • Copilot Connectors and Power Platform Connectors
  • Azure AI Search as a Knowledge Source
  • Adding Tools: Custom Connectors and REST APIs
  • MCP Tools: Model Context Protocol in Action
  • Computer Use: Agent-Driven UI Automation
  • Multi-Agent Solutions: Design and Agent Reuse
  • Integrating Foundry Agents
  • Fabric Data Agents: Analytics Meets AI
  • A2A Protocol: Cross-Platform Agent Collaboration
  • Grounded Answers: Azure AI Search with Foundry
  • Foundry Model Catalog and Application Insights

Domain 3: Test and Manage Agents

  • Test Sets & Evaluation Methods
  • Reviewing Results & Tuning Performance
  • Solutions & Environment Variables
  • Power Platform Pipelines for Agent ALM
  • Agent Lifecycle: From Dev to Production
  • Exam Prep: Diagnostic Review

Domain 2: Integrate and Extend Agents in Copilot Studio Premium ⏱ ~13 min read

Computer Use: Agent-Driven UI Automation

Agents that automate web application tasks with visual understanding.

When there is no API, the agent clicks the buttons

☕ Simple explanation

Computer use is like teaching your agent to be a human operator.

Every tool you have learned so far (connectors, REST APIs, MCP) requires the target system to have an API. But what about that ancient order management system from 2005 that only has a web interface? No API. No connectors. Just a browser and a login form.

Computer use lets your agent interact with web applications the way a human would: it takes screenshots of the page, understands what it sees, decides what to click or type, and executes the action. Think of it as giving your agent a pair of eyes and a mouse.

Important caveat: this is a preview feature and should only be used when no API alternative exists. APIs are always faster, more reliable, and more secure.

Computer use (preview) is a Copilot Studio capability that enables agents to interact with web and desktop applications through visual understanding and UI automation. The agent uses a vision model (such as OpenAI CUA or Anthropic Claude) to interpret screenshots, identify UI elements (buttons, text fields, dropdowns, tables), plan a sequence of actions, and execute them.

The technical architecture: Copilot Studio connects to a configured Windows machine (physical or VM) via Power Automate machine management, launches the target application, captures screenshots, sends them to a vision model for interpretation, generates a plan of UI actions (click, type, scroll, select), executes the actions, and captures the result. All actions are logged in Dataverse for audit and monitoring.

Key constraints: computer use requires a configured Windows machine, is currently in preview, and requires careful monitoring. It is positioned as a last resort: always prefer API-based integration when available. The exam tests your ability to know when computer use is appropriate and how to configure and monitor it.

API vs computer use vs RPA: know the difference

The exam expects you to choose the right automation approach for a given scenario. This comparison is critical.

Table: API vs computer use vs RPA, choosing the right automation approach

| Approach | How it works | Speed | Reliability | Setup effort | When to use |
|---|---|---|---|---|---|
| API integration | Agent calls structured API endpoints: JSON in, JSON out | Fastest: milliseconds to low seconds | Highest: deterministic, versioned, documented | Moderate: requires API access and connector/tool configuration | Always the first choice. Use whenever the target system has an API. |
| Computer use (preview) | Agent sees screenshots, understands the UI, and performs clicks/typing on a configured Windows machine | Slow: seconds to minutes per task (screenshots, vision model, execution) | Moderate: UI changes can break automation; vision model may misinterpret elements | Moderate: requires configured Windows machine via Power Automate machine management | Last resort for web or desktop apps with no API. Legacy systems, internal tools without API exposure. |
| RPA (Power Automate Desktop) | Recorded or scripted UI automation on desktop or web apps, pixel/selector based | Medium: faster than computer use but slower than APIs | Lower: brittle to UI changes; requires maintenance when UI updates | High: requires recording flows, installing agents on machines, maintaining selectors | Desktop applications, complex multi-app workflows, systems where browser-only access is insufficient. |

💡 The golden rule: always prefer API

If the exam gives you a scenario where an API exists, the answer is never computer use. Computer use is explicitly positioned as a fallback for systems without API access. Even if the scenario says “the UI is easier to use,” the correct answer is still API. APIs are faster, more reliable, and more secure. Computer use is the tool of last resort.
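The selection rule can be written down as a tiny decision function. This is only a study aid that mirrors the comparison in this module; the names `Approach` and `choose_approach`, and the single `complex_multi_app_workflow` flag, are hypothetical simplifications and not part of any Microsoft tooling.

```python
from enum import Enum


class Approach(Enum):
    API = "API integration"
    COMPUTER_USE = "Computer use (preview)"
    RPA = "RPA (Power Automate Desktop)"


def choose_approach(has_api: bool, complex_multi_app_workflow: bool = False) -> Approach:
    """Pick an automation approach using the golden rule: API first."""
    if has_api:
        # An existing API always wins, even if the UI looks easier.
        return Approach.API
    if complex_multi_app_workflow:
        # Assumption: scripted, multi-app desktop work maps to RPA.
        return Approach.RPA
    # No API at all: computer use is the fallback.
    return Approach.COMPUTER_USE


print(choose_approach(has_api=True).value)
print(choose_approach(has_api=False).value)
```

The ordering of the checks is the point: the API test comes first, so no other input can override it.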

How computer use works β€” the execution loop

The agent performs a repeating cycle until the task is complete:

  1. Navigate: the managed browser opens the target URL.
  2. Capture: a screenshot of the current page state is taken.
  3. Interpret: the vision model analyses the screenshot, identifying UI elements, text content, and page structure.
  4. Plan: the model determines the next action to take (click a button, type in a field, select from a dropdown, scroll).
  5. Execute: the browser automation engine performs the planned action.
  6. Verify: another screenshot is captured to confirm the action succeeded.
  7. Repeat: steps 2-6 loop until the task is complete or a failure is detected.
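The loop above can be sketched in a few lines of Python. This is a simulation under stated assumptions, not how Copilot Studio is implemented: `FakeBrowser`, `Action`, and `run_task` are hypothetical names, the "screenshot" is a plain string, and the vision model is replaced by a `plan_next` callback that you supply.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str        # "click", "type", "scroll", "select", or "done"
    target: str = ""
    text: str = ""


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for the managed browser."""
    url: str = ""
    history: List[Action] = field(default_factory=list)

    def navigate(self, url: str) -> None:
        self.url = url

    def screenshot(self) -> str:
        # A real screenshot is an image; here it is a text summary of state.
        return f"{self.url} after {len(self.history)} actions"

    def perform(self, action: Action) -> None:
        self.history.append(action)


def run_task(browser: FakeBrowser, plan_next: Callable[[str], Action],
             url: str, max_actions: int = 20) -> str:
    browser.navigate(url)                    # 1. Navigate
    for _ in range(max_actions):
        before = browser.screenshot()        # 2. Capture
        action = plan_next(before)           # 3-4. Interpret and plan
        if action.kind == "done":
            return "success"
        browser.perform(action)              # 5. Execute
        if browser.screenshot() == before:   # 6. Verify
            return "failure: action had no visible effect"
    return "failure: action limit reached"   # 7. Loop ended without completion


steps = iter([Action("click", "Orders"), Action("type", "Search box", "12345")])
browser = FakeBrowser()
result = run_task(browser, lambda shot: next(steps, Action("done")),
                  "https://example.invalid/orders")
print(result, len(browser.history))
```

The `max_actions` bound is what keeps a confused planner from looping forever, which is the same role the guardrails play in the real feature.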

Configuring computer use

Setting up computer use in Copilot Studio involves several configuration steps:

| Step | What you configure | Details |
|---|---|---|
| 1. Enable the feature | Turn on computer use in the agent settings | Preview feature; must be explicitly enabled |
| 2. Define target URLs | Specify which web applications the agent can access | Security boundary: the agent can only navigate to allowed domains |
| 3. Provide credentials | Configure how the agent authenticates to the web app | Stored securely; typically a service account with minimum necessary permissions |
| 4. Describe the task | Write a natural-language instruction for what the agent should do | Clear, step-by-step instructions improve reliability (e.g., “Navigate to Orders, search for the order number, click View Details, read the status field”) |
| 5. Set guardrails | Configure timeouts, maximum actions, and failure behaviour | Prevents runaway sessions (e.g., max 20 actions, 5-minute timeout) |
| 6. Test in sandbox | Run the task in the test pane and review the execution trace | Watch the screenshot sequence to verify the agent followed the correct path |
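As a memory aid, the first five steps can be pictured as one configuration record with a pre-flight check. Copilot Studio is configured through its UI, so the `ComputerUseConfig` class and its field names below are purely hypothetical; only the default guardrail values (20 actions, 5 minutes) come from the table.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ComputerUseConfig:
    """Hypothetical record of the settings listed in the table."""
    enabled: bool                       # step 1
    allowed_domains: Tuple[str, ...]    # step 2
    credential_name: str                # step 3 (a reference, never the secret)
    task_instructions: str              # step 4
    max_actions: int = 20               # step 5
    timeout_seconds: int = 300          # step 5 (5 minutes)

    def problems(self) -> List[str]:
        """Return a list of missing or invalid settings; empty means ready to test."""
        issues = []
        if not self.enabled:
            issues.append("computer use is not enabled")
        if not self.allowed_domains:
            issues.append("no target domains are allowed")
        if not self.credential_name:
            issues.append("no credential is configured")
        if not self.task_instructions.strip():
            issues.append("task instructions are empty")
        if self.max_actions <= 0 or self.timeout_seconds <= 0:
            issues.append("guardrails must be positive")
        return issues


config = ComputerUseConfig(
    enabled=True,
    allowed_domains=("orders.example.invalid",),
    credential_name="svc-order-reader",
    task_instructions="Navigate to Orders, search for the order number, "
                      "click View Details, read the status field",
)
print(config.problems())
```

An empty list means every step before sandbox testing has a value; step 6 still has to be done by actually running the task.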

Monitoring and governance

Computer use actions are fully logged for audit and compliance. The exam expects you to know the monitoring capabilities.

Dataverse logging: Every computer use session is recorded in Dataverse with:

  • Session ID, start time, end time, duration
  • Target URL and authenticated user
  • Screenshot sequence (what the agent “saw” at each step)
  • Action log (what the agent did: click coordinates, typed text, selected values)
  • Outcome (success, failure, timeout)
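To make the logged fields concrete, here is a sketch of one session as a Python record. The class and field names are hypothetical and do not reflect the actual Dataverse table schema; they only mirror the bullet list above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class SessionRecord:
    """Hypothetical shape of one logged computer use session."""
    session_id: str
    start_time: datetime
    end_time: datetime
    target_url: str
    authenticated_user: str
    screenshots: List[str] = field(default_factory=list)  # references to images
    actions: List[str] = field(default_factory=list)      # clicks, typing, selections
    outcome: str = "success"                               # success, failure, or timeout

    @property
    def duration(self) -> timedelta:
        return self.end_time - self.start_time


def needs_review(record: SessionRecord,
                 max_duration: timedelta = timedelta(minutes=5)) -> bool:
    """Flag sessions that failed, timed out, or ran unusually long."""
    return record.outcome != "success" or record.duration > max_duration


start = datetime(2026, 1, 1, 9, 0, 0)
record = SessionRecord(
    session_id="s-001",
    start_time=start,
    end_time=start + timedelta(seconds=90),
    target_url="https://orders.example.invalid/orders",
    authenticated_user="svc-order-updater",
    actions=["type order number", "click result", "click Save"],
)
print(record.duration, needs_review(record))
```

Duration is derived from the two timestamps rather than stored separately, so the three values cannot disagree.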

Activity map: Copilot Studio provides a visual activity map showing the agent’s navigation path through the web application: which pages it visited, what actions it took, and where it spent time. This is invaluable for debugging when the agent goes off track.
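The idea behind an activity map can be illustrated by deriving a navigation path and time-per-page from an action log. This is a sketch of the concept only; the tuple layout `(seconds_from_start, page, action)` and both function names are assumptions, not the format Copilot Studio uses.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Entry = Tuple[int, str, str]  # (seconds_from_start, page, action)


def navigation_path(entries: List[Entry]) -> List[str]:
    """Collapse consecutive actions on the same page into one visit."""
    return [page for page, _ in groupby(entries, key=lambda e: e[1])]


def seconds_per_page(entries: List[Entry], session_end: int) -> Dict[str, int]:
    """Attribute the time between one action and the next to the current page."""
    totals: Dict[str, int] = {}
    following = [e[0] for e in entries[1:]] + [session_end]
    for (start, page, _), next_start in zip(entries, following):
        totals[page] = totals.get(page, 0) + (next_start - start)
    return totals


log = [
    (0, "Login", "type"),
    (4, "Login", "click"),
    (9, "Search", "type"),
    (15, "Order details", "click"),
    (22, "Order details", "select"),
]
print(navigation_path(log))
print(seconds_per_page(log, session_end=30))
```

A page that absorbs far more time than the others is usually the first place to look when a session goes off track.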

💡 Security considerations

Computer use raises unique security concerns: the agent has visual access to everything on the page, including sensitive data. Key safeguards:

  • Least-privilege service accounts: the agent should only have access to what it needs, nothing more.
  • URL allowlisting: restrict which domains the agent can navigate to. Prevent accidental navigation to sensitive internal systems.
  • Action limits: set maximum actions and timeouts to prevent runaway sessions.
  • Audit logging: all actions are logged in Dataverse. Review logs regularly for unexpected behaviour.
  • Human-in-the-loop: for sensitive operations, require human approval before the agent executes (covered in Domain 1, Human-in-the-Loop Agent Flows).
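URL allowlisting is the easiest of these safeguards to get subtly wrong, so a small sketch may help. The function below is a generic illustration using Python's standard `urllib.parse`, not the check Copilot Studio performs; `is_allowed` is a hypothetical name, and rejecting non-HTTPS URLs is an extra assumption.

```python
from urllib.parse import urlparse
from typing import Set


def is_allowed(url: str, allowed_domains: Set[str]) -> bool:
    """Allow a URL only if its host is an allowed domain or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        # Assumption: plain HTTP and other schemes are rejected outright.
        return False
    host = (parsed.hostname or "").lower()
    # Compare whole host labels; a bare substring test would accept
    # look-alike hosts such as "example.com.evil.test".
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


allowed = {"example.com"}
print(is_allowed("https://orders.example.com/search", allowed))
print(is_allowed("https://example.com.evil.test/", allowed))
```

Matching on the parsed hostname rather than the raw string is the design choice that matters: it keeps paths, query strings, and look-alike domains from slipping through.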

Scenario: Dev automates the legacy order management system

Dev’s logistics company has a 15-year-old order management system called “ShipTrack Classic.” It has a web interface but absolutely no API; the vendor stopped development years ago. The company processes 200 order status updates daily, each requiring a human operator to: log in, search for the order, click through three screens, update the status field, add a note, and save.

Dev configures computer use for the customer service agent:

  • Target URL: https://internal.shiptrack-classic.logistics.local/orders
  • Credentials: A service account with order-update permissions only (no admin access).
  • Task description: “Search for the order by number in the search bar. Click the matching result. Navigate to the Status tab. Update the Status dropdown to the new value. Enter the provided note in the Notes field. Click Save. Confirm the success message appears.”
  • Guardrails: Maximum 15 actions per session, 3-minute timeout, fail if the success message does not appear.

Dev tests in the sandbox: the agent takes a screenshot of the login page, enters credentials, navigates to the search bar, types the order number, clicks the result, updates the status, adds the note, and saves. The activity map shows the exact navigation path. The Dataverse log records every action for compliance.

200 manual updates per day, now handled by the agent. Dev’s team just got 3 hours back, and the legacy system finally has “automation” without ever getting an API.
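The scenario's numbers can be sanity-checked with a little arithmetic. The sketch assumes the 3 hours saved is the total manual effort for all 200 daily updates, which the scenario implies but does not state; the variable names are illustrative.

```python
updates_per_day = 200
hours_saved_per_day = 3           # assumed to be the full manual effort
timeout_seconds = 3 * 60          # Dev's guardrail: 3-minute timeout

# Implied manual handling time for a single update.
seconds_per_update = hours_saved_per_day * 3600 / updates_per_day

# How many times longer than a manual update a session may run before timing out.
timeout_headroom = timeout_seconds / seconds_per_update

print(seconds_per_update)            # 54.0
print(round(timeout_headroom, 2))    # 3.33
```

Under that assumption a human spends about 54 seconds per update, so a 3-minute timeout leaves the slower, screenshot-driven agent roughly three times that long before the guardrail stops the session.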

Question: What is computer use in Copilot Studio?

Answer: A preview capability that lets agents interact with web applications through visual understanding: taking screenshots, interpreting UI elements, and performing clicks/typing in a managed browser session. Used only when no API is available.

Question: When should you use computer use vs an API integration?

Answer: Always prefer API integration; it is faster, more reliable, and more secure. Computer use is a last resort for web applications that have no API access at all.

Question: Where are computer use actions logged?

Answer: In Dataverse. Every session records the target URL, screenshot sequence, action log (clicks, typing, selections), timestamps, and outcome. An activity map provides visual navigation tracking.

Knowledge Check

Dev's logistics company has two systems: a modern shipping API (REST, documented) and a legacy order system (web UI only, no API). How should Dev integrate each?

Knowledge Check

A computer use task is taking too long and the agent appears stuck on a page. What configuration should have prevented this?

Knowledge Check

Which monitoring capability helps a developer debug a computer use session where the agent clicked the wrong button?

© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.