Computer Use: Agent-Driven UI Automation
Agents that automate web application tasks with visual understanding.
When there is no API, the agent clicks the buttons
Computer use is like teaching your agent to be a human operator.
Every tool you have learned so far (connectors, REST APIs, MCP) requires the target system to have an API. But what about that ancient order management system from 2005 that only has a web interface? No API. No connectors. Just a browser and a login form.
Computer use lets your agent interact with web applications the way a human would: it takes screenshots of the page, understands what it sees, decides what to click or type, and executes the action. Think of it as giving your agent a pair of eyes and a mouse.
Important caveat: this is a preview feature and should only be used when no API alternative exists. APIs are always faster, more reliable, and more secure.
API vs computer use vs RPA: know the difference
The exam expects you to choose the right automation approach for a given scenario. This comparison is critical.
| Feature | How it works | Speed | Reliability | Setup effort | When to use |
|---|---|---|---|---|---|
| API integration | Agent calls structured API endpoints: JSON in, JSON out | Fastest: milliseconds to low seconds | Highest: deterministic, versioned, documented | Moderate: requires API access and connector/tool configuration | Always the first choice. Use whenever the target system has an API. |
| Computer use (preview) | Agent sees screenshots, understands the UI, and performs clicks/typing on a configured Windows machine | Slow: seconds to minutes per task (screenshots, vision model, execution) | Moderate: UI changes can break automation; vision model may misinterpret elements | Moderate: requires a configured Windows machine via Power Automate machine management | Last resort for web or desktop apps with no API. Legacy systems, internal tools without API exposure. |
| RPA (Power Automate Desktop) | Recorded or scripted UI automation on desktop or web apps, pixel/selector based | Medium: faster than computer use but slower than APIs | Lower: brittle to UI changes; requires maintenance when the UI updates | High: requires recording flows, installing agents on machines, maintaining selectors | Desktop applications, complex multi-app workflows, systems where browser-only access is insufficient. |
The golden rule: always prefer API
If the exam gives you a scenario where an API exists, the answer is never computer use. Computer use is explicitly positioned as a fallback for systems without API access. Even if the scenario says "the UI is easier to use," the correct answer is still API. APIs are faster, more reliable, and more secure. Computer use is the tool of last resort.
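The decision rule can be boiled down to a tiny selection function. This is a study sketch of the exam heuristic, not a real Copilot Studio API; the function name and parameters are hypothetical.

```python
def choose_automation_approach(has_api: bool, desktop_only: bool) -> str:
    """Exam heuristic: API first, RPA for desktop apps, computer use last."""
    if has_api:
        return "api"            # always the first choice, even if the UI is "easier"
    if desktop_only:
        return "rpa"            # Power Automate Desktop for desktop applications
    return "computer_use"       # web UI with no API: the tool of last resort
```

Note that `has_api=True` wins regardless of the second argument: the existence of an API ends the discussion.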
How computer use works: the execution loop
The agent performs a repeating cycle until the task is complete:
- Navigate: the managed browser opens the target URL.
- Capture: a screenshot of the current page state is taken.
- Interpret: the vision model analyses the screenshot, identifying UI elements, text content, and page structure.
- Plan: the model determines the next action to take (click a button, type in a field, select from a dropdown, scroll).
- Execute: the browser automation engine performs the planned action.
- Verify: another screenshot is captured to confirm the action succeeded.
- Repeat: steps 2-6 loop until the task is complete or a failure is detected.
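The loop above can be sketched in code. Everything here is illustrative: the `browser` and `vision_model` objects stand in for internal components you never program against directly, and the action limit shows where a guardrail would cut off a runaway session.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", "scroll", "select", or "done"
    target: str = ""   # element description from the vision model
    value: str = ""    # text to type or option to select

def run_computer_use_task(browser, vision_model, task: str,
                          max_actions: int = 20) -> bool:
    """Sketch of the capture -> interpret -> plan -> execute -> verify loop."""
    for _ in range(max_actions):                        # guardrail: action limit
        screenshot = browser.capture()                  # 2. Capture
        page_state = vision_model.interpret(screenshot) # 3. Interpret
        action = vision_model.plan(task, page_state)    # 4. Plan
        if action.kind == "done":
            return True                                 # task complete
        browser.execute(action)                         # 5. Execute
        verified = vision_model.verify(browser.capture(), action)  # 6. Verify
        if not verified:
            return False                                # action failed
    return False                                        # action limit hit: give up
```

The key structural point for the exam is that every iteration pays for a screenshot plus a vision-model call, which is why computer use is measured in seconds to minutes rather than milliseconds.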
Configuring computer use
Setting up computer use in Copilot Studio involves several configuration steps:
| Step | What you configure | Details |
|---|---|---|
| 1. Enable the feature | Turn on computer use in the agent settings | Preview feature that must be explicitly enabled |
| 2. Define target URLs | Specify which web applications the agent can access | Security boundary: the agent can only navigate to allowed domains |
| 3. Provide credentials | Configure how the agent authenticates to the web app | Stored securely, typically a service account with minimum necessary permissions |
| 4. Describe the task | Write a natural-language instruction for what the agent should do | Clear, step-by-step instructions improve reliability (e.g., "Navigate to Orders, search for the order number, click View Details, read the status field") |
| 5. Set guardrails | Configure timeouts, maximum actions, and failure behaviour | Prevents runaway sessions, e.g., max 20 actions, 5-minute timeout |
| 6. Test in sandbox | Run the task in the test pane and review the execution trace | Watch the screenshot sequence to verify the agent followed the correct path |
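To keep the six steps straight, it can help to see them as one configuration object. The field names below are invented for study purposes; Copilot Studio exposes these settings through its UI, not through this schema.

```python
# Hypothetical configuration sketch; field names are illustrative,
# not the actual Copilot Studio settings schema.
computer_use_config = {
    "enabled": True,                        # step 1: preview feature toggle
    "allowed_urls": [                       # step 2: security boundary
        "https://internal.example.local/orders",
    ],
    "credential_reference": "svc-orders",   # step 3: least-privilege service account
    "task_instructions": (                  # step 4: clear, step-by-step prose
        "Navigate to Orders, search for the order number, "
        "click View Details, read the status field."
    ),
    "guardrails": {                         # step 5: prevent runaway sessions
        "max_actions": 20,
        "timeout_seconds": 300,             # 5-minute timeout
        "on_failure": "abort",
    },
}
# step 6 (sandbox testing) happens in the test pane, not in configuration
```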
Monitoring and governance
Computer use actions are fully logged for audit and compliance. The exam expects you to know the monitoring capabilities.
Dataverse logging: Every computer use session is recorded in Dataverse with:
- Session ID, start time, end time, duration
- Target URL and authenticated user
- Screenshot sequence (what the agent "saw" at each step)
- Action log (what the agent did: click coordinates, typed text, selected values)
- Outcome (success, failure, timeout)
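The fields above can be pictured as one session record. This is a study aid only; the real Dataverse table schema is not reproduced here, so all names are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ComputerUseSession:
    """Illustrative shape of a logged session; field names are assumed."""
    session_id: str
    start_time: datetime
    end_time: datetime
    target_url: str
    authenticated_user: str
    screenshots: list = field(default_factory=list)  # what the agent "saw"
    actions: list = field(default_factory=list)      # clicks, typed text, selections
    outcome: str = "success"                         # success | failure | timeout

    @property
    def duration_seconds(self) -> float:
        return (self.end_time - self.start_time).total_seconds()
```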
Activity map: Copilot Studio provides a visual activity map of the agent's navigation path through the web application, showing which pages it visited, what actions it took, and where it spent time. This is invaluable for debugging when the agent goes off track.
Security considerations
Computer use raises unique security concerns: the agent has visual access to everything on the page, including sensitive data. Key safeguards:
- Least-privilege service accounts: the agent should only have access to what it needs, nothing more.
- URL allowlisting: restrict which domains the agent can navigate to, preventing accidental navigation to sensitive internal systems.
- Action limits: set maximum actions and timeouts to prevent runaway sessions.
- Audit logging: all actions are logged in Dataverse; review logs regularly for unexpected behaviour.
- Human-in-the-loop: for sensitive operations, require human approval before the agent executes (covered in Module 6).
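URL allowlisting is the safeguard most easily expressed in code. A minimal sketch, assuming an exact-hostname allowlist and HTTPS-only navigation (the real enforcement lives inside the platform, not in code you write):

```python
from urllib.parse import urlparse

# Assumed allowlist: only hosts explicitly named here are reachable.
ALLOWED_HOSTS = {"internal.example.local"}

def is_navigation_allowed(url: str) -> bool:
    """Permit navigation only to allowlisted hosts over HTTPS."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
```

Exact hostname matching (rather than substring matching) matters: a check like `"internal.example.local" in url` would wrongly pass `https://evil.com/?x=internal.example.local`.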
Scenario: Dev automates the legacy order management system
Dev's logistics company has a 15-year-old order management system called "ShipTrack Classic." It has a web interface but absolutely no API; the vendor stopped development years ago. The company processes 200 order status updates daily, each requiring a human operator to: log in, search for the order, click through three screens, update the status field, add a note, and save.
Dev configures computer use for the customer service agent:
Target URL: https://internal.shiptrack-classic.logistics.local/orders
Credentials: A service account with order-update permissions only (no admin access).
Task description: "Search for the order by number in the search bar. Click the matching result. Navigate to the Status tab. Update the Status dropdown to the new value. Enter the provided note in the Notes field. Click Save. Confirm the success message appears."
Guardrails: Maximum 15 actions per session, 3-minute timeout, fail if the success message does not appear.
Dev tests in the sandbox: the agent takes a screenshot of the login page, enters credentials, navigates to the search bar, types the order number, clicks the result, updates the status, adds the note, and saves. The activity map shows the exact navigation path. The Dataverse log records every action for compliance.
200 manual updates per day, now handled by the agent. Dev's team just got 3 hours back, and the legacy system finally has "automation" without ever getting an API.
Dev's logistics company has two systems: a modern shipping API (REST, documented) and a legacy order system (web UI only, no API). How should Dev integrate each?
A computer use task is taking too long and the agent appears stuck on a page. What configuration should have prevented this?
Which monitoring capability helps a developer debug a computer use session where the agent clicked the wrong button?
🎬 Video coming soon