Computer Use: Agent-Driven UI Automation
Agents that automate web application tasks with visual understanding.
When there is no API, the agent clicks the buttons
Computer use is like teaching your agent to be a human operator.
Every tool you have learned so far (connectors, REST APIs, MCP) requires the target system to have an API. But what about that ancient order management system from 2005 that only has a web interface? No API. No connectors. Just a browser and a login form.
Computer use lets your agent interact with web applications the way a human would: it takes screenshots of the page, understands what it sees, decides what to click or type, and executes the action. Think of it as giving your agent a pair of eyes and a mouse.
Important caveat: this is a preview feature and should only be used when no API alternative exists. APIs are always faster, more reliable, and more secure.
API vs computer use vs RPA: know the difference
The exam expects you to choose the right automation approach for a given scenario. This comparison is critical.
| Feature | How it works | Speed | Reliability | Setup effort | When to use |
|---|---|---|---|---|---|
| API integration | Agent calls structured API endpoints: JSON in, JSON out | Fastest: milliseconds to low seconds | Highest: deterministic, versioned, documented | Moderate: requires API access and connector/tool configuration | Always the first choice. Use whenever the target system has an API. |
| Computer use (preview) | Agent sees screenshots, understands the UI, and performs clicks/typing on a configured Windows machine | Slow: seconds to minutes per task (screenshots, vision model, execution) | Moderate: UI changes can break automation; vision model may misinterpret elements | Moderate: requires a configured Windows machine via Power Automate machine management | Last resort for web or desktop apps with no API. Legacy systems, internal tools without API exposure. |
| RPA (Power Automate Desktop) | Recorded or scripted UI automation on desktop or web apps, pixel/selector based | Medium: faster than computer use but slower than APIs | Lower: brittle to UI changes; requires maintenance when the UI updates | High: requires recording flows, installing agents on machines, maintaining selectors | Desktop applications, complex multi-app workflows, systems where browser-only access is insufficient. |
The golden rule: always prefer API
If the exam gives you a scenario where an API exists, the answer is never computer use. Computer use is explicitly positioned as a fallback for systems without API access. Even if the scenario says "the UI is easier to use," the correct answer is still API. APIs are faster, more reliable, and more secure. Computer use is the tool of last resort.
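The decision rule can be boiled down to a tiny selection function. This is a study sketch of the exam heuristic, not a real Copilot Studio API; the function name and parameters are hypothetical.

```python
def choose_automation_approach(has_api: bool, desktop_only: bool) -> str:
    """Exam heuristic: API first, RPA for desktop apps, computer use last."""
    if has_api:
        return "api"            # always the first choice, even if the UI is "easier"
    if desktop_only:
        return "rpa"            # Power Automate Desktop for desktop applications
    return "computer_use"       # web UI with no API: the tool of last resort
```

Note that `has_api=True` wins regardless of the second argument: the existence of an API ends the discussion.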
How computer use works: the execution loop
The agent performs a repeating cycle until the task is complete:
- Navigate: the managed browser opens the target URL.
- Capture: a screenshot of the current page state is taken.
- Interpret: the vision model analyses the screenshot, identifying UI elements, text content, and page structure.
- Plan: the model determines the next action to take (click a button, type in a field, select from a dropdown, scroll).
- Execute: the browser automation engine performs the planned action.
- Verify: another screenshot is captured to confirm the action succeeded.
- Repeat: steps 2-6 loop until the task is complete or a failure is detected.
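The loop above can be sketched in code. Everything here is illustrative: the `browser` and `vision_model` objects stand in for internal components you never program against directly, and the action limit shows where a guardrail would cut off a runaway session.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", "scroll", "select", or "done"
    target: str = ""   # element description from the vision model
    value: str = ""    # text to type or option to select

def run_computer_use_task(browser, vision_model, task: str,
                          max_actions: int = 20) -> bool:
    """Sketch of the capture -> interpret -> plan -> execute -> verify loop."""
    for _ in range(max_actions):                        # guardrail: action limit
        screenshot = browser.capture()                  # 2. Capture
        page_state = vision_model.interpret(screenshot) # 3. Interpret
        action = vision_model.plan(task, page_state)    # 4. Plan
        if action.kind == "done":
            return True                                 # task complete
        browser.execute(action)                         # 5. Execute
        verified = vision_model.verify(browser.capture(), action)  # 6. Verify
        if not verified:
            return False                                # action failed
    return False                                        # action limit hit: give up
```

The key structural point for the exam is that every iteration pays for a screenshot plus a vision-model call, which is why computer use is measured in seconds to minutes rather than milliseconds.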
Configuring computer use
Setting up computer use in Copilot Studio involves several configuration steps:
| Step | What you configure | Details |
|---|---|---|
| 1. Enable the feature | Turn on computer use in the agent settings | Preview feature that must be explicitly enabled |
| 2. Define target URLs | Specify which web applications the agent can access | Security boundary: the agent can only navigate to allowed domains |
| 3. Provide credentials | Configure how the agent authenticates to the web app | Stored securely, typically a service account with minimum necessary permissions |
| 4. Describe the task | Write a natural-language instruction for what the agent should do | Clear, step-by-step instructions improve reliability (e.g., "Navigate to Orders, search for the order number, click View Details, read the status field") |
| 5. Set guardrails | Configure timeouts, maximum actions, and failure behaviour | Prevents runaway sessions, e.g., max 20 actions, 5-minute timeout |
| 6. Test in sandbox | Run the task in the test pane and review the execution trace | Watch the screenshot sequence to verify the agent followed the correct path |
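To keep the six steps straight, it can help to see them as one configuration object. The field names below are invented for study purposes; Copilot Studio exposes these settings through its UI, not through this schema.

```python
# Hypothetical configuration sketch; field names are illustrative,
# not the actual Copilot Studio settings schema.
computer_use_config = {
    "enabled": True,                        # step 1: preview feature toggle
    "allowed_urls": [                       # step 2: security boundary
        "https://internal.example.local/orders",
    ],
    "credential_reference": "svc-orders",   # step 3: least-privilege service account
    "task_instructions": (                  # step 4: clear, step-by-step prose
        "Navigate to Orders, search for the order number, "
        "click View Details, read the status field."
    ),
    "guardrails": {                         # step 5: prevent runaway sessions
        "max_actions": 20,
        "timeout_seconds": 300,             # 5-minute timeout
        "on_failure": "abort",
    },
}
# step 6 (sandbox testing) happens in the test pane, not in configuration
```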
Monitoring and governance
Computer use actions are fully logged for audit and compliance. The exam expects you to know the monitoring capabilities.
Dataverse logging: Every computer use session is recorded in Dataverse with:
- Session ID, start time, end time, duration
- Target URL and authenticated user
- Screenshot sequence (what the agent "saw" at each step)
- Action log (what the agent did: click coordinates, typed text, selected values)
- Outcome (success, failure, timeout)
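The fields above can be pictured as one session record. This is a study aid only; the real Dataverse table schema is not reproduced here, so all names are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ComputerUseSession:
    """Illustrative shape of a logged session; field names are assumed."""
    session_id: str
    start_time: datetime
    end_time: datetime
    target_url: str
    authenticated_user: str
    screenshots: list = field(default_factory=list)  # what the agent "saw"
    actions: list = field(default_factory=list)      # clicks, typed text, selections
    outcome: str = "success"                         # success | failure | timeout

    @property
    def duration_seconds(self) -> float:
        return (self.end_time - self.start_time).total_seconds()
```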
Activity map: Copilot Studio provides a visual activity map of the agent's navigation path through the web application, showing which pages it visited, what actions it took, and where it spent time. This is invaluable for debugging when the agent goes off track.
Security considerations
Computer use raises unique security concerns: the agent has visual access to everything on the page, including sensitive data. Key safeguards:
- Least-privilege service accounts: the agent should only have access to what it needs, nothing more.
- URL allowlisting: restrict which domains the agent can navigate to, preventing accidental navigation to sensitive internal systems.
- Action limits: set maximum actions and timeouts to prevent runaway sessions.
- Audit logging: all actions are logged in Dataverse; review logs regularly for unexpected behaviour.
- Human-in-the-loop: for sensitive operations, require human approval before the agent executes (covered in Module 6).
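URL allowlisting is the safeguard most easily expressed in code. A minimal sketch, assuming an exact-hostname allowlist and HTTPS-only navigation (the real enforcement lives inside the platform, not in code you write):

```python
from urllib.parse import urlparse

# Assumed allowlist: only hosts explicitly named here are reachable.
ALLOWED_HOSTS = {"internal.example.local"}

def is_navigation_allowed(url: str) -> bool:
    """Permit navigation only to allowlisted hosts over HTTPS."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
```

Exact hostname matching (rather than substring matching) matters: a check like `"internal.example.local" in url` would wrongly pass `https://evil.com/?x=internal.example.local`.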
Scenario: Dev automates the legacy order management system
Dev's logistics company has a 15-year-old order management system called "ShipTrack Classic." It has a web interface but absolutely no API; the vendor stopped development years ago. The company processes 200 order status updates daily, each requiring a human operator to: log in, search for the order, click through three screens, update the status field, add a note, and save.
Dev configures computer use for the customer service agent:
Target URL: https://internal.shiptrack-classic.logistics.local/orders
Credentials: A service account with order-update permissions only (no admin access).
Task description: "Search for the order by number in the search bar. Click the matching result. Navigate to the Status tab. Update the Status dropdown to the new value. Enter the provided note in the Notes field. Click Save. Confirm the success message appears."
Guardrails: Maximum 15 actions per session, 3-minute timeout, fail if the success message does not appear.
Dev tests in the sandbox: the agent takes a screenshot of the login page, enters credentials, navigates to the search bar, types the order number, clicks the result, updates the status, adds the note, and saves. The activity map shows the exact navigation path. The Dataverse log records every action for compliance.
200 manual updates per day, now handled by the agent. Dev's team just got 3 hours back, and the legacy system finally has "automation" without ever getting an API.
Dev's logistics company has two systems: a modern shipping API (REST, documented) and a legacy order system (web UI only, no API). How should Dev integrate each?
A computer use task is taking too long and the agent appears stuck on a page. What configuration should have prevented this?
Which monitoring capability helps a developer debug a computer use session where the agent clicked the wrong button?
🎬 Video coming soon