MCP, Computer Use & Agent Behaviours
Design agent extensibility with Model Context Protocol, automate tasks in apps and websites with Computer Use, and configure agent behaviours including reasoning mode and voice mode in Copilot Studio.
New frontiers: MCP, Computer Use, and smart behaviours
Model Context Protocol (MCP) is like a universal power adapter for AI agents. Just as a travel adapter lets your laptop plug into any country's power outlet, MCP lets your agent plug into any tool or data source that speaks the MCP protocol, without building a custom connector for each one.
Computer Use is like giving your agent hands and eyes. It can see what's on a screen and interact with it: clicking buttons, typing text, navigating menus. Think of it as a very patient colleague who follows your instructions exactly, even in apps that have no API.
Deep reasoning is like asking your colleague to take extra time on a tough problem, carefully working through the steps before giving you an answer. Voice mode is like calling that colleague on the phone instead of texting them.
Model Context Protocol (MCP) in Copilot Studio
MCP is an open protocol that standardises how AI agents connect to external tools and data. Instead of building a custom connector for every integration, you connect to an MCP server that exposes tools, resources, and prompts in a standard format.
How MCP works in Copilot Studio:
- MCP server - an external service that exposes tools (functions the agent can call), resources (data the agent can read), and prompts (pre-built prompt templates)
- Connection - Copilot Studio connects to the MCP server endpoint
- Discovery - the agent discovers available tools and resources from the MCP server
- Invocation - during a conversation, the agent calls MCP tools when relevant to the user's request
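The discovery and invocation steps ride on JSON-RPC 2.0 messages; `tools/list` and `tools/call` are the method names defined by the MCP specification. The sketch below simulates a tiny server in plain Python to show the message shapes; the `check_stock` tool and its stubbed response are invented for illustration, and a real server would sit behind an actual transport rather than a local function call.

```python
import json

# Illustrative only: MCP uses JSON-RPC 2.0 with "tools/list" for discovery
# and "tools/call" for invocation. The inventory tool here is a made-up stub.
TOOLS = [
    {
        "name": "check_stock",
        "description": "Return the current inventory level for a product",
        "inputSchema": {
            "type": "object",
            "properties": {"product_id": {"type": "string"}},
            "required": ["product_id"],
        },
    }
]

def handle_request(raw: str) -> str:
    """Minimal server loop: discovery first, then invocation."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {"tools": TOOLS}
    elif req["method"] == "tools/call":
        args = req["params"]["arguments"]
        # A real server would query the inventory system here.
        result = {"content": [{"type": "text",
                               "text": f"Stock for {args['product_id']}: 42"}]}
    else:
        result = {"error": "unknown method"}
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

# Discovery: the agent asks what tools exist.
listing = json.loads(handle_request(json.dumps(
    {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})))
# Invocation: the orchestrator calls a discovered tool by name.
call = json.loads(handle_request(json.dumps(
    {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
     "params": {"name": "check_stock", "arguments": {"product_id": "X"}}})))
```

Because the agent works from the discovery response rather than a hard-coded schema, new tools added to the server show up automatically on the next listing.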
| Feature | Architecture | Best For | Effort |
|---|---|---|---|
| MCP connections | Open protocol - connect to any MCP-compatible server. The agent discovers tools dynamically | When MCP servers already exist for the tools you need, when connecting to open-source tool ecosystems, or when you want a single integration pattern across many tools | Low if an MCP server exists; medium if you need to build one |
| Custom connectors | Point-to-point REST API integration - you define the schema, authentication, and endpoints | When the external system has a REST API but no MCP server, or when you need fine-grained control over the API interaction | Medium - define the connector schema, configure auth, map data |
| Copilot connectors | Prebuilt Microsoft connectors in the connector gallery - ready to use with authentication configured | When a prebuilt connector already exists for the target system (Salesforce, ServiceNow, SAP, etc.) | Low - select from the gallery, configure credentials |
Scenario: Natalie implements MCP for a client's inventory system
Natalie's client has a custom inventory management system built in-house. Ravi (senior developer) has already built an MCP server that exposes three tools:
- `check_stock(product_id)` - returns current inventory levels
- `find_alternatives(product_id)` - finds substitute products when stock is low
- `create_reorder(product_id, quantity)` - creates a purchase order
Copilot Studio integration:
- Natalie adds the MCP connection in Copilot Studio, pointing to the MCP server endpoint
- The agent automatically discovers the three tools
- When a sales rep asks "Do we have 500 units of part X in stock?", the agent calls `check_stock` and responds with current levels
- If stock is low, the agent proactively calls `find_alternatives` and offers substitutes
- The rep can say "Order 200 more of part X" and the agent calls `create_reorder`
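Under the hood, that conversational flow reduces to a few conditional tool calls. A minimal sketch with stubbed tools - the stock numbers, the alternatives list, and the low-stock threshold are all invented here, and the real tools live on Ravi's MCP server rather than in local functions:

```python
# Hypothetical local stubs for the three MCP tools; in Copilot Studio these
# calls go through the MCP server. All data below is invented for illustration.
INVENTORY = {"part-X": 120}
ALTERNATIVES = {"part-X": ["part-X2", "part-X3"]}
LOW_STOCK_THRESHOLD = 500

def check_stock(product_id):
    return INVENTORY.get(product_id, 0)

def find_alternatives(product_id):
    return ALTERNATIVES.get(product_id, [])

def create_reorder(product_id, quantity):
    return {"product_id": product_id, "quantity": quantity, "status": "created"}

def answer_stock_question(product_id, requested_units):
    """Mirror the flow above: check stock, offer substitutes when it runs low."""
    level = check_stock(product_id)
    reply = {"stock": level, "sufficient": level >= requested_units}
    if level < LOW_STOCK_THRESHOLD:
        reply["alternatives"] = find_alternatives(product_id)
    return reply

print(answer_stock_question("part-X", 500))
```

The generative orchestrator makes this branching decision itself from the tool descriptions; nobody writes the `if` statement by hand.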
Why MCP over a custom connector? The MCP server already exists, the tool discovery is automatic, and if Ravi adds new tools later (like `track_shipment`), the agent discovers them without reconfiguration.
Important: MCP requires generative orchestration
MCP connections in Copilot Studio require generative orchestration to be enabled on the agent. Classic topic-based agents cannot call MCP servers directly; MCP tools are invoked through the generative orchestrator, which decides when and how to use them based on the conversation context.
When to prefer MCP: Choose MCP when you want centrally managed, reusable tools that multiple agents can share. An MCP server built once can serve many agents, unlike custom connectors that are configured per agent.
Computer Use in Copilot Studio (preview)
Computer Use gives agents the ability to interact with applications and websites by seeing the screen and performing actions: clicking buttons, typing into fields, navigating menus, and reading displayed content.
Preview notice: Computer Use is currently in preview and available in the US region only. It requires generative orchestration to be enabled. There are two hosting options: hosted machines (Microsoft-managed VMs: simpler setup, Microsoft handles infrastructure) and bring-your-own machines (self-hosted VMs: more control, you manage the infrastructure). Consider latency implications for production use, as each screen observation and action cycle adds delay.
Use cases:
| Scenario | Why Computer Use | Alternative |
|---|---|---|
| Automating data entry in a legacy ERP with no API | No programmatic access: the only interface is the UI | Build an API layer (expensive, time-consuming) |
| Testing web applications | Navigate pages, fill forms, verify content | Dedicated testing tools (Playwright, Selenium) |
| Extracting data from desktop applications | Information is on-screen but not accessible via APIs | Manual extraction or custom screen scraping |
| Training and onboarding demonstrations | Show step-by-step how to use an application | Recorded videos or documentation |
Design considerations for Computer Use:
- Latency: Screen observation and action take time; expect 2-5 seconds per step. Not suitable for real-time interactions
- Accuracy: Complex UIs with dynamic elements can confuse the agent. Simple, consistent layouts work best
- Security: The agent sees everything on screen, so ensure sensitive data is masked or the session is scoped appropriately
- Reliability: UI changes (button moves, new pop-up) can break automation. Build error handling and retry logic
- Scope: Define exactly which applications and actions the agent is allowed to interact with
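Taken together, these considerations amount to a defensive control loop around every UI step. The sketch below is a hypothetical illustration, not a Copilot Studio API: the `observe` and `act` callables, the allowed-action set, and the retry count are all invented to show how scoping, retries, and audit logging fit together.

```python
# Hypothetical observe/act loop illustrating the scope, reliability, and
# audit considerations above. None of this is a real Copilot Studio API.
ALLOWED_ACTIONS = {"click", "type"}
MAX_RETRIES = 3

class UnexpectedScreenError(Exception):
    """Raised when the expected UI element never appears."""

def run_step(observe, act, step, log):
    """Execute one UI step with scope checks, retries, and audit logging."""
    if step["action"] not in ALLOWED_ACTIONS:
        # Scope guardrail: refuse anything outside the allowed action set.
        raise PermissionError(f"Action {step['action']!r} is out of scope")
    for attempt in range(1, MAX_RETRIES + 1):
        screen = observe()                 # each observation adds latency
        if step["expect"] in screen:       # is the expected element visible?
            act(step)
            log.append({"step": step["name"], "attempt": attempt, "ok": True})
            return True
    # After repeated failures, stop and escalate rather than guess.
    log.append({"step": step["name"], "ok": False})
    raise UnexpectedScreenError(f"Could not find {step['expect']!r}")
```

The key design choice is the final branch: when the screen does not match expectations, the loop stops and surfaces an error instead of clicking blindly, which is exactly the failure mode a UI change would otherwise cause.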
Scenario: Ravi builds a Computer Use agent for legacy ERP data entry
Ravi Krishnan (Cloudbridge Partners) automates data entry into a client's legacy ERP system that has no API. The client's warehouse team currently copies shipping data from emails into the ERP manually: 80 entries per day.
Computer Use agent design:
- Agent receives shipping notification data (structured from an email processing step)
- Opens the legacy ERP application
- Navigates to the "New Shipment" form
- Fills in fields: tracking number, carrier, ship date, destination, weight
- Clicks "Save"
- Verifies the confirmation message appears
- Logs the result for audit
Guardrails:
- Agent can only interact with the ERP's shipment entry form; no access to other modules
- Every entry is logged with before/after screenshots for audit
- If the agent encounters an unexpected screen (error dialog, login prompt), it stops and alerts the warehouse manager
- A human reviews a random 10% sample daily
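These guardrails can be expressed as a thin wrapper around each entry. This is a hypothetical sketch: the helper names, the shape of the audit record, and the `fill_form` callable are invented, while the stop-and-alert behaviour and the 10% review sample come straight from the design above.

```python
import random

# Hypothetical guardrail helpers for the ERP scenario. The stop-and-alert
# rule and 10% sample rate are from the design; everything else is invented.
SAMPLE_RATE = 0.10

def process_shipment(entry, fill_form, audit_log, alert):
    """Enter one shipment with audit screenshots and stop-on-unexpected."""
    try:
        before, after = fill_form(entry)   # returns screenshots for the audit trail
        audit_log.append({"entry": entry["tracking"], "before": before,
                          "after": after, "status": "saved"})
        return True
    except Exception as exc:               # error dialog, login prompt, etc.
        alert(f"Stopped on {entry['tracking']}: {exc}")
        audit_log.append({"entry": entry["tracking"], "status": "failed"})
        return False

def pick_review_sample(audit_log, rng=random.Random(0)):
    """Select roughly 10% of the day's entries for human review."""
    return [rec for rec in audit_log if rng.random() < SAMPLE_RATE]
```

Logging both the success and failure paths matters: the daily 10% human review only works if every entry, including the aborted ones, lands in the audit log.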
Natalie presents the ROI: 80 entries at 3 minutes each = 4 hours of manual work per day. The agent handles 90% automatically, saving 3.6 hours daily.
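Natalie's figures check out arithmetically:

```python
# ROI arithmetic from the scenario: 80 entries/day at 3 minutes each,
# with the agent handling 90% of them automatically.
entries_per_day = 80
minutes_per_entry = 3
automation_rate = 0.90

manual_hours = entries_per_day * minutes_per_entry / 60   # 4.0 hours/day
saved_hours = manual_hours * automation_rate              # 3.6 hours/day
```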
Deep reasoning (preview)
Deep reasoning enables enhanced multi-step analysis for complex tasks. When enabled, the agent methodically breaks problems into steps and plans its approach before responding. This produces higher-quality answers for complex queries but is slower and costlier than standard orchestration.
When to enable deep reasoning:
- Multi-step calculations or analyses
- Complex decision-making with multiple variables
- Tasks that require comparing options and making trade-offs
- Situations requiring careful, methodical problem decomposition
When NOT to enable deep reasoning:
- Simple Q&A from a knowledge base
- Straightforward topic routing
- Tasks where speed matters more than depth
Note: Deep reasoning improves the agent's internal analysis quality. It does not expose its reasoning chain to the end user; the user receives a final answer, not the intermediate steps.
Voice mode
Voice mode enables spoken interaction: the agent listens to speech, processes it, and responds with synthesised voice. It is designed for phone and IVR (Interactive Voice Response) scenarios.
Design considerations for voice agents:
- Latency is critical - voice users expect fast responses. Keep prompts concise and responses under 30 seconds
- No visual fallback - you can't show tables, links, or images. Design responses for audio only
- Confirmation patterns - "I heard you say you want to cancel order 1234. Is that correct?"
- Interruption handling - users may interrupt mid-response. Design for barge-in support
- Escalation - always provide a "speak to a human" option
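The confirmation and escalation patterns are simple enough to script. A hypothetical sketch - the intent structure, accepted phrases, and return values are all invented for illustration, and a real voice agent would get the caller's intent from speech recognition:

```python
# Hypothetical confirmation-pattern helpers for a voice agent.
# Intent shape, phrase lists, and outcomes are invented for illustration.
def confirmation_prompt(intent):
    """Read the understood request back to the caller before acting."""
    return (f"I heard you say you want to {intent['action']} "
            f"order {intent['order_id']}. Is that correct?")

def handle_reply(reply):
    """Map a spoken yes/no (or an escalation request) to the next step."""
    text = reply.strip().lower()
    if "human" in text or "agent" in text:
        return "escalate"       # always honour "speak to a human"
    if text in {"yes", "yeah", "correct", "that's right"}:
        return "proceed"
    if text in {"no", "nope", "that's wrong"}:
        return "reprompt"
    return "reprompt"           # unclear answers get a short re-ask
```

Note that the escalation check runs first: a caller asking for a human mid-confirmation should never be forced back through the yes/no loop.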
Flashcards
Knowledge check
A client has a 15-year-old desktop HR application with no API. HR staff spend 2 hours daily copying new hire data from a spreadsheet into the application's forms. Natalie proposes using Computer Use to automate this. What is the MOST critical design element she must address?
Ravi has built an MCP server that exposes 5 tools for a client's custom CRM. He adds the MCP connection in Copilot Studio. Two weeks later, he adds 2 new tools to the MCP server. What does the Copilot Studio agent need?
Jordan is designing a patient appointment scheduling agent for CareFirst Health. Patients will call a phone number, speak their scheduling request, and the agent should book, reschedule, or cancel appointments. Which combination of agent behaviours should Jordan enable?
🎬 Video coming soon
Next up: M365 Agents: Teams, SharePoint & Sales/Service in M365 Copilot - optimising agent design across Microsoft 365, configuring Sales and Service in M365 Copilot, and leveraging the AI hub in Power Platform.