MCP, Computer Use & Agent Behaviours
Design agent extensibility with Model Context Protocol, automate tasks in apps and websites with Computer Use, and configure agent behaviours including reasoning mode and voice mode in Copilot Studio.
New frontiers: MCP, Computer Use, and smart behaviours
Model Context Protocol (MCP) is like a universal power adapter for AI agents. Just as a travel adapter lets your laptop plug into any country's power outlet, MCP lets your agent plug into any tool or data source that speaks the MCP protocol, without building a custom connector for each one.
Computer Use is like giving your agent hands and eyes. It can see what's on a screen and interact with it: clicking buttons, typing text, navigating menus. Think of it as a very patient colleague who follows your instructions exactly, even in apps that have no API.
Deep reasoning is like asking your colleague to take extra time on a tough problem, carefully working through the steps before giving you an answer. Voice mode is like calling that colleague on the phone instead of texting them.
Model Context Protocol (MCP) in Copilot Studio
MCP is an open protocol that standardises how AI agents connect to external tools and data. Instead of building a custom connector for every integration, you connect to an MCP server that exposes tools, resources, and prompts in a standard format.
How MCP works in Copilot Studio:
- MCP server - an external service that exposes tools (functions the agent can call), resources (data the agent can read), and prompts (pre-built prompt templates)
- Connection - Copilot Studio connects to the MCP server endpoint
- Discovery - the agent discovers available tools and resources from the MCP server
- Invocation - during a conversation, the agent calls MCP tools when relevant to the user's request
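The discovery and invocation steps ride on JSON-RPC 2.0 messages; `tools/list` and `tools/call` are the method names defined by the MCP specification. The sketch below simulates a tiny server in plain Python to show the message shapes; the `check_stock` tool and its stubbed response are invented for illustration, and a real server would sit behind an actual transport rather than a local function call.

```python
import json

# Illustrative only: MCP uses JSON-RPC 2.0 with "tools/list" for discovery
# and "tools/call" for invocation. The inventory tool here is a made-up stub.
TOOLS = [
    {
        "name": "check_stock",
        "description": "Return the current inventory level for a product",
        "inputSchema": {
            "type": "object",
            "properties": {"product_id": {"type": "string"}},
            "required": ["product_id"],
        },
    }
]

def handle_request(raw: str) -> str:
    """Minimal server loop: discovery first, then invocation."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {"tools": TOOLS}
    elif req["method"] == "tools/call":
        args = req["params"]["arguments"]
        # A real server would query the inventory system here.
        result = {"content": [{"type": "text",
                               "text": f"Stock for {args['product_id']}: 42"}]}
    else:
        result = {"error": "unknown method"}
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

# Discovery: the agent asks what tools exist.
listing = json.loads(handle_request(json.dumps(
    {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})))
# Invocation: the orchestrator calls a discovered tool by name.
call = json.loads(handle_request(json.dumps(
    {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
     "params": {"name": "check_stock", "arguments": {"product_id": "X"}}})))
```

Because the agent works from the discovery response rather than a hard-coded schema, new tools added to the server show up automatically on the next listing.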
| Feature | Architecture | Best For | Effort |
|---|---|---|---|
| MCP connections | Open protocol - connect to any MCP-compatible server. The agent discovers tools dynamically | When MCP servers already exist for the tools you need, when connecting to open-source tool ecosystems, or when you want a single integration pattern across many tools | Low if an MCP server exists; medium if you need to build one |
| Custom connectors | Point-to-point REST API integration - you define the schema, authentication, and endpoints | When the external system has a REST API but no MCP server, or when you need fine-grained control over the API interaction | Medium - define the connector schema, configure auth, map data |
| Copilot connectors | Prebuilt Microsoft connectors in the connector gallery - ready to use with authentication configured | When a prebuilt connector already exists for the target system (Salesforce, ServiceNow, SAP, etc.) | Low - select from the gallery, configure credentials |
Scenario: Natalie implements MCP for a client's inventory system
Natalie's client has a custom inventory management system built in-house. Ravi (senior developer) has already built an MCP server that exposes three tools:
- `check_stock(product_id)` - returns current inventory levels
- `find_alternatives(product_id)` - finds substitute products when stock is low
- `create_reorder(product_id, quantity)` - creates a purchase order
Copilot Studio integration:
- Natalie adds the MCP connection in Copilot Studio, pointing to the MCP server endpoint
- The agent automatically discovers the three tools
- When a sales rep asks "Do we have 500 units of part X in stock?", the agent calls `check_stock` and responds with current levels
- If stock is low, the agent proactively calls `find_alternatives` and offers substitutes
- The rep can say "Order 200 more of part X" and the agent calls `create_reorder`
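Under the hood, that conversational flow reduces to a few conditional tool calls. A minimal sketch with stubbed tools - the stock numbers, the alternatives list, and the low-stock threshold are all invented here, and the real tools live on Ravi's MCP server rather than in local functions:

```python
# Hypothetical local stubs for the three MCP tools; in Copilot Studio these
# calls go through the MCP server. All data below is invented for illustration.
INVENTORY = {"part-X": 120}
ALTERNATIVES = {"part-X": ["part-X2", "part-X3"]}
LOW_STOCK_THRESHOLD = 500

def check_stock(product_id):
    return INVENTORY.get(product_id, 0)

def find_alternatives(product_id):
    return ALTERNATIVES.get(product_id, [])

def create_reorder(product_id, quantity):
    return {"product_id": product_id, "quantity": quantity, "status": "created"}

def answer_stock_question(product_id, requested_units):
    """Mirror the flow above: check stock, offer substitutes when it runs low."""
    level = check_stock(product_id)
    reply = {"stock": level, "sufficient": level >= requested_units}
    if level < LOW_STOCK_THRESHOLD:
        reply["alternatives"] = find_alternatives(product_id)
    return reply

print(answer_stock_question("part-X", 500))
```

The generative orchestrator makes this branching decision itself from the tool descriptions; nobody writes the `if` statement by hand.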
Why MCP over a custom connector? The MCP server already exists, the tool discovery is automatic, and if Ravi adds new tools later (like `track_shipment`), the agent discovers them without reconfiguration.
Important: MCP requires generative orchestration
MCP connections in Copilot Studio require generative orchestration to be enabled on the agent. Classic topic-based agents cannot call MCP servers directly; MCP tools are invoked through the generative orchestrator, which decides when and how to use them based on the conversation context.
When to prefer MCP: Choose MCP when you want centrally managed, reusable tools that multiple agents can share. An MCP server built once can serve many agents, unlike custom connectors that are configured per agent.
Computer Use in Copilot Studio (preview)
Computer Use gives agents the ability to interact with applications and websites by seeing the screen and performing actions: clicking buttons, typing into fields, navigating menus, and reading displayed content.
Preview notice: Computer Use is currently in preview and available in the US region only. It requires generative orchestration to be enabled. There are two hosting options: hosted machines (Microsoft-managed VMs: simpler setup, Microsoft handles infrastructure) and bring-your-own machines (self-hosted VMs: more control, you manage the infrastructure). Consider latency implications for production use, as each screen observation and action cycle adds delay.
Use cases:
| Scenario | Why Computer Use | Alternative |
|---|---|---|
| Automating data entry in a legacy ERP with no API | No programmatic access: the only interface is the UI | Build an API layer (expensive, time-consuming) |
| Testing web applications | Navigate pages, fill forms, verify content | Dedicated testing tools (Playwright, Selenium) |
| Extracting data from desktop applications | Information is on-screen but not accessible via APIs | Manual extraction or custom screen scraping |
| Training and onboarding demonstrations | Show step-by-step how to use an application | Recorded videos or documentation |
Design considerations for Computer Use:
- Latency: Screen observation and action take time; expect 2-5 seconds per step. Not suitable for real-time interactions
- Accuracy: Complex UIs with dynamic elements can confuse the agent. Simple, consistent layouts work best
- Security: The agent sees everything on screen, so ensure sensitive data is masked or the session is scoped appropriately
- Reliability: UI changes (button moves, new pop-up) can break automation. Build error handling and retry logic
- Scope: Define exactly which applications and actions the agent is allowed to interact with
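Taken together, these considerations amount to a defensive control loop around every UI step. The sketch below is a hypothetical illustration, not a Copilot Studio API: the `observe` and `act` callables, the allowed-action set, and the retry count are all invented to show how scoping, retries, and audit logging fit together.

```python
# Hypothetical observe/act loop illustrating the scope, reliability, and
# audit considerations above. None of this is a real Copilot Studio API.
ALLOWED_ACTIONS = {"click", "type"}
MAX_RETRIES = 3

class UnexpectedScreenError(Exception):
    """Raised when the expected UI element never appears."""

def run_step(observe, act, step, log):
    """Execute one UI step with scope checks, retries, and audit logging."""
    if step["action"] not in ALLOWED_ACTIONS:
        # Scope guardrail: refuse anything outside the allowed action set.
        raise PermissionError(f"Action {step['action']!r} is out of scope")
    for attempt in range(1, MAX_RETRIES + 1):
        screen = observe()                 # each observation adds latency
        if step["expect"] in screen:       # is the expected element visible?
            act(step)
            log.append({"step": step["name"], "attempt": attempt, "ok": True})
            return True
    # After repeated failures, stop and escalate rather than guess.
    log.append({"step": step["name"], "ok": False})
    raise UnexpectedScreenError(f"Could not find {step['expect']!r}")
```

The key design choice is the final branch: when the screen does not match expectations, the loop stops and surfaces an error instead of clicking blindly, which is exactly the failure mode a UI change would otherwise cause.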
Scenario: Ravi builds a Computer Use agent for legacy ERP data entry
Ravi Krishnan (Cloudbridge Partners) automates data entry into a client's legacy ERP system that has no API. The client's warehouse team currently copies shipping data from emails into the ERP manually: 80 entries per day.
Computer Use agent design:
- Agent receives shipping notification data (structured from an email processing step)
- Opens the legacy ERP application
- Navigates to the "New Shipment" form
- Fills in fields: tracking number, carrier, ship date, destination, weight
- Clicks "Save"
- Verifies the confirmation message appears
- Logs the result for audit
Guardrails:
- Agent can only interact with the ERP's shipment entry form; no access to other modules
- Every entry is logged with before/after screenshots for audit
- If the agent encounters an unexpected screen (error dialog, login prompt), it stops and alerts the warehouse manager
- A human reviews a random 10% sample daily
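These guardrails can be expressed as a thin wrapper around each entry. This is a hypothetical sketch: the helper names, the shape of the audit record, and the `fill_form` callable are invented, while the stop-and-alert behaviour and the 10% review sample come straight from the design above.

```python
import random

# Hypothetical guardrail helpers for the ERP scenario. The stop-and-alert
# rule and 10% sample rate are from the design; everything else is invented.
SAMPLE_RATE = 0.10

def process_shipment(entry, fill_form, audit_log, alert):
    """Enter one shipment with audit screenshots and stop-on-unexpected."""
    try:
        before, after = fill_form(entry)   # returns screenshots for the audit trail
        audit_log.append({"entry": entry["tracking"], "before": before,
                          "after": after, "status": "saved"})
        return True
    except Exception as exc:               # error dialog, login prompt, etc.
        alert(f"Stopped on {entry['tracking']}: {exc}")
        audit_log.append({"entry": entry["tracking"], "status": "failed"})
        return False

def pick_review_sample(audit_log, rng=random.Random(0)):
    """Select roughly 10% of the day's entries for human review."""
    return [rec for rec in audit_log if rng.random() < SAMPLE_RATE]
```

Logging both the success and failure paths matters: the daily 10% human review only works if every entry, including the aborted ones, lands in the audit log.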
Natalie presents the ROI: 80 entries at 3 minutes each = 4 hours of manual work per day. The agent handles 90% automatically, saving 3.6 hours daily.
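Natalie's figures check out arithmetically:

```python
# ROI arithmetic from the scenario: 80 entries/day at 3 minutes each,
# with the agent handling 90% of them automatically.
entries_per_day = 80
minutes_per_entry = 3
automation_rate = 0.90

manual_hours = entries_per_day * minutes_per_entry / 60   # 4.0 hours/day
saved_hours = manual_hours * automation_rate              # 3.6 hours/day
```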
Deep reasoning (preview)
Deep reasoning enables enhanced multi-step analysis for complex tasks. When enabled, the agent methodically breaks problems into steps and plans its approach before responding. This produces higher-quality answers for complex queries but is slower and costlier than standard orchestration.
When to enable deep reasoning:
- Multi-step calculations or analyses
- Complex decision-making with multiple variables
- Tasks that require comparing options and making trade-offs
- Situations requiring careful, methodical problem decomposition
When NOT to enable deep reasoning:
- Simple Q&A from a knowledge base
- Straightforward topic routing
- Tasks where speed matters more than depth
Note: Deep reasoning improves the agent's internal analysis quality. It does not expose its reasoning chain to the end user; the user receives a final answer, not the intermediate steps.
Voice mode
Voice mode enables spoken interaction: the agent listens to speech, processes it, and responds with synthesised voice. It is designed for phone and IVR (Interactive Voice Response) scenarios.
Design considerations for voice agents:
- Latency is critical - voice users expect fast responses. Keep prompts concise and responses under 30 seconds
- No visual fallback - you can't show tables, links, or images. Design responses for audio only
- Confirmation patterns - "I heard you say you want to cancel order 1234. Is that correct?"
- Interruption handling - users may interrupt mid-response. Design for barge-in support
- Escalation - always provide a "speak to a human" option
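The confirmation and escalation patterns are simple enough to script. A hypothetical sketch - the intent structure, accepted phrases, and return values are all invented for illustration, and a real voice agent would get the caller's intent from speech recognition:

```python
# Hypothetical confirmation-pattern helpers for a voice agent.
# Intent shape, phrase lists, and outcomes are invented for illustration.
def confirmation_prompt(intent):
    """Read the understood request back to the caller before acting."""
    return (f"I heard you say you want to {intent['action']} "
            f"order {intent['order_id']}. Is that correct?")

def handle_reply(reply):
    """Map a spoken yes/no (or an escalation request) to the next step."""
    text = reply.strip().lower()
    if "human" in text or "agent" in text:
        return "escalate"       # always honour "speak to a human"
    if text in {"yes", "yeah", "correct", "that's right"}:
        return "proceed"
    if text in {"no", "nope", "that's wrong"}:
        return "reprompt"
    return "reprompt"           # unclear answers get a short re-ask
```

Note that the escalation check runs first: a caller asking for a human mid-confirmation should never be forced back through the yes/no loop.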
Flashcards
Knowledge check
A client has a 15-year-old desktop HR application with no API. HR staff spend 2 hours daily copying new hire data from a spreadsheet into the application's forms. Natalie proposes using Computer Use to automate this. What is the MOST critical design element she must address?
Ravi has built an MCP server that exposes 5 tools for a client's custom CRM. He adds the MCP connection in Copilot Studio. Two weeks later, he adds 2 new tools to the MCP server. What does the Copilot Studio agent need?
Jordan is designing a patient appointment scheduling agent for CareFirst Health. Patients will call a phone number, speak their scheduling request, and the agent should book, reschedule, or cancel appointments. Which combination of agent behaviours should Jordan enable?
🎬 Video coming soon
Next up: M365 Agents: Teams, SharePoint & Sales/Service in M365 Copilot - optimising agent design across Microsoft 365, configuring Sales and Service in M365 Copilot, and leveraging the AI hub in Power Platform.