AI-901 Study Guide

Domain 1: AI Concepts and Capabilities

  • What is AI? Your First 10 Minutes Free
  • Responsible AI: The Six Principles Free
  • How Generative AI Actually Works Free
  • Choosing the Right AI Model Free
  • Deploying AI Models: Options & Settings
  • AI Workloads at a Glance
  • Text Analysis: Keywords, Entities & Sentiment
  • Speech: Recognition & Synthesis
  • Computer Vision: Seeing the World
  • Image Generation: Creating with AI
  • Information Extraction: From Chaos to Structure

Domain 2: Implement AI Solutions Using Foundry

  • Prompting Fundamentals: System & User Prompts
  • Microsoft Foundry: Your AI Command Center Free
  • Building a Chat App with the Foundry SDK
  • Agents in Foundry: Create & Test
  • Building an Agent Client App
  • Building a Text Analysis App
  • Multimodal: Responding to Speech
  • Azure Speech in Foundry Tools
  • Visual Prompts: Images as Input
  • Generating Images with AI
  • Building a Vision App
  • Content Understanding: Documents & Forms
  • Multimodal Extraction: Images, Audio & Video
  • Building an Extraction App
  • Exam Prep: Putting It All Together

Domain 2: Implement AI Solutions Using Foundry

Azure Speech in Foundry Tools

Build a lightweight speech app using Azure AI Speech β€” the dedicated service for speech recognition, synthesis, and translation within Foundry Tools.

Building with Azure AI Speech

β˜• Simple explanation

Azure AI Speech is like giving your app ears and a voice.

In the last module, you used GPT-4o to process audio directly. This module uses Azure AI Speech β€” a dedicated service that’s optimised specifically for speech tasks. It’s faster for pure transcription, supports 100+ languages, and gives you fine-grained control over voice output.

Think of it as the specialist versus the generalist: GPT-4o can handle many modalities, but for pure speech tasks Azure Speech is typically faster, cheaper, and more controllable.

Azure AI Speech (part of Foundry Tools) provides dedicated APIs for speech-to-text, text-to-speech, and speech translation. It’s optimised for production speech workloads with features like custom speech models, neural voices, real-time streaming, and pronunciation assessment.

Building a speech-to-text app

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="your-speech-key",
    region="your-region"
)
speech_config.speech_recognition_language = "en-NZ"

# Recognise from microphone
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Speak now...")
result = recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(f"You said: {result.text}")
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("Speech not recognised")
elif result.reason == speechsdk.ResultReason.Canceled:
    # Cancellation usually means a bad key, wrong region, or a network error
    details = result.cancellation_details
    print(f"Recognition cancelled: {details.reason}")

Building a text-to-speech app

speech_config = speechsdk.SpeechConfig(
    subscription="your-speech-key",
    region="your-region"
)

# Choose a neural voice
speech_config.speech_synthesis_voice_name = "en-NZ-MollyNeural"

synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# speak_text synthesises the text and plays it through the default speaker,
# blocking until playback finishes
result = synthesizer.speak_text("Welcome to MediSpark. Your appointment is confirmed for Tuesday at 2 PM.")

Using SSML for fine-grained control

SSML (Speech Synthesis Markup Language) lets you control how the AI speaks:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-NZ">
  <voice name="en-NZ-MollyNeural">
    <prosody rate="slow" pitch="+5%">
      Welcome to MediSpark.
    </prosody>
    <break time="500ms"/>
    Your appointment is confirmed for
    <emphasis level="strong">Tuesday at 2 PM</emphasis>.
  </voice>
</speak>
SSML Element    What It Controls
prosody         Speed (rate), pitch, and volume
break           Pauses between phrases
emphasis        Stress on specific words
voice           Which neural voice to use
say-as          How to pronounce dates, numbers, addresses
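To send SSML instead of plain text, the Python SDK provides speak_ssml, which takes the full markup string. The helper below is a minimal illustrative sketch for assembling the markup; the actual synthesis call is shown commented because it requires a configured key and region:

```python
def build_confirmation_ssml(message: str, voice: str = "en-NZ-MollyNeural") -> str:
    """Wrap a message in a minimal SSML document with slow, slightly raised delivery.

    Illustrative helper, not part of the SDK.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-NZ">'
        f'<voice name="{voice}">'
        '<prosody rate="slow" pitch="+5%">'
        f'{message}'
        '</prosody>'
        '</voice>'
        '</speak>'
    )

ssml = build_confirmation_ssml("Welcome to MediSpark.")
print(ssml)

# With a configured SpeechSynthesizer (as in the text-to-speech example above),
# pass the markup via speak_ssml rather than speak_text:
#   result = synthesizer.speak_ssml(ssml)
```

Building the SSML in code keeps the voice and message parameterised, which is handy when the same confirmation template is spoken with different dates or patient names.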

Combining speech with AI

The most powerful pattern combines Azure Speech with an LLM:

User speaks β†’ Azure Speech (STT) β†’ GPT-4o (reasoning) β†’ Azure Speech (TTS) β†’ AI responds aloud

MediSpark scenario: MediSpark builds a voice-enabled patient assistant:

  1. Patient speaks: β€œWhen is my next appointment?”
  2. Azure Speech transcribes the question
  3. GPT-4o queries the appointment system and generates a response
  4. Azure Speech reads the response aloud in a warm, empathetic neural voice
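The round trip above can be sketched as three composable stages. The function names and stub implementations below are illustrative placeholders; in a real app each stage would call the Azure Speech SDK or the chat model:

```python
from typing import Callable

def voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # Azure Speech STT in production
    reason: Callable[[str], str],         # GPT-4o / chat model in production
    speak: Callable[[str], bytes],        # Azure Speech TTS in production
) -> bytes:
    """One conversational turn: audio in, spoken reply out."""
    text = transcribe(audio)   # 1. speech-to-text
    reply = reason(text)       # 2. LLM generates the answer
    return speak(reply)        # 3. text-to-speech

# Stub stages so the flow can be exercised without any cloud calls
reply_audio = voice_turn(
    b"<patient audio>",
    transcribe=lambda a: "When is my next appointment?",
    reason=lambda q: "Your next appointment is Tuesday at 2 PM.",
    speak=lambda t: t.encode(),  # placeholder: real TTS returns audio bytes
)
print(reply_audio.decode())
```

Keeping the stages as injected callables also makes the pipeline easy to unit-test: each cloud dependency can be swapped for a stub exactly as shown.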
ℹ️ Continuous recognition for long conversations

recognize_once() listens for a single phrase. For ongoing conversations, use continuous recognition:

import time

recognizer.recognized.connect(lambda evt: print(evt.result.text))
recognizer.start_continuous_recognition()
time.sleep(30)  # recognizer fires events as speech is detected
recognizer.stop_continuous_recognition()

This is essential for meeting transcription, live captioning, and voice-controlled applications where the user speaks continuously.

🎬 Video walkthrough

🎬 Video coming soon

Azure Speech in Foundry β€” AI-901 Module 19


Flashcards

Question

What Python package provides the Azure AI Speech SDK?


Answer

azure-cognitiveservices-speech β€” provides SpeechConfig, SpeechRecognizer (for STT), and SpeechSynthesizer (for TTS).


Question

What is SSML and when do you use it?


Answer

Speech Synthesis Markup Language β€” XML-based control for text-to-speech output. Use it to adjust speaking rate, pitch, add pauses, emphasise words, and control pronunciation. Essential when you need fine-grained voice control.


Question

What is continuous recognition?


Answer

A speech recognition mode that listens continuously and fires events as speech is detected β€” unlike recognize_once() which listens for a single phrase. Used for meeting transcription and live captioning.


Knowledge Check


MediSpark wants their patient assistant to speak appointment confirmations in a calm, slow pace with emphasis on the date and time. Which Azure Speech feature enables this level of control?


DataFlow Corp needs to transcribe a 2-hour recorded meeting, identifying who said what. Which Azure Speech features do they need?


Next up: Visual Prompts β€” sending images to AI and getting intelligent responses.



© 2026 Sutheesh. All rights reserved.

Guided is an independent study resource and is not affiliated with, endorsed by, or officially connected to Microsoft. Microsoft, Azure, and related trademarks are property of Microsoft Corporation. Always verify information against Microsoft Learn.