Text Analysis with Language Models
Extract entities, detect sentiment, summarise documents, translate text, and customise language models for domain-specific tasks — all using generative prompting and Foundry Tools.
Making sense of text
Text analysis is like having a speed reader who can instantly tell you what a document is about (topics), who is mentioned (entities), and what the mood is (sentiment), then hand you a one-paragraph summary — for any document, in any language.
In AI-103, you use two approaches: (1) prompt a language model to extract information (“Read this contract and extract all party names as JSON”), or (2) use Foundry Tools like Azure Translator for specialised tasks.
Text analysis capabilities
| Capability | Approach | Output |
|---|---|---|
| Entity extraction | Prompt LLM: “Extract all person names, organisations, and dates” | Structured JSON with entities and types |
| Topic extraction | Prompt LLM: “What are the main topics discussed?” | List of topics with relevance scores |
| Summarisation | Prompt LLM: “Summarise this document in 3 sentences” | Concise summary |
| Structured JSON output | Prompt LLM with schema: “Extract fields matching this schema” | JSON matching specified schema |
| Sentiment detection | Prompt LLM: “Classify the sentiment as positive, negative, or neutral” | Positive/negative/neutral + confidence |
| Tone detection | Prompt LLM: “What is the tone of this message?” | Formal/informal/urgent/frustrated/etc. |
| Safety detection | Content Safety API | Flags for hate, violence, self-harm, sexual content |
| Sensitive content | Prompt LLM + custom rules | PII detection, confidential information flags |
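Most of the prompt-based capabilities above follow the same pattern: a system prompt that states the extraction task and output schema, the document as the user message, and a JSON parse of the reply. Here is a minimal sketch of that pattern; the prompt wording, schema, and the `parse_entities` helper are illustrative assumptions, not a fixed API.

```python
import json

def build_extraction_messages(document: str) -> list[dict]:
    """Build a chat-style message list asking a model to extract
    entities as structured JSON (prompt and schema are illustrative)."""
    system = (
        "You are an information-extraction assistant. "
        "Extract all person names, organisations, and dates from the user's text. "
        'Respond ONLY with JSON: {"entities": [{"text": "...", "type": "..."}]}'
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": document},
    ]

def parse_entities(model_reply: str) -> list[dict]:
    """Parse the model's JSON reply, tolerating a fenced ```json wrapper."""
    cleaned = model_reply.strip().removeprefix("```json").removesuffix("```").strip()
    return json.loads(cleaned)["entities"]

# What a well-behaved model reply might look like:
reply = '{"entities": [{"text": "Contoso Ltd", "type": "organisation"}]}'
print(parse_entities(reply))  # [{'text': 'Contoso Ltd', 'type': 'organisation'}]
```

The same skeleton handles topic extraction, summarisation, and sentiment by swapping the system prompt and schema.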
Translation approaches
| Feature | Azure Translator (Foundry Tool) | LLM-Powered Translation |
|---|---|---|
| How it works | Dedicated translation engine | Prompt an LLM to translate |
| Best for | Large-volume document translation, 100+ languages | Nuanced translation with context awareness |
| Cost | Lower per character | Higher (LLM tokens) |
| Quality | Excellent for standard text | Better for idioms, context, tone preservation |
| Speed | Very fast | Slower (model inference) |
| Custom terminology | Custom glossaries and dictionaries | Few-shot examples in the prompt |
Exam tip: When to use Translator vs LLM
Decision rule for the exam:
- Bulk document translation → Azure Translator (cost-effective, fast)
- Translation needing context and nuance → LLM (better quality for complex text)
- Real-time chat translation → Depends on volume — low volume = LLM, high volume = Translator
If the scenario mentions cost or scale, lean toward Translator. If it mentions nuance or context, lean toward LLM.
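The decision rule above is simple enough to encode directly, which can help it stick. This is a study aid, not production routing logic; the volume threshold is an illustrative assumption.

```python
def pick_translation_approach(doc_count: int, needs_nuance: bool,
                              realtime: bool = False) -> str:
    """Encode the exam decision rule (threshold of 100 is illustrative)."""
    if needs_nuance:
        return "llm"            # context, idiom, and tone preservation win
    if realtime and doc_count < 100:
        return "llm"            # low-volume chat: LLM quality is worth it
    return "azure-translator"   # bulk, scale, or cost: dedicated engine

print(pick_translation_approach(50_000, needs_nuance=False))  # azure-translator
print(pick_translation_approach(10, needs_nuance=True))       # llm
```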
Domain customisation
| Technique | What It Does | Example |
|---|---|---|
| System prompt with domain context | Tell the model about industry terminology | “You are a legal analyst. ‘Material adverse change’ means…” |
| Few-shot examples | Show the model expected input/output pairs | 3 examples of correctly extracted contract clauses |
| Output schema | Define exact JSON structure for extracted data | “Return JSON with fields: clause_type, parties, obligation, deadline” |
| Custom glossary | Map domain terms to standard definitions | ”EBITDA” → “Earnings Before Interest, Taxes, Depreciation, and Amortization” |
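These four techniques compose naturally into one prompt: glossary definitions go in the system prompt, few-shot pairs become alternating user/assistant messages, and the schema is stated alongside the instructions. A minimal sketch, assuming a chat-style message format (the glossary entry and role structure are illustrative):

```python
GLOSSARY = {
    "EBITDA": "Earnings Before Interest, Taxes, Depreciation, and Amortization",
}

def build_domain_messages(text: str, examples: list[tuple[str, str]],
                          glossary: dict = GLOSSARY) -> list[dict]:
    """Compose system prompt (domain context + glossary) with few-shot
    input/output pairs, then the actual document to analyse."""
    gloss_lines = "\n".join(f'"{term}" means {defn}.' for term, defn in glossary.items())
    system = (
        "You are a financial analyst.\n"
        "Glossary:\n" + gloss_lines + "\n"
        "Return JSON with fields: clause_type, parties, obligation, deadline."
    )
    shots = []
    for example_input, example_output in examples:
        shots.append({"role": "user", "content": example_input})
        shots.append({"role": "assistant", "content": example_output})
    return [{"role": "system", "content": system}, *shots,
            {"role": "user", "content": text}]
```

With one few-shot pair, the resulting list is four messages: system, example user, example assistant, then the real document.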
Real-world example: Atlas Financial's compliance summariser
Atlas Financial customises text analysis for compliance:
Entity extraction: Custom prompt extracts regulatory-specific entities:
- Regulation references (Basel III, Dodd-Frank, MiFID II)
- Financial amounts and thresholds
- Compliance deadlines
- Responsible parties
Compliance summarisation: System prompt includes:
- Financial regulatory terminology definitions
- Output format: risk level, key obligations, deadlines, affected departments
- Few-shot examples of correctly summarised regulations
Sensitive content detection: Custom rules flag:
- Client SSNs and account numbers (PII)
- Non-public financial data
- Insider information indicators
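Rule-based flags like Atlas's SSN and account-number checks are often a first regex pass before (or alongside) a model call. The patterns below are illustrative only; real PII detection should use a dedicated service such as Azure AI Language's PII detection rather than regex alone.

```python
import re

# Illustrative patterns — not exhaustive, and not a substitute for a PII service.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # US SSN in dashed form
ACCOUNT_RE = re.compile(r"\b\d{10,12}\b")        # hypothetical account-number length

def flag_sensitive(text: str) -> dict[str, bool]:
    """Return simple boolean flags for the custom sensitive-content rules."""
    return {
        "ssn": bool(SSN_RE.search(text)),
        "account_number": bool(ACCOUNT_RE.search(text)),
    }

print(flag_sensitive("Client SSN 123-45-6789 on account 00123456789"))
# {'ssn': True, 'account_number': True}
```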
Key terms
Knowledge check
Kai needs to extract shipment details (tracking number, origin, destination, weight, delivery date) from 50,000 shipping confirmation emails and store them in a database. Which approach is most appropriate?
MediaForge needs to translate their client's 200-page product catalogue from English into 15 languages. Budget is tight. Which approach minimises cost?
🎬 Video coming soon