Text Analysis with Language Models
Extract entities, detect sentiment, summarise documents, translate text, and customise language models for domain-specific tasks — all using generative prompting and Foundry Tools.
Making sense of text
Text analysis is like having a speed reader who can instantly tell you what a document is about (topics), who is mentioned (entities), and what the mood is (sentiment), then hand you a one-paragraph summary — for any document, in any language.
In AI-103, you use two approaches: (1) prompt a language model to extract information (“Read this contract and extract all party names as JSON”), or (2) use Foundry Tools like Azure Translator for specialised tasks.
Text analysis capabilities
| Capability | Approach | Output |
|---|---|---|
| Entity extraction | Prompt LLM: “Extract all person names, organisations, and dates” | Structured JSON with entities and types |
| Topic extraction | Prompt LLM: “What are the main topics discussed?” | List of topics with relevance scores |
| Summarisation | Prompt LLM: “Summarise this document in 3 sentences” | Concise summary |
| Structured JSON output | Prompt LLM with schema: “Extract fields matching this schema” | JSON matching specified schema |
| Sentiment detection | Prompt LLM: “Classify the sentiment as positive, negative, or neutral” | Positive/negative/neutral + confidence |
| Tone detection | Prompt LLM: “What is the tone of this message?” | Formal/informal/urgent/frustrated/etc. |
| Safety detection | Content Safety API | Flags for hate, violence, self-harm, sexual content |
| Sensitive content | Prompt LLM + custom rules | PII detection, confidential information flags |
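Most of the prompt-based capabilities above follow the same pattern: a system prompt that states the extraction task and output schema, the document as the user message, and a JSON parse of the reply. Here is a minimal sketch of that pattern; the prompt wording, schema, and the `parse_entities` helper are illustrative assumptions, not a fixed API.

```python
import json

def build_extraction_messages(document: str) -> list[dict]:
    """Build a chat-style message list asking a model to extract
    entities as structured JSON (prompt and schema are illustrative)."""
    system = (
        "You are an information-extraction assistant. "
        "Extract all person names, organisations, and dates from the user's text. "
        'Respond ONLY with JSON: {"entities": [{"text": "...", "type": "..."}]}'
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": document},
    ]

def parse_entities(model_reply: str) -> list[dict]:
    """Parse the model's JSON reply, tolerating a fenced ```json wrapper."""
    cleaned = model_reply.strip().removeprefix("```json").removesuffix("```").strip()
    return json.loads(cleaned)["entities"]

# What a well-behaved model reply might look like:
reply = '{"entities": [{"text": "Contoso Ltd", "type": "organisation"}]}'
print(parse_entities(reply))  # [{'text': 'Contoso Ltd', 'type': 'organisation'}]
```

The same skeleton handles topic extraction, summarisation, and sentiment by swapping the system prompt and schema.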
Translation approaches
| Feature | Azure Translator (Foundry Tool) | LLM-Powered Translation |
|---|---|---|
| How it works | Dedicated translation engine | Prompt an LLM to translate |
| Best for | Large-volume document translation, 100+ languages | Nuanced translation with context awareness |
| Cost | Lower per character | Higher (LLM tokens) |
| Quality | Excellent for standard text | Better for idioms, context, tone preservation |
| Speed | Very fast | Slower (model inference) |
| Custom terminology | Custom glossaries and dictionaries | Few-shot examples in the prompt |
Exam tip: When to use Translator vs LLM
Decision rule for the exam:
- Bulk document translation → Azure Translator (cost-effective, fast)
- Translation needing context and nuance → LLM (better quality for complex text)
- Real-time chat translation → Depends on volume — low volume = LLM, high volume = Translator
If the scenario mentions cost or scale, lean toward Translator. If it mentions nuance or context, lean toward LLM.
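The decision rule above is simple enough to encode directly, which can help it stick. This is a study aid, not production routing logic; the volume threshold is an illustrative assumption.

```python
def pick_translation_approach(doc_count: int, needs_nuance: bool,
                              realtime: bool = False) -> str:
    """Encode the exam decision rule (threshold of 100 is illustrative)."""
    if needs_nuance:
        return "llm"            # context, idiom, and tone preservation win
    if realtime and doc_count < 100:
        return "llm"            # low-volume chat: LLM quality is worth it
    return "azure-translator"   # bulk, scale, or cost: dedicated engine

print(pick_translation_approach(50_000, needs_nuance=False))  # azure-translator
print(pick_translation_approach(10, needs_nuance=True))       # llm
```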
Domain customisation
| Technique | What It Does | Example |
|---|---|---|
| System prompt with domain context | Tell the model about industry terminology | “You are a legal analyst. ‘Material adverse change’ means…” |
| Few-shot examples | Show the model expected input/output pairs | 3 examples of correctly extracted contract clauses |
| Output schema | Define exact JSON structure for extracted data | “Return JSON with fields: clause_type, parties, obligation, deadline” |
| Custom glossary | Map domain terms to standard definitions | ”EBITDA” → “Earnings Before Interest, Taxes, Depreciation, and Amortization” |
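These four techniques compose naturally into one prompt: glossary definitions go in the system prompt, few-shot pairs become alternating user/assistant messages, and the schema is stated alongside the instructions. A minimal sketch, assuming a chat-style message format (the glossary entry and role structure are illustrative):

```python
GLOSSARY = {
    "EBITDA": "Earnings Before Interest, Taxes, Depreciation, and Amortization",
}

def build_domain_messages(text: str, examples: list[tuple[str, str]],
                          glossary: dict = GLOSSARY) -> list[dict]:
    """Compose system prompt (domain context + glossary) with few-shot
    input/output pairs, then the actual document to analyse."""
    gloss_lines = "\n".join(f'"{term}" means {defn}.' for term, defn in glossary.items())
    system = (
        "You are a financial analyst.\n"
        "Glossary:\n" + gloss_lines + "\n"
        "Return JSON with fields: clause_type, parties, obligation, deadline."
    )
    shots = []
    for example_input, example_output in examples:
        shots.append({"role": "user", "content": example_input})
        shots.append({"role": "assistant", "content": example_output})
    return [{"role": "system", "content": system}, *shots,
            {"role": "user", "content": text}]
```

With one few-shot pair, the resulting list is four messages: system, example user, example assistant, then the real document.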
Real-world example: Atlas Financial's compliance summariser
Atlas Financial customises text analysis for compliance:
Entity extraction: Custom prompt extracts regulatory-specific entities:
- Regulation references (Basel III, Dodd-Frank, MiFID II)
- Financial amounts and thresholds
- Compliance deadlines
- Responsible parties
Compliance summarisation: System prompt includes:
- Financial regulatory terminology definitions
- Output format: risk level, key obligations, deadlines, affected departments
- Few-shot examples of correctly summarised regulations
Sensitive content detection: Custom rules flag:
- Client SSNs and account numbers (PII)
- Non-public financial data
- Insider information indicators
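Rule-based flags like Atlas's SSN and account-number checks are often a first regex pass before (or alongside) a model call. The patterns below are illustrative only; real PII detection should use a dedicated service such as Azure AI Language's PII detection rather than regex alone.

```python
import re

# Illustrative patterns — not exhaustive, and not a substitute for a PII service.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # US SSN in dashed form
ACCOUNT_RE = re.compile(r"\b\d{10,12}\b")        # hypothetical account-number length

def flag_sensitive(text: str) -> dict[str, bool]:
    """Return simple boolean flags for the custom sensitive-content rules."""
    return {
        "ssn": bool(SSN_RE.search(text)),
        "account_number": bool(ACCOUNT_RE.search(text)),
    }

print(flag_sensitive("Client SSN 123-45-6789 on account 00123456789"))
# {'ssn': True, 'account_number': True}
```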
Key terms
Knowledge check
Kai needs to extract shipment details (tracking number, origin, destination, weight, delivery date) from 50,000 shipping confirmation emails and store them in a database. Which approach is most appropriate?
MediaForge needs to translate their client's 200-page product catalogue from English into 15 languages. Budget is tight. Which approach minimises cost?
🎬 Video coming soon