TTS Prompt Patterns for Natural Speech
Sentence structure, punctuation, and pacing cues that improve TTS output.
Why TTS Prompting Is Different
Text-to-speech (TTS) systems convert your text into audio literally. A reader of written text can glance back at a confusing passage. A listener hears audio linearly and cannot easily go back to decode an ambiguous sentence.
Writing for TTS means thinking about how words sound. That covers rhythm, sentence length, and pronunciation ambiguity. It also means accepting that visual formatting such as bold text or bullet points does not exist in audio.
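One practical consequence is that Markdown markers need to be removed before text reaches the engine. Depending on the engine, they may be read aloud as symbols or silently dropped. List items also lose the visual breaks that separated them. The sketch below, `strip_markdown_for_tts` (a hypothetical helper, not part of any TTS library), uses regular expressions to remove headings, bullet and numbered-list markers, bold, italics, and inline code. It also ends each former line with a period so the engine pauses between items. It is deliberately minimal. It does not handle links, tables, or nested emphasis, so a production pipeline would more likely use a real Markdown parser.

```python
import re


def strip_markdown_for_tts(text: str) -> str:
    """Turn Markdown-formatted text into plain sentences for a TTS engine."""
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        # Drop heading markers such as "## " at the start of a line.
        line = re.sub(r"^#{1,6}\s+", "", line)
        # Drop bullet ("- ", "* ", "+ ") and numbered-list ("1. ") markers.
        line = re.sub(r"^(?:[-*+]|\d+\.)\s+", "", line)
        # Unwrap bold (**x**, __x__) before italic (*x*) so markers are not split.
        line = re.sub(r"(\*\*|__)(.+?)\1", r"\2", line)
        line = re.sub(r"\*(.+?)\*", r"\1", line)
        # Unwrap inline code spans such as `pip`.
        line = re.sub(r"`([^`]*)`", r"\1", line)
        if not line:
            continue
        # Give every former line a terminal mark so the engine pauses between them.
        if line[-1] not in ".!?":
            line += "."
        sentences.append(line)
    # Join the cleaned lines into continuous prose.
    return " ".join(sentences)


print(strip_markdown_for_tts("## Summary\n- **Fast** setup\n- Uses `pip` to install"))
# Summary. Fast setup. Uses pip to install.
```

Appending a period is a design choice that follows the "end each sentence with a period" rule used later in this lesson. Without it, former list items would run together into one long spoken phrase.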
Sentence Length: Shorter Is Better
Long sentences with several clauses are hard to follow by ear. TTS engines may also insert pauses or shift intonation at points that are grammatically valid but sound unnatural. As a rule of thumb, aim for sentences of roughly 10 to 20 words.
```python
# System prompt asking a language model to write text that a TTS engine will read aloud
TTS_SYSTEM_PROMPT = (
    'You are writing text that will be read aloud by a text-to-speech system.\n\n'
    'Rules:\n'
    '- Use short, clear sentences (10-20 words each).\n'
    '- Avoid complex nested clauses.\n'
    '- End each sentence with a period for clear pause signals.\n'
    '- Avoid parenthetical asides in the middle of sentences.\n'
    '- Use conversational vocabulary: write as you would speak.\n'
    '- Never use bullet points, headers, or markdown formatting.'
)

# Bad: one long sentence with a nested relative clause
BAD = (
    'The transformer architecture, which was introduced in the landmark 2017 paper '
    '"Attention is All You Need" by Vaswani et al. at Google, fundamentally changed '
    'natural language processing by replacing recurrence with self-attention.'
)

# Good: the same core facts split into short, TTS-friendly sentences
GOOD = (
    'The transformer architecture changed natural language processing. '
    'It was introduced in 2017 by researchers at Google. '
    'Their key innovation was replacing recurrence with self-attention.'
)
```
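A model may not follow every rule in `TTS_SYSTEM_PROMPT`, so it can help to check its output before sending it to the TTS engine. The sketch below lints text against the rules that can be tested mechanically. It checks sentence length, parentheses, and closing punctuation, and it uses a comma count as a rough stand-in for nested clauses. Several parts are illustrative assumptions rather than an established API: the helper names `split_sentences` and `lint_tts_text`, the one-comma threshold, and the small abbreviation list. The regex sentence splitter is naive. Real text would likely need a proper sentence tokenizer. The `BAD` and `GOOD` strings are copied from the example above so the block runs on its own.

```python
import re

# Hypothetical abbreviation list: a period inside these does not end a sentence.
ABBREVIATIONS = ("et al.", "e.g.", "i.e.", "vs.", "Dr.", "Mr.", "Ms.")


def split_sentences(text: str) -> list[str]:
    """Naively split after ., !, or ? plus whitespace, then re-join abbreviations."""
    sentences: list[str] = []
    for piece in re.split(r"(?<=[.!?])\s+", text.strip()):
        # A piece that follows a known abbreviation belongs to the previous sentence.
        if sentences and sentences[-1].endswith(ABBREVIATIONS):
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences


def lint_tts_text(text: str, max_words: int = 20, max_commas: int = 1) -> list[str]:
    """Return warnings for the prompt rules that can be checked mechanically."""
    if not text.strip():
        return []
    warnings = []
    for i, sentence in enumerate(split_sentences(text), start=1):
        words = len(sentence.split())
        if words > max_words:
            warnings.append(f"sentence {i}: {words} words (limit {max_words})")
        # More than one comma is a rough proxy for nested or stacked clauses.
        commas = sentence.count(",")
        if commas > max_commas:
            warnings.append(f"sentence {i}: {commas} commas, possible nested clause")
        if "(" in sentence:
            warnings.append(f"sentence {i}: parenthetical aside")
        if not sentence.endswith((".", "!", "?")):
            warnings.append(f"sentence {i}: no closing punctuation")
    return warnings


BAD = (
    'The transformer architecture, which was introduced in the landmark 2017 paper '
    '"Attention is All You Need" by Vaswani et al. at Google, fundamentally changed '
    'natural language processing by replacing recurrence with self-attention.'
)
GOOD = (
    'The transformer architecture changed natural language processing. '
    'It was introduced in 2017 by researchers at Google. '
    'Their key innovation was replacing recurrence with self-attention.'
)

print(lint_tts_text(BAD))
# ['sentence 1: 32 words (limit 20)', 'sentence 1: 2 commas, possible nested clause']
print(lint_tts_text(GOOD))
# []
```

The abbreviation list matters here. Without it, the splitter would break `BAD` after "et al." into two shorter fragments, and neither fragment would be flagged.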