AI Prompt Engineering · Lesson

SSML and Prosody Control

Speech Synthesis Markup Language: breaks, emphasis, rate, and pitch.

What Is SSML?

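SSML (Speech Synthesis Markup Language) is an XML-based markup language, standardized by the W3C, that gives you fine-grained control over how text-to-speech engines read text. Google Cloud TTS, Amazon Polly, Microsoft Azure TTS, and many other engines support it, although the set of supported elements and attribute values varies by engine and voice.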

Where plain text gives you only the words, SSML lets you control pauses, emphasis, speed, pitch, pronunciation, and more.
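As a rough illustration, the sketch below pairs each of those controls with the standard SSML element usually used for it. The dictionary name, the helper function, and the attribute values are made up for this example, and not every engine or voice accepts every element, so treat it as a starting point rather than a reference.

```python
# Each entry pairs a control named above with a standard SSML element for it.
# Attribute values are illustrative; engines differ in which ones they accept.
SSML_CONTROLS = {
    "pause": '<break time="400ms"/>',
    "emphasis": '<emphasis level="strong">important</emphasis>',
    "speed": '<prosody rate="slow">spoken slowly</prosody>',
    "pitch": '<prosody pitch="+2st">spoken slightly higher</prosody>',
    "pronunciation": '<sub alias="Speech Synthesis Markup Language">SSML</sub>',
}


def build_demo_ssml(controls):
    """Join the snippets into one <speak> document (hypothetical helper)."""
    body = "\n".join(f"  {snippet}" for snippet in controls.values())
    return f"<speak>\n{body}\n</speak>"


SSML_DEMO = build_demo_ssml(SSML_CONTROLS)
print(SSML_DEMO)
```

Rate and pitch both live on the same <prosody> element, so a single tag can carry both attributes when you want to change them together.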

Basic SSML Structure

Every SSML document is wrapped in a single root <speak> element. Inside it, you mix plain text with SSML elements such as <break>, and the TTS engine interprets the markup when it produces the audio.
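Because SSML is XML, a malformed document is typically rejected by the engine rather than read aloud. One way to catch this early is a local check before sending anything. The sketch below uses only Python's standard library; check_ssml is a hypothetical helper name, and it only verifies well-formedness and the root element, not whether a given engine supports each tag.

```python
import xml.etree.ElementTree as ET


def check_ssml(ssml: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(ssml.strip())
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    # Drop an XML namespace prefix such as "{http://www.w3.org/2001/10/synthesis}".
    tag = root.tag.rsplit("}", 1)[-1]
    if tag != "speak":
        return [f"root element is <{tag}>, expected <speak>"]
    return []


# A valid document, one with an unclosed <break>, and one with the wrong root.
print(check_ssml('<speak>Hello <break time="300ms"/> world.</speak>'))
print(check_ssml('<speak>Hello <break time="300ms"> world.</speak>'))
print(check_ssml('<p>Hello world.</p>'))
```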

# SSML document structure
SSML_BASIC = """
<speak>
  Welcome to the course.
  <break time="500ms"/>
  Today we will cover three topics.
  First, we will discuss SSML basics.
  <break time="300ms"/>
  Second, we will explore prosody control.
  <break time="300ms"/>
  And third, we will look at advanced features.
</speak>
"""

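# Send to Google Cloud TTS
# (requires the google-cloud-texttospeech package and application credentials)
from google.cloud import texttospeech

client_tts = texttospeech.TextToSpeechClient()

# Pass the markup via ssml= (not text=) so the tags are interpreted, not read aloud
input_text = texttospeech.SynthesisInput(ssml=SSML_BASIC)
voice = texttospeech.VoiceSelectionParams(
    language_code='en-US',
    ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Request synthesis and save the returned MP3 bytes
# ('output.mp3' is an arbitrary example file name)
response = client_tts.synthesize_speech(
    input=input_text, voice=voice, audio_config=audio_config
)
with open('output.mp3', 'wb') as out:
    out.write(response.audio_content)
print('Audio written to output.mp3')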
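The hand-written SSML_BASIC string works for fixed text, but when the sentences come from user input or a database, characters such as & and < have to be escaped or the document stops being valid XML. A minimal sketch of generating the same sentence-and-pause pattern programmatically follows. The function name sentences_to_ssml and its pause_ms parameter are assumptions made for this example, not part of any TTS library.

```python
from xml.sax.saxutils import escape


def sentences_to_ssml(sentences, pause_ms=300):
    """Join sentences into one <speak> document with a pause between each pair."""
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    pause = f'\n  <break time="{pause_ms}ms"/>\n  '
    # escape() replaces &, < and > so the text cannot break the markup.
    body = pause.join(escape(s.strip()) for s in sentences)
    return f"<speak>\n  {body}\n</speak>"


ssml = sentences_to_ssml([
    "Welcome to Q&A.",
    "Values where x < 5 are ignored.",
])
print(ssml)
```

The resulting string can be passed to SynthesisInput(ssml=...) in the same way as SSML_BASIC.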
