
Text-to-Speech in Agent Responses

OpenAI TTS, ElevenLabs, and Google TTS APIs in agent pipelines.

Why Agents Need TTS

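A voice agent that listens to speech but answers only in text is not a true voice agent. Text-to-Speech (TTS) converts the agent's text responses into spoken audio, completing the full voice loop: the user speaks, speech-to-text transcribes it, the agent writes a reply, and TTS speaks that reply back.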
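The voice loop mentioned above can be sketched as three pluggable stages. This is a minimal sketch: the type aliases, voice_turn, and the fake_* stand-ins are hypothetical names, and the stand-ins replace real STT, agent, and TTS calls so the example runs offline.

```python
from typing import Callable

# Hypothetical stage signatures: each stage is a plain function, so any
# STT, LLM, or TTS provider can be plugged in without changing the loop.
SpeechToText = Callable[[bytes], str]
Agent = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def voice_turn(audio_in: bytes, stt: SpeechToText, agent: Agent, tts: TextToSpeech) -> bytes:
    """Run one turn of the voice loop: speech in -> text -> reply text -> speech out."""
    user_text = stt(audio_in)       # transcribe the user's speech
    reply_text = agent(user_text)   # let the agent decide what to say
    return tts(reply_text)          # synthesize the reply as audio bytes


# Stand-in stages for illustration only; no real APIs are called.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


print(voice_turn(b"hello", fake_stt, fake_agent, fake_tts))  # prints b'You said: hello'
```

In a real agent, tts would wrap a call such as the text_to_speech function in this lesson and return the audio bytes instead of a file path.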

Two leading APIs are covered here: OpenAI TTS (fast, affordable, with a small set of built-in voices) and ElevenLabs (highly realistic voices, streaming output, and voice cloning).
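For ElevenLabs, a minimal sketch of a request using only the Python standard library follows. It assumes ElevenLabs' REST endpoint POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id} (with a /stream variant for chunked audio), the xi-api-key header, and the eleven_multilingual_v2 model ID; verify these against ElevenLabs' current API reference. The helper names and the ELEVENLABS_API_KEY environment variable are illustrative choices, and voice_id must be a real voice ID from your ElevenLabs account.

```python
import json
import os
import urllib.request

# Assumed base URL, header, and model ID; check ElevenLabs' API reference.
ELEVENLABS_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_elevenlabs_request(text: str, voice_id: str, api_key: str,
                             model_id: str = "eleven_multilingual_v2",
                             stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for ElevenLabs text-to-speech."""
    url = f"{ELEVENLABS_BASE}/{voice_id}"
    if stream:
        url += "/stream"  # streaming endpoint returns audio in chunks as it is generated
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    headers = {
        "xi-api-key": api_key,
        "Content-Type": "application/json",
        "Accept": "audio/mpeg",  # ask for MP3 audio back
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def elevenlabs_tts(text: str, voice_id: str, output_path: str = "response.mp3") -> str:
    """Send the request and save the returned audio; requires ELEVENLABS_API_KEY."""
    req = build_elevenlabs_request(text, voice_id, os.environ["ELEVENLABS_API_KEY"])
    with urllib.request.urlopen(req) as resp, open(output_path, "wb") as f:
        f.write(resp.read())
    return output_path
```

Separating request construction from sending keeps the provider-specific details in one place and makes them easy to inspect without spending API credits.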

OpenAI TTS Basics

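OpenAI's TTS API turns a text string into an audio file with a single request. The tts-1 model launched with six built-in voices: alloy, echo, fable, onyx, nova, and shimmer (OpenAI has since added more). Supported output formats include MP3, Opus, AAC, and FLAC, with WAV and raw PCM also available.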

import openai
import os

client = openai.OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

def text_to_speech(text, voice='alloy', output_path='response.mp3'):
    response = client.audio.speech.create(
        model='tts-1',          # tts-1 (fast) or tts-1-hd (higher quality)
        voice=voice,            # alloy, echo, fable, onyx, nova, shimmer
        input=text,
        response_format='mp3'   # mp3, opus, aac, flac
    )
    with open(output_path, 'wb') as f:
        f.write(response.content)
    print(f'Audio saved to: {output_path}')
    return output_path

# Available voices:
# alloy    - neutral, balanced
# echo     - warm, conversational
# fable    - expressive
# onyx     - deep, authoritative
# nova     - friendly, upbeat
# shimmer  - soft, clear
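Agent replies can be long, and OpenAI's speech endpoint accepts at most 4096 characters of input per request. A minimal sketch, assuming sentence-ending punctuation is a good enough split point: chunk_for_tts (a hypothetical helper, not part of the OpenAI SDK) packs whole sentences into chunks under the limit and hard-splits any single sentence that is still too long.

```python
import re

MAX_TTS_CHARS = 4096  # OpenAI's documented input limit for the speech endpoint


def chunk_for_tts(text: str, max_chars: int = MAX_TTS_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends when possible."""
    # Split after ., ! or ? followed by whitespace; the punctuation stays with its sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single over-long sentence is hard-split so no chunk exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate   # sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to text_to_speech in order, for example text_to_speech(part, output_path=f'part_{i}.mp3') inside a loop over enumerate(chunk_for_tts(reply)). Synthesizing the first chunk before the rest also lets the agent start speaking sooner.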
