Building a Voice Conversation Loop

Record → transcribe → reason → speak cycle with interrupt handling.

The Voice Conversation Loop

A full voice conversation loop connects microphone input to agent output in a continuous cycle: record → transcribe → agent → TTS → play → repeat.
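As a rough illustration of that cycle, the sketch below wires the five stages together as plain callables. The function name `run_voice_loop`, its parameters, the `stop_word` convention, and the `max_turns` cap are all assumptions made for this example, not part of any library; real implementations of each stage would be passed in.

```python
from typing import Callable, List


def run_voice_loop(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    agent: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
    stop_word: str = "goodbye",   # hypothetical exit phrase
    max_turns: int = 10,          # hypothetical safety cap on iterations
) -> List[str]:
    """Run record -> transcribe -> agent -> TTS -> play, then repeat.

    Returns the agent replies that were spoken, in order.
    """
    replies: List[str] = []
    for _ in range(max_turns):
        audio_in = record()
        text_in = transcribe(audio_in).strip()
        if not text_in:
            continue  # nothing was recognised, so listen again
        if text_in.lower() == stop_word:
            break     # the user ended the conversation
        reply = agent(text_in)
        play(synthesize(reply))
        replies.append(reply)
    return replies
```

Passing the stages in as arguments keeps the loop itself free of audio and API dependencies, so each stage can be swapped or stubbed out independently.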

This lesson covers each component and the challenges that make voice loops different from text-based agents: silence detection, interrupt handling, and end-of-speech detection.
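One common way to approach interrupt handling is to play the agent's speech in small chunks and check a shared flag between chunks. The sketch below assumes that approach; `play_interruptible`, `play_chunk`, and `interrupted` are hypothetical names, and in a real application a separate microphone-monitoring thread would be the one setting the event when the user starts talking.

```python
import threading
from typing import Callable, Iterable


def play_interruptible(
    chunks: Iterable[bytes],
    play_chunk: Callable[[bytes], None],
    interrupted: threading.Event,
) -> bool:
    """Play audio chunks in order, stopping early if `interrupted` is set.

    Returns True if every chunk was played, False if playback was cut short.
    """
    for chunk in chunks:
        if interrupted.is_set():
            return False  # the user started speaking, so stop talking
        play_chunk(chunk)
    return True
```

Smaller chunks make the agent stop sooner after an interruption, at the cost of more frequent checks.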

Recording from the Microphone

Use sounddevice to capture audio from the default microphone. Record either for a fixed duration or until silence is detected. Record at 16 kHz mono: Whisper works on 16 kHz audio internally, so recording at that rate avoids a resampling step.

Install with pip install sounddevice soundfile numpy.

import sounddevice as sd
import numpy as np
import tempfile
import soundfile as sf

SAMPLE_RATE = 16000  # 16kHz mono — optimal for Whisper
DURATION = 5         # seconds

def record_audio(duration=DURATION, sample_rate=SAMPLE_RATE):
    print(f'Recording for {duration} seconds...')
    audio_data = sd.rec(
        int(duration * sample_rate),
        samplerate=sample_rate,
        channels=1,
        dtype='float32'
    )
    sd.wait()  # block until recording finishes
    print('Recording complete')
    return audio_data, sample_rate

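def audio_to_file(audio_data, sample_rate, filepath=None):
    """Write the recording to a WAV file and return its path."""
    if filepath is None:
        # Create a unique temp file rather than a fixed /tmp path,
        # which does not exist on Windows and can collide between runs.
        with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmp:
            filepath = tmp.name
    sf.write(filepath, audio_data, sample_rate)
    return filepath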
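The code above records for a fixed duration. For the "until silence is detected" alternative, a minimal energy-based sketch is shown here. It assumes audio arrives as an iterable of chunks, each a flat sequence of float samples in the range -1.0 to 1.0. The helper names, the `threshold` of 0.01, and the `max_silent_chunks` count are illustrative assumptions that would need tuning for a real microphone and room.

```python
import math
from typing import Iterable, List, Sequence


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square energy of one chunk of samples."""
    if len(chunk) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def collect_until_silence(
    chunks: Iterable[Sequence[float]],
    threshold: float = 0.01,      # assumed energy level separating speech from silence
    max_silent_chunks: int = 3,   # assumed number of quiet chunks that ends the utterance
) -> List[float]:
    """Accumulate samples until speech is followed by a run of quiet chunks."""
    recorded: List[float] = []
    silent_run = 0
    heard_speech = False
    for chunk in chunks:
        recorded.extend(chunk)
        if rms(chunk) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Only count silence after speech has started, so a pause
            # before the user begins talking does not end the recording.
            silent_run += 1
            if silent_run >= max_silent_chunks:
                break
    return recorded
```

With 16 kHz audio split into 0.1-second chunks (1,600 samples each), `max_silent_chunks=3` would end the recording after roughly 0.3 seconds of trailing silence.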
