Building a Voice Conversation Loop
Record → transcribe → reason → speak cycle with interrupt handling.
The Voice Conversation Loop
A full voice conversation loop connects microphone input to agent output in a continuous cycle: record → transcribe → agent → TTS → play → repeat.
This lesson covers each component and the challenges that make voice loops different from text-based agents: silence detection, interrupt handling, and end-of-speech detection.
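As a rough sketch of how the stages in the cycle above fit together, the loop below takes each stage as a plain function. The names run_voice_loop, record, transcribe, agent, synthesize, play, and max_turns are hypothetical and chosen for illustration only; they do not come from a specific library, and a real loop would also need the interrupt handling and end-of-speech detection mentioned above.

```python
from typing import Callable, List, Tuple


def run_voice_loop(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    agent: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
    max_turns: int = 3,
) -> List[Tuple[str, str]]:
    """Run record -> transcribe -> agent -> TTS -> play for max_turns turns.

    Every stage is a hypothetical callable supplied by the caller.
    Returns the (user_text, reply_text) pairs that were spoken.
    """
    history = []
    for _ in range(max_turns):
        audio = record()                      # capture one utterance
        user_text = transcribe(audio).strip()
        if not user_text:
            continue                          # nothing intelligible; listen again
        reply_text = agent(user_text)         # reasoning step
        play(synthesize(reply_text))          # TTS, then playback
        history.append((user_text, reply_text))
    return history
```

Passing the stages in as functions keeps the loop independent of any one speech-to-text or TTS provider, and lets it be exercised with simple stubs before a microphone is involved.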
Recording from the Microphone
Use sounddevice to capture audio from the default microphone. Record either for a fixed duration or until silence is detected. Record at 16 kHz mono: Whisper's models operate on 16 kHz audio, so capturing at that rate avoids a resampling step.
Install with pip install sounddevice soundfile numpy.
import sounddevice as sd
import numpy as np
import tempfile
import soundfile as sf

SAMPLE_RATE = 16000  # 16 kHz mono, the rate Whisper works at
DURATION = 5         # seconds


def record_audio(duration=DURATION, sample_rate=SAMPLE_RATE):
    """Record a fixed-length clip from the default microphone."""
    print(f'Recording for {duration} seconds...')
    audio_data = sd.rec(
        int(duration * sample_rate),
        samplerate=sample_rate,
        channels=1,
        dtype='float32'
    )
    sd.wait()  # block until recording finishes
    print('Recording complete')
    return audio_data, sample_rate


def audio_to_file(audio_data, sample_rate, filepath='/tmp/recording.wav'):
    """Write the recording to a WAV file and return its path."""
    sf.write(filepath, audio_data, sample_rate)
    return filepath
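Recording until silence, the alternative to a fixed duration mentioned above, can be sketched with a simple energy check. This is a minimal illustration, not the lesson's own implementation: the names rms and record_until_silence, the read_chunk callable, and the threshold and timing constants are all assumptions and would need tuning for a real microphone and room.

```python
import numpy as np

CHUNK_SECONDS = 0.1        # assumed analysis window
SILENCE_THRESHOLD = 0.01   # assumed RMS threshold for float32 audio in [-1, 1]
SILENCE_SECONDS = 1.0      # assumed pause length that ends the utterance


def rms(chunk):
    """Root-mean-square level of one audio chunk."""
    samples = np.asarray(chunk, dtype=np.float64)
    return float(np.sqrt(np.mean(samples ** 2)))


def record_until_silence(read_chunk, max_seconds=30.0,
                         threshold=SILENCE_THRESHOLD,
                         silence_seconds=SILENCE_SECONDS,
                         chunk_seconds=CHUNK_SECONDS):
    """Collect chunks until speech has been followed by a long enough pause.

    read_chunk is a hypothetical zero-argument callable that returns one
    NumPy array holding chunk_seconds of audio.
    """
    silent_needed = int(round(silence_seconds / chunk_seconds))
    max_chunks = int(round(max_seconds / chunk_seconds))
    chunks, silent_run, heard_speech = [], 0, False
    for _ in range(max_chunks):
        chunk = read_chunk()
        chunks.append(chunk)
        if rms(chunk) >= threshold:
            heard_speech = True   # speech (or noise) above the threshold
            silent_run = 0
        elif heard_speech:
            silent_run += 1       # count quiet chunks only after speech began
            if silent_run >= silent_needed:
                break
    if not chunks:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(chunks)
```

With sounddevice, read_chunk could wrap the first element of the tuple returned by the read method of an open sd.InputStream, called with the number of frames in one chunk. Leading silence is ignored until the first loud chunk, so the recording does not stop before the speaker has started.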