LiveKit Agents is an open source framework for building real-time voice and multimodal AI agents. It handles streaming audio pipelines, turn detection, interruptions, and LLM orchestration so you can focus on your agent’s behavior. Fish Audio integrates with LiveKit through the fishaudio plugin, providing text-to-speech synthesis with support for both chunked and real-time WebSocket streaming modes.

Prerequisites

- A Fish Audio API key
- A Python environment with pip

Installation

Install LiveKit Agents with Fish Audio support:
pip install "livekit-agents[fishaudio]"

Configuration

Set your Fish Audio API key as an environment variable:
export FISH_API_KEY=your_api_key_here

Basic usage

Add Fish Audio TTS to your LiveKit agent:
from livekit.plugins.fishaudio import TTS

tts = TTS(
    reference_id="your_voice_model_id",  # Optional: use a specific voice
    model="s1",
    sample_rate=24000,
    latency_mode="balanced"
)
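
In a voice agent, the TTS instance is typically passed to the agent session alongside STT and LLM plugins. A sketch of that wiring, assuming the LiveKit Agents 1.x `AgentSession` API (the STT/LLM slots and the agent instructions here are placeholders):

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins.fishaudio import TTS


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        # stt=..., llm=...,  # any supported STT/LLM plugins
        tts=TTS(model="s1", latency_mode="balanced"),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```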

Key parameters

| Parameter | Description |
| --- | --- |
| `api_key` | Your Fish Audio API key (or use the `FISH_API_KEY` env var) |
| `model` | TTS model/backend to use (default: `s1`) |
| `reference_id` | Voice model ID from the Fish Audio library |
| `output_format` | Audio format: `pcm`, `mp3`, `wav`, or `opus` (default: `pcm`) |
| `sample_rate` | Audio sample rate in Hz (default: `24000`) |
| `num_channels` | Number of audio channels (default: `1`) |
| `base_url` | Custom API endpoint (default: `https://api.fish.audio`) |
| `latency_mode` | `normal` (~500 ms) or `balanced` (~300 ms, default) |

Streaming modes

The plugin supports two synthesis modes: chunked, where each call sends complete text and returns audio as it is generated, and real-time WebSocket streaming, where text can be pushed incrementally (for example, as tokens arrive from an LLM):
# Chunked (non-streaming) synthesis
stream = tts.synthesize("Hello, world!")

# Real-time WebSocket streaming
stream = tts.stream()
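
Both modes produce audio asynchronously. A hedged sketch of consuming each one, assuming the standard LiveKit Agents TTS stream interface (`push_text`, `end_input`, and async iteration over synthesized-audio events); `handle_audio` is a hypothetical callback:

```python
# Chunked: one request per complete utterance.
async def speak_once(tts):
    async for event in tts.synthesize("Hello, world!"):
        handle_audio(event.frame)  # synthesized audio frame


# Streaming: push text incrementally (e.g. LLM tokens) over the WebSocket.
async def speak_tokens(tts, tokens):
    stream = tts.stream()
    for token in tokens:
        stream.push_text(token)
    stream.end_input()  # signal that no more text is coming
    async for event in stream:
        handle_audio(event.frame)
```

The streaming mode pairs naturally with an LLM's token stream, since synthesis can begin before the full response is available.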

Resources