Pipecat is an open-source framework for building voice and multimodal conversational AI. It handles the orchestration of audio, AI services, and conversation pipelines so you can focus on what makes your agent unique. Fish Audio integrates with Pipecat through FishAudioTTSService, which provides real-time text-to-speech synthesis over a streaming WebSocket connection for low-latency conversational applications.

Prerequisites

Installation

Install Pipecat with Fish Audio support:
pip install "pipecat-ai[fish]"

Configuration

Set your Fish Audio API key as an environment variable:
export FISH_API_KEY=your_api_key_here
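
A missing or blank key is easier to diagnose at startup than at the first synthesis request. The following is a minimal sketch of such a check, assuming the FISH_API_KEY variable name used above; the helper name load_fish_api_key is hypothetical and not part of Pipecat or the Fish Audio SDK.

```python
import os


def load_fish_api_key(env=None):
    """Return the Fish Audio API key, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to exercise in tests. (Hypothetical helper, not a Pipecat API.)
    """
    env = os.environ if env is None else env
    key = (env.get("FISH_API_KEY") or "").strip()
    if not key:
        raise RuntimeError(
            "FISH_API_KEY is not set; export it before starting the agent"
        )
    return key
```

The returned value can be passed as api_key when constructing the service, in place of a bare os.getenv call that would silently yield None.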

Basic usage

Add FishAudioTTSService to your Pipecat pipeline:
import os

from pipecat.services.fish import FishAudioTTSService

# Create the TTS service; the API key is read from the environment.
tts = FishAudioTTSService(
    api_key=os.getenv("FISH_API_KEY"),
    reference_id="your_voice_model_id",  # Optional: use a specific voice
    model_id="s1",                       # TTS model version
    params=FishAudioTTSService.InputParams(
        latency="normal",
        prosody_speed=1.0,
    ),
)

Key parameters

Parameter       Description
api_key         Your Fish Audio API key
reference_id    Voice model ID from the Fish Audio library
model_id        TTS model version (default: s1)
output_format   Audio format: pcm, mp3, wav, or opus
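
The parameters listed above can be collected and checked before the service is constructed. This is a rough sketch under stated assumptions: tts_settings is a hypothetical helper, and the "pcm" default for output_format is chosen here for illustration rather than taken from the documentation. Only the parameter names and the allowed format values come from the list above.

```python
# Audio formats named in the parameter list above.
ALLOWED_OUTPUT_FORMATS = ("pcm", "mp3", "wav", "opus")


def tts_settings(api_key, reference_id=None, model_id="s1", output_format="pcm"):
    """Validate and return keyword arguments for the TTS service.

    Hypothetical helper; the "pcm" default is an assumption of this sketch.
    """
    if not api_key:
        raise ValueError("api_key is required")
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(
            f"output_format must be one of {ALLOWED_OUTPUT_FORMATS}, "
            f"got {output_format!r}"
        )
    settings = {
        "api_key": api_key,
        "model_id": model_id,
        "output_format": output_format,
    }
    # reference_id is optional, so include it only when a voice is chosen.
    if reference_id is not None:
        settings["reference_id"] = reference_id
    return settings
```

The resulting dictionary could then be unpacked into the constructor, for example FishAudioTTSService(**tts_settings(api_key, output_format="opus")), so a typo in the format name fails before any connection is opened.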

Prosody controls

Customize speech characteristics with InputParams:
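from pipecat.services.fish import FishAudioTTSService

# Pass this object as the `params` argument of FishAudioTTSService.
params = FishAudioTTSService.InputParams(
    latency="balanced",      # "normal" or "balanced"
    prosody_speed=1.2,       # Speech rate multiplier, 0.5 to 2.0
    prosody_volume=0,        # Volume adjustment in dB (0 leaves it unchanged)
    normalize=True,          # Audio normalization
)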
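
To make the ranges above concrete, here is a small sketch that checks prosody values before they are handed to InputParams and shows what a dB adjustment means as a linear factor. Both function names are hypothetical. The conversion assumes the usual amplitude convention (gain = 10^(dB/20)); whether Fish Audio applies prosody_volume in exactly this way is an assumption, not something the documentation states.

```python
ALLOWED_LATENCY = ("normal", "balanced")  # values named in the comments above


def db_to_gain(db):
    """Convert a decibel adjustment to a linear amplitude factor.

    Assumes the amplitude convention: gain = 10 ** (dB / 20).
    """
    return 10 ** (db / 20)


def check_prosody(latency="normal", prosody_speed=1.0, prosody_volume=0):
    """Validate prosody settings against the documented ranges.

    Hypothetical helper; returns the values as a dict if they are valid.
    """
    if latency not in ALLOWED_LATENCY:
        raise ValueError(f"latency must be one of {ALLOWED_LATENCY}")
    if not 0.5 <= prosody_speed <= 2.0:
        raise ValueError("prosody_speed must be between 0.5 and 2.0")
    return {
        "latency": latency,
        "prosody_speed": prosody_speed,
        "prosody_volume": prosody_volume,
    }
```

Under that convention, an adjustment of 0 dB corresponds to a factor of 1.0 (no change), and +6 dB roughly doubles the amplitude.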

Resources