Prerequisites

Get free API credits by verifying your phone number.

Basic Usage

Generate speech from text:
from fish_audio_sdk import Session, TTSRequest

session = Session("your_api_key")

# Generate and save audio
with open("output.mp3", "wb") as f:
    for chunk in session.tts(TTSRequest(
        text="Hello, world!"
    )):
        f.write(chunk)
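
The write loop above recurs throughout this guide, so it can be factored into a small helper. This is a minimal sketch: `save_chunks` is a hypothetical name, not part of the SDK, and it only assumes that the argument is an iterable of `bytes` objects (which is how `session.tts(...)` is used above).

```python
import os
import tempfile
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write each chunk to `path` as it arrives and return the total bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total


# Stand-in for session.tts(...): any iterable of bytes objects works.
demo_path = os.path.join(tempfile.gettempdir(), "save_chunks_demo.bin")
written = save_chunks([b"abc", b"de"], demo_path)
print(written)  # 5
```

With a real session, the call would presumably look like `save_chunks(session.tts(request), "output.mp3")`.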

Using Voice Models

Specify a voice model for consistent voice generation:
request = TTSRequest(
    text="This is my custom voice",
    reference_id="your_model_id"  # Your model ID from fish.audio
)

with open("custom_voice.mp3", "wb") as f:
    for chunk in session.tts(request):
        f.write(chunk)

Getting Model IDs

The reference_id is the model ID from the URL when viewing a model on Fish Audio:
  • Model URL: https://fish.audio/m/802e3bc2b27e49c2995d23ef70e6ac89
  • Reference ID: 802e3bc2b27e49c2995d23ef70e6ac89
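
The URL convention above can be sketched as a small parser. The helper name `reference_id_from_url` is hypothetical, and the code assumes model URLs always have the `/m/<id>` shape shown in the example.

```python
from urllib.parse import urlparse


def reference_id_from_url(url: str) -> str:
    """Return the <id> segment of a model URL shaped like https://fish.audio/m/<id>."""
    path = urlparse(url).path.rstrip("/")
    parts = path.split("/")
    # Expect a path ending in "/m/<id>"; anything else is rejected.
    if len(parts) < 3 or parts[-2] != "m" or not parts[-1]:
        raise ValueError(f"Not a model URL: {url!r}")
    return parts[-1]


print(reference_id_from_url("https://fish.audio/m/802e3bc2b27e49c2995d23ef70e6ac89"))
```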
You can also get model IDs programmatically:
# List your models
models = session.list_models(self_only=True)
for model in models.items:
    print(f"{model.title}: {model.id}")

# Get specific model details
model = session.get_model("your_model_id")
print(f"Model: {model.title}, ID: {model.id}")

Emotions

Add emotional expressions to your text:
text = """
(happy) I'm excited to share this!
(sad) Unfortunately, it didn't work out.
(whispering) This is a secret.
"""

request = TTSRequest(text=text, reference_id="model_id")
Common emotions: (happy), (sad), (angry), (excited), (calm), (surprised), (whispering), (shouting), (laughing), (sighing)
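
When the text is assembled programmatically, the tags can be added by a small helper. This is a sketch only: `tag_lines` and `COMMON_EMOTIONS` are hypothetical names, and the tag set is limited to the emotions listed above (the service may accept others).

```python
from typing import Iterable, Tuple

# Tags listed in this guide; treated here as the only valid ones (an assumption).
COMMON_EMOTIONS = {
    "happy", "sad", "angry", "excited", "calm",
    "surprised", "whispering", "shouting", "laughing", "sighing",
}


def tag_lines(lines: Iterable[Tuple[str, str]]) -> str:
    """Join (emotion, sentence) pairs into text with one "(emotion) sentence" per line."""
    tagged = []
    for emotion, sentence in lines:
        if emotion not in COMMON_EMOTIONS:
            raise ValueError(f"Unknown emotion tag: {emotion!r}")
        tagged.append(f"({emotion}) {sentence}")
    return "\n".join(tagged)


text = tag_lines([
    ("happy", "I'm excited to share this!"),
    ("whispering", "This is a secret."),
])
print(text)
```

The resulting string can be passed as the `text` argument of `TTSRequest`.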

Audio Formats

Choose output format based on your needs:
# MP3 (default)
TTSRequest(text="...", format="mp3", mp3_bitrate=192)

# WAV - uncompressed
TTSRequest(text="...", format="wav", sample_rate=44100)

# Opus - efficient for streaming
TTSRequest(text="...", format="opus", opus_bitrate=48)

# PCM - raw audio data
TTSRequest(text="...", format="pcm", sample_rate=16000)
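
Because PCM output carries no header, most players cannot open it directly. The sketch below wraps raw samples in a WAV container using only the standard library. It assumes the PCM data is 16-bit mono, which matches the playback settings in the streaming example but is not confirmed by this guide; `pcm_to_wav_bytes` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file contents."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample; 2 means 16-bit (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# One second of silence at 16 kHz stands in for real PCM output.
wav_bytes = pcm_to_wav_bytes(b"\x00\x00" * 16000, sample_rate=16000)
print(len(wav_bytes))
```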

Prosody Control

Adjust speech speed and volume:
from fish_audio_sdk import Prosody

request = TTSRequest(
    text="Adjusted speech",
    prosody=Prosody(
        speed=1.2,  # 20% faster (0.5-2.0)
        volume=5    # Louder (-20 to 20)
    )
)
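
If speed and volume come from user input, it may help to clamp them to the ranges noted in the comments above before building the request. A minimal sketch, with `clamp_prosody` as a hypothetical helper; how the service treats out-of-range values is not stated here.

```python
from typing import Tuple

SPEED_RANGE = (0.5, 2.0)      # from the comment in the example above
VOLUME_RANGE = (-20.0, 20.0)  # from the comment in the example above


def clamp_prosody(speed: float, volume: float) -> Tuple[float, float]:
    """Clamp speed and volume into the documented ranges."""
    speed = min(max(speed, SPEED_RANGE[0]), SPEED_RANGE[1])
    volume = min(max(volume, VOLUME_RANGE[0]), VOLUME_RANGE[1])
    return speed, volume


print(clamp_prosody(3.0, -50))
```

The clamped values can then be passed to `Prosody(speed=..., volume=...)`.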

Advanced Parameters

Fine-tune generation:
request = TTSRequest(
    text="Your text here",
    chunk_length=200,      # Characters per chunk (100-300)
    normalize=True,        # Normalize text
    latency="balanced",    # "normal" or "balanced"
    temperature=0.7,       # Randomness (0.0-1.0)
    top_p=0.7              # Nucleus sampling cutoff (0.0-1.0)
)

Streaming

Process audio in real time:
import pyaudio

# Setup playback
p = pyaudio.PyAudio()
stream = p.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=44100,
    output=True
)

# Stream audio
for chunk in session.tts(TTSRequest(
    text="Streaming audio",
    format="pcm",
    sample_rate=44100
)):
    stream.write(chunk)

stream.stop_stream()
stream.close()
p.terminate()
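
With 16-bit PCM, every sample occupies two bytes, so a chunk boundary that falls in the middle of a sample could garble playback. Whether the SDK guarantees aligned chunks is not stated here, so the sketch below re-chunks the stream defensively. `aligned_chunks` is a hypothetical helper, and `frame_bytes=2` assumes 16-bit mono audio.

```python
from typing import Iterable, Iterator


def aligned_chunks(chunks: Iterable[bytes], frame_bytes: int = 2) -> Iterator[bytes]:
    """Re-chunk a byte stream so that every yielded piece holds whole frames."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        usable = len(pending) - (len(pending) % frame_bytes)
        if usable:
            yield pending[:usable]
            pending = pending[usable:]
    # Trailing bytes that never completed a frame are dropped.


# Stand-in for session.tts(...): odd-sized chunks that split samples.
pieces = list(aligned_chunks([b"\x01", b"\x02\x03", b"\x04\x05"]))
print(pieces)
```

In the playback loop, this would presumably wrap the generator: `for chunk in aligned_chunks(session.tts(request)): stream.write(chunk)`.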

Error Handling

Handle common errors:
from fish_audio_sdk.exceptions import HttpCodeErr
import time

def generate_with_retry(request, max_retries=3):
    for attempt in range(max_retries):
        try:
            audio = b""
            for chunk in session.tts(request):
                audio += chunk
            return audio
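        except HttpCodeErr as e:
            if e.status_code == 429 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Rate limited: back off, then retry
            elif e.status_code == 401:
                raise Exception("Invalid API key") from e
            else:
                raise  # Other errors, or still rate limited on the final attempt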
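
The backoff pattern can also be written independently of the SDK, which makes it easy to test without network access. This is a sketch under stated assumptions: `RetryableError` is a hypothetical stand-in for a rate-limit error, and `retry_with_backoff` with its `sleep` parameter is illustrative, not part of the SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for a rate-limit error (HTTP 429)."""


def retry_with_backoff(fn: Callable[[], T], max_retries: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying on RetryableError with delays of base_delay * 2**attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RetryableError:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_retries must be at least 1")


# Demo: fail twice, then succeed; record delays instead of sleeping.
delays = []
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RetryableError()
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the default behavior while letting tests substitute a recorder.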

Request Parameters

Parameter     Type     Description           Default
text          str      Text to convert       Required
reference_id  str      Voice model ID        None
references    list     Reference audio       []
format        str      Audio format          "mp3"
chunk_length  int      Chunk size (100-300)  200
normalize     bool     Normalize text        True
latency       str      Speed vs quality      "balanced"
prosody       Prosody  Speed/volume          None
temperature   float    Randomness            0.7
top_p         float    Token selection       0.7

Next Steps