Prerequisites

Sign up for a free Fish Audio account to get started with our API.
  1. Go to fish.audio/auth/signup
  2. Fill in your details to create an account, then complete the steps to verify it
  3. Log in to your account and navigate to the API section
Once you have an account, you’ll need an API key to authenticate your requests.
  1. Log in to your Fish Audio Dashboard
  2. Navigate to the API Keys section
  3. Click “Create New Key”, give it a descriptive name, and set an expiration if desired
  4. Copy your key and store it securely
Keep your API key secret! Never commit it to version control or share it publicly.
Get free API credits by verifying your phone number.
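
One common way to keep the key out of source code is to read it from an environment variable at startup. The sketch below assumes a variable named FISH_AUDIO_API_KEY; that name and the load_api_key helper are illustrative choices, not part of the SDK.

```python
import os

# Hypothetical variable name; use whatever name your deployment defines.
API_KEY_ENV_VAR = "FISH_AUDIO_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first")
    return key
```

The returned value can then be passed to Session(...) in place of a hard-coded string.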

Basic Usage

Generate speech from text:
from fish_audio_sdk import Session, TTSRequest

session = Session("your_api_key")

# Generate and save audio
with open("output.mp3", "wb") as f:
    for chunk in session.tts(TTSRequest(
        text="Hello, world!"
    )):
        f.write(chunk)

Using Voice Models

Specify a voice model for consistent voice generation:
request = TTSRequest(
    text="This is my custom voice",
    reference_id="your_model_id"  # Your model ID from fish.audio
)

with open("custom_voice.mp3", "wb") as f:
    for chunk in session.tts(request):
        f.write(chunk)

Getting Model IDs

The reference_id is the model ID from the URL when viewing a model on Fish Audio:
  • Model URL: https://fish.audio/m/802e3bc2b27e49c2995d23ef70e6ac89
  • Reference ID: 802e3bc2b27e49c2995d23ef70e6ac89
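
If you collect model URLs rather than bare IDs, the mapping above can be sketched as a small helper. It assumes the URL path always has the form /m/<id>, as in the example; extract_reference_id is a hypothetical name, not an SDK function.

```python
from urllib.parse import urlparse


def extract_reference_id(model_url):
    """Return the model ID from a URL shaped like https://fish.audio/m/<id>."""
    # Split the path into non-empty segments so a trailing slash is tolerated.
    parts = [p for p in urlparse(model_url).path.split("/") if p]
    if len(parts) == 2 and parts[0] == "m":
        return parts[1]
    raise ValueError(f"Not a model URL: {model_url!r}")
```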
You can also get model IDs programmatically:
# List your models
models = session.list_models(self_only=True)
for model in models.items:
    print(f"{model.title}: {model.id}")

# Get specific model details
model = session.get_model("your_model_id")
print(f"Model: {model.title}, ID: {model.id}")

Emotions

Add emotional expressions to your text:
text = """
(happy) I'm excited to share this!
(sad) Unfortunately, it didn't work out.
(whispering) This is a secret.
"""

request = TTSRequest(text=text, reference_id="model_id")
Common emotions: (happy), (sad), (angry), (excited), (calm), (surprised), (whispering), (shouting), (laughing), (sighing)
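
When a script is assembled from data, a small helper can prepend the tags and catch misspelled emotion names. This is a minimal sketch: the set below contains only the emotions listed above (the API may accept others), and tag_line and build_script are hypothetical helpers, not SDK functions.

```python
# Only the emotions named in this guide; extend as needed.
COMMON_EMOTIONS = {
    "happy", "sad", "angry", "excited", "calm",
    "surprised", "whispering", "shouting", "laughing", "sighing",
}


def tag_line(emotion, text):
    """Prefix one line of text with an emotion tag such as (happy)."""
    if emotion not in COMMON_EMOTIONS:
        raise ValueError(f"Unknown emotion: {emotion!r}")
    return f"({emotion}) {text.strip()}"


def build_script(lines):
    """Join (emotion, text) pairs into a multi-line string."""
    return "\n".join(tag_line(emotion, text) for emotion, text in lines)
```

The result of build_script can be passed as the text argument of TTSRequest.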

Audio Formats

Choose output format based on your needs:
# MP3 (default)
TTSRequest(text="...", format="mp3", mp3_bitrate=192)

# WAV - uncompressed
TTSRequest(text="...", format="wav", sample_rate=44100)

# Opus - efficient for streaming
TTSRequest(text="...", format="opus", opus_bitrate=48)

# PCM - raw audio data
TTSRequest(text="...", format="pcm", sample_rate=16000)
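
Raw PCM has no header, so most players cannot open it directly. The sketch below wraps PCM bytes in a WAV container using Python's standard wave module. It assumes 16-bit mono samples, which this guide does not state explicitly for the PCM format, so check the sample layout before relying on it; pcm_to_wav_bytes is a hypothetical helper.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV container and return the WAV file bytes.

    sample_width=2 assumes 16-bit samples; channels=1 assumes mono.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The sample_rate passed here should match the sample_rate used in the TTSRequest.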

Prosody Control

Adjust speech speed and volume:
from fish_audio_sdk import Prosody

request = TTSRequest(
    text="Adjusted speech",
    prosody=Prosody(
        speed=1.2,  # 20% faster (0.5-2.0)
        volume=5    # Louder (-20 to 20)
    )
)

Advanced Parameters

Fine-tune generation:
request = TTSRequest(
    text="Your text here",
    chunk_length=200,      # Characters per chunk (100-300)
    normalize=True,        # Normalize text
    latency="balanced",    # "normal" or "balanced"
    temperature=0.7,       # Randomness (0.0-1.0)
    top_p=0.7             # Token selection (0.0-1.0)
)
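
Checking values against the ranges in the comments above before building a request can surface mistakes earlier than an API error would. This is a sketch only: validate_tts_options is a hypothetical helper, and the ranges are copied from this guide rather than from the SDK itself.

```python
def validate_tts_options(chunk_length=200, latency="balanced",
                         temperature=0.7, top_p=0.7):
    """Return a list of problems; an empty list means all values are in range."""
    errors = []
    if not 100 <= chunk_length <= 300:
        errors.append("chunk_length must be between 100 and 300")
    if latency not in ("normal", "balanced"):
        errors.append('latency must be "normal" or "balanced"')
    if not 0.0 <= temperature <= 1.0:
        errors.append("temperature must be between 0.0 and 1.0")
    if not 0.0 <= top_p <= 1.0:
        errors.append("top_p must be between 0.0 and 1.0")
    return errors
```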

Streaming

Play audio in real time as it is generated:
import pyaudio

# Setup playback
p = pyaudio.PyAudio()
stream = p.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=44100,
    output=True
)

# Stream audio
for chunk in session.tts(TTSRequest(
    text="Streaming audio",
    format="pcm",
    sample_rate=44100
)):
    stream.write(chunk)

stream.close()
p.terminate()
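
With 16-bit PCM, each sample occupies two bytes. This guide does not say whether streamed chunks always end on a sample boundary, so the following sketch re-buffers chunks defensively so that each piece written to the output is a whole number of frames. frame_aligned is a hypothetical helper, not part of the SDK or PyAudio.

```python
def frame_aligned(chunks, frame_size=2):
    """Yield byte chunks whose lengths are multiples of frame_size.

    frame_size=2 assumes 16-bit mono audio (2 bytes per frame).
    """
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Largest prefix of the buffer that contains only complete frames.
        usable = len(pending) - (len(pending) % frame_size)
        if usable:
            yield pending[:usable]
            pending = pending[usable:]
    # Any trailing partial frame is dropped.
```

In the playback loop above, this would wrap the iterator: for chunk in frame_aligned(session.tts(...)).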

Error Handling

Handle common errors:
from fish_audio_sdk.exceptions import HttpCodeErr
import time

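def generate_with_retry(request, max_retries=3):
    for attempt in range(max_retries):
        try:
            # Collect all streamed chunks into a single bytes object
            return b"".join(session.tts(request))
        except HttpCodeErr as e:
            if e.status_code == 429:  # Rate limited: back off exponentially
                time.sleep(2 ** attempt)
            elif e.status_code == 401:
                raise Exception("Invalid API key") from e
            else:
                raise
    # Every attempt was rate limited; fail loudly instead of returning None
    raise RuntimeError(f"Rate limited on all {max_retries} attempts")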

Request Parameters

Parameter      Type      Description           Default
text           str       Text to convert       Required
reference_id   str       Voice model ID        None
references     list      Reference audio       []
format         str       Audio format          "mp3"
chunk_length   int       Chunk size (100-300)  200
normalize      bool      Normalize text        True
latency        str       Speed vs quality      "balanced"
prosody        Prosody   Speed/volume          None
temperature    float     Randomness            0.7
top_p          float     Token selection       0.7

Next Steps

I