Text to Speech

Prerequisites

Create a Fish Audio account

Go to fish.audio/auth/signup
Fill in your details to create an account, then complete the verification steps.
Log in to your account and navigate to the API section

Get your API key

Once you have an account, you’ll need an API key to authenticate your requests.

Log in to your Fish Audio Dashboard
Navigate to the API Keys section
Click “Create New Key”, give it a descriptive name, and set an expiration if desired
Copy your key and store it securely

Keep your API key secret! Never commit it to version control or share it publicly.
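One common way to keep the key out of source code is to read it from an environment variable at startup. The sketch below assumes a variable named `FISH_API_KEY`; that name and the `load_api_key` helper are illustrative choices, not part of the SDK.

```python
import os


def load_api_key(env_var: str = "FISH_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever
    name fits your deployment.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

The returned string can then be passed to `Session(...)` in place of a hard-coded key.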

Basic Usage

Generate speech from text:

from fish_audio_sdk import Session, TTSRequest

session = Session("your_api_key")

# Generate and save audio
with open("output.mp3", "wb") as f:
    for chunk in session.tts(TTSRequest(
        text="Hello, world!"
    )):
        f.write(chunk)
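The loop above works with any iterable of byte chunks, so the file-writing step can be factored into a small helper and exercised without calling the API. This is a minimal sketch; `save_chunks` is a hypothetical helper name, not an SDK function.

```python
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to `path` in order and return the total bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total
```

With the session from the example above, this could be called as `save_chunks(session.tts(TTSRequest(text="Hello, world!")), "output.mp3")`.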

Using Voice Models

Specify a voice model for consistent voice generation:

request = TTSRequest(
    text="This is my custom voice",
    reference_id="your_model_id"  # Your model ID from fish.audio
)

with open("custom_voice.mp3", "wb") as f:
    for chunk in session.tts(request):
        f.write(chunk)

Getting Model IDs

The reference_id is the model ID from the URL when viewing a model on Fish Audio:

Model URL: https://fish.audio/m/802e3bc2b27e49c2995d23ef70e6ac89
Reference ID: 802e3bc2b27e49c2995d23ef70e6ac89
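If model URLs are being copied from the browser, the ID can be pulled out with the standard library. A small sketch, assuming the URL follows the `https://fish.audio/m/<id>` shape shown above; `reference_id_from_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def reference_id_from_url(model_url: str) -> str:
    """Return the last path segment of a model URL such as https://fish.audio/m/<id>."""
    path = urlparse(model_url).path.rstrip("/")
    model_id = path.rsplit("/", 1)[-1]
    if not model_id:
        raise ValueError(f"No model ID found in URL: {model_url!r}")
    return model_id


print(reference_id_from_url("https://fish.audio/m/802e3bc2b27e49c2995d23ef70e6ac89"))
# prints 802e3bc2b27e49c2995d23ef70e6ac89
```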

You can also get model IDs programmatically:

# List your models
models = session.list_models(self_only=True)
for model in models.items:
    print(f"{model.title}: {model.id}")

# Get specific model details
model = session.get_model("your_model_id")
print(f"Model: {model.title}, ID: {model.id}")

Emotions

Add emotional expressions to your text:

text = """
(happy) I'm excited to share this!
(sad) Unfortunately, it didn't work out.
(whispering) This is a secret.
"""

request = TTSRequest(text=text, reference_id="model_id")

Common emotions: (happy), (sad), (angry), (excited), (calm), (surprised), (whispering), (shouting), (laughing), (sighing).

For more advanced control over speech generation, including phoneme-level control and additional paralanguage features, see Fine-grained Control.
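When the text is assembled from data rather than written by hand, the tags can be prefixed programmatically. The sketch below follows the `(emotion) sentence` pattern from the example above; `tag_lines` is a hypothetical helper, not part of the SDK.

```python
def tag_lines(lines):
    """Build TTS text from (emotion, sentence) pairs, one tagged sentence per line."""
    return "\n".join(f"({emotion}) {sentence}" for emotion, sentence in lines)


text = tag_lines([
    ("happy", "I'm excited to share this!"),
    ("sad", "Unfortunately, it didn't work out."),
])
print(text)
```

The resulting string can be passed as the `text` argument of `TTSRequest`.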

Audio Formats

Choose output format based on your needs:

# MP3 (default)
TTSRequest(text="...", format="mp3", mp3_bitrate=192)

# WAV - uncompressed
TTSRequest(text="...", format="wav", sample_rate=44100)

# Opus - efficient for streaming
TTSRequest(text="...", format="opus", opus_bitrate=48)

# PCM - raw audio data
TTSRequest(text="...", format="pcm", sample_rate=16000)
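Raw PCM has no header, so most players cannot open it directly. One option is to wrap the samples in a WAV container with Python's standard `wave` module. This sketch assumes 16-bit mono samples, which matches the playback settings used in the Streaming section; check the sample format your requests actually return before relying on it. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes.

    Defaults assume 16-bit (2-byte) mono audio; adjust if your data differs.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```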

Prosody Control

Adjust speech speed and volume:

from fish_audio_sdk import Prosody

request = TTSRequest(
    text="Adjusted speech",
    prosody=Prosody(
        speed=1.2,  # 20% faster (0.5-2.0)
        volume=5    # Louder (-20 to 20)
    )
)
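If speed and volume come from user input, it can help to clamp them to the ranges noted in the comments above (0.5-2.0 for speed, -20 to 20 for volume) before building the request. A minimal sketch; `clamp_prosody` is a hypothetical helper, and the ranges are taken from the example's comments rather than verified against the API.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def clamp_prosody(speed: float, volume: float) -> dict:
    """Return speed and volume limited to the ranges quoted in the example above."""
    return {
        "speed": clamp(speed, 0.5, 2.0),
        "volume": clamp(volume, -20, 20),
    }
```

The result can be unpacked into the constructor, for example `Prosody(**clamp_prosody(3.0, 5))`.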

Advanced Parameters

Fine-tune generation:

request = TTSRequest(
    text="Your text here",
    chunk_length=200,      # Characters per chunk (100-300)
    normalize=True,        # Normalize text
    latency="balanced",    # "normal" or "balanced"
    temperature=0.7,       # Randomness (0.0-1.0)
    top_p=0.7              # Token selection (0.0-1.0)
)

Streaming

Process audio in real-time:

import pyaudio

# Setup playback
p = pyaudio.PyAudio()
stream = p.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=44100,
    output=True
)

# Stream audio
for chunk in session.tts(TTSRequest(
    text="Streaming audio",
    format="pcm",
    sample_rate=44100
)):
    stream.write(chunk)

stream.close()
p.terminate()
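When buffering or scheduling playback, it is often useful to know how much audio a PCM chunk represents. The sketch below assumes 16-bit mono samples, as in the playback setup above; `pcm_duration_seconds` is a hypothetical helper name.

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int = 44100,
                         channels: int = 1, sample_width: int = 2) -> float:
    """Return the playback duration of num_bytes of raw PCM audio."""
    bytes_per_second = sample_rate * channels * sample_width
    return num_bytes / bytes_per_second
```

At 44100 Hz, 16-bit mono, one second of audio is 88200 bytes.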

Error Handling

Handle common errors:

from fish_audio_sdk.exceptions import HttpCodeErr
import time

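def generate_with_retry(request, max_retries=3):
    """Return the full audio as bytes, retrying when rate limited (HTTP 429)."""
    for attempt in range(max_retries):
        try:
            # Join the streamed chunks into a single bytes object
            return b"".join(session.tts(request))
        except HttpCodeErr as e:
            if e.status_code == 429:  # Rate limit: back off 1s, 2s, 4s, ...
                time.sleep(2 ** attempt)
            elif e.status_code == 401:
                raise Exception("Invalid API key") from e
            else:
                raise
    # Every attempt was rate limited; fail loudly instead of returning None
    raise RuntimeError(f"Rate limited after {max_retries} attempts")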
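The retry pattern above can be tried without network access by substituting a stand-in error type. In this sketch, `RateLimited`, `retry_with_backoff`, and the `sleep` parameter are all hypothetical names introduced for illustration; only the `2 ** attempt` delay comes from the example above.

```python
import time


class RateLimited(Exception):
    """Stand-in for a rate-limit error (hypothetical, not an SDK class)."""


def retry_with_backoff(func, max_retries=3, sleep=time.sleep):
    """Call func(), retrying on RateLimited with delays of 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the last error
            sleep(2 ** attempt)
```

Passing `sleep` as a parameter lets a test record the delays instead of actually waiting.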

Request Parameters

Parameter	Type	Description	Default
`text`	str	Text to convert	Required
`reference_id`	str	Voice model ID	None
`references`	list	Reference audio	[]
`format`	str	Audio format	"mp3"
`chunk_length`	int	Chunk size (100-300)	200
`normalize`	bool	Normalize text	True
`latency`	str	Speed vs quality	"balanced"
`prosody`	Prosody	Speed/volume	None
`temperature`	float	Randomness	0.7
`top_p`	float	Token selection	0.7

Next Steps

Fine-grained control for phoneme-level control and paralanguage
Voice cloning for custom voices
WebSocket streaming for real-time apps
Best practices for production use
API reference for direct API calls
