Documentation Index
Fetch the complete documentation index at: https://docs.fish.audio/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Transform any text into natural, expressive speech using Fish Audio’s advanced TTS models. Choose from pre-made voices or use your own cloned voices.
Discover the world’s best cloned voice models on our Discovery page.
Quick Start
Web Interface
The easiest way to generate speech:
1. Enter Your Text: Type or paste the text you want to convert
2. Choose a Voice: Select from available voices or use your own
3. Generate: Click “Generate” and download your audio
Using the SDK
Install the SDK
```shell
pip install fish-audio-sdk
```
Basic Usage
Generate speech with just a few lines of code:

```python
from fishaudio import FishAudio
from fishaudio.utils import save

# Initialize client
client = FishAudio(api_key="your_api_key_here")

# Generate speech
audio = client.tts.convert(
    text="Hello, world!",
    reference_id="your_voice_model_id"
)

save(audio, "output.mp3")
print("✓ Audio saved to output.mp3")
```
Basic Usage
Generate speech with just a few lines of code:

```typescript
import { FishAudioClient } from "fish-audio";
import { writeFile } from "fs/promises";

// Initialize session
const fishAudio = new FishAudioClient({ apiKey: "your_api_key_here" });

const audio = await fishAudio.textToSpeech.convert({
  text: "Hello, world!",
  reference_id: "your_voice_model_id",
});

const buffer = Buffer.from(await new Response(audio).arrayBuffer());
await writeFile("output.mp3", buffer);
console.log("✓ Audio saved to output.mp3");
```
Voice Options
Using Pre-made Voices
Browse and select voices from the playground:
```python
# Use a voice from the playground
audio = client.tts.convert(
    text="Welcome to Fish Audio!",
    reference_id="7f92f8afb8ec43bf81429cc1c9199cb1"
)
```

```typescript
// Use a voice from the playground
const audio = await fishAudio.textToSpeech.convert({
  text: "Welcome to Fish Audio!",
  reference_id: "7f92f8afb8ec43bf81429cc1c9199cb1",
});
```
Using Your Cloned Voice
Use voices you’ve created:
```python
# Use your own cloned voice
audio = client.tts.convert(
    text="This is my custom voice speaking",
    reference_id="your_model_id"
)
```

```typescript
// Use your own cloned voice
const audio = await fishAudio.textToSpeech.convert({
  text: "This is my custom voice speaking",
  reference_id: "your_model_id",
});
```
Using Reference Audio
Provide reference audio directly:
```python
from fishaudio.types import ReferenceAudio

# Use reference audio on-the-fly
with open("voice_sample.wav", "rb") as f:
    audio = client.tts.convert(
        text="Hello from reference audio",
        references=[
            ReferenceAudio(
                audio=f.read(),
                text="Sample text from the audio"
            )
        ]
    )
```
```typescript
import { readFile } from "fs/promises";

// Use reference audio on-the-fly
const fileBuffer = await readFile("voice_sample.wav");
const voiceFile = new File([fileBuffer], "voice_sample.wav");
const audio = await fishAudio.textToSpeech.convert({
  text: "Hello from reference audio",
  references: [
    { audio: voiceFile, text: "Sample text from the audio" }
  ],
});
```
Model Selection
Choose the right model for your needs:
| Model | Best For | Quality | Speed |
|---|---|---|---|
| s1 | Prototyping | Excellent | Fast |
| s2-pro | Latest features | Excellent | Fastest |
Specify a model in your request:
```python
# Using the latest model (default)
audio = client.tts.convert(text="Hello world")
```

```typescript
// Using the latest S2-Pro model
const audio = await fishAudio.textToSpeech.convert(
  { text: "Hello world" },
  "s2-pro"
);
```
Advanced Options
Audio Format
Choose your output format:
```python
audio = client.tts.convert(
    text="Your text here",
    format="mp3",    # Options: "mp3", "wav", "pcm", "opus"
    mp3_bitrate=128  # For MP3: 64, 128, or 192
)
```

```typescript
const audio = await fishAudio.textToSpeech.convert({
  text: "Your text here",
  format: "mp3", // Options: "mp3", "wav", "pcm", "opus"
  mp3_bitrate: 128, // For MP3: 64, 128, or 192
});
```
Chunk Length
Control text processing chunks:
```python
audio = client.tts.convert(
    text="Long text content...",
    chunk_length=200  # 100-300 characters per chunk
)
```

```typescript
const audio = await fishAudio.textToSpeech.convert({
  text: "Long text content...",
  chunk_length: 200, // 100-300 characters per chunk
});
```
Latency Mode
Optimize for speed or quality:
```python
audio = client.tts.convert(
    text="Quick response needed",
    latency="balanced"  # "normal" or "balanced"
)
```

```typescript
const audio = await fishAudio.textToSpeech.convert({
  text: "Quick response needed",
  latency: "balanced", // "normal" or "balanced"
});
```
Balanced mode reduces latency to ~300ms but may slightly decrease stability.
Direct API Usage
For direct API calls without the SDK:
```python
import httpx
import ormsgpack

# Prepare request
request_data = {
    "text": "Hello, world!",
    "reference_id": "your_model_id",
    "format": "mp3"
}

# Make API call
with httpx.Client() as client:
    response = client.post(
        "https://api.fish.audio/v1/tts",
        content=ormsgpack.packb(request_data),
        headers={
            "authorization": "Bearer YOUR_API_KEY",
            "content-type": "application/msgpack",
            "model": "s2-pro"
        }
    )

# Save audio
with open("output.mp3", "wb") as f:
    f.write(response.content)
```
```typescript
import { encode } from "@msgpack/msgpack";
import { writeFile } from "fs/promises";

const body = encode({
  text: "Hello, world!",
  reference_id: "your_model_id",
  format: "mp3",
});

const res = await fetch("https://api.fish.audio/v1/tts", {
  method: "POST",
  headers: {
    Authorization: "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/msgpack",
    model: "s2-pro",
  },
  body,
});

const buffer = Buffer.from(await res.arrayBuffer());
await writeFile("output.mp3", buffer);
```
Streaming Audio
Stream audio for real-time applications:
```python
# Stream audio chunks
audio_stream = client.tts.stream(
    text="Streaming this text in real-time",
    reference_id="model_id"
)

with open("stream_output.mp3", "wb") as f:
    for chunk in audio_stream:
        f.write(chunk)
        # Process chunk immediately for real-time playback
```
```typescript
// Use a WebSocket to stream real-time audio
import { FishAudioClient, RealtimeEvents } from "fish-audio";
import { writeFile } from "fs/promises";
import path from "path";

// Simple async generator that yields text chunks
async function* makeTextStream() {
  const chunks = [
    "Hello from Fish Audio! ",
    "This is a realtime text-to-speech test. ",
    "We are streaming multiple chunks over WebSocket.",
  ];
  for (const chunk of chunks) {
    yield chunk;
  }
}

const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

// For realtime, set text to "" and stream the content via makeTextStream
const request = { text: "" };
const connection = await fishAudio.textToSpeech.convertRealtime(request, makeTextStream());

// Collect audio and write to a file when the stream ends
const chunks: Buffer[] = [];
connection.on(RealtimeEvents.OPEN, () => console.log("WebSocket opened"));
connection.on(RealtimeEvents.AUDIO_CHUNK, (audio: unknown): void => {
  if (audio instanceof Uint8Array || Buffer.isBuffer(audio)) {
    chunks.push(Buffer.from(audio));
  }
});
connection.on(RealtimeEvents.ERROR, (err) => console.error("WebSocket error:", err));
connection.on(RealtimeEvents.CLOSE, async () => {
  const outPath = path.resolve(process.cwd(), "out.mp3");
  await writeFile(outPath, Buffer.concat(chunks));
  console.log("Saved to", outPath);
});
```
Streaming with Timestamps
Use the Text to Speech Stream with Timestamps API when you need generated audio and alignment data in the same stream. This endpoint returns Server-Sent Events where each event includes an audio_base64 chunk and, when available, the latest cumulative alignment snapshot for a chunk_seq. Clients should concatenate audio chunks in arrival order and replace stored alignment snapshots by chunk_seq.
Timestamped streaming is best for karaoke-style highlighting, synchronized captions, phrase progress indicators, and timeline editing. For this endpoint, prefer Opus over MP3 when possible, because Opus provides cleaner streaming boundaries for alignment.
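The bookkeeping described above (concatenate audio chunks in arrival order, replace alignment snapshots by `chunk_seq`) can be sketched as a small fold over incoming events. This is an illustrative sketch, not part of the SDK: `apply_event` is a hypothetical helper, and the event field names (`audio_base64`, `chunk_seq`, `alignment`) follow the description above.

```python
import base64


def apply_event(state, event):
    """Fold one timestamped SSE event into client-side state.

    Audio chunks are appended in arrival order. Alignment snapshots
    are cumulative, so a newer snapshot for the same chunk_seq
    replaces the stored one rather than being appended.
    """
    state["audio"] += base64.b64decode(event["audio_base64"])
    if "alignment" in event:
        state["alignments"][event["chunk_seq"]] = event["alignment"]
    return state


# Simulated events: the second event carries a fuller cumulative
# snapshot for chunk 0, which replaces the first one.
state = {"audio": b"", "alignments": {}}
events = [
    {"chunk_seq": 0,
     "audio_base64": base64.b64encode(b"\x01\x02").decode(),
     "alignment": {"words": ["Hello"]}},
    {"chunk_seq": 0,
     "audio_base64": base64.b64encode(b"\x03").decode(),
     "alignment": {"words": ["Hello", "world"]}},
]
for event in events:
    apply_event(state, event)
```

A real client would apply the same fold to each parsed Server-Sent Event and play or buffer `state["audio"]` as it grows.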
Adding Emotions
The (parenthesis) syntax below applies to the S1 model. S2 uses [bracket] syntax with natural language descriptions and is not limited to a fixed set of tags. See the Models Overview for details.
Make your speech more expressive:
```python
# Add emotion markers to your text
emotional_text = """
(excited) I just won the lottery!
(sad) But then I lost the ticket.
(laughing) Just kidding, I found it!
"""

audio = client.tts.convert(
    text=emotional_text,
    reference_id="model_id"
)
```

```typescript
// Add emotion markers to your text
const emotionalText = `(excited) I just won the lottery!
(sad) But then I lost the ticket.
(laughing) Just kidding, I found it!`;

const audio = await fishAudio.textToSpeech.convert({
  text: emotionalText,
  reference_id: "model_id",
});
```
Available emotions:
- Basic: (happy), (sad), (angry), (excited), (calm)
- Tones: (shouting), (whispering), (soft tone)
- Effects: (laughing), (sighing), (crying)
For more precise control over pronunciation and additional paralanguage features like pauses and breathing, see Fine-grained Control.
Best Practices
Text Preparation
Do:
- Use proper punctuation for natural pauses
- Add emotion markers for expression
- Break long texts into paragraphs
- Use consistent formatting
Don’t:
- Use ALL CAPS (unless shouting)
- Mix multiple languages randomly
- Include special characters unnecessarily
- Forget punctuation
Performance
- Batch Processing: Process multiple texts efficiently
- Cache Models: Store frequently used model IDs
- Optimize Chunk Size: Use 200 characters for the best balance
- Handle Errors: Implement retry logic for network issues
Quality Optimization
For best results:
- Use high-quality reference audio for cloning
- Choose appropriate emotion markers
- Test different latency modes
- Monitor API rate limits
Troubleshooting
Common Issues
No audio output:
- Check API key validity
- Verify model ID exists
- Ensure proper audio format
Poor quality:
- Use better reference audio
- Try normal latency mode
- Check text formatting
Slow generation:
- Use balanced latency mode
- Reduce chunk length
- Check network connection
Code Examples
Batch Processing
```python
from fishaudio.utils import save

texts = [
    "First announcement",
    "Second announcement",
    "Third announcement"
]

for i, text in enumerate(texts):
    audio = client.tts.convert(
        text=text,
        reference_id="model_id"
    )
    save(audio, f"output_{i}.mp3")
```
```typescript
const texts = [
  "First announcement",
  "Second announcement",
  "Third announcement",
];

for (let i = 0; i < texts.length; i++) {
  const audio = await fishAudio.textToSpeech.convert({
    text: texts[i],
    reference_id: "model_id",
  });
  const buffer = Buffer.from(await new Response(audio).arrayBuffer());
  await writeFile(`output_${i}.mp3`, buffer);
}
```
Error Handling
```python
import time
from fishaudio.exceptions import FishAudioError

def generate_with_retry(text, max_retries=3):
    for attempt in range(max_retries):
        try:
            audio = client.tts.convert(
                text=text,
                reference_id="model_id"
            )
            return audio
        except FishAudioError as e:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise e
```
```typescript
async function generateWithRetry(text, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const audio = await fishAudio.textToSpeech.convert({
        text,
        reference_id: "model_id",
      });
      const buffer = Buffer.from(await new Response(audio).arrayBuffer());
      return buffer;
    } catch (err) {
      if (attempt < maxRetries - 1) {
        const delayMs = 2 ** attempt * 1000; // Exponential backoff
        await new Promise((r) => setTimeout(r, delayMs));
      } else {
        throw err;
      }
    }
  }
}

const buffer = await generateWithRetry("Hello with retry");
await writeFile("retry_output.mp3", buffer);
```
API Reference
Request Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| text | string | Text to convert | Required |
| reference_id | string | Model/voice ID | None |
| format | string | Audio format | "mp3" |
| chunk_length | integer | Characters per chunk | 200 |
| normalize | boolean | Normalize text | true |
| latency | string | Speed vs. quality | "normal" |
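As a quick way to see the defaults from the table in one place, here is a small sketch that assembles a request payload. This helper (`build_tts_request`) is hypothetical and not part of the SDK; the parameter names and default values are taken from the table above.

```python
def build_tts_request(text, **overrides):
    """Build a TTS request payload pre-filled with the documented defaults."""
    if not text:
        raise ValueError("text is required")
    payload = {
        "text": text,
        "reference_id": None,   # no voice model selected by default
        "format": "mp3",
        "chunk_length": 200,
        "normalize": True,
        "latency": "normal",
    }
    payload.update(overrides)   # caller-supplied values win
    return payload


# Override only what differs from the defaults
request = build_tts_request("Hello, world!", latency="balanced")
```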
Response
Returns audio data in the specified format as a binary stream.
Get Support
Need help with text-to-speech?