Overview
Transform any text into natural, expressive speech using Fish Audio’s advanced TTS models. Choose from pre-made voices or use your own cloned voices.
Discover the world’s best cloned voice models on our Discovery page.
Quick Start
Web Interface
The easiest way to generate speech:
1. Enter Your Text: Type or paste the text you want to convert.
2. Choose a Voice: Select from available voices or use your own.
3. Generate: Click “Generate” and download your audio.
Using the SDK
Install the SDK
pip install fish-audio-sdk
Basic Usage
Generate speech with just a few lines of code:
from fishaudio import FishAudio
from fishaudio.utils import save
# Initialize client
client = FishAudio(api_key="your_api_key_here")
# Generate speech
audio = client.tts.convert(
text="Hello, world!",
reference_id="your_voice_model_id"
)
save(audio, "output.mp3")
print("✓ Audio saved to output.mp3")
The same request with the JavaScript SDK:
import { FishAudioClient } from "fish-audio";
import { writeFile } from "fs/promises";
// Initialize session
const fishAudio = new FishAudioClient({ apiKey: "your_api_key_here" });
const audio = await fishAudio.textToSpeech.convert({
text: "Hello, world!",
reference_id: "your_voice_model_id",
});
const buffer = Buffer.from(await new Response(audio).arrayBuffer());
await writeFile("output.mp3", buffer);
console.log("✓ Audio saved to output.mp3");
Voice Options
Using Pre-made Voices
Browse and select voices from the playground:
# Use a voice from the playground
audio = client.tts.convert(
text="Welcome to Fish Audio!",
reference_id="7f92f8afb8ec43bf81429cc1c9199cb1"
)
// Use a voice from the playground
const audio = await fishAudio.textToSpeech.convert({
  text: "Welcome to Fish Audio!",
  reference_id: "7f92f8afb8ec43bf81429cc1c9199cb1",
});
Using Your Cloned Voice
Use voices you’ve created:
# Use your own cloned voice
audio = client.tts.convert(
text="This is my custom voice speaking",
reference_id="your_model_id"
)
// Use your own cloned voice
const audio = await fishAudio.textToSpeech.convert({
  text: "This is my custom voice speaking",
  reference_id: "your_model_id",
});
Using Reference Audio
Provide reference audio directly:
from fishaudio.types import ReferenceAudio
# Use reference audio on-the-fly
with open("voice_sample.wav", "rb") as f:
    audio = client.tts.convert(
        text="Hello from reference audio",
        references=[
            ReferenceAudio(
                audio=f.read(),
                text="Sample text from the audio"
            )
        ]
    )
// Use reference audio on-the-fly
import { readFile } from "fs/promises";
const fileBuffer = await readFile("voice_sample.wav");
const voiceFile = new File([fileBuffer], "voice_sample.wav");
const audio = await fishAudio.textToSpeech.convert({
  text: "Hello from reference audio",
  references: [
    { audio: voiceFile, text: "Sample text from the audio" }
  ]
});
Model Selection
Choose the right model for your needs:
| Model | Best For | Quality | Speed |
|---|---|---|---|
| s1 | Latest features | Excellent | Fast |
| speech-1.6 | Stable production | Very Good | Fast |
| speech-1.5 | Legacy support | Good | Fastest |
Specify a model in your request:
# Using the latest model (default)
audio = client.tts.convert(text="Hello world")
// Using the latest S1 model
const audio = await fishAudio.textToSpeech.convert(
{ text: "Hello world" },
"s1"
);
Advanced Options
Choose your output format:
audio = client.tts.convert(
text="Your text here",
format="mp3", # Options: "mp3", "wav", "pcm", "opus"
mp3_bitrate=128 # For MP3: 64, 128, or 192
)
const audio = await fishAudio.textToSpeech.convert({
text: "Your text here",
format: "mp3", // Options: "mp3", "wav", "pcm", "opus"
mp3_bitrate: 128, // For MP3: 64, 128, or 192
});
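The format options above can be validated before a request is sent. The following is a minimal sketch, assuming the allowed values are exactly those listed in the comments of the examples above; `build_format_options` is a hypothetical helper and is not part of the Fish Audio SDK.

```python
# Allowed values copied from the comments in the examples above.
SUPPORTED_FORMATS = {"mp3", "wav", "pcm", "opus"}
MP3_BITRATES = {64, 128, 192}


def build_format_options(fmt="mp3", mp3_bitrate=None):
    """Return keyword arguments for the output format, or raise ValueError."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    options = {"format": fmt}
    if mp3_bitrate is not None:
        if fmt != "mp3":
            raise ValueError("mp3_bitrate only applies when format is 'mp3'")
        if mp3_bitrate not in MP3_BITRATES:
            raise ValueError(f"unsupported MP3 bitrate: {mp3_bitrate}")
        options["mp3_bitrate"] = mp3_bitrate
    return options


options = build_format_options("mp3", mp3_bitrate=128)
print(options)  # {'format': 'mp3', 'mp3_bitrate': 128}
```

The returned dictionary could then be unpacked into the call, for example `client.tts.convert(text="Your text here", **options)`.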
Chunk Length
Control text processing chunks:
audio = client.tts.convert(
text="Long text content...",
chunk_length=200 # 100-300 characters per chunk
)
const audio = await fishAudio.textToSpeech.convert({
text: "Long text content...",
chunk_length: 200, // 100-300 characters per chunk
});
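To get a feel for what a 200-character chunk looks like, the sketch below splits text on sentence boundaries and packs the sentences into chunks on the client side. This is an illustration only: how the service itself chunks text is not described here, and `split_into_chunks` is a hypothetical helper rather than an SDK function.

```python
import re


def split_into_chunks(text, chunk_length=200):
    """Greedily pack whole sentences into chunks of at most chunk_length characters.

    A single sentence longer than chunk_length is kept whole rather than cut mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_length or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One. Two. Three.", chunk_length=9))
# ['One. Two.', 'Three.']
```

A splitter like this can also be useful for dividing very long documents into separate requests.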
Latency Mode
Optimize for speed or quality:
audio = client.tts.convert(
text="Quick response needed",
latency="balanced" # "normal" or "balanced"
)
const audio = await fishAudio.textToSpeech.convert({
text: "Quick response needed",
latency: "balanced", // "normal" or "balanced"
});
Balanced mode reduces latency to ~300ms but may slightly decrease stability.
Direct API Usage
For direct API calls without the SDK:
import httpx
import ormsgpack
# Prepare request
request_data = {
"text": "Hello, world!",
"reference_id": "your_model_id",
"format": "mp3"
}
# Make API call
with httpx.Client() as client:
    response = client.post(
        "https://api.fish.audio/v1/tts",
        content=ormsgpack.packb(request_data),
        headers={
            "authorization": "Bearer YOUR_API_KEY",
            "content-type": "application/msgpack",
            "model": "s1"
        }
    )
    # Fail early instead of saving an error body as audio
    response.raise_for_status()
# Save audio
with open("output.mp3", "wb") as f:
    f.write(response.content)
import { encode } from "@msgpack/msgpack";
import { writeFile } from "fs/promises";
const body = encode({
text: "Hello, world!",
reference_id: "your_model_id",
format: "mp3",
});
const res = await fetch("https://api.fish.audio/v1/tts", {
  method: "POST",
  headers: {
    Authorization: "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/msgpack",
    model: "s1",
  },
  body,
});
// Fail early instead of saving an error body as audio
if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
const buffer = Buffer.from(await res.arrayBuffer());
await writeFile("output.mp3", buffer);
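It can help to assemble the URL, headers, and payload in one place so they can be inspected or tested without making a network call. The sketch below mirrors the header names and endpoint used in the examples above; `build_tts_request` is a hypothetical helper, and the `FISH_API_KEY` environment variable name is borrowed from the JavaScript streaming example on this page.

```python
import os

API_URL = "https://api.fish.audio/v1/tts"  # endpoint used in the examples above


def build_tts_request(text, api_key, reference_id=None, fmt="mp3", model="s1"):
    """Return (url, headers, payload) for a TTS call without sending it."""
    if not text:
        raise ValueError("text is required")
    headers = {
        "authorization": f"Bearer {api_key}",
        "content-type": "application/msgpack",
        "model": model,
    }
    payload = {"text": text, "format": fmt}
    if reference_id is not None:
        payload["reference_id"] = reference_id
    return API_URL, headers, payload


api_key = os.environ.get("FISH_API_KEY", "YOUR_API_KEY")
url, headers, payload = build_tts_request("Hello, world!", api_key, reference_id="your_model_id")
print(url, payload)
```

The payload would still need to be MessagePack-encoded (for example with `ormsgpack.packb`) before being posted, as in the request shown above.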
Streaming Audio
Stream audio for real-time applications:
# Stream audio chunks
audio_stream = client.tts.stream(
text="Streaming this text in real-time",
reference_id="model_id"
)
with open("stream_output.mp3", "wb") as f:
    for chunk in audio_stream:
        f.write(chunk)
        # Process chunk immediately for real-time playback
// Use a WebSocket to stream real-time audio
import { FishAudioClient, RealtimeEvents } from "fish-audio";
import { writeFile } from "fs/promises";
import path from "path";
// Simple async generator that yields text chunks
async function* makeTextStream() {
  const chunks = [
    "Hello from Fish Audio! ",
    "This is a realtime text-to-speech test. ",
    "We are streaming multiple chunks over WebSocket.",
  ];
  for (const chunk of chunks) {
    yield chunk;
  }
}
const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });
// For realtime, set text to "" and stream the content via makeTextStream
const request = { text: "" };
const connection = await fishAudio.textToSpeech.convertRealtime(request, makeTextStream());
// Collect audio and write to a file when the stream ends
const chunks: Buffer[] = [];
connection.on(RealtimeEvents.OPEN, () => console.log("WebSocket opened"));
connection.on(RealtimeEvents.AUDIO_CHUNK, (audio: unknown): void => {
  if (audio instanceof Uint8Array || Buffer.isBuffer(audio)) {
    chunks.push(Buffer.from(audio));
  }
});
connection.on(RealtimeEvents.ERROR, (err) => console.error("WebSocket error:", err));
connection.on(RealtimeEvents.CLOSE, async () => {
  const outPath = path.resolve(process.cwd(), "out.mp3");
  await writeFile(outPath, Buffer.concat(chunks));
  console.log("Saved to", outPath);
});
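The streaming loop can be factored into a small function that writes each chunk as it arrives and optionally hands it to a playback callback. This is a sketch under the assumption that the stream yields `bytes` objects, as the Python example above implies; `write_stream` and `on_chunk` are hypothetical names, and a plain list stands in for the real stream.

```python
import io


def write_stream(chunks, sink, on_chunk=None):
    """Write each audio chunk to sink as it arrives and return total bytes written."""
    total = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty chunks
        sink.write(chunk)
        total += len(chunk)
        if on_chunk is not None:
            on_chunk(chunk)  # e.g. hand the chunk to an audio player
    return total


# Stand-in for client.tts.stream(...): any iterable of bytes works.
fake_stream = [b"ID3", b"", b"\x00\x01\x02"]
buffer = io.BytesIO()
written = write_stream(fake_stream, buffer)
print(written)  # 6
```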
Adding Emotions
Make your speech more expressive:
# Add emotion markers to your text
emotional_text = """
(excited) I just won the lottery!
(sad) But then I lost the ticket.
(laughing) Just kidding, I found it!
"""
audio = client.tts.convert(
text=emotional_text,
reference_id="model_id"
)
// Add emotion markers to your text
const emotionalText = `(excited) I just won the lottery!
(sad) But then I lost the ticket.
(laughing) Just kidding, I found it!`;
const audio = await fishAudio.textToSpeech.convert({
text: emotionalText,
reference_id: "model_id",
});
Available emotions:
- Basic: (happy), (sad), (angry), (excited), (calm)
- Tones: (shouting), (whispering), (soft tone)
- Effects: (laughing), (sighing), (crying)
For more precise control over pronunciation and additional paralanguage features like pauses and breathing, see Fine-grained Control.
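A typo in a marker is easy to miss, so it can be worth scanning text for parenthesized markers that are not in the list above. The sketch below is a hypothetical helper, not an SDK feature, and it only knows the markers listed on this page; other markers may be valid, and ordinary parenthetical remarks in the text will be reported as well.

```python
import re

# Markers listed under "Available emotions" above.
KNOWN_MARKERS = {
    "happy", "sad", "angry", "excited", "calm",
    "shouting", "whispering", "soft tone",
    "laughing", "sighing", "crying",
}

MARKER_PATTERN = re.compile(r"\(([^()]+)\)")


def find_unlisted_markers(text):
    """Return parenthesized markers in text that are not in KNOWN_MARKERS, in order."""
    found = (m.strip().lower() for m in MARKER_PATTERN.findall(text))
    return [m for m in found if m not in KNOWN_MARKERS]


print(find_unlisted_markers("(excited) We won! (elated) Truly. (soft tone) Goodnight."))
# ['elated']
```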
Best Practices
Text Preparation
Do:
- Use proper punctuation for natural pauses
- Add emotion markers for expression
- Break long texts into paragraphs
- Use consistent formatting
Don’t:
- Use ALL CAPS (unless shouting)
- Mix multiple languages randomly
- Include special characters unnecessarily
- Forget punctuation
Performance Tips
- Batch Processing: Process multiple texts efficiently
- Cache Models: Store frequently used model IDs
- Optimize Chunk Size: Use 200 characters for best balance
- Handle Errors: Implement retry logic for network issues
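Before spending a request on a piece of text, the Text Preparation guidelines above can be checked with a quick pre-flight function. This is a rough sketch: `lint_text` is a hypothetical helper, and the four-letter threshold for all-caps words is an arbitrary choice made here so that short acronyms such as TTS are not flagged.

```python
import re


def lint_text(text):
    """Return a list of warnings based on the text-preparation guidelines."""
    stripped = text.strip()
    if not stripped:
        return ["text is empty"]
    warnings = []
    # Words of four or more letters written entirely in capitals.
    caps = re.findall(r"\b[A-Z]{4,}\b", stripped)
    if caps:
        warnings.append(f"all-caps words: {', '.join(caps)}")
    # The guidelines ask for punctuation, so check the final character.
    if stripped[-1] not in ".!?\"')":
        warnings.append("text does not end with punctuation")
    return warnings


print(lint_text("STOP right there"))
# ['all-caps words: STOP', 'text does not end with punctuation']
```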
Quality Optimization
For best results:
- Use high-quality reference audio for cloning
- Choose appropriate emotion markers
- Test different latency modes
- Monitor API rate limits
Troubleshooting
Common Issues
No audio output:
- Check API key validity
- Verify model ID exists
- Ensure proper audio format
Poor quality:
- Use better reference audio
- Try normal latency mode
- Check text formatting
Slow generation:
- Use balanced latency mode
- Reduce chunk length
- Check network connection
Code Examples
Batch Processing
from fishaudio.utils import save
texts = [
"First announcement",
"Second announcement",
"Third announcement"
]
for i, text in enumerate(texts):
    audio = client.tts.convert(
        text=text,
        reference_id="model_id"
    )
    save(audio, f"output_{i}.mp3")
const texts = [
"First announcement",
"Second announcement",
"Third announcement",
];
for (let i = 0; i < texts.length; i++) {
  const audio = await fishAudio.textToSpeech.convert({
    text: texts[i],
    reference_id: "model_id",
  });
  const buffer = Buffer.from(await new Response(audio).arrayBuffer());
  await writeFile(`output_${i}.mp3`, buffer);
}
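The loops above process one text at a time. If throughput matters, the requests could be issued concurrently; the following is a sketch using a thread pool with a stand-in function in place of the real call. Whether the SDK client is safe to share across threads, and what rate limits apply, are not stated here, so treat `max_workers` as something to tune conservatively. `synthesize_batch` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize_batch(texts, synthesize, max_workers=4):
    """Run synthesize(text) for each text concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(synthesize, texts))


# Stand-in for a real call such as:
#   lambda t: client.tts.convert(text=t, reference_id="model_id")
def fake_synthesize(text):
    return text.encode("utf-8")


results = synthesize_batch(["First", "Second", "Third"], fake_synthesize)
print([len(r) for r in results])  # [5, 6, 5]
```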
Error Handling
import time
from fishaudio.exceptions import FishAudioError
def generate_with_retry(text, max_retries=3):
    for attempt in range(max_retries):
        try:
            audio = client.tts.convert(
                text=text,
                reference_id="model_id"
            )
            return audio
        except FishAudioError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise  # Out of retries: re-raise with the original traceback
async function generateWithRetry(text, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const audio = await fishAudio.textToSpeech.convert({
        text,
        reference_id: "model_id",
      });
      const buffer = Buffer.from(await new Response(audio).arrayBuffer());
      return buffer;
    } catch (err) {
      if (attempt < maxRetries - 1) {
        const delayMs = 2 ** attempt * 1000;
        await new Promise((r) => setTimeout(r, delayMs));
      } else {
        throw err;
      }
    }
  }
}
const buffer = await generateWithRetry("Hello with retry");
await writeFile("retry_output.mp3", buffer);
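To see how long the retry loops above can wait in total, the backoff schedule can be computed on its own. The sketch below reproduces the `2 ** attempt` delays from the examples and adds an upper bound; the `cap` parameter is an addition made here for illustration and does not appear in the examples above.

```python
def backoff_delays(max_retries=3, base=1.0, cap=30.0):
    """Delays in seconds slept between attempts: base * 2**attempt, capped at cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries - 1)]


print(backoff_delays(3))  # [1.0, 2.0]
print(backoff_delays(7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

With the default of three attempts, the worst case adds three seconds of waiting on top of the request time.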
API Reference
Request Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| text | string | Text to convert | Required |
| reference_id | string | Model/voice ID | None |
| format | string | Audio format | "mp3" |
| chunk_length | integer | Characters per chunk | 200 |
| normalize | boolean | Normalize text | true |
| latency | string | Speed vs quality | "normal" |
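The defaults in the table can be captured in a small data class so that only the values that differ need to be spelled out. This is a sketch, not the SDK's own request type: `TTSRequest` and `to_payload` are hypothetical names, and the defaults are copied from the table above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TTSRequest:
    text: str
    reference_id: Optional[str] = None
    format: str = "mp3"
    chunk_length: int = 200
    normalize: bool = True
    latency: str = "normal"

    def to_payload(self):
        """Return a dict of parameters, omitting any that are None."""
        if not self.text:
            raise ValueError("text is required")
        return {k: v for k, v in asdict(self).items() if v is not None}


print(TTSRequest(text="Hello, world!").to_payload())
# {'text': 'Hello, world!', 'format': 'mp3', 'chunk_length': 200, 'normalize': True, 'latency': 'normal'}
```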
Response
Returns the audio as a binary stream in the requested format.