
Overview

Transform any text into natural, expressive speech using Fish Audio’s advanced TTS models. Choose from pre-made voices or use your own cloned voices. Discover the world’s best cloned voice models on our Discovery page.

Quick Start

Web Interface

The easiest way to generate speech:
  1. Visit Playground: Go to fish.audio and log in
  2. Enter Your Text: Type or paste the text you want to convert
  3. Choose a Voice: Select from available voices or use your own
  4. Generate: Click “Generate” and download your audio

Using the SDK

  1. Install the SDK:

pip install fish-audio-sdk

  2. Basic Usage: Generate speech with just a few lines of code:
from fishaudio import FishAudio
from fishaudio.utils import save

# Initialize client
client = FishAudio(api_key="your_api_key_here")

# Generate speech
audio = client.tts.convert(
    text="Hello, world!",
    reference_id="your_voice_model_id"
)
save(audio, "output.mp3")

print("✓ Audio saved to output.mp3")

Voice Options

Using Pre-made Voices

Browse and select voices from the playground:
# Use a voice from the playground
audio = client.tts.convert(
    text="Welcome to Fish Audio!",
    reference_id="7f92f8afb8ec43bf81429cc1c9199cb1"
)

Using Your Cloned Voice

Use voices you’ve created:
# Use your own cloned voice
audio = client.tts.convert(
    text="This is my custom voice speaking",
    reference_id="your_model_id"
)

Using Reference Audio

Provide reference audio directly:
from fishaudio.types import ReferenceAudio

# Use reference audio on-the-fly
with open("voice_sample.wav", "rb") as f:
    audio = client.tts.convert(
        text="Hello from reference audio",
        references=[
            ReferenceAudio(
                audio=f.read(),
                text="Sample text from the audio"
            )
        ]
    )

Model Selection

Choose the right model for your needs:
Model     Best For          Quality     Speed
s1        Prototyping       Excellent   Fast
s2-pro    Latest features   Excellent   Fastest
The latest model is used by default; to pin a specific model in a raw request, set the model header (see Direct API Usage):
# Using the latest model (default)
audio = client.tts.convert(text="Hello world")

Advanced Options

Audio Formats

Choose your output format:
audio = client.tts.convert(
    text="Your text here",
    format="mp3",  # Options: "mp3", "wav", "pcm", "opus"
    mp3_bitrate=128  # For MP3: 64, 128, or 192
)

Chunk Length

Control text processing chunks:
audio = client.tts.convert(
    text="Long text content...",
    chunk_length=200  # 100-300 characters per chunk
)

Latency Mode

Optimize for speed or quality:
audio = client.tts.convert(
    text="Quick response needed",
    latency="balanced"  # "normal" or "balanced"
)
Balanced mode reduces latency to ~300ms but may slightly decrease stability.

Direct API Usage

For direct API calls without the SDK:
import httpx
import ormsgpack

# Prepare request
request_data = {
    "text": "Hello, world!",
    "reference_id": "your_model_id",
    "format": "mp3"
}

# Make API call
with httpx.Client() as client:
    response = client.post(
        "https://api.fish.audio/v1/tts",
        content=ormsgpack.packb(request_data),
        headers={
            "authorization": "Bearer YOUR_API_KEY",
            "content-type": "application/msgpack",
            "model": "s2-pro"
        }
    )
    response.raise_for_status()  # Fail fast on HTTP errors

    # Save audio
    with open("output.mp3", "wb") as f:
        f.write(response.content)

Streaming Audio

Stream audio for real-time applications:
# Stream audio chunks
audio_stream = client.tts.stream(
    text="Streaming this text in real-time",
    reference_id="model_id"
)

with open("stream_output.mp3", "wb") as f:
    for chunk in audio_stream:
        f.write(chunk)
        # Process chunk immediately for real-time playback

Streaming with Timestamps

Use the Text to Speech Stream with Timestamps API when you need generated audio and alignment data in the same stream. This endpoint returns Server-Sent Events where each event includes an audio_base64 chunk and, when available, the latest cumulative alignment snapshot for a chunk_seq. Clients should concatenate audio chunks in arrival order and replace stored alignment snapshots by chunk_seq.
Timestamped streaming is best for karaoke-style highlighting, synchronized captions, phrase progress indicators, and timeline editing. For this endpoint, prefer opus over mp3 when possible because Opus provides cleaner streaming boundaries for alignment.
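A minimal sketch of the client-side bookkeeping described above, assuming each event’s data payload is JSON carrying the audio_base64, chunk_seq, and alignment fields (confirm the exact event schema against the API reference):

```python
import base64
import json

def merge_stream_events(events):
    """Concatenate audio chunks in arrival order and keep only the
    latest cumulative alignment snapshot per chunk_seq."""
    audio = bytearray()
    alignments = {}  # chunk_seq -> latest alignment snapshot
    for raw in events:
        event = json.loads(raw)
        audio.extend(base64.b64decode(event["audio_base64"]))
        if "alignment" in event:
            # Later snapshots for the same chunk_seq replace earlier ones
            alignments[event["chunk_seq"]] = event["alignment"]
    return bytes(audio), alignments
```

The audio bytes can be written to a file or fed to a player as they accumulate, while the alignment dict drives caption or highlight updates.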

Adding Emotions

The (parenthesis) syntax below applies to the S1 model. S2 uses [bracket] syntax with natural language descriptions and is not limited to a fixed set of tags. See the Models Overview for details.
Make your speech more expressive:
# Add emotion markers to your text
emotional_text = """
(excited) I just won the lottery!
(sad) But then I lost the ticket.
(laughing) Just kidding, I found it!
"""

audio = client.tts.convert(
    text=emotional_text,
    reference_id="model_id"
)
Available emotions:
  • Basic: (happy), (sad), (angry), (excited), (calm)
  • Tones: (shouting), (whispering), (soft tone)
  • Effects: (laughing), (sighing), (crying)
For more precise control over pronunciation and additional paralanguage features like pauses and breathing, see Fine-grained Control.

Best Practices

Text Preparation

Do:
  • Use proper punctuation for natural pauses
  • Add emotion markers for expression
  • Break long texts into paragraphs
  • Use consistent formatting
Don’t:
  • Use ALL CAPS (unless shouting)
  • Mix multiple languages randomly
  • Include special characters unnecessarily
  • Forget punctuation

Performance Tips

  1. Batch Processing: Process multiple texts efficiently
  2. Cache Models: Store frequently used model IDs
  3. Optimize Chunk Size: Use 200 characters for best balance
  4. Handle Errors: Implement retry logic for network issues

Quality Optimization

For best results:
  • Use high-quality reference audio for cloning
  • Choose appropriate emotion markers
  • Test different latency modes
  • Monitor API rate limits

Troubleshooting

Common Issues

No audio output:
  • Check API key validity
  • Verify model ID exists
  • Ensure proper audio format
Poor quality:
  • Use better reference audio
  • Try normal latency mode
  • Check text formatting
Slow generation:
  • Use balanced latency mode
  • Reduce chunk length
  • Check network connection

Code Examples

Batch Processing

from fishaudio import FishAudio
from fishaudio.utils import save

client = FishAudio(api_key="your_api_key_here")

texts = [
    "First announcement",
    "Second announcement",
    "Third announcement"
]

for i, text in enumerate(texts):
    audio = client.tts.convert(
        text=text,
        reference_id="model_id"
    )
    save(audio, f"output_{i}.mp3")

Error Handling

import time
from fishaudio.exceptions import FishAudioError

def generate_with_retry(text, max_retries=3):
    for attempt in range(max_retries):
        try:
            audio = client.tts.convert(
                text=text,
                reference_id="model_id"
            )
            return audio
        except FishAudioError as e:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise  # Re-raise with the original traceback after exhausting retries

API Reference

Request Parameters

Parameter      Type      Description            Default
text           string    Text to convert        Required
reference_id   string    Model/voice ID         None
format         string    Audio format           "mp3"
chunk_length   integer   Characters per chunk   200
normalize      boolean   Normalize text         true
latency        string    Speed vs quality       "normal"
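The defaults above can be collected into a small helper that assembles a payload for the direct API; build_tts_request is a hypothetical name used for illustration, not part of the SDK:

```python
def build_tts_request(text, reference_id=None, **overrides):
    """Assemble a TTS request payload, filling in the documented defaults."""
    payload = {
        "text": text,
        "reference_id": reference_id,
        "format": "mp3",
        "chunk_length": 200,
        "normalize": True,
        "latency": "normal",
    }
    payload.update(overrides)  # e.g. format="wav", latency="balanced"
    return payload

# Override just the format, keeping the other defaults
request = build_tts_request("Hello, world!", reference_id="your_model_id", format="wav")
```

The resulting dict can be serialized with ormsgpack.packb() and posted exactly as shown in the Direct API Usage section.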

Response

Returns audio data in the specified format as a binary stream.

Get Support

Need help with text-to-speech?