> ## Documentation Index
> Fetch the complete documentation index at: https://docs.fish.audio/llms.txt
> Use this file to discover all available pages before exploring further.

# Text to Speech

> Convert text to natural-sounding speech with Fish Audio

export const AudioTranscript = ({voices, page}) => {
  // Page-narration audio player.
  // Props:
  //   voices — optional [{id, name, url}]; used verbatim when non-empty.
  //   page   — page slug; when `voices` is absent, a built-in voice list is
  //            resolved against the R2 bucket as `${baseUrl}/${page}/${id}.mp3`.
  const resolvedVoices = voices?.length ? voices : (() => {
    if (!page) return [];
    const baseUrl = 'https://pub-b995142090474379a930b856ab79b4d4.r2.dev/audio';
    const pageVoices = [{
      id: '8ef4a238714b45718ce04243307c57a7',
      name: 'E-girl'
    }, {
      id: '802e3bc2b27e49c2995d23ef70e6ac89',
      name: 'Energetic Male'
    }, {
      id: '933563129e564b19a115bedd57b7406a',
      name: 'Sarah'
    }, {
      id: 'bf322df2096a46f18c579d0baa36f41d',
      name: 'Adrian'
    }, {
      id: 'b347db033a6549378b48d00acb0d06cd',
      name: 'Selene'
    }, {
      id: '536d3a5e000945adb7038665781a4aca',
      name: 'Ethan'
    }];
    return pageVoices.map(voice => ({
      ...voice,
      url: `${baseUrl}/${page}/${voice.id}.mp3`
    }));
  })();
  const [selectedVoice, setSelectedVoice] = useState(0);   // index into resolvedVoices
  const [isPlaying, setIsPlaying] = useState(false);
  const [currentTime, setCurrentTime] = useState(0);       // seconds
  const [duration, setDuration] = useState(0);             // seconds; 0 until metadata loads
  const [isDropdownOpen, setIsDropdownOpen] = useState(false);
  const audioRef = useRef(null);
  const dropdownRef = useRef(null);
  // Mirror the <audio> element's time/duration/ended events into React state.
  // The element instance is stable across renders (only its src changes), so
  // attaching once on mount is sufficient.
  useEffect(() => {
    const audio = audioRef.current;
    if (!audio) return;
    const updateTime = () => setCurrentTime(audio.currentTime);
    const updateDuration = () => setDuration(audio.duration);
    const handleEnded = () => setIsPlaying(false);
    audio.addEventListener('timeupdate', updateTime);
    audio.addEventListener('loadedmetadata', updateDuration);
    audio.addEventListener('ended', handleEnded);
    return () => {
      audio.removeEventListener('timeupdate', updateTime);
      audio.removeEventListener('loadedmetadata', updateDuration);
      audio.removeEventListener('ended', handleEnded);
    };
  }, []);
  // Close the voice dropdown when clicking anywhere outside of it.
  useEffect(() => {
    const handleClickOutside = event => {
      if (dropdownRef.current && !dropdownRef.current.contains(event.target)) {
        setIsDropdownOpen(false);
      }
    };
    if (isDropdownOpen) {
      document.addEventListener('mousedown', handleClickOutside);
    }
    return () => {
      document.removeEventListener('mousedown', handleClickOutside);
    };
  }, [isDropdownOpen]);
  // When the selected voice changes, stop playback and reload the new source.
  // Reset duration as well as currentTime: the old track's duration would
  // otherwise be shown (and used as the slider max) until the new file's
  // metadata arrives.
  useEffect(() => {
    if (audioRef.current) {
      audioRef.current.pause();
      audioRef.current.load();
      setIsPlaying(false);
      setCurrentTime(0);
      setDuration(0);
    }
  }, [selectedVoice]);
  const togglePlay = () => {
    const audio = audioRef.current;
    if (!audio) return;
    if (isPlaying) {
      audio.pause();
      setIsPlaying(false);
    } else {
      // play() returns a promise that can reject (autoplay policy, failed
      // load); only reflect the playing state once it actually starts.
      audio.play().then(() => setIsPlaying(true)).catch(() => setIsPlaying(false));
    }
  };
  // Seek handler for the (invisible) range input overlaying the progress bar.
  const handleProgressChange = e => {
    const audio = audioRef.current;
    if (!audio) return;
    const newTime = parseFloat(e.target.value);
    audio.currentTime = newTime;
    setCurrentTime(newTime);
  };
  // Format seconds as "m:ss"; NaN (metadata not yet loaded) renders as "0:00".
  const formatTime = time => {
    if (isNaN(time)) return '0:00';
    const minutes = Math.floor(time / 60);
    const seconds = Math.floor(time % 60);
    return `${minutes}:${seconds.toString().padStart(2, '0')}`;
  };
  const currentVoice = resolvedVoices[selectedVoice];
  return <div className="border rounded-lg bg-card border-gray-200 dark:border-gray-800">
      {/* Header: title, attribution, voice selector */}
      <div className="grid grid-cols-3 items-center px-3 py-1.5 bg-muted border-b border-gray-200 dark:border-gray-800">
        <span className="text-xs font-medium">Listen to Page</span>

        <span className="text-xs font-semibold text-muted-foreground text-center">Powered by Fish Audio S2 Pro</span>

        {resolvedVoices.length > 1 ? <div className="relative justify-self-end" ref={dropdownRef}>
            <button onClick={() => setIsDropdownOpen(!isDropdownOpen)} className="flex items-center gap-1.5 px-3 py-1 rounded-full bg-muted hover:bg-gray-200 dark:hover:bg-gray-700 transition-all duration-200 cursor-pointer text-xs">
              <span className="text-muted-foreground">Voice:</span>
              <span className="font-medium">{resolvedVoices[selectedVoice]?.name}</span>
              <svg className={`w-3 h-3 transition-transform duration-200 ${isDropdownOpen ? 'rotate-180' : ''}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
                <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 9l-7 7-7-7" />
              </svg>
            </button>

            {isDropdownOpen && <div className="absolute right-0 mt-1 w-auto bg-white dark:bg-black border border-gray-200 dark:border-gray-700 rounded-lg overflow-hidden z-50">
                {resolvedVoices.map((voice, index) => <button key={index} onClick={() => {
    setSelectedVoice(index);
    setIsDropdownOpen(false);
  }} className={`w-full px-3 py-1.5 text-left text-xs hover:bg-gray-100 dark:hover:bg-gray-800 transition-colors flex items-center gap-2 ${index === selectedVoice ? 'bg-gray-100 dark:bg-gray-800 font-medium' : ''}`}>
                    {voice.id && <img src={`https://public-platform.r2.fish.audio/coverimage/${voice.id}`} alt={voice.name} className="w-5 h-5 rounded-full m-0 flex-shrink-0 object-cover" />}
                    <span className="flex-1 whitespace-nowrap">{voice.name}</span>
                  </button>)}
              </div>}
          </div> : <div className="justify-self-end" />}
      </div>

      {/* Player: play/pause, elapsed, progress bar, total duration */}
      <div className="px-3 py-1.5 bg-card">
        <audio ref={audioRef} src={currentVoice?.url} preload="metadata" />

        <div className="flex items-center gap-2">
          {/* Play/pause toggle; icon rotates during the swap */}
          <button onClick={togglePlay} className="flex-shrink-0 w-6 h-6 flex items-center justify-center bg-gray-300 dark:bg-gray-600 text-gray-800 dark:text-gray-200 rounded-full hover:opacity-80 transition-opacity relative overflow-hidden" aria-label={isPlaying ? 'Pause' : 'Play'}>
            <div className="transition-transform duration-300 ease-in-out" style={{
    transform: isPlaying ? 'rotate(180deg)' : 'rotate(0deg)'
  }}>
              {isPlaying ? <svg className="w-3 h-3" fill="currentColor" viewBox="0 0 24 24">
                  <path d="M6 4h4v16H6V4zm8 0h4v16h-4V4z" />
                </svg> : <svg className="w-3 h-3 ml-0.5" fill="currentColor" viewBox="0 0 24 24">
                  <path d="M8 5v14l11-7z" />
                </svg>}
            </div>
          </button>

          {/* Timeline */}
          <div className="flex-1 flex items-center gap-2">
            <span className="text-xs font-mono text-gray-500 dark:text-gray-400 min-w-[35px]">
              {formatTime(currentTime)}
            </span>

            <div className="flex-1 relative h-1 bg-gray-200 dark:bg-gray-700 rounded-full overflow-hidden">
              <div className="absolute top-0 left-0 h-full bg-gray-400 dark:bg-gray-500 transition-all duration-100" style={{
    width: `${duration ? currentTime / duration * 100 : 0}%`
  }} />
              <input type="range" min="0" max={duration || 0} value={currentTime} onChange={handleProgressChange} className="absolute top-0 left-0 w-full h-full opacity-0 cursor-pointer" />
            </div>
            <span className="text-xs font-mono text-gray-500 dark:text-gray-400 min-w-[35px]">
              {formatTime(duration)}
            </span>
          </div>
        </div>
      </div>
    </div>;
};

export const AudioSample = () => {
  // Inline player for the static sample clip at /snippets/audio_sample.mp3,
  // with play/pause, ±10s skip, and a seekable progress bar.
  const [isPlaying, setIsPlaying] = useState(false);
  const [currentTime, setCurrentTime] = useState(0);  // seconds
  const [duration, setDuration] = useState(0);        // seconds; 0 until metadata loads
  const audioRef = useRef(null);
  // Mirror the <audio> element's time/duration/ended events into React state.
  useEffect(() => {
    const audio = audioRef.current;
    if (!audio) return;
    const updateTime = () => setCurrentTime(audio.currentTime);
    const updateDuration = () => setDuration(audio.duration);
    const handleEnded = () => setIsPlaying(false);
    audio.addEventListener('timeupdate', updateTime);
    audio.addEventListener('loadedmetadata', updateDuration);
    audio.addEventListener('ended', handleEnded);
    return () => {
      audio.removeEventListener('timeupdate', updateTime);
      audio.removeEventListener('loadedmetadata', updateDuration);
      audio.removeEventListener('ended', handleEnded);
    };
  }, []);
  const togglePlay = () => {
    const audio = audioRef.current;
    if (!audio) return;
    if (isPlaying) {
      audio.pause();
      setIsPlaying(false);
    } else {
      // play() returns a promise that can reject (autoplay policy, failed
      // load); only reflect the playing state once it actually starts.
      audio.play().then(() => setIsPlaying(true)).catch(() => setIsPlaying(false));
    }
  };
  // Jump by `seconds` (negative = rewind), clamped to [0, duration].
  // Read the element's own currentTime rather than React state, which can
  // lag slightly behind the last 'timeupdate' event.
  const skip = seconds => {
    const audio = audioRef.current;
    if (!audio) return;
    audio.currentTime = Math.max(0, Math.min(duration, audio.currentTime + seconds));
  };
  // Seek handler for the (invisible) range input overlaying the progress bar.
  const handleProgressChange = e => {
    const audio = audioRef.current;
    if (!audio) return;
    const newTime = parseFloat(e.target.value);
    audio.currentTime = newTime;
    setCurrentTime(newTime);
  };
  // Format seconds as "m:ss"; NaN (metadata not yet loaded) renders as "0:00".
  const formatTime = time => {
    if (isNaN(time)) return '0:00';
    const minutes = Math.floor(time / 60);
    const seconds = Math.floor(time % 60);
    return `${minutes}:${seconds.toString().padStart(2, '0')}`;
  };
  return <div>
      <p className="text-md text-gray-600 dark:text-gray-400 mb-2">Listen to a sample:</p>
      <div className="flex items-center gap-3 p-4 rounded-lg">
      <audio ref={audioRef} src="/snippets/audio_sample.mp3" preload="metadata" />
      
      <button onClick={togglePlay} className="flex-shrink-0 w-10 h-10 flex items-center justify-center bg-primary text-white rounded-full hover:opacity-90 transition-opacity" aria-label={isPlaying ? 'Pause' : 'Play'}>
        {isPlaying ? <svg className="w-5 h-5" fill="currentColor" viewBox="0 0 24 24">
            <path d="M6 4h4v16H6V4zm8 0h4v16h-4V4z" />
          </svg> : <svg className="w-5 h-5 ml-0.5" fill="currentColor" viewBox="0 0 24 24">
            <path d="M8 5v14l11-7z" />
          </svg>}
      </button>

      <div className="flex-1 flex items-center gap-3">
        <span className="text-sm font-mono text-gray-600 dark:text-gray-400 min-w-[40px]">
          {formatTime(currentTime)}
        </span>

        <button onClick={() => skip(-10)} className="flex-shrink-0 text-gray-600 dark:text-gray-400 hover:text-gray-900 dark:hover:text-gray-200" aria-label="Rewind 10 seconds">
          <svg className="w-5 h-5" fill="currentColor" viewBox="0 0 24 24">
            <path d="M11.99 5V1l-5 5 5 5V7c3.31 0 6 2.69 6 6s-2.69 6-6 6-6-2.69-6-6h-2c0 4.42 3.58 8 8 8s8-3.58 8-8-3.58-8-8-8z" />
          </svg>
        </button>

        <div className="flex-1 relative h-2 bg-gray-300 dark:bg-gray-600 rounded-full overflow-hidden">
          <div className="absolute top-0 left-0 h-full bg-blue-500 dark:bg-blue-400 transition-all duration-100" style={{
    width: `${duration ? currentTime / duration * 100 : 0}%`
  }} />
          <input type="range" min="0" max={duration || 0} value={currentTime} onChange={handleProgressChange} className="absolute top-0 left-0 w-full h-full opacity-0 cursor-pointer" />
        </div>

        <button onClick={() => skip(10)} className="flex-shrink-0 text-gray-600 dark:text-gray-400 hover:text-gray-900 dark:hover:text-gray-200" aria-label="Forward 10 seconds">
          <svg className="w-5 h-5" fill="currentColor" viewBox="0 0 24 24">
            <path d="M12 5V1l5 5-5 5V7c-3.31 0-6 2.69-6 6s2.69 6 6 6 6-2.69 6-6h2c0 4.42-3.58 8-8 8s-8-3.58-8-8 3.58-8 8-8z" />
          </svg>
        </button>

        <span className="text-sm font-mono text-gray-600 dark:text-gray-400 min-w-[40px]">
          {formatTime(duration)}
        </span>
      </div>
      </div>
    </div>;
};

## Overview

Transform any text into natural, expressive speech using Fish Audio's advanced TTS models. Choose from pre-made voices or use your own cloned voices.

Discover the world's best cloned voice models on our [Discovery](https://fish.audio/discovery) page.

## Quick Start

### Web Interface

The easiest way to generate speech:

<Steps>
  <Step title="Visit Playground">
    Go to [fish.audio](https://fish.audio) and log in
  </Step>

  <Step title="Enter Your Text">
    Type or paste the text you want to convert
  </Step>

  <Step title="Choose a Voice">
    Select from available voices or use your own
  </Step>

  <Step title="Generate">
    Click "Generate" and download your audio
  </Step>
</Steps>

## Using the SDK

<Tabs>
  <Tab title="Python">
    <Steps>
      <Step title="Install the SDK">
        ```bash theme={null}
        pip install fish-audio-sdk
        ```
      </Step>

      <Step title="Basic Usage">
        Generate speech with just a few lines of code:

        ```python theme={null}
        from fishaudio import FishAudio
        from fishaudio.utils import save

        # Initialize client
        client = FishAudio(api_key="your_api_key_here")

        # Generate speech
        audio = client.tts.convert(
            text="Hello, world!",
            reference_id="your_voice_model_id"
        )
        save(audio, "output.mp3")

        print("✓ Audio saved to output.mp3")
        ```
      </Step>
    </Steps>
  </Tab>

  <Tab title="JavaScript">
    <Steps>
      <Step title="Install the SDK">
        ```bash theme={null}
        npm install fish-audio
        ```
      </Step>

      <Step title="Basic Usage">
        Generate speech with just a few lines of code:

        ```javascript theme={null}
        import { FishAudioClient } from "fish-audio";
        import { writeFile } from "fs/promises";

        // Initialize session
        const fishAudio = new FishAudioClient({ apiKey: "your_api_key_here" });

        const audio = await fishAudio.textToSpeech.convert({
            text: "Hello, world!",
            reference_id: "your_voice_model_id",
        });

        const buffer = Buffer.from(await new Response(audio).arrayBuffer());
        await writeFile("output.mp3", buffer);

        console.log("✓ Audio saved to output.mp3");
        ```
      </Step>
    </Steps>
  </Tab>
</Tabs>

## Voice Options

### Using Pre-made Voices

Browse and select voices from the playground:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Use a voice from the playground
    audio = client.tts.convert(
        text="Welcome to Fish Audio!",
        reference_id="7f92f8afb8ec43bf81429cc1c9199cb1"
    )
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    // Use a voice from the playground
    const audio = await fishAudio.textToSpeech.convert({
        text: "Welcome to Fish Audio!",
        reference_id: "7f92f8afb8ec43bf81429cc1c9199cb1",
    });
    ```
  </Tab>
</Tabs>

### Using Your Cloned Voice

Use voices you've created:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Use your own cloned voice
    audio = client.tts.convert(
        text="This is my custom voice speaking",
        reference_id="your_model_id"
    )
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    // Use your own cloned voice
    const audio = await fishAudio.textToSpeech.convert({
        text: "This is my custom voice speaking",
        reference_id: "your_model_id",
    });
    ```
  </Tab>
</Tabs>

### Using Reference Audio

Provide reference audio directly:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    from fishaudio.types import ReferenceAudio

    # Use reference audio on-the-fly
    with open("voice_sample.wav", "rb") as f:
        audio = client.tts.convert(
            text="Hello from reference audio",
            references=[
                ReferenceAudio(
                    audio=f.read(),
                    text="Sample text from the audio"
                )
            ]
        )
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    // Use reference audio on-the-fly
    const fileBuffer = await readFile("voice_sample.wav");
    const voiceFile = new File([fileBuffer], "voice_sample.wav");

    const audio = await fishAudio.textToSpeech.convert({
        text: "Hello from reference audio",
        references: [
            { audio: voiceFile, text: "Sample text from the audio" }
        ]
    });
    ```
  </Tab>
</Tabs>

## Model Selection

Choose the right model for your needs:

| Model      | Best For        | Quality   | Speed   |
| ---------- | --------------- | --------- | ------- |
| **s1**     | Prototyping     | Excellent | Fast    |
| **s2-pro** | Latest features | Excellent | Fastest |

Specify a model in your request:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Using the latest model (default)
    audio = client.tts.convert(text="Hello world")
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    // Using the latest S2-Pro model
    const audio = await fishAudio.textToSpeech.convert(
        { text: "Hello world" },
        "s2-pro"
    );
    ```
  </Tab>
</Tabs>

## Advanced Options

### Audio Formats

Choose your output format:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    audio = client.tts.convert(
        text="Your text here",
        format="mp3",  # Options: "mp3", "wav", "pcm", "opus"
        mp3_bitrate=128  # For MP3: 64, 128, or 192
    )
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    const audio = await fishAudio.textToSpeech.convert({
        text: "Your text here",
        format: "mp3", // Options: "mp3", "wav", "pcm", "opus"
        mp3_bitrate: 128, // For MP3: 64, 128, or 192
    });
    ```
  </Tab>
</Tabs>

### Chunk Length

Control text processing chunks:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    audio = client.tts.convert(
        text="Long text content...",
        chunk_length=200  # 100-300 characters per chunk
    )
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    const audio = await fishAudio.textToSpeech.convert({
        text: "Long text content...",
        chunk_length: 200, // 100-300 characters per chunk
    });
    ```
  </Tab>
</Tabs>

### Latency Mode

Optimize for speed or quality:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    audio = client.tts.convert(
        text="Quick response needed",
        latency="balanced"  # "normal" or "balanced"
    )
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    const audio = await fishAudio.textToSpeech.convert({
        text: "Quick response needed",
        latency: "balanced", // "normal" or "balanced"
    });
    ```
  </Tab>
</Tabs>

<Note>
  Balanced mode reduces latency to \~300ms but may slightly decrease stability.
</Note>

## Direct API Usage

For direct API calls without the SDK:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    import httpx
    import ormsgpack

    # Prepare request
    request_data = {
        "text": "Hello, world!",
        "reference_id": "your_model_id",
        "format": "mp3"
    }

    # Make API call
    with httpx.Client() as client:
        response = client.post(
            "https://api.fish.audio/v1/tts",
            content=ormsgpack.packb(request_data),
            headers={
                "authorization": "Bearer YOUR_API_KEY",
                "content-type": "application/msgpack",
                "model": "s2-pro"
            }
        )
        
        # Save audio
        with open("output.mp3", "wb") as f:
            f.write(response.content)
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    import { encode } from "@msgpack/msgpack";
    import { writeFile } from "fs/promises";

    const body = encode({
        text: "Hello, world!",
        reference_id: "your_model_id",
        format: "mp3",
    });

    const res = await fetch("https://api.fish.audio/v1/tts", {
        method: "POST",
        headers: {
            Authorization: "Bearer <YOUR_API_KEY>",
            "Content-Type": "application/msgpack",
            model: "s2-pro",
        },
        body,
    });

    const buffer = Buffer.from(await res.arrayBuffer());
    await writeFile("output.mp3", buffer);
    ```
  </Tab>
</Tabs>

## Streaming Audio

Stream audio for real-time applications:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Stream audio chunks
    audio_stream = client.tts.stream(
        text="Streaming this text in real-time",
        reference_id="model_id"
    )

    with open("stream_output.mp3", "wb") as f:
        for chunk in audio_stream:
            f.write(chunk)
            # Process chunk immediately for real-time playback
    ```
  </Tab>

  <Tab title="JavaScript">
    ```typescript theme={null}
    // Use a Websocket to stream real-time audio

    import { FishAudioClient, RealtimeEvents } from "fish-audio";
    import { writeFile } from "fs/promises";
    import path from "path";

    // Simple async generator that yields text chunks
    async function* makeTextStream() {
        const chunks = [
            "Hello from Fish Audio! ",
            "This is a realtime text-to-speech test. ",
            "We are streaming multiple chunks over WebSocket.",
        ];
        for (const chunk of chunks) {
            yield chunk;
        }
    }

    const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

    // For realtime, set text to "" and stream the content via makeTextStream
    const request = { text: "" };

    const connection = await fishAudio.textToSpeech.convertRealtime(request, makeTextStream());

    // Collect audio and write to a file when the stream ends
    const chunks: Buffer[] = [];
    connection.on(RealtimeEvents.OPEN, () => console.log("WebSocket opened"));
    connection.on(RealtimeEvents.AUDIO_CHUNK, (audio: unknown): void => {
        if (audio instanceof Uint8Array || Buffer.isBuffer(audio)) {
            chunks.push(Buffer.from(audio));
        }
    });
    connection.on(RealtimeEvents.ERROR, (err) => console.error("WebSocket error:", err));
    connection.on(RealtimeEvents.CLOSE, async () => {
        const outPath = path.resolve(process.cwd(), "out.mp3");
        await writeFile(outPath, Buffer.concat(chunks));
        console.log("Saved to", outPath);
    });
    ```
  </Tab>
</Tabs>

## Adding Emotions

<Tip>
  The `(parenthesis)` syntax below applies to the S1 model. S2 uses `[bracket]` syntax with natural language descriptions and is not limited to a fixed set of tags. See the [Models Overview](/developer-guide/models-pricing/models-overview#s2-natural-language-control) for details.
</Tip>

Make your speech more expressive:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Add emotion markers to your text
    emotional_text = """
    (excited) I just won the lottery!
    (sad) But then I lost the ticket.
    (laughing) Just kidding, I found it!
    """

    audio = client.tts.convert(
        text=emotional_text,
        reference_id="model_id"
    )
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    // Add emotion markers to your text
    const emotionalText = `(excited) I just won the lottery!
    (sad) But then I lost the ticket.
    (laughing) Just kidding, I found it!`;

    const audio = await fishAudio.textToSpeech.convert({
        text: emotionalText,
        reference_id: "model_id",
    });
    ```
  </Tab>
</Tabs>

Available emotions:

* Basic: `(happy)`, `(sad)`, `(angry)`, `(excited)`, `(calm)`
* Tones: `(shouting)`, `(whispering)`, `(soft tone)`
* Effects: `(laughing)`, `(sighing)`, `(crying)`

For more precise control over pronunciation and additional paralanguage features like pauses and breathing, see [Fine-grained Control](/developer-guide/core-features/fine-grained-control).

## Best Practices

### Text Preparation

**Do:**

* Use proper punctuation for natural pauses
* Add emotion markers for expression
* Break long texts into paragraphs
* Use consistent formatting

**Don't:**

* Use ALL CAPS (unless shouting)
* Mix multiple languages randomly
* Include special characters unnecessarily
* Forget punctuation

### Performance Tips

1. **Batch Processing:** Process multiple texts efficiently
2. **Cache Models:** Store frequently used model IDs
3. **Optimize Chunk Size:** Use 200 characters for best balance
4. **Handle Errors:** Implement retry logic for network issues

### Quality Optimization

For best results:

* Use high-quality reference audio for cloning
* Choose appropriate emotion markers
* Test different latency modes
* Monitor API rate limits

## Troubleshooting

### Common Issues

**No audio output:**

* Check API key validity
* Verify model ID exists
* Ensure proper audio format

**Poor quality:**

* Use better reference audio
* Try normal latency mode
* Check text formatting

**Slow generation:**

* Use balanced latency mode
* Reduce chunk length
* Check network connection

## Code Examples

### Batch Processing

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    from fishaudio.utils import save

    texts = [
        "First announcement",
        "Second announcement",
        "Third announcement"
    ]

    for i, text in enumerate(texts):
        audio = client.tts.convert(
            text=text,
            reference_id="model_id"
        )
        save(audio, f"output_{i}.mp3")
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    const texts = [
        "First announcement",
        "Second announcement",
        "Third announcement",
    ];

    for (let i = 0; i < texts.length; i++) {
        const audio = await fishAudio.textToSpeech.convert({
            text: texts[i],
            reference_id: "model_id",
        });
        const buffer = Buffer.from(await new Response(audio).arrayBuffer());
        await writeFile(`output_${i}.mp3`, buffer);
    }
    ```
  </Tab>
</Tabs>

### Error Handling

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    import time
    from fishaudio.exceptions import FishAudioError

    def generate_with_retry(text, max_retries=3):
        for attempt in range(max_retries):
            try:
                audio = client.tts.convert(
                    text=text,
                    reference_id="model_id"
                )
                return audio
            except FishAudioError as e:
                if attempt < max_retries - 1:
                    time.sleep(2 ** attempt)  # Exponential backoff
                else:
                    raise e
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    async function generateWithRetry(text, maxRetries = 3) {
        for (let attempt = 0; attempt < maxRetries; attempt++) {
            try {
                const audio = await fishAudio.textToSpeech.convert({
                    text,
                    reference_id: "model_id",
                });
                const buffer = Buffer.from(await new Response(audio).arrayBuffer());
                return buffer;
            } catch (err) {
                if (attempt < maxRetries - 1) {
                    const delayMs = 2 ** attempt * 1000;
                    await new Promise((r) => setTimeout(r, delayMs));
                } else {
                    throw err;
                }
            }
        }
    }

    const buffer = await generateWithRetry("Hello with retry");
    await writeFile("retry_output.mp3", buffer);
    ```
  </Tab>
</Tabs>

## API Reference

### Request Parameters

| Parameter         | Type    | Description          | Default  |
| ----------------- | ------- | -------------------- | -------- |
| **text**          | string  | Text to convert      | Required |
| **reference\_id** | string  | Model/voice ID       | None     |
| **format**        | string  | Audio format         | "mp3"    |
| **chunk\_length** | integer | Characters per chunk | 200      |
| **normalize**     | boolean | Normalize text       | true     |
| **latency**       | string  | Speed vs quality     | "normal" |

### Response

Returns audio data in the specified format as binary stream.

## Get Support

Need help with text-to-speech?

* [API Reference](/api-reference/introduction)
* **Discord Community:** [Join our Discord](https://discord.gg/fish-audio)
* **Email Support:** [support@fish.audio](mailto:support@fish.audio)
