Prerequisites
Create a Fish Audio account
Sign up for a free Fish Audio account to get started with our API.
Go to fish.audio/auth/signup
Fill in your details to create an account, then complete the verification steps.
Log in to your account and navigate to the API section
Once you have an account, you’ll need an API key to authenticate your requests.
Log in to your Fish Audio Dashboard
Navigate to the API Keys section
Click “Create New Key”, give it a descriptive name, and set an expiration if desired
Copy your key and store it securely
Keep your API key secret! Never commit it to version control or share it publicly.
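One common way to keep the key out of source code is to read it from an environment variable and fail fast when it is missing. The sketch below is only illustrative: `requireApiKey` is a hypothetical helper, not part of the Fish Audio SDK, and `FISH_API_KEY` is simply the variable name used by the examples on this page.

```typescript
// Hypothetical helper (not part of the fish-audio SDK): read an API key from
// an environment-like object and fail fast if it is missing or blank.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env, name: string = "FISH_API_KEY"): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing ${name}: set it in your environment, not in source code`);
  }
  return value;
}

// Usage (Node.js): const apiKey = requireApiKey(process.env);
```

Taking the environment as a parameter keeps the helper easy to test and avoids reading global state at import time.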
Overview
WebSocket streaming enables real-time text-to-speech generation, perfect for conversational AI, live captioning, and streaming applications.
Basic Streaming
Stream text and receive audio in real-time:
import { FishAudioClient, RealtimeEvents } from "fish-audio";
import { writeFile } from "fs/promises";
import path from "path";
// Simple async generator that yields text chunks
async function* makeTextStream() {
  const chunks = [
    "Hello from Fish Audio! ",
    "This is a realtime text-to-speech test. ",
    "We are streaming multiple chunks over WebSocket.",
  ];
  for (const chunk of chunks) {
    yield chunk;
  }
}
const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });
// For realtime, set text to "" and stream the content via makeTextStream
const request = { text: "" };
const connection = await fishAudio.textToSpeech.convertRealtime(request, makeTextStream());
// Collect audio and write to a file when the stream ends
const chunks: Buffer[] = [];
connection.on(RealtimeEvents.OPEN, () => console.log("WebSocket opened"));
connection.on(RealtimeEvents.AUDIO_CHUNK, (audio: unknown): void => {
  if (audio instanceof Uint8Array || Buffer.isBuffer(audio)) {
    chunks.push(Buffer.from(audio));
  }
});
connection.on(RealtimeEvents.ERROR, (err) => console.error("WebSocket error:", err));
connection.on(RealtimeEvents.CLOSE, async () => {
  const outPath = path.resolve(process.cwd(), "out.mp3");
  await writeFile(outPath, Buffer.concat(chunks));
  console.log("Saved to", outPath);
});
Set text: "" in the request when streaming. The actual text comes from your text stream generator.
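If you would rather `await` the full audio than manage listeners inline, the collect-on-close pattern above can be wrapped in a Promise. The following is a sketch under assumptions: `collectAudio`, `EventConnection`, and `EventNames` are hypothetical names, the connection is assumed only to expose an `on(event, listener)` method, and the event constants are assumed to be strings. With the SDK you would pass `RealtimeEvents.AUDIO_CHUNK`, `RealtimeEvents.ERROR`, and `RealtimeEvents.CLOSE` as the names.

```typescript
// Minimal structural type for the connection: anything with an
// `on(event, listener)` method. This is an assumption for illustration,
// not the SDK's actual type definition.
interface EventConnection {
  on(event: string, listener: (...args: unknown[]) => void): unknown;
}

// Event names are passed in so the helper does not hard-code SDK constants.
interface EventNames {
  audioChunk: string;
  error: string;
  close: string;
}

// Resolve with all audio bytes once the connection closes; reject on error.
function collectAudio(conn: EventConnection, names: EventNames): Promise<Uint8Array> {
  return new Promise<Uint8Array>((resolve, reject) => {
    const chunks: Uint8Array[] = [];
    let total = 0;

    conn.on(names.audioChunk, (audio) => {
      // Node's Buffer is a Uint8Array subclass, so this check covers both.
      if (audio instanceof Uint8Array) {
        chunks.push(audio);
        total += audio.byteLength;
      }
    });
    conn.on(names.error, (err) => reject(err));
    conn.on(names.close, () => {
      // Concatenate the collected chunks into one contiguous array.
      const out = new Uint8Array(total);
      let offset = 0;
      for (const chunk of chunks) {
        out.set(chunk, offset);
        offset += chunk.byteLength;
      }
      resolve(out);
    });
  });
}
```

Because a Promise settles only once, a close event that follows an error leaves the rejection in place.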
Using Voice Models
Stream with a specific voice:
const request = {
  text: "", // Empty for streaming
  reference_id: "your_model_id",
  format: "mp3",
};
const conn = await fishAudio.textToSpeech.convertRealtime(request, makeTextStream());
conn.on(RealtimeEvents.AUDIO_CHUNK, () => { /* handle audio */ });
Dynamic Text Generation
Stream text as it’s generated:
async function* generateText() {
  const responses = [
    "Processing your request...",
    "Here's what I found:",
    "The answer is 42.",
  ];
  for (const response of responses) {
    for (const word of response.split(" ")) {
      yield word + " ";
      await new Promise(r => setTimeout(r, 20));
    }
  }
}
await fishAudio.textToSpeech.convertRealtime({ text: "" }, generateText());
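When text arrives word by word, as in the generator above, it may help to regroup it into phrases before sending it on. The sketch below is one possible approach, not an SDK feature: `toPhrases` and `maxChars` are hypothetical names, and the punctuation set and the 200-character limit are arbitrary choices.

```typescript
// Re-chunk a stream of small text fragments (words, tokens) into phrases,
// flushing at sentence or clause punctuation, or when the buffer grows long.
// `toPhrases` and `maxChars` are illustrative names, not SDK features.
async function* toPhrases(
  source: AsyncIterable<string>,
  maxChars: number = 200,
): AsyncGenerator<string> {
  let buffer = "";
  for await (const piece of source) {
    buffer += piece;
    // Flush when the buffer ends with punctuation (ignoring trailing spaces)
    // or has reached the size limit.
    if (/[.!?;:]\s*$/.test(buffer) || buffer.length >= maxChars) {
      yield buffer;
      buffer = "";
    }
  }
  if (buffer.trim().length > 0) {
    yield buffer; // flush whatever is left when the source ends
  }
}

// Usage with the generator above: toPhrases(generateText())
```

The wrapped generator can be passed wherever a text stream is expected, since it is itself an async iterable of strings.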
Line-by-Line Processing
Stream text line by line:
import { createReadStream } from "fs";
import readline from "readline";
async function* readFileLines(filepath: string) {
  const rl = readline.createInterface({ input: createReadStream(filepath) });
  for await (const line of rl) {
    yield line.trim() + " ";
  }
}
await fishAudio.textToSpeech.convertRealtime({ text: "" }, readFileLines("story.txt"));
Errors
Handle connection errors via event listeners:
connection.on(RealtimeEvents.ERROR, (err) => {
  console.error("WebSocket error:", err);
  // Fallback to regular TTS or retry
});
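The retry option mentioned in the handler comment could look something like the following generic helper with exponential backoff. This is a sketch, not SDK functionality: `withRetry`, `attempts`, and `baseDelayMs` are hypothetical names, and the defaults are arbitrary.

```typescript
// Generic retry helper with exponential backoff. `withRetry`, `attempts`,
// and `baseDelayMs` are illustrative names, not part of the fish-audio SDK.
async function withRetry<T>(
  task: () => Promise<T>,
  attempts: number = 3,
  baseDelayMs: number = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        // The delay doubles each time: baseDelayMs, 2x, 4x, ...
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // All attempts failed: surface the most recent error.
  throw lastError;
}
```

To use this with a streaming connection, `task` would need to return a Promise that rejects when the ERROR event fires. It should also create a fresh text generator on every attempt, because an async generator cannot be replayed once consumed.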
Configuration/Choosing Backend
Customize WebSocket behavior by configuring the client.
Optionally specify the backend model to use.
Our state-of-the-art S1 model is the default:
// Custom endpoint
const fishAudio = new FishAudioClient({
  apiKey: process.env.FISH_API_KEY,
  baseUrl: "https://api.fish.audio", // Use a proxy/custom endpoint if needed
});
// Select backend model: pass it as the third argument
const conn = await fishAudio.textToSpeech.convertRealtime(
  request,
  makeTextStream(),
  "s1"
);
Best Practices
Chunk Size: Yield text in natural phrases for best prosody
Buffer Management: Process audio chunks immediately to avoid memory buildup
Connection Reuse: Keep WebSocket sessions alive for multiple streams
Error Recovery: Implement retry logic for connection failures
Format Selection: Use PCM for real-time playback, MP3 for storage
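For the buffer management point, one possible pattern is a small push-to-pull adapter: event handlers push chunks in, and a consumer reads them with `for await`, handling each chunk as it arrives rather than holding the whole response. `ChunkQueue` below is a hypothetical helper, not part of the SDK, and the commented wiring assumes the `connection` and `RealtimeEvents` names from the earlier examples.

```typescript
// Push-to-pull adapter: producers call push()/close(), a consumer iterates
// with `for await`. Chunks wait in the queue only until the consumer catches
// up. `ChunkQueue` is an illustrative helper, not part of the fish-audio SDK.
class ChunkQueue<T> implements AsyncIterable<T> {
  private items: T[] = [];
  private waiters: Array<(result: IteratorResult<T>) => void> = [];
  private closed = false;

  push(item: T): void {
    if (this.closed) return;
    const waiter = this.waiters.shift();
    if (waiter) waiter({ value: item, done: false }); // hand off directly
    else this.items.push(item);
  }

  close(): void {
    this.closed = true;
    // Wake any consumer still waiting so its loop can finish.
    for (const waiter of this.waiters.splice(0)) {
      waiter({ value: undefined, done: true });
    }
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: async (): Promise<IteratorResult<T>> => {
        if (this.items.length > 0) {
          return { value: this.items.shift() as T, done: false };
        }
        if (this.closed) {
          return { value: undefined, done: true };
        }
        return new Promise<IteratorResult<T>>((resolve) => {
          this.waiters.push(resolve);
        });
      },
    };
  }
}

// Possible wiring (assumed names from the examples on this page):
// const queue = new ChunkQueue<Uint8Array>();
// connection.on(RealtimeEvents.AUDIO_CHUNK, (audio: unknown) => {
//   if (audio instanceof Uint8Array) queue.push(audio);
// });
// connection.on(RealtimeEvents.CLOSE, () => queue.close());
// for await (const chunk of queue) { /* play or write the chunk */ }
```

The queue is unbounded, so it reduces memory use only when the consumer keeps pace with the incoming audio.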
Events
The connection emits these events:
Event        Description
OPEN         WebSocket connection established
AUDIO_CHUNK  Audio chunk received (Uint8Array)
ERROR        Error occurred on the connection
CLOSE        Connection closed