WebSocket

Prerequisites

Create a Fish Audio account

Go to fish.audio/auth/signup
Fill in your details to create an account, then complete the verification steps.
Log in to your account and navigate to the API section

Get your API key

Once you have an account, you’ll need an API key to authenticate your requests.

Log in to your Fish Audio Dashboard
Navigate to the API Keys section
Click “Create New Key”, give it a descriptive name, and set an expiration if desired
Copy your key and store it securely

Keep your API key secret! Never commit it to version control or share it publicly.
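One way to keep the key out of source code is to read it from the environment and fail fast when it is missing. The helper below is a minimal sketch: `requireApiKey` is a hypothetical name (not part of the fish-audio SDK), and `FISH_API_KEY` is the variable name used in the examples on this page.

```typescript
// Hypothetical helper, not part of the fish-audio SDK.
// Reads the key from an environment-like object and throws if it is absent or blank.
function requireApiKey(
  env: Record<string, string | undefined>,
  name = "FISH_API_KEY",
): string {
  const key = env[name]?.trim();
  if (!key) {
    throw new Error(`Missing ${name}: set it in your environment, not in source code`);
  }
  return key;
}
```

In a Node.js script this could be called as `requireApiKey(process.env)` before constructing the client, so a missing key surfaces at startup rather than as an authentication failure later.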

Overview

WebSocket streaming enables real-time text-to-speech generation, perfect for conversational AI, live captioning, and streaming applications.

Basic Streaming

Stream text and receive audio in real time:

import { FishAudioClient, RealtimeEvents } from "fish-audio";
import { writeFile } from "fs/promises";
import path from "path";

// Simple async generator that yields text chunks
async function* makeTextStream() {
  const chunks = [
    "Hello from Fish Audio! ",
    "This is a realtime text-to-speech test. ",
    "We are streaming multiple chunks over WebSocket.",
  ];
  for (const chunk of chunks) {
    yield chunk;
  }
}

const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

// For realtime, set text to "" and stream the content via makeTextStream
const request = { text: "" };

const connection = await fishAudio.textToSpeech.convertRealtime(request, makeTextStream());

// Collect audio and write to a file when the stream ends
const chunks: Buffer[] = [];
connection.on(RealtimeEvents.OPEN, () => console.log("WebSocket opened"));
connection.on(RealtimeEvents.AUDIO_CHUNK, (audio: unknown): void => {
  if (audio instanceof Uint8Array || Buffer.isBuffer(audio)) {
    chunks.push(Buffer.from(audio));
  }
});
connection.on(RealtimeEvents.ERROR, (err) => console.error("WebSocket error:", err));
connection.on(RealtimeEvents.CLOSE, async () => {
  const outPath = path.resolve(process.cwd(), "out.mp3");
  await writeFile(outPath, Buffer.concat(chunks));
  console.log("Saved to", outPath);
});

Set text: "" in the request when streaming. The actual text comes from your text stream generator.
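The event handlers in the example can also be wrapped in a Promise, so the caller can simply `await` the complete audio. The sketch below is not part of the SDK: `RealtimeConnectionLike`, `EventNames`, and `collectAudio` are hypothetical names, and it assumes only that the connection exposes an `on(event, listener)` method and that the `RealtimeEvents` values can be passed to it as strings.

```typescript
// Minimal shape this sketch relies on (an assumption, not the SDK's own type).
interface RealtimeConnectionLike {
  on(event: string, listener: (...args: any[]) => void): unknown;
}

// Event names are passed in rather than hard-coded, e.g. the RealtimeEvents values.
interface EventNames {
  audio: string;
  error: string;
  close: string;
}

// Resolves with all received audio once the connection closes; rejects on the first error.
function collectAudio(conn: RealtimeConnectionLike, names: EventNames): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    const chunks: Buffer[] = [];
    conn.on(names.audio, (audio: unknown) => {
      // Buffer is a subclass of Uint8Array, so this check covers both.
      if (audio instanceof Uint8Array) chunks.push(Buffer.from(audio));
    });
    conn.on(names.error, (err: unknown) => {
      reject(err instanceof Error ? err : new Error(String(err)));
    });
    conn.on(names.close, () => resolve(Buffer.concat(chunks)));
  });
}
```

With the connection from the example, usage might look like `await collectAudio(connection, { audio: RealtimeEvents.AUDIO_CHUNK, error: RealtimeEvents.ERROR, close: RealtimeEvents.CLOSE })`, followed by a single `writeFile` call. Because a Promise settles only once, an error arriving after the close event is ignored.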

Using Voice Models

Stream with a specific voice:

const request = {
  text: "",                // Empty for streaming
  reference_id: "your_model_id",
  format: "mp3",
};

const conn = await fishAudio.textToSpeech.convertRealtime(request, makeTextStream());
conn.on(RealtimeEvents.AUDIO_CHUNK, () => { /* handle audio */ });

Dynamic Text Generation

Stream text as it’s generated:

async function* generateText() {
  const responses = [
    "Processing your request...",
    "Here's what I found:",
    "The answer is 42.",
  ];
  for (const response of responses) {
    for (const word of response.split(" ")) {
      yield word + " ";
      await new Promise(r => setTimeout(r, 20));
    }
  }
}

await fishAudio.textToSpeech.convertRealtime({ text: "" }, generateText());

Line-by-Line Processing

Stream text line by line:

import { createReadStream } from "fs";
import readline from "readline";

async function* readFileLines(filepath: string) {
  const rl = readline.createInterface({ input: createReadStream(filepath) });
  for await (const line of rl) {
    yield line.trim() + " ";
  }
}

await fishAudio.textToSpeech.convertRealtime({ text: "" }, readFileLines("story.txt"));

Errors

Handle connection errors via event listeners:

connection.on(RealtimeEvents.ERROR, (err) => {
  console.error("WebSocket error:", err);
  // Fallback to regular TTS or retry
});
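The retry mentioned in the comment above could be implemented with a small backoff helper. This is a sketch under stated assumptions: `withRetry` is a hypothetical helper (not part of the fish-audio SDK), and the attempt count and delays are illustrative defaults rather than recommended values.

```typescript
// Hypothetical retry helper with exponential backoff, not part of the fish-audio SDK.
async function withRetry<T>(
  attempt: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      if (i < maxAttempts - 1) {
        // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  // All attempts failed: surface the most recent error to the caller.
  throw lastError;
}
```

The `attempt` callback would open a new connection and wait for it to finish. An async generator can be consumed only once, so each attempt should create a fresh text stream (for example by calling `makeTextStream()` inside the callback) rather than reusing one from a failed attempt.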

Configuration/Choosing Backend

Customize WebSocket behavior by configuring the client.
Optionally specify the backend model to use. Our state-of-the-art S2-Pro model is the default:

// Custom endpoint
const fishAudio = new FishAudioClient({
  apiKey: process.env.FISH_API_KEY,
  baseUrl: "https://api.fish.audio", // Use a proxy/custom endpoint if needed
});

// Select backend model (passed as the third argument)
const conn = await fishAudio.textToSpeech.convertRealtime(
  request,
  makeTextStream(),
  "s2-pro"
);

Best Practices

Chunk Size: Yield text in natural phrases for best prosody
Buffer Management: Process audio chunks immediately to avoid memory buildup
Connection Reuse: Keep WebSocket sessions alive for multiple streams
Error Recovery: Implement retry logic for connection failures
Format Selection: Use PCM for real-time playback, MP3 for storage
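As an illustration of the chunk-size advice, the sketch below regroups small fragments (single words or LLM tokens) into sentence-sized chunks before they are streamed. `bufferPhrases` is a hypothetical helper, not part of the SDK, and the punctuation rule and the 200-character limit are assumptions chosen for the example.

```typescript
// Hypothetical helper, not part of the fish-audio SDK.
// Accumulates fragments and yields them as phrase-sized chunks.
async function* bufferPhrases(
  source: AsyncIterable<string>,
  maxChars = 200,
): AsyncGenerator<string> {
  let buffer = "";
  for await (const piece of source) {
    buffer += piece;
    // Flush when the buffer ends in phrase-level punctuation or grows too long.
    if (/[.!?;:]\s*$/.test(buffer) || buffer.length >= maxChars) {
      yield buffer;
      buffer = "";
    }
  }
  // Flush whatever is left once the source is exhausted.
  if (buffer.trim().length > 0) yield buffer;
}
```

Assuming `convertRealtime` accepts any async generator of strings, as in the examples above, the word-by-word generator could be wrapped as `bufferPhrases(generateText())` so that each phrase is sent as one chunk instead of one word at a time. The trade-off is slightly higher latency before the first audio arrives.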

Events

The connection emits these events:

Event	Description
`OPEN`	WebSocket connection established
`AUDIO_CHUNK`	Audio chunk received (Uint8Array)
`ERROR`	Error occurred on the connection
`CLOSE`	Connection closed
