Prerequisites

Sign up for a free Fish Audio account to get started with our API.
  1. Go to fish.audio/auth/signup
  2. Fill in your details to create an account, then complete the verification steps
  3. Log in to your account and navigate to the API section
Once you have an account, you’ll need an API key to authenticate your requests.
  1. Log in to your Fish Audio Dashboard
  2. Navigate to the API Keys section
  3. Click “Create New Key”, give it a descriptive name, and set an expiration if desired
  4. Copy your key and store it securely
Keep your API key secret! Never commit it to version control or share it publicly.
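One way to keep the key out of source code is to read it from an environment variable and fail early when it is missing. The sketch below assumes the variable is named FISH_API_KEY, as in the examples in this guide; `requireApiKey` is a hypothetical helper, not part of the fish-audio SDK.

```typescript
// Hypothetical helper (not part of the fish-audio SDK).
// Takes the environment as a plain object so it is easy to test;
// in a Node.js program you would pass process.env.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["FISH_API_KEY"]?.trim();
  if (!key) {
    // Fail fast with a clear message instead of sending unauthenticated requests
    throw new Error("FISH_API_KEY is not set; export it before running");
  }
  return key;
}
```

The returned string can then be passed as the `apiKey` option when constructing the client.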

Basic Usage

Generate speech from text:
import { FishAudioClient, play } from "fish-audio";

const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

const audio = await fishAudio.textToSpeech.convert({
  text: "Hello, world!",
});

await play(audio);

Using Voice Models

Specify a voice model for consistent voice generation:
import { FishAudioClient, play } from "fish-audio";

const fishAudio = new FishAudioClient();

const audio = await fishAudio.textToSpeech.convert({
  text: "This is my custom voice",
  reference_id: "your_model_id", // Your model ID from fish.audio
});

await play(audio);

Getting Model IDs

The reference_id is the model ID from the URL when viewing a model on Fish Audio:
  • Model URL: https://fish.audio/m/802e3bc2b27e49c2995d23ef70e6ac89
  • Reference ID: 802e3bc2b27e49c2995d23ef70e6ac89
You can also get model IDs programmatically:
// List your models
const results = await fishAudio.voices.search({ self: true });
for (const model of results.items ?? []) {
  console.log(`${model.title}: ${model._id}`);
}

// Get specific model details
const model = await fishAudio.voices.get("your_model_id");
console.log(`Model: ${model.title}, ID: ${model._id}`);
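
If all you have is a model page URL, the ID can also be pulled out of the path. This is a minimal sketch: `referenceIdFromUrl` is a hypothetical helper, and it assumes IDs are always 32 lowercase hexadecimal characters, as in the example URL above.

```typescript
// Hypothetical helper (not part of the fish-audio SDK).
// Assumption: model IDs are 32 lowercase hex characters, like the example URL.
function referenceIdFromUrl(modelUrl: string): string | null {
  const match = modelUrl.match(/^https:\/\/fish\.audio\/m\/([0-9a-f]{32})\/?$/);
  // Return null rather than throwing so callers can handle bad input themselves
  return match?.[1] ?? null;
}
```

A `null` result means the string did not look like a model URL, so the caller can fall back to asking for the ID directly.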

Emotions

Add emotional expressions to your text:
import type { TTSRequest } from "fish-audio";

const text = `
(happy) I'm excited to share this!
(sad) Unfortunately, it didn't work out.
(whispering) This is a secret.
`;

const request: TTSRequest = { text, reference_id: "model_id" };
Common emotions: (happy), (sad), (angry), (excited), (calm), (surprised), (whispering), (shouting), (laughing), (sighing)

For more advanced control over speech generation, including phoneme-level control and additional paralanguage features, see Fine-grained Control.
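
When the script comes from structured data, the tagged text can be assembled rather than typed by hand. The sketch below is illustrative only: the `Emotion` type, the `ScriptLine` shape, and `tagLines` are hypothetical names, and the tag list simply mirrors the common emotions listed above.

```typescript
// Hypothetical types and helper (not part of the fish-audio SDK).
type Emotion =
  | "happy" | "sad" | "angry" | "excited" | "calm"
  | "surprised" | "whispering" | "shouting" | "laughing" | "sighing";

interface ScriptLine {
  emotion?: Emotion; // omit for untagged text
  text: string;
}

// Prefix each line with its "(emotion)" tag and join the lines with newlines
function tagLines(lines: ScriptLine[]): string {
  return lines
    .map(({ emotion, text }) => (emotion ? `(${emotion}) ${text}` : text))
    .join("\n");
}
```

The resulting string goes in the `text` field of the request, exactly like the hand-written example.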

Audio Formats

Choose output format based on your needs:
// MP3 (default)
await fishAudio.textToSpeech.convert({ text: "...", format: "mp3", mp3_bitrate: 192 });

// WAV - uncompressed
await fishAudio.textToSpeech.convert({ text: "...", format: "wav", sample_rate: 44100 });

// Opus - efficient for streaming
await fishAudio.textToSpeech.convert({ text: "...", format: "opus", opus_bitrate: 48 });

// PCM - raw audio data
await fishAudio.textToSpeech.convert({ text: "...", format: "pcm", sample_rate: 16000 });

Prosody Control

Adjust speech speed and volume:
const audio = await fishAudio.textToSpeech.convert({
  text: "Adjusted speech",
  prosody: {
    speed: 1.2,  // 0.5 - 2.0
    volume: 5,   // -20 - 20
  },
});
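
If the values come from user input, it may help to clamp them before sending the request. This sketch uses the ranges from the comments above (speed 0.5 to 2.0, volume -20 to 20); `clampProsody` and the `Prosody` interface are hypothetical, not SDK exports.

```typescript
// Hypothetical helper (not part of the fish-audio SDK).
interface Prosody {
  speed: number;  // documented range: 0.5 to 2.0
  volume: number; // documented range: -20 to 20
}

function clampProsody(p: Prosody): Prosody {
  const clamp = (value: number, min: number, max: number): number =>
    Math.min(max, Math.max(min, value));
  return {
    speed: clamp(p.speed, 0.5, 2.0),
    volume: clamp(p.volume, -20, 20),
  };
}
```

Clamping silently corrects out-of-range values; if you would rather surface the mistake to the user, reject the input instead.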

Advanced Parameters

Fine-tune generation:
const audio = await fishAudio.textToSpeech.convert({
  text: "Your text here",
  chunk_length: 200,    // Characters per chunk (100-300)
  normalize: true,      // Normalize text
  latency: "balanced",  // "normal" or "balanced"
  temperature: 0.7,     // Randomness (0.0-1.0)
  top_p: 0.7,           // Token selection (0.0-1.0)
});
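
A small pre-flight check can catch out-of-range values before a request is made. The sketch below checks only the three numeric parameters whose ranges appear in the comments above; `validateAdvancedParams` is a hypothetical helper, and whether the service rejects or adjusts such values is not stated here.

```typescript
// Hypothetical helper (not part of the fish-audio SDK).
interface AdvancedParams {
  chunk_length?: number; // 100-300
  temperature?: number;  // 0.0-1.0
  top_p?: number;        // 0.0-1.0
}

// Returns a list of human-readable problems; an empty list means all values are in range
function validateAdvancedParams(p: AdvancedParams): string[] {
  const problems: string[] = [];
  const check = (name: string, value: number | undefined, min: number, max: number) => {
    // Written as a negated range test so NaN is also reported
    if (value !== undefined && !(value >= min && value <= max)) {
      problems.push(`${name} must be between ${min} and ${max}, got ${value}`);
    }
  };
  check("chunk_length", p.chunk_length, 100, 300);
  check("temperature", p.temperature, 0, 1);
  check("top_p", p.top_p, 0, 1);
  return problems;
}
```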

Streaming

For real-time streaming, see the WebSocket guide.

Error Handling

Handle common errors:
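import { FishAudioClient } from "fish-audio";

async function generateWithRetry(request: Record<string, unknown>, maxRetries = 3) {
  const fishAudio = new FishAudioClient();
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fishAudio.textToSpeech.convert(request);
    } catch (e: any) {
      const status = e?.status ?? e?.response?.status;
      if (status === 401) throw new Error("Invalid API key");
      if (status !== 429) throw e;
      // Rate limited: back off exponentially (1 s, 2 s, 4 s, ...) before retrying
      await new Promise(r => setTimeout(r, 2 ** attempt * 1000));
    }
  }
  // Every attempt was rate limited; throw instead of silently returning undefined
  throw new Error(`Rate limited after ${maxRetries} attempts`);
}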
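
The retry loop above waits `2 ** attempt * 1000` milliseconds after a 429 response. The sketch below isolates that schedule so it can be inspected or tuned. `backoffDelayMs` is a hypothetical helper, and the 30-second cap is an added assumption, not something the original loop applies.

```typescript
// Hypothetical helper: exponential backoff delay for a zero-based attempt number.
// The cap (capMs) is an assumption added here to keep long retry runs bounded.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  return Math.min(capMs, 2 ** attempt * baseMs);
}

// Delays for the first three attempts, matching the loop's default maxRetries of 3
const schedule = [0, 1, 2].map((attempt) => backoffDelayMs(attempt));
console.log(schedule); // [ 1000, 2000, 4000 ]
```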

Request Parameters

Parameter    | Type     | Description        | Default
-------------|----------|--------------------|-----------
text         | string   | Text to convert    | Required
reference_id | string   | Voice model ID     | None
references   | object[] | Reference audio    | []
format       | string   | Audio format       | "mp3"
chunk_length | number   | Chunk size (100-300) | 200
normalize    | boolean  | Normalize text     | true
latency      | string   | Speed vs quality   | "balanced"
prosody      | object   | Speed/volume       | None
temperature  | number   | Randomness         | 0.7
top_p        | number   | Token selection    | 0.7

Next Steps