
Prerequisites

Sign up for a free Fish Audio account to get started with our API.
  1. Go to fish.audio/auth/signup
  2. Fill in your details to create an account, then complete the steps to verify it
  3. Log in to your account and navigate to the API section
Once you have an account, you’ll need an API key to authenticate your requests.
  1. Log in to your Fish Audio Dashboard
  2. Navigate to the API Keys section
  3. Click “Create New Key”, give it a descriptive name, and set an expiration if desired
  4. Copy your key and store it securely
Keep your API key secret! Never commit it to version control or share it publicly.
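One way to keep the key out of source code is to read it from an environment variable and fail early when it is missing. The sketch below is a minimal illustration, not part of the SDK: `requireApiKey` is a hypothetical helper, and `FISH_API_KEY` is the variable name used in the examples on this page.

```typescript
// Hypothetical helper (not part of the fish-audio SDK): looks up the API key
// in an environment-like object and throws a clear error if it is absent.
function requireApiKey(
  env: Record<string, string | undefined>,
  name: string = "FISH_API_KEY",
): string {
  const raw = env[name];
  const key = raw === undefined ? "" : raw.trim();
  if (key === "") {
    throw new Error(`Missing ${name}: set it in your environment, not in source code.`);
  }
  return key;
}

// Usage (Node.js): const apiKey = requireApiKey(process.env);
```

Failing at startup makes a missing key obvious before any request is sent.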

Basic Usage

Transcribe audio to text:
import { FishAudioClient } from "fish-audio";
import { createReadStream } from "fs";

const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

const result = await fishAudio.speechToText.convert({
  audio: createReadStream("audio.mp3"),
});

console.log(result.text);
console.log("Duration (s):", result.duration);

Language Specification

Improve accuracy by specifying the language:
// English transcription
await fishAudio.speechToText.convert({
  audio: createReadStream("audio.mp3"),
  language: "en"
});

// Chinese transcription
await fishAudio.speechToText.convert({
  audio: createReadStream("audio.mp3"),
  language: "zh"
});
Common language codes: en (English), zh (Chinese), es (Spanish), fr (French), de (German), ja (Japanese), ko (Korean), pt (Portuguese)
Automatic language detection works well, but specifying the language improves accuracy and speed.

Working with Segments

Get detailed timing for each segment:
const response = await fishAudio.speechToText.convert({ audio: createReadStream("audio.mp3") });

// Full transcription
console.log(response.text);

// Segment details
for (const seg of response.segments ?? []) {
  console.log(`[${seg.start.toFixed(2)}s - ${seg.end.toFixed(2)}s] ${seg.text}`);
}
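
Segment timings lend themselves to subtitle generation. The following sketch converts segments to SRT text, assuming each segment has the `text`, `start`, and `end` fields described on this page, with times in seconds. `Segment`, `toSrtTime`, and `segmentsToSrt` are illustrative names, not SDK exports.

```typescript
// Shape assumed from the segment fields documented on this page.
interface Segment {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

// Left-pads a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Formats a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Builds SRT text: a 1-based index, a time range, and the segment text per cue.
function segmentsToSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toSrtTime(seg.start)} --> ${toSrtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join("\n");
}

// Usage: const srt = segmentsToSrt(response.segments ?? []);
```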

Timestamps Control

Control timestamp generation:
// Include timestamps (default)
await fishAudio.speechToText.convert({ audio: createReadStream("audio.mp3"), ignore_timestamps: false });

// Skip timestamp processing for faster results
await fishAudio.speechToText.convert({ audio: createReadStream("audio.mp3"), ignore_timestamps: true });
ignore_timestamps: false (default) includes segment timestamps. Set to true to skip timestamp processing for faster transcription when you only need the text.

Audio Formats

Supported audio formats:
  • MP3 (recommended)
  • WAV
  • M4A
  • OGG
  • FLAC
  • AAC
File requirements:
  • Maximum size: 100MB
  • Maximum duration: 60 minutes
  • Sample rate: 16kHz or higher recommended
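
These limits can be checked locally before uploading. The sketch below is a hypothetical pre-flight check, not an SDK feature. It assumes the file extension is a reasonable proxy for the format and reads "100MB" as 100 × 1024 × 1024 bytes; the service may count differently.

```typescript
// Limits taken from the lists above; the byte interpretation is an assumption.
const SUPPORTED_EXTENSIONS = ["mp3", "wav", "m4a", "ogg", "flac", "aac"];
const MAX_BYTES = 100 * 1024 * 1024;
const MAX_SECONDS = 60 * 60;

// Returns a list of human-readable problems; an empty list means the file
// passes these local checks (the API may still reject it for other reasons).
function checkAudioFile(
  fileName: string,
  sizeBytes: number,
  durationSeconds?: number,
): string[] {
  const problems: string[] = [];

  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    problems.push(`Unsupported extension "${ext}"`);
  }
  if (sizeBytes > MAX_BYTES) {
    problems.push(`File is ${sizeBytes} bytes; the maximum is ${MAX_BYTES}`);
  }
  if (durationSeconds !== undefined && durationSeconds > MAX_SECONDS) {
    problems.push(`Audio is ${durationSeconds}s long; the maximum is ${MAX_SECONDS}s`);
  }
  return problems;
}

// Usage (Node.js): checkAudioFile("audio.mp3", statSync("audio.mp3").size)
```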

Transcribing TTS Output

Transcribe generated speech:
import { FishAudioClient } from "fish-audio";

const fishAudio = new FishAudioClient();

// Generate speech
const ttsAudio = await fishAudio.textToSpeech.convert({ text: "Hello, this is a test" });

// Transcribe it
const asr = await fishAudio.speechToText.convert({ audio: ttsAudio });
console.log(asr.text);

Error Handling

Handle common errors:
try {
  await fishAudio.speechToText.convert({ audio: createReadStream("audio.mp3") });
} catch (e: any) {
  const status = e?.status || e?.response?.status;
  if (status === 413) console.error("Audio file too large (max 100MB)");
  else if (status === 400) console.error("Invalid audio format");
  else throw e;
}
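
If the same handling is needed in several places, the status lookup can be pulled into a small helper that avoids `any`. This is a sketch under the same assumption as the snippet above: that a failed request exposes the HTTP status as `status` or `response.status`. `getStatus` and `describeSttError` are hypothetical names, not SDK functions.

```typescript
// Extracts an HTTP status from an unknown error value, if one is present.
// The error shape is an assumption carried over from the example above.
function getStatus(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const e = err as { status?: unknown; response?: { status?: unknown } | null };
  if (typeof e.status === "number") return e.status;
  const nested = e.response ? e.response.status : undefined;
  return typeof nested === "number" ? nested : undefined;
}

// Maps the two statuses covered on this page to messages; returns undefined
// for anything else so the caller can rethrow.
function describeSttError(err: unknown): string | undefined {
  switch (getStatus(err)) {
    case 413:
      return "Audio file too large (max 100MB)";
    case 400:
      return "Invalid audio format";
    default:
      return undefined;
  }
}

// Usage inside a catch block:
//   const message = describeSttError(e);
//   if (message) console.error(message); else throw e;
```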

Response Structure

The ASR response includes:
| Field | Type | Description |
| --- | --- | --- |
| text | string | Complete transcription |
| duration | number | Audio duration (seconds) |
| segments | ASRSegment[] | Timestamped text segments |
Segment structure:
| Field | Type | Description |
| --- | --- | --- |
| text | string | Segment text |
| start | number | Start time (seconds) |
| end | number | End time (seconds) |
Note the timing units: duration and segment times are in seconds.
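
As a worked example of the units, the sketch below models the documented fields as TypeScript interfaces and computes the fraction of the audio covered by segments. `ASRResponseSketch` and `speechCoverage` are illustrative names, the SDK's own exported types may differ, and the calculation assumes segments do not overlap.

```typescript
// Shapes taken from the field tables on this page; names are illustrative.
interface ASRSegment {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

interface ASRResponseSketch {
  text: string;
  duration: number; // seconds
  segments?: ASRSegment[];
}

// Returns the share of the audio (0 to 1) that falls inside a segment.
// Assumes non-overlapping segments; overlaps would be counted twice.
function speechCoverage(response: ASRResponseSketch): number {
  if (response.duration <= 0) return 0;
  let spoken = 0;
  for (const seg of response.segments ?? []) {
    spoken += Math.max(0, seg.end - seg.start);
  }
  return Math.min(1, spoken / response.duration);
}
```

Because every value is in seconds, no unit conversion is needed before dividing.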

Request Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| audio | File, Buffer, or Readable stream | Audio to transcribe | Required |
| language | string | Language code (e.g., “en”) | None (auto-detect) |
| ignore_timestamps | boolean | Skip timestamp processing | false |