
Prerequisites

Sign up for a free Fish Audio account to get started with our API.
  1. Go to fish.audio/auth/signup
  2. Fill in your details to create an account, then complete the verification steps
  3. Log in to your account and navigate to the API section
Once you have an account, you’ll need an API key to authenticate your requests.
  1. Log in to your Fish Audio Dashboard
  2. Navigate to the API Keys section
  3. Click “Create New Key”, give it a descriptive name, and set an expiration if desired
  4. Copy your key and store it securely
Keep your API key secret! Never commit it to version control or share it publicly.
Get free API credits by verifying your phone number.
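One way to keep the key out of source control is to read it from an environment variable at startup. The sketch below assumes a variable named FISH_AUDIO_API_KEY; that name and the load_api_key helper are hypothetical choices for illustration, not SDK conventions.

```python
import os


def load_api_key(env_var: str = "FISH_AUDIO_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The default variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Usage with the SDK (not executed here):
#   session = Session(load_api_key())
```

Failing fast on a missing key gives a clearer message than an authentication error from the first request.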

Basic Usage

Transcribe audio to text:
from fish_audio_sdk import Session, ASRRequest

session = Session("your_api_key")

# Read audio file
with open("audio.mp3", "rb") as f:
    audio_data = f.read()

# Transcribe
response = session.asr(ASRRequest(
    audio=audio_data
))

print(response.text)
print(f"Duration: {response.duration}ms")

Language Specification

Improve accuracy by specifying the language:
# English transcription
response = session.asr(ASRRequest(
    audio=audio_data,
    language="en"
))

# Chinese transcription
response = session.asr(ASRRequest(
    audio=audio_data,
    language="zh"
))
Common language codes: en (English), zh (Chinese), es (Spanish), fr (French), de (German), ja (Japanese), ko (Korean), pt (Portuguese)
Automatic language detection works well, but specifying the language improves accuracy and speed.
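If language codes come from user input or locale strings such as "en-US", it may help to normalize them before building the request. This is a minimal sketch: normalize_language is a hypothetical helper, and it only recognizes the codes listed above, so the service may accept codes that it rejects.

```python
from typing import Optional

# Codes taken from the list above; the service may support more.
COMMON_LANGUAGES = {
    "en": "English", "zh": "Chinese", "es": "Spanish", "fr": "French",
    "de": "German", "ja": "Japanese", "ko": "Korean", "pt": "Portuguese",
}


def normalize_language(code: Optional[str]) -> Optional[str]:
    """Reduce a locale-style code to its base language, e.g. "en-US" -> "en".

    Returns None for empty or unrecognized input so the caller can fall
    back to automatic detection.
    """
    if not code:
        return None
    base = code.strip().lower().replace("_", "-").split("-")[0]
    return base if base in COMMON_LANGUAGES else None


# Usage with the SDK (not executed here):
#   ASRRequest(audio=audio_data, language=normalize_language("en-US"))
```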

Working with Segments

Get detailed timing for each segment:
response = session.asr(ASRRequest(
    audio=audio_data
))

# Full transcription
print(response.text)

# Segment details
for segment in response.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")
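Segment timings map naturally onto subtitle formats. The sketch below turns segments into SRT text; the Segment dataclass is a stand-in with the same text, start, and end fields as the response segments, and both helper functions are hypothetical, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Stand-in for a response segment (times in seconds)."""
    text: str
    start: float
    end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable) -> str:
    """Build SRT text from objects exposing text, start, and end."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)


# Usage with a real response (not executed here):
#   srt_text = segments_to_srt(response.segments)
```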

Timestamps Control

Control timestamp generation:
# Include timestamps (default)
response = session.asr(ASRRequest(
    audio=audio_data,
    ignore_timestamps=False  # False = include timestamps
))

# Skip timestamp processing for faster results
response = session.asr(ASRRequest(
    audio=audio_data,
    ignore_timestamps=True   # True = skip timestamps
))
ignore_timestamps=False (default) includes segment timestamps. Set to True to skip timestamp processing for faster transcription when you only need the text.

Audio Formats

Supported audio formats:
  • MP3 (recommended)
  • WAV
  • M4A
  • OGG
  • FLAC
  • AAC
File requirements:
  • Maximum size: 100MB
  • Maximum duration: 60 minutes
  • Sample rate: 16kHz or higher recommended
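A local check before upload can catch obvious problems early. The sketch below tests only the file extension and size; it assumes that the extension reflects the actual format and that "100MB" means 100,000,000 bytes, and it does not check duration or sample rate, which would require decoding the audio. The audio_problems helper is hypothetical.

```python
import os
from typing import List

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac"}
MAX_BYTES = 100 * 1000 * 1000  # assumption: decimal megabytes


def audio_problems(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems


# Usage (not executed here):
#   problems = audio_problems(path, os.path.getsize(path))
```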

Transcribing TTS Output

Transcribe generated speech:
from fish_audio_sdk import TTSRequest

# Generate speech
audio_buffer = bytearray()
for chunk in session.tts(TTSRequest(
    text="Hello, this is a test"
)):
    audio_buffer.extend(chunk)

# Transcribe it
response = session.asr(ASRRequest(
    audio=bytes(audio_buffer)
))

print(response.text)

Error Handling

Handle common errors:
from fish_audio_sdk.exceptions import HttpCodeErr

try:
    response = session.asr(ASRRequest(
        audio=audio_data
    ))
except HttpCodeErr as e:
    if e.status_code == 413:
        print("Audio file too large (max 100MB)")
    elif e.status_code == 400:
        print("Invalid audio format")
    else:
        raise  # re-raise unexpected errors with the original traceback
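The status checks above can also be kept in a lookup table, which is easier to extend than an if/elif chain. This sketch covers only the two codes shown in this section; describe_status is a hypothetical helper, and other status codes the API may return are not listed here.

```python
# Messages for the status codes handled in the example above.
ASR_ERROR_HINTS = {
    400: "Invalid audio format",
    413: "Audio file too large (max 100MB)",
}


def describe_status(status_code: int) -> str:
    """Return a readable hint for a status code, or a generic fallback."""
    return ASR_ERROR_HINTS.get(status_code, f"Unexpected HTTP status {status_code}")


# Usage inside the except block (not executed here):
#   print(describe_status(e.status_code))
```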

Response Structure

The ASR response includes:
| Field | Type | Description |
| --- | --- | --- |
| text | str | Complete transcription |
| duration | float | Audio duration (milliseconds) |
| segments | list[ASRSegment] | Timestamped text segments |
Segment structure:
| Field | Type | Description |
| --- | --- | --- |
| text | str | Segment text |
| start | float | Start time (seconds) |
| end | float | End time (seconds) |
Note the timing units: duration is in milliseconds while segment start/end are in seconds.
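Because of this unit mismatch, it is safest to convert to one unit before comparing values. The sketch below computes how much of the audio is covered by segments; both functions are hypothetical helpers, and the (start, end) tuples stand in for the start and end fields of response segments.

```python
from typing import Iterable, Tuple


def ms_to_seconds(duration_ms: float) -> float:
    """Convert the response duration (milliseconds) to seconds."""
    return duration_ms / 1000.0


def speech_coverage(duration_ms: float, spans: Iterable[Tuple[float, float]]) -> float:
    """Fraction of the audio covered by (start, end) spans given in seconds.

    Assumes the spans do not overlap.
    """
    total_seconds = ms_to_seconds(duration_ms)
    if total_seconds <= 0:
        return 0.0
    covered = sum(end - start for start, end in spans)
    return covered / total_seconds


# Usage with a real response (not executed here):
#   spans = [(s.start, s.end) for s in response.segments]
#   ratio = speech_coverage(response.duration, spans)
```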

Request Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| audio | bytes | Audio data to transcribe | Required |
| language | str | Language code (e.g., “en”) | None (auto-detect) |
| ignore_timestamps | bool | Skip timestamp processing | False |
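When request options come from configuration or command-line flags, a small builder can assemble the keyword arguments in one place. This is a sketch under stated assumptions: build_asr_kwargs and its need_timestamps flag are hypothetical, and only the three parameters documented above are used.

```python
from typing import Any, Dict, Optional


def build_asr_kwargs(
    audio: bytes,
    language: Optional[str] = None,
    need_timestamps: bool = True,
) -> Dict[str, Any]:
    """Assemble keyword arguments for an ASR request.

    language is omitted when None so that auto-detection applies.
    """
    if not audio:
        raise ValueError("audio must be non-empty bytes")
    kwargs: Dict[str, Any] = {
        "audio": audio,
        "ignore_timestamps": not need_timestamps,
    }
    if language is not None:
        kwargs["language"] = language
    return kwargs


# Usage with the SDK (not executed here):
#   request = ASRRequest(**build_asr_kwargs(audio_data, language="en"))
```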