Prerequisites
Create a Fish Audio account
Sign up for a free Fish Audio account to get started with our API.
- Go to fish.audio/auth/signup
- Fill in your details to create an account, then complete the verification steps
- Log in to your account and navigate to the API section
Get your API key
Once you have an account, you’ll need an API key to authenticate your requests.
- Log in to your Fish Audio Dashboard
- Navigate to the API Keys section
- Click “Create New Key”, give it a descriptive name, and set an expiration if desired
- Copy your key and store it securely
Keep your API key secret! Never commit it to version control or share it publicly.
Get free API credits by verifying your phone number.
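One common way to follow the advice above and keep the key out of version control is to read it from an environment variable at runtime. A minimal sketch, assuming a variable named FISH_API_KEY (the name is an assumption for illustration, not something this page specifies) and a hypothetical helper `load_api_key`:

```python
import os


def load_api_key(env_var: str = "FISH_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is unset.

    The variable name is an assumption for this sketch; use whatever name
    your deployment defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell or secrets manager "
            "instead of hardcoding the key."
        )
    return key
```

Reading the key this way means the source file never contains the secret, so committing the file does not leak it.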
Basic Usage
Transcribe audio to text.
Language Specification
Improve accuracy by specifying the language: en (English), zh (Chinese), es (Spanish), fr (French), de (German), ja (Japanese), ko (Korean), pt (Portuguese).
Automatic language detection works well, but specifying the language improves accuracy and speed.
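The basic call and the language option can be sketched together as follows. The client shape here is an assumption: the sketch expects any object exposing `asr.transcribe(audio=..., language=...)`, with the parameter names taken from the Request Parameters table. `transcribe_file` is a hypothetical helper, not part of any SDK.

```python
from pathlib import Path
from typing import Any, Optional


def transcribe_file(client: Any, path: str, language: Optional[str] = None) -> Any:
    """Read an audio file and send its bytes for transcription.

    Assumption: `client.asr.transcribe` accepts `audio` and `language`
    keyword arguments. Adjust the call to match the SDK you have installed.
    """
    audio = Path(path).read_bytes()
    # language=None leaves detection to the service; a code such as "en"
    # pins the language explicitly.
    return client.asr.transcribe(audio=audio, language=language)
```

Under those assumptions, `transcribe_file(client, "audio.mp3", language="en")` would return the response object described under Response Structure.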
Working with Segments
Get detailed timing for each segment.
Timestamps Control
Control timestamp generation: ignore_timestamps=False (default) includes segment timestamps. Set it to True to skip timestamp processing for faster transcription when you only need the text.
Audio Formats
Supported audio formats:
- MP3 (recommended)
- WAV
- M4A
- OGG
- FLAC
- AAC
Limits and recommendations:
- Maximum size: 100 MB
- Maximum duration: 60 minutes
- Sample rate: 16 kHz or higher recommended
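The format and size limits above can be checked locally before uploading. A minimal sketch, assuming the format is inferred from the file extension and that 100MB means 100 * 1024 * 1024 bytes. `check_audio_file` is a hypothetical helper, and it does not verify duration or sample rate, which would require decoding the audio.

```python
from pathlib import Path

# Extensions inferred from the format names listed above (an assumption).
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac"}
MAX_SIZE_BYTES = 100 * 1024 * 1024  # 100MB, assuming binary megabytes


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks OK."""
    p = Path(path)
    if not p.is_file():
        return [f"{path}: file not found"]
    problems = []
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension {p.suffix!r}")
    size = p.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size > MAX_SIZE_BYTES:
        problems.append(f"file is {size} bytes, over the 100MB limit")
    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem with a file at once.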
Transcribing TTS Output
Transcribe generated speech.
Error Handling
Handle common errors.
Response Structure
The ASR response includes:
Field | Type | Description |
---|---|---|
text | str | Complete transcription |
duration | float | Audio duration (milliseconds) |
segments | list[ASRSegment] | Timestamped text segments |
Each ASRSegment includes:
Field | Type | Description |
---|---|---|
text | str | Segment text |
start | float | Start time (seconds) |
end | float | End time (seconds) |
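To make the two tables concrete, the sketch below mirrors them as local dataclasses and formats each segment's timing. The classes are stand-ins defined for this example, not the SDK's own types, and `format_segments` and `duration_seconds` are hypothetical helpers.

```python
from dataclasses import dataclass, field


@dataclass
class ASRSegment:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class ASRResponse:
    text: str
    duration: float  # milliseconds
    segments: list[ASRSegment] = field(default_factory=list)


def format_segments(response: ASRResponse) -> list[str]:
    """Render one '[start - end] text' line per segment, times in seconds."""
    return [f"[{s.start:.2f}s - {s.end:.2f}s] {s.text}" for s in response.segments]


def duration_seconds(response: ASRResponse) -> float:
    """Convert the response duration from milliseconds to seconds."""
    return response.duration / 1000.0
```

For a response with segments "Hello" (0.0 to 1.2) and "world" (1.2 to 2.5), `format_segments` yields `[0.00s - 1.20s] Hello` and `[1.20s - 2.50s] world`.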
Note the timing units: duration is in milliseconds, while segment start/end are in seconds.
Request Parameters
Parameter | Type | Description | Default |
---|---|---|---|
audio | bytes | Audio data to transcribe | Required |
language | str | Language code (e.g., “en”) | None (auto-detect) |
ignore_timestamps | bool | Skip timestamp processing | False |
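The parameter table can be mirrored as a small validated container, so mistakes surface before a request is sent. This is a sketch: `ASRParams` is a hypothetical class for illustration, and only the field names, types, and defaults come from the table.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ASRParams:
    audio: bytes                     # required
    language: Optional[str] = None   # None means auto-detect
    ignore_timestamps: bool = False  # False keeps segment timestamps

    def __post_init__(self) -> None:
        if not isinstance(self.audio, (bytes, bytearray)) or not self.audio:
            raise ValueError("audio must be non-empty bytes")
        if self.language is not None and not isinstance(self.language, str):
            raise TypeError("language must be a str or None")

    def to_kwargs(self) -> dict:
        """Return the parameters as keyword arguments for a transcription call."""
        return asdict(self)
```

Setting `ignore_timestamps=True` in such a container corresponds to the faster text-only mode described under Timestamps Control.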