Speech to Text

For better speech recognition quality, you can specify the language of the audio input. If not specified, the system will attempt to automatically detect the language.

Using the Fish Audio SDK

First, make sure you have the Fish Audio SDK installed. You can install it from GitHub or PyPI.

Example Usage

from fish_audio_sdk import Session, ASRRequest

session = Session("your_api_key")

# Read the audio file
with open("input_audio.mp3", "rb") as audio_file:
    audio_data = audio_file.read()

# Option 1: Without specifying language (auto-detect)
response = session.asr(ASRRequest(audio=audio_data))

# Option 2: Specifying the language
response = session.asr(ASRRequest(audio=audio_data, language="en"))

# Option 3: With precise timestamps (may increase latency for short audio)
response = session.asr(ASRRequest(audio=audio_data, ignore_timestamps=False))

print(f"Transcribed text: {response.text}")
print(f"Audio duration: {response.duration} seconds")

for segment in response.segments:
    print(f"Segment: {segment.text}")
    print(f"Start time: {segment.start}, End time: {segment.end}")

This example demonstrates three ways to use the Speech-to-Text API:

1. Without specifying a language: the system attempts to auto-detect the language.
2. Specifying the language: provide a language code (e.g., "en" for English) for potentially better recognition.
3. With precise timestamps: setting ignore_timestamps=False returns more accurate timing information for each segment, but may increase latency for short audio files.

The ignore_timestamps parameter defaults to True, which reduces latency for short audio files.
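Each item in response.segments exposes text, start and end. The following minimal sketch turns those values into readable, timestamped lines. It assumes start and end are in seconds, which matches how the example labels duration but is not stated for segments. format_segments and format_timestamp are hypothetical helpers and are not part of the SDK. Because they only read those three attributes, stand-in objects are enough to try them out.

```python
from types import SimpleNamespace
from typing import Iterable, List
import mimetypes  # noqa: F401  (unused here; kept out of the logic on purpose)


def format_timestamp(seconds: float) -> str:
    """Render a time as MM:SS.mmm (assumes the value is in seconds)."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: Iterable) -> List[str]:
    """Hypothetical helper: one '[start - end] text' line per segment.

    Accepts any objects with .text, .start and .end attributes, such as
    the items of response.segments in the SDK example.
    """
    return [
        f"[{format_timestamp(s.start)} - {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    ]


if __name__ == "__main__":
    # Stand-in segments; with the SDK, pass response.segments instead.
    demo = [
        SimpleNamespace(text="Hello there.", start=0.0, end=1.25),
        SimpleNamespace(text="How are you?", start=1.25, end=62.5),
    ]
    for line in format_segments(demo):
        print(line)
```

After an Option 3 request (ignore_timestamps=False), call format_segments(response.segments) to get the more precise timings.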

Raw API Usage

If you prefer to use the raw API instead of the SDK, you can use the MessagePack API as described below.

Endpoint Details

Method: POST
URL: https://api.fish.audio/v1/asr
Content-Type: multipart/form-data or application/msgpack
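When you call the endpoint directly, the request body carries the same fields as the SDK's ASRRequest: audio, language and ignore_timestamps. The sketch below builds that dictionary and leaves language out entirely when none is given, so the service can auto-detect it. Two details are assumptions for the raw API: omitting the key instead of sending a null, and defaulting ignore_timestamps to True to mirror the SDK. build_asr_payload is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def build_asr_payload(
    audio: bytes,
    language: Optional[str] = None,
    ignore_timestamps: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the fields for a POST to /v1/asr.

    - language is omitted when None so the service can auto-detect it
      (assumption: a missing key means "auto-detect" for the raw API).
    - ignore_timestamps defaults to True, mirroring the SDK default;
      pass False for precise segment timestamps at some latency cost.
    """
    if not isinstance(audio, (bytes, bytearray)):
        raise TypeError("audio must be the raw bytes of the audio file")
    payload: Dict[str, Any] = {
        "audio": bytes(audio),
        "ignore_timestamps": ignore_timestamps,
    }
    if language is not None:
        payload["language"] = language
    return payload
```

To use it with the MessagePack example, pass content=ormsgpack.packb(build_asr_payload(audio_data, language="en")).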

Example Usage

import httpx
import ormsgpack

# Read the audio file
with open("input_audio.mp3", "rb") as audio_file:
    audio_data = audio_file.read()

# Prepare the request data
request_data = {
    "audio": audio_data,
    "language": "en",  # Optional: omit to let the service auto-detect the language
    "ignore_timestamps": False,  # Optional: False requests precise segment timestamps
}

# Send the request
with httpx.Client() as client:
    response = client.post(
        "https://api.fish.audio/v1/asr",
        headers={
            "Authorization": "Bearer YOUR_API_KEY",
            "Content-Type": "application/msgpack",
        },
        content=ormsgpack.packb(request_data),
    )

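# Raise an exception on HTTP errors (for example, an authentication failure)
response.raise_for_status()

# Parse the JSON response
result = response.json()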

print(f"Transcribed text: {result['text']}")
print(f"Audio duration: {result['duration']} seconds")

for segment in result['segments']:
    print(f"Segment: {segment['text']}")
    print(f"Start time: {segment['start']}, End time: {segment['end']}")

This example shows how to use the raw API with MessagePack serialization. You can also use multipart/form-data by changing the Content-Type header and adjusting the request data format accordingly. Make sure to replace "YOUR_API_KEY" with your actual API key, and adjust the file paths as needed.
