This guide will walk you through installation, authentication, and core features.
If you’re using the legacy Session-based API (fish_audio_sdk), see the migration guide to upgrade to the new SDK.

Installation

Step 1: Install the SDK

Install via pip (Python 3.9 or higher required):
pip install fish-audio-sdk
For audio playback utilities, install with the utils extra:
pip install "fish-audio-sdk[utils]"
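To confirm the install before moving on, a quick standard-library check can help. It assumes only what this guide states: Python 3.9 or higher, and an importable `fishaudio` package (the module name used in the Quick Start imports). The helper name `check_environment` is hypothetical and not part of the SDK.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 9), module="fishaudio"):
    """Return a list of problems found; an empty list means the checks passed.

    `check_environment` is a hypothetical helper, not part of the SDK.
    """
    problems = []
    # sys.version_info compares element-wise against a plain tuple
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted}+ required, found {sys.version.split()[0]}")
    # find_spec returns None when the module cannot be located on sys.path
    if importlib.util.find_spec(module) is None:
        problems.append(f"module '{module}' not found; run: pip install fish-audio-sdk")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```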

Step 2: Get your API key

Sign up for a free Fish Audio account to get started with our API.
  1. Go to fish.audio/auth/signup
  2. Fill in your details to create an account, then complete the verification steps.
  3. Log in to your account and navigate to the API section
Once you have an account, you’ll need an API key to authenticate your requests.
  1. Log in to your Fish Audio Dashboard
  2. Navigate to the API Keys section
  3. Click “Create New Key”, give it a descriptive name, and optionally set an expiration
  4. Copy your key and store it securely
Keep your API key secret! Never commit it to version control or share it publicly.

Step 3: Set up authentication

Configure your API key using environment variables:
export FISH_API_KEY=your_api_key_here
Or create a .env file in your project root:
FISH_API_KEY=your_api_key_here
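This guide doesn't say whether the SDK reads .env files on its own, so one option is to load the file into `os.environ` before creating the client. The third-party python-dotenv package (its `load_dotenv()` function) is a common choice; the parser below is a hypothetical stdlib-only stand-in, `load_env_file`, that handles simple `KEY=value` lines, comments, an optional `export ` prefix, and surrounding quotes.

```python
import os
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Load simple KEY=value lines from `path` into os.environ.

    Hypothetical helper: supports comments, blank lines, an `export ` prefix,
    and one pair of surrounding quotes, but not variable expansion or
    multi-line values. Existing variables win unless `override` is True.
    Returns the list of keys it set.
    """
    loaded = []
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Strip one pair of matching surrounding quotes
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if override or key not in os.environ:
            os.environ[key] = value
            loaded.append(key)
    return loaded
```

Call `load_env_file()` before `FishAudio()` so the client can find `FISH_API_KEY` in the environment.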

Quick Start

Get started with the FishAudio client in less than a minute:
from fishaudio import FishAudio
from fishaudio.utils import play, save

# Initialize client (reads from FISH_API_KEY environment variable)
client = FishAudio()

# Generate and play audio
audio = client.tts.convert(text="Hello, playing from Fish Audio!")
play(audio)

# Generate and save audio
audio = client.tts.convert(text="Saving this audio to a file!")
save(audio, "output.mp3")

Core Features

Text-to-Speech

Generate speech in a specific voice by passing that voice model's reference_id:
from fishaudio import FishAudio
from fishaudio.utils import play

client = FishAudio()

# With a specific voice
audio = client.tts.convert(
    text="Custom voice",
    reference_id="bf322df2096a46f18c579d0baa36f41d" # Adrian
)
play(audio)

To change the speaking rate, pass speed:
from fishaudio import FishAudio
from fishaudio.utils import play

client = FishAudio()

# With speed control
audio = client.tts.convert(
    text="I'm talking pretty fast, is this still too slow?",
    speed=1.5  # 1.5x speed
)
play(audio)
Create reusable configurations with TTSConfig. Prosody controls speech characteristics like speed and volume:
from fishaudio import FishAudio
from fishaudio.types import TTSConfig, Prosody
from fishaudio.utils import play

client = FishAudio()

# Define config once
my_config = TTSConfig(
    prosody=Prosody(speed=1.2, volume=-5),
    reference_id="933563129e564b19a115bedd57b7406a", # Sarah
    format="wav",
    latency="balanced"
)

# Reuse across multiple generations
audio1 = client.tts.convert(text="Welcome to our product demonstration.", config=my_config)
audio2 = client.tts.convert(text="Let me show you the key features.", config=my_config)
audio3 = client.tts.convert(text="Thank you for watching this tutorial.", config=my_config)

play(audio1)
play(audio2)
play(audio3)
For chunk-by-chunk processing, use stream(), which returns an iterable AudioStream. For real-time streaming with dynamically generated text, see Real-time Streaming below.
Learn more in the Text-to-Speech guide.
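Because stream() yields audio incrementally, you can write each chunk to disk as it arrives instead of holding the whole file in memory. This sketch assumes the AudioStream yields `bytes` chunks; `save_chunks` is a hypothetical helper, not an SDK function.

```python
from pathlib import Path
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write audio chunks to `path` as they arrive; return total bytes written.

    Assumes each chunk is a bytes object, e.g. one item of an AudioStream.
    """
    total = 0
    with Path(path).open("wb") as fh:
        for chunk in chunks:
            if not chunk:
                continue  # skip empty chunks
            total += fh.write(chunk)
    return total
```

To use it, pass the AudioStream returned by stream() as `chunks`. Writing incrementally keeps memory use flat for long texts, at the cost of leaving a partial file behind if the stream fails midway.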

Speech-to-Text

Transcribe an audio file to text, with optional per-segment timestamps:
from fishaudio import FishAudio

client = FishAudio()

# Transcribe audio
with open("audio.wav", "rb") as f:
    result = client.asr.transcribe(
        audio=f.read(),
        language="en"  # Optional: specify language
    )

print(result.text)

# Access segments
for segment in result.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")
Learn more in the Speech-to-Text guide.
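The segment timestamps map directly onto subtitle formats. As a sketch, the helper below turns segments into SRT text. It relies only on the `start`, `end` (in seconds, as the print format above suggests), and `text` attributes used in the loop above; the `Segment` dataclass is a hypothetical stand-in for the SDK's segment objects.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical stand-in with the same attributes as result.segments items."""
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable) -> str:
    """Build SRT subtitle text from objects with .start, .end and .text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{seg.text.strip()}\n"
        )
    # A blank line separates consecutive SRT entries
    return "\n".join(blocks)
```

With a real result, you could write the file with `Path("audio.srt").write_text(to_srt(result.segments), encoding="utf-8")`.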

Real-time Streaming

Stream dynamically generated text for conversational AI and live applications, such as LLM streaming responses, live captions, and chatbots:
from fishaudio import FishAudio
from fishaudio.utils import play

client = FishAudio()

# Stream dynamically generated text (e.g., from LLM)
def text_chunks():
    yield "Hello, "
    yield "this is "
    yield "streaming text!"

audio_stream = client.tts.stream_websocket(
    text_chunks(),
    latency="balanced"
)

play(audio_stream)
Learn more in the WebSocket Streaming guide.
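LLM token streams often arrive as tiny fragments. This guide doesn't say whether stream_websocket benefits from larger pieces, so treat this as an optional pattern: the hypothetical `sentence_chunks` generator regroups fragments into sentence-sized strings, and you could pass `sentence_chunks(your_llm_tokens)` in place of `text_chunks()` (where `your_llm_tokens` is a placeholder for your model's token iterator).

```python
import re
from typing import Iterable, Iterator

# Sentence end: ., !, or ? followed by whitespace (a simple heuristic that
# will also split after abbreviations such as "Dr.")
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(fragments: Iterable[str], max_chars: int = 200) -> Iterator[str]:
    """Regroup small text fragments into sentence-sized chunks.

    Yields a chunk whenever a sentence boundary is seen, or when the buffer
    grows to `max_chars` without one; flushes any remainder at the end.
    """
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a complete sentence
        for sentence in parts[:-1]:
            if sentence:
                yield sentence + " "
        buffer = parts[-1]
        if len(buffer) >= max_chars:
            yield buffer
            buffer = ""
    if buffer:
        yield buffer
```

The `max_chars` cap keeps latency bounded when the model produces a long run of text without punctuation.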

Voice Cloning

Instant voice cloning: clone a voice on the fly by passing a ReferenceAudio containing a sample clip and its transcript:
from fishaudio import FishAudio
from fishaudio.types import ReferenceAudio

client = FishAudio()

# Instant voice cloning
with open("reference.wav", "rb") as f:
    audio = client.tts.convert(
        text="This will sound like the reference voice",
        references=[ReferenceAudio(
            audio=f.read(),
            text="Text spoken in the reference audio"
        )]
    )
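Before sending a reference clip, it can help to inspect it locally. This guide doesn't state length or format limits for reference audio, so the sketch below only reports the properties of a PCM WAV file using the standard-library `wave` module (it won't read MP3 or compressed WAV). `wav_info` and `WavInfo` are hypothetical names.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    channels: int
    sample_rate: int
    duration_s: float


def wav_info(path: str) -> WavInfo:
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return WavInfo(
            channels=wf.getnchannels(),
            sample_rate=rate,
            duration_s=frames / rate if rate else 0.0,
        )
```

For example, `info = wav_info("reference.wav")` followed by `print(f"{info.duration_s:.1f}s at {info.sample_rate} Hz")` shows what you're about to upload.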
Voice models: create persistent voice models for repeated use:
from fishaudio import FishAudio

client = FishAudio()

# Create persistent voice model
with open("voice_sample.wav", "rb") as f:
    voice = client.voices.create(
        title="My Custom Voice",
        voices=[f.read()],
        description="Custom voice clone"
    )
print(f"Created voice: {voice.id}")
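Because `voices` takes a list, you may be able to supply several samples at once. The hypothetical `read_samples` helper collects the bytes of every matching file in a folder in a stable, sorted order; whether extra samples improve the model, and any limits on their count or size, aren't covered in this guide.

```python
from pathlib import Path
from typing import List


def read_samples(folder: str, pattern: str = "*.wav") -> List[bytes]:
    """Return the raw bytes of each file in `folder` matching `pattern`, sorted by name."""
    paths = sorted(p for p in Path(folder).glob(pattern) if p.is_file())
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")
    return [p.read_bytes() for p in paths]
```

You would then pass `voices=read_samples("samples")` to `client.voices.create(...)`, where `samples` is a placeholder folder name.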
Learn more in the Voice Cloning guide.

Client Initialization

You can supply the API key through an environment variable, pass it to the client directly, or point the client at a custom endpoint. The recommended approach uses the environment variable:
from fishaudio import FishAudio

# Automatically reads from FISH_API_KEY environment variable
client = FishAudio()
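If FISH_API_KEY isn't set, it's usually better to find out at startup than on the first request. The hypothetical `require_api_key` helper below fails fast with a clear message; it only inspects the environment and does not check the key against the API.

```python
import os


def require_api_key(var: str = "FISH_API_KEY") -> str:
    """Return the API key from the environment or raise a descriptive error."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set. Export it (export {var}=...) or add it to your .env file."
        )
    return key
```

Call `require_api_key()` before `FishAudio()` at application startup.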

Sync vs Async

The SDK provides both a synchronous client (FishAudio) and an asynchronous one (AsyncFishAudio). The synchronous client suits most applications:
from fishaudio import FishAudio

# For typical applications
client = FishAudio()
audio = client.tts.convert(text="Hello!")
Use AsyncFishAudio when:
  • Building async web applications (FastAPI, Sanic, etc.)
  • Processing multiple requests concurrently
  • Integrating with other async libraries
  • You need maximum throughput for I/O-bound workloads
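When processing many requests concurrently, it's common to cap how many are in flight at once. The sketch below shows that pattern with `asyncio.gather` and an `asyncio.Semaphore`. It assumes AsyncFishAudio exposes awaitable versions of the sync methods (for example, `await client.tts.convert(...)`); `run_bounded` is a hypothetical helper and `fake_convert` is a stand-in so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    items: List[str],
    worker: Callable[[str], Awaitable[T]],
    limit: int = 4,
) -> List[T]:
    """Run `worker` on every item with at most `limit` calls in flight.

    Results come back in the same order as `items`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: str) -> T:
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_convert(text: str) -> bytes:
    """Hypothetical stand-in for `await client.tts.convert(text=text)`."""
    await asyncio.sleep(0.01)
    return text.encode("utf-8")


if __name__ == "__main__":
    texts = ["First line.", "Second line.", "Third line."]
    audios = asyncio.run(run_bounded(texts, fake_convert, limit=2))
    print([len(a) for a in audios])
```

The semaphore keeps you from opening dozens of simultaneous requests, which also makes rate-limit errors less likely.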

Resource Clients

The SDK organizes functionality into resource clients:
| Resource | Description | Key methods |
|---|---|---|
| client.tts | Text-to-speech | convert(), stream(), stream_websocket() |
| client.asr | Speech recognition | transcribe() |
| client.voices | Voice management | list(), get(), create(), update(), delete() |
| client.account | Account info | get_credits(), get_package() |

Utility Functions

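The SDK includes helpful utilities (requires the utils extra):
from fishaudio import FishAudio
from fishaudio.utils import save, play, stream

client = FishAudio()
audio = client.tts.convert(text="Hello from the utils example!")

# Save audio to file
save(audio, "output.mp3")

# Play audio (automatically detects environment)
play(audio)  # Works in Jupyter, regular Python, etc.

# Stream audio chunks as they arrive (requires mpv)
audio_iterator = client.tts.stream(text="Streaming playback example.")
stream(audio_iterator)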
Use play() for playback and save() for writing audio files. Learn more in the API Reference - Utils.
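The stream() utility depends on the external mpv player, which pip does not install (use your OS package manager, for example `brew install mpv` or `apt install mpv`). A pre-flight check with `shutil.which` tells you whether mpv is on your PATH before you call it; `find_player` and `has_mpv` are hypothetical helpers.

```python
import shutil
from typing import Optional


def find_player(name: str = "mpv") -> Optional[str]:
    """Return the full path of executable `name` if it is on PATH, else None."""
    return shutil.which(name)


def has_mpv() -> bool:
    """True if the mpv executable can be found on PATH."""
    return find_player("mpv") is not None
```

A simple fallback is to call stream() when `has_mpv()` is true and otherwise collect the audio and use save() or play().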

Error Handling

The SDK provides a comprehensive exception hierarchy:
from fishaudio import FishAudio
from fishaudio.exceptions import (
    FishAudioError,
    AuthenticationError,
    RateLimitError,
    ValidationError
)

client = FishAudio()

try:
    audio = client.tts.convert(text="Hello!")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded. Please wait before retrying.")
except ValidationError as e:
    print(f"Invalid request: {e}")
except FishAudioError as e:
    print(f"API error: {e}")
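AuthenticationError, RateLimitError, and ValidationError cover common failure cases, and FishAudioError, caught last in the example, acts as the catch-all. Keep the specific except clauses first, because Python runs the first clause that matches. Learn more in the API Reference - Exceptions.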
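A common way to handle RateLimitError is to retry with exponential backoff. The sketch below is a generic helper, not part of the SDK: it retries only the exception types you pass in. This guide doesn't say whether the SDK already retries internally or exposes a retry-after hint, so the delays here are arbitrary defaults.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on `retry_on` exceptions with exponential backoff.

    The wait before retry n (n = 1, 2, ...) is min(max_delay, base_delay * 2**(n - 1))
    plus up to 10% random jitter. The last failure is re-raised after `max_attempts` calls.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_attempts:
                raise  # give up: propagate the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
            attempt += 1
```

With the SDK, this might look like `retry_with_backoff(lambda: client.tts.convert(text="Hello!"), retry_on=(RateLimitError,))`. Authentication and validation errors are deliberately left out of `retry_on`, since retrying them won't help.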
