This guide helps you migrate from the legacy fish_audio_sdk (Session-based API) to the new fishaudio (client-based API) available in fish-audio-sdk v1.0+.

Quick Migration

1. Upgrade the package

pip install --upgrade fish-audio-sdk
The package name stays the same, but the import changes from fish_audio_sdk to fishaudio.
2. Update imports

# Before
from fish_audio_sdk import Session, TTSRequest, ASRRequest

# After
from fishaudio import FishAudio
from fishaudio.types import TTSConfig, ReferenceAudio
3. Replace Session with Client

# Before
session = Session("your_api_key")

# After
client = FishAudio(api_key="your_api_key")
# Or use environment variable
client = FishAudio()  # Reads from FISH_API_KEY
4. Update API calls

See the quick reference below for common operations.

Key Changes at a Glance

Legacy                   New                              Notes
Session()                FishAudio()                      Client-based architecture
session.tts()            client.tts.convert()             Returns complete audio bytes
session.asr()            client.asr.transcribe()          Clearer method name
session.create_model()   client.voices.create()           "Model" → "Voice" terminology
session.list_models()    client.voices.list()             Resource namespacing
TTSRequest(...)          Direct parameters                No request objects
WebSocketSession         client.tts.stream_websocket()    Integrated into client
HttpCodeErr              Specific exceptions              Better error handling

Text-to-Speech Migration

from fish_audio_sdk import Session, TTSRequest

session = Session("your_api_key")

# Basic TTS - returns chunks
audio = b""
for chunk in session.tts(TTSRequest(text="Hello, world!")):
    audio += chunk

with open("output.mp3", "wb") as f:
    f.write(audio)
The new SDK’s convert() returns complete audio bytes instead of chunks. Use stream() for chunk-by-chunk transfer or stream_websocket() for real-time streaming.
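For comparison, the same flow with the new client can be sketched as a small helper. The client is passed in as a parameter purely for illustration; convert() is the method named in the table above, and with a real client you would construct it via FishAudio().

```python
def synthesize_to_file(client, text, path):
    """Synthesize speech with the new client and write the audio to disk.

    convert() returns the complete audio bytes, so there is no chunk loop.
    """
    audio = client.tts.convert(text=text)
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With a real client this would be called as synthesize_to_file(FishAudio(), "Hello, world!", "output.mp3").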

Voice Cloning Migration

from fish_audio_sdk import Session, TTSRequest, ReferenceAudio

session = Session("your_api_key")

# Instant cloning
with open("voice.wav", "rb") as f:
    request = TTSRequest(
        text="Cloned voice",
        references=[ReferenceAudio(
            audio=f.read(),
            text="Reference transcript"
        )]
    )
    audio = b"".join(session.tts(request))

# Create voice model
model = session.create_model(
    title="My Voice",
    voices=[voice_data],
    texts=["Sample text"]
)

Speech-to-Text Migration

from fish_audio_sdk import Session, ASRRequest

session = Session("your_api_key")

with open("audio.mp3", "rb") as f:
    response = session.asr(ASRRequest(
        audio=f.read(),
        language="en"
    ))

print(response.text)

# Timestamps in SECONDS
for segment in response.segments:
    print(f"[{segment.start}s - {segment.end}s]")
ASR timestamps changed from seconds to milliseconds. Divide by 1000 to convert: seconds = segment.start / 1000
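The conversion is a straight divide by 1000. A tiny helper makes the unit change explicit for code that still expects seconds:

```python
def to_seconds(ms):
    """Convert a new-SDK ASR timestamp (milliseconds) back to seconds."""
    return ms / 1000

# A segment reported as start=1500, end=2750 covers 1.5s to 2.75s.
```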

WebSocket Streaming Migration

from fish_audio_sdk import WebSocketSession, TTSRequest

ws_session = WebSocketSession("your_api_key")

def text_stream():
    yield "Hello, "
    yield "streaming!"

with ws_session:
    for chunk in ws_session.tts(TTSRequest(text=""), text_stream()):
        # Process audio chunks
        pass
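With the new client, the text generator goes straight into stream_websocket() and there is no empty TTSRequest. Sketched as a helper that accumulates the audio, with the client injected for illustration:

```python
def stream_to_bytes(client, text_gen):
    """Feed a text generator to the integrated WebSocket endpoint.

    stream_websocket() yields audio chunks as text arrives; this helper
    simply concatenates them.
    """
    audio = b""
    for chunk in client.tts.stream_websocket(text_gen):
        audio += chunk
    return audio
```

In real-time applications you would typically play each chunk as it arrives rather than accumulating.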

Error Handling Migration

from fish_audio_sdk.exceptions import HttpCodeErr

try:
    audio = session.tts(request)
except HttpCodeErr as e:
    if e.status_code == 429:
        print("Rate limited")
    elif e.status_code == 401:
        print("Auth failed")

Async Support

The new SDK has full async support with AsyncFishAudio:
import asyncio
from fishaudio import AsyncFishAudio

async def main():
    client = AsyncFishAudio()

    # All methods work with await
    audio = await client.tts.convert(text="Async speech")
    result = await client.asr.transcribe(audio=audio_bytes)
    voices = await client.voices.list()

asyncio.run(main())

Breaking Changes Summary

Before: Iterator of chunks
audio = b""
for chunk in session.tts(request):
    audio += chunk
After: Complete audio bytes
audio = client.tts.convert(text="...")
Use stream() or stream_websocket() if you need chunks.
Before:
request = TTSRequest(text="...", format="mp3")
audio = session.tts(request)
After:
audio = client.tts.convert(text="...", format="mp3")
Pass parameters directly to methods.
Before: segment.start in seconds (e.g., 1.5)
After: segment.start in milliseconds (e.g., 1500)
Convert: seconds = segment.start / 1000
  • session.create_model() → client.voices.create()
  • session.list_models() → client.voices.list()
  • session.get_model() → client.voices.get()
Plus new methods: client.voices.update() and client.voices.delete()

Common Issues

If importing fishaudio fails, upgrade the package and check the installed version:
pip install --upgrade fish-audio-sdk
python -c "import fishaudio; print(fishaudio.__version__)"
If code that iterates over TTS output breaks: the new convert() returns complete audio. Use stream() for chunks:
audio_stream = client.tts.stream(text="...")
for chunk in audio_stream:
    process_chunk(chunk)
If your WebSocket code still builds an empty TTSRequest, remove it and just pass your generator:
# Before
ws_session.tts(TTSRequest(text=""), text_stream())

# After
client.tts.stream_websocket(text_stream())
If ASR timestamps look a thousand times too large: the new SDK uses milliseconds instead of seconds:
seconds = segment.start / 1000
