Text to Speech (WebSocket)
Use a two-way WebSocket to get real-time TTS audio
For better speech quality and lower latency, upload reference audio in advance via the create model endpoint. The approach below uses the Fish Audio SDK and provides a more streamlined workflow than the raw WebSocket API.
Using the Fish Audio SDK
First, make sure you have the Fish Audio SDK installed. You can install it from GitHub or PyPI.
Example Usage
This example demonstrates two ways to use the Text-to-Speech API:
- Using a reference_id: This option uses a model that you’ve previously uploaded or chosen from the playground. Replace "MODEL_ID_UPLOADED_OR_CHOSEN_FROM_PLAYGROUND" with the actual model ID.
- Using reference audio: This option allows you to provide a reference audio file and its corresponding text directly in the request.
Make sure to replace "your_api_key" with your actual API key, and adjust the file paths as needed.
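A minimal sketch of both options is shown below. It assumes the SDK exposes WebSocketSession, TTSRequest, and ReferenceAudio as in its README and streams text chunks from a local generator; check the SDK documentation for the exact class and parameter names.

```python
# Minimal sketch, assuming the SDK's WebSocketSession / TTSRequest / ReferenceAudio
# interface; verify exact signatures against the Fish Audio SDK documentation.
from fish_audio_sdk import WebSocketSession, TTSRequest, ReferenceAudio

ws_session = WebSocketSession("your_api_key")  # replace with your actual API key


def text_stream():
    """Yield text chunks; in a real application these could come from an LLM."""
    text = "Hello, this speech is generated over a real-time WebSocket session."
    for word in text.split():
        yield word + " "


# Option 1: use a model you uploaded or chose from the playground.
request_with_model = TTSRequest(
    text="",
    reference_id="MODEL_ID_UPLOADED_OR_CHOSEN_FROM_PLAYGROUND",
)
with open("output_with_model.mp3", "wb") as out:
    for audio_chunk in ws_session.tts(request_with_model, text_stream()):
        out.write(audio_chunk)

# Option 2: provide reference audio and its transcript directly in the request.
with open("reference.wav", "rb") as ref:
    request_with_reference = TTSRequest(
        text="",
        references=[ReferenceAudio(audio=ref.read(), text="Transcript of the reference audio")],
    )
with open("output_with_reference.mp3", "wb") as out:
    for audio_chunk in ws_session.tts(request_with_reference, text_stream()):
        out.write(audio_chunk)
```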
Raw WebSocket API Usage
The WebSocket API provides real-time, bidirectional communication for Text-to-Speech streaming. Here’s how the protocol works:
WebSocket Protocol
- Connection Endpoint:
  - URL: wss://api.fish.audio/v1/tts/live
- Events:
  a. start - Initializes the TTS session
  b. text - Sends text chunks
  c. audio - Receives audio data (server response)
  d. stop - Ends the session
  e. finish - Ends the session (server side)
  f. log - Logs messages from the server if debug is true
- Message Format: All messages use MessagePack encoding
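As a rough illustration of the encoding, the sketch below builds MessagePack frames for the client-side events and decodes a simulated server frame. The field names ("event", "request", "text", "audio") are assumptions inferred from the event names above, not a confirmed schema, and ormsgpack is just one of several MessagePack libraries that could be used.

```python
# Hypothetical frame layouts for the events above, encoded with ormsgpack
# (any MessagePack library would work). Field names are assumptions, not a
# confirmed schema.
import ormsgpack

# Client -> server: start the session, optionally pinning a voice model.
start_frame = ormsgpack.packb({
    "event": "start",
    "request": {
        "text": "",  # text is streamed later via "text" events
        "reference_id": "MODEL_ID_UPLOADED_OR_CHOSEN_FROM_PLAYGROUND",
    },
})

# Client -> server: stream text chunks, then close the session.
text_frame = ormsgpack.packb({"event": "text", "text": "Hello, world! "})
stop_frame = ormsgpack.packb({"event": "stop"})

# Server -> client: frames are decoded the same way. Here we simulate an
# "audio" frame; "finish" ends the session and "log" appears when debug is on.
simulated_server_frame = ormsgpack.packb({"event": "audio", "audio": b"\x00\x01", "time": 0})
decoded = ormsgpack.unpackb(simulated_server_frame)
if decoded["event"] == "audio":
    audio_bytes = decoded["audio"]
```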
Example Usage with OpenAI + MPV
This example, sketched at the end of this section, demonstrates:
- Real-time text streaming with WebSocket connection
- Handling audio chunks as they arrive
- Using MPV player for real-time audio playback
- Reference audio support for voice cloning
- Proper connection handling and cleanup
Make sure to install the required dependencies; the sketch at the end of this section uses the openai, websockets, and ormsgpack packages.
Also install the MPV player for audio playback (optional):
- Linux: apt-get install mpv
- macOS: brew install mpv
- Windows: Download from mpv.io
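Below is a hypothetical end-to-end sketch: it streams text from an OpenAI chat completion into the TTS WebSocket and pipes the returned audio into mpv as it arrives. The frame field names, the Bearer-token Authorization header, and the model and reference IDs are assumptions; adapt them to the actual API schema and your own credentials.

```python
"""
Hypothetical end-to-end sketch: stream text from an OpenAI chat completion
into the Fish Audio TTS WebSocket and pipe the returned audio into mpv.
Frame field names, the Bearer-token header, and the model IDs are assumptions.
"""
import asyncio
import subprocess

import ormsgpack
import websockets
from openai import OpenAI

FISH_API_KEY = "your_api_key"
TTS_URL = "wss://api.fish.audio/v1/tts/live"

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment


def openai_text_stream(prompt: str):
    """Yield text fragments as the chat completion streams them."""
    stream = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta


async def main() -> None:
    # mpv reads the audio stream from stdin (fd://0) and plays it as it arrives.
    mpv = subprocess.Popen(
        ["mpv", "--no-terminal", "--", "fd://0"],
        stdin=subprocess.PIPE,
    )

    # Older websockets versions use `extra_headers`; newer ones use `additional_headers`.
    async with websockets.connect(
        TTS_URL, extra_headers={"Authorization": f"Bearer {FISH_API_KEY}"}
    ) as ws:
        # Start the session (hypothetical frame layout; reference_id is optional).
        await ws.send(ormsgpack.packb({
            "event": "start",
            "request": {
                "text": "",
                "reference_id": "MODEL_ID_UPLOADED_OR_CHOSEN_FROM_PLAYGROUND",
            },
        }))

        async def send_text() -> None:
            # Pull synchronous OpenAI chunks without blocking the event loop.
            loop = asyncio.get_running_loop()
            pieces = openai_text_stream("Tell me a short story about the sea.")
            while True:
                piece = await loop.run_in_executor(None, lambda: next(pieces, None))
                if piece is None:
                    break
                await ws.send(ormsgpack.packb({"event": "text", "text": piece}))
            await ws.send(ormsgpack.packb({"event": "stop"}))

        async def receive_audio() -> None:
            async for message in ws:
                frame = ormsgpack.unpackb(message)
                if frame.get("event") == "audio":
                    mpv.stdin.write(frame["audio"])
                    mpv.stdin.flush()
                elif frame.get("event") == "finish":
                    break

        await asyncio.gather(send_text(), receive_audio())

    # Clean up the player once the session has finished.
    mpv.stdin.close()
    mpv.wait()


if __name__ == "__main__":
    asyncio.run(main())
```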