Prerequisites

Sign up for a free Fish Audio account to get started with our API.
  1. Go to fish.audio/auth/signup
  2. Fill in your details to create an account, then complete the verification steps
  3. Log in to your account and navigate to the API section
Once you have an account, you’ll need an API key to authenticate your requests.
  1. Log in to your Fish Audio Dashboard
  2. Navigate to the API Keys section
  3. Click “Create New Key”, give it a descriptive name, and set an expiration if desired
  4. Copy your key and store it securely
Keep your API key secret! Never commit it to version control or share it publicly.
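One common way to keep the key out of source code is to read it from an environment variable at startup. The sketch below assumes the variable is named FISH_API_KEY; that name and the load_api_key helper are illustrative assumptions, so check the SDK reference for the name the client actually expects.

```python
import os


def load_api_key(var_name: str = "FISH_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption; use whatever your deployment sets.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it instead of hard-coding the key"
        )
    return key
```

The returned value could then be handed to the client, for example FishAudio(api_key=load_api_key()), assuming the constructor accepts the key that way.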

Overview

Use stream_websocket() for real-time text streaming with LLMs and live captions. The connection automatically buffers incoming text and generates audio as it becomes available.

Basic Usage

Stream text chunks and receive audio in real time:
from fishaudio import FishAudio
from fishaudio.utils import play

client = FishAudio()

# Define text generator
def text_chunks():
    yield "Hello, "
    yield "this is "
    yield "real-time "
    yield "streaming!"

# Stream audio via WebSocket
audio_stream = client.tts.stream_websocket(
    text_chunks(),
    latency="balanced"  # Use "balanced" for real-time, "normal" for quality
)

# Play streamed audio
play(audio_stream)
For details on audio formats, voice selection, and advanced configuration options like TTSConfig, see the Text-to-Speech guide.
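If you want to keep the audio rather than play it, the stream can be written to disk as it arrives. This is a minimal sketch that assumes audio_stream is an iterable of bytes chunks, which this page does not confirm; save_stream is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path
from typing import Iterable


def save_stream(chunks: Iterable[bytes], path: str) -> int:
    """Write each chunk to `path` as it arrives and return the total byte count."""
    total = 0
    with Path(path).open("wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total
```

Under that assumption, save_stream(audio_stream, "output.mp3") would take the place of the play(audio_stream) call. The file extension should match whichever audio format you requested.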

Using FlushEvent

Yield a FlushEvent to force immediate audio generation for the text sent so far, for example to create a pause:
from fishaudio import FishAudio
from fishaudio.types import FlushEvent

client = FishAudio()

def text_with_flush():
    yield "First sentence. "
    yield "Second sentence. "
    yield FlushEvent()  # Forces generation NOW
    yield "Third sentence."

audio_stream = client.tts.stream_websocket(text_with_flush())
See the Text-to-Speech guide for detailed FlushEvent usage and advanced examples.
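When the text comes from another generator, flush markers can be injected by a small wrapper instead of being written by hand. The sketch below is one possible approach, not an SDK feature: flush_after_sentences and its make_flush parameter are hypothetical names, and the sentence-ending rule is deliberately simple.

```python
from typing import Callable, Iterable, Iterator

SENTENCE_ENDINGS = (".", "!", "?")


def flush_after_sentences(
    chunks: Iterable[str], make_flush: Callable[[], object]
) -> Iterator[object]:
    """Yield every chunk unchanged, followed by a flush marker
    whenever a chunk ends a sentence."""
    for chunk in chunks:
        yield chunk
        # Ignore trailing whitespace when checking for sentence punctuation.
        if chunk.rstrip().endswith(SENTENCE_ENDINGS):
            yield make_flush()
```

Passing FlushEvent as make_flush, as in client.tts.stream_websocket(flush_after_sentences(text_with_flush(), FlushEvent)), would request a flush after each sentence. Flushing that often may cost some naturalness, since the engine has less buffered context to work with, so treat the rule as a starting point.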

LLM Integration

WebSocket streaming is designed for integration with streaming LLM responses. The TTS engine automatically buffers incoming text chunks and generates audio once it has enough context for natural speech:
from fishaudio import FishAudio
from fishaudio.utils import play

client = FishAudio()

# Simulate streaming LLM response
def llm_stream():
    """Simulates text chunks from an LLM."""
    tokens = [
        "The ", "weather ", "today ", "is ", "sunny ",
        "with ", "clear ", "skies. ", "Perfect ",
        "for ", "outdoor ", "activities!"
    ]
    for token in tokens:
        yield token

# Stream to speech in real-time
audio_stream = client.tts.stream_websocket(
    llm_stream(),
    latency="balanced"
)
play(audio_stream)
Because the connection buffers text automatically, you don’t need to batch tokens manually unless you want to force generation at specific points with FlushEvent.
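Real LLM clients usually yield response objects rather than plain strings, so a small adapter is often needed before the stream is handed to stream_websocket(). The sketch below is hypothetical: text_deltas and its get_text extractor are illustrative names, and the dictionary shape in the example is made up rather than any particular provider's format.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def text_deltas(
    events: Iterable[T], get_text: Callable[[T], Optional[str]]
) -> Iterator[str]:
    """Yield the non-empty text fragment of each streamed event, in order."""
    for event in events:
        text = get_text(event)
        if text:  # skip None and empty strings (e.g. role or stop events)
            yield text


# Example with a made-up event shape standing in for an LLM response stream.
events = [{"delta": "The "}, {"delta": None}, {"delta": "weather "}, {"delta": ""}]
chunks = list(text_deltas(events, lambda e: e["delta"]))
```

The resulting generator can be passed wherever llm_stream() is used in the example above.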

Next Steps