Overview

Real-time streaming lets you generate speech as you type or speak, perfect for chatbots, virtual assistants, and live applications.

When to Use Streaming

Perfect for:
  • Live chat applications
  • Virtual assistants
  • Interactive storytelling
  • Real-time translations
  • Gaming dialogue
Not ideal for:
  • Pre-recorded content
  • Batch processing
  • When perfect quality is critical

Getting Started

Web Playground

Try real-time streaming instantly:
  1. Visit fish.audio
  2. Enable “Streaming Mode”
  3. Start typing and hear voice generation in real-time

Using the SDK

Stream text as it’s being written:
from fish_audio_sdk import WebSocketSession, TTSRequest

# Initialize WebSocket session
session = WebSocketSession("your_api_key")

# Stream text word by word
def stream_text():
    text = "Hello, this is being generated in real time"
    for word in text.split():
        yield word + " "

# Generate speech as text streams
request = TTSRequest(
    text="",  # Left empty because the text is supplied by the stream below
    reference_id="your_voice_model_id",
    temperature=0.7,  # Controls variation
    top_p=0.7  # Controls diversity
)

with open("output.mp3", "wb") as f:
    for audio_chunk in session.tts(request, stream_text()):
        f.write(audio_chunk)

Configuration Options

Speed vs Quality

Latency Modes:
  • Normal: Best quality, ~500ms latency
  • Balanced: Good quality, ~300ms latency
request = TTSRequest(
    text="",
    reference_id="model_id",
    latency="balanced"  # For faster response
)

Voice Control

Temperature (0.1 - 1.0):
  • Lower: More consistent, predictable
  • Higher: More varied, expressive
Top-p (0.1 - 1.0):
  • Lower: More focused
  • Higher: More diverse
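
For example, the same voice model can be tuned toward either end of these ranges (the exact values below are illustrative):
# More consistent, predictable delivery
stable_request = TTSRequest(
    text="",
    reference_id="your_voice_model_id",
    temperature=0.2,
    top_p=0.3
)

# More varied, expressive delivery
expressive_request = TTSRequest(
    text="",
    reference_id="your_voice_model_id",
    temperature=0.9,
    top_p=0.9
)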

Real-time Applications

Chatbot Integration

Stream responses as they’re generated:
def chatbot_response(user_input):
    # Get the AI response as a stream of text chunks
    ai_text = get_ai_response(user_input)

    # Feed the text stream into a single TTS call and play audio as it arrives
    for audio_chunk in session.tts(request, ai_text):
        play_audio(audio_chunk)

Live Translation

Translate and speak simultaneously:
def live_translate(source_audio):
    # Transcribe source audio
    text = transcribe(source_audio)
    
    # Translate text
    translated = translate(text, target_language)
    
    # Stream the translated speech word by word
    for word in translated.split():
        generate_speech(word + " ")

Best Practices

Text Buffering

Do:
  • Send complete words with spaces
  • Use punctuation for natural pauses
  • Buffer 5-10 words for smoothness (see the sketch after this list)
Don’t:
  • Send individual characters
  • Forget spaces between words
  • Send huge chunks at once
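
A minimal sketch of the buffering guideline above, grouping an incoming stream of text fragments (for example, tokens from a language model) into chunks of several words before sending them to the TTS session; the chunk size of 8 words is an illustrative middle ground:
def buffered_stream(fragments, min_words=8):
    """Accumulate incoming text fragments and emit chunks of several words."""
    pending = ""
    for fragment in fragments:
        pending += fragment
        words = pending.split(" ")
        # Emit full chunks, keeping the last (possibly incomplete) word buffered
        while len(words) > min_words:
            yield " ".join(words[:min_words]) + " "
            words = words[min_words:]
        pending = " ".join(words)
    if pending.strip():
        yield pending + " "

# Example usage with text arriving in arbitrary pieces
fragments = ["Hello, this ", "is being gen", "erated piece ", "by piece right now."]
with open("buffered_output.mp3", "wb") as f:
    for audio_chunk in session.tts(request, buffered_stream(fragments)):
        f.write(audio_chunk)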

Connection Management

  1. Keep connections alive for multiple generations
  2. Handle disconnections gracefully
  3. Implement retry logic for reliability (see the sketch below)
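
A rough sketch of the retry idea, reusing the imports and TTSRequest from the first example; the retry count and back-off are arbitrary, and a real application would catch the SDK's specific connection errors rather than a bare Exception:
import time

def tts_with_retry(text, tts_request, api_key="your_api_key", retries=3):
    """Retry a streaming TTS call, opening a fresh session if the connection drops."""
    for attempt in range(retries):
        ws_session = WebSocketSession(api_key)
        try:
            # Collect the audio for this text; a real app would play chunks as they arrive
            return b"".join(ws_session.tts(tts_request, iter([text])))
        except Exception:
            # Back off briefly before reconnecting and trying again
            time.sleep(2 ** attempt)
    raise RuntimeError("TTS failed after several retries")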

Audio Playback

For smooth playback:
  • Buffer 2-3 audio chunks
  • Use cross-fading between chunks
  • Handle network delays gracefully
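
Putting these points together, the use-case sketches below assume a small stream_speech helper along the following lines, reusing the session, request, and play_audio callback from the chatbot example and holding back a few chunks to absorb short network stalls:
def stream_speech(text, tts_request=None):
    """Generate speech for `text` over the WebSocket session and play it back."""
    tts_request = tts_request or request  # Fall back to the request defined earlier
    pending = []
    for audio_chunk in session.tts(tts_request, iter([text])):
        pending.append(audio_chunk)
        # Keep 2-3 chunks buffered so brief network delays don't cause audible gaps
        if len(pending) > 3:
            play_audio(pending.pop(0))
    # Flush the remaining chunks once generation finishes
    for audio_chunk in pending:
        play_audio(audio_chunk)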

Common Use Cases

Interactive Story

def interactive_story():
    story_parts = [
        "Once upon a time,",
        "in a land far away,",
        "there lived a brave knight..."
    ]
    
    for part in story_parts:
        # Generate and play each part
        stream_speech(part)
        # Wait for user input
        user_choice = get_user_input()
        # Continue based on choice

Virtual Assistant

def virtual_assistant():
    while True:
        # Listen for wake word
        if detect_wake_word():
            # Start streaming response
            response = process_command()
            stream_speech(response)

Live Commentary

def live_commentary(event_stream):
    for event in event_stream:
        # Generate commentary
        commentary = generate_commentary(event)
        # Stream immediately
        stream_speech(commentary)

Troubleshooting

Audio Gaps

Problem: Gaps between audio chunks
Solution:
  • Increase buffer size
  • Use balanced latency mode
  • Check network connection

Delayed Response

Problem: Long wait before audio starts
Solution:
  • Use balanced latency mode
  • Send initial text immediately
  • Reduce chunk size
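
Building on the buffered_stream sketch from Best Practices, one rough way to shorten the wait is to combine the balanced latency mode with forwarding the first text fragment immediately instead of waiting to fill a buffer (the eager_stream helper is illustrative):
fast_request = TTSRequest(
    text="",
    reference_id="your_voice_model_id",
    latency="balanced"  # Trades a little quality for a faster first chunk
)

def eager_stream(fragments):
    fragments = iter(fragments)
    first = next(fragments, None)
    if first is not None:
        # Forward the first fragment right away so audio can start sooner
        yield first
    # Buffer the rest as usual
    yield from buffered_stream(fragments)

# fragments is the incoming text stream from the buffering example
for audio_chunk in session.tts(fast_request, eager_stream(fragments)):
    play_audio(audio_chunk)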

Choppy Playback

Problem: Audio cuts in and out
Solution:
  • Buffer more chunks before playing
  • Check network stability
  • Use consistent chunk sizes

Advanced Features

Dynamic Voice Switching

Change voices mid-stream:
# Start with one voice
request1 = TTSRequest(text="", reference_id="voice1")
stream_speech("Hello from voice one.", request1)

# Switch to another
request2 = TTSRequest(text="", reference_id="voice2")
stream_speech("And now voice two!", request2)

Emotion Injection

Add emotions dynamically:
def emotional_speech(text, emotion):
    emotional_text = f"({emotion}) {text}"
    stream_speech(emotional_text)

Speed Control

Adjust speaking speed:
request = TTSRequest(
    text="",
    prosody={
        "speed": 1.5,  # 1.5x speed
        "volume": 0    # Normal volume
    }
)

Performance Tips

  1. Pre-load voices for instant start
  2. Use connection pooling for multiple streams
  3. Monitor latency and adjust settings
  4. Cache common phrases for instant playback (see the sketch below)
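
For the last tip, a simple sketch of a phrase cache, synthesizing each common phrase once over the existing session and replaying the stored audio instantly afterwards (the helper names are illustrative):
# Cache mapping a phrase to its already-generated audio bytes
phrase_cache = {}

def speak_cached(phrase, tts_request):
    if phrase not in phrase_cache:
        # First use: generate the audio once and store it
        phrase_cache[phrase] = b"".join(session.tts(tts_request, iter([phrase])))
    play_audio(phrase_cache[phrase])

# Warm the cache at startup with phrases the assistant says often
for phrase in ["One moment, please.", "I'm sorry, I didn't catch that."]:
    phrase_cache[phrase] = b"".join(session.tts(request, iter([phrase])))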

Get Support

Need help with streaming?