# Emotion Reference

Source: https://docs.fish.audio/api-reference/emotion-reference

Complete reference guide for all 64+ emotional expressions in Fish Audio

## Complete Emotion List

This reference guide provides a comprehensive list of all 64+ supported emotional expressions and voice styles available in Fish Audio's S1 TTS model.

The latest S2-Pro model supports free-form natural language emotion tags. The `(parenthesis)` syntax on this page applies to the S1 model. S2 uses `[bracket]` syntax with natural language descriptions and is not limited to a fixed set of tags. See the [Models Overview](/developer-guide/models-pricing/models-overview#s2-natural-language-control) for details.

## Basic Emotions (24)

| Emotion | Tag | Description | Example Context |
| --- | --- | --- | --- |
| Happy | `(happy)` | Cheerful, upbeat tone | Good news, greetings |
| Sad | `(sad)` | Melancholic, downcast | Sympathy, bad news |
| Angry | `(angry)` | Frustrated, aggressive | Complaints, warnings |
| Excited | `(excited)` | Energetic, enthusiastic | Announcements, celebrations |
| Calm | `(calm)` | Peaceful, relaxed | Instructions, meditation |
| Nervous | `(nervous)` | Anxious, uncertain | Disclaimers, apologies |
| Confident | `(confident)` | Assertive, self-assured | Presentations, sales |
| Surprised | `(surprised)` | Shocked, amazed | Reactions, discoveries |
| Satisfied | `(satisfied)` | Content, pleased | Confirmations, reviews |
| Delighted | `(delighted)` | Very pleased, joyful | Celebrations, compliments |
| Scared | `(scared)` | Frightened, fearful | Warnings, horror stories |
| Worried | `(worried)` | Concerned, troubled | Concerns, questions |
| Upset | `(upset)` | Disturbed, distressed | Complaints, problems |
| Frustrated | `(frustrated)` | Annoyed, exasperated | Technical issues, delays |
| Depressed | `(depressed)` | Very sad, hopeless | Serious topics |
| Empathetic | `(empathetic)` | Understanding, caring | Support, counseling |
| Embarrassed | `(embarrassed)` | Ashamed, awkward | Apologies, mistakes |
| Disgusted | `(disgusted)` | Repelled, revolted | Negative reviews |
| Moved | `(moved)` | Emotionally touched | Heartfelt moments |
| Proud | `(proud)` | Accomplished, satisfied | Achievements, praise |
| Relaxed | `(relaxed)` | At ease, casual | Casual conversation |
| Grateful | `(grateful)` | Thankful, appreciative | Thanks, appreciation |
| Curious | `(curious)` | Inquisitive, interested | Questions, exploration |
| Sarcastic | `(sarcastic)` | Ironic, mocking | Humor, criticism |

## Advanced Emotions (25)

| Emotion | Tag | Description | Example Context |
| --- | --- | --- | --- |
| Disdainful | `(disdainful)` | Contemptuous, scornful | Criticism, rejection |
| Unhappy | `(unhappy)` | Discontent, dissatisfied | Complaints, feedback |
| Anxious | `(anxious)` | Very worried, uneasy | Urgent matters |
| Hysterical | `(hysterical)` | Uncontrollably emotional | Extreme reactions |
| Indifferent | `(indifferent)` | Uncaring, neutral | Neutral responses |
| Uncertain | `(uncertain)` | Doubtful, unsure | Speculation, questions |
| Doubtful | `(doubtful)` | Skeptical, questioning | Disbelief, questioning |
| Confused | `(confused)` | Puzzled, perplexed | Clarification requests |
| Disappointed | `(disappointed)` | Let down, dissatisfied | Unmet expectations |
| Regretful | `(regretful)` | Sorry, remorseful | Apologies, mistakes |
| Guilty | `(guilty)` | Culpable, responsible | Confessions, apologies |
| Ashamed | `(ashamed)` | Deeply embarrassed | Serious mistakes |
| Jealous | `(jealous)` | Envious, resentful | Comparisons |
| Envious | `(envious)` | Wanting what others have | Admiration with desire |
| Hopeful | `(hopeful)` | Optimistic about future | Future plans |
| Optimistic | `(optimistic)` | Positive outlook | Encouragement |
| Pessimistic | `(pessimistic)` | Negative outlook | Warnings, doubts |
| Nostalgic | `(nostalgic)` | Longing for the past | Memories, stories |
| Lonely | `(lonely)` | Isolated, alone | Emotional content |
| Bored | `(bored)` | Uninterested, weary | Disinterest |
| Contemptuous | `(contemptuous)` | Showing contempt | Strong criticism |
| Sympathetic | `(sympathetic)` | Showing sympathy | Condolences |
| Compassionate | `(compassionate)` | Showing deep care | Support, help |
| Determined | `(determined)` | Resolved, decided | Goals, commitments |
| Resigned | `(resigned)` | Accepting defeat | Giving up, acceptance |

## Tone Markers (5)

| Tone | Tag | Description | When to Use |
| --- | --- | --- | --- |
| Hurried | `(in a hurry tone)` | Rushed, urgent | Time-sensitive information |
| Shouting | `(shouting)` | Loud, calling out | Getting attention |
| Screaming | `(screaming)` | Very loud, panicked | Emergencies, fear |
| Whispering | `(whispering)` | Very soft, secretive | Secrets, quiet scenes |
| Soft | `(soft tone)` | Gentle, quiet | Comfort, lullabies |

## Audio Effects (10)

| Effect | Tag | Description | Suggested Text |
| --- | --- | --- | --- |
| Laughing | `(laughing)` | Full laughter | Ha, ha, ha |
| Chuckling | `(chuckling)` | Light laugh | Heh, heh |
| Sobbing | `(sobbing)` | Crying heavily | (optional) |
| Crying Loudly | `(crying loudly)` | Intense crying | (optional) |
| Sighing | `(sighing)` | Exhale of relief/frustration | sigh |
| Groaning | `(groaning)` | Sound of frustration | ugh |
| Panting | `(panting)` | Out of breath | huff, puff |
| Gasping | `(gasping)` | Sharp intake of breath | gasp |
| Yawning | `(yawning)` | Tired sound | yawn |
| Snoring | `(snoring)` | Sleep sound | zzz |

## Special Effects

| Effect | Tag | Description |
| --- | --- | --- |
| Audience Laughter | `(audience laughing)` | Crowd laughing sound |
| Background Laughter | `(background laughter)` | Ambient laughter |
| Crowd Laughter | `(crowd laughing)` | Large group laughing |
| Short Pause | `(break)` | Brief pause in speech |
| Long Pause | `(long-break)` | Extended pause in speech |

## Usage Examples

### Single Emotion

```
(happy) What a beautiful day!
(sad) I'm sorry for your loss.
(excited) We won the championship!
```

### Combined Effects

```
(sad)(whispering) I'll miss you so much.
(angry)(shouting) Get out of here now!
(excited)(laughing) We did it! Ha ha ha!
```

### Natural Expressions

```
That's hilarious! Ha ha ha! // Natural laughter
(sighing) Sigh... what a long day.
(panting) Huff... puff... almost there!
```

## Quick Selection Guide

### For Customer Service

* **Greetings**: `(friendly)`, `(cheerful)`, `(helpful)`
* **Understanding**: `(empathetic)`, `(concerned)`, `(sympathetic)`
* **Problem-solving**: `(confident)`, `(determined)`, `(professional)`
* **Apologies**: `(apologetic)`, `(regretful)`, `(sincere)`

### For Storytelling

* **Narration**: `(narrator)`, `(calm)`, `(mysterious)`
* **Character emotions**: Any from basic/advanced lists
* **Atmosphere**: `(whispering)`, `(dramatic)`, background effects
* **Action**: `(shouting)`, `(panting)`, `(struggling)`

### For Educational Content

* **Introduction**: `(enthusiastic)`, `(welcoming)`, `(friendly)`
* **Explanations**: `(calm)`, `(clear)`, `(patient)`
* **Questions**: `(curious)`, `(encouraging)`, `(thoughtful)`
* **Praise**: `(proud)`, `(delighted)`, `(impressed)`

### For Marketing

* **Excitement**: `(excited)`, `(enthusiastic)`, `(energetic)`
* **Trust**: `(confident)`, `(professional)`, `(sincere)`
* **Urgency**: `(urgent)`, `(in a hurry tone)`, `(important)`
* **Celebration**: `(celebrating)`, `(triumphant)`, `(joyful)`

## Emotion Categories

### Positive Emotions

`(happy)` `(excited)` `(delighted)` `(satisfied)` `(proud)` `(grateful)` `(confident)` `(relaxed)` `(hopeful)` `(optimistic)` `(moved)` `(compassionate)`

### Negative Emotions

`(sad)`
`(angry)` `(frustrated)` `(depressed)` `(upset)` `(worried)` `(scared)` `(nervous)` `(disappointed)` `(regretful)` `(guilty)` `(ashamed)` `(lonely)` `(bored)`

### Neutral/Complex Emotions

`(calm)` `(curious)` `(surprised)` `(confused)` `(uncertain)` `(doubtful)` `(indifferent)` `(nostalgic)` `(sarcastic)` `(determined)` `(resigned)`

### Social/Interpersonal Emotions

`(empathetic)` `(sympathetic)` `(embarrassed)` `(jealous)` `(envious)` `(disdainful)` `(contemptuous)` `(disgusted)`

## Model Support Matrix

| Model | Basic | Advanced | Tones | Effects | Intensity |
| --- | --- | --- | --- | --- | --- |
| Fish Speech 1.5 | ✓ | Limited | ✓ | 6/10 | No |
| Fish Audio S1 | ✓ | ✓ | ✓ | ✓ | ✓ |
| Fish Audio S2-Pro | ✓ | ✓ | ✓ | ✓ | ✓ |

## Tips for Natural Speech

1. **Start Simple**: Begin with basic emotions before combining
2. **Test Variations**: Different voices handle emotions differently
3. **Context Matters**: Match emotions to content logically
4. **Less is More**: Avoid overusing emotions in short text
5. **Natural Flow**: Space out emotional changes
6. **Sound Effects**: Include appropriate text after audio tags
7. **Preview Often**: Test how emotions sound with your voice

## Common Mistakes to Avoid

* ❌ Placing emotion tags mid-sentence in English
* ❌ Forgetting parentheses around tags
* ❌ Using unsupported custom tags
* ❌ Mixing conflicting emotions
* ❌ Overusing effects in short text
* ❌ Missing text for sound effects
* ❌ Using wrong language placement rules

## See Also

* [Emotion Control Guide](/developer-guide/core-features/emotions) - Technical implementation
* [Text-to-Speech Best Practices](/developer-guide/core-features/text-to-speech)
* [API Reference](/api-reference/introduction)
* [Try it live](https://fish.audio) - Test emotions in the playground

# Create Model

Source: https://docs.fish.audio/api-reference/endpoint/model/create-model

post /model
Create a new voice model

Since this endpoint requires uploading files, it only accepts `multipart/form-data` and `application/msgpack`.

# Delete Model

Source: https://docs.fish.audio/api-reference/endpoint/model/delete-model

delete /model/{id}
Delete an existing model

# Get Model

Source: https://docs.fish.audio/api-reference/endpoint/model/get-model

get /model/{id}
Get details of a specific model

# List Models

Source: https://docs.fish.audio/api-reference/endpoint/model/list-models

get /model
Get a list of all models

# Update Model

Source: https://docs.fish.audio/api-reference/endpoint/model/update-model

patch /model/{id}
Update an existing model

# Speech to Text

Source: https://docs.fish.audio/api-reference/endpoint/openapi-v1/speech-to-text

post /v1/asr
Transcribe audio to text

This BETA endpoint only accepts `multipart/form-data` and `application/msgpack`.

# Text to Speech

Source: https://docs.fish.audio/api-reference/endpoint/openapi-v1/text-to-speech

post /v1/tts
Convert text to speech

This endpoint only accepts `application/json` and `application/msgpack`.

For best results, upload reference audio using the [create model](/api-reference/endpoint/model/create-model) endpoint before using this one.
This improves speech quality and reduces latency.

To upload audio clips directly, without pre-uploading, serialize the request body with MessagePack as per the [instructions](/developer-guide/core-features/text-to-speech#direct-api-usage).

Audio formats supported:

* WAV / PCM
  * Sample Rate: 8kHz, 16kHz, 24kHz, 32kHz, 44.1kHz
  * Default Sample Rate: 44.1kHz
  * 16-bit, mono
* MP3
  * Sample Rate: 32kHz, 44.1kHz
  * Default Sample Rate: 44.1kHz
  * mono
  * Bitrate: 64kbps, 128kbps (default), 192kbps
* Opus
  * Sample Rate: 48kHz
  * Default Sample Rate: 48kHz
  * mono
  * Bitrate: -1000 (auto), 24kbps, 32kbps (default), 48kbps, 64kbps

# Get API Credit

Source: https://docs.fish.audio/api-reference/endpoint/wallet/get-api-credit

get /wallet/{user_id}/api-credit
Get current API credit balance

# Get User Premium

Source: https://docs.fish.audio/api-reference/endpoint/wallet/get-user-package

get /wallet/{user_id}/package
Get current user premium information

# WebSocket TTS Streaming

Source: https://docs.fish.audio/api-reference/endpoint/websocket/tts-live

Real-time text-to-speech streaming via WebSocket

The WebSocket TTS endpoint enables bidirectional streaming for low-latency text-to-speech generation with MessagePack serialization.

The `request` payload inside `StartEvent` uses the same parameters as the HTTP [Text to Speech API](/api-reference/endpoint/openapi-v1/text-to-speech). For more detailed field guidance, model-specific behavior, and examples, see that page.

In WebSocket mode, `request.text` is typically empty in `StartEvent`, and the text content is sent through subsequent `TextEvent` messages.

# Introduction

Source: https://docs.fish.audio/api-reference/introduction

How to use the Fish Audio API

## Welcome

You can generate a new API key at [https://fish.audio/app/api-keys/](https://fish.audio/app/api-keys/).

## Quick Start

See our [Quick Start](/developer-guide/getting-started/quickstart) guide to generate audio in under 2 minutes.
## Create a Voice Clone

Use our [/model endpoint](/api-reference/endpoint/model/create-model) to create a voice clone model.

## Generate Speech

Use our [/v1/tts endpoint](/api-reference/endpoint/openapi-v1/text-to-speech) to generate speech.

## Real-time Streaming

Use our [Python SDK](/developer-guide/sdk-guide/python/websocket) or [JavaScript SDK](/developer-guide/sdk-guide/javascript/websocket) for real-time audio streaming with WebSocket.

## Rate Limits

You can find the rate limits for each endpoint in the [Rate Limits](/developer-guide/models-pricing/pricing-and-rate-limits) section.

# API Reference

Source: https://docs.fish.audio/api-reference/sdk/javascript/api-reference

Complete reference for Fish Audio JavaScript SDK

## Client

Import and initialize the client:

```typescript theme={null}
import { FishAudioClient } from "fish-audio";

const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });
```

## Text to Speech

### convert()

Generate speech from text.

```typescript theme={null}
const audio = await fishAudio.textToSpeech.convert({ text: "Hello" });
```

Parameters: `request` (TTSRequest), `model?` (Backends)

Returns: `Promise` of the audio data

### convertRealtime()

Realtime streaming TTS over WebSocket.

```typescript theme={null}
async function* textStream() {
  yield "Hello, ";
  yield "world!";
}

const conn = await fishAudio.textToSpeech.convertRealtime({ text: "" }, textStream());
```

Parameters: `request` (TTSRequest with `text: ""`), `textStream` (`AsyncIterable`), `backend?` (Backends)

Returns: `RealtimeConnection` (`EventEmitter`-like connection) emitting `RealtimeEvents`

## Speech to Text

### convert()

Transcribe audio to text.

```typescript theme={null}
const res = await fishAudio.speechToText.convert({ audio: myAudio });
console.log(res.text);
```

Parameters: `request` (STTRequest)

Returns: `STTResponse`

## Voices

### search()

List/search available voice models.

```typescript theme={null}
const results = await fishAudio.voices.search();
```

Parameters: `request?` (ModelListRequest)

Returns: `ModelListResponse`

### get()

Get model details.

```typescript theme={null}
const model = await fishAudio.voices.get("model_id");
```

Parameters: `voiceId` (string)

Returns: `ModelEntity`

### ivc.create()

Create a new voice model from audio samples.

```typescript theme={null}
const res = await fishAudio.voices.ivc.create({ title, voices: [file], cover_image: file });
```

Parameters: `request` (ModelCreateRequest)

Returns: `ModelEntity`

### update()

Update model metadata.

```typescript theme={null}
await fishAudio.voices.update("model_id", { title: "New Title" });
```

Parameters: `voiceId` (string), `request` (UpdateModelRequest)

Returns: `UpdateVoiceResponse`

### delete()

Delete a model.

```typescript theme={null}
await fishAudio.voices.delete("model_id");
```

Parameters: `voiceId` (string)

Returns: `DeleteVoiceResponse`

## User

### get\_api\_credit()

Check API credit balance.

```typescript theme={null}
await fishAudio.user.get_api_credit();
```

Returns: `APICreditResponse`

### get\_package()

Get subscription package details.

```typescript theme={null}
await fishAudio.user.get_package();
```

Returns: `PackageResponse`

## Request Classes

### TTSRequest

Text-to-speech parameters.

```typescript theme={null}
{
  text: "Hello",
  reference_id: "model_id",
  references: [{ audio: File, text: "sample" }],
  format: "mp3",
  prosody: { speed: 1.0, volume: 0 },
}
```

Fields: `text`, `reference_id`, `references`, `format`, `mp3_bitrate`, `opus_bitrate`, `sample_rate`, `prosody`, `latency`, `chunk_length`, `normalize`, `temperature`, `top_p`

### STTRequest

Speech-to-text parameters.

```typescript theme={null}
{ audio: File, language?: "en", ignore_timestamps?: boolean }
```

Fields: `audio`, `language?`, `ignore_timestamps?`

### ReferenceAudio

Reference audio for voice cloning.

```typescript theme={null}
{ audio: File, text: "spoken text" }
```

Fields: `audio`, `text`

### Prosody

Speed and volume control.

```typescript theme={null}
{ speed: 1.2, volume: 5 }
```

Fields: `speed` (0.5–2.0), `volume` (-20 to 20)

### Backends

The backend model to use.

```typescript theme={null}
Backends = 's1' | 's2-pro';
```

## Response Classes

### STTResponse

Transcription result.

```typescript theme={null}
response.text     // Complete transcription
response.duration // Duration in seconds
response.segments // ASRSegment[]
```

### ASRSegment

Timestamped text segment.

Fields: `text` (string), `start` (number, seconds), `end` (number, seconds)

### ModelEntity

Voice model information.

Fields: `_id`, `title`, `description`, `visibility`, `created_at`, `updated_at`, `tags`

### ModelListResponse

List response for voices.

Fields: `items` (`ModelEntity[]`), `total` (number)

### APICreditResponse

API credit information.
Fields: `_id` (string), `user_id` (string), `credit` (string), `created_at` (string), `updated_at` (string), `has_phone_sha256` (boolean), `has_free_credit?` (boolean)

### PackageResponse

Subscription package details.

Fields: `user_id` (string), `type` (string), `total` (number), `balance` (number), `created_at` (string), `updated_at` (string), `finished_at` (string)

## WebSocket Classes

### RealtimeEvents

Events emitted by `convertRealtime` connections.

| Event | Meaning |
| --- | --- |
| `OPEN` | Connection established |
| `AUDIO_CHUNK` | Audio chunk received |
| `ERROR` | Error occurred |
| `CLOSE` | Connection closed |

## Event Classes

### StartEvent

Stream start event.

Fields: `event` ("start"), `request` (TTSRequest)

### TextEvent

Text chunk event.

Fields: `event` ("text"), `text` (string)

### FlushEvent

Flush text chunks event.

Fields: `event` ("flush")

### CloseEvent

Stream close event.

Fields: `event` ("stop")

## Exceptions

### FishAudioError

Generic error with status code, body, rawResponse.

### FishAudioTimeoutError

Connection timeout error.

# Client

Source: https://docs.fish.audio/api-reference/sdk/python/client

# fishaudio.client

Main Fish Audio client classes.

## FishAudio Objects

```python theme={null}
class FishAudio()
```

Synchronous Fish Audio API client.

**Example**:

```python theme={null}
from fishaudio import FishAudio

client = FishAudio(api_key="your_api_key")

# Generate speech
audio = client.tts.convert(text="Hello world")
with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

# List voices
voices = client.voices.list(page_size=20)
print(f"Found {voices.total} voices")
```

#### \_\_init\_\_

```python theme={null}
def __init__(*,
             api_key: Optional[str] = None,
             base_url: str = "https://api.fish.audio",
             timeout: float = 240.0,
             httpx_client: Optional[httpx.Client] = None)
```

Initialize Fish Audio client.

**Arguments**:

* `api_key` - API key (can also use FISH\_API\_KEY env var)
* `base_url` - API base URL
* `timeout` - Request timeout in seconds
* `httpx_client` - Optional custom HTTP client

#### tts

```python theme={null}
@property
def tts() -> TTSClient
```

Access TTS (text-to-speech) operations.

#### asr

```python theme={null}
@property
def asr() -> ASRClient
```

Access ASR (speech-to-text) operations.

#### voices

```python theme={null}
@property
def voices() -> VoicesClient
```

Access voice management operations.

#### account

```python theme={null}
@property
def account() -> AccountClient
```

Access account/billing operations.

#### close

```python theme={null}
def close() -> None
```

Close the HTTP client.

## AsyncFishAudio Objects

```python theme={null}
class AsyncFishAudio()
```

Asynchronous Fish Audio API client.

**Example**:

```python theme={null}
import asyncio

import aiofiles

from fishaudio import AsyncFishAudio

async def main():
    client = AsyncFishAudio(api_key="your_api_key")

    # Generate speech
    audio = await client.tts.convert(text="Hello world")
    async with aiofiles.open("output.mp3", "wb") as f:
        async for chunk in audio:
            await f.write(chunk)

    # List voices
    voices = await client.voices.list(page_size=20)
    print(f"Found {voices.total} voices")

asyncio.run(main())
```

#### \_\_init\_\_

```python theme={null}
def __init__(*,
             api_key: Optional[str] = None,
             base_url: str = "https://api.fish.audio",
             timeout: float = 240.0,
             httpx_client: Optional[httpx.AsyncClient] = None)
```

Initialize async Fish Audio client.

**Arguments**:

* `api_key` - API key (can also use FISH\_API\_KEY env var)
* `base_url` - API base URL
* `timeout` - Request timeout in seconds
* `httpx_client` - Optional custom async HTTP client

#### tts

```python theme={null}
@property
def tts() -> AsyncTTSClient
```

Access TTS (text-to-speech) operations.

#### asr

```python theme={null}
@property
def asr() -> AsyncASRClient
```

Access ASR (speech-to-text) operations.
#### voices

```python theme={null}
@property
def voices() -> AsyncVoicesClient
```

Access voice management operations.

#### account

```python theme={null}
@property
def account() -> AsyncAccountClient
```

Access account/billing operations.

#### close

```python theme={null}
async def close() -> None
```

Close the HTTP client.

# Core

Source: https://docs.fish.audio/api-reference/sdk/python/core

# fishaudio.core.client\_wrapper

HTTP client wrapper for managing requests and authentication.

## BaseClientWrapper Objects

```python theme={null}
class BaseClientWrapper()
```

Base wrapper with shared logic for sync/async clients.

#### get\_headers

```python theme={null}
def get_headers(
        additional_headers: Optional[dict[str, str]] = None) -> dict[str, str]
```

Build headers including authentication and user agent.

## ClientWrapper Objects

```python theme={null}
class ClientWrapper(BaseClientWrapper)
```

Wrapper for httpx.Client that handles authentication and error handling.

#### request

```python theme={null}
def request(method: str,
            path: str,
            *,
            request_options: Optional[RequestOptions] = None,
            **kwargs: Any) -> httpx.Response
```

Make an HTTP request with error handling.

**Arguments**:

* `method` - HTTP method (GET, POST, etc.)
* `path` - API endpoint path
* `request_options` - Optional request-level overrides
* `**kwargs` - Additional arguments to pass to httpx.request

**Returns**:

httpx.Response object

**Raises**:

* `APIError` - On non-2xx responses

#### client

```python theme={null}
@property
def client() -> httpx.Client
```

Get underlying httpx.Client for advanced usage (e.g., WebSockets).

#### close

```python theme={null}
def close() -> None
```

Close the HTTP client.

## AsyncClientWrapper Objects

```python theme={null}
class AsyncClientWrapper(BaseClientWrapper)
```

Wrapper for httpx.AsyncClient that handles authentication and error handling.

#### request

```python theme={null}
async def request(method: str,
                  path: str,
                  *,
                  request_options: Optional[RequestOptions] = None,
                  **kwargs: Any) -> httpx.Response
```

Make an async HTTP request with error handling.

**Arguments**:

* `method` - HTTP method (GET, POST, etc.)
* `path` - API endpoint path
* `request_options` - Optional request-level overrides
* `**kwargs` - Additional arguments to pass to httpx.request

**Returns**:

httpx.Response object

**Raises**:

* `APIError` - On non-2xx responses

#### client

```python theme={null}
@property
def client() -> httpx.AsyncClient
```

Get underlying httpx.AsyncClient for advanced usage (e.g., WebSockets).

#### close

```python theme={null}
async def close() -> None
```

Close the HTTP client.

# fishaudio.core.request\_options

Request-level options for API calls.

## RequestOptions Objects

```python theme={null}
class RequestOptions()
```

Options that can be provided on a per-request basis to override client defaults.

**Attributes**:

* `timeout` - Override the client's default timeout (in seconds)
* `max_retries` - Override the client's default max retries
* `additional_headers` - Additional headers to include in the request
* `additional_query_params` - Additional query parameters to include

#### get\_timeout

```python theme={null}
def get_timeout() -> Optional[httpx.Timeout]
```

Convert timeout to httpx.Timeout if set.

# fishaudio.core.iterators

Audio stream wrappers with collection utilities.

## AudioStream Objects

```python theme={null}
class AudioStream()
```

Wrapper for sync audio byte streams with collection utilities.

This class wraps an iterator of audio bytes and provides a convenient `.collect()` method to gather all chunks into a single bytes object.
**Examples**:

```python theme={null}
from fishaudio import FishAudio

client = FishAudio(api_key="...")

# Collect all audio at once
audio = client.tts.stream(text="Hello!").collect()

# Or stream chunks manually
for chunk in client.tts.stream(text="Hello!"):
    process_chunk(chunk)
```

#### \_\_init\_\_

```python theme={null}
def __init__(iterator: Iterator[bytes])
```

Initialize the audio iterator wrapper.

**Arguments**:

* `iterator` - The underlying iterator of audio bytes

#### \_\_iter\_\_

```python theme={null}
def __iter__() -> Iterator[bytes]
```

Allow direct iteration over audio chunks.

#### collect

```python theme={null}
def collect() -> bytes
```

Collect all audio chunks into a single bytes object.

This consumes the iterator and returns all audio data as bytes. After calling this method, the iterator cannot be used again.

**Returns**:

Complete audio data as bytes

**Examples**:

```python theme={null}
audio = client.tts.stream(text="Hello!").collect()
with open("output.mp3", "wb") as f:
    f.write(audio)
```

## AsyncAudioStream Objects

```python theme={null}
class AsyncAudioStream()
```

Wrapper for async audio byte streams with collection utilities.

This class wraps an async iterator of audio bytes and provides a convenient `.collect()` method to gather all chunks into a single bytes object.

**Examples**:

```python theme={null}
from fishaudio import AsyncFishAudio

client = AsyncFishAudio(api_key="...")

# Collect all audio at once
stream = await client.tts.stream(text="Hello!")
audio = await stream.collect()

# Or stream chunks manually
async for chunk in await client.tts.stream(text="Hello!"):
    await process_chunk(chunk)
```

#### \_\_init\_\_

```python theme={null}
def __init__(async_iterator: AsyncIterator[bytes])
```

Initialize the async audio iterator wrapper.

**Arguments**:

* `async_iterator` - The underlying async iterator of audio bytes

#### \_\_aiter\_\_

```python theme={null}
def __aiter__() -> AsyncIterator[bytes]
```

Allow direct async iteration over audio chunks.

#### collect

```python theme={null}
async def collect() -> bytes
```

Collect all audio chunks into a single bytes object.

This consumes the async iterator and returns all audio data as bytes. After calling this method, the iterator cannot be used again.

**Returns**:

Complete audio data as bytes

**Examples**:

```python theme={null}
stream = await client.tts.stream(text="Hello!")
audio = await stream.collect()
with open("output.mp3", "wb") as f:
    f.write(audio)
```

# fishaudio.core.websocket\_options

WebSocket-level options for WebSocket connections.

## WebSocketOptions Objects

```python theme={null}
class WebSocketOptions()
```

Options for configuring WebSocket connections.

These options are passed directly to httpx\_ws's connect\_ws/aconnect\_ws functions. For complete documentation, see [https://frankie567.github.io/httpx-ws/reference/httpx\_ws/](https://frankie567.github.io/httpx-ws/reference/httpx_ws/)

**Attributes**:

* `keepalive_ping_timeout_seconds` - Maximum delay the client will wait for an answer to its Ping event. If the delay is exceeded, WebSocketNetworkError will be raised and the connection closed. Default: 20 seconds.
* `keepalive_ping_interval_seconds` - Interval at which the client will automatically send a Ping event to keep the connection alive. Set to None to disable this mechanism. Default: 20 seconds.
* `max_message_size_bytes` - Message size in bytes to receive from the server. Default: 65536 bytes (64 KiB).
* `queue_size` - Size of the queue where received messages will be held until they are consumed. If the queue is full, the client will stop receiving messages from the server until the queue has room available. Default: 512.

**Notes**:

Parameter descriptions adapted from httpx\_ws documentation.
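To make the attribute-to-kwargs mapping concrete, here is an illustrative, self-contained mirror of `WebSocketOptions` (attribute names and defaults copied from the list above; this is a sketch of the documented behavior, not the SDK's actual implementation):

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class WebSocketOptionsSketch:
    """Illustrative stand-in for fishaudio's WebSocketOptions."""

    keepalive_ping_timeout_seconds: Optional[float] = 20.0
    keepalive_ping_interval_seconds: Optional[float] = 20.0
    max_message_size_bytes: int = 65536  # 64 KiB
    queue_size: int = 512

    def to_httpx_ws_kwargs(self) -> dict[str, Any]:
        # httpx_ws's connect_ws/aconnect_ws accept these keyword names directly.
        return {
            "keepalive_ping_timeout_seconds": self.keepalive_ping_timeout_seconds,
            "keepalive_ping_interval_seconds": self.keepalive_ping_interval_seconds,
            "max_message_size_bytes": self.max_message_size_bytes,
            "queue_size": self.queue_size,
        }
```

A larger `queue_size` lets the client buffer more incoming messages before back-pressure pauses receipt from the server; setting `keepalive_ping_interval_seconds` to `None` disables keepalive pings entirely.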
#### to\_httpx\_ws\_kwargs

```python theme={null}
def to_httpx_ws_kwargs() -> dict[str, Any]
```

Convert to kwargs dict for httpx\_ws aconnect\_ws/connect\_ws.

# fishaudio.core.omit

OMIT sentinel for distinguishing None from not-provided parameters.

# Exceptions

Source: https://docs.fish.audio/api-reference/sdk/python/exceptions

# fishaudio.exceptions

Custom exceptions for the Fish Audio SDK.

## FishAudioError Objects

```python theme={null}
class FishAudioError(Exception)
```

Base exception for all Fish Audio SDK errors.

## APIError Objects

```python theme={null}
class APIError(FishAudioError)
```

Raised when the API returns an error response.

## AuthenticationError Objects

```python theme={null}
class AuthenticationError(APIError)
```

Raised when authentication fails (401).

## PermissionError Objects

```python theme={null}
class PermissionError(APIError)
```

Raised when permission is denied (403).

## NotFoundError Objects

```python theme={null}
class NotFoundError(APIError)
```

Raised when a resource is not found (404).

## RateLimitError Objects

```python theme={null}
class RateLimitError(APIError)
```

Raised when rate limit is exceeded (429).

## ServerError Objects

```python theme={null}
class ServerError(APIError)
```

Raised when the server encounters an error (5xx).

## WebSocketError Objects

```python theme={null}
class WebSocketError(FishAudioError)
```

Raised when WebSocket connection or streaming fails.

## ValidationError Objects

```python theme={null}
class ValidationError(FishAudioError)
```

Raised when request validation fails.

## DependencyError Objects

```python theme={null}
class DependencyError(FishAudioError)
```

Raised when a required dependency is missing.
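The hierarchy above maps cleanly onto HTTP status codes. A minimal, self-contained sketch of that dispatch (class names and status codes come from this page; the `error_for_status` helper itself is an illustration, not the SDK's internal code):

```python
class FishAudioError(Exception):
    """Base exception for all Fish Audio SDK errors."""


class APIError(FishAudioError):
    """The API returned an error response."""


class AuthenticationError(APIError):
    """Authentication failed (401)."""


class PermissionError(APIError):  # shadows the builtin, matching the documented name
    """Permission denied (403)."""


class NotFoundError(APIError):
    """Resource not found (404)."""


class RateLimitError(APIError):
    """Rate limit exceeded (429)."""


class ServerError(APIError):
    """Server-side error (5xx)."""


def error_for_status(status: int) -> type[APIError]:
    """Pick the documented exception class for a non-2xx status code."""
    specific = {
        401: AuthenticationError,
        403: PermissionError,
        404: NotFoundError,
        429: RateLimitError,
    }
    if status in specific:
        return specific[status]
    if 500 <= status < 600:
        return ServerError
    return APIError
```

Because everything derives from `FishAudioError`, callers can catch the base class for blanket handling, or catch `RateLimitError` specifically to back off and retry.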
# Overview

Source: https://docs.fish.audio/api-reference/sdk/python/overview

Fish Audio Python SDK for text-to-speech and voice cloning

![python.png](https://raw.githubusercontent.com/fishaudio/fish-audio-python/refs/heads/main/.github/assets/python.png)

# Fish Audio Python SDK

[![PyPI version](https://img.shields.io/pypi/v/fish-audio-sdk.svg)](https://badge.fury.io/py/fish-audio-sdk)
[![Python Version](https://img.shields.io/badge/python-3.9+-blue)](https://pypi.org/project/fish-audio-sdk/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/fish-audio-sdk)](https://pypi.org/project/fish-audio-sdk/)
[![codecov](https://img.shields.io/codecov/c/github/fishaudio/fish-audio-python)](https://codecov.io/gh/fishaudio/fish-audio-python)
[![License](https://img.shields.io/github/license/fishaudio/fish-audio-python)](https://github.com/fishaudio/fish-audio-python/blob/main/LICENSE)

The official Python library for the Fish Audio API

**Documentation:** [Python SDK Guide](https://docs.fish.audio/developer-guide/sdk-guide/python/) | [API Reference](https://docs.fish.audio/api-reference/sdk/python/)

> \[!IMPORTANT]
>
> ## Changes to PyPI Versioning
>
> For existing users of the Fish Audio Python SDK, please note that the starting version is now `1.0.0`. The last version before this was `2025.6.3`. You may need to adjust your version constraints accordingly.
>
> The original API in the `fish_audio_sdk` package has NOT been removed, but you will not receive any updates if you continue using the old versioning scheme.
>
> The simplest fix is to update your dependency to `fish-audio-sdk>=1.0.0` to continue receiving updates, or to pin a specific version like `fish-audio-sdk==1.0.0` when installing via your package manager. There are no changes to the API itself in this transition.
>
> If you're using the legacy `fish_audio_sdk` and would like to switch to the newer, more robust `fishaudio` package, see the [migration guide](https://docs.fish.audio/archive/python-sdk-legacy/migration-guide) to upgrade.

## Installation

```bash theme={null}
pip install fish-audio-sdk

# With audio playback utilities
pip install fish-audio-sdk[utils]
```

## Authentication

Get your API key from [fish.audio/app/api-keys](https://fish.audio/app/api-keys):

```bash theme={null}
export FISH_API_KEY=your_api_key_here
```

Or provide it directly:

```python theme={null}
from fishaudio import FishAudio

client = FishAudio(api_key="your_api_key")
```

## Quick Start

**Synchronous:**

```python theme={null}
from fishaudio import FishAudio
from fishaudio.utils import play, save

client = FishAudio()

# Generate audio
audio = client.tts.convert(text="Hello, world!")

# Play or save
play(audio)
save(audio, "output.mp3")
```

**Asynchronous:**

```python theme={null}
import asyncio

from fishaudio import AsyncFishAudio
from fishaudio.utils import play, save

async def main():
    client = AsyncFishAudio()
    audio = await client.tts.convert(text="Hello, world!")
    play(audio)
    save(audio, "output.mp3")

asyncio.run(main())
```

## Core Features

### Text-to-Speech

**With custom voice:**

```python theme={null}
# Use a specific voice by ID
audio = client.tts.convert(
    text="Custom voice",
    reference_id="802e3bc2b27e49c2995d23ef70e6ac89"
)
```

**With speed control:**

```python theme={null}
audio = client.tts.convert(
    text="Speaking faster!",
    speed=1.5  # 1.5x speed
)
```

**Reusable configuration:**

```python theme={null}
from fishaudio.types import TTSConfig, Prosody

config = TTSConfig(
    prosody=Prosody(speed=1.2, volume=-5),
    reference_id="933563129e564b19a115bedd57b7406a",
    format="wav",
    latency="balanced"
)

# Reuse across generations
audio1 = client.tts.convert(text="First message", config=config)
audio2 = client.tts.convert(text="Second message", config=config)
```

**Chunk-by-chunk processing:**

```python theme={null}
# Stream and process chunks as they arrive
for chunk in client.tts.stream(text="Long content..."):
    send_to_websocket(chunk)

# Or collect all chunks
audio = client.tts.stream(text="Hello!").collect()
```

[Learn more](https://docs.fish.audio/developer-guide/sdk-guide/python/text-to-speech)

### Speech-to-Text

```python theme={null}
# Transcribe audio
with open("audio.wav", "rb") as f:
    result = client.asr.transcribe(audio=f.read(), language="en")

print(result.text)

# Access timestamped segments
for segment in result.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")
```

[Learn more](https://docs.fish.audio/developer-guide/sdk-guide/python/speech-to-text)

### Real-time Streaming

Stream dynamically generated text for conversational AI and live applications:

**Synchronous:**

```python theme={null}
def text_chunks():
    yield "Hello, "
    yield "this is "
    yield "streaming!"

audio_stream = client.tts.stream_websocket(text_chunks(), latency="balanced")
play(audio_stream)
```

**Asynchronous:**

```python theme={null}
async def text_chunks():
    yield "Hello, "
    yield "this is "
    yield "streaming!"
audio_stream = await client.tts.stream_websocket(text_chunks(), latency="balanced") play(audio_stream) ``` [Learn more](https://docs.fish.audio/developer-guide/sdk-guide/python/websocket) ### Voice Cloning **Instant cloning:** ```python theme={null} from fishaudio.types import ReferenceAudio # Clone voice on-the-fly with open("reference.wav", "rb") as f: audio = client.tts.convert( text="Cloned voice speaking", references=[ReferenceAudio( audio=f.read(), text="Text spoken in reference" )] ) ``` **Persistent voice models:** ```python theme={null} # Create voice model for reuse with open("voice_sample.wav", "rb") as f: voice = client.voices.create( title="My Voice", voices=[f.read()], description="Custom voice clone" ) # Use the created model audio = client.tts.convert( text="Using my saved voice", reference_id=voice.id ) ``` [Learn more](https://docs.fish.audio/developer-guide/sdk-guide/python/voice-cloning) ## Resource Clients | Resource | Description | Key Methods | | ---------------- | ------------------ | ----------------------------------------------------- | | `client.tts` | Text-to-speech | `convert()`, `stream()`, `stream_websocket()` | | `client.asr` | Speech recognition | `transcribe()` | | `client.voices` | Voice management | `list()`, `get()`, `create()`, `update()`, `delete()` | | `client.account` | Account info | `get_credits()`, `get_package()` | ## Error Handling ```python theme={null} from fishaudio.exceptions import ( AuthenticationError, RateLimitError, ValidationError, FishAudioError ) try: audio = client.tts.convert(text="Hello!") except AuthenticationError: print("Invalid API key") except RateLimitError: print("Rate limit exceeded") except ValidationError as e: print(f"Invalid request: {e}") except FishAudioError as e: print(f"API error: {e}") ``` ## Resources * **Documentation:** [SDK Guide](https://docs.fish.audio/developer-guide/sdk-guide/python/) | [API Reference](https://docs.fish.audio/api-reference/sdk/python/) * **Package:** 
[PyPI](https://pypi.org/project/fish-audio-sdk/) | [GitHub](https://github.com/fishaudio/fish-audio-python) * **Legacy SDK:** [Documentation](https://docs.fish.audio/archive/python-sdk-legacy) | [Migration Guide](https://docs.fish.audio/archive/python-sdk-legacy/migration-guide) ## License This project is licensed under the Apache-2.0 License - see the [LICENSE](LICENSE) file for details. # Resources Source: https://docs.fish.audio/api-reference/sdk/python/resources # fishaudio.resources.voices Voice management namespace client. ## VoicesClient Objects ```python theme={null} class VoicesClient() ``` Synchronous voice management operations. #### list ```python theme={null} def list( *, page_size: int = 10, page_number: int = 1, title: Optional[str] = OMIT, tags: Optional[Union[list[str], str]] = OMIT, self_only: bool = False, author_id: Optional[str] = OMIT, language: Optional[Union[list[str], str]] = OMIT, title_language: Optional[Union[list[str], str]] = OMIT, sort_by: str = "task_count", request_options: Optional[RequestOptions] = None ) -> PaginatedResponse[Voice] ``` List available voices/models. 
**Arguments**: * `page_size` - Number of results per page * `page_number` - Page number (1-indexed) * `title` - Filter by title * `tags` - Filter by tags (single tag or list) * `self_only` - Only return user's own voices * `author_id` - Filter by author ID * `language` - Filter by language(s) * `title_language` - Filter by title language(s) * `sort_by` - Sort field ("task\_count" or "created\_at") * `request_options` - Request-level overrides **Returns**: Paginated response with total count and voice items **Example**: ```python theme={null} client = FishAudio(api_key="...") # List all voices voices = client.voices.list(page_size=20) print(f"Total: {voices.total}") for voice in voices.items: print(f"{voice.title}: {voice.id}") # Filter by tags tagged = client.voices.list(tags=["male", "english"]) ``` #### get ```python theme={null} def get(voice_id: str, *, request_options: Optional[RequestOptions] = None) -> Voice ``` Get voice by ID. **Arguments**: * `voice_id` - Voice model ID * `request_options` - Request-level overrides **Returns**: Voice model details **Example**: ```python theme={null} client = FishAudio(api_key="...") voice = client.voices.get("voice_id_here") print(voice.title, voice.description) ``` #### create ```python theme={null} def create(*, title: str, voices: builtins.list[bytes], description: Optional[str] = OMIT, texts: Optional[builtins.list[str]] = OMIT, tags: Optional[builtins.list[str]] = OMIT, cover_image: Optional[bytes] = OMIT, visibility: Visibility = "private", train_mode: str = "fast", enhance_audio_quality: bool = True, request_options: Optional[RequestOptions] = None) -> Voice ``` Create/clone a new voice. 
**Arguments**: * `title` - Voice model name * `voices` - List of audio file bytes for training * `description` - Voice description * `texts` - Transcripts for voice samples * `tags` - Tags for categorization * `cover_image` - Cover image bytes * `visibility` - Visibility setting (public, unlist, private) * `train_mode` - Training mode (currently only "fast" supported) * `enhance_audio_quality` - Whether to enhance audio quality * `request_options` - Request-level overrides **Returns**: Created voice model **Example**: ```python theme={null} client = FishAudio(api_key="...") with open("voice1.wav", "rb") as f1, open("voice2.wav", "rb") as f2: voice = client.voices.create( title="My Voice", voices=[f1.read(), f2.read()], description="Custom voice clone", tags=["custom", "english"] ) print(f"Created: {voice.id}") ``` #### update ```python theme={null} def update(voice_id: str, *, title: Optional[str] = OMIT, description: Optional[str] = OMIT, cover_image: Optional[bytes] = OMIT, visibility: Optional[Visibility] = OMIT, tags: Optional[builtins.list[str]] = OMIT, request_options: Optional[RequestOptions] = None) -> None ``` Update voice metadata. **Arguments**: * `voice_id` - Voice model ID * `title` - New title * `description` - New description * `cover_image` - New cover image bytes * `visibility` - New visibility setting * `tags` - New tags * `request_options` - Request-level overrides **Example**: ```python theme={null} client = FishAudio(api_key="...") client.voices.update( "voice_id_here", title="Updated Title", visibility="public" ) ``` #### delete ```python theme={null} def delete(voice_id: str, *, request_options: Optional[RequestOptions] = None) -> None ``` Delete a voice. 
**Arguments**: * `voice_id` - Voice model ID * `request_options` - Request-level overrides **Example**: ```python theme={null} client = FishAudio(api_key="...") client.voices.delete("voice_id_here") ``` ## AsyncVoicesClient Objects ```python theme={null} class AsyncVoicesClient() ``` Asynchronous voice management operations. #### list ```python theme={null} async def list( *, page_size: int = 10, page_number: int = 1, title: Optional[str] = OMIT, tags: Optional[Union[list[str], str]] = OMIT, self_only: bool = False, author_id: Optional[str] = OMIT, language: Optional[Union[list[str], str]] = OMIT, title_language: Optional[Union[list[str], str]] = OMIT, sort_by: str = "task_count", request_options: Optional[RequestOptions] = None ) -> PaginatedResponse[Voice] ``` List available voices/models (async). See sync version for details. #### get ```python theme={null} async def get(voice_id: str, *, request_options: Optional[RequestOptions] = None) -> Voice ``` Get voice by ID (async). See sync version for details. #### create ```python theme={null} async def create(*, title: str, voices: builtins.list[bytes], description: Optional[str] = OMIT, texts: Optional[builtins.list[str]] = OMIT, tags: Optional[builtins.list[str]] = OMIT, cover_image: Optional[bytes] = OMIT, visibility: Visibility = "private", train_mode: str = "fast", enhance_audio_quality: bool = True, request_options: Optional[RequestOptions] = None) -> Voice ``` Create/clone a new voice (async). See sync version for details. #### update ```python theme={null} async def update(voice_id: str, *, title: Optional[str] = OMIT, description: Optional[str] = OMIT, cover_image: Optional[bytes] = OMIT, visibility: Optional[Visibility] = OMIT, tags: Optional[builtins.list[str]] = OMIT, request_options: Optional[RequestOptions] = None) -> None ``` Update voice metadata (async). See sync version for details. 
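The `Optional[...] = OMIT` defaults in the update signatures above rely on the `OMIT` sentinel from `fishaudio.core.omit`, which lets the client distinguish "leave this field untouched" from an explicit `None`. A minimal, SDK-independent sketch of the pattern (the `build_update_payload` helper and its fields are hypothetical, for illustration only):

```python
from typing import Any, Optional

class _Omit:
    """Sentinel distinguishing 'not provided' from an explicit None."""
    def __repr__(self) -> str:
        return "OMIT"

OMIT: Any = _Omit()

def build_update_payload(*,
                         title: Optional[str] = OMIT,
                         description: Optional[str] = OMIT) -> dict:
    """Include only the fields the caller actually passed.

    With a plain `None` default, passing `description=None` (clear the
    description) would be indistinguishable from not passing it at all.
    """
    payload = {}
    if title is not OMIT:
        payload["title"] = title
    if description is not OMIT:
        payload["description"] = description
    return payload
```

This is why partial updates are safe: `update(voice_id, title="New")` touches only the title, and omitted fields never appear in the request.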
#### delete ```python theme={null} async def delete(voice_id: str, *, request_options: Optional[RequestOptions] = None) -> None ``` Delete a voice (async). See sync version for details. # fishaudio.resources.account Account namespace client for billing and credits. ## AccountClient Objects ```python theme={null} class AccountClient() ``` Synchronous account operations. #### get\_credits ```python theme={null} def get_credits(*, check_free_credit: Optional[bool] = OMIT, request_options: Optional[RequestOptions] = None) -> Credits ``` Get API credit balance. **Arguments**: * `check_free_credit` - Whether to check free credit availability * `request_options` - Request-level overrides **Returns**: Credits information **Example**: ```python theme={null} client = FishAudio(api_key="...") credits = client.account.get_credits() print(f"Available credits: {float(credits.credit)}") # Check free credit availability credits = client.account.get_credits(check_free_credit=True) if credits.has_free_credit: print("Free credits available!") ``` #### get\_package ```python theme={null} def get_package(*, request_options: Optional[RequestOptions] = None) -> Package ``` Get package information. **Arguments**: * `request_options` - Request-level overrides **Returns**: Package information **Example**: ```python theme={null} client = FishAudio(api_key="...") package = client.account.get_package() print(f"Balance: {package.balance}/{package.total}") ``` ## AsyncAccountClient Objects ```python theme={null} class AsyncAccountClient() ``` Asynchronous account operations. #### get\_credits ```python theme={null} async def get_credits( *, check_free_credit: Optional[bool] = OMIT, request_options: Optional[RequestOptions] = None) -> Credits ``` Get API credit balance (async). 
**Arguments**: * `check_free_credit` - Whether to check free credit availability * `request_options` - Request-level overrides **Returns**: Credits information **Example**: ```python theme={null} client = AsyncFishAudio(api_key="...") credits = await client.account.get_credits() print(f"Available credits: {float(credits.credit)}") # Check free credit availability credits = await client.account.get_credits(check_free_credit=True) if credits.has_free_credit: print("Free credits available!") ``` #### get\_package ```python theme={null} async def get_package(*, request_options: Optional[RequestOptions] = None ) -> Package ``` Get package information (async). **Arguments**: * `request_options` - Request-level overrides **Returns**: Package information **Example**: ```python theme={null} client = AsyncFishAudio(api_key="...") package = await client.account.get_package() print(f"Balance: {package.balance}/{package.total}") ``` # fishaudio.resources.tts TTS (Text-to-Speech) namespace client. ## TTSClient Objects ```python theme={null} class TTSClient() ``` Synchronous TTS operations. #### stream ```python theme={null} def stream(*, text: str, reference_id: Optional[str] = None, references: Optional[list[ReferenceAudio]] = None, format: Optional[AudioFormat] = None, latency: Optional[LatencyMode] = None, speed: Optional[float] = None, config: TTSConfig = TTSConfig(), model: Model = "s2-pro", request_options: Optional[RequestOptions] = None) -> AudioStream ``` Stream text-to-speech audio chunks. **Arguments**: * `text` - Text to synthesize * `reference_id` - Voice reference ID (overrides config.reference\_id if provided) * `references` - Reference audio samples (overrides config.references if provided) * `format` - Audio format - "mp3", "wav", "pcm", or "opus" (overrides config.format if provided) * `latency` - Latency mode - "normal" or "balanced" (overrides config.latency if provided) * `speed` - Speech speed multiplier, e.g. 
1.5 for 1.5x speed (overrides config.prosody.speed if provided) * `config` - TTS configuration (audio settings, voice, model parameters) * `model` - TTS model to use * `request_options` - Request-level overrides **Returns**: AudioStream object that can be iterated for audio chunks **Example**: ```python theme={null} from fishaudio import FishAudio client = FishAudio(api_key="...") # Stream and process chunks for chunk in client.tts.stream(text="Hello world"): process_audio_chunk(chunk) # Or collect all at once audio = client.tts.stream(text="Hello world").collect() ``` #### convert ```python theme={null} def convert(*, text: str, reference_id: Optional[str] = None, references: Optional[list[ReferenceAudio]] = None, format: Optional[AudioFormat] = None, latency: Optional[LatencyMode] = None, speed: Optional[float] = None, config: TTSConfig = TTSConfig(), model: Model = "s2-pro", request_options: Optional[RequestOptions] = None) -> bytes ``` Convert text to speech and return complete audio as bytes. This is a convenience method that streams all audio chunks and combines them. For chunk-by-chunk processing, use stream() instead. **Arguments**: * `text` - Text to synthesize * `reference_id` - Voice reference ID (overrides config.reference\_id if provided) * `references` - Reference audio samples (overrides config.references if provided) * `format` - Audio format - "mp3", "wav", "pcm", or "opus" (overrides config.format if provided) * `latency` - Latency mode - "normal" or "balanced" (overrides config.latency if provided) * `speed` - Speech speed multiplier, e.g. 
1.5 for 1.5x speed (overrides config.prosody.speed if provided) * `config` - TTS configuration (audio settings, voice, model parameters) * `model` - TTS model to use * `request_options` - Request-level overrides **Returns**: Complete audio as bytes **Example**: ```python theme={null} from fishaudio import FishAudio from fishaudio.utils import play, save client = FishAudio(api_key="...") # Get complete audio audio = client.tts.convert(text="Hello world") # Play it play(audio) # Or save it save(audio, "output.mp3") ``` #### stream\_websocket ```python theme={null} def stream_websocket( text_stream: Iterable[Union[str, TextEvent, FlushEvent]], *, reference_id: Optional[str] = None, references: Optional[list[ReferenceAudio]] = None, format: Optional[AudioFormat] = None, latency: Optional[LatencyMode] = None, speed: Optional[float] = None, config: TTSConfig = TTSConfig(), model: Model = "s2-pro", max_workers: int = 10, ws_options: Optional[WebSocketOptions] = None) -> Iterator[bytes] ``` Stream text and receive audio in real-time via WebSocket. Perfect for conversational AI, live captioning, and streaming applications. **Arguments**: * `text_stream` - Iterator of text chunks to stream * `reference_id` - Voice reference ID (overrides config.reference\_id if provided) * `references` - Reference audio samples (overrides config.references if provided) * `format` - Audio format - "mp3", "wav", "pcm", or "opus" (overrides config.format if provided) * `latency` - Latency mode - "normal" or "balanced" (overrides config.latency if provided) * `speed` - Speech speed multiplier, e.g. 1.5 for 1.5x speed (overrides config.prosody.speed if provided) * `config` - TTS configuration (audio settings, voice, model parameters) * `model` - TTS model to use * `max_workers` - ThreadPoolExecutor workers for concurrent sender * `ws_options` - WebSocket connection options for configuring timeouts, message size limits, etc. 
Useful for long-running generations that may exceed default timeout values. See WebSocketOptions class for available parameters. **Returns**: Iterator of audio bytes **Example**: ```python theme={null} from fishaudio import FishAudio, TTSConfig, ReferenceAudio, WebSocketOptions client = FishAudio(api_key="...") def text_generator(): yield "Hello, " yield "this is " yield "streaming text!" # Simple usage with defaults with open("output.mp3", "wb") as f: for audio_chunk in client.tts.stream_websocket(text_generator()): f.write(audio_chunk) # With format and speed parameters with open("output.wav", "wb") as f: for audio_chunk in client.tts.stream_websocket( text_generator(), format="wav", speed=1.3 ): f.write(audio_chunk) # With reference_id parameter with open("output.mp3", "wb") as f: for audio_chunk in client.tts.stream_websocket(text_generator(), reference_id="your_model_id"): f.write(audio_chunk) # With references parameter with open("output.mp3", "wb") as f: for audio_chunk in client.tts.stream_websocket( text_generator(), references=[ReferenceAudio(audio=audio_bytes, text="sample")] ): f.write(audio_chunk) # With WebSocket options for long-running generations # Useful if you're generating very long responses that may take >20 seconds ws_options = WebSocketOptions(keepalive_ping_timeout_seconds=60.0) with open("output.mp3", "wb") as f: for audio_chunk in client.tts.stream_websocket( text_generator(), ws_options=ws_options ): f.write(audio_chunk) # Parameters override config values config = TTSConfig(format="mp3", latency="balanced") with open("output.wav", "wb") as f: for audio_chunk in client.tts.stream_websocket( text_generator(), format="wav", # Parameter wins config=config ): f.write(audio_chunk) ``` ## AsyncTTSClient Objects ```python theme={null} class AsyncTTSClient() ``` Asynchronous TTS operations. 
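The async methods that follow mirror the sync API above; the main difference is that audio is consumed with `async for` rather than a plain loop. A minimal, SDK-independent sketch of that pattern, with a stub generator standing in for a real audio stream (the names here are illustrative only):

```python
import asyncio
from typing import AsyncIterator

async def fake_audio_stream() -> AsyncIterator[bytes]:
    """Stand-in for the chunks an async TTS stream would yield."""
    for chunk in (b"riff", b"data", b"!"):
        await asyncio.sleep(0)  # simulate awaiting the network
        yield chunk

async def collect(stream: AsyncIterator[bytes]) -> bytes:
    """Drain an async byte stream into one buffer, like AsyncAudioStream.collect()."""
    buf = bytearray()
    async for chunk in stream:
        buf.extend(chunk)
    return bytes(buf)

audio = asyncio.run(collect(fake_audio_stream()))
```

The same shape applies whether you drain the stream into memory, write each chunk to a file, or forward chunks to a client connection.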
#### stream ```python theme={null} async def stream( *, text: str, reference_id: Optional[str] = None, references: Optional[list[ReferenceAudio]] = None, format: Optional[AudioFormat] = None, latency: Optional[LatencyMode] = None, speed: Optional[float] = None, config: TTSConfig = TTSConfig(), model: Model = "s2-pro", request_options: Optional[RequestOptions] = None) -> AsyncAudioStream ``` Stream text-to-speech audio chunks (async). **Arguments**: * `text` - Text to synthesize * `reference_id` - Voice reference ID (overrides config.reference\_id if provided) * `references` - Reference audio samples (overrides config.references if provided) * `format` - Audio format - "mp3", "wav", "pcm", or "opus" (overrides config.format if provided) * `latency` - Latency mode - "normal" or "balanced" (overrides config.latency if provided) * `speed` - Speech speed multiplier, e.g. 1.5 for 1.5x speed (overrides config.prosody.speed if provided) * `config` - TTS configuration (audio settings, voice, model parameters) * `model` - TTS model to use * `request_options` - Request-level overrides **Returns**: AsyncAudioStream object that can be iterated for audio chunks **Example**: ```python theme={null} from fishaudio import AsyncFishAudio client = AsyncFishAudio(api_key="...") # Stream and process chunks async for chunk in await client.tts.stream(text="Hello world"): await process_audio_chunk(chunk) # Or collect all at once stream = await client.tts.stream(text="Hello world") audio = await stream.collect() ``` #### convert ```python theme={null} async def convert(*, text: str, reference_id: Optional[str] = None, references: Optional[list[ReferenceAudio]] = None, format: Optional[AudioFormat] = None, latency: Optional[LatencyMode] = None, speed: Optional[float] = None, config: TTSConfig = TTSConfig(), model: Model = "s2-pro", request_options: Optional[RequestOptions] = None) -> bytes ``` Convert text to speech and return complete audio as bytes (async). 
This is a convenience method that streams all audio chunks and combines them. For chunk-by-chunk processing, use stream() instead. **Arguments**: * `text` - Text to synthesize * `reference_id` - Voice reference ID (overrides config.reference\_id if provided) * `references` - Reference audio samples (overrides config.references if provided) * `format` - Audio format - "mp3", "wav", "pcm", or "opus" (overrides config.format if provided) * `latency` - Latency mode - "normal" or "balanced" (overrides config.latency if provided) * `speed` - Speech speed multiplier, e.g. 1.5 for 1.5x speed (overrides config.prosody.speed if provided) * `config` - TTS configuration (audio settings, voice, model parameters) * `model` - TTS model to use * `request_options` - Request-level overrides **Returns**: Complete audio as bytes **Example**: ```python theme={null} from fishaudio import AsyncFishAudio from fishaudio.utils import play, save client = AsyncFishAudio(api_key="...") # Get complete audio audio = await client.tts.convert(text="Hello world") # Play it play(audio) # Or save it save(audio, "output.mp3") ``` #### stream\_websocket ```python theme={null} async def stream_websocket(text_stream: AsyncIterable[Union[str, TextEvent, FlushEvent]], *, reference_id: Optional[str] = None, references: Optional[list[ReferenceAudio]] = None, format: Optional[AudioFormat] = None, latency: Optional[LatencyMode] = None, speed: Optional[float] = None, config: TTSConfig = TTSConfig(), model: Model = "s2-pro", ws_options: Optional[WebSocketOptions] = None) ``` Stream text and receive audio in real-time via WebSocket (async). Perfect for conversational AI, live captioning, and streaming applications. 
**Arguments**: * `text_stream` - Async iterator of text chunks to stream * `reference_id` - Voice reference ID (overrides config.reference\_id if provided) * `references` - Reference audio samples (overrides config.references if provided) * `format` - Audio format - "mp3", "wav", "pcm", or "opus" (overrides config.format if provided) * `latency` - Latency mode - "normal" or "balanced" (overrides config.latency if provided) * `speed` - Speech speed multiplier, e.g. 1.5 for 1.5x speed (overrides config.prosody.speed if provided) * `config` - TTS configuration (audio settings, voice, model parameters) * `model` - TTS model to use * `ws_options` - WebSocket connection options for configuring timeouts, message size limits, etc. Useful for long-running generations that may exceed default timeout values. See WebSocketOptions class for available parameters. **Returns**: Async iterator of audio bytes **Example**: ```python theme={null} import aiofiles from fishaudio import AsyncFishAudio, TTSConfig, ReferenceAudio, WebSocketOptions client = AsyncFishAudio(api_key="...") async def text_generator(): yield "Hello, " yield "this is " yield "async streaming!" 
# Simple usage with defaults async with aiofiles.open("output.mp3", "wb") as f: async for audio_chunk in client.tts.stream_websocket(text_generator()): await f.write(audio_chunk) # With format and speed parameters async with aiofiles.open("output.wav", "wb") as f: async for audio_chunk in client.tts.stream_websocket( text_generator(), format="wav", speed=1.3 ): await f.write(audio_chunk) # With reference_id parameter async with aiofiles.open("output.mp3", "wb") as f: async for audio_chunk in client.tts.stream_websocket(text_generator(), reference_id="your_model_id"): await f.write(audio_chunk) # With references parameter async with aiofiles.open("output.mp3", "wb") as f: async for audio_chunk in client.tts.stream_websocket( text_generator(), references=[ReferenceAudio(audio=audio_bytes, text="sample")] ): await f.write(audio_chunk) # With WebSocket options for long-running generations # Useful if you're generating very long responses that may take >20 seconds ws_options = WebSocketOptions(keepalive_ping_timeout_seconds=60.0) async with aiofiles.open("output.mp3", "wb") as f: async for audio_chunk in client.tts.stream_websocket( text_generator(), ws_options=ws_options ): await f.write(audio_chunk) # Parameters override config values config = TTSConfig(format="mp3", latency="balanced") async with aiofiles.open("output.wav", "wb") as f: async for audio_chunk in client.tts.stream_websocket( text_generator(), format="wav", # Parameter wins config=config ): await f.write(audio_chunk) ``` # fishaudio.resources.realtime Real-time WebSocket streaming helpers. #### iter\_websocket\_audio ```python theme={null} def iter_websocket_audio(ws) -> Iterator[bytes] ``` Process WebSocket audio messages (sync). Receives messages from WebSocket, yields audio chunks, handles errors. Unknown events are ignored and iteration continues. 
**Arguments**: * `ws` - WebSocket connection from httpx\_ws.connect\_ws **Yields**: Audio bytes **Raises**: * `WebSocketError` - On disconnect or error finish event #### aiter\_websocket\_audio ```python theme={null} async def aiter_websocket_audio(ws) -> AsyncIterator[bytes] ``` Process WebSocket audio messages (async). Receives messages from WebSocket, yields audio chunks, handles errors. Unknown events are ignored and iteration continues. **Arguments**: * `ws` - WebSocket connection from httpx\_ws.aconnect\_ws **Yields**: Audio bytes **Raises**: * `WebSocketError` - On disconnect or error finish event # fishaudio.resources.asr ASR (Automatic Speech Recognition) namespace client. ## ASRClient Objects ```python theme={null} class ASRClient() ``` Synchronous ASR operations. #### transcribe ```python theme={null} def transcribe( *, audio: bytes, language: Optional[str] = OMIT, include_timestamps: bool = True, request_options: Optional[RequestOptions] = None) -> ASRResponse ``` Transcribe audio to text. **Arguments**: * `audio` - Audio file bytes * `language` - Language code (e.g., "en", "zh"). Auto-detected if not provided. * `include_timestamps` - Whether to include timestamp information for segments * `request_options` - Request-level overrides **Returns**: ASRResponse with transcription text, duration, and segments **Example**: ```python theme={null} client = FishAudio(api_key="...") with open("audio.mp3", "rb") as f: audio_bytes = f.read() result = client.asr.transcribe(audio=audio_bytes, language="en") print(result.text) for segment in result.segments: print(f"{segment.start}-{segment.end}: {segment.text}") ``` ## AsyncASRClient Objects ```python theme={null} class AsyncASRClient() ``` Asynchronous ASR operations. #### transcribe ```python theme={null} async def transcribe( *, audio: bytes, language: Optional[str] = OMIT, include_timestamps: bool = True, request_options: Optional[RequestOptions] = None) -> ASRResponse ``` Transcribe audio to text (async). 
**Arguments**: * `audio` - Audio file bytes * `language` - Language code (e.g., "en", "zh"). Auto-detected if not provided. * `include_timestamps` - Whether to include timestamp information for segments * `request_options` - Request-level overrides **Returns**: ASRResponse with transcription text, duration, and segments **Example**: ```python theme={null} client = AsyncFishAudio(api_key="...") async with aiofiles.open("audio.mp3", "rb") as f: audio_bytes = await f.read() result = await client.asr.transcribe(audio=audio_bytes, language="en") print(result.text) for segment in result.segments: print(f"{segment.start}-{segment.end}: {segment.text}") ``` # Types Source: https://docs.fish.audio/api-reference/sdk/python/types # fishaudio.types.voices Voice and model management types. ## Sample Objects ```python theme={null} class Sample(BaseModel) ``` A sample audio for a voice model. **Attributes**: * `title` - Title/name of the audio sample * `text` - Transcription of the spoken content in the sample * `task_id` - Unique identifier for the sample task * `audio` - URL or path to the audio file ## Author Objects ```python theme={null} class Author(BaseModel) ``` Voice model author information. **Attributes**: * `id` - Unique author identifier * `nickname` - Author's display name * `avatar` - URL to author's avatar image ## Voice Objects ```python theme={null} class Voice(BaseModel) ``` A voice model. Represents a TTS voice that can be used for synthesis. **Attributes**: * `id` - Unique voice model identifier (use as reference\_id in TTS) * `type` - Model type. Options: "svc" (singing voice conversion), "tts" (text-to-speech) * `title` - Voice model title/name * `description` - Detailed description of the voice model * `cover_image` - URL to the voice model's cover image * `train_mode` - Training mode used. 
Options: "fast" * `state` - Current model state (e.g., "ready", "training", "failed") * `tags` - List of tags for categorization (e.g., \["male", "english", "young"]) * `samples` - List of audio samples demonstrating the voice * `created_at` - Timestamp when the model was created * `updated_at` - Timestamp when the model was last updated * `languages` - List of supported language codes (e.g., \["en", "zh"]) * `visibility` - Model visibility. Options: "public", "private", "unlist" * `lock_visibility` - Whether visibility setting is locked * `like_count` - Number of likes the model has received * `mark_count` - Number of bookmarks/favorites * `shared_count` - Number of times the model has been shared * `task_count` - Number of times the model has been used for generation * `liked` - Whether the current user has liked this model. Default: False * `marked` - Whether the current user has bookmarked this model. Default: False * `author` - Information about the voice model's creator # fishaudio.types.account Account-related types (credits, packages, etc.). ## Credits Objects ```python theme={null} class Credits(BaseModel) ``` User's API credit balance. **Attributes**: * `id` - Unique credits record identifier * `user_id` - User identifier * `credit` - Current credit balance (decimal for precise accounting) * `created_at` - Timestamp when the credits record was created * `updated_at` - Timestamp when the credits were last updated * `has_phone_sha256` - Whether the user has a verified phone number. Optional * `has_free_credit` - Whether the user has received free credits. Optional ## Package Objects ```python theme={null} class Package(BaseModel) ``` User's prepaid package information. 
**Attributes**:

* `id` - Unique package identifier
* `user_id` - User identifier
* `type` - Package type identifier
* `total` - Total units in the package
* `balance` - Remaining units in the package
* `created_at` - Timestamp when the package was purchased
* `updated_at` - Timestamp when the package was last updated
* `finished_at` - Timestamp when the package was fully consumed. None if still active

# fishaudio.types.tts

TTS-related types.

## ReferenceAudio Objects

```python theme={null}
class ReferenceAudio(BaseModel)
```

Reference audio for voice cloning/style.

**Attributes**:

* `audio` - Audio file bytes for the reference sample
* `text` - Transcription of what is spoken in the reference audio. Should match exactly what's spoken and include punctuation for proper prosody.

## Prosody Objects

```python theme={null}
class Prosody(BaseModel)
```

Speech prosody settings (speed and volume).

**Attributes**:

* `speed` - Speech speed multiplier. Range: 0.5-2.0. Default: 1.0. Examples: 1.5 = 50% faster, 0.8 = 20% slower
* `volume` - Volume adjustment in decibels. Range: -20.0 to 20.0. Default: 0.0 (no change). Positive values increase volume, negative values decrease it.

#### from\_speed\_override

```python theme={null}
@classmethod
def from_speed_override(cls, speed: float, base: Optional["Prosody"] = None) -> "Prosody"
```

Create Prosody with speed override, preserving volume from base.

**Arguments**:

* `speed` - Speed value to use
* `base` - Base prosody to preserve volume from (if any)

**Returns**:

New Prosody instance with overridden speed

## TTSConfig Objects

```python theme={null}
class TTSConfig(BaseModel)
```

TTS generation configuration. Reusable configuration for text-to-speech requests. Create once, use multiple times. All parameters have sensible defaults.

**Attributes**:

* `format` - Audio output format. Options: "mp3", "wav", "pcm", "opus". Default: "mp3"
* `sample_rate` - Audio sample rate in Hz. If None, uses format-specific default.
* `mp3_bitrate` - MP3 bitrate in kbps. Options: 64, 128, 192. Default: 128 * `opus_bitrate` - Opus bitrate in kbps. Options: -1000, 24, 32, 48, 64. Default: 32 * `normalize` - Whether to normalize/clean the input text. Default: True * `chunk_length` - Characters per generation chunk. Range: 100-300. Default: 200. Lower values = faster initial response, higher values = better quality * `latency` - Generation mode. Options: "normal" (higher quality), "balanced" (faster). Default: "balanced" * `reference_id` - Voice model ID from fish.audio (e.g., "802e3bc2b27e49c2995d23ef70e6ac89"). Find IDs in voice URLs or via voices.list() * `references` - List of reference audio samples for instant voice cloning. Default: \[] * `prosody` - Speech speed and volume settings. Default: None (uses natural prosody) * `top_p` - Nucleus sampling parameter for token selection. Range: 0.0-1.0. Default: 0.7 * `temperature` - Randomness in generation. Range: 0.0-1.0. Default: 0.7. Higher = more varied, lower = more consistent * `max_new_tokens` - Maximum number of tokens to generate. Default: 1024 * `repetition_penalty` - Penalty for repeated tokens. Default: 1.2 * `min_chunk_length` - Minimum chunk length for generation. Default: 50 * `condition_on_previous_chunks` - Whether to condition generation on previous chunks. Default: True * `early_stop_threshold` - Threshold for early stopping. Default: 1.0 ## TTSRequest Objects ```python theme={null} class TTSRequest(BaseModel) ``` Request parameters for text-to-speech generation. This model is used internally for WebSocket streaming. For the HTTP API, parameters are passed directly to methods. **Attributes**: * `text` - Text to synthesize into speech * `chunk_length` - Characters per generation chunk. Range: 100-300. Default: 200 * `format` - Audio output format. Options: "mp3", "wav", "pcm", "opus". Default: "mp3" * `sample_rate` - Audio sample rate in Hz. If None, uses format-specific default * `mp3_bitrate` - MP3 bitrate in kbps. 
Options: 64, 128, 192. Default: 128 * `opus_bitrate` - Opus bitrate in kbps. Options: -1000, 24, 32, 48, 64. Default: 32 * `references` - List of reference audio samples for voice cloning. Default: \[] * `reference_id` - Voice model ID for using a specific voice. Default: None * `normalize` - Whether to normalize/clean the input text. Default: True * `latency` - Generation mode. Options: "normal", "balanced". Default: "balanced" * `prosody` - Speech speed and volume settings. Default: None * `top_p` - Nucleus sampling for token selection. Range: 0.0-1.0. Default: 0.7 * `temperature` - Randomness in generation. Range: 0.0-1.0. Default: 0.7 * `max_new_tokens` - Maximum number of tokens to generate. Default: 1024 * `repetition_penalty` - Penalty for repeated tokens. Default: 1.2 * `min_chunk_length` - Minimum chunk length for generation. Default: 50 * `condition_on_previous_chunks` - Whether to condition generation on previous chunks. Default: True * `early_stop_threshold` - Threshold for early stopping. Default: 1.0 ## StartEvent Objects ```python theme={null} class StartEvent(BaseModel) ``` WebSocket start event to initiate TTS streaming. **Attributes**: * `event` - Event type identifier, always "start" * `request` - TTS configuration for the streaming session ## TextEvent Objects ```python theme={null} class TextEvent(BaseModel) ``` WebSocket event to send a text chunk for synthesis. **Attributes**: * `event` - Event type identifier, always "text" * `text` - Text chunk to synthesize ## FlushEvent Objects ```python theme={null} class FlushEvent(BaseModel) ``` WebSocket event to force immediate audio generation from buffered text. Use this to ensure all buffered text is synthesized without waiting for more input. **Attributes**: * `event` - Event type identifier, always "flush" ## CloseEvent Objects ```python theme={null} class CloseEvent(BaseModel) ``` WebSocket event to end the streaming session. 
**Attributes**: * `event` - Event type identifier, always "stop" # fishaudio.types.shared Shared types used across the SDK. ## PaginatedResponse Objects ```python theme={null} class PaginatedResponse(BaseModel, Generic[T]) ``` Generic paginated response. **Attributes**: * `total` - Total number of items across all pages * `items` - List of items on the current page #### warn\_if\_deprecated\_model ```python theme={null} def warn_if_deprecated_model(model: str) -> None ``` Emit a deprecation warning if a legacy model is used. # fishaudio.types.asr ASR (Automatic Speech Recognition) related types. ## ASRSegment Objects ```python theme={null} class ASRSegment(BaseModel) ``` A timestamped segment of transcribed text. **Attributes**: * `text` - The transcribed text for this segment * `start` - Segment start time in seconds * `end` - Segment end time in seconds ## ASRResponse Objects ```python theme={null} class ASRResponse(BaseModel) ``` Response from speech-to-text transcription. **Attributes**: * `text` - Complete transcription of the entire audio * `duration` - Total audio duration in milliseconds * `segments` - List of timestamped text segments. Empty if include\_timestamps=False #### duration Duration in milliseconds # Utils Source: https://docs.fish.audio/api-reference/sdk/python/utils # fishaudio.utils.play Audio playback utility. #### play ```python theme={null} def play(audio: Union[bytes, Iterable[bytes]], *, notebook: bool = False, use_ffmpeg: bool = True) -> None ``` Play audio using various playback methods. 
**Arguments**: * `audio` - Audio bytes or iterable of bytes * `notebook` - Use Jupyter notebook playback (IPython.display.Audio) * `use_ffmpeg` - Use ffplay for playback (default, falls back to sounddevice) **Raises**: * `DependencyError` - If required playback tool is not installed **Examples**: ```python theme={null} from fishaudio import FishAudio, play client = FishAudio(api_key="...") audio = client.tts.convert(text="Hello world") # Play directly play(audio) # In Jupyter notebook play(audio, notebook=True) # Force sounddevice fallback play(audio, use_ffmpeg=False) ``` # fishaudio.utils.save Audio saving utility. #### save ```python theme={null} def save(audio: Union[bytes, Iterable[bytes]], filename: str) -> None ``` Save audio to a file. **Arguments**: * `audio` - Audio bytes or iterable of bytes * `filename` - Path to save the audio file **Examples**: ```python theme={null} from fishaudio import FishAudio, save client = FishAudio(api_key="...") audio = client.tts.convert(text="Hello world") # Save to file save(audio, "output.mp3") # Works with iterators too audio_stream = client.tts.convert(text="Another example") save(audio_stream, "another.mp3") ``` # fishaudio.utils.stream Audio streaming utility. #### stream ```python theme={null} def stream(audio_stream: Iterator[bytes]) -> bytes ``` Stream audio in real-time while playing it with mpv. This function plays the audio as it's being generated and simultaneously captures it to return the complete audio buffer. 
**Arguments**: * `audio_stream` - Iterator of audio byte chunks **Returns**: Complete audio bytes after streaming finishes **Raises**: * `DependencyError` - If mpv is not installed **Examples**: ```python theme={null} from fishaudio import FishAudio, stream client = FishAudio(api_key="...") audio_stream = client.tts.convert(text="Hello world") # Stream and play in real-time, get complete audio complete_audio = stream(audio_stream) # Save the captured audio with open("output.mp3", "wb") as f: f.write(complete_audio) ``` # Legacy Source: https://docs.fish.audio/archive/python-sdk-legacy/index Archived documentation for the legacy Session-based Python SDK This documentation is for the legacy Python SDK using the Session-based API. This API is deprecated. **Please migrate to the [new Python SDK](/developer-guide/sdk-guide/python)** which uses a modern client-based architecture. See the [migration guide](/archive/python-sdk-legacy/migration-guide) for help upgrading. ## About the Legacy SDK This archive contains documentation for the `fish_audio_sdk` module using the Session-based API. While this API still functions, it is no longer actively maintained and lacks the modern features available in the new SDK. ### What's Different in the New SDK The new Python SDK (`fishaudio` module) offers: * **Modern client-based architecture** - More intuitive and consistent with modern Python libraries * **Full async support** - Native asyncio integration for better performance * **Better type safety** - Comprehensive type hints and better IDE support * **Improved error handling** - More detailed error messages and exception hierarchy * **Enhanced utilities** - Built-in audio playback, streaming, and file management * **Active maintenance** - Regular updates and new features ### Migration Path We strongly recommend migrating to the new SDK. 
The [migration guide](/archive/python-sdk-legacy/migration-guide) provides:

* Side-by-side code comparisons
* Complete list of breaking changes
* Common migration patterns
* Troubleshooting tips

## Migration

Complete guide to upgrading from the legacy SDK to the new client-based API.

## Legacy Documentation Pages

* How to install the legacy SDK
* Session initialization and API keys
* TTS with the Session-based API
* Reference audio and voice models
* ASR transcription with legacy API
* Real-time streaming with WebSocketSession

# Contributing

Source: https://docs.fish.audio/contributing

Help improve Fish Audio and contribute to our open source projects.

# Contributing to Fish Audio

First off, thanks for taking the time to contribute! All types of contributions are encouraged and valued. See the sections below for different ways to help and details about how this project handles them. Please make sure to read the relevant section before making your contribution. It will make it a lot easier for us maintainers and smooth out the experience for all involved. The community looks forward to your contributions.

If you like the project but don't have time to contribute, there are other easy ways to support Fish Audio:

* Star our repositories
* Tweet about it
* Reference Fish Audio in your project's readme
* Mention the project at local meetups and tell your friends/colleagues

## Code of Conduct

This project and everyone participating in it is governed by the Fish Audio Code of Conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to our community team.

## I Have a Question

Before you ask a question, please read the available [Documentation](https://docs.fish.audio). It's best to search for existing [Issues](https://github.com/fishaudio) that might help you. In case you have found a suitable issue and still need clarification, you can write your question in that issue. It is also advisable to search the internet for answers first.
If you still need to ask a question: 1. Open an [Issue](https://github.com/fishaudio) in the relevant repository 2. Provide as much context as you can about what you're running into 3. Provide project and platform versions (Node.js, Python, OS, etc.), depending on what seems relevant We will take care of the issue as soon as possible. ## I Want To Contribute **Legal Notice** When contributing to this project, you must agree that you have authored 100% of the content, that you have the necessary rights to the content, and that the content you contribute may be provided under the project license. ### Reporting Bugs #### Before Submitting a Bug Report A good bug report shouldn't leave others needing to chase you up for more information. Please investigate carefully, collect information, and describe the issue in detail: * Make sure you are using the latest version * Determine if your bug is really a bug and not an error on your side (e.g., incompatible environment components/versions) * Check if there is already a bug report for your issue in the bug tracker * Search the internet (including Stack Overflow) to see if others have discussed the issue * Collect information about the bug: * Stack trace (Traceback) * OS, Platform and Version (Windows, Linux, macOS, x86, ARM) * Version of the interpreter, compiler, SDK, runtime environment, package manager * Your input and the output * Can you reliably reproduce the issue? Can you reproduce it with older versions? #### How Do I Submit a Good Bug Report? You must never report security-related issues, vulnerabilities, or bugs including sensitive information to the issue tracker. Instead, sensitive bugs must be sent by email to our security team. We use GitHub issues to track bugs and errors. If you run into an issue: 1. Open an [Issue](https://github.com/fishaudio) in the relevant repository 2. Explain the behavior you would expect and the actual behavior 3. 
Provide as much context as possible and describe the **reproduction steps** that someone else can follow to recreate the issue on their own 4. Provide the information you collected in the previous section Once filed: * The project team will label the issue accordingly * A team member will try to reproduce the issue with your provided steps * If there are no reproduction steps, the team will ask for them and mark the issue as `needs-repro` * If the team reproduces the issue, it will be marked `needs-fix` and left to be implemented ### Suggesting Enhancements This section guides you through submitting an enhancement suggestion for Fish Audio, including completely new features and minor improvements to existing functionality. #### Before Submitting an Enhancement * Make sure you are using the latest version * Read the [documentation](https://docs.fish.audio) carefully to see if the functionality already exists * Perform a [search](https://github.com/fishaudio) to see if the enhancement has already been suggested * Consider whether your idea fits with the scope and aims of the project #### How Do I Submit a Good Enhancement Suggestion? Enhancement suggestions are tracked as GitHub issues: * Use a **clear and descriptive title** for the issue * Provide a **step-by-step description** of the suggested enhancement in as many details as possible * **Describe the current behavior** and **explain which behavior you expected to see instead** and why * Include **screenshots or screen recordings** if applicable * **Explain why this enhancement would be useful** to most Fish Audio users ### Your First Code Contribution We welcome first-time contributors! Here's how to get started: 1. **Fork the repository** you want to contribute to 2. **Clone your fork** locally 3. **Create a new branch** for your changes 4. **Make your changes** following our styleguides 5. **Test your changes** thoroughly 6. **Commit your changes** with clear commit messages 7. 
**Push to your fork** and submit a pull request Look for issues labeled `good first issue` or `help wanted` for beginner-friendly tasks. ### Improving The Documentation Documentation improvements are always welcome! This includes: * Fixing typos and grammatical errors * Adding missing information or clarifications * Improving code examples * Adding new guides or tutorials * Translating documentation See our [documentation repository](https://github.com/fishaudio/fish-docs) to get started. ## Styleguides ### Commit Messages * Use clear and meaningful commit messages * Start with a verb in the present tense (e.g., "Add", "Fix", "Update", "Remove") * Keep the first line under 72 characters * Reference issues and pull requests when relevant * Provide additional context in the commit body if needed Example: ``` Add voice cloning support for Python SDK - Implement VoiceCloneClient class - Add comprehensive error handling - Include usage examples in docstrings Closes #123 ``` ### Code Style * Follow the existing code style in each repository * Use meaningful variable and function names * Add comments for complex logic * Write tests for new features * Ensure all tests pass before submitting ## Attribution This contribution guide is based on the **contributing.md** generator. Fish Audio is committed to open source and welcomes contributions from developers worldwide. # Emotion & Expression Control Source: https://docs.fish.audio/developer-guide/best-practices/emotion-control Make your AI voices express emotions naturally ## Overview Control how your AI voice expresses emotions, from happy and excited to sad and contemplative. Add natural pauses, laughter, and other human-like elements to make speech more engaging. The `(parenthesis)` syntax on this page applies to the S1 model. S2 uses `[bracket]` syntax with natural language descriptions and is not limited to a fixed set of tags. 
See the [Models Overview](/developer-guide/models-pricing/models-overview#s2-natural-language-control) for details. ## How to Use Simply wrap emotion tags in parentheses before your text: ``` (happy) What a beautiful day! (sad) I'm sorry to hear that. (excited) This is amazing news! ``` Include tone markers or audio effects: ``` (whispering) Let me tell you something. (laughing) Ha ha ha, wow that's so funny! ``` ## Important Rules ### Placement Matters **For all languages:** * Emotion tags MUST go at the beginning of sentences * Tone controls can go anywhere in the text * Sound effects can go anywhere in the text **Correct:** ``` (happy) What a wonderful day! ``` **Incorrect:** ``` What a (happy) wonderful day! ``` ## Best Practices **Do:** * Use one emotion per sentence * Add sounds after relevant words * Keep tags simple and clear * Test different combinations **Don't:** * Overuse tags in short text * Mix conflicting emotions * Create custom tags * Forget the parentheses ## Available Emotions See the [Emotion Reference](/api-reference/emotion-reference) for the full list of supported emotions. ## Scene Examples **Customer Service:** ``` (friendly) Hello! How can I help you today? (empathetic) I understand your frustration. (confident) I'll resolve this for you right away. ``` **Storytelling:** ``` (mysterious)(whispering) Once upon a midnight dreary... (excited) Suddenly, the door burst open! (scared)(shouting) Run for your lives! ``` **Educational Content:** ``` (enthusiastic) Welcome to today's lesson! (curious) Have you ever wondered why the sky is blue? (proud) Great job! You got it right! ``` ## Real-World Examples ### Virtual Assistant ``` (friendly) Good morning! (helpful) I've prepared your schedule for today. (concerned) You have three urgent emails. (encouraging) Let's tackle them together! ``` ### Audiobook Narration ``` (narrator) Chapter One: The Beginning (mysterious) The old house stood silent in the fog. (scared)(whispering) "Is anyone there?" 
she asked. (relieved)(sighing) No one answered. Phew. ``` ### Game Character ``` (brave) I'll defeat the dragon! (struggling)(panting) This is... harder than... I thought! (triumphant)(shouting) Victory is mine! (laughing) Ha ha ha! ``` ## Advanced Techniques ### Emotion Transitions Gradually change emotions: ``` (happy) I got the promotion! (uncertain) But... it means moving away. (sad) I'll miss everyone here. ``` ### Background Effects Add atmosphere: ``` The comedy show was amazing (audience laughing) Everyone was having fun (background laughter) The crowd loved it (crowd laughing) ``` ## Troubleshooting ### Emotion Not Working? 1. Check tag placement (beginning of sentence for emotions) 2. Verify spelling exactly matches the list 3. Don't use quotes around tags 4. Include parentheses ### Unnatural Sound? * Add appropriate text after sound tags * Don't overuse in short sentences * Space out emotional changes * Test with different voices ### Tips for Success 1. **Start simple** - Use basic emotions first 2. **Preview often** - Test how it sounds 3. **Be consistent** - Keep character emotions logical 4. **Less is more** - Don't overuse tags ## Get Creative Experiment with combinations to create unique character voices and engaging narratives. The key is finding the right balance between emotional expression and natural speech flow. ## Support Need help with emotions? * **Try it live:** [fish.audio](https://fish.audio) * **Community:** [Discord](https://discord.gg/fish-audio) * **Email:** [support@fish.audio](mailto:support@fish.audio) # Real-time Voice Streaming Source: https://docs.fish.audio/developer-guide/best-practices/real-time-streaming Stream voice generation in real-time for interactive applications ## Overview Real-time streaming lets you generate speech as you type or speak, perfect for chatbots, virtual assistants, and live applications. 
## When to Use Streaming **Perfect for:** * Live chat applications * Virtual assistants * Interactive storytelling * Real-time translations * Gaming dialogue **Not ideal for:** * Pre-recorded content * Batch processing ## Getting Started ### Web Playground Try real-time streaming instantly: 1. Visit [fish.audio](https://fish.audio) 2. Enable "Streaming Mode" 3. Start typing and hear voice generation in real-time ### Using the SDK Stream text as it's being written: ```python theme={null} from fishaudio import FishAudio # Initialize client client = FishAudio(api_key="your_api_key") # Stream text word by word def stream_text(): text = "Hello, this is being generated in real time" for word in text.split(): yield word + " " # Generate speech as text streams audio_stream = client.tts.stream_websocket( stream_text(), reference_id="your_voice_model_id", temperature=0.7, # Controls variation top_p=0.7, # Controls diversity latency="balanced" ) with open("output.mp3", "wb") as f: for audio_chunk in audio_stream: f.write(audio_chunk) ``` ```javascript theme={null} import { FishAudioClient, RealtimeEvents } from "fish-audio"; import { writeFile } from "fs/promises"; import path from "path"; const apiKey = "your_api_key"; const referenceId = "your_voice_model_id"; async function* makeTextStream() { const chunks = [ "Hello from Fish Audio! ", "This is a realtime text-to-speech test. 
", "We are streaming multiple chunks over WebSocket.", ]; for (const chunk of chunks) { yield chunk; await new Promise((r) => setTimeout(r, 200)); } } async function main() { const client = new FishAudioClient({ apiKey }); // For realtime, set text to "" and stream content via makeTextStream const request = { text: "", reference_id: referenceId, }; const connection = await client.textToSpeech.convertRealtime( request, makeTextStream() ); // Collect audio and write to a file when the stream ends const chunks = []; connection.on(RealtimeEvents.OPEN, () => console.log("WebSocket opened")); connection.on(RealtimeEvents.AUDIO_CHUNK, (audio) => { if (audio instanceof Uint8Array || Buffer.isBuffer(audio)) { chunks.push(Buffer.from(audio)); } }); connection.on(RealtimeEvents.ERROR, (err) => console.error("WebSocket error:", err) ); connection.on(RealtimeEvents.CLOSE, async () => { const outPath = path.resolve(process.cwd(), "out.mp3"); await writeFile(outPath, Buffer.concat(chunks)); console.log("Saved to", outPath); }); } main().catch((err) => { console.error(err); process.exit(1); }); ``` ## Configuration Options ### Speed vs Quality **Latency Modes:** * **Normal:** Best quality, \~500ms latency * **Balanced:** Good quality, \~300ms latency ```python theme={null} # Use latency parameter with stream_websocket audio_stream = client.tts.stream_websocket( text_chunks(), reference_id="model_id", latency="balanced" # For faster response ) ``` ```javascript theme={null} const request = { text: "", reference_id: "model_id", latency: "balanced", // For faster response }; ``` ### Voice Control **Temperature** (0.1 - 1.0): * Lower: More consistent, predictable * Higher: More varied, expressive **Top-p** (0.1 - 1.0): * Lower: More focused * Higher: More diverse ## Real-time Applications ### Chatbot Integration Stream responses as they're generated: ```python theme={null} def chatbot_response(user_input): # Get AI response (streaming) ai_text = get_ai_response(user_input) # Convert 
to speech in real-time audio_stream = client.tts.stream_websocket(ai_text) for audio_chunk in audio_stream: play_audio(audio_chunk) ``` ```javascript theme={null} async function chatbotResponse(userInput) { // Get AI response (streaming) const aiTextStream = getAiResponse(userInput); // async iterable of strings // Convert to speech in real-time for await (const textChunk of aiTextStream) { for await (const audioChunk of ttsStream(textChunk)) { playAudio(audioChunk); } } } ``` ### Live Translation Translate and speak simultaneously: ```python theme={null} def live_translate(source_audio): # Transcribe source audio text = transcribe(source_audio) # Translate text translated = translate(text, target_language) # Stream translated speech for chunk in stream_text(translated): generate_speech(chunk) ``` ```javascript theme={null} async function liveTranslate(sourceAudio) { // Transcribe source audio const text = await transcribe(sourceAudio); // Translate text const translated = await translate(text, targetLanguage); // Stream translated speech for await (const chunk of streamText(translated)) { generateSpeech(chunk); } } ``` ## Best Practices ### Text Buffering **Do:** * Send complete words with spaces * Use punctuation for natural pauses * Buffer 5-10 words for smoothness **Don't:** * Send individual characters * Forget spaces between words * Send huge chunks at once ### Connection Management 1. **Keep connections alive** for multiple generations 2. **Handle disconnections** gracefully 3. **Implement retry logic** for reliability ### Audio Playback For smooth playback: * Buffer 2-3 audio chunks * Use cross-fading between chunks * Handle network delays gracefully ## Common Use Cases ### Interactive Story ```python theme={null} def interactive_story(): story_parts = [ "Once upon a time,", "in a land far away,", "there lived a brave knight..." 
] for part in story_parts: # Generate and play each part stream_speech(part) # Wait for user input user_choice = get_user_input() # Continue based on choice ``` ```javascript theme={null} function interactiveStory() { const storyParts = [ "Once upon a time,", "in a land far away,", "there lived a brave knight...", ]; for (const part of storyParts) { // Generate and play each part streamSpeech(part); // Wait for user input const userChoice = getUserInput(); // Continue based on choice } } ``` ### Virtual Assistant ```python theme={null} def virtual_assistant(): while True: # Listen for wake word if detect_wake_word(): # Start streaming response response = process_command() stream_speech(response) ``` ```javascript theme={null} async function virtualAssistant() { while (true) { // Listen for wake word if (detectWakeWord()) { // Start streaming response const response = processCommand(); streamSpeech(response); } } } ``` ### Live Commentary ```python theme={null} def live_commentary(event_stream): for event in event_stream: # Generate commentary commentary = generate_commentary(event) # Stream immediately stream_speech(commentary) ``` ```javascript theme={null} async function liveCommentary(eventStream) { for await (const event of eventStream) { // Generate commentary const commentary = generateCommentary(event); // Stream immediately streamSpeech(commentary); } } ``` ## Troubleshooting ### Audio Gaps **Problem:** Gaps between audio chunks
**Solution:** * Increase buffer size * Use balanced latency mode * Check network connection ### Delayed Response **Problem:** Long wait before audio starts
**Solution:** * Use balanced latency mode * Send initial text immediately * Reduce chunk size ### Choppy Playback **Problem:** Audio cuts in and out
**Solution:** * Buffer more chunks before playing * Check network stability * Use consistent chunk sizes ## Advanced Features ### Dynamic Voice Switching Change voices mid-stream: ```python theme={null} # Start with one voice def text1(): yield "Hello from voice one." audio1 = client.tts.stream_websocket(text1(), reference_id="voice1") for chunk in audio1: play_audio(chunk) # Switch to another def text2(): yield "And now voice two!" audio2 = client.tts.stream_websocket(text2(), reference_id="voice2") for chunk in audio2: play_audio(chunk) ``` ```javascript theme={null} // Start with one voice const request1 = { reference_id: "voice1" }; streamSpeech("Hello from voice one.", request1); // Switch to another const request2 = { reference_id: "voice2" }; streamSpeech("And now voice two!", request2); ``` ### Emotion Injection Add emotions dynamically: ```python theme={null} def emotional_speech(text, emotion): emotional_text = f"({emotion}) {text}" stream_speech(emotional_text) ``` ```javascript theme={null} function emotionalSpeech(text, emotion) { const emotionalText = `(${emotion}) ${text}`; streamSpeech(emotionalText); } ``` ### Speed Control Adjust speaking speed: ```python theme={null} from fishaudio.types import Prosody # Use speed and volume with stream_websocket audio_stream = client.tts.stream_websocket( text_chunks(), speed=1.5 # 1.5x speed ) # Note: For full prosody control including volume, use TTSConfig ``` ```javascript theme={null} const request = { text: "", prosody: { speed: 1.5, // 1.5x speed volume: 0, // Normal volume }, }; ``` ## Performance Tips 1. **Pre-load voices** for instant start 2. **Use connection pooling** for multiple streams 3. **Monitor latency** and adjust settings 4. **Cache common phrases** for instant playback ## Get Support Need help with streaming? 
* **Discord Community:** [Join our Discord](https://discord.gg/fish-audio) * **Email Support:** [support@fish.audio](mailto:support@fish.audio) * **Status Page:** [status.fish.audio](https://status.fish.audio) # Voice Cloning Best Practices Source: https://docs.fish.audio/developer-guide/best-practices/voice-cloning Simple tips to get the best voice cloning results with Fish Audio ## Getting Started Voice cloning lets you create a digital version of any voice. Use at least 10 seconds of audio recording for studio-quality results right in the Playground or via the API. ## Recording Your Voice ### Find a Quiet Space **Good places to record:** * A bedroom with curtains and carpet * Inside a parked car * A quiet office or study room * Any room with soft furniture **Avoid recording near:** * Open windows with traffic noise * Running appliances (AC, fans, refrigerators) * Other people talking * TVs or music playing ### Use What You Have **Best options:** * USB microphone or gaming headset * Phone voice recorder app (place it on a stable surface) * Earbuds with microphone (hold them steady) **Quick tip:** Keep the microphone about a hand's width from your mouth and speak normally. ## What to Say **Best approach:** Record 2-3 clips of 15-20 seconds each that form a complete paragraph. Here's a sample script you can read naturally: ``` "Hello, my name is Alex, and I enjoy reading books about technology and science. Yesterday, I walked through the park, observing the beautiful autumn leaves. The weather was quite pleasant, with a gentle breeze and warm sunshine. I often think about how amazing our world is, full of interesting discoveries waiting to be made." ``` ### Recording Tips **Must Have:** * Only one person speaking * Steady volume throughout * Consistent tone and emotion * Small pauses between sentences (about half a second) **Nice to Have:** * No background noise * No room echo * Professional mic (but phone is fine too!) 
**Avoid:**

* Multiple speakers in one recording
* Big changes in volume or emotion
* Background music or TV
* Rushing through without pauses

## Troubleshooting

### Common Problems

**Voice sounds robotic?**

* Try recording for longer, 30-60 seconds
* Speak more naturally and add pauses

**Voice doesn't sound like you?**

* Make sure you're the only person speaking in the recording
* Check that there's no background music or TV

**Poor audio quality?**

* Find a quieter room to record
* Move closer to your microphone
* Try using a different recording device

## Important: Getting Permission

Only clone voices you have permission to use:

* Your own voice
* Someone who gave you written permission
* Never use voices from the internet without permission
* Never use celebrity or public figure voices without permission

## How to Upload Your Recording

1. Visit [fish.audio](https://fish.audio) and log in
2. Find the voice creation button in your dashboard
3. Select your recorded file and give your voice a name
4. Wait for processing - it usually takes just a few seconds
5. Type some text and hear your cloned voice speak!

## Making Different Voices

Want to create character voices or different styles? Try these:

### Different Emotions

Record the same text with different feelings:

* Happy and energetic
* Calm and relaxed
* Serious and professional

### Different Characters

Create unique voices for:

* Storytelling and audiobooks
* Game characters
* Educational content
* Podcast intros

## Get Help

Need assistance? We're here to help:

* **Community Forum**: [Join our Discord](https://discord.gg/fish-audio)
* **Email Support**: [support@fish.audio](mailto:support@fish.audio)
* **Video Tutorials**: Coming soon!

# Creating Voice Models

Source: https://docs.fish.audio/developer-guide/core-features/creating-models

Learn how to create custom voice models with Fish Audio

## Overview

Create custom voice models to generate consistent, high-quality speech. You can create models through our web interface or programmatically via API.
## Web Interface

The easiest way to create a voice model:

1. Visit [fish.audio](https://fish.audio) and log in
2. Click on "Models" in your dashboard
3. Select "Create New Model"
4. Add 1 or more voice samples (at least 10 seconds each)
5. Choose privacy settings and training options
6. Click "Create" and wait for processing

## Using the API

### Using the SDK

Create models with the Python or JavaScript SDK:

First, install the SDK:

```bash theme={null}
pip install fish-audio-sdk
```

Then create a model:

```python theme={null}
from fish_audio_sdk import Session

# Initialize session with your API key
session = Session("your_api_key")

# Read the audio samples (and an optional cover image), then create the model
with open("sample1.mp3", "rb") as voice_file1, \
     open("sample2.wav", "rb") as voice_file2, \
     open("cover.png", "rb") as image_file:
    model = session.create_model(
        title="My Voice Model",
        description="Custom voice for storytelling",
        voices=[
            voice_file1.read(),
            voice_file2.read()
        ],
        cover_image=image_file.read()  # Optional
    )

print(f"Model created: {model.id}")
```

First, install the SDK:

```bash theme={null}
npm install fish-audio
```

Then create a model:

```javascript theme={null}
import { FishAudioClient } from "fish-audio";
import { createReadStream } from "fs";

const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

const title = "My Voice Model";
const audioFile1 = createReadStream("sample1.mp3");
// Optionally add more samples:
// const audioFile2 = createReadStream("sample2.wav");
const coverImageFile = createReadStream("cover.png"); // optional

try {
  const response = await fishAudio.voices.ivc.create({
    title,
    voices: [audioFile1],
    cover_image: coverImageFile,
    description: "Custom voice for storytelling",
    visibility: "private",
  });
  console.log("Voice created:", {
    id: response._id,
    title: response.title,
    state: response.state,
  });
} catch (err) {
  console.error("Create voice request failed:", err);
}
```

### Direct API

Create models directly using the REST API:

```python theme={null}
import requests

response = requests.post(
    "https://api.fish.audio/model",
    files=[
        ("voices", open("sample1.mp3", "rb")),
        ("voices", open("sample2.wav", "rb"))
    ],
data=[ ("title", "My Voice Model"), ("description", "Custom voice model"), ("visibility", "private"), ("type", "tts"), ("train_mode", "fast"), ("enhance_audio_quality", "true") ], headers={ "Authorization": "Bearer YOUR_API_KEY" } ) result = response.json() print(f"Model ID: {result['id']}") ``` ```javascript theme={null} import { readFile } from "fs/promises"; const form = new FormData(); form.append("title", "My Voice Model"); form.append("description", "Custom voice model"); form.append("visibility", "private"); form.append("type", "tts"); form.append("train_mode", "fast"); form.append("enhance_audio_quality", "true"); const v1 = await readFile("sample1.mp3"); const v2 = await readFile("sample2.wav"); form.append("voices", new File([v1], "sample1.mp3")); form.append("voices", new File([v2], "sample2.wav")); const res = await fetch("https://api.fish.audio/model", { method: "POST", headers: { Authorization: "Bearer " }, body: form, }); const result = await res.json(); console.log("Model ID:", result.id); ``` ## Model Settings ### Required Parameters | Parameter | Description | Type | Options | | ----------------- | --------------------------------------------------------------------- | -------------- | ----------------------- | | **title** | Name of your model | `string` | Any text | | **voices** | Audio samples | `Array` | .mp3, .wav, .m4a, .opus | | **type**\* | Model type | `enum` | `tts` | | **train\_mode**\* | Model train mode, fast means model instantly available after creation | `enum` | `fast` | \*Automatically set by Python and JavaScript SDKs ### Optional Parameters | Parameter | Description | Type | Options | | --------------------------- | -------------------------------------------------- | --------------- | ---------------------------------------------------- | | **visibility** | Who can use your model | `enum` | `private`, `public`, `unlist`
`default: public` | | **description** | Model description | `string` | Any text | | **cover\_image** | Model cover image, required if the model is public | `File` | .jpg, .png | | **texts** | Transcripts of audio samples | `Array` | Must match number of audio files | | **tags** | Tags for your model | `string[]` | Any text | | **enhance\_audio\_quality** | Remove background noise | `boolean` | `true`, `false`
`default: false` | For detailed explanations view our [API reference](/api-reference/endpoint/model/create-model). ## Audio Requirements ### Quality Guidelines **Minimum Requirements:** * At least 1 audio sample * 10+ seconds per sample **Best Practices:** * Use multiple diverse samples * 1 consistent speaker throughout * Include different emotions and tones * Record in a quiet environment * Maintain steady volume ## Adding Transcripts Including text transcripts improves model quality: ```python theme={null} response = requests.post( "https://api.fish.audio/model", files=[ ("voices", open("hello.mp3", "rb")), ("voices", open("world.wav", "rb")) ], data=[ ("title", "Enhanced Model"), ("texts", "Hello, this is my first recording."), ("texts", "Welcome to the world of AI voices."), # ... other parameters ], headers={"Authorization": "Bearer YOUR_API_KEY"} ) ``` ```javascript theme={null} import { FishAudioClient } from "fish-audio"; import { createReadStream } from "fs"; const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY }); const response = await fishAudio.voices.ivc.create({ title: "Enhanced Model", voices: [ createReadStream("hello.mp3"), createReadStream("world.wav"), ], texts: [ "Hello, this is my first recording.", "Welcome to the world of AI voices.", ], // other optional fields: // visibility: "private", // enhance_audio_quality: true, }); console.log("Model ID:", response._id); ``` Text transcripts must match the exact number of audio files. If you provide 3 audio files, you must provide exactly 3 text transcripts. 
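Because the counts must match exactly, it can be worth verifying them locally before building the upload request. A minimal sketch (the file names and transcripts are just placeholders mirroring the examples above):

```python
# Placeholder sample files and their transcripts
voices = ["hello.mp3", "world.wav"]
texts = [
    "Hello, this is my first recording.",
    "Welcome to the world of AI voices.",
]

# The API expects exactly one transcript per audio file,
# so fail fast on a mismatch before uploading.
if len(texts) != len(voices):
    raise ValueError(
        f"Expected {len(voices)} transcripts, got {len(texts)}"
    )
print(f"OK: {len(voices)} samples, {len(texts)} transcripts")
```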
## Using Your Model Once training is complete: ```python theme={null} # Generate speech with your model response = requests.post( "https://api.fish.audio/v1/tts", json={ "text": "Hello from my custom voice!", "model_id": model_id, "format": "mp3" }, headers={"Authorization": "Bearer YOUR_API_KEY"} ) # Save the audio with open("output.mp3", "wb") as f: f.write(response.content) ``` ```javascript theme={null} import { FishAudioClient } from "fish-audio"; import { writeFile } from "fs/promises"; const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY }); const audio = await fishAudio.textToSpeech.convert({ text: "Hello from my custom voice!", model_id: "your_model_id_here", format: "mp3", }); const buffer = Buffer.from(await new Response(audio).arrayBuffer()); await writeFile("output.mp3", buffer); console.log("✓ Audio saved to output.mp3"); ``` ## Troubleshooting ### Common Issues **Model training fails:** * Check audio quality and format * Ensure single speaker in all samples * Verify files are not corrupted **Poor voice quality:** * Add more diverse audio samples * Enable audio enhancement * Use higher quality recording ## Best Practices 1. **Start Simple:** Begin with 2-3 samples in fast mode to test 2. **Iterate:** Refine with more samples and quality mode 3. **Document:** Keep track of which samples work best 4. **Test Thoroughly:** Try different texts and emotions 5. **Privacy First:** Keep personal models private ## Support Need help creating models? 
* **API Documentation:** [Full API Reference](/api-reference/introduction) * **Discord Community:** [Join our Discord](https://discord.gg/fish-audio) * **Email Support:** [support@fish.audio](mailto:support@fish.audio) # Emotion Control Source: https://docs.fish.audio/developer-guide/core-features/emotions Add natural emotions and expressions to your AI-generated speech ## Overview Fish Audio models support 64+ emotional expressions and voice styles that can be controlled through text markers in your input. Add natural pauses, laughter, and other human-like elements to make speech more engaging and realistic. The `(parenthesis)` syntax on this page applies to the S1 model. S2 uses `[bracket]` syntax with natural language descriptions and is not limited to a fixed set of tags. See the [Models Overview](/developer-guide/models-pricing/models-overview#s2-natural-language-control) for details. ## How It Works Simply wrap emotion tags in parentheses within your text: ``` (happy) What a beautiful day! (sad) I'm sorry to hear that. (excited) This is amazing news! ``` The TTS models will automatically recognize these markers and adjust the voice accordingly. 
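Since an S1 emotion marker is just text at the start of a sentence, adding one is plain string formatting. A minimal sketch (the `with_emotion` helper is ours for illustration, not part of the SDK):

```python
def with_emotion(emotion: str, sentence: str) -> str:
    """Prefix a sentence with an S1 emotion tag, e.g. '(happy) Hello!'."""
    return f"({emotion}) {sentence}"

# Build a short multi-sentence script, one emotion per sentence
script = "\n".join([
    with_emotion("happy", "What a beautiful day!"),
    with_emotion("sad", "I'm sorry to hear that."),
    with_emotion("excited", "This is amazing news!"),
])
print(script)
```

The resulting `script` can then be passed as the `text` of any TTS request.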
## Complete Emotion Reference ### Basic Emotions (24 expressions) | Emotion | Tag | Description | Example Context | | ----------- | --------------- | ----------------------- | --------------------------- | | Happy | `(happy)` | Cheerful, upbeat tone | Good news, greetings | | Sad | `(sad)` | Melancholic, downcast | Sympathy, bad news | | Angry | `(angry)` | Frustrated, aggressive | Complaints, warnings | | Excited | `(excited)` | Energetic, enthusiastic | Announcements, celebrations | | Calm | `(calm)` | Peaceful, relaxed | Instructions, meditation | | Nervous | `(nervous)` | Anxious, uncertain | Disclaimers, apologies | | Confident | `(confident)` | Assertive, self-assured | Presentations, sales | | Surprised | `(surprised)` | Shocked, amazed | Reactions, discoveries | | Satisfied | `(satisfied)` | Content, pleased | Confirmations, reviews | | Delighted | `(delighted)` | Very pleased, joyful | Celebrations, compliments | | Scared | `(scared)` | Frightened, fearful | Warnings, horror stories | | Worried | `(worried)` | Concerned, troubled | Concerns, questions | | Upset | `(upset)` | Disturbed, distressed | Complaints, problems | | Frustrated | `(frustrated)` | Annoyed, exasperated | Technical issues, delays | | Depressed | `(depressed)` | Very sad, hopeless | Serious topics | | Empathetic | `(empathetic)` | Understanding, caring | Support, counseling | | Embarrassed | `(embarrassed)` | Ashamed, awkward | Apologies, mistakes | | Disgusted | `(disgusted)` | Repelled, revolted | Negative reviews | | Moved | `(moved)` | Emotionally touched | Heartfelt moments | | Proud | `(proud)` | Accomplished, satisfied | Achievements, praise | | Relaxed | `(relaxed)` | At ease, casual | Casual conversation | | Grateful | `(grateful)` | Thankful, appreciative | Thanks, appreciation | | Curious | `(curious)` | Inquisitive, interested | Questions, exploration | | Sarcastic | `(sarcastic)` | Ironic, mocking | Humor, criticism | ### Advanced Emotions (25 expressions) | Emotion | Tag | 
Description | Example Context | | ------------- | ----------------- | ------------------------ | ---------------------- | | Disdainful | `(disdainful)` | Contemptuous, scornful | Criticism, rejection | | Unhappy | `(unhappy)` | Discontent, dissatisfied | Complaints, feedback | | Anxious | `(anxious)` | Very worried, uneasy | Urgent matters | | Hysterical | `(hysterical)` | Uncontrollably emotional | Extreme reactions | | Indifferent | `(indifferent)` | Uncaring, neutral | Neutral responses | | Uncertain | `(uncertain)` | Doubtful, unsure | Speculation, questions | | Doubtful | `(doubtful)` | Skeptical, questioning | Disbelief, questioning | | Confused | `(confused)` | Puzzled, perplexed | Clarification requests | | Disappointed | `(disappointed)` | Let down, dissatisfied | Unmet expectations | | Regretful | `(regretful)` | Sorry, remorseful | Apologies, mistakes | | Guilty | `(guilty)` | Culpable, responsible | Confessions, apologies | | Ashamed | `(ashamed)` | Deeply embarrassed | Serious mistakes | | Jealous | `(jealous)` | Envious, resentful | Comparisons | | Envious | `(envious)` | Wanting what others have | Admiration with desire | | Hopeful | `(hopeful)` | Optimistic about future | Future plans | | Optimistic | `(optimistic)` | Positive outlook | Encouragement | | Pessimistic | `(pessimistic)` | Negative outlook | Warnings, doubts | | Nostalgic | `(nostalgic)` | Longing for the past | Memories, stories | | Lonely | `(lonely)` | Isolated, alone | Emotional content | | Bored | `(bored)` | Uninterested, weary | Disinterest | | Contemptuous | `(contemptuous)` | Showing contempt | Strong criticism | | Sympathetic | `(sympathetic)` | Showing sympathy | Condolences | | Compassionate | `(compassionate)` | Showing deep care | Support, help | | Determined | `(determined)` | Resolved, decided | Goals, commitments | | Resigned | `(resigned)` | Accepting defeat | Giving up, acceptance | ### Tone Markers (5 expressions) Control volume and intensity: | Tone | Tag | 
Description | When to Use | | ---------- | ------------------- | -------------------- | -------------------------- | | Hurried | `(in a hurry tone)` | Rushed, urgent | Time-sensitive information | | Shouting | `(shouting)` | Loud, calling out | Getting attention | | Screaming | `(screaming)` | Very loud, panicked | Emergencies, fear | | Whispering | `(whispering)` | Very soft, secretive | Secrets, quiet scenes | | Soft | `(soft tone)` | Gentle, quiet | Comfort, lullabies | ### Audio Effects (10 expressions) Add natural human sounds: | Effect | Tag | Description | Suggested Text | | ------------- | ----------------- | ---------------------------- | -------------- | | Laughing | `(laughing)` | Full laughter | Ha, ha, ha | | Chuckling | `(chuckling)` | Light laugh | Heh, heh | | Sobbing | `(sobbing)` | Crying heavily | (optional) | | Crying Loudly | `(crying loudly)` | Intense crying | (optional) | | Sighing | `(sighing)` | Exhale of relief/frustration | sigh | | Groaning | `(groaning)` | Sound of frustration | ugh | | Panting | `(panting)` | Out of breath | huff, puff | | Gasping | `(gasping)` | Sharp intake of breath | gasp | | Yawning | `(yawning)` | Tired sound | yawn | | Snoring | `(snoring)` | Sleep sound | zzz | ### Special Effects Additional markers for atmosphere and context: | Effect | Tag | Description | | ------------------- | ----------------------- | ------------------------ | | Audience Laughter | `(audience laughing)` | Crowd laughing sound | | Background Laughter | `(background laughter)` | Ambient laughter | | Crowd Laughter | `(crowd laughing)` | Large group laughing | | Short Pause | `(break)` | Brief pause in speech | | Long Pause | `(long-break)` | Extended pause in speech | You can also use natural expressions like "Ha,ha,ha" for laughter without tags. 
## Usage Guidelines ### Placement Rules **For English and Most Languages:** * Emotion tags MUST go at the beginning of sentences * Tone controls can go anywhere in the text * Sound effects can go anywhere in the text **Correct:** ``` (happy) What a wonderful day! ``` **Incorrect:** ``` What a (happy) wonderful day! ``` ## Advanced Techniques ### Combining Effects You can layer multiple emotions for complex expressions: ``` (sad)(whispering) I miss you so much. (angry)(shouting) Get out of here now! (excited)(laughing) We won! Ha ha! ``` ### Emotion Transitions Create natural emotional progressions: ``` (happy) I got the promotion! (uncertain) But... it means relocating. (sad) I'll miss everyone here. (hopeful) Though it's a great opportunity. (determined) I'm going to make it work! ``` ### Background Effects Add atmospheric sounds: ``` The comedy show was amazing (audience laughing) Everyone was having fun (background laughter) The crowd loved it (crowd laughing) ``` ### Intensity Modifiers Fine-tune emotional intensity with descriptive modifiers: ``` (slightly sad) I'm a bit disappointed. (very excited) This is absolutely amazing! (extremely angry) This is unacceptable! ``` ## Language Support All 13 supported languages can use emotion markers. 
Emotions must be at sentence start for these languages: * **English, Chinese, Japanese, German, French, Spanish, Korean, Arabic, Russian, Dutch, Italian, Polish, Portuguese** ## Best Practices ### Do's * Use one primary emotion per sentence * Test different emotion combinations * Match emotions to context logically * Add appropriate text after sound effects (e.g., "Ha ha" after laughing) * Use natural expressions when possible * Space out emotional changes for realism ### Don'ts * Don't overuse emotion tags in short text * Don't mix conflicting emotions * Don't create custom tags - use only supported ones * Don't forget parentheses * Don't place emotion tags mid-sentence in English ## Common Use Cases ### Customer Service ``` (friendly) Hello! How can I help you today? (empathetic) I understand your frustration. (confident) I'll resolve this for you right away. (grateful) Thank you for your patience! ``` ### Storytelling ``` (narrator) Once upon a time... (mysterious)(whispering) The old house stood silent. (scared) "Is anyone there?" she called out. (relieved)(sighing) No one answered. Phew. ``` ### Educational Content ``` (enthusiastic) Welcome to today's lesson! (curious) Have you ever wondered why? (encouraging) That's a great question! (proud) Excellent work! ``` ### Marketing & Sales ``` (excited) Introducing our newest product! (confident) You won't find better quality anywhere. (urgent) Limited time offer! (satisfied) Join thousands of happy customers! ``` ## Troubleshooting ### Emotion Not Working? 1. **Check placement** - Emotions must be at the beginning of sentences for English 2. **Verify spelling** - Tags must match exactly as listed 3. **Include parentheses** - Tags must be wrapped in parentheses ### Unnatural Sound? 
* Space out emotional changes * Use appropriate intensity * Test with different voices * Add context text after sound effects ### Performance Notes * Emotion markers don't count toward token limits * No additional latency for emotion processing * All emotions available on all pricing tiers * Maximum of 3 combined emotions per sentence recommended ## Quick Reference Tables ### Emotion Intensity Scale | Base Emotion | Mild | Moderate | Intense | | ------------ | ------------ | -------- | --------- | | Happy | satisfied | happy | delighted | | Sad | disappointed | sad | depressed | | Angry | frustrated | angry | furious | | Scared | nervous | scared | terrified | | Excited | interested | excited | ecstatic | ### Common Combinations | Scenario | Emotion Combo | Example | | ---------------- | ------------------------ | ------------------------------------- | | Whispered Secret | (mysterious)(whispering) | "I have something to tell you..." | | Angry Shout | (angry)(shouting) | "Stop right there!" | | Sad Sigh | (sad)(sighing) | "I wish things were different. Sigh." | | Excited Laugh | (excited)(laughing) | "We did it! Ha ha!" | | Nervous Question | (nervous)(uncertain) | "Are you sure about this?" | ## See Also * [Emotion Reference Guide](/api-reference/emotion-reference) - Complete emotion list with examples * [API Reference](/api-reference/introduction) - Implementation details * [Text-to-Speech Guide and Best Practices](/developer-guide/core-features/text-to-speech) # Fine-grained Control Source: https://docs.fish.audio/developer-guide/core-features/fine-grained-control Advanced control over speech generation ## Getting Started To use fine-grained control, you can use either our SDK, API, or Playground. SDK/API: We recommend disabling normalization by setting `"normalize": false` in the request body. This ensures that the API doesn't alter the intonation of control tags. Playground: You can use V1.6 Control Model, without setting any other options. 
Disabling normalization may reduce the stability of reading numbers, dates, and URLs. You'll need to handle these cases manually for best results.

## Phoneme Control

Phoneme control allows you to specify exact pronunciations for words or characters. Currently, we support:

* CMU Arpabet (for English)
* Pinyin (for Chinese)

To use phoneme control, wrap the desired pronunciation in `<|phoneme_start|>` and `<|phoneme_end|>` tags. Each tag pair should contain a single word or character.

### English Example

Standard: "I am an engineer."

With phoneme control: "I am an `<|phoneme_start|>EH N JH AH N IH R<|phoneme_end|>`."

### Chinese Example

Standard: "我是一个工程师。"

With phoneme control: "我是一个`<|phoneme_start|>gong1<|phoneme_end|><|phoneme_start|>cheng2<|phoneme_end|><|phoneme_start|>shi1<|phoneme_end|>`。"

## Paralanguage

Paralanguage controls allow you to add natural speech elements and pauses to make the generated speech sound more human-like. There are two main types of controls:

### Pause Words

You can use common pause words like "um", "uh", "嗯", "啊" to control the rhythm of the speech.

### Special Effects

The following special effects can be added using parentheses:

| Effect           | Description        | First Available | Stage        |
| ---------------- | ------------------ | --------------- | ------------ |
| `(break)`        | Short pause        | V1.6            | Experimental |
| `(long-break)`   | Extended pause     | V1.6            | Experimental |
| `(breath)`       | Breathing sound    | V1.6            | Experimental |
| `(laugh)`        | Laughter sound     | V1.6            | Experimental |
| `(cough)`        | Coughing sound     | V1.6            | Experimental |
| `(lip-smacking)` | Lip smacking sound | V1.6            | Experimental |
| `(sigh)`         | Sighing sound      | V1.6            | Experimental |

The effects `(laugh)`, `(cough)`, `(lip-smacking)`, and `(sigh)` are still under development. You may need to repeat them multiple times for better results.

Example:

Standard: "I am an engineer."

With paralanguage: "I am, um, an (break) engineer."
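The phoneme tag pairs are easy to get wrong by hand; a small helper can wrap each pronunciation in its own pair, one word or character at a time, as the rule above requires. A sketch (the `phoneme` helper is ours for illustration, not part of the SDK):

```python
def phoneme(pronunciation: str) -> str:
    """Wrap one word or character's pronunciation in a phoneme tag pair."""
    return f"<|phoneme_start|>{pronunciation}<|phoneme_end|>"

# English: CMU Arpabet for "engineer"
english = f"I am an {phoneme('EH N JH AH N IH R')}."

# Chinese: one tag pair per character, in Pinyin with tone numbers
chinese = "我是一个" + "".join(phoneme(p) for p in ["gong1", "cheng2", "shi1"]) + "。"

print(english)
print(chinese)
```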
# Speech to Text Guide Source: https://docs.fish.audio/developer-guide/core-features/speech-to-text Convert audio recordings into accurate text transcriptions ## Overview Transform any audio recording into text with Fish Audio's speech recognition. Perfect for transcriptions, subtitles, and voice commands. ## Getting Started ### Web Interface Transcribe audio instantly: Go to [fish.audio](https://fish.audio) and log in Click on "Speech to Text" in your dashboard Select your audio file (MP3, WAV, M4A) Click "Transcribe" and copy your text ## Supported Formats ### Audio Files **Accepted formats:** * MP3 (recommended) * WAV * M4A * OGG * FLAC * AAC **File requirements:** * Maximum size: 20MB * Maximum duration: 60 minutes * Minimum duration: 1 second ## Language Support ### Automatic Detection The system automatically detects the language spoken in your audio. No configuration needed! ### Manual Selection For better accuracy, specify the language: **Major Languages:** * English (en) * Chinese (zh) * Japanese (ja) With **additional languages** to be supported soon! ## Audio Quality Tips ### For Best Results **Recording Environment:** * Quiet room with minimal echo * No background music * Clear, consistent speaking voice * One speaker at a time **Audio Settings:** * Sample rate: 16kHz or higher * Bit rate: 128kbps or higher * Mono or stereo (mono preferred) ### Common Issues **Poor transcription quality?** * Remove background noise * Increase microphone volume * Speak clearly and not too fast * Avoid multiple speakers talking over each other ## Use Cases ### Meeting Transcription Convert recorded meetings into searchable text: 1. Record your meeting (Zoom, Teams, etc.) 2. Export the audio file 3. Upload to Fish Audio 4. 
Get formatted transcription with timestamps ### Podcast Transcripts Create written versions of your podcasts: * Generate show notes automatically * Create searchable content * Improve accessibility * Enable translations ### Video Subtitles Generate subtitles for your videos: 1. Extract audio from video 2. Transcribe with Fish Audio 3. Get timestamped text 4. Import into video editor ### Voice Notes Convert voice memos to text: * Dictate ideas quickly * Transcribe later for editing * Search through voice notes * Share as text documents ## Advanced Features ### Timestamps Get precise timing for each spoken segment: ``` [00:00:00] Welcome to our podcast. [00:00:03] Today we're discussing AI technology. [00:00:07] Let's dive right in. ``` Perfect for: * Creating subtitles * Navigating long recordings * Synchronizing with video * Building searchable archives ### Speaker Detection Identify different speakers in conversations: ``` Speaker 1: "What do you think about the proposal?" Speaker 2: "I think it has potential." Speaker 1: "Let's discuss the details." 
``` ### Punctuation & Formatting Automatic formatting includes: * Sentence capitalization * Punctuation marks * Paragraph breaks * Number formatting ## Tips for Different Content ### Interviews **Best practices:** * Use a good microphone for each speaker * Record in a quiet environment * Speak one at a time * Keep consistent volume levels ### Lectures & Presentations **Optimize for:** * Clear articulation of technical terms * Pause between topics * Repeat important points * Avoid reading too fast ### Phone Calls **Considerations:** * Phone audio is lower quality * Expect slightly lower accuracy * Speak clearly and slowly * Avoid speakerphone if possible ## Accuracy Expectations ### What Affects Accuracy **Positive factors:** * Clear audio quality * Native speaker accent * Common vocabulary * Single speaker **Challenging factors:** * Heavy accents * Technical jargon * Multiple speakers * Background noise ### Typical Accuracy Rates * **Professional recording:** 95-98% * **Clean amateur recording:** 90-95% * **Phone/video calls:** 85-90% * **Noisy environments:** 75-85% ## Post-Processing Tips ### Editing Transcriptions After transcription: 1. **Review for accuracy** - Check names and technical terms 2. **Add formatting** - Break into paragraphs 3. **Correct errors** - Fix any misheard words 4. 
**Add context** - Include speaker names ### Export Options Save your transcriptions as: * Plain text (.txt) * Word document (.docx) * Subtitle file (.srt) * PDF document ## Common Applications ### Business * Meeting minutes * Interview transcripts * Call recordings * Training materials ### Education * Lecture notes * Research interviews * Student recordings * Language learning ### Content Creation * Video scripts * Podcast show notes * Social media captions * Blog post drafts ### Accessibility * Hearing impaired support * Multi-language content * Searchable archives * Documentation ## Troubleshooting ### No Text Output **Check:** * Audio file isn't corrupted * File format is supported * Audio contains speech * Volume is audible ### Incorrect Language **Solutions:** * Manually select the correct language * Ensure majority of audio is in one language * Separate multi-language content ### Missing Words **Common causes:** * Speaking too fast * Mumbling or unclear speech * Technical terms not recognized * Very quiet sections ## Privacy & Security ### Your Data * Audio files are processed securely * Transcriptions are private to your account * Files are not used for training * Delete anytime from your account ### Sensitive Content For confidential audio: * Use on-premise solutions if available * Review privacy policy * Consider redacting sensitive information * Download and delete after processing ## Best Practices Summary 1. **Start with quality audio** - Good input = good output 2. **Choose the right environment** - Quiet spaces work best 3. **Speak clearly** - Articulate and consistent pace 4. **Review and edit** - All transcriptions benefit from review 5. **Use appropriate tools** - Different content needs different approaches ## Get Support Need help with transcription? 
* **Try it free:** [fish.audio](https://fish.audio)
* **Community:** [Discord](https://discord.gg/fish-audio)
* **Email:** [support@fish.audio](mailto:support@fish.audio)
* **Status:** [status.fish.audio](https://status.fish.audio)

# Text to Speech

Source: https://docs.fish.audio/developer-guide/core-features/text-to-speech

Convert text to natural-sounding speech with Fish Audio

## Overview

Transform any text into natural, expressive speech using Fish Audio's advanced TTS models. Choose from pre-made voices or use your own cloned voices. Discover the world's best cloned voice models on our [Discovery](https://fish.audio/discovery) page.

## Quick Start

### Web Interface

The easiest way to generate speech:

1. Go to [fish.audio](https://fish.audio) and log in
2. Type or paste the text you want to convert
3. Select from available voices or use your own
4. Click "Generate" and download your audio

## Using the SDK

```bash theme={null}
pip install fish-audio-sdk
```

Generate speech with just a few lines of code:

```python theme={null}
from fishaudio import FishAudio
from fishaudio.utils import save

# Initialize client
client = FishAudio(api_key="your_api_key_here")

# Generate speech
audio = client.tts.convert(
    text="Hello, world!",
    reference_id="your_voice_model_id"
)

save(audio, "output.mp3")
print("✓ Audio saved to output.mp3")
```

```bash theme={null}
npm install fish-audio
```

Generate speech with just a few lines of code:

```javascript theme={null}
import { FishAudioClient } from "fish-audio";
import { writeFile } from "fs/promises";

// Initialize client
const fishAudio = new FishAudioClient({ apiKey: "your_api_key_here" });

const audio = await fishAudio.textToSpeech.convert({
  text: "Hello, world!",
  reference_id: "your_voice_model_id",
});

const buffer = Buffer.from(await new Response(audio).arrayBuffer());
await writeFile("output.mp3", buffer);
console.log("✓ Audio saved to output.mp3");
```

## Voice Options

### Using Pre-made Voices

Browse and select voices from the
playground:

```python theme={null}
# Use a voice from the playground
audio = client.tts.convert(
    text="Welcome to Fish Audio!",
    reference_id="7f92f8afb8ec43bf81429cc1c9199cb1"
)
```

```javascript theme={null}
// Use a voice from the playground
const audio = await fishAudio.textToSpeech.convert({
  text: "Welcome to Fish Audio!",
  reference_id: "7f92f8afb8ec43bf81429cc1c9199cb1",
});
```

### Using Your Cloned Voice

Use voices you've created:

```python theme={null}
# Use your own cloned voice
audio = client.tts.convert(
    text="This is my custom voice speaking",
    reference_id="your_model_id"
)
```

```javascript theme={null}
// Use your own cloned voice
const audio = await fishAudio.textToSpeech.convert({
  text: "This is my custom voice speaking",
  reference_id: "your_model_id",
});
```

### Using Reference Audio

Provide reference audio directly:

```python theme={null}
from fishaudio.types import ReferenceAudio

# Use reference audio on-the-fly
with open("voice_sample.wav", "rb") as f:
    audio = client.tts.convert(
        text="Hello from reference audio",
        references=[
            ReferenceAudio(
                audio=f.read(),
                text="Sample text from the audio"
            )
        ]
    )
```

```javascript theme={null}
import { readFile } from "fs/promises";

// Use reference audio on-the-fly
const fileBuffer = await readFile("voice_sample.wav");
const voiceFile = new File([fileBuffer], "voice_sample.wav");

const audio = await fishAudio.textToSpeech.convert({
  text: "Hello from reference audio",
  references: [
    { audio: voiceFile, text: "Sample text from the audio" }
  ]
});
```

## Model Selection

Choose the right model for your needs:

| Model      | Best For        | Quality   | Speed   |
| ---------- | --------------- | --------- | ------- |
| **s1**     | Prototyping     | Excellent | Fast    |
| **s2-pro** | Latest features | Excellent | Fastest |

Specify a model in your request:

```python theme={null}
# Using the latest model (default)
audio = client.tts.convert(text="Hello world")
```

```javascript theme={null}
// Using the latest S2-Pro model
const audio = await fishAudio.textToSpeech.convert(
  { text:
"Hello world" }, "s2-pro" ); ``` ## Advanced Options ### Audio Formats Choose your output format: ```python theme={null} audio = client.tts.convert( text="Your text here", format="mp3", # Options: "mp3", "wav", "pcm", "opus" mp3_bitrate=128 # For MP3: 64, 128, or 192 ) ``` ```javascript theme={null} const audio = await fishAudio.textToSpeech.convert({ text: "Your text here", format: "mp3", // Options: "mp3", "wav", "pcm", "opus" mp3_bitrate: 128, // For MP3: 64, 128, or 192 }); ``` ### Chunk Length Control text processing chunks: ```python theme={null} audio = client.tts.convert( text="Long text content...", chunk_length=200 # 100-300 characters per chunk ) ``` ```javascript theme={null} const audio = await fishAudio.textToSpeech.convert({ text: "Long text content...", chunk_length: 200, // 100-300 characters per chunk }); ``` ### Latency Mode Optimize for speed or quality: ```python theme={null} audio = client.tts.convert( text="Quick response needed", latency="balanced" # "normal" or "balanced" ) ``` ```javascript theme={null} const audio = await fishAudio.textToSpeech.convert({ text: "Quick response needed", latency: "balanced", // "normal" or "balanced" }); ``` Balanced mode reduces latency to \~300ms but may slightly decrease stability. 
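If your source text is much longer than a single chunk, you may prefer to split it client-side before sending requests, so each call stays within the recommended 100-300 character range. The sketch below is plain Python with no SDK dependency; `split_for_tts` is a hypothetical helper name, not part of the Fish Audio SDK:

```python
import re

def split_for_tts(text, max_chars=200):
    """Split text into sentence-aligned chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole rather
    than cut mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Current chunk is full; start a new one at a sentence boundary
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk can then be passed to `client.tts.convert()` in turn; keeping chunks sentence-aligned preserves natural pauses at chunk boundaries.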
## Direct API Usage For direct API calls without the SDK: ```python theme={null} import httpx import ormsgpack # Prepare request request_data = { "text": "Hello, world!", "reference_id": "your_model_id", "format": "mp3" } # Make API call with httpx.Client() as client: response = client.post( "https://api.fish.audio/v1/tts", content=ormsgpack.packb(request_data), headers={ "authorization": "Bearer YOUR_API_KEY", "content-type": "application/msgpack", "model": "s2-pro" } ) # Save audio with open("output.mp3", "wb") as f: f.write(response.content) ``` ```javascript theme={null} import { encode } from "@msgpack/msgpack"; import { writeFile } from "fs/promises"; const body = encode({ text: "Hello, world!", reference_id: "your_model_id", format: "mp3", }); const res = await fetch("https://api.fish.audio/v1/tts", { method: "POST", headers: { Authorization: "Bearer YOUR_API_KEY", "Content-Type": "application/msgpack", model: "s2-pro", }, body, }); const buffer = Buffer.from(await res.arrayBuffer()); await writeFile("output.mp3", buffer); ``` ## Streaming Audio Stream audio for real-time applications: ```python theme={null} # Stream audio chunks audio_stream = client.tts.stream( text="Streaming this text in real-time", reference_id="model_id" ) with open("stream_output.mp3", "wb") as f: for chunk in audio_stream: f.write(chunk) # Process chunk immediately for real-time playback ``` ```javascript theme={null} // Use a WebSocket to stream real-time audio import { FishAudioClient, RealtimeEvents } from "fish-audio"; import { writeFile } from "fs/promises"; import path from "path"; // Simple async generator that yields text chunks async function* makeTextStream() { const chunks = [ "Hello from Fish Audio! ", "This is a realtime text-to-speech test. 
", "We are streaming multiple chunks over WebSocket.", ]; for (const chunk of chunks) { yield chunk; } } const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY }); // For realtime, set text to "" and stream the content via makeTextStream const request = { text: "" }; const connection = await fishAudio.textToSpeech.convertRealtime(request, makeTextStream()); // Collect audio and write to a file when the stream ends const chunks = []; connection.on(RealtimeEvents.OPEN, () => console.log("WebSocket opened")); connection.on(RealtimeEvents.AUDIO_CHUNK, (audio) => { if (audio instanceof Uint8Array || Buffer.isBuffer(audio)) { chunks.push(Buffer.from(audio)); } }); connection.on(RealtimeEvents.ERROR, (err) => console.error("WebSocket error:", err)); connection.on(RealtimeEvents.CLOSE, async () => { const outPath = path.resolve(process.cwd(), "out.mp3"); await writeFile(outPath, Buffer.concat(chunks)); console.log("Saved to", outPath); }); ``` ## Adding Emotions The `(parenthesis)` syntax below applies to the S1 model. S2 uses `[bracket]` syntax with natural language descriptions and is not limited to a fixed set of tags. See the [Models Overview](/developer-guide/models-pricing/models-overview#s2-natural-language-control) for details. Make your speech more expressive: ```python theme={null} # Add emotion markers to your text emotional_text = """ (excited) I just won the lottery! (sad) But then I lost the ticket. (laughing) Just kidding, I found it! """ audio = client.tts.convert( text=emotional_text, reference_id="model_id" ) ``` ```javascript theme={null} // Add emotion markers to your text const emotionalText = `(excited) I just won the lottery! (sad) But then I lost the ticket. 
(laughing) Just kidding, I found it!`; const audio = await fishAudio.textToSpeech.convert({ text: emotionalText, reference_id: "model_id", }); ``` Available emotions: * Basic: `(happy)`, `(sad)`, `(angry)`, `(excited)`, `(calm)` * Tones: `(shouting)`, `(whispering)`, `(soft tone)` * Effects: `(laughing)`, `(sighing)`, `(crying)` For more precise control over pronunciation and additional paralanguage features like pauses and breathing, see [Fine-grained Control](/developer-guide/core-features/fine-grained-control). ## Best Practices ### Text Preparation **Do:** * Use proper punctuation for natural pauses * Add emotion markers for expression * Break long texts into paragraphs * Use consistent formatting **Don't:** * Use ALL CAPS (unless shouting) * Mix multiple languages randomly * Include special characters unnecessarily * Forget punctuation ### Performance Tips 1. **Batch Processing:** Process multiple texts efficiently 2. **Cache Models:** Store frequently used model IDs 3. **Optimize Chunk Size:** Use 200 characters for best balance 4. 
**Handle Errors:** Implement retry logic for network issues ### Quality Optimization For best results: * Use high-quality reference audio for cloning * Choose appropriate emotion markers * Test different latency modes * Monitor API rate limits ## Troubleshooting ### Common Issues **No audio output:** * Check API key validity * Verify model ID exists * Ensure proper audio format **Poor quality:** * Use better reference audio * Try normal latency mode * Check text formatting **Slow generation:** * Use balanced latency mode * Reduce chunk length * Check network connection ## Code Examples ### Batch Processing ```python theme={null} from fishaudio.utils import save texts = [ "First announcement", "Second announcement", "Third announcement" ] for i, text in enumerate(texts): audio = client.tts.convert( text=text, reference_id="model_id" ) save(audio, f"output_{i}.mp3") ``` ```javascript theme={null} const texts = [ "First announcement", "Second announcement", "Third announcement", ]; for (let i = 0; i < texts.length; i++) { const audio = await fishAudio.textToSpeech.convert({ text: texts[i], reference_id: "model_id", }); const buffer = Buffer.from(await new Response(audio).arrayBuffer()); await writeFile(`output_${i}.mp3`, buffer); } ``` ### Error Handling ```python theme={null} import time from fishaudio.exceptions import FishAudioError def generate_with_retry(text, max_retries=3): for attempt in range(max_retries): try: audio = client.tts.convert( text=text, reference_id="model_id" ) return audio except FishAudioError as e: if attempt < max_retries - 1: time.sleep(2 ** attempt) # Exponential backoff else: raise e ``` ```javascript theme={null} async function generateWithRetry(text, maxRetries = 3) { for (let attempt = 0; attempt < maxRetries; attempt++) { try { const audio = await fishAudio.textToSpeech.convert({ text, reference_id: "model_id", }); const buffer = Buffer.from(await new Response(audio).arrayBuffer()); return buffer; } catch (err) { if (attempt < 
maxRetries - 1) { const delayMs = 2 ** attempt * 1000; await new Promise((r) => setTimeout(r, delayMs)); } else { throw err; } } } } const buffer = await generateWithRetry("Hello with retry"); await writeFile("retry_output.mp3", buffer); ``` ## API Reference ### Request Parameters | Parameter | Type | Description | Default | | ----------------- | ------- | -------------------- | -------- | | **text** | string | Text to convert | Required | | **reference\_id** | string | Model/voice ID | None | | **format** | string | Audio format | "mp3" | | **chunk\_length** | integer | Characters per chunk | 200 | | **normalize** | boolean | Normalize text | true | | **latency** | string | Speed vs quality | "normal" | ### Response Returns audio data in the specified format as a binary stream. ## Get Support Need help with text-to-speech? * [API Reference](/api-reference/introduction) * **Discord Community:** [Join our Discord](https://discord.gg/fish-audio) * **Email Support:** [support@fish.audio](mailto:support@fish.audio) # Changelog Source: https://docs.fish.audio/developer-guide/getting-started/changelog Complete release history and version updates for all Fish Audio products ## Fish Audio S2 Next-generation text-to-speech model with inline emotion cues, multi-speaker dialogue support, and 80+ languages. S2 introduces `[bracket]` syntax for natural language control over emotion and paralinguistic cues (e.g., `[whisper]`, `[laugh]`, `[emphasis]`). Tags are treated as standard text rather than dedicated control tokens, so you are not limited to a fixed set of expressions. Built on the Qwen3-4B backbone and fully open-source. Use model ID `s2-pro` in the API. S1 remains supported for existing integrations. [GitHub](https://github.com/fishaudio/fish-speech) | [HuggingFace](https://huggingface.co/fishaudio) ## Fish Audio S1 Historic rebrand from Fish Speech to Fish Audio. #1 ranking on TTS-Arena2 with industry-leading performance. 
S1 (4B params): 0.008 WER, 0.004 CER - Available on Fish Audio Playground S1-mini (0.5B params): 0.011 WER, 0.005 CER - Open source on Hugging Face 64+ emotional expressions with RLHF integration and multilingual support for English, Chinese, Japanese, and more. [Read More about S1](https://fish.audio/blog/introducing-s1/) ## v1.5.1 Fixed critical PyTorch security settings and improved inference speed significantly. Added ONNX export support for better deployment options and enhanced text processing for Arabic and Hebrew languages. Includes bug fixes for Apple Silicon (MPS) compatibility and reorganized library structure for cleaner codebase. ## v1.5.0 Introduced v1.5 model architecture with improved dataset handling and bearer token authentication for APIs. Added reference audio caching by hash for faster performance and better Apple Silicon support. Includes OpenAPI documentation refactoring and base64 reference data support in JSON format. ## v1.4.3 Introduced Fish Agent for conversational AI with streaming capabilities and real-time interactions. Added comprehensive Korean language documentation and fixed critical non-English speech issues. Improved WebUI streaming functionality and PyTorch version compatibility. ## v1.4.2 Documentation-focused release with comprehensive updates for v1.4, macOS support, and multiple language translations. Improved Docker support and API enhancements for JSON format handling. Added audio selection to WebUI and fixed various stability issues including cache handling and backend performance. ## v1.4.1 Infrastructure improvements focused on Docker optimization and multi-platform builds. Updated PyTorch version and replaced audio backend from sox for better performance. Enhanced CI/CD pipeline with buildx support and fixed various Docker-related issues. ## v1.4.0 Major release with new VQGAN architecture for improved audio quality and faster inference. Updated WebUI with enhanced interface and better language switching. 
Added Japanese documentation translation and fixed inference warmup issues for better performance. ## v1.2.1 Replaced Whisper with SenseVoice for better ASR and added native Apple Silicon support. Includes Portuguese (Brazil) localization, streaming audio functionality, and CPU-only inference improvements. Pinned PyTorch to 2.3.1 to fix inference speed issues and aligned API with official closed-source version. ## v1.2 Introduced auto-reranking system for better results along with bilingual support and model quantization. Replaced standard Whisper with Faster Whisper for improved speed and added Japanese documentation. Enhanced model stability and inference performance with optimized v1.2 architecture. ## v1.1.2 Minor release adding Chinese text normalization support and a streaming audio download button in the WebUI. Fixed LoRA merging issues and improved Firefly performance. ## v1.1.1 Breaking changes: Replaced zibai with uvicorn for API server, new text-splitter with byte-based length calculation, and license change to CC-BY-NC-SA 4.0. Added Apple Silicon (MPS) support, Windows one-click installation, and automatic model downloading with resume capability. Improved WebUI with better file selection and download progress indicators. ## v1.1.0 Added VITS decoder integration with full streaming support and queue management for real-time audio generation. Introduced internationalization (i18n) with Spanish translation and improved Windows packaging. Optimized GPU memory usage and CPU-only inference performance while adding LoRA support to the Gradio UI. ## v1.0.0 Major milestone release introducing new VQ-GAN architecture with VITS decoder support, LoRA fine-tuning, and streaming inference capabilities. Breaking changes include removal of the Rust-based data server, new tokenizer replacing phonemizer, and updated model architecture (VQ + DiT + Reflow). Achieved 4x memory reduction during loading and added WebUI for training and annotation. 
## v0.2.0 First public release of Fish Speech featuring a complete text-to-speech pipeline with VQ-GAN audio codec and LLAMA-based language model. Includes multi-language support (Chinese, English, Japanese), Gradio WebUI for inference, HTTP API server, and Docker support. Added special optimizations for Chinese users including mirror downloads and localized documentation. # Overview of Fish Audio Source: https://docs.fish.audio/developer-guide/getting-started/introduction Discover Fish Audio's powerful voice generation platform and what you can build ## What is Fish Audio? Fish Audio is a cutting-edge AI platform for voice generation, voice cloning, and audio storytelling. Our technology brings dynamic, natural-sounding voices to your applications, enabling immersive experiences across industries. Introducing our latest generation voice models: **Fish Audio S2-Pro:** Our latest model delivers unparalleled naturalness and emotion, setting a new standard for AI-generated speech. [Learn more about our models →](/developer-guide/models-pricing/models-overview) ## Core Capabilities Generate natural, expressive speech from text in multiple languages and styles Create custom voice models from as little as 15 seconds of audio Build multi-character narratives with emotion and dynamic voice switching ## Try It Now Test our voices in the interactive playground - no code required Browse available voice models and their capabilities ## Ready to Start? Get your API key and make your first API call in minutes. Generate your first AI voice in under 5 minutes ## Platform Capabilities Fish Audio empowers developers to create innovative voice experiences across diverse industries. Whether you're building consumer apps, enterprise solutions, or creative tools, our platform provides the flexibility and power you need. 
### What You Can Build Automate podcast production, YouTube narration, and audiobook generation Create dynamic NPC dialogue and real-time character voices Build interactive language learning tools and accessible educational content Deploy natural-sounding IVR systems and support agents Develop screen readers and voice restoration tools Generate ASMR content, music vocals, interactive stories, and adult content ### Key Features Stream audio in real-time for live applications Industry-leading naturalness and clarity Generate speech in 30+ languages Fine-tune prosody, emotion, and speaking style RESTful API with SDKs for Python, Node.js, and more Handle everything from prototypes to production workloads ## Learn More * [Models & Pricing](/developer-guide/models-pricing/models-overview) - Explore voice models and pricing options * [Core Features](/developer-guide/core-features/text-to-speech) - Deep dive into TTS and voice cloning * [SDKs & Tools](/developer-guide/sdk-guide/python/installation) - Install language-specific libraries * [Best Practices](/developer-guide/best-practices/voice-cloning) - Production-ready tips and optimization for voice cloning, emotion and expression control, and real-time voice streaming # Quick Start Source: https://docs.fish.audio/developer-guide/getting-started/quickstart Generate your first AI voice with Fish Audio in under 5 minutes ## Overview This guide will walk you through generating your first text-to-speech audio with Fish Audio. By the end, you'll have converted text into natural-sounding speech using our API. ## Prerequisites Sign up for a free Fish Audio account to get started with our API. 1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup) 2. Fill in your details to create an account and complete the verification steps. 3. Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys) Once you have an account, you'll need an API key to authenticate your requests. 1. 
Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/) 2. Navigate to the API Keys section 3. Click "Create New Key", give it a descriptive name, and set an expiration if desired 4. Copy your key and store it securely Keep your API key secret! Never commit it to version control or share it publicly. ## Your First TTS Request Choose your preferred method to generate speech: Store your API key as an environment variable (recommended approach): ```bash theme={null} export FISH_API_KEY="replace_me" ``` Run this [cURL](https://curl.se/) command to generate your first speech: ```bash theme={null} curl -X POST https://api.fish.audio/v1/tts \ -H "Authorization: Bearer $FISH_API_KEY" \ -H "Content-Type: application/json" \ -H "model: s2-pro" \ -d '{ "text": "Hello! Welcome to Fish Audio. This is my first AI-generated voice.", "format": "mp3" }' \ --output welcome.mp3 ``` The audio has been saved as `welcome.mp3`. You can play it by: * Double-clicking the file or opening it in any media player * Or using the command line: ```bash theme={null} # On macOS afplay welcome.mp3 # On Linux mpg123 welcome.mp3 # On Windows start welcome.mp3 ``` ```bash theme={null} pip install fish-audio-sdk ``` Create a Python script: ```python theme={null} from fishaudio import FishAudio from fishaudio.utils import save # Initialize with your API key client = FishAudio(api_key="your_api_key_here") # Generate speech audio = client.tts.convert(text="Hello! Welcome to Fish Audio.") save(audio, "welcome.mp3") print("✓ Audio saved to welcome.mp3") ``` ```bash theme={null} python generate_speech.py ``` The audio has been saved as `welcome.mp3`. 
You can play it by: * Double-clicking the file or opening it in any media player * Or using the command line: ```bash theme={null} # On macOS afplay welcome.mp3 # On Linux mpg123 welcome.mp3 # On Windows start welcome.mp3 ``` ```bash theme={null} npm install fish-audio ``` Create a JavaScript script: ```javascript theme={null} import { FishAudioClient } from "fish-audio"; import { writeFile } from "fs/promises"; const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY }); const audio = await fishAudio.textToSpeech.convert({ text: "Hello, world!", }); const buffer = Buffer.from(await new Response(audio).arrayBuffer()); await writeFile("welcome.mp3", buffer); console.log("✓ Audio saved to welcome.mp3"); ``` ```bash theme={null} node generate_speech.mjs ``` The audio has been saved as `welcome.mp3`. You can play it by: * Double-clicking the file or opening it in any media player * Or using the command line: ```bash theme={null} # On macOS afplay welcome.mp3 # On Linux mpg123 welcome.mp3 # On Windows start welcome.mp3 ``` ## Customizing Your Voice The examples above use the default voice. To use a different voice, add the `reference_id` parameter with a model ID from [fish.audio](https://fish.audio). You can find the model ID in the URL or use the copy button when viewing any voice. Choose a voice to try: From: [https://fish.audio/m/8ef4a238714b45718ce04243307c57a7](https://fish.audio/m/8ef4a238714b45718ce04243307c57a7) ```bash theme={null} export REFERENCE_ID="8ef4a238714b45718ce04243307c57a7" ``` From: [https://fish.audio/m/802e3bc2b27e49c2995d23ef70e6ac89](https://fish.audio/m/802e3bc2b27e49c2995d23ef70e6ac89) ```bash theme={null} export REFERENCE_ID="802e3bc2b27e49c2995d23ef70e6ac89" ``` Then generate speech with your chosen voice: ```bash theme={null} curl -X POST https://api.fish.audio/v1/tts \ -H "Authorization: Bearer $FISH_API_KEY" \ -H "Content-Type: application/json" \ -H "model: s2" \ -d '{ "text": "This is a custom voice from Fish Audio! 
You can explore hundreds of different voices on the platform, or even create your own.", "reference_id": "'"$REFERENCE_ID"'", "format": "mp3" }' \ --output custom_voice.mp3 ``` ```python theme={null} import os from fishaudio import FishAudio from fishaudio.utils import save client = FishAudio(api_key="your_api_key_here") # Generate speech with custom voice audio = client.tts.convert( text="This is a custom voice from Fish Audio! You can explore hundreds of different voices on the platform, or even create your own.", reference_id=os.environ.get("REFERENCE_ID") ) save(audio, "custom_voice.mp3") ``` ```javascript theme={null} import { FishAudioClient } from "fish-audio"; import { writeFile } from "fs/promises"; const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY }); const audio = await fishAudio.textToSpeech.convert({ text: "This is a custom voice from Fish Audio! You can explore hundreds of different voices on the platform, or even create your own.", reference_id: process.env.REFERENCE_ID, }); const buffer = Buffer.from(await new Response(audio).arrayBuffer()); await writeFile("custom_voice.mp3", buffer); console.log("✓ Audio saved to custom_voice.mp3"); ``` ## Support Need help? 
Check out these resources: * [API Reference](/api-reference/introduction) - Complete API documentation * [Create a Voice Clone](/api-reference/endpoint/model/create-model) - Create a voice clone model * [Generate Speech](/api-reference/endpoint/openapi-v1/text-to-speech) - Generate realistic speech * [Real-time Streaming](/developer-guide/sdk-guide/python/websocket) - WebSocket for real-time streaming * [Discord Community](https://discord.com/invite/dF9Db2Tt3Y) - Get help from the community * [Support Email](mailto:support@fish.audio) - Contact our support team # LiveKit Source: https://docs.fish.audio/developer-guide/integrations/livekit Build real-time voice AI agents with Fish Audio and LiveKit [LiveKit Agents](https://github.com/livekit/agents) is an open source framework for building real-time voice and multimodal AI agents. It handles streaming audio pipelines, turn detection, interruptions, and LLM orchestration so you can focus on your agent's behavior. Fish Audio integrates with LiveKit through the `fishaudio` plugin, providing text-to-speech synthesis with support for both chunked and real-time WebSocket streaming modes. 
## Prerequisites * A [Fish Audio account](https://fish.audio) with an API key * Python 3.9 or higher ## Installation Install LiveKit Agents with Fish Audio support: ```bash theme={null} pip install "livekit-agents[fishaudio]" ``` ## Configuration Set your Fish Audio API key as an environment variable: ```bash theme={null} export FISH_API_KEY=your_api_key_here ``` ## Basic usage Add Fish Audio TTS to your LiveKit agent: ```python theme={null} from livekit.plugins.fishaudio import TTS tts = TTS( reference_id="your_voice_model_id", # Optional: use a specific voice model="s1", sample_rate=24000, latency_mode="balanced" ) ``` ### Key parameters | Parameter | Description | | --------------- | ------------------------------------------------------------------------- | | `api_key` | Your Fish Audio API key (or use `FISH_API_KEY` env var) | | `model` | TTS model/backend to use (default: `s1`) | | `reference_id` | Voice model ID from the [Fish Audio library](https://fish.audio/discover) | | `output_format` | Audio format: `pcm`, `mp3`, `wav`, or `opus` (default: `pcm`) | | `sample_rate` | Audio sample rate in Hz (default: `24000`) | | `num_channels` | Number of audio channels (default: `1`) | | `base_url` | Custom API endpoint (default: `https://api.fish.audio`) | | `latency_mode` | `normal` (\~500ms) or `balanced` (\~300ms, default) | ### Streaming modes The plugin supports two synthesis modes: ```python theme={null} # Chunked (non-streaming) synthesis stream = tts.synthesize("Hello, world!") # Real-time WebSocket streaming stream = tts.stream() ``` ## Resources * [LiveKit Agents Documentation](https://docs.livekit.io/agents/) * [LiveKit GitHub](https://github.com/livekit/agents) * [Fish Audio Plugin Reference](https://docs.livekit.io/reference/python/v1/livekit/plugins/fishaudio/index.html) * [Fish Audio Voice Library](https://fish.audio/discovery) # n8n Source: https://docs.fish.audio/developer-guide/integrations/n8n Automate workflows with Fish Audio and n8n 
[n8n](https://n8n.io/) is a fair-code licensed workflow automation platform. The Fish Audio community node brings text-to-speech, speech-to-text, and voice cloning capabilities to your n8n workflows. ## Installation Install from n8n community nodes: 1. Go to **Settings** > **Community Nodes** 2. Select **Install** 3. Enter `n8n-nodes-fishaudio` 4. Accept the risks and install See the [n8n community nodes guide](https://docs.n8n.io/integrations/community-nodes/installation/) for details. ## Configuration 1. Go to **Credentials** > **Add Credential** 2. Search for "Fish Audio API" 3. Enter your API key from [fish.audio/app/api-keys](https://fish.audio/app/api-keys) ## Features The node supports: * **Text-to-Speech** — Generate audio from text using any voice model * **Speech-to-Text** — Transcribe audio files * **Voice Models** — List, create, and manage custom voices * **Account** — Check credit balance The node is also available as an AI tool for use with n8n's AI Agent nodes. ## Resources * [npm package](https://www.npmjs.com/package/n8n-nodes-fishaudio) * [GitHub](https://github.com/fishaudio/fish-audio-n8n) * [n8n Community Nodes](https://docs.n8n.io/integrations/community-nodes/) # Pipecat Source: https://docs.fish.audio/developer-guide/integrations/pipecat Build voice AI agents with Fish Audio and Pipecat [Pipecat](https://github.com/pipecat-ai/pipecat) is an open source framework for building voice and multimodal conversational AI. It handles the orchestration of audio, AI services, and conversation pipelines so you can focus on what makes your agent unique. Fish Audio integrates with Pipecat through `FishAudioTTSService`, which provides real-time text-to-speech synthesis using WebSocket streaming for low-latency conversational applications. 
## Prerequisites * A [Fish Audio account](https://fish.audio) with an API key * Python 3.9 or higher ## Installation Install Pipecat with Fish Audio support: ```bash theme={null} pip install "pipecat-ai[fish]" ``` ## Configuration Set your Fish Audio API key as an environment variable: ```bash theme={null} export FISH_API_KEY=your_api_key_here ``` ## Basic usage Add `FishAudioTTSService` to your Pipecat pipeline: ```python theme={null} import os from pipecat.services.fish import FishAudioTTSService tts = FishAudioTTSService( api_key=os.getenv("FISH_API_KEY"), reference_id="your_voice_model_id", # Optional: use a specific voice model_id="s1", params=FishAudioTTSService.InputParams( latency="normal", prosody_speed=1.0 ) ) ``` ### Key parameters | Parameter | Description | | --------------- | ------------------------------------------------------------------------- | | `api_key` | Your Fish Audio API key | | `reference_id` | Voice model ID from the [Fish Audio library](https://fish.audio/discover) | | `model_id` | TTS model version (default: `s1`) | | `output_format` | Audio format: `pcm`, `mp3`, `wav`, or `opus` | ### Prosody controls Customize speech characteristics with `InputParams`: ```python theme={null} params=FishAudioTTSService.InputParams( latency="balanced", # "normal" or "balanced" prosody_speed=1.2, # 0.5 to 2.0 prosody_volume=0, # Volume adjustment in dB normalize=True # Audio normalization ) ``` ## Resources * [Pipecat Documentation](https://docs.pipecat.ai/server/services/tts/fish) * [Pipecat GitHub](https://github.com/pipecat-ai/pipecat) * [Fish Audio Voice Library](https://fish.audio/discovery) # Choosing a Model Source: https://docs.fish.audio/developer-guide/models-pricing/choosing-a-model Select the right Fish Audio model for your use case and requirements We recommend using **Fish Audio S2-Pro** for all projects - our flagship model with industry-leading quality and performance. ## Support Need help? 
Check out these resources: * [API Reference](/api-reference/introduction) - Complete API documentation * [Create a Voice Clone](/api-reference/endpoint/model/create-model) - Create a voice clone model * [Generate Speech](/api-reference/endpoint/openapi-v1/text-to-speech) - Generate realistic speech * [Real-time Streaming](/developer-guide/sdk-guide/python/websocket) - WebSocket for real-time streaming * [Discord Community](https://discord.com/invite/dF9Db2Tt3Y) - Get help from the community * [Support Email](mailto:support@fish.audio) - Contact our support team # Model Deprecations Source: https://docs.fish.audio/developer-guide/models-pricing/deprecations Track deprecated models and migration timelines for Fish Audio services ## Available Models Currently available models: * **Fish Audio S2** (Recommended) - Latest generation with best performance * **Fish Audio S1** - Highly expressive and natural sounding ## Deprecated Models * **speech-1.6** - Fish Speech v1.6 was deprecated on February 28, 2026 * **speech-1.5** - Fish Speech v1.5 was deprecated on February 28, 2026 We strongly recommend using **Fish Audio S2** for all new projects to access the latest capabilities and performance improvements. ## Support Need help? 
Check out these resources: * [API Reference](/api-reference/introduction) - Complete API documentation * [Create a Voice Clone](/api-reference/endpoint/model/create-model) - Create a voice clone model * [Generate Speech](/api-reference/endpoint/openapi-v1/text-to-speech) - Generate realistic speech * [Real-time Streaming](/developer-guide/sdk-guide/python/websocket) - WebSocket for real-time streaming * [Discord Community](https://discord.com/invite/dF9Db2Tt3Y) - Get help from the community * [Support Email](mailto:support@fish.audio) - Contact our support team # Models Overview Source: https://docs.fish.audio/developer-guide/models-pricing/models-overview Explore Fish Audio's voice generation models and their capabilities ## Available Models Fish Audio offers state-of-the-art text-to-speech models optimized for different use cases and performance requirements. ### Recommended Model **Fish Audio S2-Pro** - Our next-generation TTS model with best-in-class performance * Natural language control with `[bracket]` syntax — not limited to a fixed set (e.g., `[whispers sweetly]`, `[laughing nervously]`) * Multi-speaker dialogue support **(S2-Pro exclusive)** * 80+ languages * 100ms time-to-first-audio * Full SGLang-based serving stack * Open-source We recommend using `s2-pro` for all new projects to access the latest capabilities and performance improvements. S1 remains available for existing integrations. 
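Because S2-Pro treats bracket cues as ordinary text, composing expressive input is plain string manipulation; no special API is needed. A minimal illustrative sketch (the `with_cues` helper is hypothetical, not part of any Fish Audio SDK):

```python
def with_cues(*parts):
    """Join (cue, text) pairs into S2-Pro-style input text.

    Each part is a (cue, text) tuple; pass cue=None for plain narration.
    """
    segments = []
    for cue, text in parts:
        # Free-form cues are wrapped in brackets inline with the text
        segments.append(f"[{cue}] {text}" if cue else text)
    return " ".join(segments)

line = with_cues(
    ("whispers sweetly", "I have a secret."),
    (None, "But you have to promise not to tell."),
    ("laughing nervously", "Okay, here it goes."),
)
```

The resulting string can be sent as the `text` field of any TTS request against an S2-Pro model; the model interprets the bracketed descriptions at the positions where they appear.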
### Previous Model **Fish Audio S1** - High-quality voice generation * 4 billion parameters * 0.008 WER (0.8% word error rate) * Full emotional control capabilities with `(parenthesis)` syntax ## Model Specifications ### Fish Audio S1 Performance Metrics * **Word Error Rate (WER)**: 0.008 (0.8%) * **Character Error Rate (CER)**: 0.004 (0.4%) * **Real-time Factor**: \~1:7 on standard hardware * **TTS-Arena2 Ranking**: #1 worldwide ## Supported Languages ### S2-Pro S2-Pro supports 80+ languages with automatic language detection and inline emotion and paralinguistic cue support. Language detection is automatic - simply provide text in your target language. ### S1 S1 supports text-to-speech generation in 13 languages with full emotional expression capabilities. ``` English, Chinese, Japanese, German, French, Spanish, Korean, Arabic, Russian, Dutch, Italian, Polish, Portuguese ``` ## Voice Styles and Emotions Fish Audio models support emotional expressions and voice styles that can be controlled through text markers in your input. ### S2-Pro Natural Language Control S2-Pro treats `[bracket]` tags as standard text rather than dedicated control tokens. Through training on massive datasets, the model learned implicit mappings between natural language descriptions and acoustic variations. This means you are not limited to a predefined set of tags — you can use any descriptive expression and the model will interpret it, such as `[whispers sweetly]` or `[laughing nervously]`. Common examples include: ``` [whisper] [laugh] [emphasis] [sigh] [gasp] [pause] [angry] [excited] [sad] [surprised] [inhale] [exhale] ``` S2-Pro cues can be placed anywhere in your text to control emotion at specific positions. For example: `"I can't believe it [gasp] you actually did it [laugh]"` ### S1 Voice Styles and Emotions S1 supports 64+ emotional expressions using `(parenthesis)` syntax. 
### Basic Emotions (24 expressions) ``` (angry) (sad) (excited) (surprised) (satisfied) (delighted) (scared) (worried) (upset) (nervous) (frustrated) (depressed) (empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed) (grateful) (confident) (interested) (curious) (confused) (joyful) ``` ### Advanced Emotions (27 expressions) ``` (disdainful) (unhappy) (anxious) (hysterical) (indifferent) (impatient) (guilty) (scornful) (panicked) (furious) (reluctant) (keen) (disapproving) (negative) (denying) (astonished) (serious) (sarcastic) (conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding) (painful) (awkward) (amused) ``` ### Tone Markers (5 expressions) ``` (in a hurry tone) (shouting) (screaming) (whispering) (soft tone) ``` ### Audio Effects (10 expressions) ``` (laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting) (groaning) (crowd laughing) (background laughter) (audience laughing) ``` You can also use natural expressions like "Ha,ha,ha" for laughter. Experiment with combinations to achieve the perfect emotional tone for your application. ## Support Need help? Check out these resources: * [API Reference](/api-reference/introduction) - Complete API documentation * [Create a Voice Clone](/api-reference/endpoint/model/create-model) - Create a voice clone model * [Generate Speech](/api-reference/endpoint/openapi-v1/text-to-speech) - Generate realistic speech * [Real-time Streaming](/developer-guide/sdk-guide/python/websocket) - WebSocket for real-time streaming * [Discord Community](https://discord.com/invite/dF9Db2Tt3Y) - Get help from the community * [Support Email](mailto:support@fish.audio) - Contact our support team # Pricing & Rate Limits Source: https://docs.fish.audio/developer-guide/models-pricing/pricing-and-rate-limits Understand Fish Audio pricing plans, usage costs, and API rate limits ## API Pricing The Fish Audio API uses pay-as-you-go pricing based on actual usage.
There are no subscription fees or monthly minimums for API access. ### Text-to-Speech (TTS) Models TTS pricing is based on the size of input text, measured in millions of UTF-8 bytes. | Model Name | Price (USD) | | ---------- | ----------------------- | | `s2-pro` | \$15.00 / M UTF-8 bytes | | `s1` | \$15.00 / M UTF-8 bytes | 1M UTF-8 bytes is approximately 180,000 English words, or about 12 hours of speech ### Automatic Speech Recognition (ASR) Models | Model Name | Price (USD) | | -------------- | ------------------- | | `transcribe-1` | \$0.36 / audio hour | **How ASR billing works:** * Charges are based on the duration of audio processed * Duration is rounded up to the nearest second ## Rate Limits These limits help us ensure fair usage and maintain service quality for all users. ### Concurrent Request Limits | Tier | Spending Threshold | Concurrent Requests | | ----------- | ------------------ | ------------------- | | Starter | \< \$100 paid | 5 requests | | Elevated | ≥ \$100 paid | 15 requests | | High Volume | ≥ \$1,000 paid | 50 requests | | Enterprise | Custom | Custom limits | Concurrency tiers unlock as soon as your total prepaid amount reaches the threshold. You do not need to spend the full balance first. If your workload needs a higher concurrency tier, you can top up in advance to unlock the next tier immediately. Please reach out to our team to enable enterprise volume pricing, rate limits, and billing. ## Support Need help? 
Check out these resources: * [API Reference](/api-reference/introduction) - Complete API documentation * [Create a Voice Clone](/api-reference/endpoint/model/create-model) - Create a voice clone model * [Generate Speech](/api-reference/endpoint/openapi-v1/text-to-speech) - Generate realistic speech * [Real-time Streaming](/developer-guide/sdk-guide/python/websocket) - WebSocket for real-time streaming * [Discord Community](https://discord.com/invite/dF9Db2Tt3Y) - Get help from the community * [Support Email](mailto:support@fish.audio) - Contact our support team # Story Studio Source: https://docs.fish.audio/developer-guide/products/story-studio Build immersive audio stories and narratives Coming soon! We're preparing comprehensive documentation for Story Studio. In the meantime, you can: * Visit the [Fish Audio Playground](https://fish.audio) to explore our storytelling features * Check back soon for detailed guides and tutorials Join our [Discord](https://discord.gg/dF9Db2Tt3Y) for updates. # Text to Speech Source: https://docs.fish.audio/developer-guide/products/tts Convert text into natural-sounding speech with Fish Audio's AI voices Coming soon! We're preparing comprehensive documentation for our Text-to-Speech web interface. In the meantime, you can: * Visit the [Fish Audio Playground](https://fish.audio) to try our TTS features * Check our [API documentation](/api-reference/endpoint/openapi-v1/text-to-speech) for programmatic access * Read our [TTS Guide and Best Practices](/developer-guide/core-features/text-to-speech) Check back soon or join our [Discord](https://discord.gg/dF9Db2Tt3Y) for updates. # Voice Cloning Source: https://docs.fish.audio/developer-guide/products/voice-cloning Create custom voice models from audio samples Coming soon! We're preparing comprehensive documentation for our Voice Cloning web interface. 
In the meantime, you can: * Visit the [Fish Audio Playground](https://fish.audio) to try voice cloning * View our [Python SDK voice cloning guide](/developer-guide/sdk-guide/python/voice-cloning) * Read our [voice cloning best practices](/developer-guide/best-practices/voice-cloning) Check back soon or join our [Discord](https://discord.gg/dF9Db2Tt3Y) for updates. # Agent Quickstart Source: https://docs.fish.audio/developer-guide/resources/agent-quickstart Low-noise entry points and canonical URLs for AI agents using Fish Audio documentation ## Purpose This page is the recommended starting point for AI agents, RAG pipelines, and documentation crawlers that need accurate Fish Audio references with minimal markup noise. ## Built-In Agent Indexes This documentation site already provides built-in LLM-friendly indexes: * [llms.txt](https://docs.fish.audio/llms.txt) for the curated documentation index * [llms-full.txt](https://docs.fish.audio/llms-full.txt) for broader site context In most cases, agents should read `llms.txt` first and only fetch `llms-full.txt` when they need wider context across the whole documentation set. ## Install the Agent Skill For coding agents that support [Agent Skills](https://github.com/vercel-labs/skills) (Claude Code, Cursor, Windsurf, Codex, and others), install the ready-made raw-API skill with a single command: ```bash theme={null} npx skills add https://docs.fish.audio --skill fish-audio-api ``` The skill teaches the agent how to call the Fish Audio REST and WebSocket APIs directly from `curl`, Python, Node.js, or any HTTP client — no SDK required. It covers authentication, every endpoint in our [OpenAPI schema](https://docs.fish.audio/api-reference/openapi.json), MessagePack vs JSON vs multipart encoding rules, multi-speaker dialogue, and the WebSocket streaming protocol. Discovery endpoint: [/.well-known/agent-skills/index.json](https://docs.fish.audio/.well-known/agent-skills/index.json). 
Run `npx skills add https://docs.fish.audio` (without `--skill`) to install every skill published here, including the auto-generated product overview skill. ## Retrieval Order 1. Read [llms.txt](https://docs.fish.audio/llms.txt) for the curated documentation index. 2. Read [llms-full.txt](https://docs.fish.audio/llms-full.txt) when broad site context is needed. 3. Read [OpenAPI](https://docs.fish.audio/api-reference/openapi.json) for REST schemas, parameters, and examples. 4. Read [AsyncAPI](https://docs.fish.audio/api-reference/asyncapi.yml) for the WebSocket streaming protocol. 5. Fetch individual `.md` pages only after narrowing to a specific task. ## Canonical API Facts * Base API URL: `https://api.fish.audio` * Authentication: `Authorization: Bearer <api-key>` * TTS model selection: send a required `model` header. Recommended default: `s2-pro` * Main REST endpoints: * `POST /v1/tts` * `POST /v1/asr` * `GET /model` * `POST /model` * `GET /model/{id}` * `PATCH /model/{id}` * `DELETE /model/{id}` * Real-time streaming endpoint: `wss://api.fish.audio/v1/tts/live` ## High-Value URLs ### Start Here * [Agent Quickstart](https://docs.fish.audio/developer-guide/resources/agent-quickstart.md) * [Quick Start](https://docs.fish.audio/developer-guide/getting-started/quickstart.md) * [AI Coding Agents](https://docs.fish.audio/developer-guide/resources/coding-agents.md) ### API Specs * [OpenAPI](https://docs.fish.audio/api-reference/openapi.json) * [AsyncAPI](https://docs.fish.audio/api-reference/asyncapi.yml) * [API Introduction](https://docs.fish.audio/api-reference/introduction.md) ### Authentication And SDK Setup * [Python Authentication](https://docs.fish.audio/developer-guide/sdk-guide/python/authentication.md) * [JavaScript Authentication](https://docs.fish.audio/developer-guide/sdk-guide/javascript/authentication.md) * [Python SDK Overview](https://docs.fish.audio/developer-guide/sdk-guide/python/overview.md) * [JavaScript 
Installation](https://docs.fish.audio/developer-guide/sdk-guide/javascript/installation.md) ### Core Product Tasks * [Text to Speech Guide](https://docs.fish.audio/developer-guide/core-features/text-to-speech.md) * [Speech to Text Guide](https://docs.fish.audio/developer-guide/core-features/speech-to-text.md) * [Creating Voice Models](https://docs.fish.audio/developer-guide/core-features/creating-models.md) * [Emotion Control](https://docs.fish.audio/developer-guide/core-features/emotions.md) * [Fine-grained Control](https://docs.fish.audio/developer-guide/core-features/fine-grained-control.md) ### Real-Time And Integrations * [WebSocket TTS Streaming](https://docs.fish.audio/api-reference/endpoint/websocket/tts-live.md) * [Real-time Voice Streaming Best Practices](https://docs.fish.audio/developer-guide/best-practices/real-time-streaming.md) * [Python WebSocket Streaming](https://docs.fish.audio/developer-guide/sdk-guide/python/websocket.md) * [JavaScript WebSocket](https://docs.fish.audio/developer-guide/sdk-guide/javascript/websocket.md) * [LiveKit Integration](https://docs.fish.audio/developer-guide/integrations/livekit.md) * [Pipecat Integration](https://docs.fish.audio/developer-guide/integrations/pipecat.md) ### Models, Pricing, And Lifecycle * [Models Overview](https://docs.fish.audio/developer-guide/models-pricing/models-overview.md) * [Choosing a Model](https://docs.fish.audio/developer-guide/models-pricing/choosing-a-model.md) * [Pricing And Rate Limits](https://docs.fish.audio/developer-guide/models-pricing/pricing-and-rate-limits.md) * [Model Deprecations](https://docs.fish.audio/developer-guide/models-pricing/deprecations.md) ## Task Routing * If the task is "generate speech", start with Quick Start, the Text to Speech guide, and `POST /v1/tts`. * If the task is "transcribe audio", start with the Speech to Text guide and `POST /v1/asr`. * If the task is "clone or manage voices", start with Creating Voice Models and the `/model` endpoints. 
* If the task is "stream audio in real time", start with AsyncAPI, WebSocket TTS Streaming, and the WebSocket SDK guides. * If the task is "pick the right model or estimate cost", start with Models Overview and Pricing And Rate Limits. ## Notes For Agents * Prefer `openapi.json` and `asyncapi.yml` for machine-readable schemas. * Prefer `.md` URLs when you need a single human-authored page in Markdown form. * Some richer pages use interactive MDX widgets. If a fetched page contains UI or component noise, fall back to this page, `llms.txt`, `llms-full.txt`, or the API spec files first. * Treat this page as the canonical low-noise entry point for Fish Audio documentation retrieval. # Brand Guidelines Source: https://docs.fish.audio/developer-guide/resources/brand Design guidelines for using Fish Audio brand assets ## Logo ### Wordmark Our preferred logo format combines the [Fish Audio Icon](#icon) with the wordmark side by side. This is the primary version of our logo and should be used whenever possible for maximum brand recognition and clarity. Fish Audio Clearspace Wordmark ### Icon Our icon features a whale composed of audio bars and sound waves, symbolizing the fusion of marine life with audio technology. This design represents our brand's commitment to natural, flowing, and powerful voice generation. The Fish Audio icon should only be used when space constraints or context make it impractical to display the full wordmark. Always prefer the wordmark with icon combination when possible.
Fish Audio Clearspace Logo ### Avoid To maintain the integrity of our brand identity, please do not alter our logo in any of the following ways: Incorrect logo usage - distorted Incorrect logo usage - rotated Incorrect logo usage - wrong colors Incorrect logo usage - effects ## Colors Our official brand colors consist of black and white for primary logo applications, complemented by secondary grays for subtle variations and an accent purple for visual highlights in marketing materials.
## Typography Our brand uses **Onest Semibold** in the logo wordmark. This documentation is also set in Onest, so you're experiencing our brand typography right now. [Download Onest on Google Fonts](https://fonts.google.com/specimen/Onest) ## Usage Guidelines The Fish Audio name and logos are trademarks of Hanabi AI Inc. You may freely use and redistribute our brand assets when referencing Fish Audio. By using our brand assets, you agree that we own them and that any goodwill generated by your use benefits Fish Audio. ### Do * Use our brand assets freely in your projects, applications, and content * Share our brand assets in blog posts, tutorials, documentation, and educational materials * Follow the visual guidelines shown above (spacing, colors, sizing) * Link to fish.audio when using our brand online ### Don't * Use our logo as part of your own product name or branding * Imply partnership, sponsorship, or endorsement without permission * Feature our logo more prominently than your own brand ### Questions? If you're unsure whether your use case is appropriate or need special permission, please contact us at [support@fish.audio](mailto:support@fish.audio). ## Download Assets # AI Coding Agents Source: https://docs.fish.audio/developer-guide/resources/coding-agents Connect AI coding assistants to Fish Audio documentation via MCP for real-time API guidance ## Overview Integrate Fish Audio's comprehensive documentation directly into your AI coding assistants. Using MCP (Model Context Protocol), coding agents like Claude Code, Cursor, and Windsurf can access our latest API references, guides, and examples in real-time. The Fish Audio MCP server provides instant access to: * Complete API documentation * SDK usage examples * Best practices and implementation patterns * Troubleshooting guides Connect once and get accurate, up-to-date Fish Audio knowledge in your coding environment. 
This documentation site also exposes built-in LLM-friendly indexes: * [llms.txt](https://docs.fish.audio/llms.txt) for the curated page index * [llms-full.txt](https://docs.fish.audio/llms-full.txt) for broader site context If your coding agent supports direct document fetching, start with `llms.txt` before pulling individual pages. ## Install as an Agent Skill Fish Audio publishes a ready-made [Agent Skill](https://github.com/vercel-labs/skills) that teaches your coding agent how to call the Fish Audio REST and WebSocket APIs directly, without an SDK. It covers authentication, every endpoint in our OpenAPI schema, MessagePack vs JSON vs multipart encoding rules, multi-speaker dialogue, and the WebSocket streaming protocol. ```bash theme={null} npx skills add https://docs.fish.audio --skill fish-audio-api ``` This installs the skill into your agent's local skill directory (for example `~/.claude/skills/fish-audio-api/`). Once installed, ask your agent to "call the Fish Audio TTS API with curl" or "stream TTS over WebSocket in Python" and it will follow the skill's conventions. ```bash theme={null} npx skills add https://docs.fish.audio ``` Installs every skill advertised at [/.well-known/agent-skills/index.json](https://docs.fish.audio/.well-known/agent-skills/index.json), including the auto-generated product overview skill and the raw-API skill. The discovery index lives at [/.well-known/agent-skills/index.json](https://docs.fish.audio/.well-known/agent-skills/index.json) and each skill's raw markdown is served at [/.well-known/agent-skills/{skill}/SKILL.md](https://docs.fish.audio/.well-known/agent-skills/fish-audio-api/SKILL.md). Review the skill content first, then install with: ```bash theme={null} npx skills add https://docs.fish.audio --list # show available skills npx skills add https://docs.fish.audio --skill fish-audio-api ``` The `skills` CLI works with any agent that uses `SKILL.md` conventions — Claude Code, Cursor, Windsurf, Codex, and others.
See [`npx skills --help`](https://github.com/vercel-labs/skills) for agent-specific install flags such as `-a claude-code` or `-a cursor`. Prefer MCP if you want live documentation search inside your editor. Prefer the Agent Skill if you want a self-contained instruction file that works offline after install and doesn't rely on a running MCP server. ## Why Use MCP Integration? Access the latest API documentation without leaving your editor Generate working code based on current API specifications Get context-aware help for debugging and optimization ## Setup Open your terminal in your project directory and run: ```bash theme={null} claude mcp add --transport http fish-audio --scope project https://docs.fish.audio/mcp ``` This creates a `.mcp.json` file in your project root with the Fish Audio documentation server configuration. Claude Code supports three installation scopes: * **`--scope project`** (recommended): Stores configuration in `.mcp.json` at project root. Version-controlled and shared with your team. * **`--scope user`**: Available globally across all your projects, but private to your account. * **`--scope local`** (default): Project-specific but private to you only. Good for experimentation. For team collaboration, use project scope and commit the `.mcp.json` file to git. Check that the server is connected: ```bash theme={null} claude mcp list ``` You should see `fish-audio` in the list of configured servers. Ask Claude Code: "What Fish Audio models are available?" or "How do I use Fish Audio's TTS API?" Use `Cmd+Shift+P` (Mac) or `Ctrl+Shift+P` (Windows/Linux) to open the command palette, then search for "Open MCP settings". Select "Add custom MCP" to open the `mcp.json` configuration file. Add the Fish Audio documentation server: ```json theme={null} { "mcpServers": { "fish-audio": { "url": "https://docs.fish.audio/mcp" } } } ``` Save the configuration file and reload Cursor to apply changes. In Cursor's chat, ask: "What tools do you have available?" 
You should see the Fish Audio MCP server listed. Then try: "What Fish Audio TTS models are available?" Cursor's MCP support was added in early 2025. Ensure you're running the latest version for full functionality. Go to `File > Preferences > Windsurf Settings`, then navigate to `Cascade > Model Context Protocol (MCP) Servers`. Click "Add custom server +" or "View raw config" to edit the configuration file at `~/.codeium/windsurf/mcp_config.json`. Add the Fish Audio documentation server: ```json theme={null} { "mcpServers": { "fish-audio": { "url": "https://docs.fish.audio/mcp" } } } ``` Save the configuration and click the refresh button in Windsurf to apply changes. Open Cascade chat (Ctrl+L) and ask: "Search Fish Audio docs for TTS API usage" or "What emotion parameters does Fish Audio support?" Windsurf's MCP support was introduced in Wave 3 (February 2025). Ensure you're running the latest version. ## Using the Integration ### Example Queries Once connected, ask your coding agent questions naturally: "How do I authenticate with Fish Audio API?" "Show me Python code for text-to-speech" "What emotion parameters are available?" 
"Help me implement real-time streaming" ### Code Generation Examples Ask: "Generate a Python function for text-to-speech with Fish Audio"

```python theme={null}
from fish_audio import FishAudioClient


def text_to_speech(text: str, voice_id: str, output_file: str):
    """Convert text to speech using Fish Audio API"""
    client = FishAudioClient(api_key="your-api-key")
    response = client.tts.create(
        text=text,
        model_id=voice_id,
        format="mp3"
    )
    with open(output_file, "wb") as f:
        f.write(response.audio_data)
    return output_file
```

Ask: "Create a voice cloning pipeline with error handling"

```python theme={null}
import logging

from fish_audio import FishAudioClient


def clone_voice(audio_path: str, name: str):
    """Clone a voice from audio sample"""
    client = FishAudioClient(api_key="your-api-key")
    try:
        # Upload audio sample
        with open(audio_path, "rb") as f:
            model = client.models.create(
                name=name,
                audio_data=f.read(),
                description="Custom cloned voice"
            )
        logging.info(f"Voice cloned: {model.id}")
        return model.id
    except Exception as e:
        logging.error(f"Cloning failed: {e}")
        raise
```

Ask: "Implement real-time TTS streaming"

```python theme={null}
from fish_audio import FishAudioClient


async def stream_tts(text: str, voice_id: str):
    """Stream TTS audio in real-time"""
    client = FishAudioClient(api_key="your-api-key")
    async for chunk in client.tts.stream(
        text=text,
        model_id=voice_id,
        chunk_size=1024
    ):
        # Process audio chunk
        yield chunk
```

## Available Documentation Your coding agent can access: Complete endpoint documentation with parameters Python SDK usage and examples Optimization patterns and tips Available models and rate limits Custom voice creation guides Common issues and solutions ## Advanced Usage ### Custom Commands Create agent workflows for common tasks:

```text Voice Pipeline theme={null}
"Create a complete voice generation pipeline with:
- Authentication
- Voice selection
- Emotion control
- Error handling
- Audio export"
```

```text Batch Processing
theme={null}
"Build a batch TTS processor that:
- Reads from CSV
- Handles rate limits
- Retries on failure
- Tracks progress"
```

```text WebSocket Client theme={null}
"Implement a WebSocket client for:
- Real-time streaming
- Auto-reconnection
- Buffer management
- Error recovery"
```

### Context-Aware Features With MCP integration, your agent can: * Suggest appropriate models based on use case * Handle rate limiting automatically * Provide inline documentation * Validate API calls against specifications * Recommend optimization strategies ## Troubleshooting If the MCP server isn't connecting: 1. Verify internet connectivity 2. Check `https://docs.fish.audio/mcp` is accessible 3. Ensure your agent supports MCP protocol 4. Restart your coding environment 5. Clear any cached configurations The MCP server always serves the latest documentation: 1. Refresh the MCP connection in settings 2. Clear documentation cache if available 3. Report persistent issues to [support@fish.audio](mailto:support@fish.audio) If certain features aren't available: 1. Verify you're using the latest agent version 2. Check MCP protocol compatibility 3. Ensure proper server configuration 4. Contact support for assistance ## Security **Your data is safe:** * MCP provides read-only access to public documentation * No API keys are transmitted through MCP * All connections use HTTPS encryption * No user queries or usage data is stored ## Next Steps Start with Fish Audio API basics Install and configure the Python SDK Learn text-to-speech optimization Create custom voice models ## Support Need help with MCP integration? * **Technical Support**: [support@fish.audio](mailto:support@fish.audio) * **Documentation Issues**: [GitHub](https://github.com/fishaudio) * **Community**: [Discord](https://discord.gg/dF9Db2Tt3Y) # Migration Guide Source: https://docs.fish.audio/developer-guide/resources/migration Switch from ElevenLabs, OpenAI, or other TTS providers to Fish Audio Coming soon! 
We're preparing comprehensive migration guides to help you seamlessly switch to Fish Audio. We're working on detailed migration guides for: * ElevenLabs * OpenAI TTS * Google Cloud Text-to-Speech * Amazon Polly * Other TTS providers Check back soon or join our [Discord](https://discord.gg/dF9Db2Tt3Y) for updates. # Roadmap Source: https://docs.fish.audio/developer-guide/resources/roadmap Upcoming features and improvements for Fish Audio ## Roadmap Explore what's coming next for Fish Audio. Our roadmap reflects our current priorities and vision for the platform. This roadmap is subject to change based on user feedback and technical considerations. Features may be added, modified, or removed as we continue to develop the platform. ### Coming Soon Details about our upcoming features and improvements will be published here. ## Feature Requests Have a feature request or want to vote on priorities? We'd love to hear from you: * **Email**: [support@fish.audio](mailto:support@fish.audio) * **Discord**: Join our [community Discord](https://discord.gg/dF9Db2Tt3Y) * **GitHub**: Open an issue on our [GitHub repository](https://github.com/fishaudio) ## Stay Updated Subscribe to our [changelog](/developer-guide/getting-started/changelog) RSS feed to get notified when new features are released. # Authentication Source: https://docs.fish.audio/developer-guide/sdk-guide/javascript/authentication Manage API keys and client setup in the Fish Audio JavaScript SDK ## Prerequisites Sign up for a free Fish Audio account to get started with our API. 1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup) 2. Fill in your details to create an account, then complete the steps to verify your account. 3. Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys) Once you have an account, you'll need an API key to authenticate your requests. 1. Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/) 2. Navigate to the API Keys section 3. 
Click "Create New Key", give it a descriptive name, and set an expiration if desired 4. Copy your key and store it securely. Keep your API key secret! Never commit it to version control or share it publicly. ## Client Initialization Initialize a `FishAudioClient` with your API key to start using the SDK: ```typescript theme={null} import { FishAudioClient } from "fish-audio"; // Initialize with your API key const fishAudio = new FishAudioClient({ apiKey: "your_api_key" }); ``` ### Using Environment Variables For better security, store your API key in environment variables: Set the environment variable in your shell: ```bash theme={null} export FISH_API_KEY=your_api_key_here ``` Then initialize immediately: ```typescript theme={null} import { FishAudioClient } from "fish-audio"; const fishAudio = new FishAudioClient(); ``` ```typescript theme={null} import { config } from "dotenv"; import { FishAudioClient } from "fish-audio"; // Load environment variables from .env file config(); const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY }); ``` Create a `.env` file in your project root: ```bash theme={null} FISH_API_KEY=your_api_key_here ``` ### Custom Endpoints If you need to use a proxy or custom endpoint: ```typescript theme={null} const fishAudio = new FishAudioClient({ apiKey: "your_api_key", baseUrl: "https://your-proxy-domain.com", }); ``` # Installation Source: https://docs.fish.audio/developer-guide/sdk-guide/javascript/installation Install and set up the Fish Audio JavaScript SDK To use the Fish Audio API in server-side JavaScript environments like Node.js, Deno, or Bun, you can use the official [Fish Audio SDK for TypeScript and JavaScript](https://www.npmjs.com/package/fish-audio). ## Requirements * Node.js 18 or higher ## Install Install the JavaScript SDK from npm.
Choose your preferred package manager: ```bash theme={null} npm install fish-audio ``` ```bash theme={null} yarn add fish-audio ``` ```bash theme={null} pnpm add fish-audio ``` ## Support Need help? Check out these resources: * [API Reference](/api-reference/introduction) - Complete API documentation * [Create a Voice Clone](/api-reference/endpoint/model/create-model) - Create a voice clone model * [Generate Speech](/api-reference/endpoint/openapi-v1/text-to-speech) - Generate realistic speech * [Real-time Streaming](/developer-guide/sdk-guide/python/websocket) - WebSocket for real-time streaming * [Discord Community](https://discord.com/invite/dF9Db2Tt3Y) - Get help from the community * [Support Email](mailto:support@fish.audio) - Contact our support team # Speech to Text Source: https://docs.fish.audio/developer-guide/sdk-guide/javascript/speech-to-text Convert audio to text with Fish Audio JavaScript SDK ## Prerequisites Sign up for a free Fish Audio account to get started with our API. 1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup) 2. Fill in your details to create an account, then complete the steps to verify your account. 3. Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys) Once you have an account, you'll need an API key to authenticate your requests. 1. Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/) 2. Navigate to the API Keys section 3. Click "Create New Key", give it a descriptive name, and set an expiration if desired 4. Copy your key and store it securely. Keep your API key secret! Never commit it to version control or share it publicly. 
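Following the warning above, one way to keep the key out of source code entirely is to read it from the environment and fail fast when it is missing. A minimal sketch; `requireApiKey` is an illustrative helper, not part of the SDK:

```typescript
// Illustrative helper (not part of the fish-audio SDK): read the API key
// from the environment and fail fast with a clear message when it is unset.
function requireApiKey(
  env: Record<string, string | undefined> = process.env
): string {
  const key = env.FISH_API_KEY;
  if (!key) {
    throw new Error("FISH_API_KEY is not set; export it before starting the app.");
  }
  return key;
}
```

The client can then be constructed as `new FishAudioClient({ apiKey: requireApiKey() })`, so a missing key surfaces at startup rather than as a failed request later.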
## Basic Usage Transcribe audio to text: ```typescript theme={null} import { FishAudioClient } from "fish-audio"; import { createReadStream } from "fs"; const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY }); const result = await fishAudio.speechToText.convert({ audio: createReadStream("audio.mp3"), }); console.log(result.text); console.log("Duration (s):", result.duration); ``` ## Language Specification Improve accuracy by specifying the language: ```typescript theme={null} // English transcription await fishAudio.speechToText.convert({ audio: createReadStream("audio.mp3"), language: "en" }); // Chinese transcription await fishAudio.speechToText.convert({ audio: createReadStream("audio.mp3"), language: "zh" }); ``` Common language codes: `en` (English), `zh` (Chinese), `es` (Spanish), `fr` (French), `de` (German), `ja` (Japanese), `ko` (Korean), `pt` (Portuguese) Automatic language detection works well, but specifying the language improves accuracy and speed. ## Working with Segments Get detailed timing for each segment: ```typescript theme={null} const response = await fishAudio.speechToText.convert({ audio: createReadStream("audio.mp3") }); // Full transcription console.log(response.text); // Segment details for (const seg of response.segments ?? []) { console.log(`[${seg.start.toFixed(2)}s - ${seg.end.toFixed(2)}s] ${seg.text}`); } ``` ## Timestamps Control Control timestamp generation: ```typescript theme={null} // Include timestamps (default) await fishAudio.speechToText.convert({ audio: createReadStream("audio.mp3"), ignore_timestamps: false }); // Skip timestamp processing for faster results await fishAudio.speechToText.convert({ audio: createReadStream("audio.mp3"), ignore_timestamps: true }); ``` `ignore_timestamps: false` (default) includes segment timestamps. Set to `true` to skip timestamp processing for faster transcription when you only need the text. 
## Audio Formats Supported audio formats: * MP3 (recommended) * WAV * M4A * OGG * FLAC * AAC File requirements: * Maximum size: 20MB * Maximum duration: 60 minutes * Sample rate: 16kHz or higher recommended ## Transcribing TTS Output Transcribe generated speech: ```typescript theme={null} import { FishAudioClient } from "fish-audio"; const fishAudio = new FishAudioClient(); // Generate speech const ttsAudio = await fishAudio.textToSpeech.convert({ text: "Hello, this is a test" }); // Transcribe it const asr = await fishAudio.speechToText.convert({ audio: ttsAudio }); console.log(asr.text); ``` ## Error Handling Handle common errors: ```typescript theme={null} try { await fishAudio.speechToText.convert({ audio: createReadStream("audio.mp3") }); } catch (e: any) { const status = e?.status || e?.response?.status; if (status === 413) console.error("Audio file too large (max 20MB)"); else if (status === 400) console.error("Invalid audio format"); else throw e; } ``` ## Response Structure The ASR response includes: | Field | Type | Description | | ---------- | ------------- | ------------------------- | | `text` | string | Complete transcription | | `duration` | number | Audio duration (seconds) | | `segments` | ASRSegment\[] | Timestamped text segments | Segment structure: | Field | Type | Description | | ------- | ------ | -------------------- | | `text` | string | Segment text | | `start` | number | Start time (seconds) | | `end` | number | End time (seconds) | Note the timing units: `duration` and segment times are in seconds. 
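Because segment times are plain seconds, turning a response into SubRip (`.srt`) subtitles is a short step. A minimal sketch, assuming only the segment fields documented above; `srtTime` and `toSrt` are illustrative helpers, not SDK functions:

```typescript
interface AsrSegment {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const pad = (n: number, w: number) => String(n).padStart(w, "0");
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Render numbered SRT cues from ASR segments.
function toSrt(segments: AsrSegment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text}`)
    .join("\n\n");
}
```

Passing `response.segments` from `speechToText.convert` through `toSrt` should yield a subtitle file body ready to write to disk.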
## Request Parameters | Parameter | Type | Description | Default | | ------------------- | --------------------------------- | -------------------------- | ------------------ | | `audio` | File \| Buffer \| Readable stream | Audio to transcribe | Required | | `language` | string | Language code (e.g., "en") | None (auto-detect) | | `ignore_timestamps` | boolean | Skip timestamp processing | false | # Text to Speech Source: https://docs.fish.audio/developer-guide/sdk-guide/javascript/text-to-speech Convert text to natural speech with Fish Audio JavaScript SDK ## Prerequisites Sign up for a free Fish Audio account to get started with our API. 1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup) 2. Fill in your details to create an account, then complete the steps to verify your account. 3. Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys) Once you have an account, you'll need an API key to authenticate your requests. 1. Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/) 2. Navigate to the API Keys section 3. Click "Create New Key", give it a descriptive name, and set an expiration if desired 4. Copy your key and store it securely Keep your API key secret! Never commit it to version control or share it publicly. 
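One way to follow the advice above is to fail fast when the key is missing, instead of discovering it through a 401 later. This tiny guard is our own sketch, not part of the SDK:

```typescript
// Illustrative helper, not an SDK API: read the API key from the
// environment and throw a clear error if it is missing.
function requireApiKey(
  env: Record<string, string | undefined> = process.env
): string {
  const key = env.FISH_API_KEY;
  if (!key) {
    throw new Error(
      "FISH_API_KEY is not set; export it or add it to your .env file"
    );
  }
  return key;
}

// Usage sketch:
// const fishAudio = new FishAudioClient({ apiKey: requireApiKey() });
```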
## Basic Usage Generate speech from text: ```typescript theme={null} import { FishAudioClient, play } from "fish-audio"; const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY }); const audio = await fishAudio.textToSpeech.convert({ text: "Hello, world!", }); await play(audio); ``` ## Using Voice Models Specify a voice model for consistent voice generation: ```typescript theme={null} import { FishAudioClient, play } from "fish-audio"; const fishAudio = new FishAudioClient(); const audio = await fishAudio.textToSpeech.convert({ text: "This is my custom voice", reference_id: "your_model_id", // Your model ID from fish.audio }); await play(audio); ``` ### Getting Model IDs The `reference_id` is the model ID from the URL when viewing a model on Fish Audio: * Model URL: `https://fish.audio/m/802e3bc2b27e49c2995d23ef70e6ac89` * Reference ID: `802e3bc2b27e49c2995d23ef70e6ac89` You can also get model IDs programmatically: ```typescript theme={null} // List your models const results = await fishAudio.voices.search({ self: true }); for (const model of results.items ?? []) { console.log(`${model.title}: ${model._id}`); } // Get specific model details const model = await fishAudio.voices.get("your_model_id"); console.log(`Model: ${model.title}, ID: ${model._id}`); ``` ## Emotions The `(parenthesis)` syntax below applies to the S1 model. S2 uses `[bracket]` syntax with natural language descriptions and is not limited to a fixed set of tags. See the [Models Overview](/developer-guide/models-pricing/models-overview#s2-natural-language-control) for details. Add emotional expressions to your text: ```typescript theme={null} import type { TTSRequest } from "fish-audio"; const text = ` (happy) I'm excited to share this! (sad) Unfortunately, it didn't work out. (whispering) This is a secret. 
`; const request: TTSRequest = { text, reference_id: "model_id" }; ``` Common emotions: `(happy)`, `(sad)`, `(angry)`, `(excited)`, `(calm)`, `(surprised)`, `(whispering)`, `(shouting)`, `(laughing)`, `(sighing)` For more advanced control over speech generation, including phoneme-level control and additional paralanguage features, see [Fine-grained Control](/developer-guide/core-features/fine-grained-control). ## Audio Formats Choose output format based on your needs: ```typescript theme={null} // MP3 (default) await fishAudio.textToSpeech.convert({ text: "...", format: "mp3", mp3_bitrate: 192 }); // WAV - uncompressed await fishAudio.textToSpeech.convert({ text: "...", format: "wav", sample_rate: 44100 }); // Opus - efficient for streaming await fishAudio.textToSpeech.convert({ text: "...", format: "opus", opus_bitrate: 48 }); // PCM - raw audio data await fishAudio.textToSpeech.convert({ text: "...", format: "pcm", sample_rate: 16000 }); ``` ## Prosody Control Adjust speech speed and volume: ```typescript theme={null} const audio = await fishAudio.textToSpeech.convert({ text: "Adjusted speech", prosody: { speed: 1.2, // 0.5 - 2.0 volume: 5, // -20 - 20 }, }); ``` ## Advanced Parameters Fine-tune generation: ```typescript theme={null} const audio = await fishAudio.textToSpeech.convert({ text: "Your text here", chunk_length: 200, // Characters per chunk (100-300) normalize: true, // Normalize text latency: "balanced", // "normal" or "balanced" temperature: 0.7, // Randomness (0.0-1.0) top_p: 0.7, // Token selection (0.0-1.0) }); ``` ## Choosing Backend Our state-of-the-art [S2-Pro model](/developer-guide/models-pricing/models-overview) is the default backend model for TTS. Optionally specify the model via the second argument (`backend: Backends`). 
```typescript theme={null} const audio = await fishAudio.textToSpeech.convert({ text: "Hello, world!", }, "s2-pro"); ``` ## Streaming For real-time streaming, see the [WebSocket guide](/developer-guide/sdk-guide/javascript/websocket). ## Error Handling Handle common errors: ```typescript theme={null} import { FishAudioClient } from "fish-audio"; import type { TTSRequest } from "fish-audio"; async function generateWithRetry(request: TTSRequest, maxRetries = 3) { const fishAudio = new FishAudioClient(); for (let attempt = 0; attempt < maxRetries; attempt++) { try { return await fishAudio.textToSpeech.convert(request); } catch (e: any) { const status = e?.status || e?.response?.status; if (status === 429) await new Promise(r => setTimeout(r, 2 ** attempt * 1000)); else if (status === 401) throw new Error("Invalid API key"); else throw e; } } throw new Error("Max retries exceeded"); } ``` ## Request Parameters | Parameter | Type | Description | Default | | -------------- | --------- | -------------------- | ---------- | | `text` | string | Text to convert | Required | | `reference_id` | string | Voice model ID | None | | `references` | object\[] | Reference audio | \[] | | `format` | string | Audio format | "mp3" | | `chunk_length` | number | Chunk size (100-300) | 200 | | `normalize` | boolean | Normalize text | true | | `latency` | string | Speed vs quality | "balanced" | | `prosody` | object | Speed/volume | None | | `temperature` | number | Randomness | 0.7 | | `top_p` | number | Token selection | 0.7 | ## Next Steps * [Fine-grained control](/developer-guide/core-features/fine-grained-control) for phoneme-level control and paralanguage * [Voice cloning](/developer-guide/sdk-guide/javascript/voice-cloning) for custom voices * [WebSocket streaming](/developer-guide/sdk-guide/javascript/websocket) for real-time apps * [Guide and Best Practices](/developer-guide/core-features/text-to-speech) for production use * [API reference](/api-reference/endpoint/openapi-v1/text-to-speech) for direct API calls # Voice Cloning Source: https://docs.fish.audio/developer-guide/sdk-guide/javascript/voice-cloning Clone 
voices using reference audio with Fish Audio JavaScript SDK ## Prerequisites Sign up for a free Fish Audio account to get started with our API. 1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup) 2. Fill in your details to create an account, then complete the steps to verify your account. 3. Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys) Once you have an account, you'll need an API key to authenticate your requests. 1. Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/) 2. Navigate to the API Keys section 3. Click "Create New Key", give it a descriptive name, and set an expiration if desired 4. Copy your key and store it securely Keep your API key secret! Never commit it to version control or share it publicly. ## Overview Voice cloning allows you to generate speech that matches a specific voice using reference audio. Fish Audio supports two approaches: * Using pre-trained voice models (reference\_id) * Providing reference audio directly in your request Use `reference_id` when you'll reuse a voice multiple times - it's faster and more efficient. Use `references` for one-off voice cloning or testing different voices without creating models. 
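That choice can be encoded in a small helper. The request shapes mirror the examples on this page, but the function itself is illustrative, not an SDK API:

```typescript
// Illustrative helper, not an SDK API. The request shapes mirror
// the TTSRequest examples on this page.
interface ReferenceAudioLike {
  audio: unknown; // File/Buffer in the real SDK
  text: string;
}

// Either a model ID (string) or one-off reference clips
type VoiceSpec = string | ReferenceAudioLike[];

function buildTTSRequest(text: string, voice: VoiceSpec) {
  return typeof voice === "string"
    ? { text, reference_id: voice } // reusable, pre-trained model
    : { text, references: voice };  // one-off instant cloning
}

const req = buildTTSRequest("Hi", "802e3bc2b27e49c2995d23ef70e6ac89");
// -> { text: "Hi", reference_id: "802e3bc2b27e49c2995d23ef70e6ac89" }
```

Callers then pass the result straight to `textToSpeech.convert()` without caring which cloning mode is in play.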
## Using Reference Audio Clone a voice by providing reference audio directly: ```typescript theme={null} import { FishAudioClient } from "fish-audio"; import type { TTSRequest, ReferenceAudio } from "fish-audio"; import { readFile } from "fs/promises"; const fishAudio = new FishAudioClient(); const audioBuffer = await readFile("voice_sample.wav"); const referenceFile = new File([audioBuffer], "voice_sample.wav"); const referenceAudio: ReferenceAudio = { audio: referenceFile, text: "Text spoken in the reference audio" }; const request: TTSRequest = { text: "Hello, world!", references: [referenceAudio] }; const audio = await fishAudio.textToSpeech.convert(request); ``` ## Multiple References Improve voice quality by providing multiple reference samples: ```typescript theme={null} import type { TTSRequest, ReferenceAudio } from "fish-audio"; import { readFile } from "fs/promises"; const references = [] as ReferenceAudio[]; for (const i of [0, 1, 2]) { const buf = await readFile(`sample_${i}.wav`); references.push({ audio: new File([buf], `sample_${i}.wav`), text: `Text from sample ${i}` }); } const request: TTSRequest = { text: "Better voice quality with multiple references", references, }; ``` ## Creating Voice Models For repeated use, create a persistent voice model: ```typescript theme={null} import { FishAudioClient } from "fish-audio"; import { createReadStream } from "fs"; const fishAudio = new FishAudioClient(); // Create a voice model from samples const response = await fishAudio.voices.ivc.create({ title: "My Custom Voice", voices: [ createReadStream("voice_0.wav"), createReadStream("voice_1.wav"), createReadStream("voice_2.wav"), ], cover_image: createReadStream("cover.png"), }); console.log("Created model:", response._id); // Use the model const audio = await fishAudio.textToSpeech.convert({ text: "Using my saved voice model", reference_id: response._id, }); ``` ## Best Practices ### Audio Quality For best results, reference audio should: * Be 10-30 seconds 
long per sample * Have clear speech without background noise * Match the language you'll generate * Include varied intonation and emotion ### Sample Text The text parameter in ReferenceAudio should: * Match exactly what's spoken in the audio * Include punctuation for proper prosody * Be in the same language as generation ### Performance Tips 1. **Pre-upload models** for frequently used voices 2. **Use 2-3 reference samples** for optimal quality 3. **Keep samples under 30 seconds** each 4. **Normalize audio levels** before uploading ## Audio Format Requirements Supported formats for reference audio: * WAV (recommended) * MP3 * M4A * Other common audio formats Sample rates: * 16kHz minimum * 44.1kHz recommended * Mono or stereo (converted to mono) ## Example: Voice Bank Build a library of cloned voices: ```typescript theme={null} import { FishAudioClient } from "fish-audio"; const fishAudio = new FishAudioClient(); async function createVoiceBank() { const voiceBank: Record<string, string> = {}; const models = await fishAudio.voices.search(); for (const m of models.items ?? []) voiceBank[m.title] = m._id as string; return voiceBank; } async function generateWithVoice(text: string, voiceName: string) { const bank = await createVoiceBank(); const modelId = bank[voiceName]; if (!modelId) throw new Error(`Voice '${voiceName}' not found`); return fishAudio.textToSpeech.convert({ text, reference_id: modelId }); } ``` ## Combining with Emotions Add emotions to cloned voices: ```typescript theme={null} // With a saved model await fishAudio.textToSpeech.convert({ text: "(happy) This is exciting news! 
(calm) Let me explain the details.", reference_id: "your_model_id", }); // Or with direct references await fishAudio.textToSpeech.convert({ text: "(excited) Amazing discovery!", references: [referenceAudio], }); ``` ## Error Handling Common issues and solutions: ```typescript theme={null} try { await fishAudio.textToSpeech.convert({ text: "Test speech", references: [referenceAudio] }); } catch (e: any) { const msg = String(e?.message || e); if (msg.includes("Invalid audio format")) console.error("Check audio format - use WAV or MP3"); else if (msg.includes("Audio too short")) console.error("Reference audio should be at least 10 seconds"); else throw e; } ``` # WebSocket Source: https://docs.fish.audio/developer-guide/sdk-guide/javascript/websocket Real-time streaming with Fish Audio JavaScript SDK ## Prerequisites Sign up for a free Fish Audio account to get started with our API. 1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup) 2. Fill in your details to create an account, then complete the steps to verify your account. 3. Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys) Once you have an account, you'll need an API key to authenticate your requests. 1. Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/) 2. Navigate to the API Keys section 3. Click "Create New Key", give it a descriptive name, and set an expiration if desired 4. Copy your key and store it securely Keep your API key secret! Never commit it to version control or share it publicly. ## Overview WebSocket streaming enables real-time text-to-speech generation, perfect for conversational AI, live captioning, and streaming applications. 
## Basic Streaming Stream text and receive audio in real-time: ```typescript theme={null} import { FishAudioClient, RealtimeEvents } from "fish-audio"; import { writeFile } from "fs/promises"; import path from "path"; // Simple async generator that yields text chunks async function* makeTextStream() { const chunks = [ "Hello from Fish Audio! ", "This is a realtime text-to-speech test. ", "We are streaming multiple chunks over WebSocket.", ]; for (const chunk of chunks) { yield chunk; } } const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY }); // For realtime, set text to "" and stream the content via makeTextStream const request = { text: "" }; const connection = await fishAudio.textToSpeech.convertRealtime(request, makeTextStream()); // Collect audio and write to a file when the stream ends const chunks: Buffer[] = []; connection.on(RealtimeEvents.OPEN, () => console.log("WebSocket opened")); connection.on(RealtimeEvents.AUDIO_CHUNK, (audio: unknown): void => { if (audio instanceof Uint8Array || Buffer.isBuffer(audio)) { chunks.push(Buffer.from(audio)); } }); connection.on(RealtimeEvents.ERROR, (err) => console.error("WebSocket error:", err)); connection.on(RealtimeEvents.CLOSE, async () => { const outPath = path.resolve(process.cwd(), "out.mp3"); await writeFile(outPath, Buffer.concat(chunks)); console.log("Saved to", outPath); }); ``` Set `text: ""` in the request when streaming. The actual text comes from your text stream generator. 
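The collect-and-write pattern above generalizes to any event-emitting connection. The sketch below is SDK-independent: it uses Node's `EventEmitter` with plain string event names standing in for the `RealtimeEvents` constants:

```typescript
import { EventEmitter } from "node:events";

// Generic sketch, not an SDK API: resolve with all audio bytes once
// the source emits "close"; reject on "error". The string event
// names here stand in for the SDK's RealtimeEvents.* constants.
function collectAudio(source: EventEmitter): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    source.on("audioChunk", (c: Uint8Array) => chunks.push(Buffer.from(c)));
    source.once("error", reject);
    source.once("close", () => resolve(Buffer.concat(chunks)));
  });
}

// Usage sketch with any EventEmitter-like connection:
const fake = new EventEmitter();
const done = collectAudio(fake);
fake.emit("audioChunk", Buffer.from("ab"));
fake.emit("audioChunk", Buffer.from("cd"));
fake.emit("close");
done.then((buf) => console.log(buf.toString())); // "abcd"
```

Wrapping the events in a promise keeps the happy path linear (`const audio = await collectAudio(conn)`) and centralizes error handling in one place.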
## Using Voice Models Stream with a specific voice: ```typescript theme={null} const request = { text: "", // Empty for streaming reference_id: "your_model_id", format: "mp3", }; const conn = await fishAudio.textToSpeech.convertRealtime(request, makeTextStream()); conn.on(RealtimeEvents.AUDIO_CHUNK, () => { /* handle audio */ }); ``` ## Dynamic Text Generation Stream text as it's generated: ```typescript theme={null} async function* generateText() { const responses = [ "Processing your request...", "Here's what I found:", "The answer is 42.", ]; for (const response of responses) { for (const word of response.split(" ")) { yield word + " "; await new Promise(r => setTimeout(r, 20)); } } } await fishAudio.textToSpeech.convertRealtime({ text: "" }, generateText()); ``` ## Line-by-Line Processing Stream text line by line: ```typescript theme={null} import { createReadStream } from "fs"; import readline from "readline"; async function* readFileLines(filepath: string) { const rl = readline.createInterface({ input: createReadStream(filepath) }); for await (const line of rl) { yield line.trim() + " "; } } await fishAudio.textToSpeech.convertRealtime({ text: "" }, readFileLines("story.txt")); ``` ## Errors Handle connection errors via event listeners: ```typescript theme={null} connection.on(RealtimeEvents.ERROR, (err) => { console.error("WebSocket error:", err); // Fallback to regular TTS or retry }); ``` ## Configuration/Choosing Backend Customize WebSocket behavior by configuring the client.
Optionally specify the backend model to use. Our state-of-the-art [S2-Pro model](/developer-guide/models-pricing/models-overview) is the default: ```typescript theme={null} // Custom endpoint const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY, baseUrl: "https://api.fish.audio", // Use a proxy/custom endpoint if needed }); // Select backend model via the third argument const conn = await fishAudio.textToSpeech.convertRealtime( request, makeTextStream(), "s2-pro" ); ``` ## Best Practices 1. **Chunk Size**: Yield text in natural phrases for best prosody 2. **Buffer Management**: Process audio chunks immediately to avoid memory buildup 3. **Connection Reuse**: Keep WebSocket sessions alive for multiple streams 4. **Error Recovery**: Implement retry logic for connection failures 5. **Format Selection**: Use PCM for real-time playback, MP3 for storage ## Events The connection emits these events: | Event | Description | | ------------- | --------------------------------- | | `OPEN` | WebSocket connection established | | `AUDIO_CHUNK` | Audio chunk received (Uint8Array) | | `ERROR` | Error occurred on the connection | | `CLOSE` | Connection closed | # Authentication Source: https://docs.fish.audio/developer-guide/sdk-guide/python/authentication Configure API authentication for the Fish Audio Python SDK ## Get Your API Key Sign up for a free Fish Audio account to get started with our API. 1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup) 2. Fill in your details to create an account, then complete the steps to verify your account. 3. Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys) Once you have an account, you'll need an API key to authenticate your requests. 1. Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/) 2. Navigate to the API Keys section 3. Click "Create New Key", give it a descriptive name, and set an expiration if desired 4. Copy your key and store it securely Keep your API key secret! 
Never commit it to version control or share it publicly. ## Client Initialization Initialize the [`FishAudio`](/api-reference/sdk/python/client#fishaudio-objects) client with your API key: The most secure approach is using environment variables: ```python theme={null} from fishaudio import FishAudio # Automatically reads from FISH_API_KEY environment variable client = FishAudio() ``` Set the environment variable in your shell: ```bash theme={null} export FISH_API_KEY=your_api_key_here ``` Or create a `.env` file in your project root: ```bash theme={null} FISH_API_KEY=your_api_key_here ``` Then load it using `python-dotenv`: ```python theme={null} from dotenv import load_dotenv from fishaudio import FishAudio # Load environment variables from .env file load_dotenv() client = FishAudio() ``` Using environment variables keeps your API key out of your codebase and makes it easy to use different keys for development and production. Provide the API key directly when initializing the client: ```python theme={null} from fishaudio import FishAudio client = FishAudio(api_key="your_api_key_here") ``` This approach is less secure. Never commit code containing your actual API key. Use this only for quick testing or when loading the key from a secure secrets manager. If you're using a proxy or custom endpoint: ```python theme={null} from fishaudio import FishAudio client = FishAudio( api_key="your_api_key", base_url="https://your-proxy-domain.com" ) ``` This is useful for: * Corporate proxies * Development/staging environments * Self-hosted deployments ## Verifying Authentication Test your authentication by making a simple API call to check your account credits: ```python focus={7-9} theme={null} from fishaudio import FishAudio from fishaudio.exceptions import AuthenticationError try: client = FishAudio() # Check account credits (requires valid authentication) credits = client.account.get_credits() print(f"Authentication successful! 
Credits: {credits.credit}") except AuthenticationError: print("Authentication failed. Check your API key.") ``` Handle [`AuthenticationError`](/api-reference/sdk/python/exceptions#authenticationerror-objects) when verifying authentication. The example uses [`get_credits()`](/api-reference/sdk/python/resources#get_credits) to verify that authentication works. ## Next Steps Generate speech with the authenticated client Clone voices and create custom models Check credits and manage your account Handle authentication errors properly # Overview Source: https://docs.fish.audio/developer-guide/sdk-guide/python/overview The official Python library for the Fish Audio API This guide will walk you through installation, authentication, and core features. If you're using the legacy Session-based API (`fish_audio_sdk`), see the [migration guide](/archive/python-sdk-legacy/migration-guide) to upgrade to the new SDK. ## Installation Install via pip (Python 3.9 or higher required): ```bash theme={null} pip install fish-audio-sdk ``` For audio playback utilities, install with the `utils` extra: ```bash theme={null} pip install fish-audio-sdk[utils] ``` Sign up for a free Fish Audio account to get started with our API. 1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup) 2. Fill in your details to create an account, then complete the steps to verify your account. 3. Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys) Once you have an account, you'll need an API key to authenticate your requests. 1. Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/) 2. Navigate to the API Keys section 3. Click "Create New Key", give it a descriptive name, and set an expiration if desired 4. Copy your key and store it securely Keep your API key secret! Never commit it to version control or share it publicly. 
Configure your API key using environment variables: ```bash theme={null} export FISH_API_KEY=your_api_key_here ``` Or create a `.env` file in your project root: ```bash theme={null} FISH_API_KEY=your_api_key_here ``` ## Quick Start Get started with the [`FishAudio`](/api-reference/sdk/python/client#fishaudio-objects) client in less than a minute: ```python Synchronous theme={null} from fishaudio import FishAudio from fishaudio.utils import play, save # Initialize client (reads from FISH_API_KEY environment variable) client = FishAudio() # Generate and play audio audio = client.tts.convert(text="Hello, playing from Fish Audio!") play(audio) # Generate and save audio audio = client.tts.convert(text="Saving this audio to a file!") save(audio, "output.mp3") ``` ```python Asynchronous theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import play, save async def main(): # Initialize async client client = AsyncFishAudio() # Generate and play audio audio = await client.tts.convert(text="Hello, playing from Fish Audio!") play(audio) # Generate and save audio audio = await client.tts.convert(text="Saving this audio to a file!") save(audio, "output.mp3") asyncio.run(main()) ``` ## Core Features ### Text-to-Speech Fully customizable text-to-speech generation: ```python Synchronous focus={6-10} theme={null} from fishaudio import FishAudio from fishaudio.utils import play client = FishAudio() # With a specific voice audio = client.tts.convert( text="Custom voice", reference_id="bf322df2096a46f18c579d0baa36f41d" # Adrian ) play(audio) ``` ```python Asynchronous focus={8-12} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() # With a specific voice audio = await client.tts.convert( text="Custom voice", reference_id="bf322df2096a46f18c579d0baa36f41d" # Adrian ) play(audio) asyncio.run(main()) ``` ```python Synchronous focus={6-10} theme={null} from 
fishaudio import FishAudio from fishaudio.utils import play client = FishAudio() # With speed control audio = client.tts.convert( text="I'm talking pretty fast, is this still too slow?", speed=1.5 # 1.5x speed ) play(audio) ``` ```python Asynchronous focus={8-12} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() # With speed control audio = await client.tts.convert( text="I'm talking pretty fast, is this still too slow?", speed=1.5 # 1.5x speed ) play(audio) asyncio.run(main()) ``` Create reusable configurations with [`TTSConfig`](/api-reference/sdk/python/types#ttsconfig-objects). [`Prosody`](/api-reference/sdk/python/types#prosody-objects) controls speech characteristics like speed and volume: ```python Synchronous focus={7-18} theme={null} from fishaudio import FishAudio from fishaudio.types import TTSConfig, Prosody from fishaudio.utils import play client = FishAudio() # Define config once my_config = TTSConfig( prosody=Prosody(speed=1.2, volume=-5), reference_id="933563129e564b19a115bedd57b7406a", # Sarah format="wav", latency="balanced" ) # Reuse across multiple generations audio1 = client.tts.convert(text="Welcome to our product demonstration.", config=my_config) audio2 = client.tts.convert(text="Let me show you the key features.", config=my_config) audio3 = client.tts.convert(text="Thank you for watching this tutorial.", config=my_config) play(audio1) play(audio2) play(audio3) ``` ```python Asynchronous focus={9-20} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.types import TTSConfig, Prosody from fishaudio.utils import play async def main(): client = AsyncFishAudio() # Define config once my_config = TTSConfig( prosody=Prosody(speed=1.2, volume=-5), reference_id="933563129e564b19a115bedd57b7406a", # Sarah format="wav", latency="balanced" ) # Reuse across multiple generations audio1 = await client.tts.convert(text="Welcome to our 
product demonstration.", config=my_config) audio2 = await client.tts.convert(text="Let me show you the key features.", config=my_config) audio3 = await client.tts.convert(text="Thank you for watching this tutorial.", config=my_config) play(audio1) play(audio2) play(audio3) asyncio.run(main()) ``` For chunk-by-chunk processing, use [`stream()`](/api-reference/sdk/python/resources#stream) which returns an `AudioStream` (iterable). For real-time streaming with dynamic text, see [Real-time Streaming](#real-time-streaming) below. Learn more in the [Text-to-Speech guide](/developer-guide/sdk-guide/python/text-to-speech). ### Speech-to-Text Transcribe audio to text for various use cases: ```python Synchronous focus={5-16} theme={null} from fishaudio import FishAudio client = FishAudio() # Transcribe audio with open("audio.wav", "rb") as f: result = client.asr.transcribe( audio=f.read(), language="en" # Optional: specify language ) print(result.text) # Access segments for segment in result.segments: print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}") ``` ```python Asynchronous focus={7-18} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Transcribe audio with open("audio.wav", "rb") as f: result = await client.asr.transcribe( audio=f.read(), language="en" # Optional: specify language ) print(result.text) # Access segments for segment in result.segments: print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}") asyncio.run(main()) ``` Learn more in the [Speech-to-Text guide](/developer-guide/sdk-guide/python/speech-to-text). ### Real-time Streaming Stream dynamically generated text for conversational AI and live applications. 
Perfect for integrating with LLM streaming responses, live captions, and chatbot interactions: ```python Synchronous focus={7-15} theme={null} from fishaudio import FishAudio from fishaudio.utils import play client = FishAudio() # Stream dynamically generated text (e.g., from LLM) def text_chunks(): yield "Hello, " yield "this is " yield "streaming text!" audio_stream = client.tts.stream_websocket( text_chunks(), latency="balanced" ) play(audio_stream) ``` ```python Asynchronous focus={9-17} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() # Stream dynamically generated text async def text_chunks(): yield "Hello, " yield "this is " yield "streaming text!" audio_stream = await client.tts.stream_websocket( text_chunks(), latency="balanced" ) play(audio_stream) asyncio.run(main()) ``` Learn more in the [WebSocket Streaming guide](/developer-guide/sdk-guide/python/websocket). ### Voice Cloning **Instant voice cloning** - Clone a voice on-the-fly using [`ReferenceAudio`](/api-reference/sdk/python/types#referenceaudio-objects): ```python Synchronous focus={6-12} theme={null} from fishaudio import FishAudio from fishaudio.types import ReferenceAudio client = FishAudio() # Instant voice cloning with open("reference.wav", "rb") as f: audio = client.tts.convert( text="This will sound like the reference voice", references=[ReferenceAudio( audio=f.read(), text="Text spoken in the reference audio" )] ) ``` ```python Asynchronous focus={8-14} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.types import ReferenceAudio async def main(): client = AsyncFishAudio() # Instant voice cloning with open("reference.wav", "rb") as f: audio = await client.tts.convert( text="This will sound like the reference voice", references=[ReferenceAudio( audio=f.read(), text="Text spoken in the reference audio" )] ) asyncio.run(main()) ``` **Voice models** - Create persistent 
voice models for repeated use: ```python Synchronous focus={6-11} theme={null} from fishaudio import FishAudio client = FishAudio() # Create persistent voice model with open("voice_sample.wav", "rb") as f: voice = client.voices.create( title="My Custom Voice", voices=[f.read()], description="Custom voice clone" ) print(f"Created voice: {voice.id}") ``` ```python Asynchronous focus={8-13} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Create persistent voice model with open("voice_sample.wav", "rb") as f: voice = await client.voices.create( title="My Custom Voice", voices=[f.read()], description="Custom voice clone" ) print(f"Created voice: {voice.id}") asyncio.run(main()) ``` Learn more in the [Voice Cloning guide](/developer-guide/sdk-guide/python/voice-cloning). ## Client Initialization The recommended approach using environment variables: ```python theme={null} from fishaudio import FishAudio # Automatically reads from FISH_API_KEY environment variable client = FishAudio() ``` Provide the API key directly: ```python theme={null} from fishaudio import FishAudio client = FishAudio(api_key="your_api_key") ``` Never commit API keys to version control. Use environment variables or secret management systems. 
Configure a custom base URL: ```python theme={null} from fishaudio import FishAudio client = FishAudio( api_key="your_api_key", base_url="https://your-proxy-domain.com" ) ``` ## Sync vs Async The SDK provides both synchronous and asynchronous clients: ```python Synchronous theme={null} from fishaudio import FishAudio # For typical applications client = FishAudio() audio = client.tts.convert(text="Hello!") ``` ```python Asynchronous theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): # For async applications (web servers, concurrent tasks) client = AsyncFishAudio() audio = await client.tts.convert(text="Hello!") asyncio.run(main()) ``` Use [`AsyncFishAudio`](/api-reference/sdk/python/client#asyncfishaudio-objects) when: * Building async web applications (FastAPI, Sanic, etc.) * Processing multiple requests concurrently * Integrating with other async libraries * You need maximum performance ## Resource Clients The SDK organizes functionality into resource clients: | Resource | Description | Key Methods | | ----------------------------------------------------------------------------- | ------------------ | ----------------------------------------------------- | | [`client.tts`](/api-reference/sdk/python/resources#ttsclient-objects) | Text-to-speech | `convert()`, `stream()`, `stream_websocket()` | | [`client.asr`](/api-reference/sdk/python/resources#asrclient-objects) | Speech recognition | `transcribe()` | | [`client.voices`](/api-reference/sdk/python/resources#voicesclient-objects) | Voice management | `list()`, `get()`, `create()`, `update()`, `delete()` | | [`client.account`](/api-reference/sdk/python/resources#accountclient-objects) | Account info | `get_credits()`, `get_package()` | ## Utility Functions The SDK includes helpful utilities (requires `utils` extra): ```python theme={null} from fishaudio.utils import save, play, stream # Save audio to file save(audio, "output.mp3") # Play audio (automatically detects environment) 
play(audio) # Works in Jupyter, regular Python, etc. # Stream audio in real-time (requires mpv) stream(audio_iterator) ``` Use [`play()`](/api-reference/sdk/python/utils#play) for playback and [`save()`](/api-reference/sdk/python/utils#save) for writing audio files. Learn more in the [API Reference - Utils](/api-reference/sdk/python/utils). ## Error Handling The SDK provides a comprehensive exception hierarchy: ```python theme={null} from fishaudio import FishAudio from fishaudio.exceptions import ( FishAudioError, AuthenticationError, RateLimitError, ValidationError ) client = FishAudio() try: audio = client.tts.convert(text="Hello!") except AuthenticationError: print("Invalid API key") except RateLimitError: print("Rate limit exceeded. Please wait before retrying.") except ValidationError as e: print(f"Invalid request: {e}") except FishAudioError as e: print(f"API error: {e}") ``` The SDK includes exceptions for [`AuthenticationError`](/api-reference/sdk/python/exceptions#authenticationerror-objects), [`RateLimitError`](/api-reference/sdk/python/exceptions#ratelimiterror-objects), [`ValidationError`](/api-reference/sdk/python/exceptions#validationerror-objects), and [`FishAudioError`](/api-reference/sdk/python/exceptions#fishaudioerror-objects) for common error scenarios. Learn more in the [API Reference - Exceptions](/api-reference/sdk/python/exceptions). 
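When a `RateLimitError` surfaces, retrying with exponential backoff is the usual recovery. The delay schedule itself is SDK-independent and can be sketched as a generator (`backoff_delays` is our name; adjust the base and cap to your rate limits):

```python
def backoff_delays(max_retries: int = 3, base: float = 2.0, cap: float = 60.0):
    """Yield wait times in seconds: 1, base, base**2, ... capped at `cap`."""
    for attempt in range(max_retries):
        yield min(base ** attempt, cap)
```

Pair it with the `except RateLimitError` branch above, sleeping for each yielded delay between attempts.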
## Next Steps * Set up API keys and client configuration * Generate natural-sounding speech * Clone voices and manage voice models * Transcribe audio to text * Real-time audio streaming * Complete API documentation ## Resources * [GitHub Repository](https://github.com/fishaudio/fish-audio-python) * [PyPI Package](https://pypi.org/project/fish-audio-sdk/) * [Migration Guide](/archive/python-sdk-legacy/migration-guide) - Upgrade from legacy SDK * [Best Practices](/developer-guide/best-practices/) - Production-ready tips * [API Reference](/api-reference/sdk/python/) - Detailed documentation # Speech-to-Text Source: https://docs.fish.audio/developer-guide/sdk-guide/python/speech-to-text Transcribe audio to text with the Fish Audio Python SDK ## Prerequisites Sign up for a free Fish Audio account to get started with our API. 1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup) 2. Fill in your details to create an account and complete the steps to verify your account. 3. Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys) Once you have an account, you'll need an API key to authenticate your requests. 1. Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/) 2. Navigate to the API Keys section 3. Click "Create New Key", give it a descriptive name, and set an expiration if desired 4. Copy your key and store it securely Keep your API key secret! Never commit it to version control or share it publicly. 
## Basic Transcription Transcribe audio files to text with automatic language detection using [`asr.transcribe()`](/api-reference/sdk/python/resources#transcribe): ```python Synchronous focus={6-10} theme={null} from fishaudio import FishAudio client = FishAudio() # Transcribe audio with open("audio.mp3", "rb") as f: result = client.asr.transcribe(audio=f.read()) print(f"Transcription: {result.text}") print(f"Duration: {result.duration}ms") ``` ```python Asynchronous focus={8-12} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Transcribe audio with open("audio.mp3", "rb") as f: result = await client.asr.transcribe(audio=f.read()) print(f"Transcription: {result.text}") print(f"Duration: {result.duration}ms") asyncio.run(main()) ``` The [`ASRResponse`](/api-reference/sdk/python/types#asrresponse-objects) object contains the full transcription and segment details. ## Language Specification Specify the language for more accurate transcription: ```python Synchronous focus={5-11} theme={null} from fishaudio import FishAudio client = FishAudio() # Specify language code with open("chinese_audio.mp3", "rb") as f: result = client.asr.transcribe( audio=f.read(), language="zh" # Chinese ) print(result.text) ``` ```python Asynchronous focus={7-13} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Specify language code with open("chinese_audio.mp3", "rb") as f: result = await client.asr.transcribe( audio=f.read(), language="zh" # Chinese ) print(result.text) asyncio.run(main()) ``` Auto-detection works well for most cases, but specifying the language can improve accuracy, especially for languages with similar phonetics. 
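The `language` parameter takes ISO 639-1 codes such as `"zh"` above. If your application works with human-readable names, a small lookup table keeps the mapping in one place; this is an illustrative sketch with a handful of entries (confirm which codes the ASR endpoint supports before relying on them):

```python
# Illustrative subset of ISO 639-1 codes; extend as needed.
LANGUAGE_CODES = {
    "english": "en",
    "chinese": "zh",
    "japanese": "ja",
    "spanish": "es",
}

def language_code(name):
    """Map a human-readable language name to an ISO 639-1 code.

    Returns None for unknown names, which callers can treat as
    "omit the parameter and let the SDK auto-detect".
    """
    return LANGUAGE_CODES.get(name.strip().lower())
```

For example, `language_code("Chinese")` yields the `"zh"` used in the snippet above.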
## Segment Timestamps Access word-level or phrase-level timestamps: ```python Synchronous focus={5-14} theme={null} from fishaudio import FishAudio client = FishAudio() # Transcribe with segments with open("audio.mp3", "rb") as f: result = client.asr.transcribe(audio=f.read()) # Access full text print(f"Full text: {result.text}") # Iterate through segments for segment in result.segments: print(f"[{segment.start}ms - {segment.end}ms]: {segment.text}") ``` ```python Asynchronous focus={7-16} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Transcribe with segments with open("audio.mp3", "rb") as f: result = await client.asr.transcribe(audio=f.read()) # Access full text print(f"Full text: {result.text}") # Iterate through segments for segment in result.segments: print(f"[{segment.start}ms - {segment.end}ms]: {segment.text}") asyncio.run(main()) ``` ## Next Steps * Convert transcribed text back to speech * Use transcribed audio for voice cloning * Complete ASR API documentation * Production tips and optimization ## Related Resources * [ASR Types Reference](/api-reference/sdk/python/types#asr) - ASR response data structures * [Error Handling](/api-reference/sdk/python/exceptions) - Exception types and handling # Text-to-Speech Source: https://docs.fish.audio/developer-guide/sdk-guide/python/text-to-speech Generate natural-sounding speech with the Fish Audio Python SDK ## Prerequisites Sign up for a free Fish Audio account to get started with our API. 1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup) 2. Fill in your details to create an account and complete the steps to verify your account. 3. Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys) Once you have an account, you'll need an API key to authenticate your requests. 1. Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/) 2. Navigate to the API Keys section 3. 
Click "Create New Key", give it a descriptive name, and set an expiration if desired 4. Copy your key and store it securely Keep your API key secret! Never commit it to version control or share it publicly. ## Understanding TTS Methods The SDK provides three methods for text-to-speech generation, each optimized for different use cases: | Method | Returns | Best For | | --- | --- | --- | | [`convert()`](/api-reference/sdk/python/resources#convert) | Complete audio bytes | Most use cases - simple, gets full audio at once | | [`stream()`](/api-reference/sdk/python/resources#stream) | `AudioStream` | Chunk-by-chunk processing, memory-efficient transfer | | [`stream_websocket()`](/api-reference/sdk/python/resources#stream_websocket) | Audio bytes iterator | Real-time streaming with dynamic text (LLM responses, conversational AI) | Use `convert()` for most use cases. Use `stream()` for memory efficiency when handling large files. Use `stream_websocket()` when text is generated dynamically in real-time. 
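The guidance above can be condensed into a small selection helper, useful when one code path serves several features. Purely illustrative; the function and flag names are ours:

```python
def pick_tts_method(text_is_dynamic: bool, large_output: bool = False) -> str:
    """Mirror the guidance table: websocket streaming for dynamically
    generated text, chunked streaming for large outputs, convert() otherwise."""
    if text_is_dynamic:
        return "stream_websocket"
    if large_output:
        return "stream"
    return "convert"
```

For instance, an LLM chat feature would land on `"stream_websocket"`, while a one-shot narration of known text would land on `"convert"`.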
## Basic Usage Generate speech from text with a single function call: ```python Synchronous focus={6-9} theme={null} from fishaudio import FishAudio from fishaudio.utils import save, play client = FishAudio() # Generate speech (returns bytes) audio = client.tts.convert(text="Hello, welcome to Fish Audio!") # Play or save the audio play(audio) save(audio, "output.mp3") ``` ```python Asynchronous focus={8-11} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import save, play async def main(): client = AsyncFishAudio() # Generate speech (returns bytes) audio = await client.tts.convert(text="Hello, welcome to Fish Audio!") # Play or save the audio play(audio) save(audio, "output.mp3") asyncio.run(main()) ``` ## Using Voice Models Specify a voice model for consistent voice characteristics: ```python Synchronous focus={6-10} theme={null} from fishaudio import FishAudio from fishaudio.utils import play client = FishAudio() # Use a specific voice audio = client.tts.convert( text="This uses a specific voice model", reference_id="bf322df2096a46f18c579d0baa36f41d" # Adrian ) play(audio) ``` ```python Asynchronous focus={8-12} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() # Use a specific voice audio = await client.tts.convert( text="This uses a specific voice model", reference_id="bf322df2096a46f18c579d0baa36f41d" # Adrian ) play(audio) asyncio.run(main()) ``` ### Finding Voice Models Get voice model IDs from the Fish Audio website or programmatically: ```python Synchronous focus={5-16} theme={null} from fishaudio import FishAudio from fishaudio.utils import play client = FishAudio() # List available voices voices = client.voices.list(language="en", tags="male") for voice in voices.items: print(f"{voice.title}: {voice.id}") # Use a voice from the list audio = client.tts.convert( text="Generated with discovered voice", 
reference_id=voices.items[0].id ) play(audio) ``` ```python Asynchronous focus={7-18} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() # List available voices voices = await client.voices.list(language="en", tags="male") for voice in voices.items: print(f"{voice.title}: {voice.id}") # Use a voice from the list audio = await client.tts.convert( text="Generated with discovered voice", reference_id=voices.items[0].id ) play(audio) asyncio.run(main()) ``` Learn more in the [Voice Cloning guide](/developer-guide/sdk-guide/python/voice-cloning). ## Emotions and Expressions The `(parenthesis)` syntax below applies to the S1 model. S2 uses `[bracket]` syntax with natural language descriptions and is not limited to a fixed set of tags. See the [Models Overview](/developer-guide/models-pricing/models-overview#s2-natural-language-control) for details. Add emotional expressions to make speech more natural: ```python Synchronous focus={5-16} theme={null} from fishaudio import FishAudio from fishaudio.utils import play client = FishAudio() text = """ (happy) I'm excited to announce this! (sad) Unfortunately, it didn't work out. (angry) This is so frustrating! (calm) Let me explain the details. """ audio = client.tts.convert( text=text, reference_id="933563129e564b19a115bedd57b7406a" # Sarah ) play(audio) ``` ```python Asynchronous focus={7-18} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() text = """ (happy) I'm excited to announce this! (sad) Unfortunately, it didn't work out. (angry) This is so frustrating! (calm) Let me explain the details. 
""" audio = await client.tts.convert( text=text, reference_id="933563129e564b19a115bedd57b7406a" # Sarah ) play(audio) asyncio.run(main()) ``` See the [Emotion Reference](/api-reference/emotion-reference) for all available emotions and [Fine-grained Control](/developer-guide/core-features/fine-grained-control) for advanced usage. ## Audio Formats Choose the output format based on your needs: ```python Synchronous focus={5-21} theme={null} from fishaudio import FishAudio client = FishAudio() # MP3 (default) - good balance of quality and size audio = client.tts.convert( text="MP3 format", format="mp3" ) # WAV - uncompressed, highest quality audio = client.tts.convert( text="WAV format", format="wav" ) # PCM - raw audio data for streaming audio = client.tts.convert( text="PCM format", format="pcm" ) ``` ```python Asynchronous focus={7-23} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # MP3 (default) - good balance of quality and size audio = await client.tts.convert( text="MP3 format", format="mp3" ) # WAV - uncompressed, highest quality audio = await client.tts.convert( text="WAV format", format="wav" ) # PCM - raw audio data for streaming audio = await client.tts.convert( text="PCM format", format="pcm" ) asyncio.run(main()) ``` ## Prosody Control Adjust speech speed and volume for natural-sounding output: ```python Synchronous focus={6-10} theme={null} from fishaudio import FishAudio from fishaudio.utils import play client = FishAudio() # Simple speed adjustment audio = client.tts.convert( text="This will be spoken faster", speed=1.5 # 1.5x speed (range: 0.5-2.0) ) play(audio) ``` ```python Asynchronous focus={8-12} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() # Simple speed adjustment audio = await client.tts.convert( text="This will be spoken faster", speed=1.5 # 1.5x speed (range: 0.5-2.0) ) play(audio) 
asyncio.run(main()) ``` For combined speed and volume control, use [`TTSConfig`](/api-reference/sdk/python/types#ttsconfig-objects) with [`Prosody`](/api-reference/sdk/python/types#prosody-objects): ```python Synchronous focus={7-17} theme={null} from fishaudio import FishAudio from fishaudio.types import TTSConfig, Prosody from fishaudio.utils import play client = FishAudio() # Configure prosody with TTSConfig audio = client.tts.convert( text="Adjusted speech with custom speed and volume", config=TTSConfig( prosody=Prosody( speed=1.2, # 20% faster volume=5 # Louder (range: -20 to 20) ) ) ) play(audio) ``` ```python Asynchronous focus={9-19} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.types import TTSConfig, Prosody from fishaudio.utils import play async def main(): client = AsyncFishAudio() # Configure prosody with TTSConfig audio = await client.tts.convert( text="Adjusted speech with custom speed and volume", config=TTSConfig( prosody=Prosody( speed=1.2, # 20% faster volume=5 # Louder (range: -20 to 20) ) ) ) play(audio) asyncio.run(main()) ``` ## Reusable TTS Configuration Create a configuration once and reuse it across multiple generations: ```python Synchronous focus={5-18} theme={null} from fishaudio import FishAudio from fishaudio.types import TTSConfig, Prosody client = FishAudio() # Define config once my_config = TTSConfig( prosody=Prosody(speed=1.2, volume=-5), reference_id="bf322df2096a46f18c579d0baa36f41d", # Adrian format="wav", latency="balanced" ) # Reuse across multiple generations audio1 = client.tts.convert(text="Welcome to our product demonstration.", config=my_config) audio2 = client.tts.convert(text="Let me show you the key features.", config=my_config) audio3 = client.tts.convert(text="Thank you for watching this tutorial.", config=my_config) ``` ```python Asynchronous focus={7-20} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.types import TTSConfig, Prosody async def 
main(): client = AsyncFishAudio() # Define config once my_config = TTSConfig( prosody=Prosody(speed=1.2, volume=-5), reference_id="bf322df2096a46f18c579d0baa36f41d", # Adrian format="wav", latency="balanced" ) # Reuse across multiple generations audio1 = await client.tts.convert(text="Welcome to our product demonstration.", config=my_config) audio2 = await client.tts.convert(text="Let me show you the key features.", config=my_config) audio3 = await client.tts.convert(text="Thank you for watching this tutorial.", config=my_config) asyncio.run(main()) ``` ## Chunk-by-Chunk Streaming Use `stream()` for memory-efficient transfer and progressive download. Chunks are network transmission units (not semantic audio segments): ```python Synchronous focus={5-8} theme={null} from fishaudio import FishAudio client = FishAudio() # Collect all chunks efficiently audio_stream = client.tts.stream(text="Long text here") audio = audio_stream.collect() # Returns complete audio as bytes ``` ```python Asynchronous focus={7-10} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Collect all chunks efficiently audio_stream = await client.tts.stream(text="Long text here") audio = await audio_stream.collect() # Returns complete audio as bytes asyncio.run(main()) ``` For streaming to files or network without buffering in memory: ```python Synchronous focus={5-9} theme={null} from fishaudio import FishAudio client = FishAudio() # Stream directly to file (memory efficient for large audio) audio_stream = client.tts.stream(text="Very long text...") with open("output.mp3", "wb") as f: for chunk in audio_stream: f.write(chunk) # Write each chunk as it arrives ``` ```python Asynchronous focus={7-11} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Stream directly to file (memory efficient for large audio) audio_stream = await client.tts.stream(text="Very long text...") with 
open("output.mp3", "wb") as f: async for chunk in audio_stream: f.write(chunk) # Write each chunk as it arrives asyncio.run(main()) ``` Use `stream()` when you have complete text upfront. For real-time streaming with dynamically generated text (LLMs, live captions), use `stream_websocket()` instead. ## Real-time WebSocket Streaming For real-time applications where text is generated dynamically, use [`stream_websocket()`](/api-reference/sdk/python/resources#stream_websocket). This is perfect for LLM integrations, conversational AI, and live captions: ### Basic WebSocket Streaming ```python Synchronous focus={5-15} theme={null} from fishaudio import FishAudio from fishaudio.utils import play client = FishAudio() # Stream dynamically generated text def text_chunks(): yield "Hello, " yield "this is " yield "streaming text!" audio_stream = client.tts.stream_websocket( text_chunks(), latency="balanced" ) play(audio_stream) ``` ```python Asynchronous focus={7-16} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() # Stream dynamically generated text async def text_chunks(): yield "Hello, " yield "this is " yield "streaming text!" audio_stream = await client.tts.stream_websocket( text_chunks(), latency="balanced" ) play(audio_stream) asyncio.run(main()) ``` ### Understanding `FlushEvent` The [`FlushEvent`](/api-reference/sdk/python/types#flushevent-objects) forces the TTS engine to immediately generate audio from the accumulated text buffer. This is useful when you want to ensure audio is generated at specific points, even if the buffer hasn't reached the optimal chunk size. ```python Synchronous focus={6-18} theme={null} from fishaudio import FishAudio from fishaudio.types import FlushEvent client = FishAudio() # Use FlushEvent to force immediate generation def text_with_flush(): yield "This is the first sentence. " yield "This is the second sentence. 
" yield FlushEvent() # Force audio generation NOW yield "This starts a new segment. " yield "And continues here." yield FlushEvent() # Force final generation audio_stream = client.tts.stream_websocket(text_with_flush()) # Process each audio chunk as it arrives for chunk in audio_stream: print(f"Received audio chunk: {len(chunk)} bytes") ``` ```python Asynchronous focus={8-20} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.types import FlushEvent async def main(): client = AsyncFishAudio() # Use FlushEvent to force immediate generation async def text_with_flush(): yield "This is the first sentence. " yield "This is the second sentence. " yield FlushEvent() # Force audio generation NOW yield "This starts a new segment. " yield "And continues here." yield FlushEvent() # Force final generation audio_stream = await client.tts.stream_websocket(text_with_flush()) # Process each audio chunk as it arrives async for chunk in audio_stream: print(f"Received audio chunk: {len(chunk)} bytes") asyncio.run(main()) ``` Without `FlushEvent`, the engine automatically generates audio when the buffer reaches an optimal size. Use `FlushEvent` to control exactly when audio should be generated, which can reduce perceived latency in interactive applications. ### `TextEvent` vs Plain Strings You can yield plain strings (recommended for simplicity) or use [`TextEvent`](/api-reference/sdk/python/types#textevent-objects) for explicit control: ```python Synchronous focus={6-17} theme={null} from fishaudio import FishAudio from fishaudio.types import TextEvent client = FishAudio() # Both approaches are equivalent def text_as_strings(): yield "Hello, " yield "world!" 
def text_as_events(): yield TextEvent(text="Hello, ") yield TextEvent(text="world!") # Use whichever style you prefer audio1 = client.tts.stream_websocket(text_as_strings()) audio2 = client.tts.stream_websocket(text_as_events()) ``` ```python Asynchronous focus={8-19} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.types import TextEvent async def main(): client = AsyncFishAudio() # Both approaches are equivalent async def text_as_strings(): yield "Hello, " yield "world!" async def text_as_events(): yield TextEvent(text="Hello, ") yield TextEvent(text="world!") # Use whichever style you prefer audio1 = await client.tts.stream_websocket(text_as_strings()) audio2 = await client.tts.stream_websocket(text_as_events()) asyncio.run(main()) ``` ### LLM Integration Pattern WebSocket streaming shines when integrating with LLM streaming responses. The TTS engine acts as an accumulator, buffering text until it has enough to generate natural-sounding audio: ```python Synchronous focus={5-19} theme={null} from fishaudio import FishAudio from fishaudio.utils import play client = FishAudio() # Simulate streaming LLM response def llm_stream(): """Simulates text chunks from an LLM""" tokens = [ "The ", "weather ", "today ", "is ", "sunny ", "with ", "clear ", "skies. ", "Perfect ", "for ", "outdoor ", "activities!" ] for token in tokens: yield token # Stream to speech in real-time audio_stream = client.tts.stream_websocket(llm_stream()) play(audio_stream) ``` ```python Asynchronous focus={7-21} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() # Simulate streaming LLM response async def llm_stream(): """Simulates text chunks from an LLM""" tokens = [ "The ", "weather ", "today ", "is ", "sunny ", "with ", "clear ", "skies. ", "Perfect ", "for ", "outdoor ", "activities!" 
] for token in tokens: yield token # Stream to speech in real-time audio_stream = await client.tts.stream_websocket(llm_stream()) play(audio_stream) asyncio.run(main()) ``` The WebSocket connection automatically buffers incoming text and generates audio when it has accumulated enough context for natural-sounding speech. You don't need to manually batch tokens unless you want to force generation at specific points using `FlushEvent`. Learn more in the [WebSocket Streaming guide](/developer-guide/sdk-guide/python/websocket). ## Advanced Configuration Comprehensive `TTSConfig` with all available parameters: ```python focus={3-24} theme={null} from fishaudio.types import TTSConfig, Prosody # All TTSConfig parameters config = TTSConfig( # Audio output settings format="mp3", sample_rate=44100, # Custom sample rate (optional) mp3_bitrate=192, # 64, 128, or 192 kbps opus_bitrate=64, # For Opus format: -1000, 24, 32, 48, or 64 normalize=True, # Normalize audio levels # Generation settings chunk_length=200, # Characters per chunk (100-300) latency="balanced", # "normal" or "balanced" # Voice/style settings reference_id="bf322df2096a46f18c579d0baa36f41d", # Adrian prosody=Prosody(speed=1.1, volume=0), # references=[ReferenceAudio(...)] # For instant cloning # Model parameters temperature=0.7, # Randomness (0.0-1.0) top_p=0.7 # Token selection (0.0-1.0) ) # Use with any client audio = client.tts.convert(text="Your text here", config=config) ``` `TTSConfig` works the same for both sync and async clients. See [TTSConfig API Reference](/api-reference/sdk/python/types#ttsconfig-objects) for detailed documentation on each parameter and their defaults. 
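Several `TTSConfig` fields have documented ranges (`mp3_bitrate` of 64/128/192, `chunk_length` 100-300, `temperature` and `top_p` 0.0-1.0, `speed` 0.5-2.0, `volume` -20 to 20). A client-side sanity check can surface mistakes before a request is rejected; this validator is our sketch, not an SDK feature, and only covers the ranges quoted above:

```python
def validate_tts_params(**params):
    """Return a list of problems for documented TTSConfig ranges (empty if OK)."""
    rules = {
        "mp3_bitrate":  lambda v: v in (64, 128, 192),
        "chunk_length": lambda v: 100 <= v <= 300,
        "temperature":  lambda v: 0.0 <= v <= 1.0,
        "top_p":        lambda v: 0.0 <= v <= 1.0,
        "speed":        lambda v: 0.5 <= v <= 2.0,
        "volume":       lambda v: -20 <= v <= 20,
    }
    problems = []
    for name, value in params.items():
        check = rules.get(name)  # unknown names are passed through unchecked
        if check is not None and not check(value):
            problems.append(f"{name}={value!r} is outside the documented range")
    return problems
```

Run it on the keyword arguments you are about to put into `TTSConfig` and raise or log if the returned list is non-empty.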
## Error Handling Handle common TTS errors gracefully: ```python theme={null} from fishaudio import FishAudio from fishaudio.exceptions import ( RateLimitError, ValidationError, NotFoundError, FishAudioError ) import time client = FishAudio() try: audio = client.tts.convert( text="Your text here", reference_id="voice_id" ) except RateLimitError: print("Rate limit exceeded. Please wait before retrying.") time.sleep(60) # Wait before retry except NotFoundError: print("Voice model not found. Check the reference_id") except ValidationError as e: print(f"Invalid request: {e}") except FishAudioError as e: print(f"API error: {e}") ``` Common exceptions include [`RateLimitError`](/api-reference/sdk/python/exceptions#ratelimiterror-objects), [`ValidationError`](/api-reference/sdk/python/exceptions#validationerror-objects), [`NotFoundError`](/api-reference/sdk/python/exceptions#notfounderror-objects), and [`FishAudioError`](/api-reference/sdk/python/exceptions#fishaudioerror-objects). ## Best Practices For long texts, adjust `chunk_length` in `TTSConfig`: ```python theme={null} from fishaudio import FishAudio from fishaudio.types import TTSConfig client = FishAudio() audio = client.tts.convert( text="Very long text...", config=TTSConfig(chunk_length=250) # Larger chunks for efficiency ) ``` If you generate the same speech repeatedly, cache the results: ```python theme={null} import os from fishaudio import FishAudio from fishaudio.utils import save client = FishAudio() def get_or_generate_speech(text, cache_file): if os.path.exists(cache_file): with open(cache_file, "rb") as f: return f.read() audio = client.tts.convert(text=text) save(audio, cache_file) return audio ``` Implement exponential backoff for rate limits: ```python theme={null} from fishaudio import FishAudio from fishaudio.exceptions import RateLimitError import time client = FishAudio() def generate_with_retry(text, max_retries=3): for attempt in range(max_retries): try: return client.tts.convert(text=text) 
except RateLimitError: if attempt < max_retries - 1: time.sleep(2 ** attempt) # Exponential backoff else: raise ``` Balance speed vs quality based on your use case: ```python theme={null} from fishaudio import FishAudio client = FishAudio() # For real-time applications audio = client.tts.convert(text="Fast response", latency="balanced") # For highest quality audio = client.tts.convert(text="Best quality", latency="normal") ``` ## Next Steps * Create custom voice models * Real-time audio streaming * Phoneme-level control and paralanguage * Production tips and optimization ## Related Resources * [TTS API Reference](/api-reference/sdk/python/resources#tts) - Complete API documentation * [Audio Formats Guide](/developer-guide/core-features/text-to-speech#audio-formats) - Format comparison * [Emotion Reference](/api-reference/emotion-reference) - All available emotions * [Utils Reference](/api-reference/sdk/python/utils) - Audio utilities # Voice Cloning Source: https://docs.fish.audio/developer-guide/sdk-guide/python/voice-cloning Clone voices and create custom voice models with the Fish Audio Python SDK ## Prerequisites Sign up for a free Fish Audio account to get started with our API. 1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup) 2. Fill in your details to create an account and complete the steps to verify your account. 3. Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys) Once you have an account, you'll need an API key to authenticate your requests. 1. Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/) 2. Navigate to the API Keys section 3. Click "Create New Key", give it a descriptive name, and set an expiration if desired 4. Copy your key and store it securely Keep your API key secret! Never commit it to version control or share it publicly. 
## Instant Voice Cloning Clone a voice on-the-fly without creating a persistent model using [`ReferenceAudio`](/api-reference/sdk/python/types#referenceaudio-objects): ```python Synchronous focus={6-15} theme={null} from fishaudio import FishAudio from fishaudio.types import ReferenceAudio from fishaudio.utils import play client = FishAudio() # Clone from reference audio with open("reference_voice.wav", "rb") as f: audio = client.tts.convert( text="This will sound like the reference voice", references=[ReferenceAudio( audio=f.read(), text="Text spoken in the reference audio" )] ) play(audio) ``` ```python Asynchronous focus={8-17} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.types import ReferenceAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() # Clone from reference audio with open("reference_voice.wav", "rb") as f: audio = await client.tts.convert( text="This will sound like the reference voice", references=[ReferenceAudio( audio=f.read(), text="Text spoken in the reference audio" )] ) play(audio) asyncio.run(main()) ``` Instant voice cloning is perfect for one-time use cases. For repeated use of the same voice, create a persistent voice model instead. 
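Both instant cloning (`references=[...]`) and persistent model creation (parallel `voices`/`texts` lists) pair each clip with its transcript. Keeping samples as `(audio_bytes, transcript)` tuples and splitting them at the call site avoids misaligning a transcript with the wrong clip; a minimal sketch (`split_pairs` is our helper, not an SDK function):

```python
def split_pairs(samples):
    """Split (audio_bytes, transcript) pairs into parallel lists.

    The returned lists line up index-for-index, which is what keeps
    each transcript attached to the right clip when passing
    voices=... and texts=... together.
    """
    voices, texts = [], []
    for audio, transcript in samples:
        voices.append(audio)
        texts.append(transcript)
    return voices, texts
```

Then `voices, texts = split_pairs(samples)` feeds straight into `client.voices.create(title=..., voices=voices, texts=texts)`.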
## Multiple Reference Samples Improve voice quality by providing multiple reference samples: ```python Synchronous focus={6-21} theme={null} from fishaudio import FishAudio from fishaudio.types import ReferenceAudio from fishaudio.utils import play client = FishAudio() # Load multiple reference samples references = [] samples = [ ("sample1.wav", "First sample transcript"), ("sample2.wav", "Second sample transcript"), ("sample3.wav", "Third sample transcript") ] for audio_file, transcript in samples: with open(audio_file, "rb") as f: references.append(ReferenceAudio( audio=f.read(), text=transcript )) # Generate with multiple references audio = client.tts.convert( text="This voice is trained on multiple samples", references=references ) play(audio) ``` ```python Asynchronous focus={8-23} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.types import ReferenceAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() # Load multiple reference samples references = [] samples = [ ("sample1.wav", "First sample transcript"), ("sample2.wav", "Second sample transcript"), ("sample3.wav", "Third sample transcript") ] for audio_file, transcript in samples: with open(audio_file, "rb") as f: references.append(ReferenceAudio( audio=f.read(), text=transcript )) # Generate with multiple references audio = await client.tts.convert( text="This voice is trained on multiple samples", references=references ) play(audio) asyncio.run(main()) ``` ## Creating Persistent Voice Models Create a reusable voice model for consistent voice characteristics using [`voices.create()`](/api-reference/sdk/python/resources#create): ```python Synchronous focus={5-20} theme={null} from fishaudio import FishAudio client = FishAudio() # Prepare voice samples voice_samples = [] with open("voice1.wav", "rb") as f1: voice_samples.append(f1.read()) with open("voice2.wav", "rb") as f2: voice_samples.append(f2.read()) # Create voice model voice = 
client.voices.create( title="My Custom Voice", voices=voice_samples, description="A custom voice for my project", tags=["custom", "english"], visibility="private" ) print(f"Created voice: {voice.id}") ``` ```python Asynchronous focus={7-22} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Prepare voice samples voice_samples = [] with open("voice1.wav", "rb") as f1: voice_samples.append(f1.read()) with open("voice2.wav", "rb") as f2: voice_samples.append(f2.read()) # Create voice model voice = await client.voices.create( title="My Custom Voice", voices=voice_samples, description="A custom voice for my project", tags=["custom", "english"], visibility="private" ) print(f"Created voice: {voice.id}") asyncio.run(main()) ``` ### With Transcripts Providing transcripts is faster and more accurate than automatic transcription. When you provide transcripts, the system skips running ASR (speech recognition), resulting in better performance and quality: ```python Synchronous focus={5-27} theme={null} from fishaudio import FishAudio client = FishAudio() # Voice samples with transcripts samples = [ ("voice1.wav", "This is the first sample"), ("voice2.wav", "This is the second sample"), ("voice3.wav", "This is the third sample") ] voices = [] texts = [] for audio_file, transcript in samples: with open(audio_file, "rb") as f: voices.append(f.read()) texts.append(transcript) # Create voice with transcripts voice = client.voices.create( title="High Quality Voice", voices=voices, texts=texts, description="Voice with accurate transcripts", enhance_audio_quality=True ) print(f"Created voice: {voice.id}") ``` ```python Asynchronous focus={7-29} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Voice samples with transcripts samples = [ ("voice1.wav", "This is the first sample"), ("voice2.wav", "This is the second sample"), ("voice3.wav", "This is the third sample") 
] voices = [] texts = [] for audio_file, transcript in samples: with open(audio_file, "rb") as f: voices.append(f.read()) texts.append(transcript) # Create voice with transcripts voice = await client.voices.create( title="High Quality Voice", voices=voices, texts=texts, description="Voice with accurate transcripts", enhance_audio_quality=True ) print(f"Created voice: {voice.id}") asyncio.run(main()) ``` ### Audio Quality Enhancement Enable automatic audio enhancement to clean up noisy reference audio: ```python theme={null} voice = client.voices.create( title="Enhanced Voice", voices=voice_samples, enhance_audio_quality=True # Clean up background noise and normalize levels ) ``` Audio enhancement helps process noisy or lower-quality reference audio. If your audio is already clean and well-recorded, this may not provide additional benefit. ## Managing Voice Models ### List Voices Discover available voices with filtering using [`voices.list()`](/api-reference/sdk/python/resources#list): ```python Synchronous focus={5-11} theme={null} from fishaudio import FishAudio client = FishAudio() # List all voices voices = client.voices.list(page_size=20) print(f"Total voices: {voices.total}") for voice in voices.items: print(f"{voice.title}: {voice.id}") ``` ```python Asynchronous focus={7-13} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # List all voices voices = await client.voices.list(page_size=20) print(f"Total voices: {voices.total}") for voice in voices.items: print(f"{voice.title}: {voice.id}") asyncio.run(main()) ``` ### Filter by Tags and Language ```python Synchronous focus={5-21} theme={null} from fishaudio import FishAudio client = FishAudio() # Filter by tags male_voices = client.voices.list( tags=["male", "english"], page_size=10 ) # Filter by language chinese_voices = client.voices.list( language="zh", page_size=10 ) # Get only your own voices my_voices = client.voices.list( self_only=True, 
page_size=20 ) ``` ```python Asynchronous focus={7-23} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Filter by tags male_voices = await client.voices.list( tags=["male", "english"], page_size=10 ) # Filter by language chinese_voices = await client.voices.list( language="zh", page_size=10 ) # Get only your own voices my_voices = await client.voices.list( self_only=True, page_size=20 ) asyncio.run(main()) ``` ### Get Voice Details Use [`voices.get()`](/api-reference/sdk/python/resources#get) to retrieve voice details: ```python Synchronous focus={5-11} theme={null} from fishaudio import FishAudio client = FishAudio() # Get specific voice voice = client.voices.get("bf322df2096a46f18c579d0baa36f41d") # Adrian print(f"Title: {voice.title}") print(f"Description: {voice.description}") print(f"Tags: {voice.tags}") print(f"Languages: {voice.languages}") ``` ```python Asynchronous focus={7-13} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Get specific voice voice = await client.voices.get("bf322df2096a46f18c579d0baa36f41d") # Adrian print(f"Title: {voice.title}") print(f"Description: {voice.description}") print(f"Tags: {voice.tags}") print(f"Languages: {voice.languages}") asyncio.run(main()) ``` ### Update Voice Metadata Update voice information using [`voices.update()`](/api-reference/sdk/python/resources#update): ```python Synchronous focus={5-11} theme={null} from fishaudio import FishAudio client = FishAudio() # Update voice information client.voices.update( "bf322df2096a46f18c579d0baa36f41d", # Adrian title="Updated Voice Name", description="Updated description", visibility="public", # "public", "unlist", or "private" tags=["updated", "english", "male"] ) ``` ```python Asynchronous focus={7-13} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Update voice information await 
client.voices.update( "bf322df2096a46f18c579d0baa36f41d", # Adrian title="Updated Voice Name", description="Updated description", visibility="public", # "public", "unlist", or "private" tags=["updated", "english", "male"] ) asyncio.run(main()) ``` ### Delete Voice Remove voice models using [`voices.delete()`](/api-reference/sdk/python/resources#delete): ```python Synchronous focus={5-7} theme={null} from fishaudio import FishAudio client = FishAudio() # Delete a voice model client.voices.delete("bf322df2096a46f18c579d0baa36f41d") # Adrian print("Voice deleted successfully") ``` ```python Asynchronous focus={7-9} theme={null} import asyncio from fishaudio import AsyncFishAudio async def main(): client = AsyncFishAudio() # Delete a voice model await client.voices.delete("bf322df2096a46f18c579d0baa36f41d") # Adrian print("Voice deleted successfully") asyncio.run(main()) ``` Deleting a voice is permanent and cannot be undone. Make sure you have backups of any important voice models. ## Next Steps Use cloned voices for speech generation Stream audio with custom voices in real-time Complete voice management API documentation Production tips and optimization strategies ## Related Resources * [Voice Types Reference](/api-reference/sdk/python/types#voices) - Voice model data structures * [Audio Formats Guide](/developer-guide/core-features/text-to-speech#audio-formats) - Supported audio formats * [Fine-grained Control](/developer-guide/core-features/fine-grained-control) - Advanced voice customization # WebSocket Streaming Source: https://docs.fish.audio/developer-guide/sdk-guide/python/websocket Stream text-to-speech in real-time with WebSocket connections ## Prerequisites Sign up for a free Fish Audio account to get started with our API. 1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup) 2. Fill in your details to create an account, then complete the verification steps. 3.
Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys) Once you have an account, you'll need an API key to authenticate your requests. 1. Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/) 2. Navigate to the API Keys section 3. Click "Create New Key", give it a descriptive name, and set an expiration if desired 4. Copy your key and store it securely Keep your API key secret! Never commit it to version control or share it publicly. ## Overview Use [`stream_websocket()`](/api-reference/sdk/python/resources#stream_websocket) for real-time text streaming with LLMs and live captions. The connection automatically buffers incoming text and generates audio as it becomes available. ## Basic Usage Stream text chunks and receive audio in real-time: ```python Synchronous focus={5-17} theme={null} from fishaudio import FishAudio from fishaudio.utils import play client = FishAudio() # Define text generator def text_chunks(): yield "Hello, " yield "this is " yield "real-time " yield "streaming!" # Stream audio via WebSocket audio_stream = client.tts.stream_websocket( text_chunks(), latency="balanced" # Use "balanced" for real-time, "normal" for quality ) # Play streamed audio play(audio_stream) ``` ```python Asynchronous focus={8-20} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() # Define async text generator async def text_chunks(): yield "Hello, " yield "this is " yield "real-time " yield "streaming!" # Stream audio via WebSocket audio_stream = await client.tts.stream_websocket( text_chunks(), latency="balanced" # Use "balanced" for real-time, "normal" for quality ) # Play streamed audio play(audio_stream) asyncio.run(main()) ``` For details on audio formats, voice selection, and advanced configuration options like `TTSConfig`, see the [Text-to-Speech guide](/developer-guide/sdk-guide/python/text-to-speech).
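The connection buffers incoming text for you, so chunk size rarely matters, but the generator you pass in can apply any chunking policy you like. As an illustrative sketch, a hypothetical `sentence_chunks` helper (not part of the SDK) that regroups tiny token-level chunks into whole sentences before they are sent:

```python theme={null}
def sentence_chunks(tokens, terminators=".!?"):
    """Accumulate raw tokens and yield whole sentences.

    Yields a chunk whenever a sentence-ending character is seen,
    then flushes whatever text remains at the end of the stream.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        stripped = buffer.rstrip()
        if stripped and stripped[-1] in terminators:
            yield buffer
            buffer = ""
    if buffer.strip():
        yield buffer

chunks = list(sentence_chunks(["Hello ", "world. ", "How ", "are ", "you?"]))
# chunks == ["Hello world. ", "How are you?"]
```

The resulting generator can be passed to `stream_websocket()` in place of `text_chunks()` above.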
## Using FlushEvent Force immediate audio generation to create pauses using [`FlushEvent`](/api-reference/sdk/python/types#flushevent-objects): ```python Synchronous focus={6-12} theme={null} from fishaudio import FishAudio from fishaudio.types import FlushEvent client = FishAudio() def text_with_flush(): yield "First sentence. " yield "Second sentence. " yield FlushEvent() # Forces generation NOW yield "Third sentence." audio_stream = client.tts.stream_websocket(text_with_flush()) ``` ```python Asynchronous focus={8-14} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.types import FlushEvent async def main(): client = AsyncFishAudio() async def text_with_flush(): yield "First sentence. " yield "Second sentence. " yield FlushEvent() # Forces generation NOW yield "Third sentence." audio_stream = await client.tts.stream_websocket(text_with_flush()) asyncio.run(main()) ``` See [Text-to-Speech guide](/developer-guide/sdk-guide/python/text-to-speech#understanding-flushevent) for detailed FlushEvent usage and advanced examples. ## LLM Integration WebSocket streaming is designed for integrating with LLM streaming responses. The TTS engine automatically buffers incoming text chunks and generates audio when it has enough context for natural speech: ```python Synchronous focus={5-21} theme={null} from fishaudio import FishAudio from fishaudio.utils import play client = FishAudio() # Simulate streaming LLM response def llm_stream(): """Simulates text chunks from an LLM.""" tokens = [ "The ", "weather ", "today ", "is ", "sunny ", "with ", "clear ", "skies. ", "Perfect ", "for ", "outdoor ", "activities!" 
] for token in tokens: yield token # Stream to speech in real-time audio_stream = client.tts.stream_websocket( llm_stream(), latency="balanced" ) play(audio_stream) ``` ```python Asynchronous focus={7-23} theme={null} import asyncio from fishaudio import AsyncFishAudio from fishaudio.utils import play async def main(): client = AsyncFishAudio() # Simulate streaming LLM response async def llm_stream(): """Simulates text chunks from an LLM.""" tokens = [ "The ", "weather ", "today ", "is ", "sunny ", "with ", "clear ", "skies. ", "Perfect ", "for ", "outdoor ", "activities!" ] for token in tokens: yield token # Stream to speech in real-time audio_stream = await client.tts.stream_websocket( llm_stream(), latency="balanced" ) play(audio_stream) asyncio.run(main()) ``` The WebSocket connection automatically buffers incoming text and generates audio when it has accumulated enough context for natural-sounding speech. You don't need to manually batch tokens unless you want to force generation at specific points using `FlushEvent`. ## Next Steps Learn about non-streaming TTS options, audio formats, TextEvent vs plain strings, and advanced configuration Use custom voices in streams and learn about voice selection Complete streaming API documentation Production streaming optimization ## Related Resources * [WebSocket Types](/api-reference/sdk/python/types#tts) - TextEvent, FlushEvent, and more * [Utils Reference](/api-reference/sdk/python/utils) - Audio playback utilities * [Error Handling](/api-reference/sdk/python/exceptions) - WebSocket exception handling * [Fine-grained Control](/developer-guide/core-features/fine-grained-control) - Advanced speech control # Docker Deployment Source: https://docs.fish.audio/developer-guide/self-hosting/docker-deployment Deploy Fish Audio models using Docker containers Fish Audio provides Docker images for both WebUI and API server deployments. You can use pre-built images from Docker Hub or build custom images locally. 
## Prerequisites Before deploying with Docker, ensure you have: * **Docker** and **Docker Compose** installed * **NVIDIA Docker runtime** (for GPU support) * At least **12GB GPU memory** for CUDA inference * Downloaded model weights (see [Running Inference](/developer-guide/self-hosting/running-inference#download-weights)) ## Pre-built Images Fish Audio provides ready-to-use Docker images on Docker Hub: | Image | Description | Best For | | ------------------------------------------ | ----------------------- | -------------------------------- | | `fishaudio/fish-speech:latest-webui-cuda` | WebUI with CUDA support | Interactive development with GPU | | `fishaudio/fish-speech:latest-webui-cpu` | WebUI CPU-only | Testing without GPU | | `fishaudio/fish-speech:latest-server-cuda` | API server with CUDA | Production deployments with GPU | | `fishaudio/fish-speech:latest-server-cpu` | API server CPU-only | Low-traffic CPU deployments | For production use, we recommend using specific version tags instead of `latest` to ensure consistency across deployments. 
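The image names above follow a `<version>-<component>-<backend>` pattern. If you select images programmatically, a small helper can compose the reference. This is a sketch based only on the tags listed in the table; whether pinned version tags use the exact same layout is an assumption you should verify against the repository on Docker Hub:

```python theme={null}
def fish_speech_image(component, backend, version="latest"):
    """Compose a fishaudio/fish-speech image reference.

    Tag layout inferred from the table above; confirm pinned-version
    tags on Docker Hub before relying on this for production.
    """
    if component not in ("webui", "server"):
        raise ValueError(f"unknown component: {component}")
    if backend not in ("cuda", "cpu"):
        raise ValueError(f"unknown backend: {backend}")
    return f"fishaudio/fish-speech:{version}-{component}-{backend}"

tag = fish_speech_image("server", "cuda")
# tag == "fishaudio/fish-speech:latest-server-cuda"
```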
## Quick Start with Docker Run The fastest way to get started is using `docker run`: ### WebUI Deployment ```bash theme={null} # Create directories for model weights and reference audio mkdir -p checkpoints references # Start WebUI with CUDA support (recommended) docker run -d \ --name fish-speech-webui \ --gpus all \ -p 7860:7860 \ -v ./checkpoints:/app/checkpoints \ -v ./references:/app/references \ -e COMPILE=1 \ fishaudio/fish-speech:latest-webui-cuda # For CPU-only deployment docker run -d \ --name fish-speech-webui-cpu \ -p 7860:7860 \ -v ./checkpoints:/app/checkpoints \ -v ./references:/app/references \ fishaudio/fish-speech:latest-webui-cpu ``` Access the WebUI at `http://localhost:7860` ### API Server Deployment ```bash theme={null} # Start API server with CUDA support docker run -d \ --name fish-speech-server \ --gpus all \ -p 8080:8080 \ -v ./checkpoints:/app/checkpoints \ -v ./references:/app/references \ -e COMPILE=1 \ fishaudio/fish-speech:latest-server-cuda # For CPU-only deployment docker run -d \ --name fish-speech-server-cpu \ -p 8080:8080 \ -v ./checkpoints:/app/checkpoints \ -v ./references:/app/references \ fishaudio/fish-speech:latest-server-cpu ``` Access the API documentation at `http://localhost:8080` Enable the `COMPILE=1` environment variable for \~10x faster inference on CUDA deployments. This uses `torch.compile` to optimize the model. 
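Containers may take a while to load model weights before the port starts accepting connections, especially on the first run with `COMPILE=1`. If you script deployments, it can help to wait for readiness before sending traffic; a minimal sketch (the `wait_for_port` helper is our own, not part of Fish Audio):

```python theme={null}
import socket
import time

def wait_for_port(host, port, timeout=120.0, interval=0.5):
    """Poll until a TCP port accepts connections; True if it came up in time."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False

# Demo against a local listener standing in for the container's port 7860:
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
ready = wait_for_port("127.0.0.1", port, timeout=5.0)
server.close()
```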
## Docker Compose Deployment For development or customization, Docker Compose provides easier configuration management: ### Setup ```bash theme={null} # Clone the repository git clone https://github.com/fishaudio/fish-speech.git cd fish-speech ``` ### Start Services ```bash theme={null} # Start WebUI with CUDA docker compose --profile webui up # Start WebUI with compile optimization COMPILE=1 docker compose --profile webui up # Start API server docker compose --profile server up # Start API server with compile optimization COMPILE=1 docker compose --profile server up # For CPU-only deployment BACKEND=cpu docker compose --profile webui up ``` Run containers in detached mode by adding the `-d` flag: `docker compose --profile webui up -d` ### Environment Variables Customize deployment using environment variables or a `.env` file: ```bash theme={null} # .env file example BACKEND=cuda # or cpu COMPILE=1 # Enable compile optimization GRADIO_PORT=7860 # WebUI port API_PORT=8080 # API server port UV_VERSION=0.8.15 # UV package manager version ``` ## Manual Docker Build For advanced users who need custom configurations: ### Build WebUI Image ```bash theme={null} # Build with CUDA support docker build \ --platform linux/amd64 \ -f docker/Dockerfile \ --build-arg BACKEND=cuda \ --build-arg CUDA_VER=12.6.0 \ --build-arg UV_EXTRA=cu126 \ --target webui \ -t fish-speech-webui:cuda . # Build CPU-only (supports multi-platform) docker build \ --platform linux/amd64,linux/arm64 \ -f docker/Dockerfile \ --build-arg BACKEND=cpu \ --target webui \ -t fish-speech-webui:cpu . ``` ### Build API Server Image ```bash theme={null} # Build with CUDA support docker build \ --platform linux/amd64 \ -f docker/Dockerfile \ --build-arg BACKEND=cuda \ --build-arg CUDA_VER=12.6.0 \ --build-arg UV_EXTRA=cu126 \ --target server \ -t fish-speech-server:cuda . 
``` ### Build Development Image ```bash theme={null} # Build development image with all tools docker build \ --platform linux/amd64 \ -f docker/Dockerfile \ --build-arg BACKEND=cuda \ --target dev \ -t fish-speech-dev:cuda . ``` ### Build Arguments | Argument | Options | Default | Description | | ------------ | ------------------------- | -------- | ------------------- | | `BACKEND` | `cuda`, `cpu` | `cuda` | Compute backend | | `CUDA_VER` | `12.6.0`, etc. | `12.6.0` | CUDA version | | `UV_EXTRA` | `cu126`, `cu128`, `cu129` | `cu126` | UV extra for CUDA | | `UBUNTU_VER` | `24.04`, etc. | `24.04` | Ubuntu base version | | `PY_VER` | `3.12`, etc. | `3.12` | Python version | ## Volume Mounts Both Docker run and Compose methods require these volume mounts: | Host Path | Container Path | Purpose | | --------------- | ------------------ | --------------------------------------- | | `./checkpoints` | `/app/checkpoints` | Model weights directory | | `./references` | `/app/references` | Reference audio files for voice cloning | Ensure model weights are downloaded and placed in the `./checkpoints` directory before starting containers. See [Running Inference](/developer-guide/self-hosting/running-inference#download-weights) for download instructions. 
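A missing or incomplete `./checkpoints` mount is a common reason a container exits at startup, so it can be worth checking the directory before launching. A minimal pre-flight sketch, assuming the default S1-mini layout used elsewhere in this guide:

```python theme={null}
from pathlib import Path
import tempfile

def missing_model_files(checkpoints_dir):
    """List expected weight files absent from the checkpoints directory.

    Assumes the default openaudio-s1-mini layout from this guide;
    adjust `required` if you deploy different weights.
    """
    root = Path(checkpoints_dir)
    required = [root / "openaudio-s1-mini" / "codec.pth"]
    return [str(p) for p in required if not p.is_file()]

# Demo: a correctly populated directory reports nothing missing.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "openaudio-s1-mini"
    model_dir.mkdir()
    (model_dir / "codec.pth").write_bytes(b"")
    missing = missing_model_files(tmp)
# missing == []
```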
## Environment Variables Reference ### WebUI Configuration | Variable | Default | Description | | -------------------- | --------- | ---------------------------- | | `GRADIO_SERVER_NAME` | `0.0.0.0` | WebUI server host | | `GRADIO_SERVER_PORT` | `7860` | WebUI server port | | `GRADIO_SHARE` | `false` | Enable Gradio public sharing | ### API Server Configuration | Variable | Default | Description | | ----------------- | --------- | --------------- | | `API_SERVER_NAME` | `0.0.0.0` | API server host | | `API_SERVER_PORT` | `8080` | API server port | ### Model Configuration | Variable | Default | Description | | ------------------------- | ----------------------------------------- | -------------------------- | | `LLAMA_CHECKPOINT_PATH` | `checkpoints/openaudio-s1-mini` | Path to model weights | | `DECODER_CHECKPOINT_PATH` | `checkpoints/openaudio-s1-mini/codec.pth` | Path to decoder weights | | `DECODER_CONFIG_NAME` | `modded_dac_vq` | Decoder configuration name | ### Performance Optimization | Variable | Default | Description | | --------- | ------- | -------------------------------------------------- | | `COMPILE` | `0` | Enable torch.compile for \~10x speedup (CUDA only) | ## Container Management ### View Logs ```bash theme={null} # Docker run docker logs fish-speech-webui # Docker Compose docker compose logs webui ``` ### Stop Containers ```bash theme={null} # Docker run docker stop fish-speech-webui # Docker Compose docker compose down ``` ### Update Images ```bash theme={null} # Pull latest images docker pull fishaudio/fish-speech:latest-webui-cuda # Restart containers with new image docker compose --profile webui up -d ``` ## GPU Support ### Prerequisites Install NVIDIA Container Toolkit: ```bash theme={null} # Ubuntu/Debian distribution=$(. 
/etc/os-release;echo $ID$VERSION_ID) curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \ sudo tee /etc/apt/sources.list.d/nvidia-docker.list sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit sudo systemctl restart docker ``` ### Verify GPU Access ```bash theme={null} docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi ``` GPU support requires NVIDIA Docker runtime. For CPU-only deployment, remove the `--gpus all` flag and use CPU images. ## Troubleshooting ### Container Won't Start Check logs for errors: ```bash theme={null} docker logs fish-speech-webui ``` Common issues: * Missing model weights in `./checkpoints` * Port already in use (change port mapping) * Insufficient GPU memory ### GPU Not Detected Verify NVIDIA Docker runtime is installed: ```bash theme={null} docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi ``` ### Performance Issues 1. Enable compile optimization: `COMPILE=1` 2. Ensure GPU is being used (check with `nvidia-smi`) 3. Verify sufficient GPU memory is available ## Next Steps * **[Run inference](/developer-guide/self-hosting/running-inference)** - Learn how to generate speech * **[Download models](https://huggingface.co/fishaudio)** - Get pre-trained weights * **[API documentation](/api-reference/introduction)** - Integrate with your applications # Local Model Setup Source: https://docs.fish.audio/developer-guide/self-hosting/local-setup Install and configure Fish Audio models for local inference This guide is for advanced users who want to self-host Fish Audio models. For most users, we recommend using the [Fish Audio API](https://fish.audio) for easier integration and automatic updates. 
## Prerequisites Before you begin, ensure you have: * **GPU**: 12GB VRAM minimum (for inference) * **OS**: Linux or WSL (Windows Subsystem for Linux) * **System dependencies**: Audio processing libraries Install required system packages: ```bash theme={null} apt install portaudio19-dev libsox-dev ffmpeg ``` ## Installation Methods Fish Audio supports multiple installation methods. Choose the one that best fits your development environment. ### Conda Installation Conda provides a stable, isolated Python environment: ```bash theme={null} # Create a new environment with Python 3.12 conda create -n fish-speech python=3.12 conda activate fish-speech # GPU installation (choose your CUDA version: cu126, cu128, cu129) pip install -e .[cu129] # CPU-only installation (slower, not recommended for production) pip install -e .[cpu] # Default installation (uses PyTorch default index) pip install -e . ``` For best performance, match your CUDA version with your GPU driver. Use `nvidia-smi` to check your CUDA version. ### UV Installation [UV](https://github.com/astral-sh/uv) provides faster dependency resolution and installation: ```bash theme={null} # GPU installation (choose your CUDA version: cu126, cu128, cu129) uv sync --python 3.12 --extra cu129 # CPU-only installation uv sync --python 3.12 --extra cpu ``` UV is recommended for faster setup times, especially when working with large dependency trees. ### Intel Arc XPU Support For Intel Arc GPU users, install with XPU support: ```bash theme={null} # Create environment conda create -n fish-speech python=3.12 conda activate fish-speech # Install required C++ standard library conda install libstdcxx -c conda-forge # Install PyTorch with Intel XPU support pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu # Install Fish Speech pip install -e . ``` The `--compile` optimization flag is not supported on Windows and macOS. 
To use compile acceleration, you need to install Triton manually. ## Repository Setup Clone the Fish Speech repository to get started: ```bash theme={null} git clone https://github.com/fishaudio/fish-speech.git cd fish-speech ``` Then follow one of the installation methods above. ## Next Steps Once installation is complete, you can: * **[Set up Docker deployment](/developer-guide/self-hosting/docker-deployment)** - Use containerized deployment for easier management * **[Run inference](/developer-guide/self-hosting/running-inference)** - Start generating speech with your local models * **Download models** - Get pre-trained weights from [Hugging Face](https://huggingface.co/fishaudio) ## Hardware Recommendations For optimal performance: | Use Case | Recommended GPU | VRAM | Expected Speed | | ----------- | --------------- | ----- | ----------------------- | | Development | RTX 3060 | 12GB | \~1:15 real-time factor | | Production | RTX 4090 | 24GB | \~1:7 real-time factor | | Enterprise | A100 | 40GB+ | \~1:5 real-time factor | Real-time factor indicates how much faster than real-time the model can generate audio. For example, 1:7 means generating 1 minute of audio takes \~8.5 seconds. ## Troubleshooting ### CUDA Out of Memory If you encounter CUDA out of memory errors: 1. Reduce batch size in inference settings 2. Use `--half` flag for FP16 inference 3. Close other GPU-intensive applications ### Package Installation Errors If you encounter dependency conflicts: 1. Try using UV instead of pip for better dependency resolution 2. Create a fresh conda environment 3. Ensure you're using Python 3.12 (other versions may have compatibility issues) ## Community Support Need help with local setup? 
* Join our [Discord community](https://discord.gg/dF9Db2Tt3Y) for community support * Check [GitHub Issues](https://github.com/fishaudio/fish-speech/issues) for known problems * Contact [enterprise support](mailto:support@fish.audio) for commercial deployments # Running Inference Source: https://docs.fish.audio/developer-guide/self-hosting/running-inference Generate speech using self-hosted Fish Audio models Fish Audio supports multiple inference methods: command line, HTTP API, WebUI, and GUI. Choose the method that best fits your workflow. This guide assumes you have already [installed Fish Audio locally](/developer-guide/self-hosting/local-setup) or [set up Docker deployment](/developer-guide/self-hosting/docker-deployment). ## Download Weights Before running inference, download the required model weights from Hugging Face: ```bash theme={null} # Install Hugging Face CLI (if not already installed) pip install huggingface_hub[cli] # or uv tool install huggingface_hub[cli] # Download Fish Audio S1-mini weights hf download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini ``` **Fish Audio S1-mini** is the open-source distilled version (0.5B parameters) optimized for local deployment. The full **S1** model (4B parameters) is available exclusively on [Fish Audio cloud](https://fish.audio). ## Command Line Inference Command line inference provides maximum control and is ideal for scripting and batch processing. ### Step 1: Extract VQ Tokens from Reference Audio First, encode your reference audio to get voice characteristics: ```bash theme={null} python fish_speech/models/dac/inference.py \ -i "reference_audio.wav" \ --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" ``` This generates two files: * `fake.npy` - VQ tokens representing voice characteristics * `fake.wav` - Reconstructed audio for verification **Skip this step if you want random voice generation** - the model can generate speech without reference audio. 
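If you batch-encode many reference clips, the Step 1 invocation can be assembled in a script. A sketch (the `extract_vq_cmd` helper is hypothetical), assuming it runs from the fish-speech repository root:

```python theme={null}
def extract_vq_cmd(reference_wav,
                   checkpoint="checkpoints/openaudio-s1-mini/codec.pth"):
    """Build the argv for the reference-audio encoder invocation shown above."""
    return [
        "python", "fish_speech/models/dac/inference.py",
        "-i", reference_wav,
        "--checkpoint-path", checkpoint,
    ]

cmd = extract_vq_cmd("my_voice.wav")
# Run with: subprocess.run(cmd, check=True)
```

Because each run produces files named `fake.npy` and `fake.wav`, rename the outputs between clips to avoid overwriting them.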
### Step 2: Generate Semantic Tokens from Text Convert your text to semantic tokens using the language model: ```bash theme={null} python fish_speech/models/text2semantic/inference.py \ --text "The text you want to convert to speech" \ --prompt-text "Transcription of your reference audio" \ --prompt-tokens "fake.npy" \ --compile ``` **Parameters:** * `--text`: The text to synthesize * `--prompt-text`: Transcription of the reference audio (for voice cloning) * `--prompt-tokens`: Path to VQ tokens from Step 1 (for voice cloning) * `--compile`: Enable kernel fusion for faster inference (\~10x speedup on RTX 4090) For random voice generation, omit `--prompt-text` and `--prompt-tokens` parameters. This creates a file named `codes_N.npy` (where N starts from 0) containing semantic tokens. For GPUs that don't support bf16 (bfloat16), add the `--half` flag to use fp16 instead. ### Step 3: Generate Audio from Semantic Tokens Finally, convert semantic tokens to audio: ```bash theme={null} python fish_speech/models/dac/inference.py \ -i "codes_0.npy" ``` This generates the final audio file. ### Full Example Here's a complete workflow for voice cloning: ```bash theme={null} # 1. Encode reference audio python fish_speech/models/dac/inference.py \ -i "my_voice.wav" \ --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" # 2. Generate semantic tokens python fish_speech/models/text2semantic/inference.py \ --text "Hello, this is a test of voice cloning." \ --prompt-text "This is my reference voice recording." \ --prompt-tokens "fake.npy" \ --compile # 3. Generate final audio python fish_speech/models/dac/inference.py \ -i "codes_0.npy" ``` ## HTTP API Inference The HTTP API provides a programmatic interface for integrations and production deployments. 
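The request body for `/v1/tts` can also be assembled in Python. Field names here mirror the curl example in this guide; confirm the authoritative schema against the server's interactive `/docs` page:

```python theme={null}
import base64
import os
import tempfile

def tts_payload(text, reference_audio_path=None, reference_text=None):
    """Build a JSON-serializable body for the self-hosted /v1/tts endpoint.

    Field names follow the curl example in this guide; check the
    server's /docs page for the full schema.
    """
    payload = {"text": text}
    if reference_audio_path is not None:
        with open(reference_audio_path, "rb") as f:
            payload["reference_audio"] = base64.b64encode(f.read()).decode("ascii")
    if reference_text is not None:
        payload["reference_text"] = reference_text
    return payload

# Demo with a stand-in "audio" file:
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
    f.write(b"fake-wav-bytes")
    path = f.name
payload = tts_payload("Hello, this is a test", path, "Reference transcription")
os.unlink(path)
```

POST the result with any HTTP client, e.g. `requests.post("http://localhost:8080/v1/tts", json=payload)`.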
### Start API Server ```bash theme={null} # With local installation python -m tools.api_server \ --listen 0.0.0.0:8080 \ --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \ --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \ --decoder-config-name modded_dac_vq # With UV uv run tools/api_server.py \ --listen 0.0.0.0:8080 \ --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \ --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \ --decoder-config-name modded_dac_vq ``` Add the `--compile` flag to enable torch.compile optimization for faster inference. ### Access API Documentation Once the server is running, access the interactive API documentation at: ``` http://localhost:8080/docs ``` The API provides endpoints for: * Text-to-speech synthesis * Voice cloning with reference audio * Batch processing * Model information ### Example API Request ```bash theme={null} curl -X POST "http://localhost:8080/v1/tts" \ -H "Content-Type: application/json" \ -d '{ "text": "Hello, this is a test", "reference_audio": "base64_encoded_audio", "reference_text": "Reference transcription" }' ``` ## WebUI Inference The WebUI provides an intuitive interface for interactive testing and development. ### Start WebUI ```bash theme={null} # With all parameters python -m tools.run_webui \ --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \ --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \ --decoder-config-name modded_dac_vq # Or use defaults (auto-detects models in checkpoints/) python -m tools.run_webui ``` Add the `--compile` flag for faster inference during interactive sessions. ### Access WebUI The WebUI starts on port 7860 by default. 
Access it at: ``` http://localhost:7860 ``` ### Configure with Environment Variables Customize the WebUI using Gradio environment variables: ```bash theme={null} # Enable public sharing GRADIO_SHARE=1 python -m tools.run_webui # Change server port GRADIO_SERVER_PORT=8080 python -m tools.run_webui # Change server name GRADIO_SERVER_NAME=0.0.0.0 python -m tools.run_webui ``` ### Using Reference Audio Library For faster workflow, pre-save reference audio: 1. Create a `references/` directory in the project root 2. Create subdirectories named by voice ID: `references/<voice_id>/` 3. Place files in each subdirectory: * `sample.wav` - Reference audio file * `sample.lab` - Text transcription of the audio Example structure: ``` references/ ├── alice/ │ ├── sample.wav │ └── sample.lab └── bob/ ├── sample.wav └── sample.lab ``` These references will appear as selectable options in the WebUI. ## GUI Inference For users who prefer a native desktop application, a PyQt6-based GUI is available. ### Download GUI Client Download the latest release from the [Fish Speech GUI repository](https://github.com/AnyaCoder/fish-speech-gui/releases). **Supported platforms:** * Linux * Windows * macOS ### Connect to API Server The GUI client connects to a running API server (see [HTTP API Inference](#http-api-inference) above). 1. Start the API server 2. Launch the GUI client 3.
Configure the API endpoint (default: `http://localhost:8080`) ## Docker Inference If you're using Docker deployment, refer to the [Docker Deployment guide](/developer-guide/self-hosting/docker-deployment) for detailed instructions on: * Running pre-built WebUI containers * Running pre-built API server containers * Customizing container configuration * Volume mounts for models and references Quick example: ```bash theme={null} # Start WebUI with Docker docker run -d \ --name fish-speech-webui \ --gpus all \ -p 7860:7860 \ -v ./checkpoints:/app/checkpoints \ -v ./references:/app/references \ -e COMPILE=1 \ fishaudio/fish-speech:latest-webui-cuda ``` ## Performance Optimization ### Enable Compilation Torch compilation provides \~10x speedup on compatible GPUs: ```bash theme={null} # Add --compile flag to any inference command python -m tools.api_server --compile ... ``` Compilation requires: * CUDA-compatible GPU * Triton library (not supported on Windows/macOS) * First run will be slow due to compilation overhead ### Use Mixed Precision For GPUs without bf16 support, use fp16: ```bash theme={null} python fish_speech/models/text2semantic/inference.py --half ... 
```

### Batch Processing

For multiple audio generations, use batch processing to amortize the model loading overhead:

```python
# Example batch processing script (illustrative API)
import fish_speech

model = fish_speech.load_model("checkpoints/openaudio-s1-mini")

texts = ["First sentence", "Second sentence", "Third sentence"]
for i, text in enumerate(texts):
    audio = model.synthesize(text)
    audio.save(f"output_{i}.wav")
```

## Emotion Control

Fish Audio S1 supports emotional markers for expressive speech synthesis:

### Basic Emotions

```
(angry) (sad) (excited) (surprised) (satisfied) (delighted)
(scared) (worried) (upset) (nervous) (frustrated) (depressed)
(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
(grateful) (confident) (interested) (curious) (confused) (joyful)
```

### Advanced Emotions

```
(disdainful) (unhappy) (anxious) (hysterical) (indifferent) (impatient)
(guilty) (scornful) (panicked) (furious) (reluctant) (keen)
(disapproving) (negative) (denying) (astonished) (serious) (sarcastic)
(conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding)
(painful) (awkward) (amused)
```

### Tone Markers

```
(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
```

### Special Effects

```
(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
(groaning) (crowd laughing) (background laughter) (audience laughing)
```

### Example Usage

```bash
python fish_speech/models/text2semantic/inference.py \
  --text "(excited)This is amazing! (laughing)Ha ha ha!" \
  --compile
```

Emotion control is currently supported for English, Chinese, and Japanese, with more languages coming soon. For more details, see the [Emotion Reference](/api-reference/emotion-reference).

## Troubleshooting

### Out of Memory Errors

If you encounter CUDA out-of-memory errors:

1. Reduce the input text length
2. Use the `--half` flag for fp16 inference
3. Close other GPU applications
4. Use a smaller batch size

### Slow Inference

To improve speed:

1.
Enable the `--compile` flag
2. Verify the GPU is being used (check with `nvidia-smi`)
3. Ensure your CUDA version matches your PyTorch installation
4. Use fp16 instead of bf16 on older GPUs

### Poor Audio Quality

For better quality:

1. Use high-quality reference audio (clear, with no background noise)
2. Ensure the reference text accurately matches the reference audio
3. Use 10-30 seconds of reference audio
4. See [Voice Cloning Best Practices](/developer-guide/best-practices/voice-cloning)

### Model Loading Errors

If models fail to load:

1. Verify the model weights downloaded completely
2. Check that the checkpoint paths are correct
3. Ensure there is sufficient disk space
4. Re-download the weights if they are corrupted

## Next Steps

* **[Emotion Control Best Practices](/developer-guide/best-practices/emotion-control)** - Master expressive speech
* **[Voice Cloning Best Practices](/developer-guide/best-practices/voice-cloning)** - Optimize voice cloning quality
* **[API Reference](/api-reference/introduction)** - Integrate with your applications
* **[Cloud API](https://fish.audio)** - Compare with managed service performance

# Tutorials & Examples

Source: https://docs.fish.audio/developer-guide/tutorials/tutorials

Step-by-step guides and code examples for Fish Audio features

Coming soon! We're preparing comprehensive tutorials and examples to help you get the most out of Fish Audio.
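Until those tutorials are ready, here is a minimal sketch of a "first TTS application" that talks to a self-hosted API server like the one described in this guide. The endpoint path (`/v1/tts`), the payload field names, and the helper names (`build_payload`, `synthesize`) are assumptions for illustration — verify them against the API of your fish-speech version:

```python
import json
import urllib.request
from typing import Optional

# Assumed endpoint for a locally running fish-speech API server
# (see "HTTP API Inference" above); adjust host and port to your setup.
API_URL = "http://localhost:8080/v1/tts"


def build_payload(text: str, emotion: Optional[str] = None, fmt: str = "wav") -> dict:
    """Build a TTS request body, optionally prefixing an (emotion) marker."""
    if emotion:
        text = f"({emotion}){text}"
    return {"text": text, "format": fmt}


def synthesize(text: str, emotion: Optional[str] = None, out_path: str = "output.wav") -> None:
    """POST the request and save the returned audio bytes to out_path."""
    body = json.dumps(build_payload(text, emotion)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())


# Usage (requires a running API server):
#   synthesize("This is amazing!", emotion="excited")
```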
We're working on tutorials for:

* Building your first TTS application
* Creating custom voice models
* Implementing real-time streaming
* Building interactive voice applications
* Advanced emotion and prosody control
* Multi-speaker conversations

In the meantime, check out:

* [Quickstart Guide](/developer-guide/getting-started/quickstart) for getting started
* [Python SDK Examples](/developer-guide/sdk-guide/python/text-to-speech) for code samples
* [JavaScript SDK Examples](/developer-guide/sdk-guide/javascript/text-to-speech) for code samples
* [Guide and Best Practices](/developer-guide/core-features/text-to-speech) for optimization tips

Join our [Discord](https://discord.gg/dF9Db2Tt3Y) for updates and community examples.
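As a small taste of programmatic emotion control ahead of the dedicated tutorial, the markers listed in the Emotion Control section above are plain parenthesized prefixes, so they compose easily in code. The helper below is purely illustrative — `tag` and the marker subsets are not part of any Fish Audio SDK; the marker names are excerpted from this guide:

```python
# A few marker names excerpted from the Emotion Control section above.
EMOTIONS = {"angry", "sad", "excited", "surprised", "curious", "confident"}
TONES = {"shouting", "whispering", "soft tone", "in a hurry tone"}


def tag(text: str, *markers: str) -> str:
    """Prefix text with validated (marker) tags, e.g. '(excited)Hello!'."""
    for m in markers:
        if m not in EMOTIONS and m not in TONES:
            raise ValueError(f"unknown marker: {m!r}")
    return "".join(f"({m})" for m in markers) + text


print(tag("This is amazing!", "excited"))  # (excited)This is amazing!
```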