fishaudio.types.voices
Voice and model management types.
Sample Objects
title - Title/name of the audio sample
text - Transcription of the spoken content in the sample
task_id - Unique identifier for the sample task
audio - URL or path to the audio file
Author Objects
id - Unique author identifier
nickname - Author's display name
avatar - URL to the author's avatar image
Voice Objects
id - Unique voice model identifier (use as reference_id in TTS)
type - Model type. Options: "svc" (singing voice conversion), "tts" (text-to-speech)
title - Voice model title/name
description - Detailed description of the voice model
cover_image - URL to the voice model's cover image
train_mode - Training mode used. Options: "fast"
state - Current model state (e.g., "ready", "training", "failed")
tags - List of tags for categorization (e.g., ["male", "english", "young"])
samples - List of audio samples demonstrating the voice
created_at - Timestamp when the model was created
updated_at - Timestamp when the model was last updated
languages - List of supported language codes (e.g., ["en", "zh"])
visibility - Model visibility. Options: "public", "private", "unlisted"
lock_visibility - Whether the visibility setting is locked
like_count - Number of likes the model has received
mark_count - Number of bookmarks/favorites
shared_count - Number of times the model has been shared
task_count - Number of times the model has been used for generation
liked - Whether the current user has liked this model. Default: False
marked - Whether the current user has bookmarked this model. Default: False
author - Information about the voice model's creator
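Since a voice's id is what goes into reference_id for TTS, a common step is picking a usable voice from a list of records. The sketch below uses plain dicts shaped like a subset of the Voice fields (hypothetical data and a hypothetical helper pick_tts_voice, not SDK objects or calls) and keeps only voices whose type is "tts", whose state is "ready", and whose languages include the requested code.

```python
from typing import Any, Optional

# Hypothetical voice records using a subset of the Voice fields above.
# These are plain dicts for illustration, not SDK objects; ids are made up.
voices: list[dict[str, Any]] = [
    {"id": "voice-a", "type": "tts", "state": "training", "languages": ["en"]},
    {"id": "voice-b", "type": "svc", "state": "ready", "languages": ["en"]},
    {"id": "voice-c", "type": "tts", "state": "ready", "languages": ["en", "zh"]},
]


def pick_tts_voice(records: list[dict[str, Any]], language: str) -> Optional[str]:
    """Return the id of the first ready TTS voice that supports `language` (hypothetical helper)."""
    for voice in records:
        if (
            voice["type"] == "tts"
            and voice["state"] == "ready"
            and language in voice["languages"]
        ):
            return voice["id"]  # this value would be passed as reference_id
    return None


reference_id = pick_tts_voice(voices, "en")
print(reference_id)  # voice-c
```

With the SDK, the records would come from a voice listing rather than a literal list; the filtering logic is the part this sketch is meant to show.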
fishaudio.types.tts
TTS-related types.
ReferenceAudio Objects
audio - Audio file bytes for the reference sample
text - Transcription of what is spoken in the reference audio. It should match the spoken content exactly and include punctuation for proper prosody.
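A reference sample is a pair of raw audio bytes and their exact transcript. As a minimal sketch, assuming the clip lives in a local file, the hypothetical helper load_reference below reads the bytes and pairs them with the transcript in a TypedDict stand-in (ReferenceSketch, not the SDK's ReferenceAudio class), rejecting an empty transcript.

```python
import tempfile
from pathlib import Path
from typing import TypedDict


class ReferenceSketch(TypedDict):
    """Stand-in for ReferenceAudio: raw audio bytes plus their exact transcript."""
    audio: bytes
    text: str


def load_reference(path: Path, transcript: str) -> ReferenceSketch:
    """Read a reference clip from disk and pair it with its transcript (hypothetical helper)."""
    if not transcript.strip():
        raise ValueError("transcript must match what is spoken and cannot be empty")
    return {"audio": path.read_bytes(), "text": transcript}


with tempfile.TemporaryDirectory() as tmp:
    clip = Path(tmp) / "sample.wav"
    clip.write_bytes(b"RIFF....")  # placeholder bytes, not a real WAV file
    ref = load_reference(clip, "Hello there, how are you?")

print(len(ref["audio"]), ref["text"])
```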
Prosody Objects
speed - Speech speed multiplier. Range: 0.5-2.0. Default: 1.0. Examples: 1.5 = 50% faster, 0.8 = 20% slower
volume - Volume adjustment in decibels. Range: -20.0 to 20.0. Default: 0.0 (no change). Positive values increase volume; negative values decrease it.
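The arithmetic behind the two Prosody fields can be sketched as follows. The speed conversion follows directly from the examples above (1.5 means 50% faster). The volume conversion assumes the standard amplitude decibel convention, gain = 10^(dB/20); the documentation does not say exactly how the service applies the adjustment. Both helper names are hypothetical.

```python
import math


def speed_change_percent(speed: float) -> float:
    """Percentage change in speaking rate for a speed multiplier (hypothetical helper)."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be in the documented range 0.5-2.0")
    return round((speed - 1.0) * 100, 2)


def volume_gain_factor(volume_db: float) -> float:
    """Amplitude ratio for a dB adjustment, assuming the 20*log10 amplitude convention."""
    if not -20.0 <= volume_db <= 20.0:
        raise ValueError("volume must be in the documented range -20.0 to 20.0 dB")
    return 10 ** (volume_db / 20)


print(speed_change_percent(1.5))  # 50.0
print(speed_change_percent(0.8))  # -20.0
print(round(volume_gain_factor(6.0), 3))  # 1.995
```

Under this convention, the -20 dB floor scales amplitude to one tenth and the +20 dB ceiling scales it tenfold.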
from_speed_override
speed - Speed value to use
base - Base prosody to preserve volume from (if any)
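The behaviour described for from_speed_override (use the given speed, keep the volume from base when one is supplied) can be illustrated with a stand-in class. ProsodySketch is a hypothetical frozen dataclass, not the SDK's Prosody, and falling back to the documented default volume of 0.0 when base is None is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProsodySketch:
    """Stand-in for Prosody with its documented defaults; not the SDK class."""
    speed: float = 1.0
    volume: float = 0.0

    @classmethod
    def from_speed_override(
        cls, speed: float, base: Optional["ProsodySketch"] = None
    ) -> "ProsodySketch":
        # Keep the base volume if a base is given; otherwise assume the 0.0 dB default.
        volume = base.volume if base is not None else 0.0
        return cls(speed=speed, volume=volume)


base = ProsodySketch(speed=1.0, volume=-3.0)
print(ProsodySketch.from_speed_override(1.25, base))  # ProsodySketch(speed=1.25, volume=-3.0)
print(ProsodySketch.from_speed_override(0.9))  # ProsodySketch(speed=0.9, volume=0.0)
```

Returning a new frozen instance instead of mutating base keeps the original prosody reusable across requests.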
TTSConfig Objects
format - Audio output format. Options: "mp3", "wav", "pcm", "opus". Default: "mp3"
sample_rate - Audio sample rate in Hz. If None, uses the format-specific default.
mp3_bitrate - MP3 bitrate in kbps. Options: 64, 128, 192. Default: 128
opus_bitrate - Opus bitrate in kbps. Options: -1000, 24, 32, 48, 64. Default: 32
normalize - Whether to normalize/clean the input text. Default: True
chunk_length - Characters per generation chunk. Range: 100-300. Default: 200. Lower values give a faster initial response; higher values give better quality.
latency - Generation mode. Options: "normal" (higher quality), "balanced" (faster). Default: "balanced"
reference_id - Voice model ID from fish.audio (e.g., "802e3bc2b27e49c2995d23ef70e6ac89"). Find IDs in voice URLs or via voices.list()
references - List of reference audio samples for instant voice cloning. Default: []
prosody - Speech speed and volume settings. Default: None (uses natural prosody)
top_p - Nucleus sampling parameter for token selection. Range: 0.0-1.0. Default: 0.7
temperature - Randomness in generation. Range: 0.0-1.0. Default: 0.7. Higher values are more varied; lower values are more consistent.
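The documented defaults and allowed values can be checked client-side before a request is sent. The sketch below merges overrides into a dict of the TTSConfig defaults listed above and validates them. The dict layout and the helper check_tts_options are hypothetical illustrations, not the SDK's own validation or serialization.

```python
from typing import Any

# Defaults as documented for TTSConfig; this dict layout is a hypothetical illustration.
TTS_DEFAULTS: dict[str, Any] = {
    "format": "mp3", "sample_rate": None, "mp3_bitrate": 128, "opus_bitrate": 32,
    "normalize": True, "chunk_length": 200, "latency": "balanced",
    "reference_id": None, "references": [], "prosody": None,
    "top_p": 0.7, "temperature": 0.7,
}


def check_tts_options(**overrides: Any) -> dict[str, Any]:
    """Merge overrides into the defaults and check them against the documented options."""
    unknown = set(overrides) - set(TTS_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown options: {sorted(unknown)}")
    cfg = {**TTS_DEFAULTS, **overrides}
    cfg["references"] = list(cfg["references"])  # copy so the shared default list is never mutated
    if cfg["format"] not in {"mp3", "wav", "pcm", "opus"}:
        raise ValueError("format must be mp3, wav, pcm or opus")
    if cfg["mp3_bitrate"] not in {64, 128, 192}:
        raise ValueError("mp3_bitrate must be 64, 128 or 192")
    if cfg["opus_bitrate"] not in {-1000, 24, 32, 48, 64}:
        raise ValueError("opus_bitrate must be -1000, 24, 32, 48 or 64")
    if not 100 <= cfg["chunk_length"] <= 300:
        raise ValueError("chunk_length must be in 100-300")
    if cfg["latency"] not in {"normal", "balanced"}:
        raise ValueError("latency must be normal or balanced")
    for key in ("top_p", "temperature"):
        if not 0.0 <= cfg[key] <= 1.0:
            raise ValueError(f"{key} must be in 0.0-1.0")
    return cfg


cfg = check_tts_options(format="opus", latency="normal", temperature=0.5)
print(cfg["format"], cfg["opus_bitrate"], cfg["temperature"])  # opus 32 0.5
```

TTSRequest, documented next, lists the same option names with the same defaults, so the same checks would apply to its fields.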
TTSRequest Objects
text - Text to synthesize into speech
chunk_length - Characters per generation chunk. Range: 100-300. Default: 200
format - Audio output format. Options: "mp3", "wav", "pcm", "opus". Default: "mp3"
sample_rate - Audio sample rate in Hz. If None, uses the format-specific default
mp3_bitrate - MP3 bitrate in kbps. Options: 64, 128, 192. Default: 128
opus_bitrate - Opus bitrate in kbps. Options: -1000, 24, 32, 48, 64. Default: 32
references - List of reference audio samples for voice cloning. Default: []
reference_id - Voice model ID for using a specific voice. Default: None
normalize - Whether to normalize/clean the input text. Default: True
latency - Generation mode. Options: "normal", "balanced". Default: "balanced"
prosody - Speech speed and volume settings. Default: None
top_p - Nucleus sampling for token selection. Range: 0.0-1.0. Default: 0.7
temperature - Randomness in generation. Range: 0.0-1.0. Default: 0.7
StartEvent Objects
event - Event type identifier, always "start"
request - TTS configuration for the streaming session
TextEvent Objects
event - Event type identifier, always "text"
text - Text chunk to synthesize
FlushEvent Objects
event - Event type identifier, always "flush"
CloseEvent Objects
event - Event type identifier, always "stop"
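Taken together, the four event types describe a streaming session. The sketch below builds the event payloads as plain dicts in an assumed order: one start event carrying the config, one text event per chunk, a flush, then the close event. The ordering, the role of flush, and the dict shape (rather than the actual wire encoding) are assumptions, and build_event_sequence is a hypothetical helper.

```python
from typing import Any, Iterable


def build_event_sequence(config: dict[str, Any], chunks: Iterable[str]) -> list[dict[str, Any]]:
    """Assemble streaming events in an assumed order (hypothetical helper)."""
    events: list[dict[str, Any]] = [{"event": "start", "request": config}]
    events.extend({"event": "text", "text": chunk} for chunk in chunks)
    events.append({"event": "flush"})  # assumed: asks for any buffered text to be synthesized
    events.append({"event": "stop"})  # CloseEvent uses the identifier "stop"
    return events


events = build_event_sequence({"format": "mp3"}, ["Hello, ", "world."])
print([e["event"] for e in events])  # ['start', 'text', 'text', 'flush', 'stop']
```

Note that CloseEvent is identified by "stop", not "close", so code that dispatches on the event field should match that string.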
fishaudio.types.account
Account-related types (credits, packages, etc.).
Credits Objects
id - Unique credits record identifier
user_id - User identifier
credit - Current credit balance (decimal for precise accounting)
created_at - Timestamp when the credits record was created
updated_at - Timestamp when the credits were last updated
has_phone_sha256 - Whether the user has a verified phone number. Optional
has_free_credit - Whether the user has received free credits. Optional
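The credit balance is a decimal rather than a float so that repeated additions and subtractions stay exact. The sketch below uses Python's standard decimal module with hypothetical balance and cost values to show the difference. Constructing each Decimal from a string avoids importing float rounding error.

```python
from decimal import Decimal

# Hypothetical balance and per-request costs, parsed from strings so they stay exact.
balance = Decimal("10.00")
costs = [Decimal("0.10"), Decimal("0.20")]

spent = sum(costs, Decimal("0"))
remaining = balance - spent
print(remaining)  # 9.70

# The same values as binary floats pick up rounding error.
print(0.1 + 0.2 == 0.3)  # False
print(spent == Decimal("0.3"))  # True
```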
Package Objects
id - Unique package identifier
user_id - User identifier
type - Package type identifier
total - Total units in the package
balance - Remaining units in the package
created_at - Timestamp when the package was purchased
updated_at - Timestamp when the package was last updated
finished_at - Timestamp when the package was fully consumed. None if still active
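From these fields, usage and status follow directly: used units are total minus balance, and a package is still active while finished_at is None. PackageSketch is a hypothetical stand-in holding only those three fields, and the integer unit type is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PackageSketch:
    """Stand-in with a subset of the Package fields above; not the SDK class."""
    total: int
    balance: int
    finished_at: Optional[datetime] = None

    @property
    def used(self) -> int:
        # Units consumed so far.
        return self.total - self.balance

    @property
    def is_active(self) -> bool:
        # finished_at stays None until the package is fully consumed.
        return self.finished_at is None


pkg = PackageSketch(total=1000, balance=250)
print(pkg.used, pkg.is_active)  # 750 True
```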
fishaudio.types.asr
ASR (Automatic Speech Recognition) related types.
ASRSegment Objects
text - The transcribed text for this segment
start - Segment start time in seconds
end - Segment end time in seconds
ASRResponse Objects
text - Complete transcription of the entire audio
duration - Total audio duration in milliseconds
segments - List of timestamped text segments. Empty if include_timestamps=False
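Segment times are in seconds while the overall duration is in milliseconds, so comparing them requires a unit conversion. The sketch below uses a hypothetical SegmentSketch stand-in and made-up timings to convert segment bounds to whole milliseconds and check that none ends after the reported duration.

```python
from dataclasses import dataclass


@dataclass
class SegmentSketch:
    """Stand-in for ASRSegment; start and end are in seconds."""
    text: str
    start: float
    end: float


def segments_to_ms(segments: list[SegmentSketch]) -> list[tuple[str, int, int]]:
    """Convert segment bounds from seconds to whole milliseconds (hypothetical helper)."""
    return [(s.text, round(s.start * 1000), round(s.end * 1000)) for s in segments]


duration_ms = 2400.0  # hypothetical ASRResponse duration, in milliseconds
segments = [SegmentSketch("Hello", 0.0, 0.8), SegmentSketch("world", 0.9, 2.35)]

converted = segments_to_ms(segments)
print(converted)  # [('Hello', 0, 800), ('world', 900, 2350)]
# Sanity check: no segment should end after the reported duration.
print(all(end <= duration_ms for _, _, end in converted))  # True
```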
duration
Duration in milliseconds
fishaudio.types.shared
Shared types used across the SDK.
PaginatedResponse Objects
total - Total number of items across all pages
items - List of items on the current page
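Because total counts items across all pages while items holds only the current page, a client can keep requesting pages until it has collected total items. In the sketch below, the page-fetching callable, 1-based page numbers, and the page size are hypothetical; only the total and items fields come from PaginatedResponse.

```python
import math
from typing import Any, Callable


def fetch_all(fetch_page: Callable[[int], dict[str, Any]]) -> list[Any]:
    """Collect items from every page until `total` items are gathered (hypothetical helper)."""
    items: list[Any] = []
    page = 1  # assumed 1-based page numbering
    while True:
        resp = fetch_page(page)
        items.extend(resp["items"])
        # Stop on an empty page too, so a shrinking result set cannot loop forever.
        if not resp["items"] or len(items) >= resp["total"]:
            return items
        page += 1


DATA = list(range(5))
PAGE_SIZE = 2  # hypothetical page size


def fake_fetch(page: int) -> dict[str, Any]:
    """Fake pager returning PaginatedResponse-shaped dicts over DATA."""
    start = (page - 1) * PAGE_SIZE
    return {"total": len(DATA), "items": DATA[start:start + PAGE_SIZE]}


print(fetch_all(fake_fetch))  # [0, 1, 2, 3, 4]
print(math.ceil(len(DATA) / PAGE_SIZE))  # 3
```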

