
# Models Overview

> Explore Fish Audio's voice generation models and their capabilities


## Available Models

Fish Audio offers state-of-the-art text-to-speech models optimized for different use cases and performance requirements.

### Recommended Model

<Card title="s2-pro" icon="star">
  **Fish Audio S2-Pro** - Our next-generation TTS model with best-in-class performance

  * Natural language control with `[bracket]` syntax — not limited to a fixed set (e.g., `[whispers sweetly]`, `[laughing nervously]`)
  * Multi-speaker dialogue support **(S2-Pro exclusive)**
  * 80+ languages
  * 100ms time-to-first-audio
  * Full SGLang-based serving stack
  * Open-source
</Card>

<Note>
  We recommend using `s2-pro` for all new projects to access the latest capabilities and performance improvements. S1 remains available for existing integrations.
</Note>
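
A minimal request sketch for `s2-pro` is shown below using only the standard library. The endpoint URL, the `model` header, and the payload field names are assumptions for illustration; consult the [API Reference](/api-reference/introduction) for the authoritative schema.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # hypothetical placeholder, not a real key

def build_tts_request(text: str, model: str = "s2-pro") -> urllib.request.Request:
    """Build a TTS request object; endpoint and field names are assumptions."""
    payload = {
        "text": text,      # with s2-pro, may include [bracket] cues inline
        "format": "mp3",   # assumed output-format field
    }
    return urllib.request.Request(
        "https://api.fish.audio/v1/tts",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "model": model,  # assumed model-selection header
        },
        method="POST",
    )

req = build_tts_request("[whispers sweetly] Welcome to Fish Audio.")
print(req.full_url)  # https://api.fish.audio/v1/tts
```

The request is only constructed here, not sent; sending it (e.g. via `urllib.request.urlopen`) would return the generated audio bytes on success.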

### Previous Model

<Card title="s1" icon="microchip">
  **Fish Audio S1** - High-quality voice generation

  * 4 billion parameters
  * 0.008 WER (0.8% word error rate)
  * Full emotional control capabilities with `(parenthesis)` syntax
</Card>

## Model Specifications

### Fish Audio S1 Performance Metrics

* **Word Error Rate (WER)**: 0.008 (0.8%)
* **Character Error Rate (CER)**: 0.004 (0.4%)
* **Real-time Factor**: \~1:7 on standard hardware
* **TTS-Arena2 Ranking**: #1 worldwide
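
A real-time factor of \~1:7 means the model synthesizes audio roughly seven times faster than it plays back. As a quick sanity check (the timing numbers below are illustrative, not benchmark results):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of audio produced.
    Values below 1.0 mean faster than real time."""
    return processing_seconds / audio_seconds

# Illustrative numbers: 60 s of speech generated in ~8.6 s of compute
rtf = real_time_factor(8.57, 60.0)
print(round(rtf, 3))  # 0.143, i.e. roughly 1:7
```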

## Supported Languages

### S2-Pro

S2-Pro supports 80+ languages with automatic language detection and inline emotion and paralinguistic cue support.

<Info>
  Language detection is automatic: simply provide text in your target language.
</Info>

### S1

S1 supports text-to-speech generation in 13 languages with full emotional expression capabilities.

```
English, Chinese, Japanese, German,
French, Spanish, Korean, Arabic,
Russian, Dutch, Italian, Polish, Portuguese
```

## Voice Styles and Emotions

Fish Audio models support emotional expressions and voice styles that can be controlled through text markers in your input.

### S2-Pro Natural Language Control

S2-Pro treats `[bracket]` tags as standard text rather than dedicated control tokens. Through training on massive datasets, the model learned implicit mappings between natural language descriptions and acoustic variations. This means you are not limited to a predefined set of tags — you can use any descriptive expression and the model will interpret it, such as `[whispers sweetly]` or `[laughing nervously]`.

Common examples include:

```
[whisper] [laugh] [emphasis] [sigh] [gasp] [pause]
[angry] [excited] [sad] [surprised] [inhale] [exhale]
```

<Tip>
  S2-Pro cues can be placed anywhere in your text to control emotion at specific positions. For example: `"I can't believe it [gasp] you actually did it [laugh]"`
</Tip>
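
Because S2-Pro cues are ordinary text, composing them programmatically is plain string building. The helper below is a hypothetical convenience for illustration, not part of any SDK:

```python
def with_cue(cue: str, text: str) -> str:
    """Prefix a text fragment with an S2-Pro natural-language cue."""
    return f"[{cue}] {text}"

# Place cues at the positions where the emotion should occur
line = " ".join([
    with_cue("gasp", "I can't believe it"),
    with_cue("laugh", "you actually did it!"),
])
print(line)  # [gasp] I can't believe it [laugh] you actually did it!
```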

### S1 Voice Styles and Emotions

S1 supports 64+ emotional expressions using `(parenthesis)` syntax.

### Basic Emotions (24 expressions)

```
(angry) (sad) (excited) (surprised) (satisfied) (delighted)
(scared) (worried) (upset) (nervous) (frustrated) (depressed)
(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
(grateful) (confident) (interested) (curious) (confused) (joyful)
```

### Advanced Emotions (27 expressions)

```
(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
(keen) (disapproving) (negative) (denying) (astonished) (serious)
(sarcastic) (conciliative) (comforting) (sincere) (sneering)
(hesitating) (yielding) (painful) (awkward) (amused)
```

### Tone Markers (5 expressions)

```
(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
```

### Audio Effects (10 expressions)

```
(laughing) (chuckling) (sobbing) (crying loudly) (sighing)
(panting) (groaning) (crowd laughing) (background laughter) (audience laughing)
```

<Tip>
  You can also use natural expressions like "Ha,ha,ha" for laughter. Experiment with combinations to achieve the perfect emotional tone for your application.
</Tip>
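
The same string-building approach works for S1's `(parenthesis)` markers. The sketch below validates markers against the documented sets before prepending them; the helper and the (abbreviated) sets are illustrative, not an SDK API:

```python
# Abbreviated subsets of the documented S1 markers, for illustration
S1_EMOTIONS = {"angry", "sad", "excited", "surprised", "joyful"}
S1_TONES = {"whispering", "shouting", "soft tone"}

def s1_markup(text, emotion=None, tone=None):
    """Prepend S1 emotion/tone markers, rejecting unknown ones."""
    parts = []
    if emotion:
        if emotion not in S1_EMOTIONS:
            raise ValueError(f"unknown emotion: {emotion}")
        parts.append(f"({emotion})")
    if tone:
        if tone not in S1_TONES:
            raise ValueError(f"unknown tone: {tone}")
        parts.append(f"({tone})")
    parts.append(text)
    return " ".join(parts)

print(s1_markup("Don't wake the baby.", emotion="sad", tone="whispering"))
# (sad) (whispering) Don't wake the baby.
```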

## Support

Need help? Check out these resources:

* [API Reference](/api-reference/introduction) - Complete API documentation
* [Create a Voice Clone](/api-reference/endpoint/model/create-model) - Create a voice clone model
* [Generate Speech](/api-reference/endpoint/openapi-v1/text-to-speech) - Generate realistic speech
* [Real-time Streaming](/developer-guide/sdk-guide/python/websocket) - WebSocket for real-time streaming
* [Discord Community](https://discord.com/invite/dF9Db2Tt3Y) - Get help from the community
* [Support Email](mailto:support@fish.audio) - Contact our support team
