Core features
Text to Speech
Convert text into lifelike speech with the s2.1-pro, s2-pro, and s1 models.
Speech to Text
Transcribe audio to text with per-segment timestamps.
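Per-segment timestamps are what make captioning possible. The sketch below turns transcript segments into SRT captions. It assumes each segment is a dict with hypothetical `text`, `start`, and `end` keys (seconds as floats). Those key names are illustrative only, so check the Speech to Text API reference for the actual response shape.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Build an SRT document from transcript segments.

    Assumption: each segment has hypothetical 'text', 'start', and 'end'
    keys (seconds). Adapt the key names to the real transcription response.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT separates caption blocks with a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"text": "Hello there.", "start": 0.0, "end": 1.2},
        {"text": "Welcome to the demo.", "start": 1.2, "end": 3.05},
    ]
    print(segments_to_srt(demo))
```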
Voice Cloning
Clone a voice instantly from a clip, or train a persistent model.
Realtime Streaming
Stream audio as it generates — for voice agents and live apps.
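Streaming lets an app start playback or forwarding on the first chunk instead of waiting for the whole file. The sketch below shows that consumption pattern. `chunks` stands in for whatever iterable of audio bytes the SDK or WebSocket connection yields. Here it is faked by a hypothetical `fake_tts_stream` generator, and `sink` can be any writable binary stream, such as a file or an audio player's input.

```python
import io
import time
from typing import BinaryIO, Iterable, Optional, Tuple


def consume_stream(chunks: Iterable[bytes], sink: BinaryIO) -> Tuple[int, Optional[float]]:
    """Write each audio chunk to `sink` as soon as it arrives.

    Returns (total bytes written, seconds until the first non-empty chunk),
    with the second value None if the stream produced no data.
    """
    start = time.monotonic()
    first_chunk_after = None
    total = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty frames rather than writing nothing
        if first_chunk_after is None:
            first_chunk_after = time.monotonic() - start
        sink.write(chunk)  # hand off immediately instead of buffering the whole file
        total += len(chunk)
    return total, first_chunk_after


def fake_tts_stream():
    """Hypothetical stand-in for a streaming TTS response."""
    for part in (b"RIFF", b"....", b"data"):
        time.sleep(0.01)  # simulate generation latency between chunks
        yield part


if __name__ == "__main__":
    buf = io.BytesIO()
    total, first = consume_stream(fake_tts_stream(), buf)
    print(total, f"first chunk after {first:.3f}s")
```

Because the generator only runs as it is iterated, the measured delay covers the (simulated) generation time before the first chunk.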
Manage Voices
List, inspect, update, and delete your voice models.
Also in the web app
These run in the browser with no code required; see the Platform guide.
Voice Changer
Transform existing audio into a different voice.
Story Studio
Produce multi-speaker, long-form audio — audiobooks and narration.
Music & Sound Effects
Generate music and cinematic sound effects from a prompt.
Audio Separation
Split audio into stems, plus related audio-processing utilities.
Models
These text-to-speech models power most capabilities:
s2.1-pro: the recommended production model, with improved quality, latency, and throughput over s2-pro.
s2.1-pro-free: the same model at $0, for testing, prototyping, development, and smaller businesses, without TTFA or DPA guarantees.
s2-pro: the previous-generation S2 model, with multi-speaker support and natural-language expression control.
s1: the generation before S2, with emotion tags written in parentheses.
Pick your path
Use the web app
No code — generate audio, clone voices, and produce projects in your browser.
Build with the SDK
The Python library for your application.
Call the API
Raw REST and WebSocket endpoints for any language.
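For a sense of what a raw REST call involves, the sketch below only builds (and does not send) a text-to-speech request with the Python standard library. The endpoint URL, the `model` header, and the JSON fields `text` and `format` are assumptions to verify against the API reference. The model identifiers are the ones listed under Models, and `build_tts_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Model identifiers from the Models list.
KNOWN_MODELS = {"s2.1-pro", "s2.1-pro-free", "s2-pro", "s1"}

# Assumption: verify the endpoint path against the API reference.
TTS_URL = "https://api.fish.audio/v1/tts"


def build_tts_request(api_key: str, text: str, model: str = "s2.1-pro",
                      audio_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST request for text-to-speech.

    Hypothetical helper: the body fields and the 'model' header are
    assumptions, not a confirmed description of the API.
    """
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    body = json.dumps({"text": text, "format": audio_format}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "model": model,  # assumed: model selected via a request header
        },
    )


if __name__ == "__main__":
    req = build_tts_request("YOUR_API_KEY", "Hello from the docs.")
    print(req.get_method(), req.full_url)
    # Sending it would be urllib.request.urlopen(req), whose response body holds the audio bytes.
```

Keeping request construction separate from sending makes the request easy to inspect and test without an API key or network access.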
Use your AI coding agent
Install the Fish Audio skill so your agent writes correct code.

