application/json and application/msgpack.

For best results, upload reference audio using the create model endpoint before using this one. This improves speech quality and reduces latency.

To upload audio clips directly, without pre-uploading, serialize the request body with MessagePack as per the instructions.

- WAV / PCM
  - Sample Rate: 8kHz, 16kHz, 24kHz, 32kHz, 44.1kHz
  - Default Sample Rate: 44.1kHz
  - 16-bit, mono
- MP3
  - Sample Rate: 32kHz, 44.1kHz
  - Default Sample Rate: 44.1kHz
  - mono
  - Bitrate: 64kbps, 128kbps (default), 192kbps
- Opus
  - Sample Rate: 48kHz
  - Default Sample Rate: 48kHz
  - mono
  - Bitrate: -1000 (auto), 24kbps, 32kbps (default), 48kbps, 64kbps
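
The format constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: the function name `resolve_output_settings` and the table `SUPPORTED` are hypothetical, and it assumes sample rates are expressed in Hz and bitrates in kbps.

```python
# Supported output settings, transcribed from the format list above.
# Assumption: sample rates are given in Hz and bitrates in kbps.
_PCM_RATES = {8000, 16000, 24000, 32000, 44100}

SUPPORTED = {
    "wav": {"rates": _PCM_RATES, "default_rate": 44100,
            "bitrates": None, "default_bitrate": None},
    "pcm": {"rates": _PCM_RATES, "default_rate": 44100,
            "bitrates": None, "default_bitrate": None},
    "mp3": {"rates": {32000, 44100}, "default_rate": 44100,
            "bitrates": {64, 128, 192}, "default_bitrate": 128},
    "opus": {"rates": {48000}, "default_rate": 48000,
             "bitrates": {-1000, 24, 32, 48, 64}, "default_bitrate": 32},
}


def resolve_output_settings(fmt, sample_rate=None, bitrate=None):
    """Fill in defaults and reject combinations the list above does not allow."""
    if fmt not in SUPPORTED:
        raise ValueError(f"unsupported format: {fmt!r}")
    spec = SUPPORTED[fmt]

    rate = spec["default_rate"] if sample_rate is None else sample_rate
    if rate not in spec["rates"]:
        raise ValueError(f"{fmt} does not support a sample rate of {rate} Hz")

    if spec["bitrates"] is None:
        # WAV and PCM are uncompressed, so a bitrate setting does not apply.
        if bitrate is not None:
            raise ValueError(f"{fmt} does not take a bitrate")
        chosen = None
    else:
        chosen = spec["default_bitrate"] if bitrate is None else bitrate
        if chosen not in spec["bitrates"]:
            raise ValueError(f"{fmt} does not support a bitrate of {chosen}")

    return {"format": fmt, "sample_rate": rate, "bitrate": chosen}
```

For example, `resolve_output_settings("mp3")` fills in 44100 Hz and 128 kbps, while `resolve_output_settings("opus", sample_rate=44100)` raises `ValueError` because Opus is listed at 48kHz only.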
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
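
As an illustration of the header form described above, here is a small sketch that builds the `Authorization` header together with one of the two accepted content types. The helper name `auth_headers` is hypothetical and not part of the API.

```python
# Content types this endpoint accepts, as stated at the top of this page.
ACCEPTED_CONTENT_TYPES = ("application/json", "application/msgpack")


def auth_headers(token, content_type="application/json"):
    """Return request headers carrying the bearer token."""
    if not token:
        raise ValueError("an auth token is required")
    if content_type not in ACCEPTED_CONTENT_TYPES:
        raise ValueError(f"unsupported content type: {content_type!r}")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": content_type,
    }
```

Reading the token from an environment variable or a secrets store, rather than hard-coding it, keeps it out of source control.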
Headers
Specify which TTS model to use. We recommend s1.
Available options: s1, speech-1.6, speech-1.5
Body
Text to be converted to speech
Controls randomness in the speech generation. Higher values (e.g., 1.0) make the output more random, while lower values (e.g., 0.1) make it more deterministic. We recommend 0.9 for the s1 model.
Required range: 0 <= x <= 1
Controls diversity via nucleus sampling. Lower values (e.g., 0.1) make the output more focused, while higher values (e.g., 1.0) allow more diversity. We recommend 0.9 for the s1 model.
Required range: 0 <= x <= 1
References to be used for the speech. This requires MessagePack serialization and overrides reference_voices and reference_texts.
ID of the reference model to be used for the speech
Prosody to be used for the speech
Chunk length to be used for the speech
Required range: 100 <= x <= 300
Whether to normalize the speech. This will reduce the latency but may reduce performance on numbers and dates.
Format to be used for the speech
Available options: wav, pcm, mp3, opus
Sample rate to be used for the speech
MP3 bitrate to be used for the speech
Available options: 64, 128, 192
Opus bitrate to be used for the speech
Available options: -1000, 24, 32, 48, 64
Latency mode to be used for the speech. The balanced mode reduces latency but may degrade performance.
Available options: normal, balanced
Response
Request fulfilled, document follows
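
To tie the headers and body parameters together, the sketch below assembles a JSON request and checks the documented ranges and options first. Several names are assumptions, since this page does not spell them out: the URL is a placeholder, the header name `model` and the body field names (`text`, `temperature`, `top_p`, `chunk_length`, `format`, `latency`) are guesses based on the descriptions above, and `build_tts_request` is a hypothetical helper. The request is built but not sent.

```python
import json
import urllib.request

# Placeholder URL: substitute the real endpoint address.
API_URL = "https://api.example.com/v1/tts"

MODELS = {"s1", "speech-1.6", "speech-1.5"}
FORMATS = {"wav", "pcm", "mp3", "opus"}
LATENCY_MODES = {"normal", "balanced"}


def build_tts_request(token, text, *, model="s1", temperature=0.9, top_p=0.9,
                      chunk_length=None, fmt="mp3", latency="normal"):
    """Validate the parameters and return an unsent urllib Request."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must satisfy 0 <= x <= 1")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must satisfy 0 <= x <= 1")
    if chunk_length is not None and not 100 <= chunk_length <= 300:
        raise ValueError("chunk_length must satisfy 100 <= x <= 300")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if latency not in LATENCY_MODES:
        raise ValueError(f"unknown latency mode: {latency!r}")

    # Field names below are assumed, not confirmed by this page.
    body = {
        "text": text,
        "temperature": temperature,
        "top_p": top_p,
        "format": fmt,
        "latency": latency,
    }
    if chunk_length is not None:
        body["chunk_length"] = chunk_length

    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "model": model,  # assumed header name for the model selector
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Requests that carry inline reference audio would need MessagePack serialization instead of JSON, which this sketch does not cover.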

