Text to Speech
Convert text to speech
This endpoint only accepts application/json and application/msgpack.
For best results, upload reference audio using the create model endpoint before calling this one. This improves speech quality and reduces latency.
To upload audio clips directly, without pre-uploading, serialize the request body with MessagePack as per the instructions.
Audio formats supported:
- WAV / PCM
  - Sample Rate: 8kHz, 16kHz, 24kHz, 32kHz, 44.1kHz
  - Default Sample Rate: 44.1kHz
  - 16-bit, mono
- MP3
  - Sample Rate: 32kHz, 44.1kHz
  - Default Sample Rate: 44.1kHz
  - mono
  - Bitrate: 64kbps, 128kbps (default), 192kbps
- Opus
  - Sample Rate: 48kHz
  - Default Sample Rate: 48kHz
  - mono
  - Bitrate: -1000 (auto), 24kbps, 32kbps (default), 48kbps, 64kbps
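The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: the names `AUDIO_FORMATS` and `resolve_audio_settings` are hypothetical, and expressing sample rates in Hz and bitrates in kbps is an assumption about representation.

```python
# Allowed values transcribed from the format list above.
# Sample rates are written in Hz and bitrates in kbps (an assumption for this sketch).
_PCM_LIKE = {
    "sample_rates": (8000, 16000, 24000, 32000, 44100),
    "default_sample_rate": 44100,
    "bitrates": None,  # WAV / PCM list no bitrate option
    "default_bitrate": None,
}

AUDIO_FORMATS = {
    "wav": _PCM_LIKE,
    "pcm": _PCM_LIKE,
    "mp3": {
        "sample_rates": (32000, 44100),
        "default_sample_rate": 44100,
        "bitrates": (64, 128, 192),
        "default_bitrate": 128,
    },
    "opus": {
        "sample_rates": (48000,),
        "default_sample_rate": 48000,
        "bitrates": (-1000, 24, 32, 48, 64),  # -1000 means "auto"
        "default_bitrate": 32,
    },
}


def resolve_audio_settings(fmt, sample_rate=None, bitrate=None):
    """Return (sample_rate, bitrate) with defaults filled in.

    Raises ValueError when a value is not in the documented list.
    """
    if fmt not in AUDIO_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    spec = AUDIO_FORMATS[fmt]

    rate = spec["default_sample_rate"] if sample_rate is None else sample_rate
    if rate not in spec["sample_rates"]:
        raise ValueError(f"{fmt} does not support a sample rate of {rate} Hz")

    if spec["bitrates"] is None:
        if bitrate is not None:
            raise ValueError(f"{fmt} does not take a bitrate")
        return rate, None

    rate_kbps = spec["default_bitrate"] if bitrate is None else bitrate
    if rate_kbps not in spec["bitrates"]:
        raise ValueError(f"{fmt} does not support a bitrate of {rate_kbps} kbps")
    return rate, rate_kbps
```

Calling `resolve_audio_settings("mp3")` with no overrides falls back to the documented defaults of 44.1kHz and 128kbps.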
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
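As a small illustration of the header format, the sketch below builds the request headers for either accepted content type. The helper name `build_headers` and its `use_msgpack` flag are hypothetical; only the `Bearer <token>` form and the two content types come from this page.

```python
def build_headers(token, use_msgpack=False):
    """Build HTTP headers for the text-to-speech endpoint.

    The Authorization value follows the documented "Bearer <token>" form.
    """
    if not token:
        raise ValueError("an auth token is required")
    content_type = "application/msgpack" if use_msgpack else "application/json"
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": content_type,
    }
```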
Body
Text to be converted to speech
Chunk length to be used for the speech
100 < x < 300
Format to be used for the speech
wav, pcm, mp3, opus
Latency mode to be used for the speech. The balanced mode reduces latency but may lead to performance degradation.
normal, balanced
MP3 Bitrate to be used for the speech
64, 128, 192
Whether to normalize the speech. This will reduce the latency but may reduce performance on numbers and dates.
Opus Bitrate to be used for the speech
-1000, 24, 32, 48, 64
Prosody to be used for the speech
ID of the reference model to be used for the speech
References to be used for the speech. This requires MessagePack serialization and will override reference_voices and reference_texts.
Sample rate to be used for the speech
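To show how the body parameters fit together, here is a minimal sketch that assembles a JSON request without sending it. Several things in it are assumptions rather than facts from this page: the endpoint URL `https://api.example.com/v1/tts` is a placeholder, the helper `build_tts_request` is hypothetical, and the JSON field names (`text`, `format`, `latency`, `chunk_length`, `reference_id`) are guesses, since the parameter names are not listed above. Check them against the actual schema before use.

```python
import json
import urllib.request

# Placeholder URL (assumption); replace with the real endpoint.
TTS_URL = "https://api.example.com/v1/tts"

FORMATS = ("wav", "pcm", "mp3", "opus")
LATENCY_MODES = ("normal", "balanced")


def build_tts_request(token, text, *, fmt="mp3", latency="normal",
                      chunk_length=None, reference_id=None):
    """Validate the documented constraints and return an unsent POST request."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    if latency not in LATENCY_MODES:
        raise ValueError(f"latency must be one of {LATENCY_MODES}")
    # The documented range is exclusive at both ends: 100 < x < 300.
    if chunk_length is not None and not 100 < chunk_length < 300:
        raise ValueError("chunk_length must satisfy 100 < x < 300")

    # Field names are assumed for illustration; they are not given on this page.
    body = {"text": text, "format": fmt, "latency": latency}
    if chunk_length is not None:
        body["chunk_length"] = chunk_length
    if reference_id is not None:
        body["reference_id"] = reference_id

    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. Inline reference audio is not covered here, because that path requires a MessagePack-serialized body rather than JSON.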