POST /v1/tts

This endpoint only accepts application/json and application/msgpack.

For best results, upload reference audio with the create model endpoint before calling this endpoint. This improves speech quality and reduces latency.

To include audio clips directly in the request, without pre-uploading them, serialize the request body with MessagePack (application/msgpack), as described in the instructions.
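The choice between the two accepted content types comes down to one fact: JSON has no binary type, so a body that carries raw audio bytes has to be sent as MessagePack. The minimal sketch below encodes that decision. The helper names, the `my-model-id` value, and the `audio`/`text` keys inside a reference object are hypothetical illustrations, not part of the documented schema.

```python
from typing import Any


def contains_bytes(value: Any) -> bool:
    """Return True if value, or anything nested inside it, is raw bytes."""
    if isinstance(value, (bytes, bytearray)):
        return True
    if isinstance(value, dict):
        return any(contains_bytes(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(contains_bytes(v) for v in value)
    return False


def content_type_for(body: dict) -> str:
    """Pick the Content-Type for a /v1/tts request body."""
    # JSON cannot represent raw bytes, so inline audio forces MessagePack.
    return "application/msgpack" if contains_bytes(body) else "application/json"


# A pre-uploaded reference model needs only an ID string, so JSON works:
print(content_type_for({"text": "Hello", "reference_id": "my-model-id"}))  # application/json
# Inline audio clips (hypothetical "audio"/"text" keys) need MessagePack:
print(content_type_for({"text": "Hello", "references": [{"audio": b"\x00\x01", "text": "Hi"}]}))  # application/msgpack
```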

Audio formats supported:

  • WAV / PCM (16-bit, 44100 Hz, mono)
  • MP3 (44100 Hz, mono)
  • Opus (48000 Hz, mono)
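If these constraints apply to the reference clips you send (the page does not say whether the list describes input audio, output audio, or both), a WAV clip can be checked locally before upload with Python's standard `wave` module. The sketch below covers only the WAV case; checking MP3 or Opus parameters would need a third-party decoder. `check_wav` is a hypothetical helper name.

```python
import io
import wave

# Expected WAV parameters, taken from the supported-formats list.
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample: 2 bytes = 16-bit
EXPECTED_FRAME_RATE = 44100  # Hz
EXPECTED_CHANNELS = 1        # mono


def check_wav(data: bytes) -> list[str]:
    """Return a list of mismatches; an empty list means the clip matches.

    Raises wave.Error if data is not a readable PCM WAV file.
    """
    problems = []
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width is {8 * wav.getsampwidth()}-bit, expected 16-bit")
        if wav.getframerate() != EXPECTED_FRAME_RATE:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 44100 Hz")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"clip has {wav.getnchannels()} channels, expected mono")
    return problems


# Usage: problems = check_wav(open("reference.wav", "rb").read())
```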

Authorizations

Authorization
string
header, required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
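Putting the path, the JSON content type, and the bearer header together, a request can be assembled with the standard library alone. This is a sketch: the page gives only the path, so `https://api.example.com` is a placeholder host, `YOUR_TOKEN` stands in for a real token, and the body uses only documented fields. The request is built but not sent.

```python
import json
import urllib.request

# Placeholder host: this page documents the path /v1/tts but not the base URL.
BASE_URL = "https://api.example.com"


def build_tts_request(token: str, body: dict) -> urllib.request.Request:
    """Build, but do not send, a JSON POST request to /v1/tts."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tts",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # "Bearer <token>" as documented
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_tts_request("YOUR_TOKEN", {"text": "Hello, world!", "format": "mp3"})
# To send it (assuming the response body is the encoded audio):
# audio = urllib.request.urlopen(req).read()
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()`, so the content type is stored under `Content-type`.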

Body

text
string
required

Text to be converted to speech

references
object[] | null

References to be used for the speech. This field requires MessagePack serialization and overrides reference_voices and reference_texts.
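In practice you would likely serialize such a body with the `msgpack` package (`msgpack.packb(body)`). To show what actually goes over the wire, the standard-library-only sketch below implements just the subset of the MessagePack format a request body needs: nil, booleans, integers, UTF-8 strings, binary, arrays, and maps. The `audio`/`text` keys inside each reference object are an assumption, since the page types `references` only as `object[]`.

```python
import struct
from typing import Optional

# (tag, struct format, min inclusive, max exclusive) for non-fixint integers.
_INT_FORMS = (
    (0xCC, ">B", 0, 2**8), (0xCD, ">H", 0, 2**16),
    (0xCE, ">I", 0, 2**32), (0xCF, ">Q", 0, 2**64),
    (0xD0, ">b", -(2**7), 2**7), (0xD1, ">h", -(2**15), 2**15),
    (0xD2, ">i", -(2**31), 2**31), (0xD3, ">q", -(2**63), 2**63),
)


def msgpack_encode(obj) -> bytes:
    """Encode obj as MessagePack (subset: None, bool, int, str, bytes, list, dict)."""
    out = bytearray()
    _pack(obj, out)
    return bytes(out)


def _pack_len(n: int, out: bytearray, fix: Optional[int], fix_max: int,
              l8: Optional[int], l16: int, l32: int) -> None:
    """Write a type header carrying length n, using the smallest form the type has."""
    if fix is not None and n <= fix_max:
        out.append(fix | n)
    elif l8 is not None and n < 2**8:
        out += bytes([l8, n])
    elif n < 2**16:
        out.append(l16)
        out += struct.pack(">H", n)
    elif n < 2**32:
        out.append(l32)
        out += struct.pack(">I", n)
    else:
        raise ValueError("object too large for MessagePack")


def _pack(obj, out: bytearray) -> None:
    if obj is None:
        out.append(0xC0)
    elif isinstance(obj, bool):  # must precede int: bool subclasses int
        out.append(0xC3 if obj else 0xC2)
    elif isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            out.append(obj)  # positive fixint
        elif -32 <= obj < 0:
            out += struct.pack(">b", obj)  # negative fixint, 0xE0..0xFF
        else:
            for tag, fmt, lo, hi in _INT_FORMS:
                if lo <= obj < hi:
                    out.append(tag)
                    out += struct.pack(fmt, obj)
                    return
            raise OverflowError(f"{obj} does not fit in 64 bits")
    elif isinstance(obj, str):
        data = obj.encode("utf-8")
        _pack_len(len(data), out, 0xA0, 31, 0xD9, 0xDA, 0xDB)  # fixstr/str8/16/32
        out += data
    elif isinstance(obj, (bytes, bytearray)):
        _pack_len(len(obj), out, None, 0, 0xC4, 0xC5, 0xC6)  # bin8/16/32
        out += obj
    elif isinstance(obj, (list, tuple)):
        _pack_len(len(obj), out, 0x90, 15, None, 0xDC, 0xDD)  # fixarray/array16/32
        for item in obj:
            _pack(item, out)
    elif isinstance(obj, dict):
        _pack_len(len(obj), out, 0x80, 15, None, 0xDE, 0xDF)  # fixmap/map16/32
        for key, value in obj.items():
            _pack(key, out)
            _pack(value, out)
    else:
        raise TypeError(f"cannot encode {type(obj).__name__}")


audio_bytes = b"\x00\x01\x02\x03"  # stand-in for the contents of a reference clip
body = {
    "text": "Hello, world!",
    # Hypothetical reference object shape; check the real schema for field names.
    "references": [{"audio": audio_bytes, "text": "Transcript of the reference clip"}],
    "format": "mp3",
}
payload = msgpack_encode(body)  # send with Content-Type: application/msgpack
```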

reference_id
string | null

ID of the reference model to be used for the speech

chunk_length
integer
default: 200

Chunk length to be used for the speech

Required range: 100 < x < 300

normalize
boolean
default: true

Whether to normalize the speech. Normalization reduces latency but may reduce performance on numbers and dates.

format
enum<string>
default: mp3

Format to be used for the speech

Available options: wav, pcm, mp3, opus

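The `pcm` option presumably returns headerless samples, which most players cannot open directly. Assuming the output uses the WAV/PCM parameters from the supported-formats list (16-bit, 44100 Hz, mono), which this page does not confirm for output audio, the sketch below wraps raw PCM bytes in a WAV container with the standard `wave` module. `pcm_to_wav` is a hypothetical helper.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 44100, channels: int = 1,
               sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container.

    The defaults (44100 Hz, mono, 16-bit) are assumptions taken from the
    supported-formats list, not confirmed properties of the pcm output.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Writing the result of `pcm_to_wav(audio)` to a `.wav` file then gives a playable clip; if the service's PCM turns out to use a different rate or sample width, pass it explicitly.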
mp3_bitrate
enum<integer>
default: 128

MP3 bitrate to be used for the speech, in kbps

Available options: 64, 128, 192

opus_bitrate
enum<integer>
default: 32

Opus bitrate to be used for the speech, in kbps

Available options: -1000, 24, 32, 48, 64

latency
enum<string>
default: normal

Latency mode to be used for the speech. The balanced mode reduces latency but may lead to performance degradation.

Available options: normal, balanced
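Taken together, the body parameters amount to a small schema. The sketch below fills in the documented defaults and rejects out-of-range values on the client before a request is sent; `prepare_body` is a hypothetical helper, and the server remains the authority on validation. The page does not say what `-1000` means for `opus_bitrate` (in libopus, -1000 is the value of `OPUS_AUTO`), so the sketch simply treats it as one more allowed value.

```python
# Defaults and allowed values copied from this page.
DEFAULTS = {
    "chunk_length": 200,
    "normalize": True,
    "format": "mp3",
    "mp3_bitrate": 128,
    "opus_bitrate": 32,
    "latency": "normal",
}
ALLOWED = {
    "format": {"wav", "pcm", "mp3", "opus"},
    "mp3_bitrate": {64, 128, 192},
    "opus_bitrate": {-1000, 24, 32, 48, 64},
    "latency": {"normal", "balanced"},
}


def prepare_body(body: dict) -> dict:
    """Return a copy of body with documented defaults filled in, or raise ValueError."""
    if not isinstance(body.get("text"), str):
        raise ValueError("text is required and must be a string")
    merged = {**DEFAULTS, **body}
    for field, allowed in ALLOWED.items():
        if merged[field] not in allowed:
            raise ValueError(f"{field}={merged[field]!r} is not one of {sorted(allowed)}")
    if not isinstance(merged["normalize"], bool):
        raise ValueError("normalize must be a boolean")
    # The page states the range as 100 < x < 300 (exclusive); relax it if the
    # server turns out to accept the bounds themselves.
    if not 100 < merged["chunk_length"] < 300:
        raise ValueError("chunk_length must satisfy 100 < x < 300")
    return merged


body = prepare_body({"text": "Hello, world!", "format": "opus", "opus_bitrate": 48})
```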