Voice Design
Generate candidate voices from a prompt
Each generated candidate is returned with its audio as a base64-encoded audio_base64 payload. Decode the base64 value to write the candidate audio to a file.
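A minimal Python sketch of the decoding step. This page does not show how the candidate list is nested in the response JSON or which audio container the bytes use. The function therefore takes an iterable of candidate objects that each carry an audio_base64 string, and the default .wav extension is an assumption.

```python
from __future__ import annotations

import base64
from collections.abc import Iterable, Mapping
from pathlib import Path


def save_candidates(
    candidates: Iterable[Mapping[str, str]],
    out_dir: str | Path,
    extension: str = "wav",  # assumption: the audio format is not documented on this page
) -> list[Path]:
    """Decode each candidate's audio_base64 payload and write it to its own file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, candidate in enumerate(candidates):
        # validate=True rejects characters outside the base64 alphabet instead of silently skipping them
        audio_bytes = base64.b64decode(candidate["audio_base64"], validate=True)
        path = out / f"candidate_{i}.{extension}"
        path.write_bytes(audio_bytes)
        paths.append(path)
    return paths
```

Each candidate is written to a separate numbered file so that the candidates can be auditioned side by side.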
Usage notes
- instruction is required and must be 1 to 2000 characters.
- reference_text is optional preview text and can be up to 300 characters.
- n controls how many candidates are returned. The supported range is 1 to 4.
- seed is optional and can help reproduce candidate generation.
- The endpoint is stateless: it does not create batches, samples, voice models, or presigned URLs.
- Billing happens once per successful generation request, not once per candidate.
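The limits above can be checked on the client before a request is sent. The sketch below assumes the body is a JSON object keyed by the field names used in these notes (instruction, reference_text, n, seed). The function name is hypothetical. Note that len() counts Unicode code points, which may not match how the service counts characters.

```python
from __future__ import annotations

from typing import Any


def build_voice_design_body(
    instruction: str,
    reference_text: str | None = None,
    n: int | None = None,
    seed: int | None = None,
) -> dict[str, Any]:
    """Check the documented limits client-side and return a JSON-ready request body."""
    if not 1 <= len(instruction) <= 2000:
        raise ValueError("instruction must be 1 to 2000 characters")
    if reference_text is not None and len(reference_text) > 300:
        raise ValueError("reference_text can be at most 300 characters")
    if n is not None and not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    body: dict[str, Any] = {"instruction": instruction}
    # Optional fields are omitted entirely rather than sent as null.
    if reference_text is not None:
        body["reference_text"] = reference_text
    if n is not None:
        body["n"] = n
    if seed is not None:
        body["seed"] = seed
    return body
```

Because billing is per request rather than per candidate, asking for several candidates in one call (up to n=4) costs the same as asking for one.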
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Headers
Specify which voice-design model to use, for example "voice-design-1".
Body
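The following sketch assembles the request with Python's standard library, assuming a JSON body sent with POST. This page does not give the endpoint URL or the model header's name, so API_URL and MODEL_HEADER below are hypothetical placeholders. Only the Bearer <token> form of the Authorization header and the "voice-design-1" model identifier come from this page.

```python
import json
import urllib.request

# Hypothetical placeholders: the real endpoint URL and model header name are not given on this page.
API_URL = "https://api.example.com/v1/voice-design"
MODEL_HEADER = "X-Model"


def make_voice_design_request(
    token: str, body: dict, model: str = "voice-design-1"
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying Bearer auth and the model header."""
    headers = {
        "Authorization": f"Bearer {token}",
        MODEL_HEADER: model,
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(API_URL, data=data, headers=headers, method="POST")
```

The request can then be sent with urllib.request.urlopen(req), and the JSON response can be read with json.load.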
Request body for synchronous voice design generation. The endpoint returns generated voice candidates with base64-encoded audio.
- instruction (required): Voice design prompt. Must contain 1 to 2000 characters.
- reference_text (optional): Text used as reference content for the generated voice. Maximum length: 300 characters.
- Optional BCP-47 language hint, such as en, zh, or ja.
- n: Number of voice candidates to generate. Range: 1 <= x <= 4.
- Speaking speed multiplier for candidate generation. Range: x <= 3.
- Number of diffusion steps used by the voice-design model. Range: 1 <= x <= 128.
- Classifier-free guidance scale. Higher values follow the prompt more strongly. Range: x >= 0.
- Instruction guidance scale for prompt conditioning. Range: x >= 0.
- seed (optional): Deterministic seed for candidate generation.
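The numeric ranges above can be checked with a small table of bounds. This page documents the ranges but not the JSON keys for speed, diffusion steps, and the two guidance scales, so the names speed, num_steps, guidance_scale, and instruction_guidance_scale below are hypothetical. Only an upper bound is documented for speed.

```python
from __future__ import annotations

import math

# Hypothetical field names; the (low, high) bounds are the documented ranges. None means unbounded.
NUMERIC_LIMITS: dict[str, tuple[float | None, float | None]] = {
    "speed": (None, 3.0),                       # x <= 3
    "num_steps": (1, 128),                      # 1 <= x <= 128
    "guidance_scale": (0.0, None),              # x >= 0
    "instruction_guidance_scale": (0.0, None),  # x >= 0
}


def check_numeric_params(params: dict[str, float]) -> list[str]:
    """Return human-readable problems; an empty list means every known value is in range."""
    problems = []
    for name, value in params.items():
        if name not in NUMERIC_LIMITS:
            continue  # not one of the range-checked fields
        low, high = NUMERIC_LIMITS[name]
        if math.isnan(value):
            problems.append(f"{name} is NaN")
        elif name == "num_steps" and not float(value).is_integer():
            problems.append(f"{name}={value} must be a whole number")
        elif low is not None and value < low:
            problems.append(f"{name}={value} is below the minimum {low}")
        elif high is not None and value > high:
            problems.append(f"{name}={value} is above the maximum {high}")
    return problems
```

The function returns a list of problems instead of raising on the first one, so a caller can report every out-of-range value at once.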
Response
200 OK: Request fulfilled. The response body contains the generated voice candidates.

