POST /v1/voice-design
Voice Design
curl --request POST \
  --url https://api.fish.audio/v1/voice-design \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --header 'model: voice-design-1' \
  --data '{
    "instruction": "Warm, confident studio narrator with a natural tone",
    "reference_text": "Welcome to Fish Audio.",
    "language": "en",
    "n": 2,
    "speed": 1,
    "num_step": 32,
    "guidance_scale": 2,
    "instruct_guidance_scale": 0,
    "seed": 42
  }'
{
  "candidates": [
    {
      "id": "<string>",
      "index": 1,
      "audio_base64": "<string>",
      "sample_rate": 123,
      "duration_ms": 1,
      "text": "<string>",
      "instruct": "<string>",
      "language": "<string>"
    }
  ]
}
This endpoint accepts only application/json. You must include the model: voice-design-1 header. Extra request fields are rejected.
A successful request returns generated voice candidates with audio_base64 audio payloads. Decode the base64 value to write the candidate audio to a file.
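
The decoding step can be sketched in Python as follows. This is a minimal illustration using only the standard library, not an official client. The function name save_candidates, the file naming scheme, and the default ".bin" extension are assumptions: the reference does not state which audio container the decoded bytes use, so pass a different extension once you know the format.

```python
import base64
import json
from pathlib import Path


def save_candidates(response_json: str, out_dir: str, extension: str = ".bin") -> list[Path]:
    """Decode each candidate's audio_base64 field and write it to out_dir."""
    body = json.loads(response_json)
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    written = []
    for candidate in body["candidates"]:
        audio_bytes = base64.b64decode(candidate["audio_base64"])
        # Hypothetical naming scheme: one file per candidate, keyed by its index.
        # The extension is an assumption; the audio container is not documented here.
        path = target / f"candidate_{candidate['index']}{extension}"
        path.write_bytes(audio_bytes)
        written.append(path)
    return written
```

Call save_candidates with the raw response body and a target directory; it returns the paths it wrote, in the order the candidates appear in the response.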

Example

curl --request POST https://api.fish.audio/v1/voice-design \
  --header "Authorization: Bearer $FISH_API_KEY" \
  --header "Content-Type: application/json" \
  --header "model: voice-design-1" \
  --data '{
    "instruction": "Warm, confident studio narrator with a natural tone",
    "reference_text": "Welcome to Fish Audio.",
    "language": "en",
    "n": 2,
    "speed": 1,
    "num_step": 32,
    "guidance_scale": 2,
    "instruct_guidance_scale": 0,
    "seed": 42
  }'
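
For readers who prefer Python to curl, the same request might look like the sketch below. It uses only the standard library. The helper names build_request and design_voice and the 60-second timeout are assumptions made for illustration; the URL, headers, and body fields are the ones shown in the curl example.

```python
import json
import urllib.request

API_URL = "https://api.fish.audio/v1/voice-design"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Required header selecting the voice-design model.
            "model": "voice-design-1",
        },
        method="POST",
    )


def design_voice(api_key: str, body: dict, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON response.

    The timeout value is an assumption, not a documented limit.
    """
    request = build_request(api_key, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as design_voice(os.environ["FISH_API_KEY"], {"instruction": "Warm, confident studio narrator with a natural tone", "n": 2}) would return a dictionary whose "candidates" key holds the generated voices. Note that urllib normalizes header capitalization when it stores headers, which is harmless because HTTP header names are case-insensitive.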

Usage notes

  • instruction is required and must be 1 to 2000 characters.
  • reference_text is optional preview text and can be up to 300 characters.
  • n controls how many candidates are returned. The supported range is 1 to 4.
  • seed is optional and can help reproduce candidate generation.
  • The endpoint is stateless: it does not create batches, samples, voice models, or presigned URLs.
  • Billing happens once per successful generation request, not once per candidate.
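
Because n is capped at 4 and billing is per request, the number of billed requests needed for a larger pool of candidates is the desired total divided by 4, rounded up. The sketch below works through that arithmetic. The function name plan_requests is hypothetical, and the sketch assumes every request succeeds.

```python
MAX_CANDIDATES_PER_REQUEST = 4  # upper bound of n stated in the usage notes


def plan_requests(total_candidates: int) -> list[int]:
    """Split a desired candidate count into per-request n values of at most 4."""
    if total_candidates < 1:
        raise ValueError("total_candidates must be at least 1")
    full, remainder = divmod(total_candidates, MAX_CANDIDATES_PER_REQUEST)
    batches = [MAX_CANDIDATES_PER_REQUEST] * full
    if remainder:
        batches.append(remainder)
    return batches


# Ten candidates need three requests: two with n=4 and one with n=2.
batches = plan_requests(10)
print(batches, "->", len(batches), "billed requests")
```

Filling each request to n=4 before starting another keeps the billed request count as low as the documented range allows.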

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Headers

model
string
default:voice-design-1
required

Specify which voice-design model to use.

Allowed value: "voice-design-1"

Body

application/json

Request body for synchronous voice design generation. The endpoint returns generated voice candidates with base64-encoded audio.

instruction
string
required

Voice design prompt. Must contain 1 to 2000 characters.

Required string length: 1 - 2000
reference_text
string | null

Optional text used as reference content for the generated voice.

Maximum string length: 300
language
string | null

Optional BCP-47 language hint, such as en, zh, or ja.

n
integer
default:2

Number of voice candidates to generate.

Required range: 1 <= x <= 4
speed
number
default:1

Speaking speed multiplier for candidate generation.

Required range: x <= 3
num_step
integer
default:32

Number of diffusion steps used by the voice-design model.

Required range: 1 <= x <= 128
guidance_scale
number
default:2

Classifier-free guidance scale. Higher values follow the prompt more strongly.

Required range: x >= 0
instruct_guidance_scale
number
default:0

Instruction guidance scale for prompt conditioning.

Required range: x >= 0
seed
integer | null

Optional deterministic seed for candidate generation.
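
The body fields, defaults, and ranges listed above can be mirrored on the client so that invalid requests fail before they are sent. The following is a minimal sketch, not an official SDK type: the class name VoiceDesignBody and the method to_payload are hypothetical, and len() counts Unicode code points, which may differ from how the server counts characters. Only the bounds documented above are checked, so speed has an upper bound and no lower bound.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VoiceDesignBody:
    instruction: str
    reference_text: Optional[str] = None
    language: Optional[str] = None
    n: int = 2
    speed: float = 1.0
    num_step: int = 32
    guidance_scale: float = 2.0
    instruct_guidance_scale: float = 0.0
    seed: Optional[int] = None

    def __post_init__(self) -> None:
        if not 1 <= len(self.instruction) <= 2000:
            raise ValueError("instruction must contain 1 to 2000 characters")
        if self.reference_text is not None and len(self.reference_text) > 300:
            raise ValueError("reference_text can be at most 300 characters")
        if not 1 <= self.n <= 4:
            raise ValueError("n must be between 1 and 4")
        if self.speed > 3:
            raise ValueError("speed must be at most 3")
        if not 1 <= self.num_step <= 128:
            raise ValueError("num_step must be between 1 and 128")
        if self.guidance_scale < 0 or self.instruct_guidance_scale < 0:
            raise ValueError("guidance scales must be non-negative")

    def to_payload(self) -> dict:
        # Assumption: omitting an unset optional field is equivalent to sending null.
        return {key: value for key, value in asdict(self).items() if value is not None}
```

Since the dataclass constructor raises TypeError for unknown keyword arguments, a misspelled field name is caught locally instead of being rejected by the endpoint as an extra request field.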

Response

Successful response. The body contains the generated voice candidates.

candidates
VoiceDesignCandidate · object[]
required

Generated voice candidates.
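
A response can be parsed into typed objects along these lines. This is a sketch under assumptions: the field names and types are taken from the sample response at the top of this page, the function name parse_candidates is hypothetical, and fields missing from a candidate are stored as None because their nullability is not documented here.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class VoiceDesignCandidate:
    id: Optional[str]
    index: Optional[int]
    audio_base64: Optional[str]
    sample_rate: Optional[int]
    duration_ms: Optional[int]
    text: Optional[str]
    instruct: Optional[str]
    language: Optional[str]


def parse_candidates(response_json: str) -> list[VoiceDesignCandidate]:
    """Turn the response body into a list of VoiceDesignCandidate objects."""
    body = json.loads(response_json)
    names = [field.name for field in fields(VoiceDesignCandidate)]
    # Read only the known fields so that unexpected extra keys do not raise.
    return [
        VoiceDesignCandidate(**{name: item.get(name) for name in names})
        for item in body["candidates"]
    ]
```

With the parsed list, a candidate's duration in seconds is candidate.duration_ms / 1000, and its audio bytes can be recovered by base64-decoding candidate.audio_base64.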