Advanced control over speech generation
API: Set "normalize": false in the request body. This ensures that the API doesn’t alter the intonation of control tags.
Playground: You can use the V1.6 Control Model without setting any other options.
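As a minimal sketch of the API setting described above, the snippet below builds a request body with normalization turned off. Only the "normalize" key comes from this page; the "text" field name and the `build_request_body` helper are assumptions made for illustration, and no endpoint or client library is implied.

```python
import json


def build_request_body(text: str) -> dict:
    """Return a request body with normalization disabled.

    "normalize" is the option named in this document. "text" is a
    hypothetical field name used only for this illustration.
    """
    return {"text": text, "normalize": False}


body = build_request_body("Hello (break) world.")

# Python's False serializes to the JSON literal false.
encoded = json.dumps(body)
print(encoded)
```

Serializing through `json.dumps` rather than writing the JSON by hand avoids accidentally sending the string "false" instead of the boolean.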
Specify phonemes between the <|phoneme_start|> and <|phoneme_end|> tags. Each tag pair should contain a single word or character.
English (ARPABET phonemes for "engineer"): <|phoneme_start|>EH N JH AH N IH R<|phoneme_end|>
Chinese (pinyin with tone numbers for 工程师, "engineer"): <|phoneme_start|>gong1<|phoneme_end|><|phoneme_start|>cheng2<|phoneme_end|><|phoneme_start|>shi1<|phoneme_end|>
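The one-unit-per-tag rule can be sketched with a small helper that wraps each word or character in its own tag pair. The helper names `phoneme_tag` and `phoneme_tags` are hypothetical; only the tag strings themselves come from this page.

```python
PHONEME_START = "<|phoneme_start|>"
PHONEME_END = "<|phoneme_end|>"


def phoneme_tag(unit: str) -> str:
    """Wrap the phonemes of a single word or character in one tag pair."""
    unit = unit.strip()
    if not unit:
        raise ValueError("a phoneme tag needs non-empty content")
    return f"{PHONEME_START}{unit}{PHONEME_END}"


def phoneme_tags(units: list[str]) -> str:
    """Wrap each unit separately, then concatenate the tagged units."""
    return "".join(phoneme_tag(u) for u in units)


# One English word: all of its phonemes share a single tag pair.
english = phoneme_tag("EH N JH AH N IH R")

# Three Chinese characters: one tag pair per character.
chinese = phoneme_tags(["gong1", "cheng2", "shi1"])

print(english)
print(chinese)
```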
Effect | Description | First Available | Stage |
---|---|---|---|
(break) | Short pause | V1.6 | Experimental |
(long-break) | Extended pause | V1.6 | Experimental |
(breath) | Breathing sound | V1.6 | Experimental |
(laugh) | Laughter sound | V1.6 | Experimental |
(cough) | Coughing sound | V1.6 | Experimental |
(lip-smacking) | Lip smacking sound | V1.6 | Experimental |
(sigh) | Sighing sound | V1.6 | Experimental |
The (laugh), (cough), (lip-smacking), and (sigh) effects are still under development. You may need to repeat them multiple times for better results.
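Repeating an effect tag, as suggested above, can be sketched with a helper that validates the name against the table and emits the tag several times. The `effect` function is hypothetical, and joining the repeats with a single space is an assumption; this page does not say how repeated tags should be separated.

```python
# Effect names taken from the table above.
KNOWN_EFFECTS = {
    "break", "long-break", "breath", "laugh",
    "cough", "lip-smacking", "sigh",
}


def effect(name: str, repeat: int = 1) -> str:
    """Return the effect tag `repeat` times, separated by spaces.

    The space separator is an assumption, not documented behavior.
    """
    if name not in KNOWN_EFFECTS:
        raise ValueError(f"unknown effect: {name!r}")
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    return " ".join([f"({name})"] * repeat)


text = f"That was hilarious {effect('laugh', repeat=2)} and then he left."
print(text)
```

Checking the name against a fixed set catches typos such as "(laught)" before the text is sent, since an unrecognized tag would presumably be read aloud as ordinary text.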