Fine-grained Control
Advanced control over speech generation
Fine-grained control is currently in beta. The API may be unstable and is subject to change.
Getting Started
To use fine-grained control, you can use either our SDK or API. We recommend disabling normalization by setting "normalize": false in the request body. This ensures that the API doesn’t alter the intonation of control tags.
Disabling normalization may make the reading of numbers, dates, and URLs less stable. You’ll need to handle these cases manually for best results.
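As a rough illustration of the two points above, the sketch below builds a request body with normalization disabled and spells out digits by hand before the text is sent. Only the "normalize" field comes from this page. The "text" field name and the helper names spell_digits and build_request_body are assumptions made for the example, and the digit-by-digit expansion is a deliberately naive stand-in for real number, date, and URL handling. The sketch only builds the JSON body; it does not call the API.

```python
import json
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(text: str) -> str:
    """Naively replace every run of ASCII digits with digit-by-digit words."""
    return re.sub(
        r"[0-9]+",
        lambda m: " ".join(DIGIT_WORDS[int(d)] for d in m.group()),
        text,
    )


def build_request_body(text: str) -> dict:
    """Build a request body with normalization disabled.

    "normalize" is the setting described above; the "text" field name
    is an assumption for illustration only.
    """
    return {"text": text, "normalize": False}


body = build_request_body(spell_digits("Room 42 is open."))
print(json.dumps(body))
```

Spelling digits one at a time keeps the example short. Real input would likely need separate rules for quantities, dates, and URLs.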
Phoneme Control
Phoneme control allows you to specify exact pronunciations for words or characters. Currently, we support:
- CMU Arpabet (for English)
- Pinyin (for Chinese)
To use phoneme control, wrap the desired pronunciation in <|phoneme_start|> and <|phoneme_end|> tags. Each tag pair should contain a single word or character.
English Example
Standard: “I am an engineer.”
With phoneme control: “I am an <|phoneme_start|>EH N JH AH N IH R<|phoneme_end|>.”
Chinese Example
Standard: “我是一个工程师。”
With phoneme control: “我是一个<|phoneme_start|>gong1<|phoneme_end|><|phoneme_start|>cheng2<|phoneme_end|><|phoneme_start|>shi1<|phoneme_end|>。”
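Writing the tags by hand is error-prone, so a small helper can assemble them. The sketch below rebuilds the two examples above with plain string formatting. The function names phoneme and pinyin_tags are hypothetical and not part of any SDK, and the pronunciation strings are copied verbatim from the examples.

```python
PHONEME_START = "<|phoneme_start|>"
PHONEME_END = "<|phoneme_end|>"


def phoneme(pronunciation: str) -> str:
    """Wrap one word's (or character's) pronunciation in phoneme tags."""
    return f"{PHONEME_START}{pronunciation.strip()}{PHONEME_END}"


def pinyin_tags(syllables: list[str]) -> str:
    """Wrap each Pinyin syllable in its own tag pair, one per character."""
    return "".join(phoneme(s) for s in syllables)


# English: a single Arpabet-tagged word inside an otherwise plain sentence.
english = f"I am an {phoneme('EH N JH AH N IH R')}."

# Chinese: one tag pair per character of 工程师.
chinese = f"我是一个{pinyin_tags(['gong1', 'cheng2', 'shi1'])}。"

print(english)
print(chinese)
```

Keeping one tag pair per word or character follows the rule stated above. The helper does not validate that the pronunciation is well-formed Arpabet or Pinyin.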