Fine-grained control is currently in beta. The API may be unstable and is subject to change.

Getting Started

You can use fine-grained control through either our SDK or our API. We recommend disabling text normalization by setting "normalize": false in the request body. This ensures that normalization does not alter the intonation of control tags.
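A minimal sketch of a request body with normalization disabled. Only the "normalize": false field comes from this page. The `text` field name and the helper `build_request_body` are hypothetical, used for illustration only, so check the API reference for the actual endpoint and the remaining fields.

```python
import json


def build_request_body(text: str) -> str:
    """Serialize a hypothetical TTS request body with normalization disabled."""
    body = {
        "text": text,        # assumed field name for the input text
        "normalize": False,  # from this page: keep control tags as written
    }
    # ensure_ascii=False keeps Chinese text readable in the JSON output.
    return json.dumps(body, ensure_ascii=False)


payload = build_request_body(
    "I am an <|phoneme_start|>EH N JH AH N IH R<|phoneme_end|>."
)
print(payload)
```

Python's `False` serializes to the JSON literal `false`, which is what the request body expects.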

Disabling normalization may make the reading of numbers, dates, and URLs less stable. For best results, handle these cases manually, for example by writing numbers out as words before sending the text.
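One way to handle numbers manually is to spell them out as English words before sending text with normalization disabled. This is a minimal sketch, not part of the SDK: the helper names are hypothetical, it only covers non-negative integers below one million, and it does not attempt dates, decimals, or URLs. It deliberately skips anything inside phoneme tags so that pinyin tone digits such as the 1 in gong1 are left untouched.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Match a whole phoneme span first so digits inside it (e.g. pinyin tones) are skipped.
TAG_OR_DIGITS = re.compile(r"<\|phoneme_start\|>.*?<\|phoneme_end\|>|\d+")


def below_thousand(n: int) -> str:
    """Spell out an integer 0 <= n < 1000 in English words."""
    words = []
    if n >= 100:
        words.append(ONES[n // 100] + " hundred")
        n %= 100
        if n == 0:
            return words[0]
    if n < 20:
        words.append(ONES[n])
    else:
        tens = TENS[n // 10]
        words.append(tens + "-" + ONES[n % 10] if n % 10 else tens)
    return " ".join(words)


def number_to_words(n: int) -> str:
    """Spell out an integer 0 <= n < 1,000,000 in English words."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0..999999")
    thousands, rest = divmod(n, 1000)
    if thousands == 0:
        return below_thousand(rest)
    words = below_thousand(thousands) + " thousand"
    if rest:
        words += " " + below_thousand(rest)
    return words


def spell_out_numbers(text: str) -> str:
    """Replace each run of digits outside phoneme tags with English words."""
    def repl(m: re.Match) -> str:
        s = m.group()
        return s if s.startswith("<|") else number_to_words(int(s))
    return TAG_OR_DIGITS.sub(repl, text)


print(spell_out_numbers("The order has 250 items."))
# → The order has two hundred fifty items.
```

Years are read as plain cardinals here ("2024" becomes "two thousand twenty-four"); if you prefer another reading, write it out yourself.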

Phoneme Control

Phoneme control allows you to specify exact pronunciations for words or characters. Currently, we support:

  • CMU Arpabet (for English)
  • Pinyin (for Chinese)
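Before sending text, it can help to check that a pronunciation uses the expected alphabet. This sketch assumes the 39-phoneme set of the CMU Pronouncing Dictionary, with an optional CMUdict-style stress digit (0, 1, or 2) on vowels; whether the API accepts stress digits is not stated on this page, so treat that as an assumption. The pinyin check is a loose format check only (lowercase letters followed by a tone digit 1 to 5, also an assumption) and does not verify that the syllable exists. The function names are hypothetical.

```python
import re

# The 39 phonemes of the CMU Pronouncing Dictionary.
ARPABET_VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
                  "IH", "IY", "OW", "OY", "UH", "UW"}
ARPABET_CONSONANTS = {"B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L",
                      "M", "N", "NG", "P", "R", "S", "SH", "T", "TH", "V",
                      "W", "Y", "Z", "ZH"}

# Assumed pinyin format: lowercase letters (or ü) followed by a tone digit 1-5.
PINYIN_SYLLABLE = re.compile(r"[a-zü]+[1-5]")


def is_valid_arpabet(pronunciation: str) -> bool:
    """Check that every space-separated symbol is a CMU Arpabet phoneme."""
    symbols = pronunciation.split()
    if not symbols:
        return False
    for sym in symbols:
        base, stress = sym, ""
        if sym[-1] in "012":
            base, stress = sym[:-1], sym[-1]
        if base in ARPABET_VOWELS:
            continue  # vowels may carry an optional stress digit
        if base in ARPABET_CONSONANTS and not stress:
            continue  # consonants never carry stress
        return False
    return True


def is_valid_pinyin(syllable: str) -> bool:
    """Loose format check for one tone-numbered pinyin syllable."""
    return PINYIN_SYLLABLE.fullmatch(syllable) is not None


print(is_valid_arpabet("EH N JH AH N IH R"), is_valid_pinyin("gong1"))
```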

To use phoneme control, wrap the desired pronunciation in <|phoneme_start|> and <|phoneme_end|> tags. Each pair of tags should contain the pronunciation of a single word (English) or a single character (Chinese).
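The wrapping rule can be captured in a small helper. This is a sketch with a hypothetical function name, not an SDK call; it only builds the tagged string and rejects empty input or input that already contains tags.

```python
PHONEME_START = "<|phoneme_start|>"
PHONEME_END = "<|phoneme_end|>"


def phoneme(pronunciation: str) -> str:
    """Wrap one word's or one character's pronunciation in phoneme tags."""
    pronunciation = pronunciation.strip()
    if not pronunciation:
        raise ValueError("pronunciation must not be empty")
    if PHONEME_START in pronunciation or PHONEME_END in pronunciation:
        raise ValueError("pronunciation must not contain phoneme tags")
    return f"{PHONEME_START}{pronunciation}{PHONEME_END}"


print(phoneme("EH N JH AH N IH R"))
```

Call it once per word or character; a multi-character Chinese word needs one call per syllable.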

English Example

Standard: “I am an engineer.”
With phoneme control: “I am an <|phoneme_start|>EH N JH AH N IH R<|phoneme_end|>.”
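To apply the English example programmatically, you could keep a small lexicon of words whose pronunciation you want to pin and substitute them before sending the text. The lexicon and the `apply_overrides` helper are hypothetical; the only pronunciation shown is the one from the example above.

```python
import re

PHONEME_START = "<|phoneme_start|>"
PHONEME_END = "<|phoneme_end|>"

# Hypothetical lexicon: lowercase word -> CMU Arpabet pronunciation.
OVERRIDES = {"engineer": "EH N JH AH N IH R"}


def apply_overrides(text: str, overrides: dict[str, str]) -> str:
    """Replace each whole word found in overrides with its phoneme-tagged pronunciation."""
    def repl(m: re.Match) -> str:
        pron = overrides.get(m.group().lower())
        if pron is None:
            return m.group()  # leave words without an override unchanged
        return f"{PHONEME_START}{pron}{PHONEME_END}"
    return re.sub(r"[A-Za-z']+", repl, text)


print(apply_overrides("I am an engineer.", OVERRIDES))
# → I am an <|phoneme_start|>EH N JH AH N IH R<|phoneme_end|>.
```

Lookups are case-insensitive, so "Engineer" at the start of a sentence is also replaced.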

Chinese Example

Standard: “我是一个工程师。”
With phoneme control: “我是一个<|phoneme_start|>gong1<|phoneme_end|><|phoneme_start|>cheng2<|phoneme_end|><|phoneme_start|>shi1<|phoneme_end|>。”
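For Chinese, each character gets its own pair of tags. This sketch (hypothetical helper name) builds the example above from a word and one tone-numbered pinyin syllable per character, raising an error if the counts do not match.

```python
PHONEME_START = "<|phoneme_start|>"
PHONEME_END = "<|phoneme_end|>"


def tag_characters(word: str, pinyin: list[str]) -> str:
    """Wrap each Chinese character's pinyin in its own pair of phoneme tags."""
    if len(word) != len(pinyin):
        raise ValueError("need exactly one pinyin syllable per character")
    return "".join(f"{PHONEME_START}{p}{PHONEME_END}" for p in pinyin)


sentence = "我是一个" + tag_characters("工程师", ["gong1", "cheng2", "shi1"]) + "。"
print(sentence)
```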