Overview
English phoneme control uses CMU Arpabet, the pronunciation format used by CMUdict.
Wrap the pronunciation for one word in <|phoneme_start|> and <|phoneme_end|>, and keep surrounding punctuation outside the tag.
I am an <|phoneme_start|>EH1 N JH AH0 N IH1 R<|phoneme_end|>.
IPA is not supported for English phoneme tags. Convert IPA pronunciations to CMU Arpabet before using phoneme control.
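The tag format described above can be sketched as a small helper. This is a minimal illustration, not part of any SDK: the name `phoneme_tag` and the two constants are hypothetical, and the helper only builds the text you would send as input.

```python
PHONEME_START = "<|phoneme_start|>"
PHONEME_END = "<|phoneme_end|>"

def phoneme_tag(arpabet: str) -> str:
    """Wrap one word's Arpabet pronunciation in phoneme tags (hypothetical helper)."""
    # Normalize case and whitespace so the symbols are space-separated uppercase.
    symbols = arpabet.upper().split()
    if not symbols:
        raise ValueError("pronunciation must contain at least one symbol")
    return f"{PHONEME_START}{' '.join(symbols)}{PHONEME_END}"

# Punctuation stays outside the tag.
sentence = f"I am an {phoneme_tag('EH1 N JH AH0 N IH1 R')}."
print(sentence)
# I am an <|phoneme_start|>EH1 N JH AH0 N IH1 R<|phoneme_end|>.
```

Building the tag in one place makes it harder to end up with a missing or misspelled delimiter.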
CMU Arpabet
CMU Arpabet is written as space-separated uppercase symbols. Vowels can include stress digits:
- 0 for unstressed vowels.
- 1 for primary stress.
- 2 for secondary stress.
For the full symbol inventory, see the CMUdict cmudict.symbols list. You can also look up words on the CMU Pronouncing Dictionary page.
Example:
Standard: I am an engineer.
With phoneme control: I am an <|phoneme_start|>EH1 N JH AH0 N IH1 R<|phoneme_end|>.
You can omit stress digits when you only need a rough pronunciation, but CMUdict-style output with stress digits usually gives the model the clearest signal.
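To catch typos before sending text, you can check a pronunciation against the CMUdict phoneme inventory. The sketch below hard-codes the 39 CMUdict phonemes and assumes that only vowels may carry a stress digit. The function names `invalid_symbols` and `strip_stress` are hypothetical.

```python
# The 39 CMUdict phonemes. Only vowels may carry a stress digit (0, 1, or 2).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}
CONSONANTS = {"B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M",
              "N", "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y",
              "Z", "ZH"}

def invalid_symbols(arpabet: str) -> list[str]:
    """Return the symbols that are not valid CMU Arpabet, in input order."""
    bad = []
    for symbol in arpabet.split():
        base = symbol.rstrip("012")
        digits = len(symbol) - len(base)
        is_vowel = base in VOWELS and digits <= 1
        is_consonant = base in CONSONANTS and digits == 0
        if not (is_vowel or is_consonant):
            bad.append(symbol)
    return bad

def strip_stress(arpabet: str) -> str:
    """Drop stress digits for a rough pronunciation."""
    return " ".join(symbol.rstrip("012") for symbol in arpabet.split())

print(invalid_symbols("EH1 N JH AH0 N IH1 R"))  # []
print(strip_stress("EH1 N JH AH0 N IH1 R"))     # EH N JH AH N IH R
```

Lowercase symbols are reported as invalid here because the format is uppercase. Normalize with `.upper()` first if your input may be mixed case.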
Common Examples
Use phoneme control when spelling alone is ambiguous:
The <|phoneme_start|>R IY1 D<|phoneme_end|> endpoint returns the current state.
The book was <|phoneme_start|>R EH1 D<|phoneme_end|> yesterday.
The <|phoneme_start|>B EY1 S<|phoneme_end|> line is too loud.
The <|phoneme_start|>B AE1 S<|phoneme_end|> swam upstream.
The <|phoneme_start|>P OW1 L IH0 SH<|phoneme_end|> team joined the call.
Please <|phoneme_start|>P AA1 L IH0 SH<|phoneme_end|> the final mix.
Use it for product names, acronyms, and technical terms:
Deploy with <|phoneme_start|>K UW2 B ER0 N EH1 T IY0 Z<|phoneme_end|>.
The query uses <|phoneme_start|>EH1 S K Y UW1 EH1 L<|phoneme_end|>.
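For recurring product names and acronyms, one option is to keep a small project lexicon and apply it to text before synthesis. This is a sketch under assumptions: `LEXICON` and `apply_lexicon` are hypothetical names, the pronunciations are copied from the examples above, and matching is whole-word and case-insensitive.

```python
import re

# Hypothetical project lexicon. Keys must be lowercase; values are CMU Arpabet.
LEXICON = {
    "kubernetes": "K UW2 B ER0 N EH1 T IY0 Z",
    "sql": "EH1 S K Y UW1 EH1 L",
}

def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Replace whole-word lexicon terms with phoneme tags."""
    if not lexicon:
        return text
    # Try longer terms first so a long term wins over a shorter one it contains.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )

    def tag(match: re.Match) -> str:
        arpabet = lexicon[match.group(1).lower()]
        return f"<|phoneme_start|>{arpabet}<|phoneme_end|>"

    return pattern.sub(tag, text)

print(apply_lexicon("Deploy with Kubernetes. The query uses SQL.", LEXICON))
```

Because the pattern uses word boundaries, punctuation stays outside the tag and a term embedded in a longer word, such as "MySQL", is left untouched.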
Generate CMU Arpabet
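The training pipeline uses CMUdict-style pronunciations. You can generate the same format with the cmudict Python package:
import cmudict

# Load CMUdict once: maps each lowercase word to a list of pronunciations.
entries = cmudict.dict()

def cmu_pronunciation(word: str) -> str | None:
    # Look up the word case-insensitively; unknown words return None.
    phones = entries.get(word.lower())
    if not phones:
        return None
    # Use the first listed pronunciation, joined into space-separated symbols.
    return " ".join(phones[0])

print(cmu_pronunciation("engineer"))
# EH1 N JH AH0 N IH1 R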
CMUdict may contain multiple pronunciations for the same word. Listen to the result and choose the variant that matches your intended accent or context.
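To compare variants by ear, it can help to render one test sentence per pronunciation. The sketch below assumes a mapping shaped like the one `cmudict.dict()` returns (word to a list of symbol lists). `SAMPLE_ENTRIES` is a hand-written stand-in so the example runs without the package, and `tagged_variants` is a hypothetical name.

```python
from collections.abc import Mapping, Sequence

# Stand-in for cmudict.dict(): word -> list of pronunciations (lists of symbols).
# The two "bass" entries mirror the examples earlier on this page.
SAMPLE_ENTRIES = {
    "bass": [["B", "AE1", "S"], ["B", "EY1", "S"]],
}

def tagged_variants(
    word: str, entries: Mapping[str, Sequence[Sequence[str]]]
) -> list[str]:
    """Return one phoneme-tagged string per pronunciation variant of `word`."""
    variants = entries.get(word.lower(), [])
    return [
        f"<|phoneme_start|>{' '.join(phones)}<|phoneme_end|>"
        for phones in variants
    ]

# Print one candidate sentence per variant, then synthesize each and listen.
for index, tagged in enumerate(tagged_variants("bass", SAMPLE_ENTRIES)):
    print(index, f"The {tagged} line is too loud.")
```

With the real dictionary, pass `cmudict.dict()` in place of `SAMPLE_ENTRIES`.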
Practical Tips
- Replace only the word whose pronunciation needs control.
- Strip punctuation before dictionary lookup, then put it back outside the tag.
- Use CMU Arpabet, not IPA, for English phoneme tags.
- For names and brands, write the pronunciation that you want the listener to hear, even when it differs from what the spelling suggests.