Overview
Japanese phoneme control uses OpenJTalk-style romaji phonemes plus pitch accent information. This is useful for Japanese homographs that have the same plain phoneme sequence but different pitch accents, such as 端が, 箸が, and 橋が.
Format
Put the pitch level digit immediately after each vowel-bearing mora:
- 0 means the current mora is low.
- 1 means the current mora is high.
- N can also carry a pitch digit.
- Consonants are written without spaces before the vowel they belong to, for example ha, shi, and ga.
- Use OpenJTalk phoneme symbols such as a, i, u, e, o, N, cl, ky, sh, ch, and ts.
端が, 箸が, and 橋が all share the plain phoneme sequence h a sh i g a, but the pitch markers disambiguate the word:
- 端が (end + subject marker): <|phoneme_start|>ha0shi1ga1<|phoneme_end|>
- 箸が (chopsticks + subject marker): <|phoneme_start|>ha1shi0ga0<|phoneme_end|>
- 橋が (bridge + subject marker): <|phoneme_start|>ha0shi1ga0<|phoneme_end|>
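The three tags above follow the common Tokyo-style pattern, in which one accent position fixes the level of every mora in the phrase. The sketch below derives the digits from that position. It assumes the Tokyo-style pattern, and the helper names mora_pitches and tag_moras are hypothetical, not part of any Fish Audio API.

```python
def mora_pitches(mora_count: int, accent_type: int) -> list[int]:
    """Return 0 (low) or 1 (high) for each mora, assuming the Tokyo-style pattern.

    accent_type 0 means the pitch never falls; k >= 1 means it falls after mora k.
    """
    pitches = []
    for pos in range(1, mora_count + 1):
        if accent_type == 1:
            high = pos == 1                  # starts high, low afterwards
        elif accent_type == 0:
            high = pos >= 2                  # starts low, then stays high
        else:
            high = 2 <= pos <= accent_type   # rises after mora 1, falls after mora k
        pitches.append(1 if high else 0)
    return pitches


def tag_moras(moras: list[str], accent_type: int) -> str:
    """Append a pitch digit to each mora and wrap the result in phoneme tags."""
    digits = mora_pitches(len(moras), accent_type)
    body = "".join(f"{mora}{digit}" for mora, digit in zip(moras, digits))
    return f"<|phoneme_start|>{body}<|phoneme_end|>"


print(tag_moras(["ha", "shi", "ga"], 0))  # 端が
print(tag_moras(["ha", "shi", "ga"], 1))  # 箸が
print(tag_moras(["ha", "shi", "ga"], 2))  # 橋が
```

Treat the output as a starting point. The accent type for a given word still has to come from a dictionary or from listening.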
Japanese pitch accent depends on the dictionary, reading, and dialect.
Generate the phoneme string from the same text you send to TTS, then listen
and adjust the digits when you need a specific accent.
Relation to ttslearn Prosody Symbols
The ttslearn Japanese Tacotron recipe shows how to extract phonemes and prosody symbols from OpenJTalk full-context labels. That recipe prints symbols such as [ for a pitch rise and ] for a pitch fall.
Fish Audio phoneme tags should not contain literal [ or ]. Convert that prosody into digit notation, such as ha0shi1ga0.
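One way to do that conversion is sketched below. It assumes the input is a ttslearn-style symbol list in which ^, $, ?, _ and # mark sentence edges, pauses, and accent phrase boundaries. The function names are hypothetical. Leaving cl without a digit is an assumption. The result is one string per accent phrase, because how phrases should be joined inside a tag is not specified here.

```python
DIGIT_BEARING = {"a", "i", "u", "e", "o", "N"}  # symbols that receive a pitch digit
BOUNDARIES = {"^", "$", "?", "_", "#"}          # sentence edges, pause, phrase boundary


def phrase_to_digits(symbols: list[str]) -> str:
    """Convert one accent phrase of phonemes plus [ and ] into digit notation."""
    # A phrase with a fall but no rise starts high (accent on the first mora).
    level = 1 if "]" in symbols and "[" not in symbols else 0
    out = []
    for sym in symbols:
        if sym == "[":
            level = 1                      # pitch rise: following moras are high
        elif sym == "]":
            level = 0                      # pitch fall: following moras are low
        elif sym in DIGIT_BEARING:
            out.append(f"{sym}{level}")
        else:
            out.append(sym)                # consonants (and cl, by assumption) get no digit
    return "".join(out)


def prosody_to_digits(symbols: list[str]) -> list[str]:
    """Split at boundary symbols and convert each accent phrase separately."""
    phrases, current = [], []
    for sym in symbols:
        if sym in BOUNDARIES:
            if current:
                phrases.append(phrase_to_digits(current))
                current = []
        else:
            current.append(sym)
    if current:
        phrases.append(phrase_to_digits(current))
    return phrases


print(prosody_to_digits("^ h a [ sh i ] g a $".split()))
```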
Generate Japanese Phonemes
You can generate Japanese phoneme strings with pyopenjtalk. The converter below follows the same full-context label logic used in training:
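As a minimal sketch of this kind of converter (an illustration, not the exact training code), the function below reads the current phoneme and the A field of each full-context label. It assumes that A1 is the mora position minus the accent type and that A2 is the mora position within the accent phrase. Skipping sil and pau and giving cl no digit are assumptions, as are the names labels_to_digits and text_to_phoneme_tag. The only pyopenjtalk call used is extract_fullcontext.

```python
import re

_PHONEME = re.compile(r"-([^+]+)\+")                # current phoneme sits between '-' and '+'
_ACCENT = re.compile(r"/A:(-?\d+)\+(\d+)\+(\d+)/")  # A1, A2, A3 of the current mora
_DEVOICED = {"A", "I", "U", "E", "O"}               # OpenJTalk marks devoiced vowels in uppercase
_DIGIT_BEARING = {"a", "i", "u", "e", "o", "N"}


def labels_to_digits(labels: list[str]) -> str:
    """Turn OpenJTalk full-context labels into phonemes with pitch digits."""
    out = []
    for label in labels:
        phoneme_match = _PHONEME.search(label)
        accent_match = _ACCENT.search(label)
        if phoneme_match is None or accent_match is None:
            continue                      # sil and pau carry "xx" instead of numbers
        phoneme = phoneme_match.group(1)
        if phoneme in _DEVOICED:
            phoneme = phoneme.lower()
        a1, a2 = int(accent_match.group(1)), int(accent_match.group(2))
        if a2 == 1:
            high = a1 == 0                # first mora is high only when it is the nucleus
        else:
            high = a1 <= 0 or a1 == a2    # up to the nucleus, or no fall at all (type 0)
        if phoneme in _DIGIT_BEARING:
            out.append(f"{phoneme}{1 if high else 0}")
        else:
            out.append(phoneme)           # consonants (and cl, by assumption) get no digit
    return "".join(out)


def text_to_phoneme_tag(text: str) -> str:
    """Run OpenJTalk on text and wrap the digit notation in phoneme tags."""
    import pyopenjtalk  # third-party; imported lazily so the label logic runs without it
    labels = pyopenjtalk.extract_fullcontext(text)
    return f"<|phoneme_start|>{labels_to_digits(labels)}<|phoneme_end|>"
```

Calling text_to_phoneme_tag("橋が") requires pyopenjtalk to be installed. The accent it reports depends on the OpenJTalk dictionary, so listen to the result and adjust the digits if needed.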
Processing Longer Text
For long Japanese text, split on punctuation and tag short Japanese runs instead of wrapping an entire paragraph. The training augmentation used short segments and skipped empty or very long spans.
Convert symbols such as % to their spoken form (パーセント) before extracting phonemes.
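A minimal sketch of that preprocessing, using only the standard library, is shown below. The 40-character cutoff, the punctuation set, and the function names are assumptions chosen for illustration. The actual limits used in training are not stated here.

```python
import re

MAX_CHARS = 40  # hypothetical cutoff for a "very long" span; tune for your data
_SPLIT_AFTER = re.compile(r"(?<=[。、！？])")      # split after Japanese punctuation, keeping it
_SPOKEN = {"%": "パーセント", "％": "パーセント"}  # symbols replaced by their spoken form


def normalize(text: str) -> str:
    """Replace symbols with their spoken form before phoneme extraction."""
    for symbol, spoken in _SPOKEN.items():
        text = text.replace(symbol, spoken)
    return text


def split_for_tagging(text: str, max_chars: int = MAX_CHARS) -> list[tuple[str, bool]]:
    """Return (segment, taggable) pairs; empty segments are dropped.

    Segments longer than max_chars are kept but marked as not taggable,
    so they can be sent to TTS as plain text.
    """
    result = []
    for segment in _SPLIT_AFTER.split(normalize(text)):
        segment = segment.strip()
        if not segment:
            continue
        result.append((segment, len(segment) <= max_chars))
    return result


for segment, taggable in split_for_tagging("橋が長い。端が50%です。"):
    print(segment, taggable)
```

Only the segments marked as taggable would then be converted to phoneme tags. The rest stay as ordinary text.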



