
Overview

Create custom voice models to generate consistent, high-quality speech. You can create models through our web interface or programmatically via API.

Web Interface

The easiest way to create a voice model:
  1. Go to Fish Audio: Visit fish.audio and log in
  2. Navigate to Models: Click "Models" in your dashboard
  3. Click Create Model: Select "Create New Model"
  4. Upload Your Audio: Add 1 or more voice samples (at least 10 seconds each)
  5. Configure Settings: Choose privacy settings and training options
  6. Start Training: Click "Create" and wait for processing

Using the API

Using the SDK

Create models with the Python or JavaScript SDK:
First, install the SDK:
pip install fish-audio-sdk
Then create a model:
from fishaudio import FishAudio

client = FishAudio(api_key="your_api_key_here")

with open("sample1.mp3", "rb") as f1, open("sample2.wav", "rb") as f2:
    voice = client.voices.create(
        title="My Voice Model",
        voices=[f1.read(), f2.read()],
        description="Custom voice for storytelling",
        visibility="private",
        enhance_audio_quality=True,
    )

# The Python SDK maps the REST `_id` field to `voice.id`.
print(f"Voice model ID: {voice.id}")

Direct API

Create models directly using the REST API:
The REST API accepts uploaded audio as multipart/form-data. Let your HTTP client set the multipart Content-Type boundary for you.
curl --request POST "https://api.fish.audio/model" \
  --header "Authorization: Bearer $FISH_API_KEY" \
  --form "type=tts" \
  --form "train_mode=fast" \
  --form "title=My Voice Model" \
  --form "visibility=private" \
  --form "description=Custom voice model" \
  --form "voices=@sample1.mp3" \
  --form "voices=@sample2.wav" \
  --form "enhance_audio_quality=true"

Model Settings

Required Parameters

| Parameter | Description | Type | Options |
| --- | --- | --- | --- |
| title | Name of your model | string | Any text |
| voices | One or more audio samples | File or Array<File> | .mp3, .wav, .m4a, .opus |
| type* | Model type | enum<string> | tts |
| train_mode* | Model train mode. fast means the model is instantly available after creation | enum<string> | fast |

*Automatically set by the Python and JavaScript SDKs

Optional Parameters

| Parameter | Description | Type | Options |
| --- | --- | --- | --- |
| visibility | Who can use your model | enum<string> | private, public, unlist (default: public) |
| description | Model description | string or null | Any text |
| cover_image | Model cover image; required if the model is public | File | .jpg, .png |
| texts | Transcripts of audio samples. If omitted, ASR transcribes the audio | string or Array<string> | Must match number of audio files |
| tags | Tags for your model | string or Array<string> | Any text |
| enhance_audio_quality | Remove background noise and normalize audio | boolean | true, false (default: true) |
| generate_sample | Generate a default sample text for the model | boolean | true, false (default: false) |
The REST API defaults visibility to public. The SDK examples above set visibility to private, which is safer for personal voice models and avoids requiring a public cover_image.
For detailed explanations, see our API reference.

Audio Requirements

Quality Guidelines

Minimum Requirements:
  • At least 1 audio sample
  • 10+ seconds per sample
Best Practices:
  • Use multiple diverse samples
  • 1 consistent speaker throughout
  • Include different emotions and tones
  • Record in a quiet environment
  • Maintain steady volume
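Before uploading, you can verify the 10-second minimum locally. A minimal sketch using only Python's standard-library wave module (so it covers WAV files; MP3, M4A, and Opus samples would need a third-party library). The wav_duration and too_short helpers are illustrative, not part of the SDK:

```python
import wave

MIN_SECONDS = 10.0  # minimum length per sample, per the guidelines above

def wav_duration(path: str) -> float:
    """Duration of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def too_short(paths: list[str]) -> list[str]:
    """Return the samples shorter than the minimum length."""
    return [p for p in paths if wav_duration(p) < MIN_SECONDS]
```

Run too_short over your sample paths and re-record anything it flags before creating the model.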

Adding Transcripts

Including text transcripts improves model quality:
import requests

with open("hello.mp3", "rb") as f1, open("world.wav", "rb") as f2:
    response = requests.post(
        "https://api.fish.audio/model",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files=[
            ("voices", f1),
            ("voices", f2),
        ],
        data=[
            ("type", "tts"),
            ("train_mode", "fast"),
            ("title", "Enhanced Model"),
            ("texts", "Hello, this is my first recording."),
            ("texts", "Welcome to the world of AI voices."),
            ("visibility", "private"),
        ],
    )

response.raise_for_status()
print(response.json()["_id"])
Text transcripts must match the exact number of audio files. If you provide 3 audio files, you must provide exactly 3 text transcripts.
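Since the count rule is easy to break when files and transcripts are assembled separately, it can help to build the repeated multipart fields with a small guard. A sketch; the build_form_fields helper is illustrative, not part of the SDK:

```python
def build_form_fields(voices, texts=None):
    """Build the repeated multipart fields for a model-creation request,
    enforcing the one-transcript-per-audio-file rule up front."""
    if texts is not None and len(texts) != len(voices):
        raise ValueError(
            f"got {len(texts)} transcripts for {len(voices)} audio files"
        )
    files = [("voices", v) for v in voices]
    data = [("texts", t) for t in (texts or [])]
    return files, data
```

The returned pairs can be passed to requests.post as the files= and data= arguments, alongside the required type, train_mode, and title fields shown above.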

Using Your Model

Once training is complete, use the SDK voice.id or the REST response _id as the TTS reference_id.
from fishaudio import FishAudio
from fishaudio.utils import save

client = FishAudio(api_key="your_api_key_here")

audio = client.tts.convert(
    text="Hello from my custom voice!",
    reference_id="your_voice_model_id",
    format="mp3",
)

save(audio, "output.mp3")

Troubleshooting

Common Issues

Model training fails:
  • Check audio quality and format
  • Ensure single speaker in all samples
  • Verify files are not corrupted
  • Confirm REST requests include type=tts, train_mode=fast, title, and at least one voices file
  • If texts are provided, make sure the count matches the number of voices files
Poor voice quality:
  • Add more diverse audio samples
  • Enable audio enhancement
  • Use higher quality recording
Public model creation fails:
  • Add a cover_image, or set visibility to private or unlist
Cannot use the created voice in TTS:
  • Use REST _id or SDK voice.id as the TTS reference_id
  • If the model state is not trained, check it with Get Model
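The state check above can be automated with a small polling loop. A sketch, assuming a fetch_state callable that wraps the Get Model lookup and returns the model's state string; the "failed" state name here is an assumption for illustration (the page only mentions "trained"):

```python
import time

def wait_until_trained(fetch_state, model_id, timeout=300.0, interval=5.0):
    """Poll fetch_state(model_id) until the model reports "trained".

    fetch_state is any callable wrapping the Get Model endpoint;
    injecting it keeps the loop independent of a specific HTTP client.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch_state(model_id)
        if state == "trained":
            return state
        if state == "failed":  # assumed failure state, for illustration
            raise RuntimeError(f"model {model_id} failed to train")
        time.sleep(interval)
    raise TimeoutError(f"model {model_id} not trained after {timeout}s")
```

With fast train mode the model is normally available immediately, so the loop should return on the first poll in practice.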

Best Practices

  1. Start Simple: Begin with 2-3 samples in fast mode to test
  2. Iterate: Refine with cleaner samples, transcripts, and audio enhancement
  3. Document: Keep track of which samples work best
  4. Test Thoroughly: Try different texts and emotions
  5. Privacy First: Keep personal models private

Support

Need help creating models?