Overview

Create custom voice models to generate consistent, high-quality speech. You can create models through the web interface or programmatically via the API.

Web Interface

The easiest way to create a voice model:
  1. Go to Fish Audio: Visit fish.audio and log in
  2. Navigate to Models: Click on “Models” in your dashboard
  3. Click Create Model: Select “Create New Model”
  4. Upload Your Audio: Add 1 or more voice samples (at least 10 seconds each)
  5. Configure Settings: Choose privacy settings and training options
  6. Start Training: Click “Create” and wait for processing

Using the API

Using the SDK

Create models with the Python or JavaScript SDK:
First, install the SDK:
pip install fish-audio-sdk
Then create a model:
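from fish_audio_sdk import Session

# Initialize session with your API key
session = Session("your_api_key")

# Read the voice samples and the optional cover image as bytes
# (file names are placeholders; use your own paths)
with open("sample1.mp3", "rb") as voice_file1, \
     open("sample2.wav", "rb") as voice_file2, \
     open("cover.png", "rb") as image_file:
    # Create the model
    model = session.create_model(
        title="My Voice Model",
        description="Custom voice for storytelling",
        voices=[
            voice_file1.read(),
            voice_file2.read()
        ],
        cover_image=image_file.read()  # Optional
    )

print(f"Model created: {model.id}")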

Direct API

Create models directly using the REST API:
import requests

# Open the samples in binary mode; the files are closed once the upload finishes
with open("sample1.mp3", "rb") as sample1, open("sample2.wav", "rb") as sample2:
    response = requests.post(
        "https://api.fish.audio/model",
        files=[
            ("voices", sample1),
            ("voices", sample2)
        ],
        data=[
            ("title", "My Voice Model"),
            ("description", "Custom voice model"),
            ("visibility", "private"),
            ("type", "tts"),
            ("train_mode", "fast"),
            ("enhance_audio_quality", "true")
        ],
        headers={
            "Authorization": "Bearer YOUR_API_KEY"
        }
    )

# Raise an exception on HTTP errors before reading the body
response.raise_for_status()
result = response.json()
print(f"Model ID: {result['id']}")

Model Settings

Required Parameters

| Parameter | Description | Type | Options |
|---|---|---|---|
| title | Name of your model | string | Any text |
| voices | Audio samples | Array<File> | .mp3, .wav, .m4a, .opus |
| type* | Model type | enum<string> | tts |
| train_mode* | Model training mode; fast means the model is available immediately after creation | enum<string> | fast |

*Automatically set by the Python and JavaScript SDKs

Optional Parameters

| Parameter | Description | Type | Options |
|---|---|---|---|
| visibility | Who can use your model | enum<string> | private, public, unlist (default: public) |
| description | Model description | string | Any text |
| cover_image | Model cover image, required if the model is public | File | .jpg, .png |
| texts | Transcripts of audio samples | Array<string> | Must match number of audio files |
| tags | Tags for your model | string[] | Any text |
| enhance_audio_quality | Remove background noise | boolean | true, false (default: false) |

For detailed explanations, see our API reference.
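
The parameter rules above can be checked locally before sending a request. The following is a minimal sketch, not part of the SDK or the API: the function name `validate_model_settings` and its return format are hypothetical, and it only encodes the constraints listed in the tables (allowed visibility values, a cover image for public models, and one transcript per audio file).

```python
# Hypothetical client-side check; the names here are illustrative assumptions.
ALLOWED_VISIBILITY = {"private", "public", "unlist"}


def validate_model_settings(title, voices, visibility="public",
                            cover_image=None, texts=None):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    if not title:
        problems.append("title is required")
    if not voices:
        problems.append("at least one voice sample is required")
    if visibility not in ALLOWED_VISIBILITY:
        problems.append("visibility must be private, public, or unlist")
    # The table states that a cover image is required for public models
    if visibility == "public" and cover_image is None:
        problems.append("cover_image is required for public models")
    # Transcripts are optional, but must match the number of audio files
    if texts is not None and len(texts) != len(voices):
        problems.append("texts must match the number of voice samples")
    return problems


if __name__ == "__main__":
    # visibility defaults to public, so the missing cover image is reported
    print(validate_model_settings("My Voice Model", [b"audio-bytes"]))
```

Returning a list rather than raising on the first problem lets a caller report every issue at once; the server remains the final authority on what is accepted.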

Audio Requirements

Quality Guidelines

Minimum Requirements:
  • At least 1 audio sample
  • 10+ seconds per sample
Best Practices:
  • Use multiple diverse samples
  • Use one consistent speaker throughout
  • Include different emotions and tones
  • Record in a quiet environment
  • Maintain steady volume
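
The 10-second minimum can be verified before uploading. Below is a minimal sketch that uses only the Python standard library and therefore handles uncompressed .wav files only; checking .mp3, .m4a, or .opus would need a third-party audio library. The helper names `wav_duration_seconds` and `is_long_enough` are hypothetical.

```python
import wave

MIN_SECONDS = 10.0  # minimum sample length stated in the requirements above


def wav_duration_seconds(path):
    """Return the duration of a .wav file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration is the number of frames divided by frames per second
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, min_seconds=MIN_SECONDS):
    """Return True if the .wav file meets the minimum duration."""
    return wav_duration_seconds(path) >= min_seconds
```

For example, `is_long_enough("sample2.wav")` returns False for a 5-second clip, which would be worth re-recording before creating the model.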

Adding Transcripts

Including text transcripts improves model quality:
import requests

response = requests.post(
    "https://api.fish.audio/model",
    files=[
        ("voices", open("hello.mp3", "rb")),
        ("voices", open("world.wav", "rb"))
    ],
    data=[
        # One "texts" entry per "voices" file
        ("title", "Enhanced Model"),
        ("texts", "Hello, this is my first recording."),
        ("texts", "Welcome to the world of AI voices."),
        # ... other parameters
    ],
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
Text transcripts must match the exact number of audio files. If you provide 3 audio files, you must provide exactly 3 text transcripts.
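
One way to avoid a count mismatch is to build the form fields from paired lists and fail early when they differ. This is a minimal sketch; the helper `build_form_fields` is hypothetical, and it assumes that transcripts are matched to audio files by position, which the text above does not state explicitly.

```python
def build_form_fields(title, audio_paths, transcripts):
    """Build the `data` list for the request, one "texts" entry per audio file.

    Raises ValueError when the number of transcripts does not equal the
    number of audio files.
    """
    if len(audio_paths) != len(transcripts):
        raise ValueError(
            f"{len(audio_paths)} audio files but {len(transcripts)} transcripts"
        )
    fields = [("title", title)]
    # Assumption: the i-th transcript describes the i-th audio file
    fields.extend(("texts", text) for text in transcripts)
    return fields


if __name__ == "__main__":
    fields = build_form_fields(
        "Enhanced Model",
        ["hello.mp3", "world.wav"],
        ["Hello, this is my first recording.",
         "Welcome to the world of AI voices."],
    )
    print(fields)
```

The returned list can be passed as the `data` argument of `requests.post`, alongside a `files` list built from the same `audio_paths` in the same order.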

Using Your Model

Once training is complete:
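import requests

model_id = "your_model_id"  # Placeholder: the ID returned when the model was created

# Generate speech with your model
response = requests.post(
    "https://api.fish.audio/v1/tts",
    json={
        "text": "Hello from my custom voice!",
        "model_id": model_id,
        "format": "mp3"
    },
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
response.raise_for_status()  # Stop here if the request failed

# Save the audio
with open("output.mp3", "wb") as f:
    f.write(response.content)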

Troubleshooting

Common Issues

Model training fails:
  • Check audio quality and format
  • Ensure a single speaker in all samples
  • Verify files are not corrupted
Poor voice quality:
  • Add more diverse audio samples
  • Enable audio enhancement
  • Use higher-quality recordings
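
Some of these checks can be automated before upload. The sketch below is a rough local pre-check under stated assumptions: the helper `precheck_sample` is hypothetical, the extension list is taken from the voices parameter in this guide, and an empty file is only a crude proxy for corruption. It does not decode the audio or detect multiple speakers.

```python
from pathlib import Path

# Formats listed for the voices parameter in this guide
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".opus"}


def precheck_sample(path):
    """Return a list of issues found for one audio sample (empty if none)."""
    sample = Path(path)
    issues = []
    if sample.suffix.lower() not in SUPPORTED_EXTENSIONS:
        issues.append(f"unsupported format: {sample.suffix or '(none)'}")
    if not sample.is_file():
        issues.append("file not found")
    elif sample.stat().st_size == 0:
        # A zero-byte file cannot contain audio
        issues.append("file is empty")
    return issues
```

Running this over every sample and fixing any reported issue first narrows a training failure down to causes that need listening, such as background noise or a second speaker.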

Best Practices

  1. Start Simple: Begin with 2-3 samples in fast mode to test
  2. Iterate: Refine with more samples and quality mode
  3. Document: Keep track of which samples work best
  4. Test Thoroughly: Try different texts and emotions
  5. Privacy First: Keep personal models private

Support

Need help creating models?