Creating Voice Models

Overview

Create custom voice models to generate consistent, high-quality speech. You can create models through our web interface or programmatically via API.

Web Interface

The easiest way to create a voice model:

1. Go to Fish Audio: Visit fish.audio and log in.
2. Navigate to Models: Click “Models” in your dashboard.
3. Click Create Model: Select “Create New Model”.
4. Upload Your Audio: Add 1 or more voice samples (at least 10 seconds each).
5. Configure Settings: Choose privacy settings and training options.
6. Start Training: Click “Create” and wait for processing.

Using the API

Using the SDK

Create models with the Python or JavaScript SDK:

Python

First, install the SDK:

pip install fish-audio-sdk

Then create a model:

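from pathlib import Path

from fish_audio_sdk import Session

# Initialize session with your API key
session = Session("your_api_key")

# Read the voice samples and the optional cover image as bytes
# (the file names are placeholders)
voice1 = Path("sample1.mp3").read_bytes()
voice2 = Path("sample2.wav").read_bytes()
cover = Path("cover.png").read_bytes()

# Create the model
model = session.create_model(
    title="My Voice Model",
    description="Custom voice for storytelling",
    voices=[voice1, voice2],
    cover_image=cover,  # Optional
)

print(f"Model created: {model.id}")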

Direct API

Create models directly using the REST API:

Python

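import requests

# Open the samples in binary mode; the with block closes them after the upload
with open("sample1.mp3", "rb") as sample1, open("sample2.wav", "rb") as sample2:
    response = requests.post(
        "https://api.fish.audio/model",
        # Repeat the "voices" field once per audio sample
        files=[
            ("voices", sample1),
            ("voices", sample2),
        ],
        data=[
            ("title", "My Voice Model"),
            ("description", "Custom voice model"),
            ("visibility", "private"),
            ("type", "tts"),
            ("train_mode", "fast"),
            ("enhance_audio_quality", "true"),
        ],
        headers={
            "Authorization": "Bearer YOUR_API_KEY"
        },
    )

# Stop here if the API returned an error status
response.raise_for_status()

result = response.json()
print(f"Model ID: {result['id']}")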

Model Settings

Required Parameters

Parameter	Description	Type	Options
title	Name of your model	`string`	Any text
voices	Audio samples	`Array<File>`	.mp3, .wav, .m4a, .opus
type*	Model type	`enum<string>`	`tts`
train_mode*	Training mode; `fast` makes the model available immediately after creation	`enum<string>`	`fast`

*Automatically set by Python and JavaScript SDKs
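
As a rough sketch of how the required fields fit together in a direct API call, the helper below assembles them as form fields and fills in `type` and `train_mode` with the values from the table. The function name `required_fields` is hypothetical and not part of any SDK, and nothing is sent over the network.

```python
def required_fields(title, type_="tts", train_mode="fast"):
    """Return the required form fields as (name, value) pairs.

    Hypothetical helper for illustration only. Voice files are not
    included here; they are sent separately as multipart file parts
    named "voices".
    """
    if not title or not title.strip():
        raise ValueError("title must be a non-empty string")
    return [
        ("title", title),
        ("type", type_),
        ("train_mode", train_mode),
    ]


fields = required_fields("My Voice Model")
print(fields)
```

The resulting list can be passed as the `data` argument of a `requests.post` call like the one in the Direct API section.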

Optional Parameters

Parameter	Description	Type	Options
visibility	Who can use your model	`enum<string>`	`private`, `public`, `unlist` (default: `public`)
description	Model description	`string`	Any text
cover_image	Model cover image; required if the model is public	`File`	.jpg, .png
texts	Transcripts of audio samples	`Array<string>`	Must match number of audio files
tags	Tags for your model	`Array<string>`	Any text
enhance_audio_quality	Remove background noise	`boolean`	`true`, `false` (default: `false`)

For detailed explanations, see our API reference.
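
Two of the optional parameters interact: `visibility` defaults to `public`, and a public model needs a cover image. The sketch below checks that combination before any upload is attempted. `check_visibility` is a hypothetical helper based only on the table above; the API may apply further rules that are not covered here.

```python
ALLOWED_VISIBILITY = {"private", "public", "unlist"}


def check_visibility(visibility="public", has_cover_image=False):
    """Return a list of problems found; an empty list means none.

    Hypothetical pre-flight check, not part of any SDK.
    """
    problems = []
    if visibility not in ALLOWED_VISIBILITY:
        problems.append(f"unknown visibility: {visibility!r}")
    if visibility == "public" and not has_cover_image:
        problems.append("public models need a cover image")
    return problems


print(check_visibility("private"))
print(check_visibility())  # defaults: public, no cover image
```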

Audio Requirements

Quality Guidelines

Minimum Requirements:

At least 1 audio sample
10+ seconds per sample

Best Practices:

Use multiple diverse samples
One consistent speaker throughout
Include different emotions and tones
Record in a quiet environment
Maintain steady volume
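
The 10-second minimum can be checked locally before uploading. The sketch below uses Python's standard `wave` module, so it only handles uncompressed WAV files; measuring .mp3, .m4a, or .opus samples would need a third-party audio library. The function names are illustrative, and the demo writes a silent file purely to have something to measure.

```python
import os
import tempfile
import wave

MIN_SECONDS = 10.0  # minimum sample length stated above


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, min_seconds=MIN_SECONDS):
    """Return True if the WAV file at `path` meets the minimum length."""
    return wav_duration_seconds(path) >= min_seconds


# Demo: write 12 seconds of silence (8 kHz, 16-bit mono) and check it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(8000)
        out.writeframes(b"\x00\x00" * 8000 * 12)
    print(wav_duration_seconds(demo_path), is_long_enough(demo_path))
```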

Adding Transcripts

Including text transcripts improves model quality:

Python

import requests

response = requests.post(
    "https://api.fish.audio/model",
    files=[
        ("voices", open("hello.mp3", "rb")),
        ("voices", open("world.wav", "rb"))
    ],
    data=[
        ("title", "Enhanced Model"),
        # One "texts" entry per audio file, in the same order as the files
        ("texts", "Hello, this is my first recording."),
        ("texts", "Welcome to the world of AI voices."),
        # ... other parameters
    ],
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

Text transcripts must match the exact number of audio files. If you provide 3 audio files, you must provide exactly 3 text transcripts.
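
One way to catch a count mismatch before sending the request is to build the `texts` fields through a small helper that compares the two lists first. This is a minimal sketch; `pair_transcripts` is a hypothetical name, and the file names are the placeholders used in the example above.

```python
def pair_transcripts(voice_paths, texts):
    """Return one ("texts", value) form field per audio file, in order.

    Hypothetical helper: raises ValueError when the counts differ.
    """
    if len(voice_paths) != len(texts):
        raise ValueError(
            f"{len(voice_paths)} audio files but {len(texts)} transcripts"
        )
    return [("texts", text) for text in texts]


data = [("title", "Enhanced Model")]
data += pair_transcripts(
    ["hello.mp3", "world.wav"],
    [
        "Hello, this is my first recording.",
        "Welcome to the world of AI voices.",
    ],
)
print(data)
```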

Using Your Model

Once training is complete:

Python

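import requests

# ID returned when the model was created (placeholder value)
model_id = "your_model_id"

# Generate speech with your model
response = requests.post(
    "https://api.fish.audio/v1/tts",
    json={
        "text": "Hello from my custom voice!",
        "model_id": model_id,
        "format": "mp3"
    },
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

# Stop here if the API returned an error instead of audio
response.raise_for_status()

# Save the audio
with open("output.mp3", "wb") as f:
    f.write(response.content)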

Troubleshooting

Common Issues

Model training fails:

Check audio quality and format
Ensure single speaker in all samples
Verify files are not corrupted

Poor voice quality:

Add more diverse audio samples
Enable audio enhancement
Use higher quality recording
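
Some of the checks for failed training can be automated before upload. The sketch below flags files whose extension is not in the supported list (.mp3, .wav, .m4a, .opus) and files that are missing or empty. It is a hypothetical helper and does not decode the audio, so it cannot detect every kind of corruption.

```python
import os

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".opus"}


def precheck_sample(path):
    """Return a list of obvious problems with an audio sample file."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) == 0:
        problems.append("file is empty")
    return problems


print(precheck_sample("notes.txt"))
```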

Best Practices

Start Simple: Begin with 2-3 samples in fast mode to test
Iterate: Refine with more samples and quality mode
Document: Keep track of which samples work best
Test Thoroughly: Try different texts and emotions
Privacy First: Keep personal models private

Support

Need help creating models?

API Documentation: Full API Reference
Discord Community: Join our Discord
Email Support: support@fish.audio
