Overview
Create custom voice models to generate consistent, high-quality speech. You can create models through our web interface or programmatically via API.
Web Interface
The easiest way to create a voice model:
1. Navigate to Models: click “Models” in your dashboard.
2. Create Model: select “Create New Model”.
3. Upload Your Audio: add one or more voice samples (at least 10 seconds each).
4. Configure Settings: choose privacy settings and training options.
5. Start Training: click “Create” and wait for processing.
Using the API
Using the SDK
Create models with the Python or JavaScript SDK:
First, install the SDK:

pip install fish-audio-sdk
Then create a model:

from fishaudio import FishAudio

client = FishAudio(api_key="your_api_key_here")

with open("sample1.mp3", "rb") as f1, open("sample2.wav", "rb") as f2:
    voice = client.voices.create(
        title="My Voice Model",
        voices=[f1.read(), f2.read()],
        description="Custom voice for storytelling",
        visibility="private",
        enhance_audio_quality=True,
    )

# The Python SDK maps the REST `_id` field to `voice.id`.
print(f"Voice model ID: {voice.id}")
First, install the SDK, then create a model:

import { FishAudioClient } from "fish-audio";
import { createReadStream } from "fs";

const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

try {
  const response = await fishAudio.voices.ivc.create({
    title: "My Voice Model",
    voices: [
      createReadStream("sample1.mp3"),
      createReadStream("sample2.wav"),
    ],
    description: "Custom voice for storytelling",
    visibility: "private",
    enhance_audio_quality: true,
  });

  console.log("Voice created:", {
    _id: response._id,
    title: response.title,
    state: response.state,
  });
} catch (err) {
  console.error("Create voice request failed:", err);
}
Direct API
Create models directly using the REST API:
The REST API accepts uploaded audio as multipart/form-data. Let your HTTP
client set the multipart Content-Type boundary for you.
curl --request POST "https://api.fish.audio/model" \
  --header "Authorization: Bearer $FISH_API_KEY" \
  --form "type=tts" \
  --form "train_mode=fast" \
  --form "title=My Voice Model" \
  --form "visibility=private" \
  --form "description=Custom voice model" \
  --form "voices=@sample1.mp3" \
  --form "voices=@sample2.wav" \
  --form "enhance_audio_quality=true"
import requests

with open("sample1.mp3", "rb") as f1, open("sample2.wav", "rb") as f2:
    response = requests.post(
        "https://api.fish.audio/model",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        data=[
            ("type", "tts"),
            ("train_mode", "fast"),
            ("title", "My Voice Model"),
            ("description", "Custom voice model"),
            ("visibility", "private"),
            ("enhance_audio_quality", "true"),
        ],
        files=[
            ("voices", f1),
            ("voices", f2),
        ],
    )

response.raise_for_status()
result = response.json()
print(f"Model ID: {result['_id']}")
print(f"State: {result['state']}")
import { readFile } from "fs/promises";

const form = new FormData();
form.append("title", "My Voice Model");
form.append("description", "Custom voice model");
form.append("visibility", "private");
form.append("type", "tts");
form.append("train_mode", "fast");
form.append("enhance_audio_quality", "true");

const v1 = await readFile("sample1.mp3");
const v2 = await readFile("sample2.wav");
form.append("voices", new Blob([v1]), "sample1.mp3");
form.append("voices", new Blob([v2]), "sample2.wav");

const res = await fetch("https://api.fish.audio/model", {
  method: "POST",
  headers: { Authorization: "Bearer <YOUR_API_KEY>" },
  body: form,
});

if (!res.ok) throw new Error(await res.text());

const result = await res.json();
console.log("Model ID:", result._id);
console.log("State:", result.state);
Model Settings
Required Parameters
| Parameter | Description | Type | Options |
|---|---|---|---|
| title | Name of your model | string | Any text |
| voices | One or more audio samples | File or Array<File> | .mp3, .wav, .m4a, .opus |
| type* | Model type | enum<string> | tts |
| train_mode* | Model train mode. fast means the model is instantly available after creation | enum<string> | fast |

*Automatically set by the Python and JavaScript SDKs
Optional Parameters
| Parameter | Description | Type | Options |
|---|---|---|---|
| visibility | Who can use your model | enum<string> | private, public, unlist (default: public) |
| description | Model description | string or null | Any text |
| cover_image | Model cover image, required if the model is public | File | .jpg, .png |
| texts | Transcripts of audio samples. If omitted, ASR transcribes the audio | string or Array<string> | Must match number of audio files |
| tags | Tags for your model | string or Array<string> | Any text |
| enhance_audio_quality | Remove background noise and normalize audio | boolean | true, false (default: true) |
| generate_sample | Generate a default sample text for the model | boolean | true, false (default: false) |
The REST API defaults visibility to public. The SDK examples above set
visibility to private, which is safer for personal voice models and avoids
requiring a public cover_image.
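As a sketch of that rule, a small client-side check can catch a missing cover_image before the request is sent. The build_model_fields helper below is hypothetical (not part of the SDK); it only assembles the non-file form fields and enforces the documented visibility/cover_image constraint:

```python
def build_model_fields(title, visibility="private", cover_image_path=None):
    """Assemble the non-file form fields for a create-model request.

    Enforces the documented rule that public models need a cover_image.
    """
    if visibility not in ("private", "public", "unlist"):
        raise ValueError(f"unknown visibility: {visibility!r}")
    if visibility == "public" and cover_image_path is None:
        raise ValueError("public models require a cover_image")
    # Required fields the REST API expects on every request.
    return [
        ("type", "tts"),
        ("train_mode", "fast"),
        ("title", title),
        ("visibility", visibility),
    ]
```

Failing fast locally gives a clearer error than a rejected HTTP request.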
For detailed explanations, see our API reference.
Audio Requirements
Quality Guidelines
Minimum Requirements:
- At least 1 audio sample
- 10+ seconds per sample
Best Practices:
- Use multiple diverse samples
- 1 consistent speaker throughout
- Include different emotions and tones
- Record in a quiet environment
- Maintain steady volume
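For .wav samples, the 10-second minimum can be checked locally before upload. This sketch uses only Python's standard wave module; the function names are illustrative:

```python
import wave

MIN_SECONDS = 10.0

def wav_duration(path: str) -> float:
    """Duration of an uncompressed PCM .wav file, in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())

def long_enough(path: str, minimum: float = MIN_SECONDS) -> bool:
    """True if the sample meets the documented 10-second minimum."""
    return wav_duration(path) >= minimum
```

Note that wave reads only PCM .wav files; checking .mp3, .m4a, or .opus samples would need an external tool such as ffprobe.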
Adding Transcripts
Including text transcripts improves model quality:
import requests

with open("hello.mp3", "rb") as f1, open("world.wav", "rb") as f2:
    response = requests.post(
        "https://api.fish.audio/model",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files=[
            ("voices", f1),
            ("voices", f2),
        ],
        data=[
            ("type", "tts"),
            ("train_mode", "fast"),
            ("title", "Enhanced Model"),
            ("texts", "Hello, this is my first recording."),
            ("texts", "Welcome to the world of AI voices."),
            ("visibility", "private"),
        ],
    )

response.raise_for_status()
print(response.json()["_id"])
import { FishAudioClient } from "fish-audio";
import { createReadStream } from "fs";

const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

const response = await fishAudio.voices.ivc.create({
  title: "Enhanced Model",
  voices: [
    createReadStream("hello.mp3"),
    createReadStream("world.wav"),
  ],
  texts: [
    "Hello, this is my first recording.",
    "Welcome to the world of AI voices.",
  ],
  // other optional fields:
  // visibility: "private",
  // enhance_audio_quality: true,
});

console.log("Model ID:", response._id);
Text transcripts must match the exact number of audio files. If you provide 3
audio files, you must provide exactly 3 text transcripts.
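One way to enforce that rule in code is to pair files and transcripts before building the request. The pair_voices_and_texts helper below is a hypothetical sketch, not an SDK function; it returns entries shaped for the requests-style files/data lists used above:

```python
def pair_voices_and_texts(audio_paths, transcripts=None):
    """Return (voices, texts) form entries, validating counts.

    The API requires exactly one transcript per audio file
    whenever transcripts are supplied at all.
    """
    if transcripts is not None and len(transcripts) != len(audio_paths):
        raise ValueError(
            f"{len(audio_paths)} audio files but {len(transcripts)} transcripts"
        )
    voices = [("voices", path) for path in audio_paths]
    texts = [("texts", t) for t in (transcripts or [])]
    return voices, texts
```

Omitting transcripts entirely is still valid; the API then falls back to ASR.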
Using Your Model
Once training is complete:
Use the SDK voice.id or the REST response _id as the TTS reference_id.
from fishaudio import FishAudio
from fishaudio.utils import save

client = FishAudio(api_key="your_api_key_here")

audio = client.tts.convert(
    text="Hello from my custom voice!",
    reference_id="your_voice_model_id",
    format="mp3",
)

save(audio, "output.mp3")
import { FishAudioClient } from "fish-audio";
import { writeFile } from "fs/promises";

const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

const audio = await fishAudio.textToSpeech.convert({
  text: "Hello from my custom voice!",
  reference_id: "your_voice_model_id",
  format: "mp3",
});

const buffer = Buffer.from(await new Response(audio).arrayBuffer());
await writeFile("output.mp3", buffer);
console.log("✓ Audio saved to output.mp3");
Troubleshooting
Common Issues
Model training fails:
- Check audio quality and format
- Ensure single speaker in all samples
- Verify files are not corrupted
- Confirm REST requests include type=tts, train_mode=fast, title, and at least one voices file
- If texts are provided, make sure the count matches the number of voices files
Poor voice quality:
- Add more diverse audio samples
- Enable audio enhancement
- Use higher quality recording
Public model creation fails:
- Add a cover_image, or set visibility to private or unlist
Cannot use the created voice in TTS:
- Use the REST _id or SDK voice.id as the TTS reference_id
- If the model state is not trained, check it with Get Model
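The state check above can be wrapped in a simple polling loop. The sketch below is transport-agnostic: get_state stands in for whatever call fetches the model (for example, via the Get Model endpoint), and the "failed" state name is an assumption, not confirmed by this page:

```python
import time

def wait_until_trained(get_state, model_id, timeout=300.0, interval=5.0):
    """Poll get_state(model_id) until it reports "trained".

    Returns True when the model is trained, False on timeout, and
    raises RuntimeError if the model reports a (hypothetical)
    "failed" state.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state(model_id)
        if state == "trained":
            return True
        if state == "failed":
            raise RuntimeError(f"model {model_id} failed to train")
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With train_mode set to fast the model should be available immediately, so in practice the loop exits on the first iteration.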
Best Practices
- Start Simple: Begin with 2-3 samples in fast mode to test
- Iterate: Refine with cleaner samples, transcripts, and audio enhancement
- Document: Keep track of which samples work best
- Test Thoroughly: Try different texts and emotions
- Privacy First: Keep personal models private
Support
Need help creating models?