## Overview

Create custom voice models to generate consistent, high-quality speech. You can create models through our web interface or programmatically via the API.
## Web Interface

The easiest way to create a voice model:
1. **Navigate to Models**: click "Models" in your dashboard.
2. **Create Model**: select "Create New Model".
3. **Upload Your Audio**: add one or more voice samples (at least 10 seconds each).
4. **Configure Settings**: choose privacy settings and training options.
5. **Start Training**: click "Create" and wait for processing.
## Using the API
### Using the SDK

Create models with the Python or JavaScript SDK.

First, install the SDK:

```bash
pip install fish-audio-sdk
```
Then create a model:

```python
from fish_audio_sdk import Session

# Initialize session with your API key
session = Session("your_api_key")

# Read the audio samples (and an optional cover image) from disk
with open("sample1.mp3", "rb") as voice_file1, \
     open("sample2.wav", "rb") as voice_file2, \
     open("cover.png", "rb") as image_file:
    # Create the model
    model = session.create_model(
        title="My Voice Model",
        description="Custom voice for storytelling",
        voices=[
            voice_file1.read(),
            voice_file2.read(),
        ],
        cover_image=image_file.read(),  # Optional
    )

print(f"Model created: {model.id}")
```
For JavaScript, first install the SDK:

```bash
npm install fish-audio
```

Then create a model:

```javascript
import { FishAudioClient } from "fish-audio";
import { createReadStream } from "fs";

const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

const title = "My Voice Model";
const audioFile1 = createReadStream("sample1.mp3");
// Optionally add more samples:
// const audioFile2 = createReadStream("sample2.wav");
const coverImageFile = createReadStream("cover.png"); // optional

try {
  const response = await fishAudio.voices.ivc.create({
    title,
    voices: [audioFile1],
    cover_image: coverImageFile,
    description: "Custom voice for storytelling",
    visibility: "private",
  });
  console.log("Voice created:", {
    id: response._id,
    title: response.title,
    state: response.state,
  });
} catch (err) {
  console.error("Create voice request failed:", err);
}
```
### Direct API

Create models directly using the REST API:
```python
import requests

response = requests.post(
    "https://api.fish.audio/model",
    files=[
        ("voices", open("sample1.mp3", "rb")),
        ("voices", open("sample2.wav", "rb")),
    ],
    data=[
        ("title", "My Voice Model"),
        ("description", "Custom voice model"),
        ("visibility", "private"),
        ("type", "tts"),
        ("train_mode", "fast"),
        ("enhance_audio_quality", "true"),
    ],
    headers={
        "Authorization": "Bearer YOUR_API_KEY"
    },
)

result = response.json()
print(f"Model ID: {result['id']}")
```
```javascript
import { readFile } from "fs/promises";

const form = new FormData();
form.append("title", "My Voice Model");
form.append("description", "Custom voice model");
form.append("visibility", "private");
form.append("type", "tts");
form.append("train_mode", "fast");
form.append("enhance_audio_quality", "true");

const v1 = await readFile("sample1.mp3");
const v2 = await readFile("sample2.wav");
// Node 20+ provides the global File class
form.append("voices", new File([v1], "sample1.mp3"));
form.append("voices", new File([v2], "sample2.wav"));

const res = await fetch("https://api.fish.audio/model", {
  method: "POST",
  headers: { Authorization: "Bearer <YOUR_API_KEY>" },
  body: form,
});

const result = await res.json();
console.log("Model ID:", result.id);
```
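The requests above don't check the HTTP status and leave the sample files open. A small wrapper around the same endpoint tightens both; this is a sketch, not part of the official SDK:

```python
import requests

def create_model(api_key: str, title: str, sample_paths: list[str]) -> dict:
    """Create a model via the REST API, closing files and surfacing HTTP errors."""
    files = [("voices", open(path, "rb")) for path in sample_paths]
    try:
        response = requests.post(
            "https://api.fish.audio/model",
            files=files,
            data=[("title", title), ("type", "tts"), ("train_mode", "fast")],
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=60,
        )
        # Raise on 4xx/5xx instead of silently parsing an error body
        response.raise_for_status()
        return response.json()
    finally:
        for _, handle in files:
            handle.close()
```

`raise_for_status()` turns a failed upload into an exception you can catch and log, instead of an opaque JSON error that only surfaces later.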
## Model Settings

### Required Parameters
| Parameter | Description | Type | Options |
|---|---|---|---|
| title | Name of your model | string | Any text |
| voices | Audio samples | Array<File> | .mp3, .wav, .m4a, .opus |
| type* | Model type | enum<string> | tts |
| train_mode* | Model train mode; fast makes the model available immediately after creation | enum<string> | fast |

*Automatically set by the Python and JavaScript SDKs
### Optional Parameters
| Parameter | Description | Type | Options |
|---|---|---|---|
| visibility | Who can use your model | enum<string> | private, public, unlist (default: public) |
| description | Model description | string | Any text |
| cover_image | Model cover image; required if the model is public | File | .jpg, .png |
| texts | Transcripts of audio samples | Array<string> | Must match number of audio files |
| tags | Tags for your model | string[] | Any text |
| enhance_audio_quality | Remove background noise | boolean | true, false (default: false) |
For detailed explanations, see our API reference.
## Audio Requirements

### Quality Guidelines
Minimum Requirements:
- At least 1 audio sample
- 10+ seconds per sample
Best Practices:
- Use multiple diverse samples
- 1 consistent speaker throughout
- Include different emotions and tones
- Record in a quiet environment
- Maintain steady volume
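You can verify the 10-second minimum locally before uploading. A minimal sketch for WAV files using only the standard library (compressed formats like .mp3 need a third-party library such as mutagen):

```python
import wave

MIN_SECONDS = 10

def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file, computed from frame count and sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def meets_minimum(path: str, minimum: float = MIN_SECONDS) -> bool:
    return wav_duration_seconds(path) >= minimum

# Demo: write a 12-second silent mono WAV, then validate it
with wave.open("sample.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)       # 16-bit samples
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * (16000 * 12))

print(meets_minimum("sample.wav"))  # prints True
```

Running this check in a loop over your samples catches too-short clips before a training run is wasted on them.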
### Adding Transcripts
Including text transcripts improves model quality:
```python
import requests

response = requests.post(
    "https://api.fish.audio/model",
    files=[
        ("voices", open("hello.mp3", "rb")),
        ("voices", open("world.wav", "rb")),
    ],
    data=[
        ("title", "Enhanced Model"),
        ("texts", "Hello, this is my first recording."),
        ("texts", "Welcome to the world of AI voices."),
        # ... other parameters
    ],
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
```
```javascript
import { FishAudioClient } from "fish-audio";
import { createReadStream } from "fs";

const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

const response = await fishAudio.voices.ivc.create({
  title: "Enhanced Model",
  voices: [
    createReadStream("hello.mp3"),
    createReadStream("world.wav"),
  ],
  texts: [
    "Hello, this is my first recording.",
    "Welcome to the world of AI voices.",
  ],
  // other optional fields:
  // visibility: "private",
  // enhance_audio_quality: true,
});

console.log("Model ID:", response._id);
```
Text transcripts must match the exact number of audio files. If you provide 3 audio files, you must provide exactly 3 text transcripts.
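A quick pre-flight check (plain Python, nothing API-specific) catches a mismatch before the request is sent:

```python
def validate_transcripts(voices: list[str], texts: list[str]) -> None:
    """Raise if the transcript count doesn't match the audio file count."""
    if len(texts) != len(voices):
        raise ValueError(
            f"Got {len(voices)} audio files but {len(texts)} transcripts; "
            "counts must match exactly."
        )

# Passes silently: two files, two transcripts
validate_transcripts(
    ["hello.mp3", "world.wav"],
    ["Hello, this is my first recording.", "Welcome to the world of AI voices."],
)
```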
## Using Your Model
Once training is complete:
```python
import requests

model_id = "your_model_id_here"

# Generate speech with your model
response = requests.post(
    "https://api.fish.audio/v1/tts",
    json={
        "text": "Hello from my custom voice!",
        "model_id": model_id,
        "format": "mp3",
    },
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)

# Save the audio
with open("output.mp3", "wb") as f:
    f.write(response.content)
```
```javascript
import { FishAudioClient } from "fish-audio";
import { writeFile } from "fs/promises";

const fishAudio = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

const audio = await fishAudio.textToSpeech.convert({
  text: "Hello from my custom voice!",
  model_id: "your_model_id_here",
  format: "mp3",
});

const buffer = Buffer.from(await new Response(audio).arrayBuffer());
await writeFile("output.mp3", buffer);
console.log("✓ Audio saved to output.mp3");
```
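With fast train mode the model is typically usable right away, but if you want to confirm readiness before generating, you can poll its state. The sketch below assumes a GET endpoint at `/model/{id}` returning a `state` field, as suggested by the SDK responses above; check the API reference for the exact route:

```python
import time
import requests

def wait_until_ready(api_key: str, model_id: str, timeout_s: float = 300) -> str:
    """Poll the (assumed) GET /model/{id} endpoint until the model leaves 'training'."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        response = requests.get(
            f"https://api.fish.audio/model/{model_id}",  # assumed route
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30,
        )
        response.raise_for_status()
        state = response.json().get("state")
        if state != "training":
            return state
        time.sleep(5)  # back off between polls
    raise TimeoutError(f"Model {model_id} still training after {timeout_s}s")
```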
## Troubleshooting

### Common Issues
Model training fails:
- Check audio quality and format
- Ensure single speaker in all samples
- Verify files are not corrupted
Poor voice quality:
- Add more diverse audio samples
- Enable audio enhancement
- Use higher quality recording
### Best Practices
- Start Simple: Begin with 2-3 samples in fast mode to test
- Iterate: Refine with more samples and quality mode
- Document: Keep track of which samples work best
- Test Thoroughly: Try different texts and emotions
- Privacy First: Keep personal models private
## Support
Need help creating models?