Overview
Create custom voice models to generate consistent, high-quality speech. You can create models through our web interface or programmatically via API.
Web Interface
The easiest way to create a voice model:
1. Go to Fish Audio: Visit fish.audio and log in.
2. Navigate to Models: Click on “Models” in your dashboard.
3. Click Create Model: Select “Create New Model”.
4. Upload Your Audio: Add 2 or more voice samples (30-45 seconds each).
5. Configure Settings: Choose privacy settings and training options.
6. Start Training: Click “Create” and wait for processing.
Using the API
Python SDK
First, install the SDK:
Direct API
Create models directly using the REST API:
Model Settings
Required Parameters
Parameter | Description | Options |
---|---|---|
title | Name of your model | Any text |
voices | Audio samples (minimum 2) | MP3, WAV, M4A files |
Optional Parameters
Parameter | Description | Options |
---|---|---|
description | Model description | Any text |
visibility | Who can use your model | private, public |
train_mode | Training speed/quality | fast, balanced, quality |
enhance_audio_quality | Remove background noise | true, false |
texts | Transcripts of audio samples | Must match number of audio files |
cover_image | Model thumbnail | JPG, PNG image |
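The two tables above can be turned into a small pre-flight check before any request is sent. The following is a minimal sketch, not the SDK's or the API's own implementation: `build_model_fields` is a hypothetical helper, the field names and allowed values are taken from the tables, and the endpoint, authentication, and upload encoding are deliberately left out because this section does not specify them.

```python
from pathlib import Path

# Allowed values copied from the parameter tables above.
VISIBILITY_OPTIONS = {"private", "public"}
TRAIN_MODE_OPTIONS = {"fast", "balanced", "quality"}
VOICE_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def build_model_fields(title, voices, description=None, visibility=None,
                       train_mode=None, enhance_audio_quality=None, texts=None):
    """Hypothetical helper: validate arguments and return them as a dict of fields."""
    if not title:
        raise ValueError("title is required")
    if len(voices) < 2:
        raise ValueError("at least 2 voice samples are required")
    for path in voices:
        if Path(path).suffix.lower() not in VOICE_EXTENSIONS:
            raise ValueError(f"unsupported audio format: {path}")
    if visibility is not None and visibility not in VISIBILITY_OPTIONS:
        raise ValueError(f"visibility must be one of {sorted(VISIBILITY_OPTIONS)}")
    if train_mode is not None and train_mode not in TRAIN_MODE_OPTIONS:
        raise ValueError(f"train_mode must be one of {sorted(TRAIN_MODE_OPTIONS)}")
    if texts is not None and len(texts) != len(voices):
        raise ValueError("texts must contain one transcript per audio file")

    fields = {"title": title, "voices": [str(path) for path in voices]}
    optional = {
        "description": description,
        "visibility": visibility,
        "train_mode": train_mode,
        "enhance_audio_quality": enhance_audio_quality,
        "texts": texts,
    }
    # Only include optional fields the caller actually set.
    fields.update({key: value for key, value in optional.items() if value is not None})
    return fields
```

Calling `build_model_fields("Narrator", ["a.wav", "b.mp3"], train_mode="fast")` returns a dict with only `title`, `voices`, and `train_mode`. Failing early on the client side avoids uploading samples for a request that would be rejected anyway.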
Audio Requirements
Quality Guidelines
Minimum Requirements:
- At least 2 audio samples
- 30-45 seconds per sample
- Single speaker only
- Consistent voice throughout
For best results:
- Use 3-5 diverse samples
- Include different emotions and tones
- Record in a quiet environment
- Maintain steady volume
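The measurable parts of these guidelines (sample count and per-sample length) can be checked locally before uploading. This is a rough sketch under stated assumptions: the helper names are hypothetical, the duration reader uses Python's standard `wave` module and so only handles WAV files, and the speaker-consistency and noise guidelines are not checked at all.

```python
import wave

MIN_SAMPLES = 2
MIN_SECONDS, MAX_SECONDS = 30.0, 45.0  # per-sample range from the guidelines above


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (WAV only; MP3/M4A need other tools)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample_durations(durations):
    """Return a list of problems; an empty list means the measurable checks passed."""
    problems = []
    if len(durations) < MIN_SAMPLES:
        problems.append(f"need at least {MIN_SAMPLES} samples, got {len(durations)}")
    for index, seconds in enumerate(durations, start=1):
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(f"sample {index} is {seconds:.1f}s; expected 30-45s")
    return problems
```

A typical use would be `check_sample_durations([wav_duration_seconds(p) for p in paths])`, printing any problems before starting an upload.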
Training Modes
Fast Mode
- Training Time: 5-10 minutes
- Best For: Quick prototypes
- Quality: Good for basic use
Balanced Mode
- Training Time: 15-30 minutes
- Best For: Most use cases
- Quality: Excellent balance
Quality Mode
- Training Time: 45-60 minutes
- Best For: Professional projects
- Quality: Maximum accuracy
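One way to use the times listed above is to pick a mode from a time budget. The sketch below is illustrative only: `pick_train_mode` is a hypothetical helper, and it treats the upper end of each quoted range as a planning figure, not a guarantee.

```python
# Upper end of the training time quoted above for each mode, in minutes.
TRAIN_MODE_MAX_MINUTES = {"fast": 10, "balanced": 30, "quality": 60}


def pick_train_mode(time_budget_minutes):
    """Return the highest-quality mode whose quoted upper bound fits the budget."""
    for mode in ("quality", "balanced", "fast"):
        if TRAIN_MODE_MAX_MINUTES[mode] <= time_budget_minutes:
            return mode
    # Nothing fits the budget; fast is the shortest option available.
    return "fast"
```

The returned string can be passed as the `train_mode` parameter.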
Adding Transcripts
Including text transcripts improves model quality:
Text transcripts must match the exact number of audio files. If you provide 3 audio files, you must provide exactly 3 text transcripts.
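A simple way to keep the two lists the same length is to derive the transcripts from the audio file list itself. The sketch below assumes a hypothetical convention that is not part of the API: each audio file has a sidecar `.txt` file with the same base name (for example `sample1.wav` and `sample1.txt`).

```python
from pathlib import Path


def load_transcripts(audio_paths):
    """Return one transcript per audio file, in the same order as audio_paths."""
    texts = []
    for audio in map(Path, audio_paths):
        # Assumed convention: the transcript sits next to the audio as <name>.txt.
        sidecar = audio.with_suffix(".txt")
        if not sidecar.is_file():
            raise FileNotFoundError(f"missing transcript for {audio.name}: {sidecar}")
        texts.append(sidecar.read_text(encoding="utf-8").strip())
    return texts
```

Because the result is built by iterating over the audio files, it always has exactly one entry per file, and a missing transcript fails loudly instead of producing a shorter list.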
Checking Model Status
After creating a model, check its training status:
- pending: In queue for training
- training: Currently being processed
- completed: Ready to use
- failed: Training unsuccessful
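The four states above lend themselves to a polling loop that stops on `completed` or `failed`. This is a minimal sketch, not the SDK's own mechanism: `wait_for_model` is a hypothetical helper, and `fetch_status` stands in for whatever call you use to read the model's current status string.

```python
import time

# States after which the status will not change again.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_model(fetch_status, poll_seconds=30.0, timeout_seconds=3600.0,
                   sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"model still '{status}' after {timeout_seconds}s")
        # pending or training: wait before asking again.
        sleep(poll_seconds)
```

Passing the status lookup in as a callable keeps the loop independent of how the status is fetched, and the `sleep` parameter makes it easy to test without real delays.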
Using Your Model
Once training is complete:
Troubleshooting
Common Issues
Model training fails:
- Check audio quality and format
- Ensure single speaker in all samples
- Verify files are not corrupted
Poor voice quality:
- Add more diverse audio samples
- Enable audio enhancement
- Use higher quality training mode
Training takes too long:
- Normal for quality mode
- Check API status page for delays
- Consider using balanced mode
Best Practices
- Start Simple: Begin with 2-3 samples in fast mode to test
- Iterate: Refine with more samples and quality mode
- Document: Keep track of which samples work best
- Test Thoroughly: Try different texts and emotions
- Privacy First: Keep personal models private
Support
Need help creating models?
- API Documentation: Full API Reference
- Discord Community: Join our Discord
- Email Support: support@fish.audio