Overview
Create custom voice models to generate consistent, high-quality speech. You can create models through our web interface or programmatically via the API.
Web Interface
The easiest way to create a voice model:
1. Go to Fish Audio: Visit fish.audio and log in
2. Navigate to Models: Click on “Models” in your dashboard
3. Click Create Model: Select “Create New Model”
4. Upload Your Audio: Add 1 or more voice samples (at least 10 seconds each)
5. Configure Settings: Choose privacy settings and training options
6. Start Training: Click “Create” and wait for processing
Using the API
Using the SDK
Create models with the Python or JavaScript SDK: first install the SDK, then create a model.
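As a minimal Python sketch of those two steps: it assumes the SDK is installed with `pip install fish-audio-sdk` and that its session object exposes a `create_model` method accepting raw audio bytes. The package name, the method name and its keyword arguments, and the helper `create_voice_model` are all assumptions here, so check them against the SDK reference before relying on them.

```python
from pathlib import Path


def create_voice_model(session, title, sample_paths, description=""):
    """Read each audio sample and pass the raw bytes to the SDK session.

    `session` is assumed to expose
    create_model(title=..., description=..., voices=[bytes, ...]).
    """
    if not sample_paths:
        raise ValueError("at least 1 audio sample is required")
    # One bytes object per sample, in the order the paths were given.
    voices = [Path(p).read_bytes() for p in sample_paths]
    return session.create_model(title=title, description=description, voices=voices)


# Assumed usage with the installed SDK (not executed here):
#   from fish_audio_sdk import Session
#   session = Session("YOUR_API_KEY")
#   model = create_voice_model(session, "My Voice", ["sample1.mp3", "sample2.wav"])
```

Passing the session in as an argument keeps the helper independent of the SDK version, so only the commented usage lines need to change if the client class is named differently.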
Direct API
Create models directly using the REST API from Python or JavaScript.
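A REST call for model creation might look like the following Python sketch. It sends a multipart form with the required parameters listed under Model Settings. The endpoint URL, the Bearer authorization header, and the helper names `build_model_request` and `create_model` are assumptions, not confirmed by this page, and the third-party `requests` library is used for the upload.

```python
from pathlib import Path

API_URL = "https://api.fish.audio/model"  # assumed endpoint; confirm in the API reference


def build_model_request(title, sample_paths, visibility="private"):
    """Return (data, files) in the shape requests.post(data=..., files=...) accepts."""
    data = [
        ("title", title),
        ("type", "tts"),
        ("train_mode", "fast"),
        # "private" avoids the cover_image requirement that applies to public models.
        ("visibility", visibility),
    ]
    # One "voices" part per audio sample: (field name, (filename, raw bytes)).
    files = [("voices", (Path(p).name, Path(p).read_bytes())) for p in sample_paths]
    return data, files


def create_model(api_key, title, sample_paths, visibility="private"):
    import requests  # third-party: pip install requests

    data, files = build_model_request(title, sample_paths, visibility)
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        data=data,
        files=files,
        timeout=60,
    )
    response.raise_for_status()
    return response.json()
```

Building the form separately from sending it makes the payload easy to inspect before any network call is made.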
Model Settings
Required Parameters
| Parameter | Description | Type | Options |
|---|---|---|---|
| title | Name of your model | string | Any text |
| voices | Audio samples | Array<File> | .mp3, .wav, .m4a, .opus |
| type* | Model type | enum<string> | tts |
| train_mode* | Model training mode; fast means the model is available immediately after creation | enum<string> | fast |
Optional Parameters
| Parameter | Description | Type | Options |
|---|---|---|---|
| visibility | Who can use your model | enum<string> | private, public, unlist (default: public) |
| description | Model description | string | Any text |
| cover_image | Model cover image, required if the model is public | File | .jpg, .png |
| texts | Transcripts of audio samples | Array<string> | Must match number of audio files |
| tags | Tags for your model | string[] | Any text |
| enhance_audio_quality | Remove background noise | boolean | true, false (default: false) |
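The constraints in the two tables above can be checked locally before uploading. The sketch below is one way to do that; the function name `validate_model_settings` and its return convention are hypothetical, and it only encodes rules stated in the tables (allowed file extensions, visibility options, and the cover image requirement for public models).

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".opus"}
IMAGE_EXTENSIONS = {".jpg", ".png"}
VISIBILITY_OPTIONS = {"private", "public", "unlist"}


def validate_model_settings(title, voices, visibility="public", cover_image=None):
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if not title:
        problems.append("title is required")
    if not voices:
        problems.append("voices needs at least 1 audio sample")
    for name in voices:
        if Path(name).suffix.lower() not in AUDIO_EXTENSIONS:
            problems.append(f"unsupported audio format: {name}")
    if visibility not in VISIBILITY_OPTIONS:
        problems.append(f"unknown visibility: {visibility}")
    # The table marks cover_image as required when the model is public.
    if visibility == "public" and cover_image is None:
        problems.append("cover_image is required for public models")
    if cover_image is not None and Path(cover_image).suffix.lower() not in IMAGE_EXTENSIONS:
        problems.append(f"unsupported cover image format: {cover_image}")
    return problems
```

Because visibility defaults to public, calling the function without a cover image reports a problem unless `visibility="private"` or `visibility="unlist"` is passed explicitly.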
Audio Requirements
Quality Guidelines
Minimum Requirements:
- At least 1 audio sample
- 10+ seconds per sample
Recommended:
- Use multiple diverse samples
- 1 consistent speaker throughout
- Include different emotions and tones
- Record in a quiet environment
- Maintain steady volume
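The 10-second minimum can be checked before upload. The sketch below does this for `.wav` files only, using Python's standard `wave` module; other formats such as `.mp3` or `.opus` would need a third-party audio library. The helper names are hypothetical.

```python
import wave

MIN_SECONDS = 10.0  # minimum sample length stated above


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def too_short_samples(paths, min_seconds=MIN_SECONDS):
    """Return the paths whose duration is below the minimum."""
    return [p for p in paths if wav_duration_seconds(p) < min_seconds]
```

A file that `wave` cannot parse raises `wave.Error`, which also serves as a basic corruption check.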
Adding Transcripts
Including text transcripts improves model quality.
Text transcripts must match the exact number of audio files. If you provide 3 audio files, you must provide exactly 3 text transcripts.
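The count rule can be enforced in code before a request is built. In this sketch, `pair_transcripts` and `transcript_fields` are hypothetical helpers; sending each transcript as a repeated `texts` form field in the same order as the audio files is an assumption about how the `Array<string>` parameter is encoded.

```python
def pair_transcripts(voice_names, texts):
    """Pair each audio file with its transcript, or fail if the counts differ."""
    if len(voice_names) != len(texts):
        raise ValueError(
            f"got {len(voice_names)} audio files but {len(texts)} transcripts"
        )
    return list(zip(voice_names, texts))


def transcript_fields(voice_names, texts):
    """Build one ("texts", transcript) form field per sample, in sample order."""
    return [("texts", text) for _, text in pair_transcripts(voice_names, texts)]
```

For example, three audio files with two transcripts raise `ValueError` instead of producing a request the API would reject.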
Using Your Model
Once training is complete, you can generate speech with your model from Python or JavaScript.
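A text-to-speech request that references a trained model could be sketched as follows, using only the Python standard library. The endpoint URL, the `reference_id` field carrying the model ID, the JSON body, the Bearer header, and the MP3 output file name are all assumptions; confirm them in the API reference.

```python
import json
import urllib.request

TTS_URL = "https://api.fish.audio/v1/tts"  # assumed endpoint


def build_tts_request(api_key, model_id, text):
    """Build a POST request asking for `text` spoken by the model `model_id`."""
    body = json.dumps({"text": text, "reference_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def synthesize(api_key, model_id, text, out_path="output.mp3"):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(api_key, model_id, text)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as audio_file:
        audio_file.write(audio)
    return out_path
```

Calling `synthesize("YOUR_API_KEY", "YOUR_MODEL_ID", "Hello from my custom voice")` would write the returned audio to `output.mp3` if the assumptions above hold.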
Troubleshooting
Common Issues
Model training fails:
- Check audio quality and format
- Ensure single speaker in all samples
- Verify files are not corrupted
- Add more diverse audio samples
- Enable audio enhancement
- Use higher quality recording
Best Practices
- Start Simple: Begin with 2-3 samples in fast mode to test
- Iterate: Refine with more samples and quality mode
- Document: Keep track of which samples work best
- Test Thoroughly: Try different texts and emotions
- Privacy First: Keep personal models private
Support
Need help creating models?
- API Documentation: Full API Reference
- Discord Community: Join our Discord
- Email Support: support@fish.audio

