This guide is for advanced users who want to self-host Fish Audio models. For most users, we recommend using the Fish Audio API for easier integration and automatic updates.

Prerequisites

Before you begin, ensure you have:
  • GPU: 12GB VRAM minimum (for inference)
  • OS: Linux or WSL (Windows Subsystem for Linux)
  • System dependencies: Audio processing libraries
Install required system packages:
apt install portaudio19-dev libsox-dev ffmpeg
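To sanity-check the tooling before continuing, here is a minimal Python sketch using only the standard library (note that portaudio19-dev and libsox-dev install shared libraries rather than executables, so only ffmpeg is checked on PATH):
# Confirm ffmpeg is discoverable; portaudio and libsox are shared libraries
# that get loaded later by the Python audio dependencies.
import shutil

print("ffmpeg:", shutil.which("ffmpeg") or "NOT FOUND - rerun the apt command above")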

Installation Methods

Fish Audio supports multiple installation methods. Choose the one that best fits your development environment.

Conda Installation

Conda provides a stable, isolated Python environment:
# Create a new environment with Python 3.12
conda create -n fish-speech python=3.12
conda activate fish-speech

# GPU installation (choose your CUDA version: cu126, cu128, cu129)
pip install -e .[cu129]

# CPU-only installation (slower, not recommended for production)
pip install -e .[cpu]

# Default installation (uses PyTorch default index)
pip install -e .
For best performance, match the CUDA build (cu126, cu128, or cu129) to your GPU driver; run nvidia-smi to see the highest CUDA version your driver supports.
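After the install finishes, you can confirm that PyTorch sees your GPU and which CUDA build it ships with. A minimal check, assuming torch was installed by one of the commands above:
# Run inside the activated fish-speech environment
import torch

print("CUDA available:", torch.cuda.is_available())
print("PyTorch CUDA build:", torch.version.cuda)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))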

UV Installation

UV provides faster dependency resolution and installation:
# GPU installation (choose your CUDA version: cu126, cu128, cu129)
uv sync --python 3.12 --extra cu129

# CPU-only installation
uv sync --python 3.12 --extra cpu
UV is recommended for faster setup times, especially when working with large dependency trees.

Intel Arc XPU Support

For Intel Arc GPU users, install with XPU support:
# Create environment
conda create -n fish-speech python=3.12
conda activate fish-speech

# Install required C++ standard library
conda install libstdcxx -c conda-forge

# Install PyTorch with Intel XPU support
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu

# Install Fish Speech
pip install -e .
The --compile optimization flag is not supported out of the box on Windows or macOS; to use compile acceleration, you need to install Triton manually.
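To confirm the XPU build is being picked up, recent PyTorch nightlies expose a torch.xpu namespace. A minimal check, assuming the nightly wheel above installed correctly:
# torch.xpu is only present in XPU-enabled PyTorch builds
import torch

print("XPU available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("Device:", torch.xpu.get_device_name(0))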

Repository Setup

Clone the Fish Speech repository to get started:
git clone https://github.com/fishaudio/fish-speech.git
cd fish-speech
Then follow one of the installation methods above.
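After running one of the installation methods from inside the cloned repository, you can verify that the editable install resolves correctly. A minimal sketch, assuming the package directory is named fish_speech as in the repository layout:
# Confirm the editable install points at the cloned repository
import fish_speech

print(fish_speech.__file__)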

Next Steps

Once installation is complete, continue with downloading the model weights and running inference; see the corresponding guides in the Fish Audio documentation.

Hardware Recommendations

For optimal performance:
Use Case       Recommended GPU    VRAM     Expected Speed
Development    RTX 3060           12GB     ~1:15 real-time factor
Production     RTX 4090           24GB     ~1:7 real-time factor
Enterprise     A100               40GB+    ~1:5 real-time factor
Real-time factor indicates how many times faster than real time the model can generate audio. For example, 1:7 means generating 1 minute of audio takes roughly 60 / 7 ≈ 8.6 seconds.
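As a quick sanity check on these numbers, generation time is just audio length divided by the real-time factor. A tiny sketch (the helper name is hypothetical, not part of Fish Speech):
# Estimate how long generating a clip takes at a given real-time factor
def generation_seconds(audio_seconds: float, rtf: float) -> float:
    return audio_seconds / rtf

print(generation_seconds(60, 7))   # ~8.6 s for one minute of audio at 1:7
print(generation_seconds(60, 5))   # 12.0 s at 1:5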

Troubleshooting

CUDA Out of Memory

If you encounter CUDA out-of-memory errors, try the following (a quick VRAM check is sketched after this list):
  1. Reduce batch size in inference settings
  2. Use --half flag for FP16 inference
  3. Close other GPU-intensive applications
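Before retrying, it helps to see how much VRAM is actually free. A minimal check using PyTorch's CUDA memory API, assuming the same environment you run inference from:
# Report free vs. total memory on the current CUDA device
import torch

free, total = torch.cuda.mem_get_info()
print(f"Free: {free / 1e9:.1f} GB / Total: {total / 1e9:.1f} GB")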

Package Installation Errors

If you encounter dependency conflicts:
  1. Try using UV instead of pip for better dependency resolution
  2. Create a fresh conda environment
  3. Ensure you're using Python 3.12 (other versions may have compatibility issues); a quick version check is sketched after this list
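For the last point, a quick way to confirm the interpreter in the active environment (standard library only):
# Verify the active environment runs Python 3.12
import sys

print(sys.version)
assert sys.version_info[:2] == (3, 12), "Fish Speech expects Python 3.12"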

Community Support

Need help with local setup? Reach out through the Fish Audio community channels.