Agent Quickstart - Fish Audio

Purpose

This page is the recommended starting point for AI agents, RAG pipelines, and documentation crawlers that need accurate Fish Audio references with minimal markup noise.

Built-In Agent Indexes

This documentation site already provides built-in LLM-friendly indexes:

llms.txt for the curated documentation index
llms-full.txt for broader site context

In most cases, agents should read llms.txt first and only fetch llms-full.txt when they need wider context across the whole documentation set.
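
The read order above can be sketched as a small helper. This is a minimal illustration, not an official client: it assumes both index files are served from the docs site root at https://docs.fish.audio/, and the function name `index_urls` is hypothetical.

```python
from typing import List
from urllib.parse import urljoin

# Assumption: both indexes live at the root of the docs site.
DOCS_ROOT = "https://docs.fish.audio/"


def index_urls(need_wide_context: bool = False) -> List[str]:
    """Return the index URLs to fetch, in the order they should be read."""
    urls = [urljoin(DOCS_ROOT, "llms.txt")]
    if need_wide_context:
        # Only pull the larger file when the curated index is not enough.
        urls.append(urljoin(DOCS_ROOT, "llms-full.txt"))
    return urls
```

Calling `index_urls()` yields only the curated index; passing `need_wide_context=True` appends the full-site file as a second fetch.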

Install the Agent Skill

For coding agents that support Agent Skills (Claude Code, Cursor, Windsurf, Codex, and others), install the ready-made raw-API skill with a single command:

npx skills add https://docs.fish.audio --skill fish-audio-api

The skill teaches the agent to call the Fish Audio REST and WebSocket APIs directly from curl, Python, Node.js, or any other HTTP client; no SDK is required. It covers authentication, every endpoint in our OpenAPI schema, the encoding rules for MessagePack, JSON, and multipart requests, multi-speaker dialogue, and the WebSocket streaming protocol.

Discovery endpoint: /.well-known/agent-skills/index.json

To install every skill published here, including the auto-generated product overview skill, run the command without --skill: npx skills add https://docs.fish.audio
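
An agent harness that shells out to the installer might assemble the command as below. This is only a sketch: `skills_add_command` is a hypothetical helper, and resolving the discovery path against the docs host is an assumption, since the text gives only the path.

```python
import shlex
from typing import List, Optional
from urllib.parse import urljoin

DOCS_ROOT = "https://docs.fish.audio"

# Assumption: the discovery path is resolved against the docs host.
DISCOVERY_URL = urljoin(DOCS_ROOT + "/", ".well-known/agent-skills/index.json")


def skills_add_command(skill: Optional[str] = "fish-audio-api") -> List[str]:
    """Build the argv for `npx skills add`; pass skill=None to install every skill."""
    cmd = ["npx", "skills", "add", DOCS_ROOT]
    if skill is not None:
        cmd += ["--skill", skill]
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them.
    print(shlex.join(skills_add_command()))
    print(shlex.join(skills_add_command(skill=None)))
```

Returning an argument list rather than a string lets the caller pass it straight to `subprocess.run` without shell quoting.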

Retrieval Order

1. Read llms.txt for the curated documentation index.
2. Read llms-full.txt when broad site context is needed.
3. Read OpenAPI for REST schemas, parameters, and examples.
4. Read AsyncAPI for the WebSocket streaming protocol.
5. Fetch individual .md pages only after narrowing to a specific task.
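
One way to read this order is as a priority list that an agent walks until a source answers its question. The sketch below takes that interpretation, which is an assumption. The names `RETRIEVAL_ORDER` and `first_sufficient` are hypothetical, and the caller supplies both the fetching and the "is this enough" check.

```python
from typing import Callable, Iterable, Optional

# Source labels mirror the list above; mapping them to URLs is left to the caller.
RETRIEVAL_ORDER = ("llms.txt", "llms-full.txt", "OpenAPI", "AsyncAPI", "page.md")


def first_sufficient(
    fetch: Callable[[str], str],
    is_enough: Callable[[str], bool],
    order: Iterable[str] = RETRIEVAL_ORDER,
) -> Optional[str]:
    """Fetch sources in order and return the label of the first one that suffices."""
    for name in order:
        if is_enough(fetch(name)):
            return name
    return None  # nothing in the list answered the question
```

Injecting `fetch` keeps the sketch free of network calls, so the ordering logic can be exercised with canned text.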

Canonical API Facts

Base API URL: https://api.fish.audio
Authentication: Authorization: Bearer <FISH_API_KEY>
TTS model selection: the model request header is required. Recommended default: s2-pro
Main REST endpoints:
- POST /v1/tts
- POST /v1/asr
- GET /model
- POST /model
- GET /model/{id}
- PATCH /model/{id}
- DELETE /model/{id}
Real-time streaming endpoint: wss://api.fish.audio/v1/tts/live
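
The facts above are enough to assemble a request skeleton. The sketch below builds a POST /v1/tts request without sending it. The base URL, the Bearer header, and the model header come from this page. The JSON body with a "text" field and the application/json content type are assumptions, so check them against the OpenAPI schema. `build_tts_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.fish.audio"


def build_tts_request(text: str, api_key: str, model: str = "s2-pro") -> urllib.request.Request:
    """Assemble (but do not send) a POST /v1/tts request."""
    # Assumption: a JSON body with a "text" field; the real schema is in the OpenAPI spec.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tts",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # documented auth scheme
            "Content-Type": "application/json",    # assumption, see lead-in
            "model": model,                        # required model header
        },
    )
```

To send it, pass the result to `urllib.request.urlopen` and read the response bytes. The shape of the response is not described on this page.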

High-Value URLs

Start Here

API Specs

Authentication And SDK Setup

Core Product Tasks

Real-Time And Integrations

Models, Pricing, And Lifecycle

Task Routing

If the task is “generate speech”, start with Quick Start, the Text to Speech guide, and POST /v1/tts.
If the task is “transcribe audio”, start with the Speech to Text guide and POST /v1/asr.
If the task is “clone or manage voices”, start with Creating Voice Models and the /model endpoints.
If the task is “stream audio in real time”, start with AsyncAPI, WebSocket TTS Streaming, and the WebSocket SDK guides.
If the task is “pick the right model or estimate cost”, start with Models Overview and Pricing And Rate Limits.
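
The routing rules above map naturally onto a lookup table. The sketch below copies the task phrases and starting points from the list. The `route` function and its fallback to llms.txt for unrecognised tasks are assumptions added for illustration.

```python
from typing import Dict, List

# Task phrase -> ordered starting points, taken from the routing list above.
ROUTES: Dict[str, List[str]] = {
    "generate speech": ["Quick Start", "Text to Speech guide", "POST /v1/tts"],
    "transcribe audio": ["Speech to Text guide", "POST /v1/asr"],
    "clone or manage voices": ["Creating Voice Models", "/model endpoints"],
    "stream audio in real time": [
        "AsyncAPI",
        "WebSocket TTS Streaming",
        "WebSocket SDK guides",
    ],
    "pick the right model or estimate cost": [
        "Models Overview",
        "Pricing And Rate Limits",
    ],
}


def route(task: str) -> List[str]:
    """Return the starting points for a task, or the curated index if unknown."""
    # Assumption: unknown tasks fall back to llms.txt.
    return ROUTES.get(task.strip().lower(), ["llms.txt"])
```

For example, `route("Transcribe audio")` returns the Speech to Text guide followed by POST /v1/asr.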

Notes For Agents

Prefer openapi.json and asyncapi.yml for machine-readable schemas.
Prefer .md URLs when you need a single human-authored page in Markdown form.
Some richer pages use interactive MDX widgets. If a fetched page contains UI or component noise, fall back to this page, llms.txt, llms-full.txt, or the API spec files first.
Treat this page as the canonical low-noise entry point for Fish Audio documentation retrieval.
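
The fallback rule for noisy pages needs some way to decide what counts as noise. The sketch below uses a deliberately crude, hypothetical heuristic: it counts JSX-style tags with capitalised names, which is how MDX components usually appear in raw source. The threshold and the names `looks_noisy` and `FALLBACKS` are assumptions.

```python
import re

# Hypothetical heuristic: MDX components look like capitalised JSX tags, e.g. <Card>.
_COMPONENT_TAG = re.compile(r"</?[A-Z][A-Za-z0-9]*[\s/>]")

# Fallback sources named in the notes above.
FALLBACKS = ("this page", "llms.txt", "llms-full.txt", "API spec files")


def looks_noisy(page_text: str, max_tags: int = 3) -> bool:
    """Return True when the page has more component-like tags than max_tags."""
    return len(_COMPONENT_TAG.findall(page_text)) > max_tags
```

When `looks_noisy` returns True, an agent would discard the page and use the `FALLBACKS` sources instead.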
