Overview

Transform any audio recording into text with Fish Audio’s speech recognition. Perfect for transcriptions, subtitles, and voice commands.

Getting Started

Web Interface

Transcribe audio in four steps:
  1. Visit Fish Audio - Go to fish.audio and log in.
  2. Navigate to Transcribe - Click “Speech to Text” in your dashboard.
  3. Upload Audio - Select your audio file (for example MP3, WAV, or M4A; see Supported Formats below for the full list).
  4. Get Transcription - Click “Transcribe” and copy your text.

Supported Formats

Audio Files

Accepted formats:
  • MP3 (recommended)
  • WAV
  • M4A
  • OGG
  • FLAC
  • AAC
File requirements:
  • Maximum size: 100MB
  • Maximum duration: 60 minutes
  • Minimum duration: 1 second
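If you prepare files in bulk, you can check them against the limits above before uploading. The sketch below is a local helper, not part of any Fish Audio API, and `check_upload` is a hypothetical name. It assumes you already know each file's size and duration, and it reads "100MB" as 100,000,000 bytes, the stricter of the two common interpretations.

```python
from pathlib import Path

# Limits copied from the requirements listed above.
ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac"}
MAX_SIZE_BYTES = 100 * 1000 * 1000  # assumption: decimal megabytes (stricter than MiB)
MAX_DURATION_S = 60 * 60            # 60 minutes
MIN_DURATION_S = 1.0                # 1 second


def check_upload(filename: str, size_bytes: int, duration_s: float) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(no extension)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 100MB")
    if duration_s > MAX_DURATION_S:
        problems.append("audio is longer than 60 minutes")
    if duration_s < MIN_DURATION_S:
        problems.append("audio is shorter than 1 second")
    return problems


print(check_upload("meeting.mp3", 12_000_000, 1800.0))  # []
```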

Language Support

Automatic Detection

The system automatically detects the language spoken in your audio. No configuration needed!

Manual Selection

For better accuracy, specify the language manually. Major languages:
  • English (en)
  • Chinese (zh)
  • Spanish (es)
  • French (fr)
  • German (de)
  • Japanese (ja)
  • Korean (ko)
  • Portuguese (pt)
Over 30 additional languages are also supported, including Arabic, Hindi, Russian, and Italian.
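Recording tools and browsers often report regional tags such as `pt-BR` or `en_US`, while the list above uses two-letter codes. A minimal sketch for reducing a regional tag to its base code, assuming base codes are what you should select (the page does not say whether regional variants are accepted); `normalize_language` and `is_major_language` are hypothetical helpers.

```python
from typing import Optional

# The eight major languages listed above, keyed by the codes the docs show.
MAJOR_LANGUAGES = {
    "en": "English", "zh": "Chinese", "es": "Spanish", "fr": "French",
    "de": "German", "ja": "Japanese", "ko": "Korean", "pt": "Portuguese",
}


def normalize_language(tag: Optional[str]) -> Optional[str]:
    """Reduce a tag like 'pt-BR' or 'en_US' to its base code ('pt', 'en').

    Returns None when no tag is given, meaning: keep automatic detection.
    """
    if not tag:
        return None
    return tag.replace("_", "-").split("-")[0].lower()


def is_major_language(tag: Optional[str]) -> bool:
    """True if the tag's base code is one of the eight major languages above."""
    return normalize_language(tag) in MAJOR_LANGUAGES
```

Codes outside the eight listed may still be supported; the sketch only checks the ones this page names.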

Audio Quality Tips

For Best Results

Recording Environment:
  • Quiet room with minimal echo
  • No background music
  • Clear, consistent speaking voice
  • One speaker at a time
Audio Settings:
  • Sample rate: 16kHz or higher
  • Bit rate: 128kbps or higher
  • Mono or stereo (mono preferred)
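For uncompressed WAV files you can read these settings directly with Python's standard `wave` module. A sketch, assuming a PCM WAV input (the `wave` module does not decode MP3, M4A, or the other compressed formats); `describe_wav` is a hypothetical helper name.

```python
import wave


def describe_wav(path: str) -> dict:
    """Report the settings from the Audio Settings tips for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()          # samples per second, per channel
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8      # bytes per sample -> bits
        duration = wav.getnframes() / rate
    return {
        "sample_rate_hz": rate,
        "channels": channels,
        "bit_depth": bits,
        "duration_s": duration,
        # Uncompressed bit rate = sample rate x channels x bit depth.
        "bitrate_kbps": rate * channels * bits / 1000,
        "meets_sample_rate": rate >= 16_000,
        "is_mono": channels == 1,
    }
```

A 16 kHz, 16-bit mono WAV already runs at 256 kbps, so the 128kbps guideline matters mainly for compressed formats such as MP3.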

Common Issues

Poor transcription quality?
  • Remove background noise
  • Increase microphone volume
  • Speak clearly at a moderate pace
  • Avoid multiple speakers talking over each other

Use Cases

Meeting Transcription

Convert recorded meetings into searchable text:
  1. Record your meeting (Zoom, Teams, etc.)
  2. Export the audio file
  3. Upload to Fish Audio
  4. Get a formatted transcript with timestamps

Podcast Transcripts

Create written versions of your podcasts:
  • Generate show notes automatically
  • Create searchable content
  • Improve accessibility
  • Enable translations

Video Subtitles

Generate subtitles for your videos:
  1. Extract audio from video
  2. Transcribe with Fish Audio
  3. Get timestamped text
  4. Import into video editor
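Step 1 can be scripted with FFmpeg, a widely used third-party tool that this page does not mention; any audio extractor works. The sketch below only builds and runs the command line, and assumes `ffmpeg` is installed and on your PATH. It follows the Audio Settings tips above (mono, 16 kHz, 128 kbps for MP3); `extract_audio_cmd` and `extract_audio` are hypothetical helper names.

```python
import shlex
import subprocess


def extract_audio_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that strips the video and keeps speech-friendly audio."""
    cmd = ["ffmpeg", "-i", video_path,
           "-vn",           # drop the video stream
           "-ac", "1",      # downmix to mono
           "-ar", "16000"]  # resample to 16 kHz
    if audio_path.lower().endswith(".mp3"):
        cmd += ["-b:a", "128k"]  # 128 kbps, matching the bit-rate tip
    cmd.append(audio_path)       # ffmpeg infers the output format from the extension
    return cmd


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(extract_audio_cmd(video_path, audio_path), check=True)


print(shlex.join(extract_audio_cmd("lecture.mp4", "lecture.mp3")))
# ffmpeg -i lecture.mp4 -vn -ac 1 -ar 16000 -b:a 128k lecture.mp3
```

MP3 output is the safer choice for long videos: a 16 kHz, 16-bit mono WAV uses 32,000 bytes per second, about 115 MB for 60 minutes, which exceeds the 100MB limit, while 128 kbps MP3 needs roughly half that.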

Voice Notes

Convert voice memos to text:
  • Dictate ideas quickly
  • Transcribe later for editing
  • Search through voice notes
  • Share as text documents

Advanced Features

Timestamps

Get precise timing for each spoken segment:
[00:00:00] Welcome to our podcast.
[00:00:03] Today we're discussing AI technology.
[00:00:07] Let's dive right in.
Perfect for:
  • Creating subtitles
  • Navigating long recordings
  • Synchronizing with video
  • Building searchable archives
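If you have copied timestamped text rather than exporting an .srt file, you can convert it yourself. A sketch, assuming lines follow the `[HH:MM:SS] text` pattern shown above; each cue ends where the next begins, and the last cue gets an assumed length (`last_duration`, 3 seconds by default) because the example gives no end time. All function names are hypothetical.

```python
import re

TIMESTAMP_LINE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*(.*)$")


def parse_timestamps(text: str) -> list[tuple[int, str]]:
    """Return (start_seconds, words) for every '[HH:MM:SS] words' line."""
    segments = []
    for line in text.splitlines():
        m = TIMESTAMP_LINE.match(line.strip())
        if m:
            h, mnt, s, words = m.groups()
            segments.append((int(h) * 3600 + int(mnt) * 60 + int(s), words))
    return segments


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def to_srt(text: str, last_duration: float = 3.0) -> str:
    """Convert timestamped lines into numbered SRT cues separated by blank lines."""
    segments = parse_timestamps(text)
    blocks = []
    for i, (start, words) in enumerate(segments):
        # Each cue ends where the next starts; the final cue uses the assumed length.
        end = segments[i + 1][0] if i + 1 < len(segments) else start + last_duration
        blocks.append(f"{i + 1}\n{srt_time(start)} --> {srt_time(end)}\n{words}\n")
    return "\n".join(blocks)
```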

Speaker Detection

Identify different speakers in conversations:
Speaker 1: "What do you think about the proposal?"
Speaker 2: "I think it has potential."
Speaker 1: "Let's discuss the details."
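To analyze a conversation programmatically, for example to count turns per speaker, you can parse this layout into structured turns. A sketch, assuming the `Speaker N: "text"` form shown above; the actual export format may differ. `parse_speakers` is a hypothetical helper, and it merges consecutive lines from the same speaker into one turn.

```python
import re

# Label, colon, then the utterance with optional surrounding double quotes.
SPEAKER_LINE = re.compile(r'^(Speaker \d+):\s*"?(.*?)"?$')


def parse_speakers(text: str) -> list[tuple[str, str]]:
    """Return (speaker, utterance) turns, joining back-to-back lines by one speaker."""
    turns = []
    for line in text.splitlines():
        m = SPEAKER_LINE.match(line.strip())
        if not m:
            continue  # skip lines that do not carry a speaker label
        speaker, utterance = m.groups()
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + utterance)
        else:
            turns.append((speaker, utterance))
    return turns
```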

Punctuation & Formatting

Automatic formatting includes:
  • Sentence capitalization
  • Punctuation marks
  • Paragraph breaks
  • Number formatting

Tips for Different Content

Interviews

Best practices:
  • Use a good microphone for each speaker
  • Record in a quiet environment
  • Speak one at a time
  • Keep consistent volume levels

Lectures & Presentations

Optimize for:
  • Clear articulation of technical terms
  • Pause between topics
  • Repeat important points
  • Avoid reading too fast

Phone Calls

Considerations:
  • Phone audio is lower quality
  • Expect slightly lower accuracy
  • Speak clearly and slowly
  • Avoid speakerphone if possible

Accuracy Expectations

What Affects Accuracy

Positive factors:
  • Clear audio quality
  • Native speaker accent
  • Common vocabulary
  • Single speaker
Challenging factors:
  • Heavy accents
  • Technical jargon
  • Multiple speakers
  • Background noise

Typical Accuracy Rates

  • Professional recording: 95-98%
  • Clean amateur recording: 90-95%
  • Phone/video calls: 85-90%
  • Noisy environments: 75-85%
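This page does not say how these rates are measured. A common way to check accuracy on your own recordings is word error rate (WER): the word-level edit distance between a hand-checked reference and the machine transcript, divided by the number of reference words. The sketch below treats accuracy as 1 − WER, which is an assumption about how to compare against the figures above. It lowercases and splits on whitespace only, so punctuation counts as part of a word.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the reference words so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))  # 0.2
```

One wrong word out of five gives a WER of 0.2, or roughly 80% accuracy. WER can exceed 1.0 when the transcript contains many extra words.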

Post-Processing Tips

Editing Transcriptions

After transcription:
  1. Review for accuracy - Check names and technical terms
  2. Add formatting - Break into paragraphs
  3. Correct errors - Fix any misheard words
  4. Add context - Include speaker names
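Steps 1 and 4 can be partly automated once you know the correct names. A sketch, assuming the `Speaker N:` labels shown under Speaker Detection; both helpers are hypothetical, and the correction list is one you would build yourself from errors you spot during review.

```python
import re


def apply_speaker_names(transcript: str, names: dict[str, str]) -> str:
    """Replace labels such as 'Speaker 1' at the start of a line with real names."""
    return re.sub(
        r"^(Speaker \d+):",
        lambda m: names.get(m.group(1), m.group(1)) + ":",  # unknown labels stay as-is
        transcript,
        flags=re.MULTILINE,
    )


def apply_corrections(transcript: str, corrections: dict[str, str]) -> str:
    """Fix recurring misheard names or terms, matching whole words case-insensitively."""
    for wrong, right in corrections.items():
        transcript = re.sub(
            rf"\b{re.escape(wrong)}\b",
            lambda m, right=right: right,  # function replacement avoids backslash escapes
            transcript,
            flags=re.IGNORECASE,
        )
    return transcript
```

Automated replacements still need a final read-through, because a misheard word can also be a correct word in another sentence.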

Export Options

Save your transcriptions as:
  • Plain text (.txt)
  • Word document (.docx)
  • Subtitle file (.srt)
  • PDF document

Common Applications

Business

  • Meeting minutes
  • Interview transcripts
  • Call recordings
  • Training materials

Education

  • Lecture notes
  • Research interviews
  • Student recordings
  • Language learning

Content Creation

  • Video scripts
  • Podcast show notes
  • Social media captions
  • Blog post drafts

Accessibility

  • Support for deaf and hard-of-hearing audiences
  • Multi-language content
  • Searchable archives
  • Documentation

Troubleshooting

No Text Output

Check:
  • Audio file isn’t corrupted
  • File format is supported
  • Audio contains speech
  • Volume is audible
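To check the last item without listening through the whole file, you can measure the peak level of a WAV recording. A sketch using only the standard library, assuming 16-bit PCM WAV input. The `QUIET_PEAK_DBFS` threshold is an arbitrary illustrative value, not a Fish Audio requirement, and `peak_dbfs` and `looks_too_quiet` are hypothetical helpers.

```python
import array
import math
import sys
import wave

# Illustrative threshold only; Fish Audio does not publish a minimum level.
QUIET_PEAK_DBFS = -30.0


def peak_dbfs(path: str) -> float:
    """Peak level of a 16-bit PCM WAV file in dBFS (0.0 is full scale)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)  # signed 16-bit samples, channels interleaved
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # the file is digital silence
    return 20 * math.log10(peak / 32768)


def looks_too_quiet(path: str) -> bool:
    """True if the loudest sample falls below the illustrative threshold."""
    return peak_dbfs(path) < QUIET_PEAK_DBFS
```

A result of `-inf` means the file contains no signal at all, which points to a recording or export problem rather than a transcription one.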

Incorrect Language

Solutions:
  • Manually select the correct language
  • Ensure the majority of the audio is in one language
  • Split multi-language recordings into separate files, one per language

Missing Words

Common causes:
  • Speaking too fast
  • Mumbling or unclear speech
  • Technical terms not recognized
  • Very quiet sections

Privacy & Security

Your Data

  • Audio files are processed securely
  • Transcriptions are private to your account
  • Files are not used for training
  • Delete anytime from your account

Sensitive Content

For confidential audio:
  • Use on-premises solutions if available
  • Review privacy policy
  • Consider redacting sensitive information
  • Download and delete after processing
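For the redaction item, one option is to mask obvious identifiers in the downloaded transcript before sharing it. This sketch works on the text transcript, not the audio. The two patterns are deliberately simple illustrations that will miss many formats and can over-match long digit runs, so treat them as a starting point, not a privacy guarantee; `redact` is a hypothetical helper.

```python
import re

# Illustrative patterns only; real redaction needs review by a person.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),  # 9+ characters of digits and separators
}


def redact(text: str) -> str:
    """Replace each match with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Email jane.doe@example.com or call +1 555-123-4567."))
# Email [EMAIL] or call [PHONE].
```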

Best Practices Summary

  1. Start with quality audio - Good input = good output
  2. Choose the right environment - Quiet spaces work best
  3. Speak clearly - Articulate and keep a consistent pace
  4. Review and edit - All transcriptions benefit from review
  5. Use appropriate tools - Different content needs different approaches

Get Support

Need help with transcription?