Overview
Transform any audio recording into text with Fish Audio’s speech recognition. Perfect for transcriptions, subtitles, and voice commands.
Getting Started
Web Interface
Transcribe audio instantly:
1. Visit Fish Audio - Go to fish.audio and log in
2. Navigate to Transcribe - Click on “Speech to Text” in your dashboard
3. Upload Audio - Select your audio file (MP3, WAV, M4A)
4. Get Transcription - Click “Transcribe” and copy your text
Supported Formats
Audio Files
Accepted formats:
- MP3 (recommended)
- WAV
- M4A
- OGG
- FLAC
- AAC
File limits:
- Maximum size: 100MB
- Maximum duration: 60 minutes
- Minimum duration: 1 second
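The formats and limits above can be checked locally before uploading. The following is a minimal sketch, not part of Fish Audio's tooling: the helper name `check_upload` is hypothetical, and it assumes "100MB" means 100 × 1024 × 1024 bytes and that you already know the file's duration.

```python
from pathlib import Path
from typing import List

# Values copied from the lists above. The byte interpretation of "100MB"
# is an assumption; the service may count megabytes differently.
ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac"}
MAX_SIZE_BYTES = 100 * 1024 * 1024
MIN_DURATION_S = 1
MAX_DURATION_S = 60 * 60


def check_upload(filename: str, size_bytes: int, duration_s: float) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 100MB")
    if duration_s < MIN_DURATION_S:
        problems.append("audio is shorter than 1 second")
    if duration_s > MAX_DURATION_S:
        problems.append("audio is longer than 60 minutes")
    return problems


if __name__ == "__main__":
    print(check_upload("interview.mp3", 12_000_000, 1800.0))
```

The check looks only at the file extension, so a mislabeled file would still pass it.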
Language Support
Automatic Detection
The system automatically detects the language spoken in your audio. No configuration needed!
Manual Selection
For better accuracy, specify the language:
Major Languages:
- English (en)
- Chinese (zh)
- Spanish (es)
- French (fr)
- German (de)
- Japanese (ja)
- Korean (ko)
- Portuguese (pt)
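If you collect the language from a form field or a config file, it helps to map the input to one of the codes above. The sketch below is illustrative only: `resolve_language` is a hypothetical helper, and returning `None` is used here to mean "leave detection automatic". Because the list covers only the major languages, unknown values fall back to `None` rather than raising an error.

```python
from typing import Optional

# Codes copied from the list above.
MAJOR_LANGUAGES = {
    "en": "English",
    "zh": "Chinese",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "ja": "Japanese",
    "ko": "Korean",
    "pt": "Portuguese",
}


def resolve_language(value: Optional[str]) -> Optional[str]:
    """Map a code or English name to a listed code, or None for auto-detect."""
    if not value or not value.strip():
        return None
    cleaned = value.strip().lower()
    if cleaned in MAJOR_LANGUAGES:
        return cleaned
    for code, name in MAJOR_LANGUAGES.items():
        if name.lower() == cleaned:
            return code
    return None  # not in the list above: fall back to automatic detection


if __name__ == "__main__":
    print(resolve_language("Japanese"))
```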
Audio Quality Tips
For Best Results
Recording Environment:
- Quiet room with minimal echo
- No background music
- Clear, consistent speaking voice
- One speaker at a time
Audio Settings:
- Sample rate: 16kHz or higher
- Bit rate: 128kbps or higher
- Mono or stereo (mono preferred)
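For WAV files, the sample rate and channel count can be read with Python's standard `wave` module before uploading. This is a rough sketch under two assumptions: the file is uncompressed PCM WAV, and `describe_wav` is a hypothetical helper name. Bit rate is not checked here because it applies to compressed formats such as MP3, which `wave` does not read.

```python
import wave


def describe_wav(path: str) -> dict:
    """Report the sample rate, channel count, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return {
        "sample_rate_hz": rate,
        "channels": channels,
        "duration_s": duration,
        "meets_sample_rate": rate >= 16000,  # "16kHz or higher" from the list above
        "is_mono": channels == 1,            # mono is preferred, stereo also works
    }
```

Calling `describe_wav("meeting.wav")` on your own recording returns a dictionary you can inspect or log; the file name is only an example.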
Common Issues
Poor transcription quality?
- Remove background noise
- Increase microphone volume
- Speak clearly and not too fast
- Avoid multiple speakers talking over each other
Use Cases
Meeting Transcription
Convert recorded meetings into searchable text:
- Record your meeting (Zoom, Teams, etc.)
- Export the audio file
- Upload to Fish Audio
- Get formatted transcription with timestamps
Podcast Transcripts
Create written versions of your podcasts:
- Generate show notes automatically
- Create searchable content
- Improve accessibility
- Enable translations
Video Subtitles
Generate subtitles for your videos:
- Extract audio from video
- Transcribe with Fish Audio
- Get timestamped text
- Import into video editor
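If you have timestamped text and need a subtitle file for your editor, the conversion to SubRip (.srt) is straightforward. The sketch below assumes each segment is a `(start_seconds, end_seconds, text)` tuple; that shape and the helper names are illustrative assumptions, not the format Fish Audio returns.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text); assumed shape


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used in .srt files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Build SubRip text: numbered cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.25, "Hello"), (1.25, 2.0, "world")]))
```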
Voice Notes
Convert voice memos to text:
- Dictate ideas quickly
- Transcribe later for editing
- Search through voice notes
- Share as text documents
Advanced Features
Timestamps
Get precise timing for each spoken segment. Useful for:
- Creating subtitles
- Navigating long recordings
- Synchronizing with video
- Building searchable archives
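As one illustration of navigating a long recording, timestamped segments can be searched for a keyword to find where a topic comes up. This is a minimal sketch that assumes segments are `(start_seconds, end_seconds, text)` tuples; the tuple shape and the `find_mentions` name are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text); assumed shape


def find_mentions(segments: List[Segment], keyword: str) -> List[Tuple[float, str]]:
    """Return (start time, text) for every segment that contains the keyword."""
    needle = keyword.lower()
    return [
        (start, text)
        for start, _end, text in segments
        if needle in text.lower()
    ]


if __name__ == "__main__":
    meeting = [
        (0.0, 4.2, "Welcome to the budget review."),
        (4.2, 9.0, "First, the hiring plan."),
        (9.0, 15.5, "Back to the budget numbers."),
    ]
    for start, text in find_mentions(meeting, "budget"):
        print(f"{start:7.1f}s  {text}")
```

The match is a simple case-insensitive substring test, so "budget" also matches "budgets".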
Speaker Detection
Identify different speakers in conversations.
Punctuation & Formatting
Automatic formatting includes:
- Sentence capitalization
- Punctuation marks
- Paragraph breaks
- Number formatting
Tips for Different Content
Interviews
Best practices:
- Use a good microphone for each speaker
- Record in a quiet environment
- Speak one at a time
- Keep consistent volume levels
Lectures & Presentations
Optimize for:
- Clear articulation of technical terms
- Pause between topics
- Repeat important points
- Avoid reading too fast
Phone Calls
Considerations:
- Phone audio is lower quality
- Expect slightly lower accuracy
- Speak clearly and slowly
- Avoid speakerphone if possible
Accuracy Expectations
What Affects Accuracy
Positive factors:
- Clear audio quality
- Native speaker accent
- Common vocabulary
- Single speaker
Negative factors:
- Heavy accents
- Technical jargon
- Multiple speakers
- Background noise
Typical Accuracy Rates
- Professional recording: 95-98%
- Clean amateur recording: 90-95%
- Phone/video calls: 85-90%
- Noisy environments: 75-85%
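To measure accuracy on your own recordings, you can compare a transcription against a hand-corrected reference. The sketch below computes word error rate (WER) with a word-level edit distance and treats accuracy as roughly 1 − WER. That definition is an assumption made for this example; the text above does not say how its percentages were measured.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from hypothesis
                current[j - 1] + 1,      # extra word in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the quick brown fox", "the quick brown box")
    print(f"WER {wer:.2f}, approximate accuracy {1 - wer:.0%}")
```

Punctuation is not stripped here, so "fox." and "fox" count as different words; normalize both strings first if that matters for your comparison.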
Post-Processing Tips
Editing Transcriptions
After transcription:
- Review for accuracy - Check names and technical terms
- Add formatting - Break into paragraphs
- Correct errors - Fix any misheard words
- Add context - Include speaker names
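Recurring mistakes with names and technical terms can be fixed in bulk with a small glossary. The following is a sketch only: `apply_glossary` is a hypothetical helper, and the sample corrections are made up for illustration.

```python
import re
from typing import Dict


def apply_glossary(text: str, corrections: Dict[str, str]) -> str:
    """Replace each misheard phrase with its correction, ignoring case."""
    for wrong, right in corrections.items():
        # \b keeps the match on whole words, so "pie" does not match "pier".
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", flags=re.IGNORECASE)
        # A function replacement avoids treating backslashes in `right` specially.
        text = pattern.sub(lambda _match, right=right: right, text)
    return text


if __name__ == "__main__":
    glossary = {"fish audio": "Fish Audio", "pie torch": "PyTorch"}
    print(apply_glossary("We use fish audio and pie torch.", glossary))
```

Corrections are applied in dictionary order, so put longer phrases before shorter ones that overlap with them.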
Export Options
Save your transcriptions as:
- Plain text (.txt)
- Word document (.docx)
- Subtitle file (.srt)
- PDF document
Common Applications
Business
- Meeting minutes
- Interview transcripts
- Call recordings
- Training materials
Education
- Lecture notes
- Research interviews
- Student recordings
- Language learning
Content Creation
- Video scripts
- Podcast show notes
- Social media captions
- Blog post drafts
Accessibility
- Hearing impaired support
- Multi-language content
- Searchable archives
- Documentation
Troubleshooting
No Text Output
Check:
- Audio file isn’t corrupted
- File format is supported
- Audio contains speech
- Volume is audible
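One way to run the volume check on a WAV file is to read its peak sample level. This sketch assumes a 16-bit PCM WAV and uses a hypothetical helper name, `peak_level`. A result near 0.0 suggests the file is silent or far too quiet; what counts as "too quiet" is left to your judgment.

```python
import array
import sys
import wave


def peak_level(path: str) -> float:
    """Return the loudest sample as a fraction of full scale (0.0 to 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        return 0.0
    return max(abs(sample) for sample in samples) / 32768
```

A file that cannot be opened at all raises `wave.Error`, which also covers the corrupted-file case for WAV input.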
Incorrect Language
Solutions:
- Manually select the correct language
- Ensure majority of audio is in one language
- Separate multi-language content
Missing Words
Common causes:
- Speaking too fast
- Mumbling or unclear speech
- Technical terms not recognized
- Very quiet sections
Privacy & Security
Your Data
- Audio files are processed securely
- Transcriptions are private to your account
- Files are not used for training
- Delete anytime from your account
Sensitive Content
For confidential audio:
- Use on-premise solutions if available
- Review privacy policy
- Consider redacting sensitive information
- Download and delete after processing
Best Practices Summary
- Start with quality audio - Good input = good output
- Choose the right environment - Quiet spaces work best
- Speak clearly - Articulate and consistent pace
- Review and edit - All transcriptions benefit from review
- Use appropriate tools - Different content needs different approaches
Get Support
Need help with transcription?
- Try it free: fish.audio
- Community: Discord
- Email: support@fish.audio
- Status: status.fish.audio