Speech to Text
Convert spoken audio into written text with NeuraAI’s speech recognition models. Powered by Whisper, our transcription service supports multiple languages and audio formats with high accuracy.
Overview
The speech-to-text API can:
- Transcribe audio files in various formats (MP3, WAV, M4A, etc.)
- Support 50+ languages
- Handle background noise and accents
- Provide timestamps for segments
- Process files up to 25MB
Basic Transcription
Convert an audio file to text:
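A minimal sketch of an upload, using only the Python standard library. The endpoint URL, the `model` value, the `file` field name, and Bearer-token authentication are all assumptions modeled on common Whisper-style APIs, not confirmed NeuraAI details; check the API reference for the real values.

```python
import uuid
import urllib.request
from pathlib import Path

# Hypothetical endpoint: replace with the URL from the NeuraAI API reference.
API_URL = "https://api.neuraai.example/v1/audio/transcriptions"

def encode_multipart(fields, filename, audio_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "", str(value)]
    head = "\r\n".join(lines + [
        f"--{boundary}",
        f'Content-Disposition: form-data; name="file"; filename="{filename}"',
        "Content-Type: application/octet-stream",
        "", "",
    ]).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + audio_bytes + tail, f"multipart/form-data; boundary={boundary}"

def build_transcription_request(audio_bytes, filename, api_key, **fields):
    """Build (but do not send) the POST request for one audio file."""
    body, content_type = encode_multipart(fields, filename, audio_bytes)
    return urllib.request.Request(
        API_URL, data=body, method="POST",
        headers={"Authorization": f"Bearer {api_key}",  # assumed auth scheme
                 "Content-Type": content_type},
    )

def transcribe(path, api_key, **fields):
    """Send the file and return the response body as text."""
    audio = Path(path).read_bytes()
    request = build_transcription_request(audio, Path(path).name, api_key, **fields)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `transcribe("meeting.mp3", api_key, model="whisper-1")` would then return the transcript, assuming the service accepts these field names.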
Supported Audio Formats
- MP3
- MP4
- MPEG
- MPGA
- M4A
- WAV
- WEBM
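Checking a file locally before uploading avoids a wasted request. The sketch below tests the extension against the list above and the size against the 25MB limit; it assumes "25MB" means 25 × 1024 × 1024 bytes, and `check_upload` is a hypothetical helper name.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: 25MB interpreted as 25 MiB

def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_UPLOAD_BYTES}")
    return problems
```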
Language Support
Whisper automatically detects the spoken language, but you can specify it for better accuracy:
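As a sketch, the language can be passed as an extra form field. The field name `language` and the model name `whisper-1` are assumptions here; the helper only normalizes the code and rejects anything that is not a two-letter code.

```python
def transcription_fields(model="whisper-1", language=None):
    """Build form fields; omit `language` to rely on automatic detection."""
    fields = {"model": model}  # hypothetical model name
    if language is not None:
        code = language.strip().lower()
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"expected a two-letter language code, got {language!r}")
        fields["language"] = code
    return fields
```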
Common language codes:
- en - English
- es - Spanish
- fr - French
- de - German
- it - Italian
- pt - Portuguese
- nl - Dutch
- ja - Japanese
- ko - Korean
- zh - Chinese
Response Formats
Plain Text (Default)
JSON
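The default format returns the transcript as the raw response body, while the JSON format wraps it in an object. The sketch below assumes the JSON body has a top-level `text` key and that the format names are `text` and `json`, as in other Whisper-style APIs.

```python
import json

def extract_text(body, response_format="text"):
    """Return the transcript from a plain-text or JSON response body."""
    if response_format == "text":
        return body.strip()
    if response_format == "json":
        return json.loads(body)["text"].strip()  # assumed key name
    raise ValueError(f"unhandled response format: {response_format}")
```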
Verbose JSON
Get detailed information including segments and timestamps:
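A hedged sketch of reading a verbose response. It assumes the body contains a `segments` list whose items carry `start`, `end` (seconds), and `text`; the actual NeuraAI schema may differ.

```python
import json
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds from the beginning of the audio
    end: float
    text: str

def parse_segments(body):
    """Turn the assumed `segments` array into Segment objects."""
    data = json.loads(body)
    return [Segment(float(s["start"]), float(s["end"]), s["text"].strip())
            for s in data.get("segments", [])]

def clock(seconds):
    """Format seconds as HH:MM:SS, dropping fractions."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

Printing `clock(seg.start)` next to `seg.text` for each segment gives a simple timestamped transcript.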
SRT (Subtitles)
Generate subtitle files:
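Assuming the service returns the subtitle file as the response body when an SRT format is requested, saving it next to the source media is enough for most players to pick it up. `save_subtitles` is a hypothetical helper.

```python
from pathlib import Path

def subtitle_path(media_path, fmt="srt"):
    """Same location and name as the media file, with a subtitle extension."""
    return Path(media_path).with_suffix(f".{fmt}")

def save_subtitles(body, media_path, fmt="srt"):
    """Write the subtitle text returned by the API and return its path."""
    out = subtitle_path(media_path, fmt)
    out.write_text(body, encoding="utf-8")
    return out
```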
VTT (WebVTT)
For web video players:
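If only an SRT file is at hand, it can be converted locally. WebVTT differs from SRT mainly in its `WEBVTT` header and in using a dot instead of a comma before the milliseconds; the sketch below handles just those two differences and ignores styling.

```python
import re

_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")

def srt_to_vtt(srt_text):
    """Convert SRT text to WebVTT, rewriting only the timing lines."""
    lines = []
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            line = _SRT_TIME.sub(r"\1.\2", line)  # 00:00:01,500 -> 00:00:01.500
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"
```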
Advanced Options
Prompt for Context
Provide context to improve accuracy:
The prompt helps with:
- Technical terminology
- Proper nouns and names
- Domain-specific vocabulary
- Consistent spelling of terms
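One way to assemble such a prompt is a short context sentence followed by a glossary. The `max_chars` cap below is an arbitrary assumption, since this page does not state a prompt length limit, and `build_prompt` is a hypothetical helper.

```python
def build_prompt(context, terms, max_chars=800):
    """Combine a context sentence with a de-duplicated list of terms."""
    seen = set()
    unique = []
    for term in terms:
        key = term.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(term.strip())
    prompt = context.strip()
    if unique:
        prompt += " Terms: " + ", ".join(unique) + "."
    return prompt[:max_chars]  # hard cut; may truncate the last term
```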
Temperature
Control randomness in the transcription with a value from 0 to 1; lower values give more deterministic output:
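A small sketch that validates the value before it is sent. The field name `temperature` is an assumption, and the value is formatted as a string because form fields are text.

```python
def temperature_field(value):
    """Return a form field for a temperature in the documented 0-1 range."""
    temperature = float(value)
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"temperature must be between 0 and 1, got {value}")
    return {"temperature": f"{temperature:.2f}"}
```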
Practical Examples
Meeting Transcription
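For meeting notes, timestamped paragraphs are usually easier to skim than raw segments. The sketch below assumes segments shaped like a verbose JSON response (dicts with `start`, `end`, and `text`) and starts a new paragraph whenever the pause between segments exceeds `max_gap` seconds; the two-second default is an arbitrary choice.

```python
def to_paragraphs(segments, max_gap=2.0):
    """Merge consecutive segments into (start, text) paragraphs split at long pauses."""
    paragraphs = []
    current_start, current_text, last_end = None, [], None
    for seg in segments:
        if current_text and seg["start"] - last_end > max_gap:
            paragraphs.append((current_start, " ".join(current_text)))
            current_text = []
        if not current_text:
            current_start = seg["start"]
        current_text.append(seg["text"].strip())
        last_end = seg["end"]
    if current_text:
        paragraphs.append((current_start, " ".join(current_text)))
    return paragraphs

def format_notes(paragraphs):
    """Render paragraphs as lines like '[MM:SS] text'."""
    lines = []
    for start, text in paragraphs:
        minutes, seconds = divmod(int(start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {text}")
    return "\n".join(lines)
```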
Podcast Episode
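For show notes, rough chapter markers can be derived from segment timestamps. This sketch assumes verbose-JSON-style segments (dicts with `start` and `text`) and picks the first segment that starts in each five-minute window; the window length and the helper name are illustrative.

```python
def chapter_markers(segments, interval=300.0):
    """Return (start, text) for the first segment in each `interval`-second window."""
    markers = []
    next_boundary = 0.0
    for seg in segments:
        if seg["start"] >= next_boundary:
            markers.append((seg["start"], seg["text"].strip()))
            # Skip ahead to the window after the one this segment falls in.
            next_boundary = (seg["start"] // interval + 1) * interval
    return markers
```

The marker text is only the opening sentence of each window, so it works as a draft to edit rather than as finished chapter titles.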
Interview Transcription
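When analyzing interviews, it helps to jump to every place a topic comes up. The sketch below searches verbose-JSON-style segments (assumed to be dicts with `start` and `text`) for a whole-word, case-insensitive keyword match; `find_mentions` is a hypothetical helper.

```python
import re

def find_mentions(segments, keyword):
    """Return (start, text) for each segment that mentions the keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [(seg["start"], seg["text"].strip())
            for seg in segments if pattern.search(seg["text"])]
```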
Video Subtitles Generator
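Subtitles can also be built locally from timestamped segments, which allows editing the text before export. The sketch assumes segments are dicts with `start`, `end` (seconds), and `text`, and emits standard SRT cues.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timing format SRT uses."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def segments_to_srt(segments):
    """Number each segment and join the cues with blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```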
Handling Large Files
For files larger than 25MB, split them into chunks:
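A sketch of chunking for uncompressed WAV input, using only the standard `wave` module. It splits on frame boundaries so every chunk is a valid WAV file under the size limit. Compressed formats such as MP3 cannot be cut safely at arbitrary bytes and need an audio tool instead. The 44-byte header size assumes plain PCM, which is what `wave` writes.

```python
import wave
from pathlib import Path

MAX_BYTES = 25 * 1024 * 1024  # assumption: 25MB interpreted as 25 MiB
WAV_HEADER_BYTES = 44         # header size of a plain PCM WAV file

def split_wav(path, out_dir, max_bytes=MAX_BYTES):
    """Split a PCM WAV file into numbered chunks no larger than max_bytes."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with wave.open(str(path), "rb") as src:
        params = src.getparams()
        frame_bytes = params.nchannels * params.sampwidth
        frames_per_chunk = (max_bytes - WAV_HEADER_BYTES) // frame_bytes
        if frames_per_chunk < 1:
            raise ValueError("max_bytes is too small to hold a single frame")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = out_dir / f"{Path(path).stem}_part{index:03d}.wav"
            with wave.open(str(chunk_path), "wb") as dst:
                dst.setparams(params)    # frame count is corrected on close
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Fixed-size cuts can land in the middle of a word, so transcripts of adjacent chunks may need manual stitching at the boundaries.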
Best Practices
Audio Quality
- Use lossless formats (WAV) when possible
- Minimum bitrate: 64 kbps
- Recommended sample rate: 16kHz or higher
- Reduce background noise before transcription
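For WAV input, the sample-rate recommendation above can be checked before uploading. This sketch uses the standard `wave` module, so it only covers PCM WAV files; `wav_quality_issues` is a hypothetical helper.

```python
import wave

MIN_SAMPLE_RATE = 16_000  # "16kHz or higher" from the list above

def wav_quality_issues(source):
    """Inspect a WAV file (path string or binary file object) and list problems."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    issues = []
    if rate < MIN_SAMPLE_RATE:
        issues.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if frames == 0:
        issues.append("file contains no audio frames")
    return issues
```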
Language Detection
- Specify language code for better accuracy
- Use prompts for technical or specialized content
- For multilingual audio, transcribe segments separately
Processing Tips
- Split long files into manageable chunks
- Use lower temperature (0.0-0.3) for technical content
- Use higher temperature (0.5-0.8) for creative content
- Provide context prompts for better terminology recognition
Error Handling
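A minimal sketch of retrying transient failures with exponential backoff. It assumes requests are made with `urllib` and that the service signals rate limits and temporary outages with the usual HTTP status codes (429 and 5xx); the exact codes NeuraAI returns are not documented on this page.

```python
import time
import urllib.error

RETRYABLE_STATUS = {429, 500, 502, 503, 504}  # assumed transient error codes

def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a retryable HTTP error wait 1x, 2x, 4x... base_delay and retry."""
    for attempt in range(attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:
            last_attempt = attempt == attempts - 1
            if err.code not in RETRYABLE_STATUS or last_attempt:
                raise  # client errors (e.g. bad file) will not succeed on retry
            sleep(base_delay * 2 ** attempt)
```

Wrapping a request looks like `with_retries(lambda: send(request))`, where `send` stands for whatever function performs the upload.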
Common Use Cases
- Meeting Notes - Automatic transcription of business meetings
- Podcast Production - Generate show notes and transcripts
- Video Subtitles - Create accessibility captions
- Interview Analysis - Transcribe research interviews
- Voice Notes - Convert voice memos to text
- Customer Support - Transcribe support calls for analysis
- Legal Documentation - Transcribe depositions and hearings
- Medical Records - Convert doctor dictations to text
Limitations
- Maximum file size: 25MB
- Supported audio length: Up to several hours, provided the file stays within the 25MB size limit
- Background noise may affect accuracy
- Heavy accents may require language specification
- Real-time streaming not supported (batch processing only)
Tips for Better Results
- Clean Audio - Remove background noise when possible
- Good Microphone - Use quality recording equipment
- Clear Speech - Speak clearly and at moderate pace
- Context Prompts - Provide relevant context for technical terms
- Specify Language - Set language code for non-English audio
- Format Choice - Use verbose JSON for editing, SRT for subtitles