Speech to Text

Convert spoken audio into written text with NeuraAI’s speech recognition models. Powered by Whisper, our transcription service supports multiple languages and audio formats with high accuracy.

Overview

The speech-to-text API can:

  • Transcribe audio files in various formats (MP3, WAV, M4A, etc.)
  • Support 50+ languages
  • Handle background noise and accents
  • Provide timestamps for segments
  • Process files up to 25MB

Basic Transcription

Convert an audio file to text:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.neura-ai.app/v1"
)

with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(transcription.text)

Supported Audio Formats

  • MP3
  • MP4
  • MPEG
  • MPGA
  • M4A
  • WAV
  • WEBM
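A small helper can reject unsupported files before you spend time uploading them. This is a sketch of ours (the helper name is not part of the API); it simply checks the extension against the list above:

```python
# Formats the transcription endpoint accepts, keyed by file extension
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}

def is_supported(filename: str) -> bool:
    """Return True if the file extension is one the API accepts."""
    ext = filename.rsplit(".", 1)[-1].lower()
    return ext in SUPPORTED_FORMATS
```

For example, `is_supported("talk.MP3")` passes while `is_supported("notes.flac")` does not.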

Language Support

Whisper automatically detects the spoken language, but you can specify it for better accuracy:

with open("spanish_audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="es"  # ISO-639-1 code
    )

print(transcription.text)

Common language codes:

  • en - English
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • pt - Portuguese
  • nl - Dutch
  • ja - Japanese
  • ko - Korean
  • zh - Chinese
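If your application accepts language names from users, a lookup table built from the codes above avoids typos in the `language` parameter. A minimal sketch (the helper is illustrative, not part of the API):

```python
# ISO-639-1 codes for the languages listed above
LANGUAGE_CODES = {
    "english": "en", "spanish": "es", "french": "fr", "german": "de",
    "italian": "it", "portuguese": "pt", "dutch": "nl",
    "japanese": "ja", "korean": "ko", "chinese": "zh",
}

def language_code(name: str):
    """Map a language name to its ISO-639-1 code, or None if unknown."""
    return LANGUAGE_CODES.get(name.strip().lower())
```

Returning `None` for unknown names lets you fall back to Whisper's automatic detection rather than sending a bad code.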

Response Formats

Plain Text (Default)

transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="text"
)

# With response_format="text" the API returns the transcript as a plain string
print(transcription)

JSON

transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="json"
)

print(transcription.text)

Verbose JSON

Get detailed information including segments and timestamps:

transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="verbose_json"
)

print(f"Language: {transcription.language}")
print(f"Duration: {transcription.duration}")

for segment in transcription.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s]: {segment.text}")

SRT (Subtitles)

Generate subtitle files:

transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="srt"
)

# response_format="srt" returns the subtitle content as a plain string
with open("subtitles.srt", "w") as f:
    f.write(transcription)

VTT (WebVTT)

For web video players:

transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="vtt"
)

# response_format="vtt" returns the subtitle content as a plain string
with open("subtitles.vtt", "w") as f:
    f.write(transcription)

Advanced Options

Prompt for Context

Provide context to improve accuracy:

transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    prompt="This is a technical discussion about machine learning and neural networks."
)

The prompt helps with:

  • Technical terminology
  • Proper nouns and names
  • Domain-specific vocabulary
  • Consistent spelling of terms

Temperature

Control randomness in transcription (0-1):

transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    temperature=0.2  # Lower = more consistent, Higher = more varied
)
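A pattern worth sketching: start at temperature 0 and retry at slightly higher values only when a pass fails. The helper below is our own assumption, not an API feature; it takes any callable, so the retry logic can be exercised without a network call:

```python
def transcribe_with_fallback(transcribe, temperatures=(0.0, 0.2, 0.4)):
    """Call transcribe(temperature) at rising temperatures until one succeeds."""
    last_error = None
    for temp in temperatures:
        try:
            return transcribe(temp)
        except Exception as exc:
            last_error = exc  # remember the failure, try a higher temperature
    raise last_error
```

In practice you would wrap the API call, e.g. `transcribe_with_fallback(lambda t: client.audio.transcriptions.create(model="whisper-1", file=audio_file, temperature=t))`.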

Practical Examples

Meeting Transcription

from openai import OpenAI

client = OpenAI(base_url="https://api.neura-ai.app/v1")

def transcribe_meeting(audio_path):
    with open(audio_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="verbose_json",
            prompt="Business meeting discussing Q4 targets and marketing strategy"
        )

    # Save full transcript
    with open("meeting_transcript.txt", "w") as f:
        f.write(transcription.text)

    # Save timestamped version
    with open("meeting_detailed.txt", "w") as f:
        for segment in transcription.segments:
            f.write(f"[{segment.start:.2f}s]: {segment.text}\n")

    return transcription

result = transcribe_meeting("quarterly_meeting.mp3")
print(f"Transcribed {result.duration:.2f} seconds of audio")

Podcast Episode

def transcribe_podcast(episode_file, episode_title):
    with open(episode_file, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="srt",
            prompt=f"Podcast episode: {episode_title}"
        )

    # Save as subtitle file (response_format="srt" returns a plain string)
    srt_filename = episode_file.replace(".mp3", ".srt")
    with open(srt_filename, "w", encoding="utf-8") as f:
        f.write(transcription)

    print(f"Subtitles saved to {srt_filename}")

transcribe_podcast("episode_42.mp3", "The Future of AI")

Interview Transcription

def transcribe_interview(audio_path, interviewer, interviewee):
    with open(audio_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="verbose_json",
            prompt=f"Interview between {interviewer} and {interviewee}"
        )

    # Format with timestamps
    formatted_text = f"Interview: {interviewee}\n"
    formatted_text += f"Duration: {transcription.duration:.0f} seconds\n\n"

    for segment in transcription.segments:
        timestamp = f"[{int(segment.start // 60):02d}:{int(segment.start % 60):02d}]"
        formatted_text += f"{timestamp} {segment.text}\n"

    return formatted_text

result = transcribe_interview(
    "interview.wav",
    "John Smith",
    "Jane Doe"
)
print(result)

Video Subtitles Generator

import os

def generate_subtitles(video_file):
    # Extract audio from video (requires ffmpeg)
    audio_file = video_file.replace(".mp4", ".mp3")
    os.system(f'ffmpeg -i "{video_file}" -q:a 0 -map a "{audio_file}" -y')

    # Transcribe
    with open(audio_file, "rb") as f:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="srt"
        )

    # Save subtitles (response_format="srt" returns a plain string)
    srt_file = video_file.replace(".mp4", ".srt")
    with open(srt_file, "w", encoding="utf-8") as f:
        f.write(transcription)

    # Clean up temporary audio
    os.remove(audio_file)

    print(f"✅ Subtitles generated: {srt_file}")

generate_subtitles("presentation.mp4")

Handling Large Files

For files larger than 25MB, split them into chunks:

import os

from pydub import AudioSegment

def transcribe_large_file(file_path):
    # Load audio
    audio = AudioSegment.from_file(file_path)

    # Split into 10-minute chunks
    chunk_length_ms = 10 * 60 * 1000
    chunks = [audio[i:i + chunk_length_ms]
              for i in range(0, len(audio), chunk_length_ms)]

    full_transcript = ""

    for i, chunk in enumerate(chunks):
        # Export chunk
        chunk_file = f"temp_chunk_{i}.mp3"
        chunk.export(chunk_file, format="mp3")

        # Transcribe
        with open(chunk_file, "rb") as f:
            transcription = client.audio.transcriptions.create(
                model="whisper-1",
                file=f
            )

        full_transcript += transcription.text + " "

        # Clean up
        os.remove(chunk_file)

    return full_transcript.strip()

Best Practices

Audio Quality

  • Use lossless formats (WAV) when possible
  • Minimum bitrate: 64 kbps
  • Recommended sample rate: 16kHz or higher
  • Reduce background noise before transcription
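The recommendations above can be applied in one preprocessing pass with ffmpeg. This sketch only builds the command line (the filter settings are illustrative defaults, not values mandated by the API); run it with `subprocess.run`:

```python
def preprocess_command(src: str, dst: str) -> list:
    """Build an ffmpeg command that downmixes to mono, resamples to
    16 kHz, and cuts low-frequency rumble before transcription."""
    return [
        "ffmpeg", "-i", src,
        "-ac", "1",               # single (mono) channel
        "-ar", "16000",           # 16 kHz sample rate
        "-af", "highpass=f=80",   # trim low-frequency background noise
        "-y", dst,                # overwrite the output if it exists
    ]
```

For example, `subprocess.run(preprocess_command("raw.m4a", "clean.wav"), check=True)` produces a cleaned WAV file ready for upload.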

Language Detection

  • Specify language code for better accuracy
  • Use prompts for technical or specialized content
  • For multilingual audio, transcribe segments separately

Processing Tips

  • Split long files into manageable chunks
  • Use lower temperature (0.0-0.3) for technical content
  • Use higher temperature (0.5-0.8) for creative content
  • Provide context prompts for better terminology recognition

Error Handling

def safe_transcribe(audio_path):
    try:
        with open(audio_path, "rb") as audio_file:
            transcription = client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
                response_format="text"
            )
        # response_format="text" returns the transcript as a plain string
        return transcription

    except FileNotFoundError:
        print(f"❌ File not found: {audio_path}")
        return None

    except Exception as e:
        print(f"❌ Transcription error: {e}")
        return None

Common Use Cases

  • Meeting Notes - Automatic transcription of business meetings
  • Podcast Production - Generate show notes and transcripts
  • Video Subtitles - Create accessibility captions
  • Interview Analysis - Transcribe research interviews
  • Voice Notes - Convert voice memos to text
  • Customer Support - Transcribe support calls for analysis
  • Legal Documentation - Transcribe depositions and hearings
  • Medical Records - Convert doctor dictations to text

Limitations

  • Maximum file size: 25MB
  • Supported audio length: Up to several hours
  • Background noise may affect accuracy
  • Heavy accents may require language specification
  • Real-time streaming not supported (batch processing only)
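A pre-flight size check makes the 25MB limit explicit in code. This is a small sketch of ours (the function name is illustrative) that tells you whether a file must be split before uploading:

```python
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # the 25MB file size limit

def needs_chunking(path: str) -> bool:
    """Return True when a file exceeds the upload limit and must be split."""
    return os.path.getsize(path) > MAX_UPLOAD_BYTES
```

Files that fail this check can be routed through the chunking approach shown earlier under Handling Large Files.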

Tips for Better Results

  1. Clean Audio - Remove background noise when possible
  2. Good Microphone - Use quality recording equipment
  3. Clear Speech - Speak clearly and at moderate pace
  4. Context Prompts - Provide relevant context for technical terms
  5. Specify Language - Set language code for non-English audio
  6. Format Choice - Use verbose JSON for editing, SRT for subtitles