Audio Translation | NeuraAI Documentation

Translate spoken audio from any supported language directly into English text. NeuraAI’s translation service automatically detects the source language and provides accurate English translations.

Overview

The audio translation API:

Translates from 50+ languages to English
Automatically detects source language
Supports multiple audio formats
Maintains context and meaning
Handles various accents and dialects

Basic Translation

Translate audio to English:

1 from openai import OpenAI
2 
3 client = OpenAI(
4     base_url="https://api.neura-ai.app/v1"
5 )
6 
7 with open("spanish_audio.mp3", "rb") as audio_file:
8     translation = client.audio.translations.create(
9         model="whisper-1",
10         file=audio_file
11     )
12 
13 print(translation.text)

How It Differs from Transcription

Feature	Transcription	Translation
Output Language	Same as input	Always English
Purpose	Convert speech to text	Translate to English
Language Detection	Optional	Automatic

Example:

Input: Spanish audio “Hola, ¿cómo estás?”
Transcription: “Hola, ¿cómo estás?”
Translation: “Hello, how are you?”

Supported Input Languages

The translation API accepts audio in any language supported by Whisper, including:

Spanish (es)
French (fr)
German (de)
Italian (it)
Portuguese (pt)
Dutch (nl)
Russian (ru)
Japanese (ja)
Korean (ko)
Chinese (zh)
Arabic (ar)
Hindi (hi)
And 40+ more languages

Supported Audio Formats

MP3
MP4
MPEG
MPGA
M4A
WAV
WEBM

Maximum file size: 25MB

Response Formats

Plain Text (Default)

1 with open("french_audio.mp3", "rb") as audio_file:
2     translation = client.audio.translations.create(
3         model="whisper-1",
4         file=audio_file,
5         response_format="text"
6     )
7 
8 print(translation.text)

JSON

1 with open("german_audio.mp3", "rb") as audio_file:
2     translation = client.audio.translations.create(
3         model="whisper-1",
4         file=audio_file,
5         response_format="json"
6     )
7 
8 print(translation.text)

Verbose JSON

Get detailed information with segments:

1 with open("italian_audio.mp3", "rb") as audio_file:
2     translation = client.audio.translations.create(
3         model="whisper-1",
4         file=audio_file,
5         response_format="verbose_json"
6     )
7 
8 print(f"Duration: {translation.duration} seconds")
9 
10 for segment in translation.segments:
11     print(f"[{segment.start:.2f}s - {segment.end:.2f}s]: {segment.text}")

Subtitle Formats

Generate English subtitles from foreign language audio:

1 # SRT format
2 with open("japanese_video.mp3", "rb") as audio_file:
3     translation = client.audio.translations.create(
4         model="whisper-1",
5         file=audio_file,
6         response_format="srt"
7     )
8 
9 with open("english_subtitles.srt", "w") as f:
10     f.write(translation.text)
11 
12 # VTT format
13 with open("korean_video.mp3", "rb") as audio_file:
14     translation = client.audio.translations.create(
15         model="whisper-1",
16         file=audio_file,
17         response_format="vtt"
18     )
19 
20 with open("english_subtitles.vtt", "w") as f:
21     f.write(translation.text)

Advanced Options

Prompt for Context

Provide context to improve translation accuracy:

1 with open("chinese_business.mp3", "rb") as audio_file:
2     translation = client.audio.translations.create(
3         model="whisper-1",
4         file=audio_file,
5         prompt="Business meeting discussing quarterly sales and marketing strategy"
6     )
7 
8 print(translation.text)

Context prompts help with:

Industry-specific terminology
Proper nouns and company names
Technical vocabulary
Idiomatic expressions

Temperature

Control consistency in translation:

1 with open("russian_lecture.mp3", "rb") as audio_file:
2     translation = client.audio.translations.create(
3         model="whisper-1",
4         file=audio_file,
5         temperature=0.0  # Most consistent/deterministic
6     )

Practical Examples

Translating International News

1 def translate_news_clip(audio_file, topic):
2     with open(audio_file, "rb") as f:
3         translation = client.audio.translations.create(
4             model="whisper-1",
5             file=f,
6             response_format="verbose_json",
7             prompt=f"News report about {topic}"
8         )
9     
10     # Save translated transcript
11     output_file = audio_file.replace(".mp3", "_english.txt")
12     with open(output_file, "w") as f:
13         f.write(f"Topic: {topic}\n")
14         f.write(f"Duration: {translation.duration:.2f}s\n\n")
15         f.write(translation.text)
16     
17     return translation.text
18 
19 translate_news_clip("french_news.mp3", "European Union policy")

International Video Content

1 def create_english_subtitles(video_audio, original_language):
2     with open(video_audio, "rb") as f:
3         translation = client.audio.translations.create(
4             model="whisper-1",
5             file=f,
6             response_format="srt"
7         )
8     
9     subtitle_file = video_audio.replace(".mp3", "_EN.srt")
10     with open(subtitle_file, "w", encoding="utf-8") as f:
11         f.write(translation.text)
12     
13     print(f"✅ English subtitles created: {subtitle_file}")
14     return subtitle_file
15 
16 create_english_subtitles("spanish_tutorial.mp3", "Spanish")

Customer Support Translation

1 def translate_support_call(call_recording, customer_language):
2     with open(call_recording, "rb") as f:
3         translation = client.audio.translations.create(
4             model="whisper-1",
5             file=f,
6             response_format="verbose_json",
7             prompt="Customer support call discussing technical issues"
8         )
9     
10     # Create formatted transcript
11     transcript = f"Support Call Translation\n"
12     transcript += f"Original Language: {customer_language}\n"
13     transcript += f"Duration: {translation.duration:.2f}s\n"
14     transcript += "=" * 50 + "\n\n"
15     
16     for segment in translation.segments:
17         timestamp = f"[{int(segment.start//60):02d}:{int(segment.start%60):02d}]"
18         transcript += f"{timestamp} {segment.text}\n"
19     
20     return transcript
21 
22 result = translate_support_call("german_support.wav", "German")
23 print(result)

Educational Content

1 def translate_lecture(lecture_file, subject):
2     with open(lecture_file, "rb") as f:
3         translation = client.audio.translations.create(
4             model="whisper-1",
5             file=f,
6             response_format="text",
7             prompt=f"University lecture on {subject}",
8             temperature=0.1  # More consistent for educational content
9         )
10     
11     # Save as markdown for easy reading
12     output_file = lecture_file.replace(".mp3", "_english.md")
13     with open(output_file, "w") as f:
14         f.write(f"# Lecture: {subject}\n\n")
15         f.write(translation.text)
16     
17     return translation.text
18 
19 translate_lecture("physics_lecture_french.mp3", "Quantum Mechanics")

Podcast Translation

1 def translate_podcast_episode(episode_file, show_name, episode_num):
2     with open(episode_file, "rb") as f:
3         translation = client.audio.translations.create(
4             model="whisper-1",
5             file=f,
6             response_format="verbose_json",
7             prompt=f"Podcast: {show_name}, Episode {episode_num}"
8         )
9     
10     # Create formatted English transcript
11     transcript = f"# {show_name} - Episode {episode_num}\n"
12     transcript += f"## English Translation\n\n"
13     transcript += translation.text
14     
15     # Save
16     output_file = f"{show_name}_ep{episode_num}_EN.md"
17     with open(output_file, "w") as f:
18         f.write(transcript)
19     
20     print(f"✅ Translated podcast saved to {output_file}")
21     return transcript
22 
23 translate_podcast_episode(
24     "italian_podcast.mp3",
25     "Tech Talk Italy",
26     "042"
27 )

Comparison with Transcription + Translation

You might wonder: should I transcribe first, then translate? Or use audio translation directly?

Direct Audio Translation (Recommended)

✅ Single API call ✅ Faster processing ✅ Better context preservation ✅ More accurate for idiomatic expressions ✅ Lower cost

1 # One step - Direct translation
2 with open("french.mp3", "rb") as f:
3     result = client.audio.translations.create(model="whisper-1", file=f)

Two-Step Process

❌ Two API calls required ❌ Slower overall ❌ May lose nuance in translation ✅ Provides original transcript ✅ Useful if you need both versions

1 # Two steps - Transcribe then translate
2 # Step 1: Transcribe in original language
3 with open("french.mp3", "rb") as f:
4     transcription = client.audio.transcriptions.create(
5         model="whisper-1", 
6         file=f,
7         language="fr"
8     )
9 
10 # Step 2: Translate text (requires text translation API)
11 # (This would require additional API call)

Handling Large Files

For files larger than 25MB, split them into chunks:

1 from pydub import AudioSegment
2 import os
3 
4 def translate_large_file(file_path):
5     audio = AudioSegment.from_file(file_path)
6     
7     # Split into 10-minute chunks
8     chunk_length_ms = 10 * 60 * 1000
9     chunks = [audio[i:i + chunk_length_ms] 
10               for i in range(0, len(audio), chunk_length_ms)]
11     
12     full_translation = ""
13     
14     for i, chunk in enumerate(chunks):
15         chunk_file = f"temp_chunk_{i}.mp3"
16         chunk.export(chunk_file, format="mp3")
17         
18         with open(chunk_file, "rb") as f:
19             translation = client.audio.translations.create(
20                 model="whisper-1",
21                 file=f
22             )
23         
24         full_translation += translation.text + " "
25         os.remove(chunk_file)
26     
27     return full_translation.strip()

Best Practices

Audio Quality

Use clear audio with minimal background noise
Recommended sample rate: 16kHz or higher
Minimum bitrate: 64 kbps for best results

Context Prompts

Include topic or subject matter
Mention technical terminology
Specify proper nouns when known

Error Handling

1 def safe_translate(audio_path, context=""):
2     try:
3         with open(audio_path, "rb") as f:
4             translation = client.audio.translations.create(
5                 model="whisper-1",
6                 file=f,
7                 prompt=context
8             )
9         return translation.text
10     
11     except FileNotFoundError:
12         print(f"❌ File not found: {audio_path}")
13         return None
14     
15     except Exception as e:
16         print(f"❌ Translation error: {e}")
17         return None
18 
19 result = safe_translate("german_audio.mp3", "Technical presentation")
20 if result:
21     print(result)

Common Use Cases

International Business - Translate meetings and conferences
Content Localization - Create English versions of foreign content
Customer Support - Understand international customer calls
Research - Translate foreign language interviews
Education - Make international lectures accessible
Media - Subtitle foreign films and videos
Travel - Translate tour guides and presentations

Limitations

Output is always in English (use transcription for other languages)
Maximum file size: 25MB
Batch processing only (no real-time streaming)
Quality depends on audio clarity and accent
Idiomatic expressions may be literal

Tips for Better Results

Clean Audio - Reduce background noise
Context Matters - Use prompts for technical or specialized content
Test First - Try a small sample before processing large files
Quality Recording - Use good microphones for better accuracy
Split Large Files - Break up files over 25MB
Lower Temperature - Use 0.0-0.3 for consistent technical translations