Audio Translation

View as Markdown

Translate spoken audio from any supported language directly into English text. NeuraAI’s translation service automatically detects the source language and provides accurate English translations.

Overview

The audio translation API:

  • Translates from 50+ languages to English
  • Automatically detects source language
  • Supports multiple audio formats
  • Maintains context and meaning
  • Handles various accents and dialects

Basic Translation

Translate audio to English:

1from openai import OpenAI
2
3client = OpenAI(
4 base_url="https://api.neura-ai.app/v1"
5)
6
7with open("spanish_audio.mp3", "rb") as audio_file:
8 translation = client.audio.translations.create(
9 model="whisper-1",
10 file=audio_file
11 )
12
13print(translation.text)

How It Differs from Transcription

FeatureTranscriptionTranslation
Output LanguageSame as inputAlways English
PurposeConvert speech to textTranslate to English
Language DetectionOptionalAutomatic

Example:

  • Input: Spanish audio “Hola, ¿cómo estás?”
  • Transcription: “Hola, ¿cómo estás?”
  • Translation: “Hello, how are you?”

Supported Input Languages

The translation API accepts audio in any language supported by Whisper, including:

  • Spanish (es)
  • French (fr)
  • German (de)
  • Italian (it)
  • Portuguese (pt)
  • Dutch (nl)
  • Russian (ru)
  • Japanese (ja)
  • Korean (ko)
  • Chinese (zh)
  • Arabic (ar)
  • Hindi (hi)
  • And 40+ more languages

Supported Audio Formats

  • MP3
  • MP4
  • MPEG
  • MPGA
  • M4A
  • WAV
  • WEBM

Maximum file size: 25MB

Response Formats

Plain Text (Default)

1with open("french_audio.mp3", "rb") as audio_file:
2 translation = client.audio.translations.create(
3 model="whisper-1",
4 file=audio_file,
5 response_format="text"
6 )
7
8print(translation.text)

JSON

1with open("german_audio.mp3", "rb") as audio_file:
2 translation = client.audio.translations.create(
3 model="whisper-1",
4 file=audio_file,
5 response_format="json"
6 )
7
8print(translation.text)

Verbose JSON

Get detailed information with segments:

1with open("italian_audio.mp3", "rb") as audio_file:
2 translation = client.audio.translations.create(
3 model="whisper-1",
4 file=audio_file,
5 response_format="verbose_json"
6 )
7
8print(f"Duration: {translation.duration} seconds")
9
10for segment in translation.segments:
11 print(f"[{segment.start:.2f}s - {segment.end:.2f}s]: {segment.text}")

Subtitle Formats

Generate English subtitles from foreign language audio:

1# SRT format
2with open("japanese_video.mp3", "rb") as audio_file:
3 translation = client.audio.translations.create(
4 model="whisper-1",
5 file=audio_file,
6 response_format="srt"
7 )
8
9with open("english_subtitles.srt", "w") as f:
10 f.write(translation.text)
11
12# VTT format
13with open("korean_video.mp3", "rb") as audio_file:
14 translation = client.audio.translations.create(
15 model="whisper-1",
16 file=audio_file,
17 response_format="vtt"
18 )
19
20with open("english_subtitles.vtt", "w") as f:
21 f.write(translation.text)

Advanced Options

Prompt for Context

Provide context to improve translation accuracy:

1with open("chinese_business.mp3", "rb") as audio_file:
2 translation = client.audio.translations.create(
3 model="whisper-1",
4 file=audio_file,
5 prompt="Business meeting discussing quarterly sales and marketing strategy"
6 )
7
8print(translation.text)

Context prompts help with:

  • Industry-specific terminology
  • Proper nouns and company names
  • Technical vocabulary
  • Idiomatic expressions

Temperature

Control consistency in translation:

1with open("russian_lecture.mp3", "rb") as audio_file:
2 translation = client.audio.translations.create(
3 model="whisper-1",
4 file=audio_file,
5 temperature=0.0 # Most consistent/deterministic
6 )

Practical Examples

Translating International News

1def translate_news_clip(audio_file, topic):
2 with open(audio_file, "rb") as f:
3 translation = client.audio.translations.create(
4 model="whisper-1",
5 file=f,
6 response_format="verbose_json",
7 prompt=f"News report about {topic}"
8 )
9
10 # Save translated transcript
11 output_file = audio_file.replace(".mp3", "_english.txt")
12 with open(output_file, "w") as f:
13 f.write(f"Topic: {topic}\n")
14 f.write(f"Duration: {translation.duration:.2f}s\n\n")
15 f.write(translation.text)
16
17 return translation.text
18
19translate_news_clip("french_news.mp3", "European Union policy")

International Video Content

1def create_english_subtitles(video_audio, original_language):
2 with open(video_audio, "rb") as f:
3 translation = client.audio.translations.create(
4 model="whisper-1",
5 file=f,
6 response_format="srt"
7 )
8
9 subtitle_file = video_audio.replace(".mp3", "_EN.srt")
10 with open(subtitle_file, "w", encoding="utf-8") as f:
11 f.write(translation.text)
12
13 print(f"✅ English subtitles created: {subtitle_file}")
14 return subtitle_file
15
16create_english_subtitles("spanish_tutorial.mp3", "Spanish")

Customer Support Translation

1def translate_support_call(call_recording, customer_language):
2 with open(call_recording, "rb") as f:
3 translation = client.audio.translations.create(
4 model="whisper-1",
5 file=f,
6 response_format="verbose_json",
7 prompt="Customer support call discussing technical issues"
8 )
9
10 # Create formatted transcript
11 transcript = f"Support Call Translation\n"
12 transcript += f"Original Language: {customer_language}\n"
13 transcript += f"Duration: {translation.duration:.2f}s\n"
14 transcript += "=" * 50 + "\n\n"
15
16 for segment in translation.segments:
17 timestamp = f"[{int(segment.start//60):02d}:{int(segment.start%60):02d}]"
18 transcript += f"{timestamp} {segment.text}\n"
19
20 return transcript
21
22result = translate_support_call("german_support.wav", "German")
23print(result)

Educational Content

1def translate_lecture(lecture_file, subject):
2 with open(lecture_file, "rb") as f:
3 translation = client.audio.translations.create(
4 model="whisper-1",
5 file=f,
6 response_format="text",
7 prompt=f"University lecture on {subject}",
8 temperature=0.1 # More consistent for educational content
9 )
10
11 # Save as markdown for easy reading
12 output_file = lecture_file.replace(".mp3", "_english.md")
13 with open(output_file, "w") as f:
14 f.write(f"# Lecture: {subject}\n\n")
15 f.write(translation.text)
16
17 return translation.text
18
19translate_lecture("physics_lecture_french.mp3", "Quantum Mechanics")

Podcast Translation

1def translate_podcast_episode(episode_file, show_name, episode_num):
2 with open(episode_file, "rb") as f:
3 translation = client.audio.translations.create(
4 model="whisper-1",
5 file=f,
6 response_format="verbose_json",
7 prompt=f"Podcast: {show_name}, Episode {episode_num}"
8 )
9
10 # Create formatted English transcript
11 transcript = f"# {show_name} - Episode {episode_num}\n"
12 transcript += f"## English Translation\n\n"
13 transcript += translation.text
14
15 # Save
16 output_file = f"{show_name}_ep{episode_num}_EN.md"
17 with open(output_file, "w") as f:
18 f.write(transcript)
19
20 print(f"✅ Translated podcast saved to {output_file}")
21 return transcript
22
23translate_podcast_episode(
24 "italian_podcast.mp3",
25 "Tech Talk Italy",
26 "042"
27)

Comparison with Transcription + Translation

You might wonder: should I transcribe first, then translate? Or use audio translation directly?

✅ Single API call ✅ Faster processing ✅ Better context preservation ✅ More accurate for idiomatic expressions ✅ Lower cost

1# One step - Direct translation
2with open("french.mp3", "rb") as f:
3 result = client.audio.translations.create(model="whisper-1", file=f)

Two-Step Process

❌ Two API calls required ❌ Slower overall ❌ May lose nuance in translation ✅ Provides original transcript ✅ Useful if you need both versions

1# Two steps - Transcribe then translate
2# Step 1: Transcribe in original language
3with open("french.mp3", "rb") as f:
4 transcription = client.audio.transcriptions.create(
5 model="whisper-1",
6 file=f,
7 language="fr"
8 )
9
10# Step 2: Translate text (requires text translation API)
11# (This would require additional API call)

Handling Large Files

For files larger than 25MB, split them into chunks:

1from pydub import AudioSegment
2import os
3
4def translate_large_file(file_path):
5 audio = AudioSegment.from_file(file_path)
6
7 # Split into 10-minute chunks
8 chunk_length_ms = 10 * 60 * 1000
9 chunks = [audio[i:i + chunk_length_ms]
10 for i in range(0, len(audio), chunk_length_ms)]
11
12 full_translation = ""
13
14 for i, chunk in enumerate(chunks):
15 chunk_file = f"temp_chunk_{i}.mp3"
16 chunk.export(chunk_file, format="mp3")
17
18 with open(chunk_file, "rb") as f:
19 translation = client.audio.translations.create(
20 model="whisper-1",
21 file=f
22 )
23
24 full_translation += translation.text + " "
25 os.remove(chunk_file)
26
27 return full_translation.strip()

Best Practices

Audio Quality

  • Use clear audio with minimal background noise
  • Recommended sample rate: 16kHz or higher
  • Minimum bitrate: 64 kbps for best results

Context Prompts

  • Include topic or subject matter
  • Mention technical terminology
  • Specify proper nouns when known

Error Handling

1def safe_translate(audio_path, context=""):
2 try:
3 with open(audio_path, "rb") as f:
4 translation = client.audio.translations.create(
5 model="whisper-1",
6 file=f,
7 prompt=context
8 )
9 return translation.text
10
11 except FileNotFoundError:
12 print(f"❌ File not found: {audio_path}")
13 return None
14
15 except Exception as e:
16 print(f"❌ Translation error: {e}")
17 return None
18
19result = safe_translate("german_audio.mp3", "Technical presentation")
20if result:
21 print(result)

Common Use Cases

  • International Business - Translate meetings and conferences
  • Content Localization - Create English versions of foreign content
  • Customer Support - Understand international customer calls
  • Research - Translate foreign language interviews
  • Education - Make international lectures accessible
  • Media - Subtitle foreign films and videos
  • Travel - Translate tour guides and presentations

Limitations

  • Output is always in English (use transcription for other languages)
  • Maximum file size: 25MB
  • Batch processing only (no real-time streaming)
  • Quality depends on audio clarity and accent
  • Idiomatic expressions may be literal

Tips for Better Results

  1. Clean Audio - Reduce background noise
  2. Context Matters - Use prompts for technical or specialized content
  3. Test First - Try a small sample before processing large files
  4. Quality Recording - Use good microphones for better accuracy
  5. Split Large Files - Break up files over 25MB
  6. Lower Temperature - Use 0.0-0.3 for consistent technical translations