# Audio Translation

Translate spoken audio from any supported language directly into English text. NeuraAI's translation service automatically detects the source language and returns an English translation.

## Overview

The audio translation API:

* Translates from 50+ languages into English
* Automatically detects the source language
* Supports multiple audio formats
* Maintains context and meaning
* Handles various accents and dialects

## Basic Translation

Translate audio to English:

```python
from openai import OpenAI

# The OpenAI SDK reads the API key from the OPENAI_API_KEY environment variable by default
client = OpenAI(
    base_url="https://api.neura-ai.app/v1"
)

with open("spanish_audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file
    )

print(translation.text)
```

## How It Differs from Transcription

| Feature            | Transcription                     | Translation          |
| ------------------ | --------------------------------- | -------------------- |
| Output language    | Same as input                     | Always English       |
| Purpose            | Convert speech to text            | Translate to English |
| Language detection | Automatic, or set with `language` | Always automatic     |

**Example:**

* Input: Spanish audio "Hola, ¿cómo estás?"
* Transcription: "Hola, ¿cómo estás?"
* Translation: "Hello, how are you?"
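When one code path has to serve both cases, the choice in the table above reduces to picking an endpoint. The following is a minimal sketch, assuming a `client` configured as in Basic Translation; `speech_to_text` is a hypothetical helper name, not part of the SDK.

```python
from typing import Any


def speech_to_text(client: Any, path: str, english_output: bool = False) -> str:
    """Hypothetical helper: return same-language text or English text for one audio file.

    english_output=True  -> audio.translations   (output is always English)
    english_output=False -> audio.transcriptions (output stays in the spoken language)
    """
    endpoint = client.audio.translations if english_output else client.audio.transcriptions
    with open(path, "rb") as audio_file:
        # The default response format is JSON, so the SDK returns an object with .text
        result = endpoint.create(model="whisper-1", file=audio_file)
    return result.text
```

For the Spanish example, `speech_to_text(client, "spanish_audio.mp3")` would return the Spanish transcript, while passing `english_output=True` would return the English translation.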
## Supported Input Languages

The translation API accepts audio in any language supported by Whisper, including:

* Spanish (es)
* French (fr)
* German (de)
* Italian (it)
* Portuguese (pt)
* Dutch (nl)
* Russian (ru)
* Japanese (ja)
* Korean (ko)
* Chinese (zh)
* Arabic (ar)
* Hindi (hi)
* And 40+ more languages

## Supported Audio Formats

* MP3
* MP4
* MPEG
* MPGA
* M4A
* WAV
* WEBM

Maximum file size: 25MB

## Response Formats

### Plain Text

```python
with open("french_audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )

# For "text", "srt", and "vtt" the Python SDK returns a plain string, not an object
print(translation)
```

### JSON (Default)

```python
with open("german_audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        response_format="json"
    )

print(translation.text)
```

### Verbose JSON

Get detailed information, including timed segments:

```python
with open("italian_audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json"
    )

print(f"Duration: {translation.duration} seconds")
for segment in translation.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s]: {segment.text}")
```

### Subtitle Formats

Generate English subtitles from foreign-language audio:

```python
# SRT format
with open("japanese_video.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt"
    )

# Subtitle formats are returned as a plain string
with open("english_subtitles.srt", "w", encoding="utf-8") as f:
    f.write(translation)

# VTT format
with open("korean_video.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        response_format="vtt"
    )

with open("english_subtitles.vtt", "w", encoding="utf-8") as f:
    f.write(translation)
```

## Advanced Options

### Prompt for Context

Provide context to improve translation accuracy:

```python
with open("chinese_business.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        prompt="Business meeting discussing quarterly sales and marketing strategy"
    )

print(translation.text)
```

Context prompts help with:

* Industry-specific terminology
* Proper nouns and company names
* Technical vocabulary
* Idiomatic expressions

### Temperature

Control how consistent the translation is. Temperature ranges from 0 to 1: lower values give more focused, deterministic output, and higher values add randomness.

```python
with open("russian_lecture.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        temperature=0.0  # Most consistent/deterministic
    )

print(translation.text)
```

## Practical Examples

### Translating International News

```python
import os

def translate_news_clip(audio_file, topic):
    with open(audio_file, "rb") as f:
        translation = client.audio.translations.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",
            prompt=f"News report about {topic}"
        )

    # Save the translated transcript next to the source file
    # (splitext handles any extension and never overwrites the audio)
    output_file = os.path.splitext(audio_file)[0] + "_english.txt"
    with open(output_file, "w", encoding="utf-8") as out:
        out.write(f"Topic: {topic}\n")
        out.write(f"Duration: {translation.duration:.2f}s\n\n")
        out.write(translation.text)

    return translation.text

translate_news_clip("french_news.mp3", "European Union policy")
```

### International Video Content

```python
import os

def create_english_subtitles(video_audio, original_language):
    with open(video_audio, "rb") as f:
        translation = client.audio.translations.create(
            model="whisper-1",
            file=f,
            response_format="srt"
        )

    # response_format="srt" returns the subtitle file contents as a string
    subtitle_file = os.path.splitext(video_audio)[0] + "_EN.srt"
    with open(subtitle_file, "w", encoding="utf-8") as out:
        out.write(translation)

    print(f"✅ English subtitles created from {original_language} audio: {subtitle_file}")
    return subtitle_file

create_english_subtitles("spanish_tutorial.mp3", "Spanish")
```

### Customer Support Translation

```python
def translate_support_call(call_recording, customer_language):
    with open(call_recording, "rb") as f:
        translation = client.audio.translations.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",
            prompt="Customer support call discussing technical issues"
        )

    # Create a formatted transcript with [MM:SS] timestamps per segment
    transcript = "Support Call Translation\n"
    transcript += f"Original Language: {customer_language}\n"
    transcript += f"Duration: {translation.duration:.2f}s\n"
    transcript += "=" * 50 + "\n\n"

    for segment in translation.segments:
        minutes, seconds = divmod(int(segment.start), 60)
        transcript += f"[{minutes:02d}:{seconds:02d}] {segment.text}\n"

    return transcript

result = translate_support_call("german_support.wav", "German")
print(result)
```

### Educational Content

```python
import os

def translate_lecture(lecture_file, subject):
    with open(lecture_file, "rb") as f:
        translation = client.audio.translations.create(
            model="whisper-1",
            file=f,
            response_format="text",
            prompt=f"University lecture on {subject}",
            temperature=0.1  # More consistent for educational content
        )

    # response_format="text" returns a plain string
    # Save as Markdown for easy reading
    output_file = os.path.splitext(lecture_file)[0] + "_english.md"
    with open(output_file, "w", encoding="utf-8") as out:
        out.write(f"# Lecture: {subject}\n\n")
        out.write(translation)

    return translation

translate_lecture("physics_lecture_french.mp3", "Quantum Mechanics")
```

### Podcast Translation

```python
def translate_podcast_episode(episode_file, show_name, episode_num):
    with open(episode_file, "rb") as f:
        translation = client.audio.translations.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",
            prompt=f"Podcast: {show_name}, Episode {episode_num}"
        )

    # Create a formatted English transcript
    transcript = f"# {show_name} - Episode {episode_num}\n"
    transcript += "## English Translation\n\n"
    transcript += translation.text

    # Save it
    output_file = f"{show_name}_ep{episode_num}_EN.md"
    with open(output_file, "w", encoding="utf-8") as out:
        out.write(transcript)

    print(f"✅ Translated podcast saved to {output_file}")
    return transcript

translate_podcast_episode(
    "italian_podcast.mp3",
    "Tech Talk Italy",
    "042"
)
```

## Comparison with Transcription + Translation

Should you transcribe first and then translate the text, or translate the audio directly?
### Direct Audio Translation (Recommended)

* ✅ Single API call
* ✅ Faster processing
* ✅ Better context preservation
* ✅ More accurate for idiomatic expressions
* ✅ Lower cost

```python
# One step: direct translation
with open("french.mp3", "rb") as f:
    result = client.audio.translations.create(model="whisper-1", file=f)

print(result.text)
```

### Two-Step Process

* ❌ Two API calls required
* ❌ Slower overall
* ❌ May lose nuance in translation
* ✅ Provides the original transcript
* ✅ Useful if you need both versions

```python
# Two steps: transcribe, then translate the text
# Step 1: Transcribe in the original language
with open("french.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        language="fr"
    )

# Step 2: Translate transcription.text with a separate text translation API
# (not shown; this requires an additional API call)
```

## Handling Large Files

For files larger than 25MB, split them into chunks:

```python
import os
from pydub import AudioSegment  # pydub needs ffmpeg to decode/export MP3

def translate_large_file(file_path):
    audio = AudioSegment.from_file(file_path)

    # Split into 10-minute chunks (pydub measures length in milliseconds)
    chunk_length_ms = 10 * 60 * 1000
    chunks = [audio[i:i + chunk_length_ms]
              for i in range(0, len(audio), chunk_length_ms)]

    translated_parts = []
    for i, chunk in enumerate(chunks):
        chunk_file = f"temp_chunk_{i}.mp3"
        chunk.export(chunk_file, format="mp3")
        try:
            with open(chunk_file, "rb") as f:
                translation = client.audio.translations.create(
                    model="whisper-1",
                    file=f
                )
            translated_parts.append(translation.text)
        finally:
            os.remove(chunk_file)  # Clean up even if the request fails

    return " ".join(translated_parts).strip()
```

## Best Practices

### Audio Quality

* Use clear audio with minimal background noise
* Recommended sample rate: 16kHz or higher
* Bitrate: at least 64 kbps for best results

### Context Prompts

* Include the topic or subject matter
* Mention technical terminology
* Specify proper nouns when known

### Error Handling

```python
import openai

def safe_translate(audio_path, context=""):
    try:
        with open(audio_path, "rb") as f:
            translation = client.audio.translations.create(
                model="whisper-1",
                file=f,
                prompt=context
            )
        return translation.text
    except FileNotFoundError:
        print(f"❌ File not found: {audio_path}")
        return None
    except openai.APIError as e:
        # Base class for SDK errors: connection failures, rate limits, bad requests, etc.
        print(f"❌ Translation error: {e}")
        return None

result = safe_translate("german_audio.mp3", "Technical presentation")
if result:
    print(result)
```

## Common Use Cases

* **International Business** - Translate meetings and conferences
* **Content Localization** - Create English versions of foreign content
* **Customer Support** - Understand international customer calls
* **Research** - Translate foreign-language interviews
* **Education** - Make international lectures accessible
* **Media** - Subtitle foreign films and videos
* **Travel** - Translate tour guides and presentations

## Limitations

* Output is always in English (use transcription for other languages)
* Maximum file size: 25MB
* Batch processing only (no real-time streaming)
* Quality depends on audio clarity and accent
* Idiomatic expressions may be translated literally

## Tips for Better Results

1. **Clean Audio** - Reduce background noise
2. **Context Matters** - Use prompts for technical or specialized content
3. **Test First** - Try a small sample before processing large files
4. **Quality Recording** - Use good microphones for better accuracy
5. **Split Large Files** - Break up files over 25MB
6. **Lower Temperature** - Use 0.0-0.3 for consistent technical translations
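Tips 3 and 5 can be partly automated with a pre-flight check before uploading. The following is a minimal, standard-library-only sketch, not part of the NeuraAI SDK: `check_audio_file` and `max_chunk_minutes` are hypothetical helper names, the extension list and 25MB limit are taken from this page, and 25MB is read conservatively as 25,000,000 bytes. The chunk estimate assumes constant-bitrate audio and ignores container overhead.

```python
import os

# Formats and size limit as listed on this page (assumption: 25MB = 25,000,000 bytes)
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_UPLOAD_BYTES = 25_000_000


def check_audio_file(path: str) -> list[str]:
    """Hypothetical helper: return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 25MB; split it into chunks first")
    return problems


def max_chunk_minutes(bitrate_kbps: int, limit_bytes: int = MAX_UPLOAD_BYTES) -> int:
    """Hypothetical helper: longest whole-minute chunk that fits under the limit at a constant bitrate."""
    seconds = limit_bytes * 8 / (bitrate_kbps * 1000)
    return int(seconds // 60)
```

At 128 kbps, for example, `max_chunk_minutes(128)` works out to 26 minutes, so 10-minute chunks at that bitrate leave plenty of headroom. Running `check_audio_file` on a short sample first is a cheap way to catch format and size problems before a long upload.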