# Speech to Text

Convert spoken audio into written text with NeuraAI's speech recognition models. Powered by Whisper, our transcription service supports multiple languages and audio formats with high accuracy.

## Overview

The speech-to-text API can:

* Transcribe audio files in various formats (MP3, WAV, M4A, etc.)
* Support 50+ languages
* Handle background noise and accents
* Provide timestamps for segments
* Process files up to 25 MB

## Basic Transcription

Convert an audio file to text:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.neura-ai.app/v1"
)

with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(transcription.text)
```

## Supported Audio Formats

* MP3
* MP4
* MPEG
* MPGA
* M4A
* WAV
* WEBM

## Language Support

Whisper automatically detects the spoken language, but you can specify it for better accuracy:

```python
with open("spanish_audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="es"  # ISO-639-1 code
    )

print(transcription.text)
```

Common language codes:

* `en` - English
* `es` - Spanish
* `fr` - French
* `de` - German
* `it` - Italian
* `pt` - Portuguese
* `nl` - Dutch
* `ja` - Japanese
* `ko` - Korean
* `zh` - Chinese

## Response Formats

### Plain Text

With `response_format="text"`, the API returns the transcript as a plain string rather than an object:

```python
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="text"
)

print(transcription)  # already a string; there is no .text attribute
```

### JSON (Default)

```python
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="json"
)

print(transcription.text)
```

### Verbose JSON

Get detailed information including segments and timestamps:

```python
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="verbose_json"
)

print(f"Language: {transcription.language}")
print(f"Duration: {transcription.duration}")
for segment in transcription.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s]: {segment.text}")
```

### SRT (Subtitles)

Generate subtitle files:

```python
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="srt"
)

with open("subtitles.srt", "w") as f:
    f.write(transcription)  # srt responses are plain strings
```

### VTT (WebVTT)

For web video players:

```python
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="vtt"
)

with open("subtitles.vtt", "w") as f:
    f.write(transcription)  # vtt responses are plain strings
```

## Advanced Options

### Prompt for Context

Provide context to improve accuracy:

```python
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    prompt="This is a technical discussion about machine learning and neural networks."
)
```

The prompt helps with:

* Technical terminology
* Proper nouns and names
* Domain-specific vocabulary
* Consistent spelling of terms

### Temperature

Control randomness in transcription (0-1):

```python
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    temperature=0.2  # lower = more consistent, higher = more varied
)
```

## Practical Examples

### Meeting Transcription

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.neura-ai.app/v1")

def transcribe_meeting(audio_path):
    with open(audio_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="verbose_json",
            prompt="Business meeting discussing Q4 targets and marketing strategy"
        )

    # Save full transcript
    with open("meeting_transcript.txt", "w") as f:
        f.write(transcription.text)

    # Save timestamped version
    with open("meeting_detailed.txt", "w") as f:
        for segment in transcription.segments:
            f.write(f"[{segment.start:.2f}s]: {segment.text}\n")

    return transcription

result = transcribe_meeting("quarterly_meeting.mp3")
print(f"Transcribed {result.duration:.2f} seconds of audio")
```

### Podcast Episode

```python
def transcribe_podcast(episode_file, episode_title):
    with open(episode_file, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="srt",
            prompt=f"Podcast episode: {episode_title}"
        )

    # Save as subtitle file (srt responses are plain strings)
    srt_filename = episode_file.replace(".mp3", ".srt")
    with open(srt_filename, "w", encoding="utf-8") as f:
        f.write(transcription)

    print(f"Subtitles saved to {srt_filename}")

transcribe_podcast("episode_42.mp3", "The Future of AI")
```

### Interview Transcription

```python
def transcribe_interview(audio_path, interviewer, interviewee):
    with open(audio_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="verbose_json",
            prompt=f"Interview between {interviewer} and {interviewee}"
        )

    # Format with timestamps
    formatted_text = f"Interview: {interviewee}\n"
    formatted_text += f"Duration: {transcription.duration:.0f} seconds\n\n"

    for segment in transcription.segments:
        timestamp = f"[{int(segment.start // 60):02d}:{int(segment.start % 60):02d}]"
        formatted_text += f"{timestamp} {segment.text}\n"

    return formatted_text

result = transcribe_interview(
    "interview.wav",
    "John Smith",
    "Jane Doe"
)
print(result)
```

### Video Subtitles Generator

```python
import os

def generate_subtitles(video_file):
    # Extract audio from video (requires ffmpeg)
    audio_file = video_file.replace(".mp4", ".mp3")
    os.system(f'ffmpeg -i "{video_file}" -q:a 0 -map a "{audio_file}" -y')

    # Transcribe
    with open(audio_file, "rb") as f:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="srt"
        )

    # Save subtitles (srt responses are plain strings)
    srt_file = video_file.replace(".mp4", ".srt")
    with open(srt_file, "w", encoding="utf-8") as f:
        f.write(transcription)

    # Clean up temporary audio
    os.remove(audio_file)

    print(f"✅ Subtitles generated: {srt_file}")

generate_subtitles("presentation.mp4")
```

## Handling Large Files

For files larger than 25 MB, split them into chunks:

```python
import os

from pydub import AudioSegment

def transcribe_large_file(file_path):
    # Load audio
    audio = AudioSegment.from_file(file_path)

    # Split into 10-minute chunks
    chunk_length_ms = 10 * 60 * 1000
    chunks = [audio[i:i + chunk_length_ms]
              for i in range(0, len(audio), chunk_length_ms)]

    full_transcript = ""

    for i, chunk in enumerate(chunks):
        # Export chunk
        chunk_file = f"temp_chunk_{i}.mp3"
        chunk.export(chunk_file, format="mp3")

        # Transcribe
        with open(chunk_file, "rb") as f:
            transcription = client.audio.transcriptions.create(
                model="whisper-1",
                file=f
            )
        full_transcript += transcription.text + " "

        # Clean up
        os.remove(chunk_file)

    return full_transcript.strip()
```

## Best Practices

### Audio Quality

* Use lossless formats (WAV) when possible
* Minimum bitrate: 64 kbps
* Recommended sample rate: 16 kHz or higher
* Reduce background noise before transcription

### Language Detection

* Specify the language code for better accuracy
* Use prompts for technical or specialized content
* For multilingual audio, transcribe segments separately

### Processing Tips

* Split long files into manageable chunks
* Use lower temperature (0.0-0.3) for technical content
* Use higher temperature (0.5-0.8) for creative content
* Provide context prompts for better terminology recognition

### Error Handling

```python
def safe_transcribe(audio_path):
    try:
        with open(audio_path, "rb") as audio_file:
            transcription = client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
                response_format="text"
            )
        return transcription  # plain string for text format
    except FileNotFoundError:
        print(f"❌ File not found: {audio_path}")
        return None
    except Exception as e:
        print(f"❌ Transcription error: {e}")
        return None
```

## Common Use Cases

* **Meeting Notes** - Automatic transcription of business meetings
* **Podcast Production** - Generate show notes and transcripts
* **Video Subtitles** - Create accessibility captions
* **Interview Analysis** - Transcribe research interviews
* **Voice Notes** - Convert voice memos to text
* **Customer Support** - Transcribe support calls for analysis
* **Legal Documentation** - Transcribe depositions and hearings
* **Medical Records** - Convert doctor dictations to text

## Limitations

* Maximum file size: 25 MB
* Audio length: up to several hours, subject to the file-size limit
* Background noise may affect accuracy
* Heavy accents may require language specification
* Real-time streaming not supported (batch processing only)

## Tips for Better Results

1. **Clean Audio** - Remove background noise when possible
2. **Good Microphone** - Use quality recording equipment
3. **Clear Speech** - Speak clearly and at a moderate pace
4. **Context Prompts** - Provide relevant context for technical terms
5. **Specify Language** - Set the language code for non-English audio
6. **Format Choice** - Use verbose JSON for editing, SRT for subtitles