# Speak-Y STT - Complete Documentation

> Speak-Y STT is an AI-powered speech-to-text transcription service that transforms audio and video files into accurate text. Built on advanced Whisper AI technology, it offers speaker diarization, support for 57+ transcription languages, 12 interface languages, smart summaries, and seamless integrations.

---

## Table of Contents

1. [Overview](#overview)
2. [Features](#features)
3. [Getting Started](#getting-started)
4. [API Reference](#api-reference)
5. [Pricing](#pricing)
6. [FAQ](#faq)
7. [Integration Examples](#integration-examples)
8. [Technical Specifications](#technical-specifications)

---

## Overview

Speak-Y STT is a modern web-based transcription platform designed for professionals who need reliable, accurate audio-to-text conversion. Whether you're a journalist transcribing interviews, a podcaster creating show notes, or a developer building voice-enabled applications, Speak-Y STT provides the tools you need.

### Why Choose Speak-Y STT?

- **Industry-Leading Accuracy**: Up to 99% accuracy using advanced Whisper AI models
- **Lightning Fast**: Process audio at up to 10x real-time speed
- **Secure**: Enterprise-grade encryption for all uploads and transcripts
- **Affordable**: Generous free tier and competitive paid plans
- **Developer Friendly**: Comprehensive API with webhooks and SDK support

---

## Features

### Core Transcription

**AI-Powered Engine**
Our transcription engine is built on OpenAI's Whisper architecture, fine-tuned for optimal performance across diverse audio conditions. The system handles:
- Background noise and music
- Multiple accents and dialects
- Technical terminology
- Low-quality audio (phone recordings, compressed files)

**Speaker Diarization**
Automatic speaker detection identifies who said what in multi-speaker recordings:
- Supports unlimited speakers
- Timestamps for each speaker turn
- Speaker labels customizable post-transcription
- Works with interviews, meetings, podcasts, and phone calls

**Language Support**
Transcribe in 57+ languages including:
- **European**: English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, Greek, Turkish
- **Asian**: Chinese, Japanese, Korean, Hindi, Thai, Vietnamese, Indonesian, Malay, Tamil, Kannada
- **Middle Eastern**: Arabic, Hebrew, Persian, Urdu
- **Other**: Swahili, Afrikaans, Welsh, Maori, and more

The service auto-detects the spoken language by default; specify it manually for faster processing.

**Interface Languages (12)**
The user interface is fully localized in: English, Spanish, Russian, French, German, Italian, Chinese, Japanese, Korean, Portuguese, Arabic, Hindi.

### Smart Features

**AI Summaries**
Automatically generate:
- Executive summaries (1-2 paragraphs)
- Key points and action items
- Chapter markers with timestamps
- Topic segmentation

**Custom Dictionary**
Improve accuracy for specialized content:
- Add technical terms, acronyms, and jargon
- Include proper nouns (names, companies, products)
- Import dictionary from CSV
- Share dictionaries across team members

**Smart Search** (Pro/Creator)
Search within your transcripts:
- Full-text search across all transcripts
- Timestamp-linked results
- Boolean operators (AND, OR, NOT)
- Phrase matching with quotes

### Export Options

Export transcripts in multiple formats:

| Format | Description | Use Case |
|--------|-------------|----------|
| TXT | Plain text | Simple documentation |
| SRT | SubRip subtitle | Video subtitles |
| VTT | WebVTT | Web video players |
| DOCX | Microsoft Word | Professional documents |
| JSON | Structured data | Programmatic access |
| PDF | Portable Document | Archiving |

### Integrations

**YouTube Integration**
- Paste any YouTube URL to transcribe
- Supports videos up to 4 hours (Creator plan)
- Automatic language detection
- Generate subtitles for your uploads

**Webhook Notifications**
- Receive notifications when transcription completes
- Integrate with your own systems
- Automate workflows programmatically via API

---

## Getting Started

### Quick Start (5 minutes)

1. **Sign Up**: Create a free account at https://stt.speak-y.com
2. **Upload**: Drag and drop your audio/video file or paste a YouTube URL
3. **Configure**: Select language (or use auto-detect) and enable speaker diarization if needed
4. **Process**: Click "Process" and wait for the transcription to complete
5. **Export**: Download your transcript in your preferred format

### Best Practices for Optimal Results

**Audio Quality**
- Use recordings at 16kHz sample rate or higher
- Minimize background noise when possible
- Ensure speakers are close to the microphone
- Use mono recordings for single speakers, stereo for multiple
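
As a quick pre-upload check on the sample-rate advice above, the sketch below reads a WAV header with Python's standard `wave` module. The `MIN_SAMPLE_RATE` constant and the `check_wav` helper are illustrative names, not part of Speak-Y STT, and the approach only covers uncompressed WAV files; other formats would need a different tool.

```python
import wave

MIN_SAMPLE_RATE = 16_000  # 16 kHz, the minimum recommended above


def check_wav(path):
    """Return (sample_rate, channels, ok) for an uncompressed WAV file.

    `ok` is True when the sample rate meets the 16 kHz recommendation.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
    return sample_rate, channels, sample_rate >= MIN_SAMPLE_RATE
```

A result such as `(8000, 1, False)` would suggest a phone-quality mono recording that may transcribe less accurately.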

**File Preparation**
- Trim silence from beginning and end
- Split very long recordings (4+ hours) for faster processing
- Convert unusual formats to MP3 or WAV before upload

---

## API Reference

### Authentication

All API requests must include your API key as a Bearer token in the `Authorization` header:

```
Authorization: Bearer YOUR_API_KEY
```

Get your API key from the Dashboard > Settings > API Keys.
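
One way to keep the key out of source code is to read it from the environment. This is a minimal sketch; the `SPEAKY_API_KEY` variable name and the `auth_headers` helper are assumptions made for this example, not names defined by the service.

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header described above.

    Falls back to the SPEAKY_API_KEY environment variable, a name chosen
    for this example only.
    """
    key = api_key or os.environ.get("SPEAKY_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    return {"Authorization": f"Bearer {key}"}
```

The returned dictionary can be passed as `headers=` to an HTTP client such as `requests`.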

### Base URL

```
https://api.speak-y.com/v1
```

### Endpoints

#### POST /transcriptions

Create a new transcription job.

**Request Body (multipart/form-data)**:

| Field | Example | Description |
|-------|---------|-------------|
| `file` | `<binary file data>` | Audio or video file to transcribe |
| `language` | `en` | Optional; auto-detected if omitted |
| `diarization` | `true` | Enable speaker detection |
| `summary` | `true` | Generate AI summary |
| `webhook_url` | `https://your-server.com/webhook` | Optional callback URL |

**Request Body (JSON for URL)**:
```json
{
  "url": "https://youtube.com/watch?v=VIDEO_ID",
  "language": "en",
  "diarization": true,
  "summary": true
}
```

**Response**:
```json
{
  "id": "tr_abc123",
  "status": "processing",
  "created_at": "2024-12-11T10:30:00Z",
  "estimated_completion": "2024-12-11T10:35:00Z"
}
```

#### GET /transcriptions/{id}

Get transcription status and results.

**Response (completed)**:
```json
{
  "id": "tr_abc123",
  "status": "completed",
  "duration_seconds": 1847,
  "language": "en",
  "text": "Full transcript text...",
  "segments": [
    {
      "start": 0.0,
      "end": 4.5,
      "text": "Hello and welcome to the show.",
      "speaker": "SPEAKER_01"
    }
  ],
  "summary": "This episode discusses...",
  "chapters": [
    {"start": 0, "title": "Introduction"},
    {"start": 120, "title": "Main Topic"}
  ],
  "download_urls": {
    "txt": "https://...",
    "srt": "https://...",
    "json": "https://..."
  }
}
```
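
The `segments` array can also be post-processed locally. The sketch below renders segments shaped like the example response as SRT text with speaker labels; it is an illustration only, and the ready-made files in `download_urls` remain the simpler route when the default formatting is sufficient. The helper names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segment dicts as SRT cues, prefixing the speaker label if present."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        speaker = seg.get("speaker")
        text = f"{speaker}: {seg['text']}" if speaker else seg["text"]
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line, as the SRT format requires
    return "\n".join(cues)
```

Calling `segments_to_srt(result["segments"])` on a completed response yields text that can be written directly to a `.srt` file.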

#### GET /transcriptions

List all transcriptions.

**Query Parameters**:
- `page` (int): Page number, default 1
- `per_page` (int): Items per page, default 20, max 100
- `status` (string): Filter by status (processing, completed, failed)
- `from` (ISO date): Start of the creation-date range
- `to` (ISO date): End of the creation-date range
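
The parameters above can be assembled into a request URL as follows. This is a sketch under the documented defaults and limits; the `list_url` helper and its keyword names (`date_from`, `date_to`) are illustrative, and the shape of the list response is not covered here because it is not documented above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.speak-y.com/v1"
VALID_STATUSES = {"processing", "completed", "failed"}


def list_url(page=1, per_page=20, status=None, date_from=None, date_to=None):
    """Build a GET /transcriptions URL from the documented query parameters."""
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    params = {"page": page, "per_page": per_page}
    if status:
        params["status"] = status
    if date_from:
        params["from"] = date_from  # `from` is a Python keyword, hence date_from
    if date_to:
        params["to"] = date_to
    return f"{BASE_URL}/transcriptions?{urlencode(params)}"
```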

#### DELETE /transcriptions/{id}

Delete a transcription and associated files.

#### GET /usage

Get current usage statistics.

**Response**:
```json
{
  "plan": "pro",
  "period_start": "2024-12-01",
  "period_end": "2024-12-31",
  "minutes_used": 450,
  "minutes_limit": 1500,
  "files_processed": 23,
  "storage_used_mb": 1250
}
```
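
A client might use this response to decide whether a job still fits in the current period. The helper below is a hypothetical sketch that relies only on the `minutes_used` and `minutes_limit` fields shown above.

```python
def remaining_minutes(usage):
    """Return (minutes_left, percent_used) from a GET /usage response dict."""
    used = usage["minutes_used"]
    limit = usage["minutes_limit"]
    left = max(limit - used, 0)
    # Guard against a zero limit to avoid division by zero
    percent = round(100 * used / limit, 1) if limit else 0.0
    return left, percent
```

For the example response, this gives 1,050 minutes left with 30% of the allowance used.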

### Webhooks

When a `webhook_url` is provided, we'll POST to it when the transcription completes or fails:

```json
{
  "event": "transcription.completed",
  "transcription_id": "tr_abc123",
  "status": "completed",
  "timestamp": "2024-12-11T10:35:00Z"
}
```

Events: `transcription.completed`, `transcription.failed`
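
On the receiving side, the request body can be validated before acting on it. The sketch below is framework-agnostic and assumes that `transcription.failed` payloads carry the same fields as the example above; `parse_webhook` is a hypothetical helper. No signature or authenticity check is shown because none is documented here.

```python
import json

KNOWN_EVENTS = {"transcription.completed", "transcription.failed"}


def parse_webhook(body):
    """Parse a webhook request body and return (transcription_id, succeeded).

    Raises ValueError for malformed JSON, missing fields, or unknown events.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError("body is not valid JSON") from exc
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    event = payload.get("event")
    if event not in KNOWN_EVENTS:
        raise ValueError(f"unexpected event: {event!r}")
    transcription_id = payload.get("transcription_id")
    if not transcription_id:
        raise ValueError("missing transcription_id")
    return transcription_id, event == "transcription.completed"
```

A handler would typically call `GET /transcriptions/{id}` with the returned ID when `succeeded` is true.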

### Rate Limits

| Plan | Requests/Hour | Concurrent Jobs |
|------|---------------|-----------------|
| Free | 10 | 1 |
| Pro | 100 | 5 |
| Creator | 500 | 20 |
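
A simple way to stay under the hourly limit is to space requests evenly. The arithmetic below is a sketch; it assumes nothing about how the service measures the hour (fixed or rolling window), and the dictionary and helper names are illustrative.

```python
REQUESTS_PER_HOUR = {"free": 10, "pro": 100, "creator": 500}


def min_interval_seconds(plan):
    """Seconds to wait between requests to stay within the hourly limit."""
    return 3600 / REQUESTS_PER_HOUR[plan]
```

With these figures, a Pro client polling a job status no more often than every 36 seconds would remain within 100 requests per hour.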

### Error Codes

| Code | Description |
|------|-------------|
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Invalid or missing API key |
| 403 | Forbidden - Feature not available on your plan |
| 404 | Not Found - Resource doesn't exist |
| 413 | File Too Large - Exceeds plan limit |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Server Error - Contact support |
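
Clients often treat some of these codes as transient and retry them with increasing delays. The sketch below treats 429 and 500 as retryable and everything else as final; that split is a judgment call for this example rather than documented service behavior, and the helper names are hypothetical.

```python
RETRYABLE_STATUS_CODES = {429, 500}


def should_retry(status_code):
    """Return True for errors that may succeed on a later attempt."""
    return status_code in RETRYABLE_STATUS_CODES


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff in seconds: base * 2**attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))
```

Codes such as 401, 403, and 413 indicate a problem with the request itself, so retrying them unchanged is unlikely to help.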

---

## Pricing

### Free Plan - $0/month
Perfect for trying out the service and light usage.

| Feature | Limit |
|---------|-------|
| Monthly minutes | 300 |
| Per-file limit | 60 minutes |
| Files per month | 20 |
| Storage | 7 days |
| Max file size | 500 MB |
| Noise reduction | Basic |
| Speaker detection | ✓ |
| Export formats | TXT, SRT |
| API access | ✗ |

### Pro Plan - $19/month
Ideal for content creators and professionals.

| Feature | Limit |
|---------|-------|
| Monthly minutes | 1,500 (25 hours) |
| Per-file limit | 4 hours |
| Files per month | 300 |
| Storage | 30 days |
| Max file size | 1.5 GB |
| Noise reduction | Advanced |
| Priority processing | ✓ |
| All export formats | ✓ |
| AI chapters | ✓ |
| Smart Search | 100 queries/month |
| Custom dictionary | 100 terms |
| API access | ✓ |

### Creator Plan - $39/month
For power users and businesses.

| Feature | Limit |
|---------|-------|
| Monthly minutes | 6,000 (100 hours) |
| Per-file limit | Unlimited |
| Files per month | Unlimited |
| Storage | 90 days |
| Max file size | 2 GB |
| Noise reduction | Premium |
| Processing speed | Fastest |
| Translation | 50+ languages |
| Smart Search | 500 queries/month |
| Custom dictionary | Unlimited |
| API access | Full |
| Priority support | ✓ |

### Enterprise
Custom solutions for large organizations. Contact sales@speak-y.com.

---

## FAQ

### General Questions

**Q: What file formats do you support?**
A: We support MP3, WAV, M4A, FLAC, OGG, AAC, WMA for audio, and MP4, MOV, AVI, MKV, WebM, FLV for video.

**Q: How long does transcription take?**
A: Processing time depends on your plan and server load. Pro and Creator plans process at approximately 10x real-time speed (a 60-minute file takes ~6 minutes). Free plan may take longer during peak hours.

**Q: Is my content secure?**
A: Yes. All uploads are encrypted in transit (TLS 1.3) and at rest (AES-256). Files are automatically deleted after the retention period. We never share your content with third parties.

**Q: Can I cancel my subscription anytime?**
A: Yes. You can cancel at any time from your account settings. You'll retain access until the end of your billing period.

### Accuracy Questions

**Q: How accurate is the transcription?**
A: Accuracy varies based on audio quality, accents, and background noise. For clear audio in supported languages, expect 95-99% accuracy. Use Custom Dictionary to improve accuracy for specialized terminology.

**Q: Why are some words transcribed incorrectly?**
A: Common causes include:
- Background noise or music
- Multiple people speaking simultaneously
- Heavy accents or dialects
- Technical jargon not in our training data

Solution: Use Custom Dictionary to add problematic words.

### Technical Questions

**Q: What's the maximum file size?**
A: 500 MB (Free), 1.5 GB (Pro), 2 GB (Creator). For larger files, contact us about Enterprise plans or split your file.
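
Checking the size locally avoids an upload that would be rejected with a 413 error. The sketch below assumes decimal units (1 MB = 1,000,000 bytes); the service may count in 1024-based units instead, so treat the exact thresholds as approximate. The names are illustrative.

```python
MB = 1000 * 1000  # assumption: decimal megabytes

MAX_FILE_BYTES = {
    "free": 500 * MB,
    "pro": 1500 * MB,      # 1.5 GB
    "creator": 2000 * MB,  # 2 GB
}


def fits_plan(size_bytes, plan):
    """Return True if a file of `size_bytes` is within the plan's size limit."""
    return size_bytes <= MAX_FILE_BYTES[plan]
```

In practice the size would come from `os.path.getsize(path)` before the upload starts.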

**Q: Do you support real-time transcription?**
A: Currently we support file-based and URL-based transcription. Real-time streaming is on our roadmap.

**Q: Can I use the API for commercial applications?**
A: Yes, Pro and Creator plans include commercial API usage rights.

---

## Integration Examples

### Python

```python
import requests

API_KEY = "your_api_key"
BASE_URL = "https://api.speak-y.com/v1"

# Upload a file; omit `language` to let the service auto-detect it
def transcribe_file(file_path, language=None):
    headers = {"Authorization": f"Bearer {API_KEY}"}
    data = {"diarization": "true"}
    if language:
        data["language"] = language

    with open(file_path, "rb") as f:
        response = requests.post(
            f"{BASE_URL}/transcriptions",
            headers=headers,
            files={"file": f},
            data=data
        )

    return response.json()

# Transcribe YouTube video
def transcribe_youtube(url):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(
        f"{BASE_URL}/transcriptions",
        headers=headers,
        json={"url": url, "diarization": True}
    )
    
    return response.json()

# Get results
def get_transcription(transcription_id):
    headers = {"Authorization": f"Bearer {API_KEY}"}
    
    response = requests.get(
        f"{BASE_URL}/transcriptions/{transcription_id}",
        headers=headers
    )
    
    return response.json()
```
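
The Python functions above return as soon as the job is created, so a caller that does not use webhooks needs to poll. The following sketch mirrors the polling loop in the JavaScript example; `wait_for_transcription` and its parameters are illustrative, and the fetch function is passed in so the loop can be exercised without network access.

```python
import time


def wait_for_transcription(fetch, transcription_id, max_wait=600, interval=5,
                           sleep=time.sleep):
    """Poll until the job completes, fails, or `max_wait` seconds elapse.

    `fetch` is any callable that takes an ID and returns the parsed
    GET /transcriptions/{id} response, e.g. get_transcription above.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        result = fetch(transcription_id)
        if result["status"] == "completed":
            return result
        if result["status"] == "failed":
            raise RuntimeError(f"Transcription {transcription_id} failed")
        sleep(interval)  # wait before the next status check
    raise TimeoutError(f"Timed out waiting for {transcription_id}")
```

A typical call would be `wait_for_transcription(get_transcription, job["id"])`, where `job` is the response from `transcribe_file` or `transcribe_youtube`.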

### JavaScript/Node.js

```javascript
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

const API_KEY = 'your_api_key';
const BASE_URL = 'https://api.speak-y.com/v1';

// Upload a file; omit `language` to let the service auto-detect it
async function transcribeFile(filePath, language) {
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  if (language) form.append('language', language);
  form.append('diarization', 'true');
  
  const response = await axios.post(
    `${BASE_URL}/transcriptions`,
    form,
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        ...form.getHeaders()
      }
    }
  );
  
  return response.data;
}

// Transcribe YouTube video
async function transcribeYouTube(url) {
  const response = await axios.post(
    `${BASE_URL}/transcriptions`,
    { url, diarization: true },
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json'
      }
    }
  );
  
  return response.data;
}

// Poll for results
async function waitForTranscription(id, maxWait = 600000) {
  const startTime = Date.now();
  
  while (Date.now() - startTime < maxWait) {
    const response = await axios.get(
      `${BASE_URL}/transcriptions/${id}`,
      { headers: { 'Authorization': `Bearer ${API_KEY}` } }
    );
    
    if (response.data.status === 'completed') {
      return response.data;
    }
    
    if (response.data.status === 'failed') {
      throw new Error('Transcription failed');
    }
    
    await new Promise(r => setTimeout(r, 5000));
  }
  
  throw new Error('Timeout waiting for transcription');
}
```

### cURL

```bash
# Upload a file
curl -X POST https://api.speak-y.com/v1/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/audio.mp3" \
  -F "language=en" \
  -F "diarization=true"

# Transcribe YouTube URL
curl -X POST https://api.speak-y.com/v1/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "diarization": true}'

# Get transcription status
curl https://api.speak-y.com/v1/transcriptions/tr_abc123 \
  -H "Authorization: Bearer YOUR_API_KEY"

# Download as SRT
curl "https://api.speak-y.com/v1/transcriptions/tr_abc123/download?format=srt" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -o subtitles.srt
```

---

## Technical Specifications

### Supported Formats

**Audio**
| Format | Extension | Notes |
|--------|-----------|-------|
| MP3 | .mp3 | Most common, good compression |
| WAV | .wav | Lossless, larger files |
| M4A | .m4a | Apple format, good quality |
| FLAC | .flac | Lossless compression |
| OGG | .ogg | Open source format |
| AAC | .aac | High quality, small size |
| WMA | .wma | Windows Media Audio |

**Video**
| Format | Extension | Notes |
|--------|-----------|-------|
| MP4 | .mp4 | Universal, recommended |
| MOV | .mov | Apple QuickTime |
| AVI | .avi | Legacy Windows format |
| MKV | .mkv | Open container format |
| WebM | .webm | Web-optimized |
| FLV | .flv | Flash video (legacy) |

### Performance Benchmarks

| Audio Duration | Free Plan | Pro Plan | Creator Plan |
|----------------|-----------|----------|--------------|
| 10 minutes | ~5 min | ~1 min | ~45 sec |
| 1 hour | ~30 min | ~6 min | ~4 min |
| 4 hours | ~2 hours | ~25 min | ~15 min |

*Processing times are estimates and may vary based on server load.
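
The Pro column is roughly consistent with the 10x real-time figure quoted in the FAQ. As a back-of-the-envelope sketch, an estimate can be derived from the audio duration; the default `speed_factor` is an assumption taken from that figure, and real times will vary with plan and load.

```python
def estimate_processing_seconds(audio_seconds, speed_factor=10.0):
    """Estimate processing time as audio duration divided by the speed factor."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor
```

For a one-hour file (3,600 seconds) at 10x, this gives 360 seconds, matching the ~6 minutes in the table.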

### Language Accuracy

| Language | Expected Accuracy | Notes |
|----------|-------------------|-------|
| English | 97-99% | Best support |
| Spanish | 96-98% | Excellent |
| French | 96-98% | Excellent |
| German | 95-98% | Excellent |
| Russian | 94-97% | Very good |
| Chinese | 93-96% | Good |
| Japanese | 92-95% | Good |
| Arabic | 91-95% | Good |

*Accuracy depends on audio quality, accents, and specialized terminology.

---

## Contact & Support

- **Website**: https://stt.speak-y.com
- **Email**: support@speak-y.com
- **Documentation**: https://stt.speak-y.com/api-reference
- **Health Check**: https://stt.speak-y.com/health
- **Twitter**: @SpeakYSTT

---

© 2024 Speak-Y STT. All rights reserved.