# Speak-Y STT - Complete Documentation

> Speak-Y STT is an AI-powered speech-to-text transcription service that transforms audio and video files into accurate text. Built on advanced Whisper AI technology, it offers speaker diarization, support for 57+ transcription languages, 12 interface languages, smart summaries, and seamless integrations.

---

## Table of Contents

1. [Overview](#overview)
2. [Features](#features)
3. [Getting Started](#getting-started)
4. [API Reference](#api-reference)
5. [Pricing](#pricing)
6. [FAQ](#faq)
7. [Integration Examples](#integration-examples)
8. [Technical Specifications](#technical-specifications)

---

## Overview

Speak-Y STT is a modern web-based transcription platform designed for professionals who need reliable, accurate audio-to-text conversion. Whether you're a journalist transcribing interviews, a podcaster creating show notes, or a developer building voice-enabled applications, Speak-Y STT provides the tools you need.

### Why Choose Speak-Y STT?

- **Industry-Leading Accuracy**: Up to 99% accuracy using advanced Whisper AI models
- **Lightning Fast**: Process audio up to 10x real-time speed
- **Secure**: Enterprise-grade encryption for all uploads and transcripts
- **Affordable**: Generous free tier and competitive paid plans
- **Developer Friendly**: Comprehensive API with webhooks and SDK support

---

## Features

### Core Transcription

**AI-Powered Engine**

Our transcription engine is built on OpenAI's Whisper architecture, fine-tuned for optimal performance across diverse audio conditions.
The system handles:

- Background noise and music
- Multiple accents and dialects
- Technical terminology
- Low-quality audio (phone recordings, compressed files)

**Speaker Diarization**

Automatic speaker detection identifies who said what in multi-speaker recordings:

- Supports unlimited speakers
- Timestamps for each speaker turn
- Speaker labels customizable post-transcription
- Works with interviews, meetings, podcasts, and phone calls

**Language Support**

Transcribe in 57+ languages including:

- **European**: English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, Greek, Turkish
- **Asian**: Chinese, Japanese, Korean, Hindi, Thai, Vietnamese, Indonesian, Malay, Tamil, Kannada
- **Middle Eastern**: Arabic, Hebrew, Persian, Urdu
- **Other**: Swahili, Afrikaans, Welsh, Maori, and more

Auto-detection identifies the spoken language, or you can specify the language manually for faster processing.

**Interface Languages (12)**

The user interface is fully localized in: English, Spanish, Russian, French, German, Italian, Chinese, Japanese, Korean, Portuguese, Arabic, Hindi.
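
As a rough illustration of how the per-turn timestamps from speaker diarization might be consumed, the sketch below merges consecutive segments from the same speaker into a single turn. It assumes segments shaped like the `segments` array in the API Reference response (`start`, `end`, `text`, `speaker`); the helper names `merge_speaker_turns` and `format_turn` and the sample data are hypothetical and not part of the service.

```python
from typing import Dict, List


def merge_speaker_turns(segments: List[Dict]) -> List[Dict]:
    """Merge consecutive segments spoken by the same speaker into one turn."""
    turns: List[Dict] = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            # Speaker changed: start a new turn (copy fields so the input is untouched).
            turns.append({
                "speaker": seg["speaker"],
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
            })
    return turns


def format_turn(turn: Dict) -> str:
    """Render a turn as '[MM:SS] SPEAKER: text' using the turn's start time."""
    minutes, seconds = divmod(int(turn["start"]), 60)
    return f"[{minutes:02d}:{seconds:02d}] {turn['speaker']}: {turn['text']}"


# Illustrative data only, modeled on the documented segment fields.
example_segments = [
    {"start": 0.0, "end": 4.5, "text": "Hello and welcome to the show.", "speaker": "SPEAKER_01"},
    {"start": 4.5, "end": 7.0, "text": "Today we have a guest.", "speaker": "SPEAKER_01"},
    {"start": 7.0, "end": 9.2, "text": "Thanks for having me.", "speaker": "SPEAKER_02"},
]

for turn in merge_speaker_turns(example_segments):
    print(format_turn(turn))
```

Merging by adjacency rather than by speaker label alone keeps the conversation order intact, so a speaker who returns later still gets a separate turn.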
### Smart Features

**AI Summaries**

Automatically generate:

- Executive summaries (1-2 paragraphs)
- Key points and action items
- Chapter markers with timestamps
- Topic segmentation

**Custom Dictionary**

Improve accuracy for specialized content:

- Add technical terms, acronyms, and jargon
- Include proper nouns (names, companies, products)
- Import dictionary from CSV
- Share dictionaries across team members

**Smart Search** (Pro/Creator)

Search within your transcripts:

- Full-text search across all transcripts
- Timestamp-linked results
- Boolean operators (AND, OR, NOT)
- Phrase matching with quotes

### Export Options

Export transcripts in multiple formats:

| Format | Description | Use Case |
|--------|-------------|----------|
| TXT | Plain text | Simple documentation |
| SRT | SubRip subtitle | Video subtitles |
| VTT | WebVTT | Web video players |
| DOCX | Microsoft Word | Professional documents |
| JSON | Structured data | Programmatic access |
| PDF | Portable Document | Archiving |

### Integrations

**YouTube Integration**

- Paste any YouTube URL to transcribe
- Supports videos up to 4 hours (Creator plan)
- Automatic language detection
- Generate subtitles for your uploads

**Webhook Notifications**

- Receive notifications when transcription completes
- Integrate with your own systems
- Automate workflows programmatically via API

---

## Getting Started

### Quick Start (5 minutes)

1. **Sign Up**: Create a free account at https://stt.speak-y.com
2. **Upload**: Drag and drop your audio/video file or paste a YouTube URL
3. **Configure**: Select language (or use auto-detect) and enable speaker diarization if needed
4. **Process**: Click "Process" and wait for the transcription to complete
5. **Export**: Download your transcript in your preferred format

### Best Practices for Optimal Results

**Audio Quality**

- Use recordings at 16 kHz sample rate or higher
- Minimize background noise when possible
- Ensure speakers are close to the microphone
- Use mono recordings for single speakers, stereo for multiple

**File Preparation**

- Trim silence from beginning and end
- Split very long recordings (4+ hours) for faster processing
- Convert unusual formats to MP3 or WAV before upload

---

## API Reference

### Authentication

All API requests require an API key in the `Authorization` header:

```
Authorization: Bearer YOUR_API_KEY
```

Get your API key from the Dashboard > Settings > API Keys.

### Base URL

```
https://api.speak-y.com/v1
```

### Endpoints

#### POST /transcriptions

Create a new transcription job.

**Request Body (multipart/form-data)**:

```json
{
  "file": "",
  "language": "en",        // optional, auto-detect if omitted
  "diarization": true,     // enable speaker detection
  "summary": true,         // generate AI summary
  "webhook_url": "https://your-server.com/webhook"  // optional callback
}
```

**Request Body (JSON for URL)**:

```json
{
  "url": "https://youtube.com/watch?v=VIDEO_ID",
  "language": "en",
  "diarization": true,
  "summary": true
}
```

**Response**:

```json
{
  "id": "tr_abc123",
  "status": "processing",
  "created_at": "2024-12-11T10:30:00Z",
  "estimated_completion": "2024-12-11T10:35:00Z"
}
```

#### GET /transcriptions/{id}

Get transcription status and results.

**Response (completed)**:

```json
{
  "id": "tr_abc123",
  "status": "completed",
  "duration_seconds": 1847,
  "language": "en",
  "text": "Full transcript text...",
  "segments": [
    {
      "start": 0.0,
      "end": 4.5,
      "text": "Hello and welcome to the show.",
      "speaker": "SPEAKER_01"
    }
  ],
  "summary": "This episode discusses...",
  "chapters": [
    {"start": 0, "title": "Introduction"},
    {"start": 120, "title": "Main Topic"}
  ],
  "download_urls": {
    "txt": "https://...",
    "srt": "https://...",
    "json": "https://..."
  }
}
```

#### GET /transcriptions

List all transcriptions.

**Query Parameters**:

- `page` (int): Page number, default 1
- `per_page` (int): Items per page, default 20, max 100
- `status` (string): Filter by status (processing, completed, failed)
- `from` (ISO date): Start of the creation-date range
- `to` (ISO date): End of the creation-date range

#### DELETE /transcriptions/{id}

Delete a transcription and associated files.

#### GET /usage

Get current usage statistics.

**Response**:

```json
{
  "plan": "pro",
  "period_start": "2024-12-01",
  "period_end": "2024-12-31",
  "minutes_used": 450,
  "minutes_limit": 1500,
  "files_processed": 23,
  "storage_used_mb": 1250
}
```

### Webhooks

When a `webhook_url` is provided, we'll send a POST request to it when the transcription completes or fails:

```json
{
  "event": "transcription.completed",
  "transcription_id": "tr_abc123",
  "status": "completed",
  "timestamp": "2024-12-11T10:35:00Z"
}
```

Events: `transcription.completed`, `transcription.failed`

### Rate Limits

| Plan | Requests/Hour | Concurrent Jobs |
|------|---------------|-----------------|
| Free | 10 | 1 |
| Pro | 100 | 5 |
| Creator | 500 | 20 |

### Error Codes

| Code | Description |
|------|-------------|
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Invalid or missing API key |
| 403 | Forbidden - Feature not available on your plan |
| 404 | Not Found - Resource doesn't exist |
| 413 | File Too Large - Exceeds plan limit |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Server Error - Contact support |

---

## Pricing

### Free Plan - $0/month

Perfect for trying out the service and light usage.

| Feature | Limit |
|---------|-------|
| Monthly minutes | 300 |
| Per-file limit | 60 minutes |
| Files per month | 20 |
| Storage | 7 days |
| Max file size | 500 MB |
| Noise reduction | Basic |
| Speaker detection | ✓ |
| Export formats | TXT, SRT |
| API access | ✗ |

### Pro Plan - $19/month

Ideal for content creators and professionals.

| Feature | Limit |
|---------|-------|
| Monthly minutes | 1,500 (25 hours) |
| Per-file limit | 4 hours |
| Files per month | 300 |
| Storage | 30 days |
| Max file size | 1.5 GB |
| Noise reduction | Advanced |
| Priority processing | ✓ |
| All export formats | ✓ |
| AI chapters | ✓ |
| Smart Search | 100 queries/month |
| Custom dictionary | 100 terms |
| API access | ✓ |

### Creator Plan - $39/month

For power users and businesses.

| Feature | Limit |
|---------|-------|
| Monthly minutes | 6,000 (100 hours) |
| Per-file limit | Unlimited |
| Files per month | Unlimited |
| Storage | 90 days |
| Max file size | 2 GB |
| Noise reduction | Premium |
| Processing speed | Fastest |
| Translation | 50+ languages |
| Smart Search | 500 queries/month |
| Custom dictionary | Unlimited |
| API access | Full |
| Priority support | ✓ |

### Enterprise

Custom solutions for large organizations. Contact sales@speak-y.com.

---

## FAQ

### General Questions

**Q: What audio formats do you support?**

A: We support MP3, WAV, M4A, FLAC, OGG, AAC, WMA for audio, and MP4, MOV, AVI, MKV, WebM, FLV for video.

**Q: How long does transcription take?**

A: Processing time depends on your plan and server load. Pro and Creator plans process at approximately 10x real-time speed (a 60-minute file takes ~6 minutes). Free plan may take longer during peak hours.

**Q: Is my content secure?**

A: Yes. All uploads are encrypted in transit (TLS 1.3) and at rest (AES-256). Files are automatically deleted after the retention period. We never share your content with third parties.

**Q: Can I cancel my subscription anytime?**

A: Yes. You can cancel at any time from your account settings. You'll retain access until the end of your billing period.

### Accuracy Questions

**Q: How accurate is the transcription?**

A: Accuracy varies based on audio quality, accents, and background noise. For clear audio in supported languages, expect 95-99% accuracy.
Use Custom Dictionary to improve accuracy for specialized terminology.

**Q: Why are some words transcribed incorrectly?**

A: Common causes include:

- Background noise or music
- Multiple people speaking simultaneously
- Heavy accents or dialects
- Technical jargon not in our training data

Solution: Use Custom Dictionary to add problematic words.

### Technical Questions

**Q: What's the maximum file size?**

A: 500 MB (Free), 1.5 GB (Pro), 2 GB (Creator). For larger files, contact us about Enterprise plans or split your file.

**Q: Do you support real-time transcription?**

A: Currently we support file-based and URL-based transcription. Real-time streaming is on our roadmap.

**Q: Can I use the API for commercial applications?**

A: Yes, Pro and Creator plans include commercial API usage rights.

---

## Integration Examples

### Python

```python
import requests

API_KEY = "your_api_key"
BASE_URL = "https://api.speak-y.com/v1"


# Upload a file
def transcribe_file(file_path, language=None):
    headers = {"Authorization": f"Bearer {API_KEY}"}
    data = {"diarization": "true"}
    if language:
        # Omit the field to let the service auto-detect the language
        data["language"] = language
    with open(file_path, "rb") as f:
        response = requests.post(
            f"{BASE_URL}/transcriptions",
            headers=headers,
            files={"file": f},
            data=data,
        )
    response.raise_for_status()  # surface 4xx/5xx errors instead of parsing them
    return response.json()


# Transcribe YouTube video
def transcribe_youtube(url):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    response = requests.post(
        f"{BASE_URL}/transcriptions",
        headers=headers,
        json={"url": url, "diarization": True},
    )
    response.raise_for_status()
    return response.json()


# Get results
def get_transcription(transcription_id):
    headers = {"Authorization": f"Bearer {API_KEY}"}
    response = requests.get(
        f"{BASE_URL}/transcriptions/{transcription_id}",
        headers=headers,
    )
    response.raise_for_status()
    return response.json()
```

### JavaScript/Node.js

```javascript
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

const API_KEY = 'your_api_key';
const BASE_URL = 'https://api.speak-y.com/v1';

// Upload a file
async function transcribeFile(filePath, language) {
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  // Omit `language` to let the service auto-detect it
  if (language) form.append('language', language);
  form.append('diarization', 'true');

  const response = await axios.post(
    `${BASE_URL}/transcriptions`,
    form,
    { headers: { 'Authorization': `Bearer ${API_KEY}`, ...form.getHeaders() } }
  );
  return response.data;
}

// Transcribe YouTube video
async function transcribeYouTube(url) {
  const response = await axios.post(
    `${BASE_URL}/transcriptions`,
    { url, diarization: true },
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json'
      }
    }
  );
  return response.data;
}

// Poll for results every 5 seconds until done or maxWait (ms) elapses
async function waitForTranscription(id, maxWait = 600000) {
  const startTime = Date.now();
  while (Date.now() - startTime < maxWait) {
    const response = await axios.get(
      `${BASE_URL}/transcriptions/${id}`,
      { headers: { 'Authorization': `Bearer ${API_KEY}` } }
    );
    if (response.data.status === 'completed') {
      return response.data;
    }
    if (response.data.status === 'failed') {
      throw new Error('Transcription failed');
    }
    await new Promise(r => setTimeout(r, 5000));
  }
  throw new Error('Timeout waiting for transcription');
}
```

### cURL

```bash
# Upload a file
curl -X POST https://api.speak-y.com/v1/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/audio.mp3" \
  -F "language=en" \
  -F "diarization=true"

# Transcribe YouTube URL
curl -X POST https://api.speak-y.com/v1/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "diarization": true}'

# Get transcription status
curl https://api.speak-y.com/v1/transcriptions/tr_abc123 \
  -H "Authorization: Bearer YOUR_API_KEY"

# Download as SRT (URL quoted so the shell does not interpret the "?")
curl "https://api.speak-y.com/v1/transcriptions/tr_abc123/download?format=srt" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -o subtitles.srt
```

---

## Technical Specifications

### Supported Formats

**Audio**

| Format | Extension | Notes |
|--------|-----------|-------|
| MP3 | .mp3 | Most common, good compression |
| WAV | .wav | Lossless, larger files |
| M4A | .m4a | Apple format, good quality |
| FLAC | .flac | Lossless compression |
| OGG | .ogg | Open source format |
| AAC | .aac | High quality, small size |
| WMA | .wma | Windows Media Audio |

**Video**

| Format | Extension | Notes |
|--------|-----------|-------|
| MP4 | .mp4 | Universal, recommended |
| MOV | .mov | Apple QuickTime |
| AVI | .avi | Legacy Windows format |
| MKV | .mkv | Open container format |
| WebM | .webm | Web-optimized |
| FLV | .flv | Flash video (legacy) |

### Performance Benchmarks

| Audio Duration | Free Plan | Pro Plan | Creator Plan |
|----------------|-----------|----------|--------------|
| 10 minutes | ~5 min | ~1 min | ~45 sec |
| 1 hour | ~30 min | ~6 min | ~4 min |
| 4 hours | ~2 hours | ~25 min | ~15 min |

\*Processing times are estimates and may vary based on server load.

### Language Accuracy

| Language | Expected Accuracy | Notes |
|----------|-------------------|-------|
| English | 97-99% | Best support |
| Spanish | 96-98% | Excellent |
| French | 96-98% | Excellent |
| German | 95-98% | Excellent |
| Russian | 94-97% | Very good |
| Chinese | 93-96% | Good |
| Japanese | 92-95% | Good |
| Arabic | 91-95% | Good |

\*Accuracy depends on audio quality, accents, and specialized terminology.

---

## Contact & Support

- **Website**: https://stt.speak-y.com
- **Email**: support@speak-y.com
- **Documentation**: https://stt.speak-y.com/api-reference
- **Health Check**: https://stt.speak-y.com/health
- **Twitter**: @SpeakYSTT

---

© 2024 Speak-Y STT. All rights reserved.