AI Audio (30)

Amazon Polly- AWS text-to-speech service with 60+ voices across 29 languages, offering both streaming and asynchronous speech synthesis with SSML support.
AssemblyAI- Developer-friendly speech-to-text API with Universal models for streaming and batch transcription, offering high accuracy with real-time capabilities and structured outputs.
Cartesia- Real-time AI voice platform with Sonic models for text-to-speech featuring ultra-low latency (40ms time to first audio), emotional control, and multilingual support across 42 languages.
Coqui XTTS- Open-source massively multilingual zero-shot text-to-speech model enabling voice cloning across 16+ languages.
Deepgram- High-performance speech-to-text and text-to-speech API with proprietary models (Nova, Flux) designed for real-time voice agents and transcription with sub-300ms latency.
Descript- Audio and video editing platform with AI text-to-speech capabilities for content creation and transcription.
ElevenLabs- Leading AI audio platform offering text-to-speech, speech-to-text, voice cloning, and voice agents with multilingual support across 70+ languages and 1000+ voices.
FakeYou- A voice synthesis platform that generates AI voices and supports voice cloning.
Fish Audio- A speech-to-text API provider offering high accuracy transcription with streaming and batch modes, supporting 50+ languages with speaker diarization and custom vocabulary features.
Gladia- State-of-the-art speech-to-text API supporting 100+ languages with real-time streaming transcription under 300ms latency and advanced audio intelligence features.
Google Cloud Speech-to-Text- Enterprise-grade speech recognition API supporting 125+ languages with streaming and batch transcription, powered by Google's Chirp models.
Google Cloud Text-to-Speech- Neural text-to-speech API with 380+ voices across 75+ languages using WaveNet and Neural2 technology for natural-sounding speech synthesis.
Hume- An AI platform offering Octave voice models for text-to-speech with empathetic expression capabilities.
Inworld AI- Text-to-speech provider whose TTS-1.5 Max model ranks #1 on independent TTS benchmarks, with sub-250ms latency and pricing of $10 per 1M characters.
Microsoft Azure Speech- Azure speech services providing speech-to-text, text-to-speech, and speech translation across 100+ languages with enterprise-grade reliability and compliance.
MiniMax- AI speech platform optimized for Asian languages, with Speech-02 model supporting 40+ languages and 300+ voices, designed for real-time voice agents with sub-250ms latency.
Murf.ai- A text-to-speech platform with voice editing studio capabilities for creating professional marketing content.
PlayHT- AI voice generation platform with 900+ human-like voices across 142 languages, offering both web studio and API for text-to-speech synthesis with voice cloning.
PlayHT 2.0- AI voice generation platform focused on conversational speech patterns with support for 142 languages and integrated podcast hosting.
Resemble AI- Specialized voice cloning and synthesis platform with rapid voice cloning from 30 seconds of audio, offering TTS and speech-to-speech capabilities with API access.
Rev AI- Speech-to-text API trained on 3M+ hours of human-transcribed audio, offering both asynchronous and streaming transcription, with 99% accuracy available through its human transcription option.
Rime- A text-to-speech API that trains models on real-world conversations to produce natural-sounding voices with conversational prosody and sub-100ms latency.
Smallest.ai- Budget-friendly text-to-speech platform offering minute-based pricing for standard TTS and voice cloning services.
Soniox- Real-time speech-to-text and translation API supporting 60+ languages with low latency, designed for live applications, voice agents, and streaming analytics.
Speechify- An accessibility-focused text-to-speech tool designed for users with reading difficulties and visual impairments.
Speechmatics- Enterprise speech technology platform with speech-to-text, text-to-speech, and voice agents across 55+ languages, supporting cloud, on-premises, and hybrid deployments.
Synthesia- AI video generation platform with AI avatars and text-to-speech voiceovers in 160+ languages, enabling creation of studio-quality videos from scripts in minutes.
TTSMaker- Free online text-to-speech converter that generates speech in 20+ languages with no registration required.
Voice Creator Pro- Desktop AI text-to-speech software offering offline voice cloning with unlimited generations and commercial license included.
WellSaid Labs- Premium text-to-speech platform with 120+ licensed-actor voices delivering natural, expressive speech synthesis with API and Adobe Creative Suite integrations.