Models
Explore our state-of-the-art text-to-speech models
Dialog 1.0
Our flagship, feature-rich model for realistic, multi-speaker speech.
Play 3.0 Mini
Our fastest model for ultra-low latency applications.
Dialog 1.0 Turbo
A faster version of Dialog 1.0, optimized for speed while maintaining high quality, with a more focused feature set.
Overview
The PlayAI API offers a range of text-to-speech models optimized for different use cases, quality levels, and performance requirements.
Model | Model ID | Description | Best For |
---|---|---|---|
Dialog 1.0 | PlayDialog | Our flagship model for realistic, multi-speaker speech with rich emotional expression | High-quality voice synthesis, multi-speaker conversations, emotional content |
Dialog 1.0 Turbo | PlayDialogTurbo | A faster version of Dialog 1.0 with a more focused feature set | Dialog 1.0 quality with lower latency, English and Arabic content, pre-built voices |
Play 3.0 Mini | Play3.0-mini | Ultra-fast model optimized for real-time applications | Low-latency applications, real-time interactions, gaming |
Play 2.0 (Legacy) | Play2.0 | Previous generation model with high performance | General purpose text-to-speech applications |
Language Support
All non-Turbo models support the following languages: Afrikaans, Albanian, Amharic, Arabic, Bengali, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Mandarin, Polish, Portuguese, Russian, Serbian, Spanish, Swedish, Tagalog, Thai, Turkish, Ukrainian, Urdu, and Xhosa. (Performance may vary across languages.)
At this time, Dialog 1.0 Turbo only supports English and Arabic.
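The language rules above can be encoded as a small client-side check before a request is sent. This is a minimal Python sketch, assuming your application refers to languages by the English names listed on this page. The constant and function names are hypothetical and are not part of the PlayAI API, which may identify languages differently.

```python
# Languages listed on this page for all non-Turbo models.
NON_TURBO_LANGUAGES = frozenset({
    "Afrikaans", "Albanian", "Amharic", "Arabic", "Bengali", "Bulgarian",
    "Catalan", "Croatian", "Czech", "Danish", "Dutch", "English", "French",
    "Galician", "German", "Greek", "Hebrew", "Hindi", "Hungarian",
    "Indonesian", "Italian", "Japanese", "Korean", "Malay", "Mandarin",
    "Polish", "Portuguese", "Russian", "Serbian", "Spanish", "Swedish",
    "Tagalog", "Thai", "Turkish", "Ukrainian", "Urdu", "Xhosa",
})

# Dialog 1.0 Turbo currently supports only these two languages.
TURBO_LANGUAGES = frozenset({"English", "Arabic"})

# Model IDs from the overview table.
MODEL_IDS = ("PlayDialog", "PlayDialogTurbo", "Play3.0-mini", "Play2.0")


def supports_language(model_id: str, language: str) -> bool:
    """Return True if this page lists `language` as supported by `model_id`.

    Hypothetical helper: matching is exact and case-sensitive on the
    English language names used on this page.
    """
    if model_id not in MODEL_IDS:
        raise ValueError(f"unknown model ID: {model_id!r}")
    allowed = TURBO_LANGUAGES if model_id == "PlayDialogTurbo" else NON_TURBO_LANGUAGES
    return language in allowed
```

For example, `supports_language("Play3.0-mini", "Japanese")` returns `True`, while `supports_language("PlayDialogTurbo", "Japanese")` returns `False`.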
Dialog 1.0
Dialog 1.0 is our most advanced speech synthesis model, designed for high-quality, emotionally aware speech with native multi-speaker support. It produces natural, lifelike speech with rich emotional expression and contextual understanding across multiple languages.
The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker’s unique characteristics and accent. It offers the most comprehensive emotion control among our models, allowing for precise emotional expression in the synthesized speech.
This model excels in scenarios requiring high-quality, emotionally nuanced speech:
- Multi-Speaker Conversations: Native support for multiple speakers in a single conversation.
- Character Voiceovers: Ideal for gaming and animation due to its emotional range.
- Professional Content: Well-suited for corporate videos and e-learning materials.
- Multilingual Projects: Maintains consistent voice quality across language switches.
Dialog 1.0 supports native pace adjustment, allowing you to control the speed of speech while maintaining natural prosody. Although its latency is higher than that of Play 3.0 Mini, it delivers superior quality for projects where lifelike speech and emotional expression are important.
Dialog 1.0 Turbo
Dialog 1.0 Turbo is a latency-optimized version of Dialog 1.0 that maintains similar quality.
Dialog 1.0 Turbo has a more focused feature set than Dialog 1.0, with support for English and Arabic only. It currently works with a curated set of pre-built voices and does not support custom voice cloning.
Key characteristics:
- Fast and High Quality: Optimized for minimal latency while maintaining Dialog 1.0’s quality.
- Limited Language Support: Currently supports English and Arabic only.
- Pre-built Voices: Works with a select set of high-quality pre-built voices.
- Streamlined Features: Focused feature set for maximum performance.
This model is ideal for applications that:
- Need Dialog 1.0’s quality with faster processing.
- Work primarily with English or Arabic content.
- Require consistent, high-quality pre-built voices.
- Need reliable, production-ready speech synthesis.
For detailed API information and available voices, see the Dialog 1.0 Turbo endpoint documentation.
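To catch unsupported Turbo requests before they reach the API, a client can check the two documented limits: English or Arabic only, and pre-built voices only. A minimal Python sketch follows; `SpeechRequest` and `turbo_problems` are hypothetical helpers that describe a request on the client side, not the endpoint's actual request schema.

```python
from dataclasses import dataclass

# Dialog 1.0 Turbo currently supports only these two languages.
TURBO_LANGUAGES = frozenset({"English", "Arabic"})


@dataclass(frozen=True)
class SpeechRequest:
    """Hypothetical client-side description of a request (not the API schema)."""
    model_id: str
    language: str
    voice_is_cloned: bool  # True for a custom cloned voice, False for a pre-built voice


def turbo_problems(req: SpeechRequest) -> list[str]:
    """List the reasons a request falls outside Dialog 1.0 Turbo's documented limits."""
    problems: list[str] = []
    if req.model_id != "PlayDialogTurbo":
        return problems  # these constraints apply only to Turbo
    if req.language not in TURBO_LANGUAGES:
        problems.append(f"{req.language} is not supported; use English or Arabic")
    if req.voice_is_cloned:
        problems.append("custom voice clones are not supported; use a pre-built voice")
    return problems
```

An empty list means the request stays within the limits described above; the endpoint documentation remains the authority on which pre-built voices are available.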
Play 3.0 Mini
Play 3.0 Mini is our fastest speech synthesis model, designed for real-time applications and ultra-low latency scenarios. It delivers high-quality speech with minimal delay (~50 ms) and natural-sounding output across all supported languages.
The model balances speed and quality, making it ideal for interactive applications while keeping voice characteristics consistent. Like Dialog 1.0, it supports native pace adjustment, allowing for flexible speech rate control.
This model is particularly well-suited for:
- Real-time Applications: Perfect for live voice interactions and chatbots.
- Interactive Gaming: Ideal for games requiring immediate voice feedback.
- Low-latency Systems: Efficient for applications where speed is critical.
- Large-Scale Processing: Cost-effective for bulk text-to-speech conversion.
With its lower latency and competitive pricing, Play 3.0 Mini is the optimal choice for applications requiring fast, reliable speech synthesis without compromising on quality.
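The trade-offs described for Dialog 1.0, Dialog 1.0 Turbo, and Play 3.0 Mini can be summarized as a simple selection heuristic. This Python sketch is an illustration under stated assumptions, not official guidance: the `Priority` enum and `choose_model` function are hypothetical, and the function does not validate the full language list.

```python
from enum import Enum


class Priority(Enum):
    """Hypothetical priorities an application might optimize for."""
    QUALITY = "quality"                # richest emotion control, multi-speaker support
    FAST_QUALITY = "fast_quality"      # Dialog 1.0 quality with lower latency
    LOWEST_LATENCY = "lowest_latency"  # minimal delay above all else


# Dialog 1.0 Turbo currently supports only these two languages.
TURBO_LANGUAGES = frozenset({"English", "Arabic"})


def choose_model(priority: Priority, language: str, needs_custom_voice: bool) -> str:
    """Pick a model ID for a new application.

    Play2.0 is never returned, because new applications are steered
    toward Dialog 1.0 or Play 3.0 Mini.
    """
    if priority is Priority.LOWEST_LATENCY:
        return "Play3.0-mini"
    if priority is Priority.FAST_QUALITY:
        # Turbo covers only English/Arabic and only pre-built voices.
        if language in TURBO_LANGUAGES and not needs_custom_voice:
            return "PlayDialogTurbo"
        # Otherwise fall through to the full Dialog 1.0 model.
    return "PlayDialog"
```

When a fast-quality request cannot use Turbo, the sketch falls back to Dialog 1.0 to preserve quality; returning `Play3.0-mini` instead would be a reasonable alternative when latency matters more than expressiveness.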
Play 2.0 (Legacy)
Play 2.0 is our previous-generation speech synthesis model, delivering fast performance for real-time applications. While it lacks the advanced features of Dialog 1.0, it provides ultra-low latency speech synthesis suited to applications that require immediate response times.
This model is well-suited for:
- Ultra-Low Latency Applications: Perfect for real-time voice interactions and immediate feedback.
- High-Performance Systems: Ideal for applications where speed is the top priority.
- Basic Text-to-Speech Applications: Reliable for standard use cases with maximum performance.
- Simple Voice Applications: Effective for straightforward voice synthesis needs with minimal delay.
While Play 2.0 continues to be supported, we recommend using Dialog 1.0 or Play 3.0 Mini for new applications to take advantage of their enhanced features and capabilities.
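If existing code still passes the legacy model ID, a lightweight check can flag it during a migration. This is a hypothetical Python sketch: `LegacyModelWarning` and `check_model_id` are illustrative names, and because Play 2.0 remains supported, the helper only warns rather than raising an error.

```python
import warnings


class LegacyModelWarning(UserWarning):
    """Hypothetical warning category for legacy model IDs."""


LEGACY_MODEL_IDS = frozenset({"Play2.0"})
# Models this page recommends for new applications.
RECOMMENDED_MODEL_IDS = ("PlayDialog", "Play3.0-mini")


def check_model_id(model_id: str) -> str:
    """Return model_id unchanged, emitting a warning if it names a legacy model."""
    if model_id in LEGACY_MODEL_IDS:
        warnings.warn(
            f"{model_id} is a legacy model; consider "
            f"{' or '.join(RECOMMENDED_MODEL_IDS)} for new applications",
            LegacyModelWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    return model_id
```

Wrapping model IDs with `check_model_id` at the point where requests are built surfaces remaining legacy usage in logs without changing behavior.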