Text-to-Speech Real-time HTTP streaming - PlayDialog
Stream audio bytes from our ultra-fast text-in, audio-out API. Our HTTP streaming endpoint lets you send text and receive audio bytes in real time.
Check out the How to use PlayDialog Text-to-Speech API guide for a step-by-step walkthrough of using the PlayDialog API to convert text into natural, human-sounding audio.
See the Create a Multi-Turn Scripted Conversation with the PlayDialog API guide for examples of multi-turn scripted conversations between two distinct speakers.
Get your Credentials
To use the HTTP API, you need an API Key and a User ID. You can get both from https://play.ai/developers.
Explore our models: Play3.0-mini and PlayDialog
Our API currently supports two models: Play3.0-mini and PlayDialog. Use the model parameter to select the model you want to use.
Play3.0-mini is a lightweight model that generates high-quality audio with a focus on speed.
PlayDialog is a more advanced model that can generate turn-based dialogues with multiple voices.
For details on the specific properties of each, see the examples below.
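As a rough illustration of selecting a model through the model parameter, here is a minimal sketch that assembles a request body. It assumes a JSON body with the keys model, text and voice; the helper build_payload and the placeholder voice ID are hypothetical, not part of the API.

```python
from typing import Any

# Model names as listed in this section.
SUPPORTED_MODELS = {"Play3.0-mini", "PlayDialog"}


def build_payload(model: str, text: str, voice: str, **options: Any) -> dict:
    """Build a hypothetical JSON request body for the chosen model.

    Options set to None are dropped so the server can fall back to its defaults.
    """
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model {model!r}; expected one of {sorted(SUPPORTED_MODELS)}")
    payload = {"model": model, "text": text, "voice": voice}
    payload.update({key: value for key, value in options.items() if value is not None})
    return payload


# Usage: a speed-focused single-voice request (placeholder voice ID, seed left unset).
fast = build_payload("Play3.0-mini", "Hello there!", "VOICE_ID_PLACEHOLDER", seed=None)
```

Dropping None values mirrors how the optional parameters below treat null as "use the default".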
Example
The endpoint returns an audio byte stream that you can save locally or stream over the network to a browser, app, or telephony system.
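A minimal standard-library sketch of sending a request and consuming the response as a stream. The endpoint URL is an assumption (this page does not state it), and build_request and stream_audio are hypothetical helper names; confirm the URL and headers against the API reference before relying on this.

```python
import json
import urllib.request
from typing import Iterator

# Assumed endpoint URL for the HTTP streaming API; confirm it in the API reference.
TTS_STREAM_URL = "https://api.play.ai/api/v1/tts/stream"


def build_request(payload: dict, headers: dict) -> urllib.request.Request:
    """Prepare a POST request that sends the payload as a JSON body."""
    return urllib.request.Request(
        TTS_STREAM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},
        method="POST",
    )


def stream_audio(request: urllib.request.Request, chunk_size: int = 16_384) -> Iterator[bytes]:
    """Read the response body in chunks of up to chunk_size bytes.

    Each chunk can be written or forwarded before the whole file has downloaded.
    """
    with urllib.request.urlopen(request) as response:
        while chunk := response.read(chunk_size):
            yield chunk


# Usage (performs a network call, so it is left commented out):
# with open("hello.mp3", "wb") as out:
#     for chunk in stream_audio(build_request(payload, headers)):
#         out.write(chunk)
```

In the commented usage, payload and headers stand for a request body and the authorization headers you have assembled yourself.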
For the complete list of supported parameters, see below.
Authorizations
API key required for this endpoint. Use Bearer YOUR_SECRET_API_KEY. Get your key from https://play.ai/developers.
User ID required for this endpoint. Get it from https://play.ai/developers.
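A sketch of turning the two credentials into request headers without hard-coding secrets. The environment variable names are hypothetical, and so is the X-USER-ID header name: this page only says a User ID is required, so check the exact header name in the API reference.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable names; use whatever your deployment provides.
API_KEY_VAR = "PLAYAI_API_KEY"
USER_ID_VAR = "PLAYAI_USER_ID"


def auth_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the authorization headers from environment variables."""
    env = os.environ if env is None else env
    try:
        api_key = env[API_KEY_VAR]
        user_id = env[USER_ID_VAR]
    except KeyError as missing:
        raise RuntimeError(f"missing credential environment variable: {missing}") from None
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumed header name for the User ID; confirm it in the API reference.
        "X-USER-ID": user_id,
    }
```

Passing an explicit mapping, as in auth_headers({"PLAYAI_API_KEY": "...", "PLAYAI_USER_ID": "..."}), keeps the function easy to test.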
Body
The voice engine used to synthesize the voice. Available option: PlayDialog.
The text to be converted to speech, limited to 50k characters. See the Create a Multi-Turn Scripted Conversation with the PlayDialog API guide for instructions on how to make the most of our multi-turn capabilities.
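If a script exceeds the 50k-character limit, one option is to split it and send one request per piece. The sketch below (hypothetical helper split_text) prefers to break at spaces. It is only an assumption that splitting is acceptable for your use case: prosody across piece boundaries is not documented here, and a piece that starts mid-turn would lose its turn prefix in a multi-turn script.

```python
from typing import List

MAX_CHARS = 50_000  # per-request limit stated for the text parameter


def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into pieces of at most `limit` characters, preferring breaks at spaces."""
    pieces = []
    while len(text) > limit:
        # Last space at an index <= limit, so text[:cut] never exceeds the limit.
        cut = text.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: fall back to a hard cut
        pieces.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        pieces.append(text)
    return pieces
```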
The unique ID of the PlayAI voice to use. For multi-turn dialogue generation, see voice2.
The unique ID of the PlayAI voice to use as the second speaker in multi-turn dialogue generation. See the Create a Multi-Turn Scripted Conversation with the PlayDialog API guide for instructions on how to make the most of our multi-turn capabilities.
The format of the output audio. Available options: mp3, mulaw, raw, wav, ogg, flac.
Controls the speed of the generated audio. A number greater than 0 and less than or equal to 5.0.
The sample rate of the output audio, in Hz. A number greater than or equal to 8000 and less than or equal to 48000.
The random seed. An integer greater than or equal to 0. If null or not provided, a random seed is used. Set a fixed seed to make the generated audio reproducible: assuming all other properties are unchanged, the same seed should always generate the exact same audio file.
A floating-point number between 0 and 2, inclusive. If null or not provided, the model's default temperature is used. The temperature controls variance: lower temperatures give more predictable results, while higher temperatures let each run vary more, so the voice may sound less like the baseline voice.
The prefix that marks the start of a turn spoken by voice in a multi-turn dialogue.
The prefix that marks the start of a turn spoken by voice2 in a multi-turn dialogue.
The prompt to be used for the PlayDialog model with voice.
The prompt to be used for the PlayDialog model with voice2.
The number of seconds of conditioning to use from the selected voice. Lower values generate audio less similar to the cloned voice, but lead to more model stability and expressiveness. Higher values create output more similar to the cloned voice, but can lead to model instability and reduced expressiveness.
The number of seconds of conditioning to use from the selected voice2. Lower values generate audio less similar to the cloned voice, but lead to more model stability and expressiveness. Higher values create output more similar to the cloned voice, but can lead to model instability and reduced expressiveness.
Response
The response is of type file: the generated audio bytes, streamed in the requested output format.
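Because the response is a file stream rather than JSON, a client may want to confirm that the first bytes match the requested output format before saving. This sketch (hypothetical helper sniff_format) checks well-known file signatures: RIFF/WAVE for wav, OggS for ogg, fLaC for flac, and an ID3 tag or MPEG frame sync for mp3. It is a heuristic. mulaw and raw output have no header, so they cannot be identified this way.

```python
from typing import Optional


def sniff_format(data: bytes) -> Optional[str]:
    """Guess the container format from the first bytes of the response.

    Returns None when no known signature matches (e.g. headerless mulaw or raw audio).
    """
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"fLaC":
        return "flac"
    # MP3 usually starts with an ID3 tag or an 11-bit frame sync (0xFF followed by 0b111xxxxx).
    if data[:3] == b"ID3" or (len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0):
        return "mp3"
    return None
```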