Text-to-Speech Real-time HTTP streaming - Play3.0-mini
Stream audio bytes with our ultra-fast text-in, audio-out API. Our HTTP streaming endpoint lets you send text and receive audio bytes in real time.
Play3.0-mini is a lightweight model that generates high-quality audio with a focus on speed.
Check out the How to use PlayDialog Text-to-Speech API guide for a step-by-step approach to using the PlayDialog API to convert text into natural human-like sounding audio.
Get your Credentials
To use the HTTP API you will need an API Key and a User ID. You can get both from https://play.ai/developers.
Explore our models: Play3.0-mini and PlayDialog
Our API currently supports two models: Play3.0-mini and PlayDialog.
Use the model parameter to select the model you want to use.
Play3.0-mini is a lightweight model that generates high-quality audio with a focus on speed.
PlayDialog is a more advanced model that can generate turn-based dialogues with multiple voices.
For details on the specific properties of each, see the examples below.
Example
For code examples, see the interactive code snippets to the right. The provided examples will return an audio buffer stream that you can use to save locally or stream over the network to a browser, app, or telephony system.
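As a minimal sketch of saving the returned audio stream locally, the Python below builds the POST request and copies the response to disk chunk by chunk, using only the standard library. The helper names are hypothetical, and the endpoint URL, header names, and every body field other than `model` are assumptions to be checked against the official snippets.

```python
import json
import urllib.request


def build_tts_request(url, headers, payload):
    """Build a POST request that carries the payload as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    all_headers = {"Content-Type": "application/json", **headers}
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")


def save_stream(response, out_path, chunk_size=4096):
    """Copy audio bytes from a file-like response to disk as they arrive.

    Returns the total number of bytes written.
    """
    total = 0
    with open(out_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # an empty read means the stream has ended
                break
            out.write(chunk)
            total += len(chunk)
    return total


# Usage (not executed here; URL and field names are assumptions):
#   request = build_tts_request(
#       "https://api.play.ai/api/v1/tts/stream",
#       {"Authorization": "Bearer YOUR_SECRET_API_KEY", "X-USER-ID": "YOUR_USER_ID"},
#       {"model": "Play3.0-mini", "text": "Hello", "voice": "YOUR_VOICE_ID"},
#   )
#   with urllib.request.urlopen(request) as response:
#       save_stream(response, "output.mp3")
```

Reading in fixed-size chunks keeps memory use flat regardless of audio length, and the same loop can forward each chunk to a browser, app, or telephony system instead of a file.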
For the complete list of supported parameters, see below.
Authorizations
API key required for this endpoint. Use Bearer YOUR_SECRET_API_KEY. Get your key from https://play.ai/developers.
User ID required for this endpoint. Get it from https://play.ai/developers.
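The two credentials could be assembled into request headers roughly as follows. The `Bearer` scheme comes from the description above; the `X-USER-ID` header name, the `auth_headers` helper, and the `PLAYAI_API_KEY` / `PLAYAI_USER_ID` environment variable names are assumptions made for illustration.

```python
import os


def auth_headers(api_key=None, user_id=None, env=os.environ):
    """Return authorization headers for a request.

    Falls back to the (hypothetical) PLAYAI_API_KEY and PLAYAI_USER_ID
    environment variables when the arguments are not given.
    """
    api_key = api_key or env.get("PLAYAI_API_KEY")
    user_id = user_id or env.get("PLAYAI_USER_ID")
    if not api_key or not user_id:
        raise ValueError("both an API key and a user ID are required")
    return {
        "Authorization": f"Bearer {api_key}",  # Bearer scheme as documented
        "X-USER-ID": user_id,  # header name is an assumption
    }
```

Reading the secrets from the environment keeps them out of source control; failing early when either is missing avoids sending a request that cannot be authorized.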
Body
The voice engine used to synthesize the voice.
Play3.0-mini
The text to be converted to speech. Limited to 20k characters.
The unique ID for a PlayAI Voice.
draft, low, medium, high, premium
The format for the output audio.
mp3, mulaw, raw, wav, ogg, flac
Control how fast the generated audio should be. A number greater than 0 and less than or equal to 5.0
0.1 < x < 5
A number greater than or equal to 8000 and less than or equal to 48000.
8000 <= x <= 48000
An integer number greater than or equal to 0. If equal to null or not provided, a random seed will be used. Useful to control the reproducibility of the generated audio. Assuming all other properties didn't change, a fixed seed should always generate the exact same audio file.
x >= 0
A floating point number between 0, inclusive, and 2, inclusive. If equal to null or not provided, the model's default temperature will be used. The temperature parameter controls variance. Lower temperatures give more predictable results; higher temperatures allow each run to vary more, so the voice may sound less like the baseline voice.
0 <= x <= 2
A number between 1 and 6. Use lower numbers to reduce how unique your chosen voice will be compared to other voices. Higher numbers will maximize its individuality.
1 < x < 2
A number between 1 and 30. Use lower numbers to reduce how strong your chosen emotion will be. Higher numbers will create a very emotional performance.
1 < x < 10
A number between 1 and 2. This number influences how closely the generated speech adheres to the input text. Use lower values to create more fluid speech, but with a higher chance of deviating from the input text. Higher numbers will make the generated speech more accurate to the input text, ensuring that the words spoken align closely with the provided text.
1 < x < 2
The language of the voice.
afrikaans, albanian, amharic, arabic, bengali, bulgarian, catalan, croatian, czech, danish, dutch, english, french, galician, german, greek, hebrew, hindi, hungarian, indonesian, italian, japanese, korean, malay, mandarin, polish, portuguese, russian, serbian, spanish, swedish, tagalog, thai, turkish, ukrainian, urdu, xhosa
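The limits described in this section can be checked client-side before a request is sent. The sketch below covers only the bounds whose wording is unambiguous (text length, speed, sample rate, seed, temperature). The `validate_tts_payload` helper is hypothetical, and the field names `text`, `speed`, `sample_rate`, `seed`, and `temperature` are assumptions, since this page does not name them.

```python
def validate_tts_payload(payload):
    """Return a list of problems found in a request payload (empty if none).

    Field names are assumptions; the bounds follow the parameter descriptions.
    """
    problems = []

    text = payload.get("text")
    if not isinstance(text, str) or not text:
        # Treating the text as required is an assumption.
        problems.append("text must be a non-empty string")
    elif len(text) > 20_000:
        problems.append("text is limited to 20k characters")

    speed = payload.get("speed")
    if speed is not None and not (0 < speed <= 5.0):
        problems.append("speed must be greater than 0 and at most 5.0")

    sample_rate = payload.get("sample_rate")
    if sample_rate is not None and not (8000 <= sample_rate <= 48000):
        problems.append("sample_rate must be between 8000 and 48000")

    seed = payload.get("seed")
    # A null or missing seed means a random seed is used, so only check real values.
    if seed is not None and (
        not isinstance(seed, int) or isinstance(seed, bool) or seed < 0
    ):
        problems.append("seed must be an integer greater than or equal to 0")

    temperature = payload.get("temperature")
    if temperature is not None and not (0 <= temperature <= 2):
        problems.append("temperature must be between 0 and 2 inclusive")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid field in a single pass.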
Response
The response is of type file.