POST /api/v1/tts
curl --request POST \
  --url https://api.play.ai/api/v1/tts \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --header 'X-USER-ID: <user-id>' \
  --data '{
  "model": "Play3.0-mini",
  "text": "Country Mouse: Welcome to my humble home, cousin! Town Mouse: Thank you, cousin. It'\''s quite... peaceful here. Country Mouse: It is indeed. I hope you'\''re hungry. I'\''ve prepared a simple meal of beans, barley, and fresh roots. Town Mouse: Well, it'\''s... earthy. Do you eat this every day?",
  "voice": "s3://voice-cloning-zero-shot/baf1ef41-36b6-428c-9bdf-50ba54682bd8/original/manifest.json",
  "voice2": "s3://voice-cloning-zero-shot/baf1ef41-36b6-428c-9bdf-50ba54682bd8/original/manifest.json",
  "quality": "draft",
  "outputFormat": "mp3",
  "speed": 1,
  "sampleRate": 24000,
  "seed": null,
  "temperature": null,
  "voiceGuidance": null,
  "styleGuidance": null,
  "textGuidance": 1,
  "turnPrefix": "Country Mouse:",
  "turnPrefix2": "Town Mouse:",
  "prompt": "<string>",
  "prompt2": "<string>",
  "voiceConditioningSeconds": 20,
  "voiceConditioningSeconds2": 20,
  "language": "english",
  "webHookUrl": "<string>"
}'
{
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "createdAt": "2023-11-07T05:31:56Z",
  "input": {},
  "completedAt": "2023-11-07T05:31:56Z",
  "output": {
    "status": "COMPLETED",
    "url": "<string>",
    "contentType": "<string>",
    "fileSize": 123,
    "duration": 123
  }
}

Convert text to speech with our top-of-the-line PlayAI models.

This endpoint supports two models:

  • Play 3.0 Mini: Our fast and efficient model for single-voice text-to-speech.
  • Dialog 1.0: Our flagship model with the best quality and multi-turn dialogue capabilities.

We also offer Dialog 1.0 Turbo, a faster version of Dialog 1.0, which is available from a separate endpoint.

For more information, see Models.

Check out the How to use Dialog 1.0 Text-to-Speech API guide for a step-by-step approach to converting text into natural, human-like audio with the Dialog 1.0 API.

See the Create a Multi-Turn Scripted Conversation with the Dialog 1.0 API guide for examples of how to create a multi-turn scripted conversation between two distinct speakers.

Authorizations

Authorization
string
header
required

Your secret API key from PlayAI, formatted as Bearer YOUR_SECRET_API_KEY.

X-USER-ID
string
header
required

Your unique user ID from PlayAI.
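
As a rough illustration, the two headers above can be attached to a request as in the following Python sketch. It only builds the request object with the standard library and does not send it. The helper name build_tts_request and the placeholder credentials are assumptions made for this example; the URL and header names are taken from the example request on this page.

```python
import json
import urllib.request

TTS_URL = "https://api.play.ai/api/v1/tts"  # endpoint from the example request


def build_tts_request(api_key: str, user_id: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the TTS endpoint.

    Hypothetical helper: the name and signature are illustrative only.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",  # secret API key, Bearer scheme
        "X-USER-ID": user_id,                  # PlayAI user ID
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(TTS_URL, data=body, headers=headers, method="POST")


req = build_tts_request(
    api_key="YOUR_SECRET_API_KEY",  # placeholder
    user_id="YOUR_USER_ID",         # placeholder
    payload={"model": "Play3.0-mini", "text": "Hello!", "voice": "<voice-id>"},
)
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before any credentials leave the machine.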

Body

application/json
model
enum<string>
required

The voice engine used to synthesize the voice. Must be either Play3.0-mini or PlayDialog.

Available options:
Play3.0-mini,
PlayDialog
Example:

"Play3.0-mini"

text
string
required

The text to be converted to speech. Limited to 20k characters for Play3.0-mini, 50k characters for PlayDialog.

Example:

"Country Mouse: Welcome to my humble home, cousin! Town Mouse: Thank you, cousin. It's quite... peaceful here. Country Mouse: It is indeed. I hope you're hungry. I've prepared a simple meal of beans, barley, and fresh roots. Town Mouse: Well, it's... earthy. Do you eat this every day?"

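The per-model character limits described for text can be checked on the client before a request is made. The sketch below assumes that "20k" and "50k" mean 20,000 and 50,000, and that the limit counts characters the way Python's len() does; neither detail is confirmed on this page. The helper name check_text_length is hypothetical.

```python
# Character limits per model, as stated in the text field description.
# Assumption: "20k" = 20,000 and "50k" = 50,000 characters.
MAX_TEXT_CHARS = {"Play3.0-mini": 20_000, "PlayDialog": 50_000}


def check_text_length(model: str, text: str) -> None:
    """Raise ValueError if text exceeds the documented limit for the model."""
    if model not in MAX_TEXT_CHARS:
        raise ValueError(f"unknown model: {model!r}")
    limit = MAX_TEXT_CHARS[model]
    if len(text) > limit:
        raise ValueError(
            f"text has {len(text)} characters; {model} allows at most {limit}"
        )


# Passes silently: well under the Play3.0-mini limit.
check_text_length("Play3.0-mini", "Welcome to my humble home, cousin!")
```
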
voice
string
required

The unique ID for a PlayAI Voice to be used. See voice2 for multi-turn dialogue generations with Dialog 1.0.

Example:

"s3://voice-cloning-zero-shot/baf1ef41-36b6-428c-9bdf-50ba54682bd8/original/manifest.json"

voice2
string

The unique ID for a PlayAI Voice to be used as the second character in multi-turn dialogue generations. Only used with the PlayDialog model.

Example:

"s3://voice-cloning-zero-shot/baf1ef41-36b6-428c-9bdf-50ba54682bd8/original/manifest.json"

quality
enum<string>

The quality of the generated audio. Only used with Play3.0-mini model.

Available options:
draft,
low,
medium,
high,
premium
outputFormat
enum<string> | null
default:mp3

The format for the output audio.

Available options:
mp3,
mulaw,
raw,
wav,
ogg,
flac
speed
number

Controls how fast the generated audio is. A number between 0.1 and 5, inclusive. Defaults to 1.

Required range: 0.1 <= x <= 5
Example:

1

sampleRate
number

The sample rate of the generated audio, in Hz. Must be between 8000 and 48000, inclusive. Defaults to 24000.

Required range: 8000 <= x <= 48000
Example:

24000

seed
number | null

An integer greater than or equal to 0. If null or not provided, a random seed is used. Useful for controlling the reproducibility of the generated audio: provided all other properties stay the same, a fixed seed should always generate the exact same audio file.

Required range: x >= 0
Example:

null

temperature
number | null

A floating-point number between 0 and 2, inclusive. If null or not provided, the model's default temperature is used. The temperature controls variance: lower temperatures give more predictable results, while higher temperatures allow each run to vary more, so the voice may sound less like the baseline voice.

Required range: 0 <= x <= 2
Example:

null
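
The "Required range" lines for speed, sampleRate, seed, and temperature lend themselves to a simple client-side check. The following is a minimal sketch, not part of the API: the RANGES table and the out_of_range helper are hypothetical names, and the bounds are copied from the ranges listed above.

```python
# (minimum, maximum) copied from the "Required range" lines; None means unbounded.
RANGES = {
    "speed": (0.1, 5),
    "sampleRate": (8000, 48000),
    "seed": (0, None),
    "temperature": (0, 2),
}


def out_of_range(params: dict) -> list:
    """Return the names of parameters whose values fall outside the listed range.

    Missing or null (None) values are skipped, since the service then
    falls back to its own defaults.
    """
    bad = []
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is None:
            continue
        if value < low or (high is not None and value > high):
            bad.append(name)
    return bad


problems = out_of_range({"speed": 1, "sampleRate": 24000, "seed": None, "temperature": None})
```

Returning a list of offending names, rather than raising on the first one, lets a caller report every problem in a single pass.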

voiceGuidance
number | null

A number between 1 and 6. Only used with Play3.0-mini model. Use lower numbers to reduce how unique your chosen voice will be compared to other voices. Higher numbers will maximize its individuality.

Required range: 1 <= x <= 2
Example:

null

styleGuidance
number | null

A number between 1 and 30. Only used with Play3.0-mini model. Use lower numbers to reduce how strong your chosen emotion will be. Higher numbers will create a very emotional performance.

Required range: 1 <= x <= 10
Example:

null

textGuidance
number | null

A number between 1 and 2. Only used with Play3.0-mini model. This number influences how closely the generated speech adheres to the input text. Use lower values to create more fluid speech, but with a higher chance of deviating from the input text. Higher numbers will make the generated speech more accurate to the input text, ensuring that the words spoken align closely with the provided text.

Required range: 1 <= x <= 2
Example:

1

turnPrefix
string | null

The prefix to indicate the start of a turn in a multi-turn dialogue with voice. Only used with PlayDialog model.

Example:

"Country Mouse:"

turnPrefix2
string | null

The prefix to indicate the start of a turn in a multi-turn dialogue with voice2. Only used with PlayDialog model.

Example:

"Town Mouse:"

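To show how voice, voice2, turnPrefix, and turnPrefix2 fit together, the sketch below assembles a PlayDialog request body from a list of turns. It assumes that turns are joined into one string separated by spaces, as in the text example on this page. The helper name build_dialog_payload and the voice ID placeholders are hypothetical.

```python
def build_dialog_payload(turns, voice, voice2, prefix="Speaker 1:", prefix2="Speaker 2:"):
    """Assemble a PlayDialog request body from (speaker, line) turns.

    speaker is 1 for voice and 2 for voice2. Hypothetical helper.
    """
    prefixes = {1: prefix, 2: prefix2}
    # Each turn becomes "<prefix> <line>"; turns are joined with single spaces.
    text = " ".join(f"{prefixes[speaker]} {line}" for speaker, line in turns)
    return {
        "model": "PlayDialog",
        "text": text,
        "voice": voice,
        "voice2": voice2,
        "turnPrefix": prefix,
        "turnPrefix2": prefix2,
    }


payload = build_dialog_payload(
    turns=[
        (1, "Welcome to my humble home, cousin!"),
        (2, "Thank you, cousin. It's quite... peaceful here."),
    ],
    voice="<voice-id-1>",   # placeholder, not a real voice ID
    voice2="<voice-id-2>",  # placeholder, not a real voice ID
    prefix="Country Mouse:",
    prefix2="Town Mouse:",
)
```

Generating the text and the prefixes from the same arguments keeps the prefixes in the script and in the turnPrefix fields from drifting apart.
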
prompt
string | null

The prompt to be used for the PlayDialog model with voice. Only used with PlayDialog model.

prompt2
string | null

The prompt to be used for the PlayDialog model with voice2. Only used with PlayDialog model.

voiceConditioningSeconds
number | null
default:20

The number of seconds of conditioning to use from the selected voice. Only used with PlayDialog model. Lower values generate audio less similar to the cloned voice, but lead to more model stability and expressiveness. Higher values create output more similar to the cloned voice, but can lead to model instability and reduced expressiveness.

Example:

20

voiceConditioningSeconds2
number | null
default:20

The number of seconds of conditioning to use from the selected voice2. Only used with PlayDialog model. Lower values generate audio less similar to the cloned voice, but lead to more model stability and expressiveness. Higher values create output more similar to the cloned voice, but can lead to model instability and reduced expressiveness.

Example:

20

language
enum<string> | null
default:english

The language of the voices. Defaults to english.

Available options:
afrikaans,
albanian,
amharic,
arabic,
bengali,
bulgarian,
catalan,
croatian,
czech,
danish,
dutch,
english,
french,
galician,
german,
greek,
hebrew,
hindi,
hungarian,
indonesian,
italian,
japanese,
korean,
malay,
mandarin,
polish,
portuguese,
russian,
serbian,
spanish,
swedish,
tagalog,
thai,
turkish,
ukrainian,
urdu,
xhosa
Example:

"english"

webHookUrl
string | null

A URL to send a POST request to when the job is completed. The request will contain the job's details.
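
A receiver for this callback could look like the following sketch, built on Python's standard http.server module. It assumes the POST body is a JSON object describing the job and that an empty 204 response is an acceptable acknowledgement; this page confirms neither. The names parse_job_notification, WebhookHandler, and serve are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_job_notification(raw_body: bytes) -> dict:
    """Decode a webhook body, assuming it is a JSON object describing the job."""
    job = json.loads(raw_body.decode("utf-8"))
    if not isinstance(job, dict):
        raise ValueError("expected a JSON object")
    return job


class WebhookHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver for the POST request sent to webHookUrl."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            job = parse_job_notification(self.rfile.read(length))
        except ValueError:
            self.send_response(400)  # malformed or non-JSON body
            self.end_headers()
            return
        print("job finished:", job.get("id"))
        self.send_response(204)  # assumed acknowledgement, no response body
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted (not called automatically)."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener; the URL passed as webHookUrl would then need to be publicly reachable and routed to that port.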

Response

201
application/json
The TTS job created.
id
string
required

The unique identifier for the TTS job.

createdAt
string
required

The timestamp when the job was created.

input
object
required

The parameters used to create the job.

completedAt
string | null
required

The timestamp when the job was completed.

output
object
required

The job's output. Contains the following fields when completed: status, url, contentType, fileSize, duration.
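
As a minimal sketch of reading these fields, the Python below parses the example response shown at the top of this page and returns the audio URL only when the job has finished. The helper name audio_url is hypothetical, and "COMPLETED" is the only status value that appears on this page, so any other status is simply treated as not finished.

```python
import json
from typing import Optional

# Response body copied from the example response on this page.
example_response = json.loads("""
{
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "createdAt": "2023-11-07T05:31:56Z",
  "input": {},
  "completedAt": "2023-11-07T05:31:56Z",
  "output": {
    "status": "COMPLETED",
    "url": "<string>",
    "contentType": "<string>",
    "fileSize": 123,
    "duration": 123
  }
}
""")


def audio_url(job: dict) -> Optional[str]:
    """Return the audio URL if the job has completed, otherwise None."""
    output = job.get("output") or {}
    if job.get("completedAt") is None or output.get("status") != "COMPLETED":
        return None
    return output.get("url")


url = audio_url(example_response)
```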