This guide provides a step-by-step approach to using the PlayAI’s Dialog 1.0 model to create a multi-turn scripted conversation between two distinct speakers.

Overview

This is an async API endpoint.
You will make a request to trigger podcast generation.
You will then request another endpoint to see if the podcast is ready.

Prerequisites

Access your credentials (secret key and user ID).
Development environment for your chosen programming language.

Steps

Set up environment variables

Choose your operating system and set up the environment variables:

echo 'export PLAYAI_API_KEY="your_api_key_here"' >> ~/.zshrc
echo 'export PLAYAI_USER_ID="your_user_id_here"' >> ~/.zshrc
source ~/.zshrc

Create a new script

Create a new file and add the following code:

import requests
import os
import time

# Set up headers with your API secrety key and user ID
user_id = os.getenv("PLAYAI_USER_ID")
secret_key = os.getenv("PLAYAI_API_KEY")

headers = {
    'X-USER-ID': user_id,
    'Authorization': secret_key,
    'Content-Type': 'application/json',
}

Configure the model and voices

Define the model and select voices for your hosts:

# define the model
model = 'PlayDialog'

# define voices for the 2 hosts
# find all voices here https://docs.play.ai/tts-api-reference/voices
voice_1 = 's3://voice-cloning-zero-shot/baf1ef41-36b6-428c-9bdf-50ba54682bd8/original/manifest.json'
voice_2 = 's3://voice-cloning-zero-shot/e040bd1b-f190-4bdb-83f0-75ef85b18f84/original/manifest.json'

Add your podcast transcript

Add your scripted conversation in the correct format:

# podcast transcript should be in the format of Host 1: ... Host 2:
transcript = """
Host 1: Welcome to The Tech Tomorrow Podcast! Today we're diving into the fascinating world of voice AI and what the future holds.
Host 2: And what a topic this is. The technology has come so far from those early days of basic voice commands.
Host 1: Remember when we thought it was revolutionary just to ask our phones to set a timer?
Host 2: Now we're having full conversations with AI that can understand context, emotion, and even cultural nuances. It's incredible.
Host 1: Though it does raise some interesting questions about privacy and ethics. Where do we draw the line?
Host 2: Exactly. The potential benefits for accessibility and education are huge, but we need to be thoughtful about implementation.
Host 1: Well, we'll be exploring all of these aspects today. Stay with us as we break down the future of voice AI.
"""

Configure the API payload

Set up the payload with your configuration:

payload = {
    'model': model,
    'text': transcript,
    'voice': voice_1,
    'voice2': voice_2,
    'turnPrefix': 'Host 1:',
    'turnPrefix2': 'Host 2:',
    'outputFormat': 'mp3',
}

Send the request and monitor progress

Add the code to send the request and check the status:

# Send the POST request to trigger podcast generation
response = requests.post('https://api.play.ai/api/v1/tts/', headers=headers, json=payload)

# get the job id to check the status
job_id = response.json().get('id')

# use the job id to check completion status
url = f'https://api.play.ai/api/v1/tts/{job_id}'
delay_seconds = 2

# keep checking until status is COMPLETED.
# longer transcripts take more time to complete.
while True:
    response = requests.get(url, headers=headers)

    if response.ok:
        status = response.json().get('output', {}).get('status')
        print(status)
        if status == 'COMPLETED':
            # once completed audio url will be avaialable
            podcast_audio = response.json().get('output', {}).get('url')
            break

    time.sleep(delay_seconds)

print(podcast_audio)

Complete Code

import requests
import os
import time

# Set up headers with your API secrety key and user ID
user_id = os.getenv("PLAYAI_USER_ID")
secret_key = os.getenv("PLAYAI_API_KEY")

headers = {
    'X-USER-ID': user_id,
    'Authorization': secret_key,
    'Content-Type': 'application/json',
}

# define the model
model = 'PlayDialog'

# define voices for the 2 hosts
# find all voices here https://docs.play.ai/tts-api-reference/voices
voice_1 = 's3://voice-cloning-zero-shot/baf1ef41-36b6-428c-9bdf-50ba54682bd8/original/manifest.json'
voice_2 = 's3://voice-cloning-zero-shot/e040bd1b-f190-4bdb-83f0-75ef85b18f84/original/manifest.json'

# podcast transcript should be in the format of Host 1: ... Host 2:
transcript = """
Host 1: Welcome to The Tech Tomorrow Podcast! Today we're diving into the fascinating world of voice AI and what the future holds.
Host 2: And what a topic this is. The technology has come so far from those early days of basic voice commands.
Host 1: Remember when we thought it was revolutionary just to ask our phones to set a timer?
Host 2: Now we're having full conversations with AI that can understand context, emotion, and even cultural nuances. It's incredible.
Host 1: Though it does raise some interesting questions about privacy and ethics. Where do we draw the line?
Host 2: Exactly. The potential benefits for accessibility and education are huge, but we need to be thoughtful about implementation.
Host 1: Well, we'll be exploring all of these aspects today. Stay with us as we break down the future of voice AI.
"""

payload = {
    'model': model,
    'text': transcript,
    'voice': voice_1,
    'voice2': voice_2,
    'turnPrefix': 'Host 1:',
    'turnPrefix2': 'Host 2:',
    'outputFormat': 'mp3',
}

# Send the POST request to trigger podcast generation
response = requests.post('https://api.play.ai/api/v1/tts/', headers=headers, json=payload)

# get the job id to check the status
job_id = response.json().get('id')

# use the job id to check completion status
url = f'https://api.play.ai/api/v1/tts/{job_id}'
delay_seconds = 2

# keep checking until status is COMPLETED.
# longer transcripts take more time to complete.
while True:
    response = requests.get(url, headers=headers)

    if response.ok:
        status = response.json().get('output', {}).get('status')
        print(status)
        if status == 'COMPLETED':
            # once completed audio url will be avaialable
            podcast_audio = response.json().get('output', {}).get('url')
            break

    time.sleep(delay_seconds)

print(podcast_audio)

Key API Parameters

The following API payload define the conversation, speaker details, and audio generation options:

model: Specifies the PlayAI’s Dialog 1.0 model to be used. Here, PlayDialog supports multi-turn conversation generation.
text: Contains the scripted conversation, with each turn prefixed by the speaker’s name (e.g., "Country Mouse" & "Town Mouse").
voice: URL path to the voice manifest for the first speaker.
voice_2: URL path to the voice manifest for the second speaker.
turn_prefix / turn_prefix_2: Used to specify each speaker’s dialogue turns within the text field. For example: turn_prefix says Country Mouse to indicate the position where Speaker 1’s dialogue and turn_prefix_2 says Town Mouse that indicates the position where Speaker 2’s dialogue parts are.
output_format: Format for the generated audio file, typically wav or mp3.

If you happen to save the code as country_mouse.py then Run the code using python3 country_mouse.py pointing your terminal to the directory where the country_mouse.py file is stored. This will save the dialogue.wav in the same working directory.

Code Explanation

This script uses the Dialog 1.0 model to generate a multi-turn conversation between two characters. The AUTHORIZATION token and X-USER-ID are required for authentication, which you’ll need to replace with your own credentials.

Each line of dialogue is labeled by character name (e.g., “Country Mouse” or “Town Mouse”) to simulate a natural conversation. The script assigns a unique voice to each character using voice and voice2. On a successful API call, the generated audio is saved as dialogue.wav. Any errors are reported with status details.

To run the script:

Replace placeholders in the headers with your API key and user ID.
Update the text with your scripted conversation
Update the Speaker Details and their respective voices
Run the script. If successful, an audio file, dialogue.wav, will be saved in the current directory, capturing the dialogue as configured.
This setup can easily adapt to more complex dialogues or different speakers.

Troubleshooting

Authentication Issues: Verify your API key and user ID. Ensure the AUTHORIZATION header includes “Bearer ” followed by your token.
API Endpoint Errors: Confirm you’re using the correct PlayAI’s Dialog 1.0 API endpoint URL and the model name is PlayDialog

Get Started

Text-to-Speech

Agent

Agent SDKs

PlayNote

Tutorials

Resources

Create an AI Podcast

Overview

Prerequisites

Steps

Complete Code

Key API Parameters

Code Explanation

Troubleshooting

Get Started

Text-to-Speech

Agent

Agent SDKs

PlayNote

Tutorials

Resources

​Overview

​Prerequisites

​Steps

​Complete Code

​Key API Parameters

​Code Explanation

​Troubleshooting

Overview

Prerequisites

Steps

Complete Code

Key API Parameters

Code Explanation

Troubleshooting