This guide provides a step-by-step approach to using PlayAI's Dialog 1.0 model to create a multi-turn scripted conversation between two distinct speakers.

Overview

  • This is an async API endpoint.

  • You will make a request to trigger podcast generation.

  • You will then poll a second endpoint to check whether the podcast is ready.

Prerequisites

  • Access to your PlayAI credentials (secret key and user ID).
  • A development environment for your chosen programming language.

Steps

1

Set up environment variables

Set up the environment variables for your shell (the example below uses zsh):

echo 'export PLAYAI_API_KEY="your_api_key_here"' >> ~/.zshrc
echo 'export PLAYAI_USER_ID="your_user_id_here"' >> ~/.zshrc
source ~/.zshrc
2

Create a new script

Create a new file and add the following code:

import requests
import os
import time

# Set up headers with your API secret key and user ID
user_id = os.getenv("PLAYAI_USER_ID")
secret_key = os.getenv("PLAYAI_API_KEY")

headers = {
    'X-USER-ID': user_id,
    'Authorization': secret_key,
    'Content-Type': 'application/json',
}
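
If the environment variables from Step 1 were not exported in your current shell, os.getenv returns None and the request will later fail with an authentication error. An optional guard like the following sketch (not part of the original script) catches this early:

# Optional: fail fast if credentials are missing from the environment.
if not user_id or not secret_key:
    raise RuntimeError(
        'Set the PLAYAI_USER_ID and PLAYAI_API_KEY environment variables before running this script.'
    )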
3

Configure the model and voices

Define the model and select voices for your hosts:

# define the model
model = 'PlayDialog'

# define voices for the 2 hosts
# find all voices here https://docs.play.ai/tts-api-reference/voices
voice_1 = 's3://voice-cloning-zero-shot/baf1ef41-36b6-428c-9bdf-50ba54682bd8/original/manifest.json'
voice_2 = 's3://voice-cloning-zero-shot/e040bd1b-f190-4bdb-83f0-75ef85b18f84/original/manifest.json'
4

Add your podcast transcript

Add your scripted conversation in the correct format:

# podcast transcript should be in the format of Host 1: ... Host 2:
transcript = """
Host 1: Welcome to The Tech Tomorrow Podcast! Today we're diving into the fascinating world of voice AI and what the future holds.
Host 2: And what a topic this is. The technology has come so far from those early days of basic voice commands.
Host 1: Remember when we thought it was revolutionary just to ask our phones to set a timer?
Host 2: Now we're having full conversations with AI that can understand context, emotion, and even cultural nuances. It's incredible.
Host 1: Though it does raise some interesting questions about privacy and ethics. Where do we draw the line?
Host 2: Exactly. The potential benefits for accessibility and education are huge, but we need to be thoughtful about implementation.
Host 1: Well, we'll be exploring all of these aspects today. Stay with us as we break down the future of voice AI.
"""
5

Configure the API payload

Set up the payload with your configuration:

payload = {
    'model': model,
    'text': transcript,
    'voice': voice_1,
    'voice2': voice_2,
    'turnPrefix': 'Host 1:',
    'turnPrefix2': 'Host 2:',
    'outputFormat': 'mp3',
}
6

Send the request and monitor progress

Add the code to send the request and check the status:

# Send the POST request to trigger podcast generation
response = requests.post('https://api.play.ai/api/v1/tts/', headers=headers, json=payload)

# get the job id to check the status
job_id = response.json().get('id')

# use the job id to check completion status
url = f'https://api.play.ai/api/v1/tts/{job_id}'
delay_seconds = 2

# keep checking until status is COMPLETED.
# longer transcripts take more time to complete.
while True:
    response = requests.get(url, headers=headers)

    if response.ok:
        status = response.json().get('output', {}).get('status')
        print(status)
        if status == 'COMPLETED':
            # once completed, the audio URL will be available
            podcast_audio = response.json().get('output', {}).get('url')
            break

    time.sleep(delay_seconds)

print(podcast_audio)
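
When the loop exits, podcast_audio holds a URL pointing to the finished MP3. As an optional follow-up (not part of the original script), you can download the file locally with requests; depending on how the URL is issued, you may or may not need to send the same headers:

# Optional: download the finished audio to a local file.
audio_response = requests.get(podcast_audio)
audio_response.raise_for_status()

with open('podcast.mp3', 'wb') as f:
    f.write(audio_response.content)

print('Saved podcast.mp3')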

Complete Code

import requests
import os
import time

# Set up headers with your API secret key and user ID
user_id = os.getenv("PLAYAI_USER_ID")
secret_key = os.getenv("PLAYAI_API_KEY")

headers = {
    'X-USER-ID': user_id,
    'Authorization': secret_key,
    'Content-Type': 'application/json',
}

# define the model
model = 'PlayDialog'

# define voices for the 2 hosts
# find all voices here https://docs.play.ai/tts-api-reference/voices
voice_1 = 's3://voice-cloning-zero-shot/baf1ef41-36b6-428c-9bdf-50ba54682bd8/original/manifest.json'
voice_2 = 's3://voice-cloning-zero-shot/e040bd1b-f190-4bdb-83f0-75ef85b18f84/original/manifest.json'

# podcast transcript should be in the format of Host 1: ... Host 2:
transcript = """
Host 1: Welcome to The Tech Tomorrow Podcast! Today we're diving into the fascinating world of voice AI and what the future holds.
Host 2: And what a topic this is. The technology has come so far from those early days of basic voice commands.
Host 1: Remember when we thought it was revolutionary just to ask our phones to set a timer?
Host 2: Now we're having full conversations with AI that can understand context, emotion, and even cultural nuances. It's incredible.
Host 1: Though it does raise some interesting questions about privacy and ethics. Where do we draw the line?
Host 2: Exactly. The potential benefits for accessibility and education are huge, but we need to be thoughtful about implementation.
Host 1: Well, we'll be exploring all of these aspects today. Stay with us as we break down the future of voice AI.
"""

payload = {
    'model': model,
    'text': transcript,
    'voice': voice_1,
    'voice2': voice_2,
    'turnPrefix': 'Host 1:',
    'turnPrefix2': 'Host 2:',
    'outputFormat': 'mp3',
}

# Send the POST request to trigger podcast generation
response = requests.post('https://api.play.ai/api/v1/tts/', headers=headers, json=payload)

# get the job id to check the status
job_id = response.json().get('id')

# use the job id to check completion status
url = f'https://api.play.ai/api/v1/tts/{job_id}'
delay_seconds = 2

# keep checking until status is COMPLETED.
# longer transcripts take more time to complete.
while True:
    response = requests.get(url, headers=headers)

    if response.ok:
        status = response.json().get('output', {}).get('status')
        print(status)
        if status == 'COMPLETED':
            # once completed, the audio URL will be available
            podcast_audio = response.json().get('output', {}).get('url')
            break

    time.sleep(delay_seconds)

print(podcast_audio)

Key API Parameters

The following API payload parameters define the conversation, speaker details, and audio generation options:

  • model: Specifies the model to use. Here, PlayDialog is PlayAI's Dialog 1.0 model, which supports multi-turn conversation generation.

  • text: Contains the scripted conversation, with each turn prefixed by a speaker label (e.g., Host 1: and Host 2: in the transcript above).

  • voice: URL of the voice manifest for the first speaker.

  • voice2: URL of the voice manifest for the second speaker.

  • turnPrefix / turnPrefix2: Mark each speaker's turns within the text field. Here, turnPrefix is set to Host 1: to identify Speaker 1's lines, and turnPrefix2 is set to Host 2: to identify Speaker 2's lines.

  • outputFormat: Format of the generated audio file, typically wav or mp3; this guide uses mp3.
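
For instance, if your transcript labels the speakers "Country Mouse" and "Town Mouse" instead of "Host 1" and "Host 2", the turn prefixes must match those labels exactly. A hypothetical variant of the payload above, reusing the model, transcript, and voice variables already defined:

# Hypothetical variant: different speaker labels and WAV output.
payload = {
    'model': model,
    'text': transcript,  # transcript lines start with 'Country Mouse:' / 'Town Mouse:'
    'voice': voice_1,
    'voice2': voice_2,
    'turnPrefix': 'Country Mouse:',
    'turnPrefix2': 'Town Mouse:',
    'outputFormat': 'wav',
}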

Save the code as, for example, podcast.py, then run it with python3 podcast.py from the directory where the file is stored. When the job completes, the script prints the URL of the generated MP3 audio.

Code Explanation

This script uses the Dialog 1.0 model to generate a multi-turn conversation between two speakers. The Authorization and X-USER-ID headers are required for authentication; the script reads these credentials from the PLAYAI_API_KEY and PLAYAI_USER_ID environment variables set in Step 1.

Each line of dialogue is labeled with a speaker prefix (Host 1: or Host 2:) to simulate a natural conversation. The script assigns a unique voice to each speaker using voice and voice2. After the POST request triggers generation, the script polls the status endpoint until the job reports COMPLETED and then prints the URL of the generated audio, printing the current status on each check along the way.

To run the script:

  • Set the PLAYAI_API_KEY and PLAYAI_USER_ID environment variables with your API key and user ID.

  • Update the transcript with your scripted conversation.

  • Update the speaker turn prefixes and their respective voices.

  • Run the script. When the job completes, the URL of the generated audio is printed to the console; open or download it to hear the dialogue as configured.

  • This setup can easily be adapted to longer dialogues or different speakers.

Troubleshooting

  • Authentication Issues: Verify your API key and user ID. Confirm that the X-USER-ID and Authorization headers are populated; if an environment variable is unset, os.getenv returns None and the request will be rejected.

  • API Endpoint Errors: Confirm you are using the correct Dialog 1.0 API endpoint URL (https://api.play.ai/api/v1/tts/) and that the model name is PlayDialog.
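
If the initial request fails, printing the status code and response body usually reveals the cause. An optional diagnostic sketch (not part of the original script), reusing the headers and payload defined above:

# Optional diagnostic: inspect the response from the initial POST
# before entering the polling loop.
response = requests.post('https://api.play.ai/api/v1/tts/', headers=headers, json=payload)

if not response.ok:
    # 401/403 usually indicates credential problems; other codes may
    # point to a malformed payload or an incorrect endpoint URL.
    print(f'Request failed with status {response.status_code}: {response.text}')
else:
    print('Job created:', response.json().get('id'))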