This guide walks step by step through using the PlayDialog API to convert text into natural, human-like audio with the async (non-streaming) API endpoint.

In this example, PlayDialog generates a single audio clip from the given input text.

Prerequisites

  • Access credentials (Secret key and User ID) for the PlayDialog API.

  • Python environment for executing the API request.

Set Up Your API Key

To avoid hardcoding credentials in your code, store your API key and user ID as environment variables. Your script can then read them at runtime without exposing the values in the source.

Step 1: Set the Environment Variable

For macOS and Linux

  • Open your terminal.

  • Run the following commands to append both variables to ~/.bashrc (use ~/.zshrc instead if your shell is zsh) so they persist across sessions:

bash
echo 'export PLAYDIALOG_API_KEY="your_api_key_here"' >> ~/.bashrc
echo 'export PLAYDIALOG_USER_ID="your_user_id_here"' >> ~/.bashrc
  • Run source ~/.bashrc (or source ~/.zshrc for zsh) to load the variables into your current session.

For Windows

  • Open Command Prompt or PowerShell.

  • Use the setx command to create each environment variable individually:

cmd
setx PLAYDIALOG_API_KEY "your_api_key_here"
setx PLAYDIALOG_USER_ID "your_user_id_here"
  • Restart your terminal to apply the changes.

Step 2: Access the Variables in Python

In your Python script, use the os module to access the environment variables:

python
import os

api_key = os.getenv("PLAYDIALOG_API_KEY")
user_id = os.getenv("PLAYDIALOG_USER_ID")

headers = {
    'AUTHORIZATION': api_key,
    'Content-Type': 'application/json',
    'X-USER-ID': user_id
}
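
If either variable is unset, os.getenv returns None and the problem only shows up later as an authentication failure. The following is a minimal sketch of failing early instead. The helper name build_headers is hypothetical and not part of the PlayDialog API; the header names are the ones used in this guide.

```python
import os
from typing import Mapping


def build_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Build the request headers, raising if a credential is missing."""
    required = ("PLAYDIALOG_API_KEY", "PLAYDIALOG_USER_ID")
    # Treat unset and empty variables the same way
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "AUTHORIZATION": env["PLAYDIALOG_API_KEY"],
        "Content-Type": "application/json",
        "X-USER-ID": env["PLAYDIALOG_USER_ID"],
    }
```

Calling build_headers() with no arguments reads from the current process environment; passing a plain dictionary makes the helper easy to check without touching real credentials.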

Key API Parameters

The API payload takes the following parameters, which define the input text, the voice, and the audio generation options:

  • model: Specifies the PlayDialog API model to be used.

  • text: Contains the input text for which the speech audio has to be generated.

  • voice: URL path to the voice manifest for the first speaker.

  • outputFormat: Format for the generated audio file, typically wav or mp3.

  • speed: (Optional) Adjust the speaking speed.

  • language: (Optional) Language of the input text.
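
As an illustration of how the parameters above fit together, here is a small sketch that assembles a payload and includes the optional fields only when they are provided. The function build_payload and its defaults are assumptions made for this example; only the field names come from this guide.

```python
from typing import Optional


def build_payload(
    text: str,
    voice: str,
    model: str = "PlayDialog",
    output_format: str = "mp3",
    speed: Optional[float] = None,
    language: Optional[str] = None,
) -> dict:
    """Assemble the JSON payload, leaving out optional fields that are not set."""
    if not text.strip():
        raise ValueError("text must not be empty")
    payload = {
        "model": model,
        "text": text,
        "voice": voice,
        "outputFormat": output_format,
    }
    # speed and language are optional, so add them only when given
    if speed is not None:
        payload["speed"] = speed
    if language is not None:
        payload["language"] = language
    return payload
```

The returned dictionary can be passed as the json argument of requests.post in the submission step.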

Call the API - Non-Streaming Endpoint with Polling

The new non-streaming API requires submitting a job and polling to check its status. Once the job is completed, you can retrieve the audio file.

Submit a Job

python
import requests
import os

# Set up headers with your API authentication token and user ID
api_key = os.getenv("PLAYDIALOG_API_KEY")
user_id = os.getenv("PLAYDIALOG_USER_ID")

headers = {
    'AUTHORIZATION': api_key,
    'Content-Type': 'application/json',
    'X-USER-ID': user_id
}

# JSON payload for job submission
json_data = {
    'model': 'PlayDialog',
    'text': "This is the greatest moment to be alive",
    'voice': 's3://voice-cloning-zero-shot/baf1ef41-36b6-428c-9bdf-50ba54682bd8/original/manifest.json',
    'outputFormat': 'mp3',
    'speed': 1,
    'language': 'english',
}

# Submit the job to the API
response = requests.post('https://api.play.ai/api/v1/tts', headers=headers, json=json_data)

if response.status_code == 201:
    job_id = response.json().get('id')
    print(f"Job submitted successfully. Job ID: {job_id}")
else:
    print(f"Job submission failed with status code {response.status_code}: {response.text}")

Once this code runs, you will get a job ID, which means:

  1. The Async Job has been successfully submitted.

  2. The job_id can be used to poll the status of the TTS job.

Sample Job ID - '9726f318-410a-4c8e-99c6-e2e1da6615e1'

Poll for Job Status

After submitting the job, poll the status until it is completed.

python
import time

# job_id is the value returned by response.json().get('id') in the submission step
polling_url = f'https://api.play.ai/api/v1/tts/{job_id}'

while True:
    response = requests.get(polling_url, headers=headers)
    # Parse the response body once and reuse it for the status and the URL
    job = response.json()
    status = job['output']['status']

    if status == 'COMPLETED':
        audio_url = job['output']['url']
        print(f"Job completed. Audio URL: {audio_url}")
        break
    elif status == 'IN_PROGRESS':
        print("Job is still in progress. Retrying in 5 seconds...")
        time.sleep(5)
    else:
        print(f"Job failed or encountered an unknown status: {status}")
        break
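
The loop above keeps polling for as long as the job reports IN_PROGRESS. A bounded variant might look like the sketch below. The helper poll_until_done, its parameters, and the default timeout of 300 seconds are assumptions for illustration; the status values are the ones used in this guide. The request itself is passed in as a callable so the logic can be exercised without calling the API.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_job: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job() until the job completes, fails, or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        status = (job.get("output") or {}).get("status")
        if status == "COMPLETED":
            return job
        if status != "IN_PROGRESS":
            raise RuntimeError(f"Job failed or returned an unexpected status: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still in progress after {timeout_s} seconds")
        sleep(interval_s)
```

With the variables from the earlier steps, a call could be written as poll_until_done(lambda: requests.get(polling_url, headers=headers).json()).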

Sample output object:

output
{'id': '9726f318-410a-4c8e-99c6-e2e1da6615e1',
 'createdAt': '2024-11-18T13:58:32.252Z',
 'input': {'model': 'PlayDialog',
  'text': 'This is the greatest moment to be alive',
  'voice': 's3://voice-cloning-zero-shot/baf1ef41-36b6-428c-9bdf-50ba54682bd8/original/manifest.json',
  'outputFormat': 'mp3',
  'speed': 1,
  'language': 'english'},
 'completedAt': '2024-11-18T13:58:50.468Z',
 'output': {'status': 'COMPLETED',
  'url': 'https://fal-api-audio-uploads.s3.amazonaws.com/8e340848-88e7-492c-a712-f7d41c9c4693.mp3',
  'contentType': 'audio/mpeg',
  'fileSize': 52461,
  'duration': 2.14}}
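
To show how the fields of this object can be read, the sketch below pulls out the most useful values and computes how long the job took from createdAt and completedAt. The helper summarize_job is hypothetical, and it assumes the response has the same shape as the sample above; missing fields are returned as None.

```python
from datetime import datetime


def _parse_timestamp(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalize it
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_job(job: dict) -> dict:
    """Extract the status, audio URL, and timing information from a job object."""
    output = job.get("output") or {}
    summary = {
        "id": job.get("id"),
        "status": output.get("status"),
        "url": output.get("url"),
        "duration": output.get("duration"),
        "file_size": output.get("fileSize"),
        "elapsed_seconds": None,
    }
    # Processing time is only known once both timestamps are present
    if job.get("createdAt") and job.get("completedAt"):
        elapsed = _parse_timestamp(job["completedAt"]) - _parse_timestamp(job["createdAt"])
        summary["elapsed_seconds"] = elapsed.total_seconds()
    return summary
```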

Save the Generated Audio File

Once the job is completed, download and save the audio file from the provided URL.

python
import requests

audio_url = response.json()['output']['url']
audio_response = requests.get(audio_url)

if audio_response.status_code == 200:
    with open('output.mp3', 'wb') as f:
        f.write(audio_response.content)
    print("Audio file saved as output.mp3")
else:
    print(f"Failed to download audio. Status code: {audio_response.status_code}")
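
The snippet above always writes to output.mp3. If you generate several clips, or request a format other than mp3, one option is to reuse the file name from the audio URL. This is a minimal sketch; filename_from_url is a hypothetical helper, and it assumes the URL path ends in a file name, as it does in the sample output.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(audio_url: str, default: str = "output.mp3") -> str:
    """Return the last path segment of the URL, or a default if there is none."""
    # urlparse separates the path from the host and any query string
    name = posixpath.basename(urlparse(audio_url).path)
    return name or default
```

The result can be passed to open() in place of the hardcoded 'output.mp3'.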

If all the code blocks above ran successfully, you now have an audio file named output.mp3 in your current working directory.

Troubleshooting

  • Authentication Issues: Verify your API key and user ID. Ensure the header is correctly set up.

  • Job Status Polling: Ensure you use the correct job ID to check the status.

  • API Endpoint Errors: Confirm you’re using the correct PlayDialog API endpoint URL and model name.