How to use PlayDialog Text-to-Speech API
PlayDialog API Guide
This guide provides a step-by-step approach to using the PlayDialog API to convert text into natural, human-like audio.
In this example, we’ll have PlayDialog generate a short audio clip from the given input text.
Prerequisites
- Access credentials (secret key and user ID) for the PlayDialog API.
- A Python environment for executing the API request.
Set up your API Key
To keep your API key secure and avoid hardcoding it directly into your code, you can store it as an environment variable. This way, your script can access it securely without exposing the key.
Step 1: Set the Environment Variable
For macOS and Linux
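- Open your terminal.
- Add one export line per variable, for example export PLAYDIALOG_API_KEY="your_api_key" and export PLAYDIALOG_USER_ID="your_user_id", to your ~/.bashrc or ~/.zshrc file to make them persistent across sessions.
- Run source ~/.bashrc (or source ~/.zshrc for zsh) to load the variables into your current session.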
For Windows
- Open Command Prompt or PowerShell.
- Use the setx command to create each environment variable individually, for example setx PLAYDIALOG_API_KEY "your_api_key" and setx PLAYDIALOG_USER_ID "your_user_id".
- Restart your terminal to apply the changes.
Step 2: Access the Variables in Python
In your Python script, use the os module to access the environment variables:
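A minimal sketch of that lookup, assuming the variable names PLAYDIALOG_API_KEY and PLAYDIALOG_USER_ID used elsewhere in this guide. The helper name load_credentials is illustrative only and is not part of the PlayDialog API.

```python
import os


def load_credentials(env=os.environ):
    """Return (api_key, user_id) read from the environment.

    Raises RuntimeError naming whichever variable is missing, so a
    misconfigured shell is caught before any API request is made.
    """
    api_key = env.get("PLAYDIALOG_API_KEY")
    user_id = env.get("PLAYDIALOG_USER_ID")
    missing = [
        name
        for name, value in (
            ("PLAYDIALOG_API_KEY", api_key),
            ("PLAYDIALOG_USER_ID", user_id),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return api_key, user_id
```

Calling load_credentials() once at the top of a script keeps the key out of the source file and fails early with a readable message if either variable was never exported.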
Key API Parameters
The following API payload parameters define the conversation, speaker details, and audio generation options:
- model: Specifies the PlayDialog API model to be used. Here, PlayDialog supports multi-turn conversation generation.
- text: Contains the input text for which the speech audio is to be generated.
- voice: URL path to the voice manifest for the first speaker.
- output_format: Format for the generated audio file, typically wav or mp3.
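As a rough illustration, the four parameters above could be assembled into a request body like this. The function name build_payload, the sample text, and the voice value are placeholders chosen for this sketch; substitute the manifest URL of a voice you actually have access to.

```python
import json


def build_payload(text, voice, output_format="wav"):
    """Assemble the request body from the four parameters listed above."""
    return {
        "model": "PlayDialog",
        "text": text,
        "voice": voice,
        "output_format": output_format,
    }


# The voice value below is a placeholder, not a real manifest URL.
payload = build_payload(
    text="Hello from PlayDialog.",
    voice="<URL of your voice manifest>",
)
print(json.dumps(payload, indent=2))
```

Keeping the payload in one small function makes it easy to change the output format or voice without touching the code that sends the request.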
Call the API
The following Python script demonstrates how to request the PlayDialog API and save the generated audio file.
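What follows is a minimal sketch rather than the official sample. It rests on several assumptions that you should check against the PlayDialog API reference: API_URL is a placeholder (this guide does not state the endpoint), the voice value is a placeholder, and the response body is assumed to be the raw audio bytes. It uses only the Python standard library (urllib) so that it runs without extra packages.

```python
import json
import os
import urllib.error
import urllib.request

# ASSUMPTION: placeholder only. Replace with the endpoint URL given in the
# PlayDialog API reference; this guide does not state it.
API_URL = "https://example.invalid/replace-with-playdialog-endpoint"


def build_request(api_key, user_id, text, voice, output_format="wav", url=API_URL):
    """Build the POST request; nothing is sent until urlopen() is called."""
    headers = {
        # HTTP header names are case-insensitive.
        "AUTHORIZATION": "Bearer " + api_key,
        "X-USER-ID": user_id,
        "Content-Type": "application/json",
    }
    payload = {
        "model": "PlayDialog",
        "text": text,
        "voice": voice,
        "output_format": output_format,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def main():
    api_key = os.environ.get("PLAYDIALOG_API_KEY")
    user_id = os.environ.get("PLAYDIALOG_USER_ID")
    if not api_key or not user_id:
        print("Set PLAYDIALOG_API_KEY and PLAYDIALOG_USER_ID first.")
        return
    request = build_request(
        api_key,
        user_id,
        text="Hello! This audio was generated with PlayDialog.",
        voice="<URL of your voice manifest>",  # placeholder, not a real voice
    )
    try:
        # ASSUMPTION: a successful response body is the raw audio.
        with urllib.request.urlopen(request) as response:
            audio = response.read()
    except urllib.error.HTTPError as err:
        # Non-2xx status: report the code and whatever detail the API returned.
        detail = err.read().decode("utf-8", errors="replace")
        print(f"Request failed with status {err.code}: {detail}")
        return
    except urllib.error.URLError as err:
        print(f"Could not reach the API: {err.reason}")
        return
    with open("dialogue.wav", "wb") as audio_file:
        audio_file.write(audio)
    print("Saved dialogue.wav")


if __name__ == "__main__":
    main()
```

Separating build_request from main keeps the headers and payload inspectable without sending anything over the network.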
If you save the code as playdialog_tts.py, run it with python3 playdialog_tts.py from the directory where the playdialog_tts.py file is stored. This saves dialogue.wav in the same working directory.
Code Explanation
This script uses the PlayDialog API to generate an audio file from a text input. The script requires an AUTHORIZATION token and an X-USER-ID value for authentication; supply your own credentials through the environment variables PLAYDIALOG_API_KEY and PLAYDIALOG_USER_ID.
The main script sends a single line of text to the PlayDialog API, where it is converted into speech using the specified voice configuration. The voice parameter contains the path to the desired voice profile, allowing customization of the spoken output. The audio is then generated in the specified wav format.
On a successful API call, the resulting audio is saved as dialogue.wav in the current directory, ready for playback. If the request fails, the script outputs an error message with the HTTP status code and any additional details from the API response.
To run the script:
- Replace placeholders in the headers with your API key and user ID.
- Update the text with your input text (the text that needs to be converted into speech).
- Update the speaker details and their respective voice.
- Run the script. If successful, an audio file, dialogue.wav, will be saved in the current directory, capturing the dialogue as configured.
This setup can also be adapted to more complex dialogues or multiple speakers.
Troubleshooting
- Authentication Issues: Verify your API key and user ID. Ensure the AUTHORIZATION header includes “Bearer ” followed by your token.
- API Endpoint Errors: Confirm you’re using the correct PlayDialog API endpoint URL and that the model name is PlayDialog.
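The two checks above can be partly automated. The sketch below is illustrative only: bearer_header and triage are hypothetical helper names, and the mapping from status codes to causes follows general HTTP conventions (401 and 403 for authentication, 404 for an unknown URL), which the PlayDialog API is assumed, not confirmed, to follow.

```python
def bearer_header(token):
    """Return an AUTHORIZATION value with exactly one 'Bearer ' prefix."""
    token = token.strip()
    if token.lower().startswith("bearer "):
        # Drop an existing prefix so it is not duplicated.
        token = token[len("bearer "):].strip()
    return "Bearer " + token


def triage(status_code):
    """Map an HTTP status code to the troubleshooting item to check first."""
    if status_code in (401, 403):
        return "authentication"  # check API key, user ID, Bearer prefix
    if status_code == 404:
        return "endpoint"  # check the endpoint URL
    return "other"  # read the response body for details
```

Passing the stored key through bearer_header before building the headers avoids both a missing and a doubled prefix, whichever way the key was saved in the environment.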