WebSocket API
Enhance your app with our audio-in, audio-out API, enabling seamless, natural conversations with your PlayAI agent. Transform your user experience with the power of voice.
Before using our WebSocket API, you will need:
- A Play.ai account
- An API key to authenticate with the Play.ai API
To fully leverage our WebSocket API, the steps are:
- Send a POST request to https://api.play.ai/api/v1/auth with Authorization: Bearer <your_api_key> and X-User-Id: <your_user_id> headers
- Receive a JSON response with a webSocketUrl field containing the WebSocket URL
- Connect to the provided webSocketUrl URL
- Send TTS commands with the same options as our TTS streaming API, but in snake_case, e.g., {"text":"Hello World","voice":"...","output_format":"mp3"}
- Receive audio output as binary messages
Quickstart - Runnable Demo
If you want to get started quickly, you can clone the play-showcase repository and run the tts-websocket app locally.
Establishing a WebSocket Connection
To establish a WebSocket connection, send a POST request to the https://api.play.ai/api/v1/tts/auth endpoint with the following headers:
Authorization: Bearer <your_api_key>
X-User-Id: <your_user_id>
You can obtain your api_key and user_id from your PlayAI account.
The response will contain a JSON object with a webSocketUrl field that you can use to connect to the WebSocket server.
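The authentication request described above can be sketched as follows. This is a minimal sketch, not an official client: the names getWebSocketUrl and fetchImpl are illustrative assumptions, and the endpoint and header names are the ones quoted in this section. It is meant to run server-side so that the API key is never exposed to the browser.

```javascript
// Minimal sketch: request a WebSocket URL from the auth endpoint.
// Assumptions: getWebSocketUrl and fetchImpl are illustrative names;
// the endpoint and headers are the ones quoted in this section.
const AUTH_URL = "https://api.play.ai/api/v1/tts/auth";

async function getWebSocketUrl(apiKey, userId, fetchImpl = fetch) {
  const response = await fetchImpl(AUTH_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "X-User-Id": userId,
    },
  });
  if (!response.ok) {
    throw new Error(`Auth request failed with status ${response.status}`);
  }
  const body = await response.json();
  if (typeof body.webSocketUrl !== "string") {
    throw new Error("Auth response did not contain a webSocketUrl field");
  }
  return body.webSocketUrl;
}
```

The fetchImpl parameter exists only so the function can be exercised without network access; in normal use you would call getWebSocketUrl(apiKey, userId) and rely on the global fetch.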
After this point, you can pass the webSocketUrl to your WebSocket client to establish a connection.
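Opening the connection might look like the sketch below. The names connectTts and WebSocketImpl are illustrative assumptions; by default the global WebSocket constructor is used, as available in browsers.

```javascript
// Minimal sketch: open a WebSocket using the URL returned by the auth step.
// Assumptions: connectTts and WebSocketImpl are illustrative names; the
// global WebSocket constructor is used unless another one is passed in.
function connectTts(webSocketUrl, WebSocketImpl = WebSocket) {
  return new Promise((resolve, reject) => {
    const ws = new WebSocketImpl(webSocketUrl);
    // Resolve with the socket once the connection is open.
    ws.addEventListener("open", () => resolve(ws), { once: true });
    // Reject if the connection fails before it opens.
    ws.addEventListener(
      "error",
      () => reject(new Error("WebSocket connection failed")),
      { once: true }
    );
  });
}
```

Wrapping the connection in a promise lets later steps wait for the open event before sending any command.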
Sending TTS Commands
Once connected to the WebSocket, you can send TTS commands as JSON messages.
The structure of these commands is similar to our TTS streaming API, but in snake_case.
Here’s an example:
{"text":"Hello World","voice":"...","output_format":"mp3"}
Examples of the available options for the TTS command are:
- request_id (optional): A unique identifier for the request, useful for correlating responses (see more details below).
- text (required): The text to be converted to speech.
- voice (required): The voice ID or URL to use for synthesis.
- output_format (optional): The desired audio format (default is “mp3”).
- temperature (optional): Controls the randomness of the generated speech (0.0 to 1.0).
- speed (optional): The speed of the generated speech (0.5 to 2.0).
For the complete list of parameters, refer to the TTS API documentation.
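To illustrate the options listed above, here is a minimal sketch of a helper that validates a command before it is sent. The name buildTtsCommand is an illustrative assumption, not part of the API; the required fields and numeric ranges are the ones stated in this section.

```javascript
// Minimal sketch: build and validate a TTS command before sending it.
// Assumption: buildTtsCommand is an illustrative helper, not part of the API.
// Required fields and ranges follow the option list in this section.
function buildTtsCommand({ text, voice, ...options }) {
  if (typeof text !== "string" || text.length === 0) {
    throw new Error("text is required");
  }
  if (typeof voice !== "string" || voice.length === 0) {
    throw new Error("voice is required");
  }
  const { temperature, speed } = options;
  if (temperature !== undefined && (temperature < 0 || temperature > 1)) {
    throw new RangeError("temperature must be between 0.0 and 1.0");
  }
  if (speed !== undefined && (speed < 0.5 || speed > 2)) {
    throw new RangeError("speed must be between 0.5 and 2.0");
  }
  // Commands are sent as JSON text messages with snake_case keys.
  return JSON.stringify({ text, voice, ...options });
}
```

With an open socket ws, a command could then be sent with ws.send(buildTtsCommand({ request_id: "req-1", text: "Hello World", voice: "...", output_format: "mp3" })), where "req-1" is an arbitrary example identifier.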
Receiving Audio Output
After sending a TTS command, you’ll receive two kinds of messages:
- The audio output as a series of binary messages.
- One final text message with the format {"request_id":<request_id>} to indicate the end of the audio stream.
  - In this response message, request_id is the unique identifier you provided in the TTS command, or null if you didn’t provide one.
To handle these messages and play the audio, collect the binary audio chunks as they arrive and combine them into a single audio blob when the end-of-request message ({"request_id":<request_id>}) is received. You can then create an audio URL from the blob and play the audio using the Web Audio API.
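A minimal sketch of this approach is shown below. The names createAudioCollector and onComplete are illustrative assumptions, and the "audio/mpeg" MIME type presumes the default "mp3" output format. The playback lines in the trailing comment use the browser's Audio element and URL.createObjectURL as one possible way to play the result.

```javascript
// Minimal sketch: collect binary audio chunks until the end-of-request message.
// Assumptions: createAudioCollector and onComplete are illustrative names, and
// the "audio/mpeg" MIME type presumes the default "mp3" output format.
function createAudioCollector(onComplete, mimeType = "audio/mpeg") {
  let chunks = [];
  return function handleMessage(data) {
    if (typeof data !== "string") {
      // Binary message (Blob or ArrayBuffer): one chunk of audio.
      chunks.push(data);
      return;
    }
    // Text message: {"request_id": ...} marks the end of the audio stream.
    const message = JSON.parse(data);
    if (message !== null && typeof message === "object" && "request_id" in message) {
      const blob = new Blob(chunks, { type: mimeType });
      chunks = []; // reset so the next request starts clean
      onComplete(blob, message.request_id);
    }
  };
}

// Possible browser usage (not executed here):
//   const handle = createAudioCollector((blob) => {
//     new Audio(URL.createObjectURL(blob)).play();
//   });
//   ws.addEventListener("message", (event) => handle(event.data));
```

Because the collector resets its buffer after each end-of-request message, the same handler can serve several consecutive commands on one connection.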
Error Handling
It’s important to implement error handling in your WebSocket client. At a minimum, handle the error and close events so that failures and connection closures are detected.
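One way to handle errors and connection closures is sketched below. The names describeClose and attachErrorHandlers are illustrative assumptions; the close codes 1000 (normal closure) and 1006 (abnormal closure) are defined by the WebSocket protocol itself rather than by this API.

```javascript
// Minimal sketch: log errors and classify connection closures.
// Assumptions: describeClose and attachErrorHandlers are illustrative names.
// Close codes 1000 and 1006 are defined by the WebSocket protocol.
function describeClose(event) {
  if (event.code === 1000) return "closed normally";
  if (event.code === 1006) return "closed abnormally (no close frame received)";
  const reason = event.reason ? `: ${event.reason}` : "";
  return `closed with code ${event.code}${reason}`;
}

function attachErrorHandlers(ws, log = console.error) {
  // The error event carries no details; the close event that follows does.
  ws.addEventListener("error", () => log("WebSocket error"));
  ws.addEventListener("close", (event) => log(`WebSocket ${describeClose(event)}`));
}
```

An abnormal closure is usually the cue to reconnect, while a normal closure typically needs no further action.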
Connection Timeout
WebSocket connections may be closed by intermediary proxies if they remain idle for longer than 10 seconds. To keep the connection alive, you can send new TTS commands, which will generate audio in the same way as the first request.
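Since an idle connection may be dropped, a client can track its own idle time and reconnect before sending when the connection is probably stale. The sketch below is one hedged way to do that: createIdleTracker is an illustrative helper, the 10-second limit is the idle period mentioned above, and the now parameter is injectable only to make the logic easy to test.

```javascript
// Minimal sketch: track idle time so a stale connection can be replaced.
// Assumptions: createIdleTracker is an illustrative helper; the 10-second
// limit is the idle period mentioned above; `now` is injectable for tests.
const IDLE_LIMIT_MS = 10000;

function createIdleTracker(now = Date.now, limitMs = IDLE_LIMIT_MS) {
  let lastActivity = now();
  return {
    // Call whenever a message is sent or received.
    touch() {
      lastActivity = now();
    },
    // True once the connection has been idle longer than the limit.
    isStale() {
      return now() - lastActivity > limitMs;
    },
  };
}
```

Before sending a command, a client could check isStale() and open a fresh connection if it returns true, instead of writing to a socket that a proxy may already have closed.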
Best Practices
- Authentication: Always keep your API key secure. While the WebSocket URL can be shared with client-side code, the API Key and User ID should be kept private.
- Error Handling: Implement robust error handling and reconnection logic in your WebSocket client.
- Resource Management: Close the WebSocket connection when it’s no longer needed to free up server resources.
- Rate Limiting: Be aware of rate limits on the API and implement appropriate throttling in your application.
- Testing: Thoroughly test your implementation with various inputs and network conditions to ensure reliability.
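The reconnection logic recommended above is often implemented with exponential backoff, which also avoids hammering the API after a failure. The following is a minimal sketch under that assumption: backoffDelay, reconnectWithBackoff, and all default values are illustrative and are not limits documented by the API.

```javascript
// Minimal sketch: reconnect with exponential backoff between attempts.
// Assumptions: backoffDelay, reconnectWithBackoff, and their defaults are
// illustrative; they are not limits documented by the API.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  // attempt 0 -> baseMs, attempt 1 -> 2 * baseMs, ... capped at maxMs.
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function reconnectWithBackoff(
  connect,
  maxAttempts = 5,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt));
      }
    }
  }
  throw lastError;
}
```

Here connect is any function that returns a promise for an open connection, for example one that repeats the authentication request and then opens the WebSocket.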
By following these guidelines and using the provided examples, you can effectively integrate the PlayAI TTS WebSocket API into your application, enabling real-time text-to-speech functionality with low latency and high performance.