To use our WebSocket, you will need beforehand:
- A PlayAI account
- An API key to authenticate with the PlayAI API
- An agent ID of a PlayAI Agent (created via our Web UI or our Create Agent endpoint)
In short, the flow is:

- Connect to our `wss://api.play.ai/v1/talk/<your_agent_id>` URL
- Send a `{"type":"setup","apiKey":"yourKey"}` message as the first message
- Send audio input as a base64 encoded string in `{"type":"audioIn","data":"base64Data"}` messages
- Receive audio output in `{"type":"audioStream","data":"base64Data"}` messages
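The steps above can be sketched in a few lines of browser-style JavaScript. This is a minimal sketch, not a full client: `yourAgentId` and `yourApiKey` are placeholders, and playback of the received audio is left out.

```javascript
// Minimal sketch of the connect -> setup -> audioIn -> audioStream flow.
// AGENT_ID and API_KEY are placeholders you must replace with your own.
const AGENT_ID = "yourAgentId";
const API_KEY = "yourApiKey";

function openTalkSocket() {
  const ws = new WebSocket(`wss://api.play.ai/v1/talk/${AGENT_ID}`);

  ws.onopen = () => {
    // The setup message must be the very first message on the connection.
    ws.send(JSON.stringify({ type: "setup", apiKey: API_KEY }));
  };

  ws.onmessage = (event) => {
    const message = JSON.parse(event.data);
    if (message.type === "audioStream") {
      // message.data is base64-encoded audio in the configured outputFormat.
    }
  };

  return ws;
}

// Audio input is sent as a base64 string inside an audioIn message:
function audioInMessage(base64Data) {
  return JSON.stringify({ type: "audioIn", data: base64Data });
}
```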
## Establishing a Connection

To initiate a conversation, establish a WebSocket connection to our `talk` URL (`wss://api.play.ai/v1/talk/<your_agent_id>`), including the `agentId` as a path parameter. Assuming `Agent-XP5tVPa8GDWym6j` is the ID of an agent you have created via our Web UI or through our Create Agent endpoint, the WebSocket URL should look like `wss://api.play.ai/v1/talk/Agent-XP5tVPa8GDWym6j`.
## Initial Setup Message

Before you can start sending and receiving audio data, you must first send a `setup` message to authenticate and configure your session.

*WebSocket basic connection, setup and message flow*

At a minimum, the setup message must contain your `apiKey`. This assumes you are comfortable with the default values for audio input and audio output formats. In this scenario, your first setup message could be as simple as `{"type":"setup","apiKey":"yourKey"}`.
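As a sketch, this is how you might construct and send that minimal first message once your connection opens (`"yourApiKey"` is a placeholder, and `myWs` is assumed to be the WebSocket from the previous section):

```javascript
// Minimal setup message, relying on the defaults for all audio options.
// "yourApiKey" is a placeholder for your real API key.
const setupMessage = JSON.stringify({ type: "setup", apiKey: "yourApiKey" });

// Send it as the first message as soon as the socket opens:
// myWs.onopen = () => myWs.send(setupMessage);
```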
> Get your API Key at our Developers page.
## Setup Options

The setup message configures important details of the session, including the format/encoding of the audio that you intend to send us and the format that you expect to receive.

| Property | Accepted values | Description | Default value |
|---|---|---|---|
| `type` (required) | `"setup"` | Specifies that the message is a setup command. | - |
| `apiKey` (required) | string | Your API Key. | - |
| `outputFormat` (optional) | string | The format of audio you want our agent to output in the `audioStream` messages. | `"mp3"` |
| `outputSampleRate` (optional) | number | The sample rate of the audio you want our agent to output in the `audioStream` messages. | `44100` |
| `inputEncoding` (optional) | For formats with media containers: `"media-container"`. For headerless formats: the encoding name, e.g. `"mulaw"` or `"flac"`. | The encoding of the audio you intend to send in the `audioIn` messages. If you are sending audio formats that use media containers (that is, audio that contains headers, such as mp4, m4a, mp3, ogg, flac, wav, mkv, webm, aiff), just use `"media-container"` as the value for `inputEncoding` (or don't pass any value at all, since `"media-container"` is the default). This instructs our servers to process the audio based on the data headers. If, on the other hand, you will send us audio in headerless formats, you have to specify the encoding you will be sending, e.g. by setting `inputEncoding` to `"mulaw"`, `"flac"`, etc. | `"media-container"` |
| `inputSampleRate` (optional) | number | The sample rate of the audio you intend to send. Required if you specify an `inputEncoding` other than `"media-container"`; optional otherwise. | - |
| `customGreeting` (optional) | string | Your agent will say this message to start every conversation. This overrides the agent's greeting. | - |
| `prompt` (optional) | string | Give instructions to your AI about how it should behave and interact with others in conversation. This is appended to the agent's prompt. | `""` |
| `continueConversation` (optional) | string | If you want to continue a conversation from a previous session, pass the `conversationId` here. The agent will continue the conversation from where it left off. | - |
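As an illustration of these options, here is a sketch of a setup message for sending headerless mulaw audio and receiving mp3 back. The specific sample rates and greeting text are illustrative assumptions, not requirements:

```javascript
// Setup sketch: headerless mulaw input, mp3 output at the default 44100 Hz.
// The 16000 input sample rate and the greeting are illustrative values only.
const setupWithOptions = JSON.stringify({
  type: "setup",
  apiKey: "yourApiKey",
  inputEncoding: "mulaw",   // headerless format, so the encoding must be named
  inputSampleRate: 16000,   // required because inputEncoding is not "media-container"
  outputFormat: "mp3",
  outputSampleRate: 44100,
  customGreeting: "Hi! How can I help you today?", // hypothetical greeting
});
```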
## `audioIn`: Sending Audio Input

After the setup, you can send audio input in the form of an `audioIn` message.
The audio must be sent as a base64 encoded string in the `data` field, in a message of the form `{"type":"audioIn","data":"base64Data"}`.

The audio you send must match the `inputEncoding` and `inputSampleRate` you configured in the setup options.

### Sample Code for Sending Audio

Assuming `myWs` is a WebSocket connected to our `/v1/talk` endpoint, the sample code below would send audio directly from the browser:
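The following is a sketch of such browser code using `MediaRecorder`, which produces container audio, so the default `"media-container"` input encoding applies; the 250 ms chunk interval is an arbitrary choice:

```javascript
// Sketch: capture microphone audio and stream it as audioIn messages.
// Assumes `myWs` is an open WebSocket that has already completed setup.

// Encode raw bytes as a base64 string.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

async function streamMicrophone(myWs) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);

  recorder.ondataavailable = async (event) => {
    const bytes = new Uint8Array(await event.data.arrayBuffer());
    myWs.send(JSON.stringify({ type: "audioIn", data: bytesToBase64(bytes) }));
  };

  recorder.start(250); // emit a chunk roughly every 250 ms
}
```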
## `audioStream`: Receiving Audio Output

Audio output from the server will be received in an `audioStream` message, in the form `{"type":"audioStream","data":"base64Data"}`.

The audio you receive will match the `outputFormat` and `outputSampleRate` you configured in the setup options.

### Sample Code for Receiving Audio
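A sketch of a receiving handler follows; the `player` object and its `enqueue` method are hypothetical stand-ins for your actual playback pipeline:

```javascript
// Sketch of an onmessage handler that queues audioStream chunks for playback.
// `player` (with an enqueue method) is a hypothetical playback abstraction.
function makeMessageHandler(player) {
  return (event) => {
    const message = JSON.parse(event.data);
    if (message.type === "audioStream") {
      player.enqueue(message.data); // base64 audio in the configured outputFormat
    }
    return message.type;
  };
}

// Usage: myWs.onmessage = makeMessageHandler(myPlayer);
```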
## Voice Activity Detection: `voiceActivityStart` and `voiceActivityEnd`

During the conversation, you will receive `voiceActivityStart` and `voiceActivityEnd` messages indicating the detection of speech activity in the audio input. These messages help in understanding when the user starts and stops speaking.

When our service detects that the user has started to speak, it will emit a `voiceActivityStart` event. A `voiceActivityStart` generally indicates that the user wanted to interrupt the agent.

Similarly, when our service detects that the user has stopped speaking, it emits a `voiceActivityEnd` event.
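A sketch of reacting to these events for barge-in follows; it assumes the messages carry at least a `type` field, and the `player` callbacks are hypothetical stand-ins for your playback pipeline:

```javascript
// Sketch: pause local playback while the user is speaking (barge-in).
// `player.stop()` / `player.resume()` are hypothetical playback callbacks.
function handleVoiceActivity(message, player) {
  if (message.type === "voiceActivityStart") {
    player.stop(); // the user likely wants to interrupt the agent
    return "stopped";
  }
  if (message.type === "voiceActivityEnd") {
    player.resume();
    return "resumed";
  }
  return "ignored";
}
```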
## `newAudioStream`: Handling New Audio Streams

A `newAudioStream` message indicates the start of the audio of a new response.
It is recommended to clear your player buffer and start playing the new stream content upon receiving this message.
This message contains no additional fields.
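A minimal sketch, assuming you keep pending base64 audio chunks in a plain array:

```javascript
// Sketch: on newAudioStream, drop stale chunks from the previous response.
// `queue` is a hypothetical array of pending base64 audio chunks.
function onNewAudioStream(queue) {
  queue.length = 0; // clear the buffer so the new response plays from its start
}
```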
## Error Handling

Errors from the server are sent as messages of type `error`, carrying a numeric `code` and a descriptive `message`:
| Error Code | Error Message |
|---|---|
| 1001 | Invalid authorization token. |
| 1002 | Invalid agent id. |
| 1003 | Invalid authorization credentials. |
| 1005 | Not enough credits. |
| 4400 | Invalid parameters. Indicates the message sent to the server failed to match the expected format. Double check the logic and try again. |
| 4401 | Unauthorized. Invalid authorization token or invalid authorization credentials for specified agent. |
| 4429 | You have reached the maximum number of concurrent connections allowed by your current plan. Please consider upgrading your plan or reducing the number of active connections to continue. |
| 4500 | Generic error code for internal errors, such as failures to generate responses. Generally, the user is not at fault when these happen. An appropriate reaction is to wait a few moments and try again. If the problem persists, contacting support is advised. |
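A sketch of inspecting these errors follows; the payload field names (`type`, `code`, `message`) follow the description above, and the retry policy mirrors the advice given for code 4500:

```javascript
// Sketch: decide whether a server error is worth retrying.
// Assumes the payload shape { type: "error", code: <number>, message: <string> }.
function isRetryableError(message) {
  // 4500 is a transient internal error: wait a few moments and try again.
  return message.type === "error" && message.code === 4500;
}
```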
This documentation covers the essential aspects of interacting with the PlayAI WebSocket API for agent conversations. Ensure that your implementation handles the specified message types and follows the outlined protocols for a seamless integration.