# Get Agent Stats Source: https://docs.play.ai/api-reference/agents/endpoints/v1/agent-stats/get GET /api/v1/agent-stats/{agentId} Retrieve the usage statistics of an agent. # Get Agent Source: https://docs.play.ai/api-reference/agents/endpoints/v1/agents/get GET /api/v1/agents/{agentId} Retrieve all information about an agent. # Update Agent Source: https://docs.play.ai/api-reference/agents/endpoints/v1/agents/patch PATCH /api/v1/agents/{agentId} Updates the properties of the agent with the specified ID. # Create Agent Source: https://docs.play.ai/api-reference/agents/endpoints/v1/agents/post POST /api/v1/agents Create a new PlayAI Agent. Required parameters include the agent's name and the agent's prompt. After you create your agent, you can proceed to start a conversation using our [Websocket API](/api-reference/agents/websocket), or you can try it out through our web interface at `https://play.ai/agent/`. To update the agents see the [Update Agent](/api-reference/agents/endpoints/v1/agents/patch) endpoint. # Get Agent Conversations Source: https://docs.play.ai/api-reference/agents/endpoints/v1/conversations/get GET /api/v1/agents/{agentId}/conversations Retrieve all information about an agent's conversations. ### Response Headers for Pagination | Header Name | Type | Description | | -------------------- | ------- | --------------------------------------------- | | `X-Page-Size` | integer | The number of items per page. | | `X-Start-After` | string | The ID of the last item on the previous page. | | `X-Next-Start-After` | string | The ID of the last item on the current page. | | `X-Total-Count` | integer | The total number of items. | These headers are included in the response to help manage pagination when retrieving conversations for a specific agent. # Get Agent Conversation Source: https://docs.play.ai/api-reference/agents/endpoints/v1/conversations/get-one GET /api/v1/agents/{agentId}/conversations/{conversationId} Retrieve all information about an agent conversation. # Get Conversation Transcript Source: https://docs.play.ai/api-reference/agents/endpoints/v1/conversations/get-transcript GET /api/v1/agents/{agentId}/conversations/{conversationId}/transcript Retrieve the transcript of a specific agent conversation. ### Response Headers for Pagination | Header Name | Type | Description | | -------------------- | ------- | --------------------------------------------- | | `X-Page-Size` | integer | The number of items per page. | | `X-Start-After` | string | The ID of the last item on the previous page. | | `X-Next-Start-After` | string | The ID of the last item on the current page. | | `X-Total-Count` | integer | The total number of items. | These headers are included in the response to help manage pagination when retrieving conversation transcript for a specific agent conversation. # Delete External Function Source: https://docs.play.ai/api-reference/agents/endpoints/v1/external-functions/delete DELETE /api/v1/external-functions/{functionId} Deletes the external function with the specified ID. # Get All External Functions Source: https://docs.play.ai/api-reference/agents/endpoints/v1/external-functions/get GET /api/v1/external-functions Retrieve all information about all external functions that you have created. # Get External Function Source: https://docs.play.ai/api-reference/agents/endpoints/v1/external-functions/get-one GET /api/v1/external-functions/{functionId} Retrieve all information about the external function with the specified ID. 
# Update External Function Source: https://docs.play.ai/api-reference/agents/endpoints/v1/external-functions/patch PATCH /api/v1/external-functions/{functionId} Updates the properties of the external function with the specified ID. # Create External Function Source: https://docs.play.ai/api-reference/agents/endpoints/v1/external-functions/post POST /api/v1/external-functions Use this endpoint to create new external functions. Required parameters include the external function's name and the external function's description. After you create an external function, you can attach it to an agent. To update an external function, see the [Update External Function](/api-reference/agents/endpoints/v1/external-functions/patch) endpoint. # Introduction Source: https://docs.play.ai/api-reference/agents/introduction Create and manage PlayAI agents via the API PlayAI provides a simple, easy-to-use HTTP API to create and manage AI Agents. After you create your agent, you can proceed to start a conversation using our [Websocket API](/api-reference/agents/websocket), or you can try it out through our web interface at `https://play.ai/agent/`. ## Authentication All API endpoints are authenticated using a User ID and API Key. After you have created an account and logged in, you can get your API Key from the [For Developers](https://play.ai/api/keys) page. # Websocket API Source: https://docs.play.ai/api-reference/agents/websocket Enhance your app with our audio-in, audio-out API, enabling seamless, natural conversations with your PlayAI agent. Transform your user experience with the power of voice. Before using our WebSocket API, you will need: * A [PlayAI account](https://play.ai/pricing) * An [API key to authenticate](https://play.ai/api/keys) with the PlayAI API * An agent ID of a PlayAI Agent (created via our [Web UI](https://play.ai/my-agents) or our [Create Agent endpoint](/api-reference/agents/endpoints/v1/agents/post)) To fully leverage our WebSocket API, the steps are: * Connect to our `wss://api.play.ai/v1/talk/{agentId}` URL * Send a `{"type":"setup","apiKey":"yourKey"}` message as the first message * Send audio input as a base64 encoded string in `{"type":"audioIn","data":"base64Data"}` messages * Receive audio output in `{"type":"audioStream","data":"base64Data"}` messages # Establishing a Connection To initiate a conversation, establish a WebSocket connection to our `talk` URL, including the `agentId` as a path parameter: ```text wss://api.play.ai/v1/talk/{agentId} ``` For example, assuming `Agent-XP5tVPa8GDWym6j` is the ID of an agent you have created via our [Web UI](https://play.ai/my-agents) or through our [Create Agent endpoint](/api-reference/agents/endpoints/v1/agents/post), the WebSocket URL should look like: ```js const myWs = new WebSocket('wss://api.play.ai/v1/talk/Agent-XP5tVPa8GDWym6j'); ``` # Initial Setup Message Before you can start sending and receiving audio data, you must first send a `setup` message to authenticate and configure your session. ```mermaid graph TB subgraph "conversation" C --> D[Send 'audioIn' messages containing your user's audio data] D --> C end B --> C[Receive 'audioStream' messages containing Agent's audio data] subgraph setup A[Establish WebSocket Connection] --> B[Send 'setup' message] end ``` The only required field in the setup message is the `apiKey`. This assumes you are comfortable with the default values for audio input and audio output formats. 
In this scenario, your first setup message could be as simple as: ```json { "type": "setup", "apiKey": "yourKey" } ``` Get your API Key at our [Developers](https://play.ai/api/keys) page Code example: ```js const myWs = new WebSocket('wss://api.play.ai/v1/talk/Agent-XP5tVPa8GDWym6j'); myWs.onopen = () => { console.log('connected!'); myWs.send(JSON.stringify({ type: 'setup', apiKey: 'yourApiKey' })); }; ``` ## Setup Options The setup message configures important details of the session, including the format/encoding of the audio that you intend to send us and the format that you expect to receive. ```json Example setup messages with various options: // mulaw 16KHz as input { "type": "setup", "apiKey": "...", "inputEncoding": "mulaw", "inputSampleRate": 16000 } // 24Khz mp3 output { "type": "setup", "apiKey": "...", "outputFormat": "mp3", "outputSampleRate": 24000 } // mulaw 8KHz in and out { "type": "setup", "apiKey": "...", "inputEncoding": "mulaw", "inputSampleRate": 8000, "outputFormat": "mulaw", "outputSampleRate": 8000 } ``` The following fields are available for configuration:
| Property | Accepted values | Description | Default value |
| -------- | --------------- | ----------- | ------------- |
| `type` (required) | `"setup"` | Specifies that the message is a setup command. | - |
| `apiKey` (required) | `string` | [Your API Key](https://play.ai/api/keys). | - |
| `outputFormat` (optional) | `"mp3"`, `"raw"`, `"wav"`, `"ogg"`, `"flac"`, `"mulaw"` | The format of audio you want our agent to output in the `audioStream` messages. `mp3` = 128kbps MP3, `raw` = PCM\_FP32, `wav` = 16-bit (uint16) PCM, `ogg` = 80kbps OGG Vorbis, `flac` = 16-bit (int16) FLAC, `mulaw` = 8-bit (uint8) PCM, headerless. | `"mp3"` |
| `outputSampleRate` (optional) | `number` | The sample rate of the audio you want our agent to output in the `audioStream` messages. | `44100` |
| `inputEncoding` (optional) | For formats with media containers: `"media-container"`. For headerless formats: `"mulaw"`, `"linear16"`, `"flac"`, `"amr-nb"`, `"amr-wb"`, `"opus"`, `"speex"`, `"g729"` | The encoding of the audio you intend to send in the `audioIn` messages. If you are sending audio in a format that uses a media container (that is, audio that contains headers, such as `mp4`, `m4a`, `mp3`, `ogg`, `flac`, `wav`, `mkv`, `webm`, `aiff`), just use `"media-container"` as the value (or omit the field, since `"media-container"` is the default); our servers will process the audio based on its headers. If, on the other hand, you will send audio in a headerless format, you have to specify which one, e.g. by setting `inputEncoding` to `"mulaw"`, `"flac"`, etc. | `"media-container"` |
| `inputSampleRate` (optional) | `number` | The sample rate of the audio you intend to send. Required if you specify an `inputEncoding` other than `"media-container"`; optional otherwise. | - |
| `customGreeting` (optional) | `string` | Your agent will say this message to start every conversation. This overrides the agent's greeting. | - |
| `prompt` (optional) | `string` | Instructions for your AI about how it should behave and interact with others in conversation. This is appended to the agent's prompt. | `""` |
| `continueConversation` (optional) | `string` | To continue a conversation from a previous session, pass the `conversationId` here. The agent will pick up the conversation where it left off. | - |


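Putting several of these options together, a setup message for 8 kHz mulaw telephony audio in and out, with a custom greeting and extra prompt instructions, could look like the sketch below. The greeting and prompt strings are illustrative, and the agent ID and API key are the same placeholders used above:

```js
const myWs = new WebSocket('wss://api.play.ai/v1/talk/Agent-XP5tVPa8GDWym6j');

myWs.onopen = () => {
  // All fields besides "type" and "apiKey" are optional; defaults are listed in the table above.
  myWs.send(
    JSON.stringify({
      type: 'setup',
      apiKey: 'yourApiKey',
      inputEncoding: 'mulaw',   // headerless mulaw input...
      inputSampleRate: 8000,    // ...so the sample rate must be specified
      outputFormat: 'mulaw',
      outputSampleRate: 8000,
      customGreeting: 'Thanks for calling! How can I help you today?', // example value
      prompt: 'Keep answers short and suitable for a phone call.',     // example value
    })
  );
};
```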
# `audioIn`: Sending Audio Input After the setup, you can send audio input in the form of an `audioIn` message. The audio must be sent as a base64 encoded string in the `data` field. The message format is: ```json { "type": "audioIn", "data": "" } ``` The audio you send must match the `inputEncoding` and `inputSampleRate` you configured in the setup options. ## Sample Code for Sending Audio Assuming `myWs` is a WebSocket connected to our `/v1/talk` endpoint, the sample code below would send audio directly from the browser: ```javascript const stream = await navigator.mediaDevices.getUserMedia({ audio: { channelCount: 1, echoCancellation: true, autoGainControl: true, noiseSuppression: true, }, }); const mediaRecorder = new MediaRecorder(stream); mediaRecorder.ondataavailable = async (event) => { const base64Data = await blobToBase64(event.data); // Relevant: myWs.send(JSON.stringify({ type: 'audioIn', data: base64Data })); }; async function blobToBase64(blob) { const reader = new FileReader(); reader.readAsDataURL(blob); return new Promise((resolve) => { reader.onloadend = () => resolve(reader.result.split(',')[1]); }); } ``` # `audioStream`: Receiving Audio Output Audio output from the server will be received in an `audioStream` message. The message format is: ```json { "type": "audioStream", "data": "" } ``` The audio you receive will match the `outputFormat` and `outputSampleRate` you configured in the setup options. ## Sample Code for Receiving Audio ```javascript myWs.on('message', (message) => { const event = JSON.parse(message); if (event.type === 'audioStream') { // deserialize event.data from a base64 string to binary // enqueue/play the binary data at your player return; } }); ``` # Voice Activity Detection: `voiceActivityStart` and `voiceActivityEnd` During the conversation, you will receive `voiceActivityStart` and `voiceActivityEnd` messages indicating the detection of speech activity in the audio input. These messages help in understanding when the user starts and stops speaking. When our service detects that the user started to speak, it will emit a `voiceActivityStart` event. Such a message will have the format: ```json { "type": "voiceActivityStart" } ``` It is up to you to decide how to react to this event. We highly recommend you stop playing whatever audio is being played, since the `voiceActivityStart` generally indicates the user wanted to interrupt the agent. Similarly, when our service detects that the user stopped speaking, it emits a `voiceActivityEnd` event: ```json { "type": "voiceActivityEnd" } ``` # `newAudioStream`: Handling New Audio Streams A `newAudioStream` message indicates the start the audio of a new response. It is recommended to clear your player buffer and start playing the new stream content upon receiving this message. This message contains no additional fields. # Error Handling Errors from the server are sent as `error` message type, a numeric code and a message in the following format: ```json { "type": "error", "code": , "message": "" } ``` The table below provides a quick reference to the various error codes and their corresponding messages for the Agent Websocket API. | Error Code | Error Message | | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | 1001 | Invalid authorization token. | | 1002 | Invalid agent id. 
| | 1003 | Invalid authorization credentials. | | 1005 | Not enough credits. | | 4400 | Invalid parameters. Indicates the message sent to the server failed to match the expected format. Double check the logic and try again. | | 4401 | Unauthorized. Invalid authorization token or invalid authorization credentials for specified agent. | | 4429 | You have reached the maximum number of concurrent connections allowed by your current plan. Please consider upgrading your plan or reducing the number of active connections to continue. | | 4500 | Generic error code for internal errors, such as failures to generate responses.
Generally, the user is not at fault when these happen. An appropriate reaction is to wait a few moments and try again. If the problem persists, contacting support is advised. | *** This documentation covers the essential aspects of interacting with the PlayAI Websocket API for agent conversations. Ensure that your implementation handles the specified message types and follows the outlined protocols for a seamless integration. # Get All PlayNotes Source: https://docs.play.ai/api-reference/playnote/get-all GET /api/v1/playnotes Retrieve all PlayNotes. # Get PlayNote Source: https://docs.play.ai/api-reference/playnote/get-id GET /api/v1/playnotes/{playNoteId} Retrieve all information about a PlayNote. # Create PlayNote Source: https://docs.play.ai/api-reference/playnote/post POST /api/v1/playnotes Create a new PlayNote by providing a file URL. Check out the [Generate Conversation from PDF with PlayNote API](/documentation/playnote/playnote-quickstart) guide for a step-by-step approach to using the PlayNote API to create a podcast-style conversation (and more!) from a PDF. After you create your PlayNotes, you can proceed to poll its status via the [Get PlayNote](/api-reference/playnote/get-id) endpoint. Note: You can have only **one active generation**. If you face this error code `403` with the message `{"errorMessage":"User already has an active generation","errorId":"UNAUTHORIZED"}` then please wait for some time and try again later. # Create Speech Source: https://docs.play.ai/api-reference/text-to-speech/endpoints/v1/create-speech POST /api/v1/tts Convert text to speech with our top-of-the-line PlayAI models. Convert text to speech with our top-of-the-line PlayAI models. This endpoint supports two models: * **Play 3.0 Mini**: Our fast and efficient model for single-voice text-to-speech. * **Dialog 1.0**: Our flagship model with best quality and multi-turn dialogue capabilities. We also offer **Dialog 1.0 Turbo** which is a faster version of Dialog 1.0 from [a separate endpoint](/api-reference/text-to-speech/endpoints/v1/stream-speech-turbo). For more information, see [Models](/documentation/text-to-speech/tts-models). Check out the [How to use Dialog 1.0 Text-to-Speech API](/documentation/tutorials/tts/dialogs/how-to-use-tts-api) guide for a step-by-step approach to using the Dialog 1.0 API to convert text into natural human-like sounding audio. Make sure to see the [Create a Multi-Turn Scripted Conversation with the Dialog 1.0 API](/documentation/tutorials/tts/dialogs/create-ai-podcast) guide for examples on how to create a multi-turn scripted conversation between two distinct speakers. # Get an Async TTS Job Source: https://docs.play.ai/api-reference/text-to-speech/endpoints/v1/get-async GET /api/v1/tts/{asyncTtsJobId} Gets the current status of an async TTS job. # List Voices Source: https://docs.play.ai/api-reference/text-to-speech/endpoints/v1/list-voices GET /api/v1/voices Get a list of all pre-built voices. # Stream Speech Source: https://docs.play.ai/api-reference/text-to-speech/endpoints/v1/stream-speech openapi POST /api/v1/tts/stream Streams the audio bytes with our ultra-fast text-in, audio-out API. Convert text to speech and receive audio bytes in real-time. This endpoint supports two models: * **Play 3.0 Mini**: Our fast and efficient model for single-voice text-to-speech. * **Dialog 1.0**: Our flagship model with best quality and multi-turn dialogue capabilities. 
We also offer **Dialog 1.0 Turbo** which is a faster version of Dialog 1.0 from [a separate endpoint](/api-reference/text-to-speech/endpoints/v1/stream-speech-turbo). For more information, see [Models](/documentation/text-to-speech/tts-models). Check out the [How to use Dialog 1.0 Text-to-Speech API](/documentation/tutorials/tts/dialogs/how-to-use-tts-api) guide for a step-by-step approach to using the Dialog 1.0 API to convert text into natural human-like sounding audio. Make sure to see the [Create a Multi-Turn Scripted Conversation with the Dialog 1.0 API](/documentation/tutorials/tts/dialogs/create-ai-podcast) guide for examples on how to create a multi-turn scripted conversation between two distinct speakers. # Stream Speech in Turbo Mode Source: https://docs.play.ai/api-reference/text-to-speech/endpoints/v1/stream-speech-turbo openapi-tts-dialog-turbo POST /api/v1/tts/stream Stream speech with our fastest and best-quality model, Dialog 1.0 Turbo. Convert text to speech and receive audio bytes in real-time in Turbo mode. This endpoint only supports **Dialog 1.0 Turbo**: Our fastest model with best quality and multi-turn dialogue capabilities, but with a narrower feature set than **Dialog 1.0** and **Play 3.0 Mini**. For more information, see [Models](/documentation/text-to-speech/tts-models). Check out the [How to use Dialog 1.0 Text-to-Speech API](/documentation/tutorials/tts/dialogs/how-to-use-tts-api) guide for a step-by-step approach to using the PlayAI API to convert text into natural human-like sounding audio. Or click "Try it" above to see the API in action! # Introduction Source: https://docs.play.ai/api-reference/text-to-speech/introduction Create lifelike speech via the API The PlayAI Text-to-Speech API enables you to convert written text into natural-sounding speech. Our API provides high-quality voice synthesis with multiple voices, languages, and customization options to suit your needs. ## Models We offer three models: * [**Dialog 1.0**](/documentation/text-to-speech/tts-models#dialog-1-0): Our flagship model with best quality and multi-turn dialogue capabilities. * [**Dialog 1.0 Turbo**](/documentation/text-to-speech/tts-models#dialog-1-0-turbo): A faster version of Dialog 1.0, available exclusively via the [Dialog 1.0 Turbo endpoint](/api-reference/text-to-speech/endpoints/v1/stream-speech-turbo). * [**Play 3.0 Mini**](/documentation/text-to-speech/tts-models#play-3-0-mini): Our fast and efficient model for single-voice text-to-speech. ## Features * Multiple voice options with different accents and styles * Support for various languages and dialects * Adjustable speech parameters (speed, pitch, volume) * Real-time streaming capabilities * High-quality audio output in multiple formats # Websocket API Source: https://docs.play.ai/api-reference/text-to-speech/websocket Enhance your app with our audio-in, audio-out API, enabling seamless, natural conversations with your PlayAI agent. Transform your user experience with the power of voice. 
To fully leverage our WebSocket API, the steps are: * Send a POST request to `https://api.play.ai/api/v1/tts/websocket-auth` with `Authorization: Bearer <your API key>` and `X-User-Id: <your user ID>` headers * Receive a JSON response with a `webSocketUrls` field containing a WebSocket URL for each available model * Connect to the provided WebSocket URL * Send TTS commands with the same options as our [TTS streaming API](/api-reference/text-to-speech/endpoints/v1/stream-speech), but in `snake_case`, e.g., `{"text":"Hello World","voice":"...","output_format":"mp3"}` * Receive audio output as binary messages ## Prerequisites * Your [access credentials](https://play.ai/api/keys): an API key and User ID. # Quickstart - Runnable Demo If you want to get started quickly, you can clone the [`playai-showcase`](https://github.com/playht/playai-showcase) repository and run the [`tts-websocket`](https://github.com/playht/playai-showcase/tree/main/tts-websocket) app locally. ```shell # Clone this repository git clone https://github.com/playht/playai-showcase.git # Navigate to the tts-websocket demo app cd playai-showcase/tts-websocket # Install dependencies npm install # Run the server and follow the instructions npm start ```
# Establishing a WebSocket Connection To establish a WebSocket connection, you will need to send a POST request to the `https://api.play.ai/api/v1/tts/websocket-auth` endpoint with the following headers: ```Text HTTP Authorization: Bearer <your_api_key> X-User-Id: <your_user_id> Content-Type: application/json ``` You can obtain your `api_key` and `user_id` from your [PlayAI account](https://play.ai/api/keys). The response will contain a JSON object with a `webSocketUrls` field holding one WebSocket URL per model. ```json { "webSocketUrls": { "Play3.0-mini": "wss://ws.fal.run/playht-fal/playht-tts/stream?fal_jwt_token=<token>", "PlayDialog": "wss://ws.fal.run/playht-fal/playht-tts-ldm/stream?fal_jwt_token=<token>", "PlayDialogMultilingual": "wss://ws.fal.run/playht-fal/playht-tts-multilingual-ldm/stream?fal_jwt_token=<token>" }, "expiresAt": "2025-01-06T05:13:04.650Z" } ``` After this point, you can forward the `webSocketUrls` entry for your chosen model to your WebSocket client to establish a connection, such as in the following example: ```javascript const ws = new WebSocket('wss://ws.fal.run/playht-fal/playht-tts/stream?fal_jwt_token=<token>'); ``` The WebSocket connection is valid for **1 hour**. After this period, you will need to re-authenticate and establish a new connection.
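For reference, here is a minimal sketch of that authentication request in JavaScript using `fetch`. Reading the credentials from environment variables is an assumption of this sketch, not a requirement of the API:

```javascript
// Request short-lived WebSocket URLs for the TTS models.
// (Uses top-level await: run as an ES module or wrap in an async function.)
const response = await fetch('https://api.play.ai/api/v1/tts/websocket-auth', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.PLAY_AI_API_KEY}`, // your API key
    'X-User-Id': process.env.PLAY_AI_USER_ID,               // your User ID
    'Content-Type': 'application/json',
  },
});

if (!response.ok) {
  throw new Error(`WebSocket auth failed: ${response.status}`);
}

const { webSocketUrls, expiresAt } = await response.json();
console.log('URLs expire at:', expiresAt);

// Pick the URL for the model you want to use, e.g. Play 3.0 Mini.
const ws = new WebSocket(webSocketUrls['Play3.0-mini']);
```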
# Sending TTS Commands Once connected to the WebSocket, you can send TTS commands as JSON messages. The structure of these commands is similar to our [TTS streaming API](/api-reference/text-to-speech/endpoints/v1/stream-speech), but in `snake_case`. Here's an example: ```javascript const ttsCommand = { text: 'Hello, world! This is a test of the PlayAI TTS WebSocket API.', voice: 's3://voice-cloning-zero-shot/775ae416-49bb-4fb6-bd45-740f205d20a1/jennifersaad/manifest.json', output_format: 'mp3', temperature: 0.7, }; ws.send(JSON.stringify(ttsCommand)); ``` Examples of the [available options for the TTS command](/api-reference/text-to-speech/endpoints/v1/stream-speech) are: * `request_id` (optional): A unique identifier for the request, useful for correlating responses (see more details below). * `text` (required): The text to be converted to speech. * `voice` (required): The voice ID or URL to use for synthesis. * `output_format` (optional): The desired audio format (default is "mp3"). * `temperature` (optional): Controls the randomness of the generated speech (0.0 to 1.0). * `speed` (optional): The speed of the generated speech (0.5 to 2.0). For the complete list of parameters, refer to the [TTS API documentation](/api-reference/text-to-speech/endpoints/v1/stream-speech).
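Since audio responses come back in the same order as the requests (see the next section), including a `request_id` makes it easy to tell consecutive streams apart. A small sketch, assuming `ws` is an already-open connection from the previous step and reusing the voice URL shown above:

```javascript
const voice =
  's3://voice-cloning-zero-shot/775ae416-49bb-4fb6-bd45-740f205d20a1/jennifersaad/manifest.json';

// Send two commands back to back; the responses will carry the matching request_id values.
const sentences = ['First sentence to synthesize.', 'Second sentence to synthesize.'];

sentences.forEach((text, index) => {
  ws.send(
    JSON.stringify({
      request_id: `utterance-${index}`, // echoed back in the "start" and "end" messages
      text,
      voice,
      output_format: 'mp3',
    })
  );
});
```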
# Receiving Audio Output If you send a sequence of TTS commands, the audio output will arrive in the same order as the requests. After sending a TTS command, you'll receive the following messages: * One initial text message with the format `{"type":"start","request_id":<id>}` to acknowledge the request. * The audio output as a series of binary messages. * One final text message with the format `{"type":"end","request_id":<id>}` to indicate the end of the audio stream. * In these text messages, `request_id` is the unique identifier you provided in the TTS command, or `null` if you didn't provide one. To handle these messages and play the audio, you can use the following approach: ```javascript let audioChunks = []; ws.onmessage = (event) => { if (event.data instanceof Blob) { // Received binary audio data audioChunks.push(event.data); } else { // Received a text message (the "start" or "end" acknowledgement carrying the request_id) const message = JSON.parse(event.data); if (message.type === 'end') { // If you provided a request_id, you can use it to correlate responses // End of audio stream, play the audio // If you specified a different output_format, you may need to adjust the audio player logic accordingly const audioBlob = new Blob(audioChunks, { type: 'audio/mpeg' }); const audioUrl = URL.createObjectURL(audioBlob); const audio = new Audio(audioUrl); audio.play(); // Clear the audio chunks for the next request audioChunks = []; } } }; ``` This code collects the binary audio chunks as they arrive and combines them into a single audio blob when the end-of-stream message (`{"type":"end","request_id":<id>}`) is received. It then creates an object URL and plays the audio with an `Audio` element.
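Outside the browser (for example, in a Node.js script using the `ws` package), the same messages can be handled by writing the binary chunks to a file instead of playing them. A rough sketch, assuming `webSocketUrl` was obtained from the `websocket-auth` step; the voice placeholder and output file name are arbitrary:

```javascript
import fs from 'node:fs';
import WebSocket from 'ws';

const ws = new WebSocket(webSocketUrl); // URL from the websocket-auth response (assumed in scope)
const out = fs.createWriteStream('output.mp3'); // matches the default "mp3" output_format

ws.on('open', () => {
  // Send a TTS command exactly as in the browser example above.
  ws.send(JSON.stringify({ text: 'Hello from Node!', voice: '<voice URL>', output_format: 'mp3' }));
});

ws.on('message', (data, isBinary) => {
  if (isBinary) {
    out.write(data); // binary frames carry the audio bytes
    return;
  }
  const message = JSON.parse(data.toString());
  if (message.type === 'end') {
    out.end(); // all audio for this request has arrived
    ws.close();
  }
});
```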
# Error Handling It's important to implement error handling in your WebSocket client. Here's an example of how to handle errors and connection closures: ```javascript ws.onerror = (error) => { console.error('WebSocket Error:', error); }; ws.onclose = (event) => { console.log('WebSocket connection closed:', event.code, event.reason); // Implement reconnection logic if needed }; ```
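The reconnection logic hinted at in the `onclose` handler above is up to your application. One common approach is a capped exponential backoff that repeats the `websocket-auth` step before opening a new socket. A sketch, assuming a hypothetical `connect()` helper that performs that step, re-attaches your handlers, and resolves to a fresh WebSocket, and that `ws` is declared with `let`:

```javascript
let retries = 0;

function scheduleReconnect() {
  // Exponential backoff capped at 30 seconds: 1s, 2s, 4s, ..., 30s.
  const delay = Math.min(1000 * 2 ** retries, 30000);
  retries += 1;
  setTimeout(async () => {
    try {
      // connect() is a hypothetical helper: it re-runs websocket-auth,
      // opens a new socket, and re-attaches your message/error handlers.
      ws = await connect();
      retries = 0; // reset the backoff once connected again
    } catch (err) {
      console.error('Reconnect failed:', err);
      scheduleReconnect();
    }
  }, delay);
}

ws.onclose = (event) => {
  console.log('WebSocket connection closed:', event.code, event.reason);
  scheduleReconnect();
};
```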
# Best Practices 1. **Authentication**: Always keep your API key secure. While the WebSocket URL can be shared with client-side code, the API Key and User ID should be kept private. 2. **Error Handling**: Implement robust error handling and reconnection logic in your WebSocket client. 3. **Resource Management**: Close the WebSocket connection when it's no longer needed to free up server resources. 4. **Rate Limiting**: Be aware of [rate limits](/documentation/resources/rate-limits) on the API and implement appropriate throttling in your application. 5. **Testing**: Thoroughly test your implementation with various inputs and network conditions to ensure reliability. By following these guidelines and using the provided examples, you can effectively integrate the PlayAI TTS WebSocket API into your application, enabling real-time text-to-speech functionality with low latency and high performance. # Agent Flutter SDK Source: https://docs.play.ai/documentation/agent-sdks/flutter-sdk Integrate AI agents into your Flutter applications ## ๐Ÿš€ Features * ๐ŸŽ™๏ธ **Two-way voice conversations** with AI agents * ๐Ÿ”Š **Voice Activity Detection (VAD)** for natural conversations * ๐Ÿง  **Custom actions** that allow agents to trigger code in your app * ๐Ÿ“ฑ **Cross-platform** - works on iOS, Android, and Web * ๐Ÿ”Œ **Audio session management** for handling interruptions and device changes * ๐Ÿ“ **Real-time transcripts** of both user and agent speech * ๐Ÿšฆ **Rich state management** with ValueNotifiers for UI integration ## Installation Add the package to your `pubspec.yaml`: ```yaml dependencies: agents: git: url: https://github.com/playht/agents-client-sdk-flutter.git ref: main ``` Then save, or run: ```bash flutter pub get ``` ### Platform Configuration #### iOS 1. Add the following to your `Info.plist`: ```xml NSMicrophoneUsageDescription We need access to your microphone to enable voice conversations with the AI agent. ``` 2. Add the following to your `Podfile`, since we depend on `permission_handler` to manage permissions and `audio_session` to manage audio sessions. ``` post_install do |installer| installer.pods_project.targets.each do |target| target.build_configurations.each do |config| config.build_settings['GCC_PREPROCESSOR_DEFINITIONS'] ||= [ '$(inherited)', # audio_session settings 'AUDIO_SESSION_MICROPHONE=0', # For microphone access 'PERMISSION_MICROPHONE=1' end end end ``` 3. Due to an [issue](https://github.com/gtbluesky/onnxruntime_flutter/issues/24) of the Onnx Runtime getting stripped by XCode when archived, you need to follow these steps in XCode for the voice activity detector (VAD) to work on iOS builds: * Under "Targets", choose "Runner" (or your project's name) * Go to "Build Settings" tab * Filter for "Deployment" * Set "Stripped Linked Product" to "No" * Set "Strip Style" to "Non-Global-Symbols" #### Android 1. Add the following permissions to your `AndroidManifest.xml`: ```xml ``` 2. Add the following to `android/gradle.properties` (unless they're already there): ``` android.useAndroidX=true android.enableJetifier=true ``` 3. Add the following settings to `android/app/build.gradle`: ``` android { compileSdkVersion 34 ... } ``` #### Web For VAD to work on web platforms, please following the instructions [here](https://pub.dev/packages/vad#web). ## Getting Started ### 1. Create an Agent on PlayAI Follow the instructions [here](/documentation/agent/agent-quickstart) to create an agent on PlayAI. ### 2. 
Implement the Agent in Your Flutter App ```dart [expandable] final agent = Agent( // Replace with your agent ID from PlayAI agentId: 'your-agent-id-here', // Customize your agent's behavior prompt: 'You are a helpful assistant who speaks in a friendly, casual tone.', // Define actions the agent can take in your app actions: [ AgentAction( name: 'show_weather', triggerInstructions: 'Trigger this when the user asks about weather.', argumentSchema: { 'city': AgentActionParameter( type: 'string', description: 'The city to show weather for', ), }, callback: (data) async { final city = data['city'] as String; // In a real app, you would fetch weather data here return 'Weather data fetched for $city!'; }, ), ], // Configure callbacks to respond to agent events callbackConfig: AgentCallbackConfig( // Get user speech transcript onUserTranscript: (text) { setState(() => _messages.add(ChatMessage(text, isUser: true))); }, // Get agent speech transcript onAgentTranscript: (text) { setState(() => _messages.add(ChatMessage(text, isUser: false))); }, // Handle any errors onError: (error, isFatal) { ScaffoldMessenger.of(context).showSnackBar( SnackBar(content: Text('Error: $error')), ); }, ), ); ``` ### 3. Connect the Agent to Start a Conversation ```dart await agent.connect(); ``` ### 4. Mute and Unmute the User during a Conversation ```dart await agent.muteUser(); await agent.unmuteUser(); ``` ### 5. Disconnect the Agent ```dart await agent.disconnect(); ``` ## Key Features ### Monitor the Agent's State 1. `AgentState`: The agent can be in one of four states: * `idle`: Not connected to a conversation * `connecting`: In the process of establishing a connection * `connected`: Connected and ready to converse * `disconnecting`: In the process of ending a conversation 2. `Agent` also exposes `ValueListenable`s which you can listen to for changes in the agent's state. ```dart ValueListenableBuilder( valueListenable: agent.isUserSpeakingNotifier, builder: (context, isUserSpeaking, _) => Text('User is speaking: $isUserSpeaking'), ) ``` 3. Pass callbacks as `AgentCallbackConfig` to the `Agent` constructor to handle events from the agent. ```dart final config = AgentCallbackConfig( onUserTranscript: (text) => print('User just said: $text'), onAgentTranscript: (text) => print('Agent just said: $text'), ) final agent = Agent( // ... callbackConfig: config, ); ``` ### Agent Actions One of the most exciting features of the PlayAI Agents SDK is the ability to define custom actions that allow the agent to interact with your app. ```dart AgentAction( name: 'open_settings', triggerInstructions: 'Trigger this when the user asks to open settings', argumentSchema: { 'section': AgentActionParameter( type: 'string', description: 'The settings section to open', ), }, callback: (data) async { final section = data['section'] as String; // Navigate to settings section in your app return 'Opened $section settings'; }, ) ``` ### Developer Messages Send contextual information to the agent during a conversation to inform it of changes in your app. ```dart // When user navigates to a new screen void _onNavigate(String routeName) { agent.sendDeveloperMessage( 'User navigated to $routeName screen. 
You can now discuss the content on this page.', ); } // When relevant data changes void _onCartUpdated(List products) { agent.sendDeveloperMessage( 'User\'s cart has been updated, now containing: ${products.map((p) => p.name).join(", ")}.', ); } ``` ## Error Handling The package uses a robust error handling system with specific exception types: ```dart try { await agent.connect(); } on MicrophonePermissionDenied { // Handle microphone permission issues } on WebSocketConnectionError catch (e) { // Handle connection issues } on ServerError catch (e) { // Handle server-side errors if (e.isFatal) { // Handle fatal errors } } on AgentException catch (e) { // Handle all other agent exceptions print('Error code: ${e.code}, Message: ${e.readableMessage}'); } ``` ## Lifecycle Management Don't forget to dispose of the agent when it's no longer needed to free up resources. ```dart @override void dispose() { // Clean up resources agent.dispose(); super.dispose(); } ``` ## UI Integration Examples ### Mute Button ```dart ValueListenableBuilder( valueListenable: agent.isMutedNotifier, builder: (context, isMuted, _) => IconButton( icon: Icon(isMuted ? Icons.mic_off : Icons.mic), onPressed: () => isMuted ? agent.unmuteUser() : agent.muteUser(), tooltip: isMuted ? 'Unmute' : 'Mute', ), ) ``` ### Speaking Indicator ```dart ValueListenableBuilder( valueListenable: agent.isAgentSpeakingNotifier, builder: (context, isSpeaking, _) => AnimatedContainer( duration: const Duration(milliseconds: 300), width: 40, height: 40, decoration: BoxDecoration( shape: BoxShape.circle, color: isSpeaking ? Colors.blue : Colors.grey.shade300, ), child: Center( child: Icon( Icons.record_voice_over, size: 24, color: Colors.white, ), ), ), ) ``` ## Tips for Effective Usage 1. **Prompt Engineering**: Craft clear, specific prompts to guide agent behavior 2. **Action Design**: Design actions with clear trigger instructions and parameter descriptions 3. **Context Management**: Use `sendDeveloperMessage` to keep the agent updated on app state 4. **Error Handling**: Implement comprehensive error handling for a smooth user experience 5. **UI Feedback**: Use the provided `ValueListenable`s to give clear feedback on conversation state ## Acknowledgments * Voice Activity Detection powered by [vad](https://pub.dev/packages/vad) * Audio session management by [audio\_session](https://pub.dev/packages/audio_session) # Agent Web SDK Source: https://docs.play.ai/documentation/agent-sdks/web-sdk Integrate AI agents into your web applications ## Overview The **Agent Web SDK** is a TypeScript SDK that facilitates real-time, bi-directional audio conversations with your PlayAI Agent via [WebSocket API](/api-reference/agents/websocket). It takes care of the following: * WebSocket connection management * Microphone capture and voice activity detection (VAD) * Sending user audio to the Agent * Receiving Agent audio and playing it back in the browser * Managing event listeners such as user or agent transcripts * Muting/unmuting the user's microphone * Hanging up (ending) the agent conversation * Error handling This SDK is designed for modern web browsers that support the [Web Audio API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API) and [WebSockets](https://developer.mozilla.org/en-US/docs/Web/API/WebSocket). If you want to integrate PlayAI Agent into a Flutter app, check out our [Flutter SDK](/documentation/agent-sdks/flutter-sdk). We plan to support other platforms in the future. 
## Installation ```bash npm npm install @play-ai/agent-web-sdk ``` ```bash yarn yarn add @play-ai/agent-web-sdk ``` ```bash pnpm pnpm add @play-ai/agent-web-sdk ``` ## Create agent To start a conversation with your agent, first create an agent in [PlayAI app](https://play.ai/my-agents). Once you have an agent, you can find the agent ID in the agent "Deploy ยท Web" section, which is required to connect to the agent. ## Basic usage Below is a simple example illustrating how to initiate a conversation with your agent using the `connectAgent` function: ```ts import { connectAgent } from '@play-ai/agent-web-sdk'; async function startConversation() { try { const agentController = await connectAgent('YOUR_AGENT_ID'); console.log('Connected to agent. Conversation ID:', agentController.conversationId); // Use agentController to control the conversation... } catch (error) { console.error('Failed to start conversation:', error); } } startConversation(); ``` The function `connectAgent` returns a Promise * If any error occurs during the connection process, the Promise is rejected. * When the conversation is successfully established, the Promise resolves to `AgentConnectionController` object. ## Config You can customize the agent's configuration by passing an optional `ConnectAgentConfig` object as the second parameter to `connectAgent`. ```ts const agentController = await connectAgent('YOUR_AGENT_ID', { debug: true, // Enable debug logging in the console customGreeting: 'Hello, and welcome to my custom agent!', // Override the default greeting prompt: 'You are an AI that helps with scheduling tasks.', // Append additional instructions to the agent's prompt continueConversation: 'PREVIOUS_CONVERSATION_ID', // Continue a previous conversation }); ``` **Config Options**: * **`debug`:** Enables debug logging for troubleshooting. * **`customGreeting`:** Overrides the default greeting used by the agent. * **`prompt`:** Appends additional instructions to the agent's core prompt. * **`continueConversation`:** An optional conversation ID to continue a previous conversation. * **`listeners`:** Attach various listener callbacks (see [Event listeners](#event-listeners) section). ## Event listeners Event listeners enable you to handle specific moments during the conversation: ```ts const agentController = await connectAgent('YOUR_AGENT_ID', { listeners: { onUserTranscript: (transcript) => console.log(`USER said: "${transcript}".`), onAgentTranscript: (transcript) => console.log(`AGENT will say: "${transcript}".`), onUserStartedSpeaking: () => console.log(`USER started speaking...`), onUserStoppedSpeaking: () => console.log(`USER stopped speaking.`), onAgentDecidedToSpeak: () => console.log(`AGENT decided to speak... (not speaking yet, just thinking)`), onAgentStartedSpeaking: () => console.log(`AGENT started speaking...`), onAgentStoppedSpeaking: () => console.log(`AGENT stopped speaking.`), onHangup: (endedBy) => console.log(`Conversation has ended by ${endedBy}`), onError: (err) => console.error(err), }, }); ``` ## Mute/unmute Once you have an active `AgentConnectionController` from `connectAgent`, you can mute or unmute the user's microphone: ```ts const agentController = await connectAgent('YOUR_AGENT_ID'); agentController.mute(); // The agent won't hear any mic data agentController.unmute(); // The agent hears the mic data again ``` ## Hangup Use `agentController.hangup()` to end the conversation from the user side. 
```ts const agentController = await connectAgent('YOUR_AGENT_ID'); setTimeout(() => { // End the conversation after 60 seconds agentController.hangup(); }, 60000); ``` When the conversation ends (either by user or agent), the `onHangup` callback (if provided) is triggered. ## Error handling Errors can occur at different stages of the conversation: * Starting the conversation. For example: * Microphone permissions denied * WebSocket fails to connect or closes unexpectedly * Invalid agent ID * During the conversation. For example: * Agent fails to generate a response * Internal Agent errors * Network issues Errors that occur before the conversation starts are caught by the `connectAgent` Promise. You can handle these errors in the `catch` block. Errors that occur during the conversation are caught by the `onError` listener. ```ts import { connectAgent } from '@play-ai/agent-web-sdk'; async function startConversation() { try { const agentController = await connectAgent('YOUR_AGENT_ID', { listeners: { onError: (error) => { console.error('Error occurred:', error.description); if (error.isFatal) { // Possibly reconnection logic or UI error message } }, }, }); } catch (err) { console.error('Failed to start the conversation:', err); } } ``` **Error object**: ```ts interface ErrorDuringConversation { description: string; // Human-readable message isFatal: boolean; // Whether the error ended the conversation serverCode?: number; // If the server gave a specific error code wsEvent?: Event; // Low-level WebSocket event cause?: Error; // JS error cause } ``` ## Code example
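As a starting point, here is a minimal sketch that puts the pieces above together: connecting, logging transcripts, muting briefly, and hanging up after one minute. The agent ID, greeting, and timings are placeholders for your own values:

```ts
import { connectAgent } from '@play-ai/agent-web-sdk';

async function runConversation() {
  try {
    const agentController = await connectAgent('YOUR_AGENT_ID', {
      customGreeting: 'Hi! Ask me anything.', // example value
      listeners: {
        onUserTranscript: (transcript) => console.log(`USER: ${transcript}`),
        onAgentTranscript: (transcript) => console.log(`AGENT: ${transcript}`),
        onHangup: (endedBy) => console.log(`Conversation ended by ${endedBy}`),
        onError: (error) => console.error('Conversation error:', error.description),
      },
    });

    console.log('Conversation ID:', agentController.conversationId);

    // Example: mute the microphone for the first 5 seconds, then unmute.
    agentController.mute();
    setTimeout(() => agentController.unmute(), 5000);

    // End the conversation after 60 seconds.
    setTimeout(() => agentController.hangup(), 60000);
  } catch (err) {
    console.error('Failed to start conversation:', err);
  }
}

runConversation();
```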