perso-interactive-sdk-web

v1.3.4

Published

7 days ago

Perso Interactive Web SDK - WebRTC-based real-time interactive AI avatar sessions

sw_est

perso interactive ai avatar webrtc llm tts stt stf real-time

perso-interactive-sdk-web

WebRTC-based real-time interactive AI avatar SDK for web applications.

Installation

# npm
npm install perso-interactive-sdk-web

# yarn
yarn add perso-interactive-sdk-web

# pnpm
pnpm add perso-interactive-sdk-web

Usage

📖 Looking for step-by-step examples? See the Example Guide for annotated code snippets covering LLM, TTS, STT, STF, and full pipeline patterns.

The SDK provides two entry points:

Server-side (`perso-interactive-sdk-web/server`)

Use this module in Node.js server environments to create sessions securely without exposing your API key. The client examples below (ES Module, TypeScript, IIFE) all call this server endpoint to obtain a sessionId.

Express.js Example

This example uses Express. Install the required packages:

# npm
npm install express perso-interactive-sdk-web

# yarn
yarn add express perso-interactive-sdk-web

# pnpm
pnpm add express perso-interactive-sdk-web

// server.js
const express = require("express");
const { createSessionId } = require("perso-interactive-sdk-web/server");

const app = express();

const API_SERVER = "https://live-api.perso.ai";
const API_KEY = process.env.PERSO_INTERACTIVE_API_KEY;

app.post("/api/session", async (req, res) => {
  try {
    const sessionId = await createSessionId(API_SERVER, API_KEY, {
      using_stf_webrtc: true,
      model_style: "<model_style_name>",
      prompt: "<prompt_id>",
      llm_type: "<llm_name>",
      tts_type: "<tts_name>",
      stt_type: "<stt_name>",
      // text_normalization_config: "<textnormalizationconfig_id>", // optional
    });
    res.json({ sessionId });
  } catch (error) {
    console.error("Session creation failed:", error);
    res.status(500).json({ error: "Failed to create session" });
  }
});

app.listen(3000, () => console.log("Server running on port 3000"));
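`process.env.PERSO_INTERACTIVE_API_KEY` is `undefined` when the variable is not set, so it can help to fail at startup rather than on the first session request. The following is a minimal sketch, assuming a hypothetical `loadConfig` helper that is not part of the SDK; only the environment variable name and the API server URL are taken from the example above.

```javascript
// Hypothetical helper, not part of the SDK: reads the API key from an
// environment-like object and throws early if it is missing or blank.
function loadConfig(env) {
  const apiKey = env.PERSO_INTERACTIVE_API_KEY;
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    throw new Error("PERSO_INTERACTIVE_API_KEY is not set");
  }
  return {
    apiServer: "https://live-api.perso.ai",
    apiKey: apiKey.trim(),
  };
}
```

Calling `loadConfig(process.env)` before `app.listen(...)` would surface a missing key as soon as the server starts.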

Using a SessionTemplate

If you have pre-configured session templates, pass the template ID directly instead of assembling params manually:

const sessionId = await createSessionId(API_SERVER, API_KEY, "<sessiontemplate_id>");

⚠️ Security Warning: Never use createSessionId on the client-side in production. Exposing your API key in browser code can lead to unauthorized access and quota abuse. Always create sessions on the server and pass only the sessionId to the client.

Client-side Testing Only

⚠️ Warning: The following example exposes your API key in the browser. Use this only for local testing. Never deploy this to production. If your API key is compromised due to client-side usage, the SDK provider assumes no responsibility.

import {
  createSessionId,
  createSession,
} from "perso-interactive-sdk-web/client";

const apiServer = "https://live-api.perso.ai";
const apiKey = "YOUR_API_KEY"; // ⚠️ NEVER commit or expose this in production

const sessionId = await createSessionId(apiServer, apiKey, {
  using_stf_webrtc: true,
  model_style: "<model_style_name>",
  prompt: "<prompt_id>",
  llm_type: "<llm_name>",
  tts_type: "<tts_name>",
  stt_type: "<stt_name>",
  // text_normalization_config: "<textnormalizationconfig_id>", // optional
});

const session = await createSession(apiServer, sessionId, 1920, 1080, []);

const videoEl = document.getElementById("video");
if (videoEl instanceof HTMLVideoElement) {
  session.setSrc(videoEl);
}

Client-side (`perso-interactive-sdk-web/client`)

Use this module in browser environments to create and manage interactive sessions.

import {
  createSession,
  ChatTool,
  ChatState,
} from "perso-interactive-sdk-web/client";

const apiServer = "https://live-api.perso.ai";

// Obtain sessionId from your server (see Express.js example above)
const sessionId = await fetch("/api/session", { method: "POST" })
  .then((res) => res.json())
  .then((data) => data.sessionId);

// Create a session
const session = await createSession(apiServer, sessionId, 1920, 1080, []);

// Bind to video element
const videoEl = document.getElementById("video");
if (videoEl instanceof HTMLVideoElement) {
  session.setSrc(videoEl);
}

// Subscribe to chat states
session.subscribeChatStates((states) => {
  console.log("Chat states:", states);
});

// Subscribe to chat log
session.subscribeChatLog((chatLog) => {
  console.log("Chat log:", chatLog);
});

Chat (Recommended) — processLLM → processTTS → processSTF

Full pipeline with individual step control. Use this when you need to handle each stage (LLM response, TTS audio, avatar animation) separately.

// 1. Get LLM response
const llmGenerator = session.processLLM({ message: "Hello!" });
let llmResponse = "";
for await (const chunk of llmGenerator) {
  if (chunk.type === "message" && chunk.finish) {
    llmResponse = chunk.message;
  }
}

// 2. Convert text to speech
const audioBlob = await session.processTTS(llmResponse);

// 3. Animate avatar with audio
if (audioBlob) {
  await session.processSTF(audioBlob, "wav", llmResponse);
}
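The three steps above can be wrapped in a reusable function. This is a sketch only: `collectLLMResponse` and `runTextTurn` are hypothetical helper names, and the chunk shape (`type`, `finish`, `message`) is assumed to be exactly what the snippet above shows.

```javascript
// Hypothetical helper, not part of the SDK: drains the async iterable returned
// by session.processLLM() and returns the text of the final message chunk.
async function collectLLMResponse(chunks) {
  let finalMessage = "";
  for await (const chunk of chunks) {
    if (chunk.type === "message" && chunk.finish) {
      finalMessage = chunk.message;
    }
  }
  return finalMessage;
}

// Runs one text turn: LLM -> TTS -> STF. Returns the spoken text, or "" when
// the LLM produced no final message or TTS returned no audio.
async function runTextTurn(session, message) {
  const reply = await collectLLMResponse(session.processLLM({ message }));
  if (!reply) return "";
  const audioBlob = await session.processTTS(reply);
  if (!audioBlob) return "";
  await session.processSTF(audioBlob, "wav", reply);
  return reply;
}
```

With a real session this would be called as `await runTextTurn(session, "Hello!")`. Keeping the steps separate leaves room to inspect or modify the LLM text before it is spoken.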

With voice input (STT → LLM → TTS → STF):

await session.startProcessSTT();
const text = await session.stopProcessSTT();
// Pass `text` to the processLLM pipeline above
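In a UI, the start and stop calls usually map to a button being pressed and released. The sketch below shows one way to guard against double presses; `createPushToTalk` and `onText` are hypothetical names, and it assumes `stopProcessSTT()` resolves to the recognized text as a string.

```javascript
// Hypothetical push-to-talk wrapper, not part of the SDK.
// `onText` receives the trimmed transcript and can feed it to the LLM pipeline.
function createPushToTalk(session, onText) {
  let recording = false;
  return {
    // Start recording; ignored if a recording is already in progress.
    async press() {
      if (recording) return;
      recording = true;
      try {
        await session.startProcessSTT();
      } catch (err) {
        recording = false; // allow a retry if starting failed
        throw err;
      }
    },
    // Stop recording and forward non-empty text to `onText`.
    async release() {
      if (!recording) return "";
      recording = false;
      const text = (await session.stopProcessSTT()) ?? "";
      const trimmed = String(text).trim();
      if (trimmed) await onText(trimmed);
      return trimmed;
    },
  };
}
```

A typical wiring would call `press()` on `pointerdown` and `release()` on `pointerup` of a microphone button.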

Chat (Simple) — processChat

All-in-one call that runs LLM → TTS → STF internally. Use this when you don't need control over individual steps.

session.processChat("Hello!");

Direct Speech — processTTSTF

The avatar speaks the given text directly, without calling the LLM. Useful for scripted greetings, announcements, or guided messages.

session.processTTSTF("Welcome! How can I help you today?");

// Stop session
session.stopSession();
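`stopSession()` is easy to forget when the user simply navigates away. The sketch below registers a one-time cleanup on the standard `pagehide` event; `stopSessionOnPageHide` is a hypothetical helper, and whether the SDK already performs similar cleanup internally is not stated here.

```javascript
// Hypothetical helper, not part of the SDK: calls session.stopSession() at most
// once when `target` (normally `window`) fires "pagehide".
// Returns a function that removes the listener again.
function stopSessionOnPageHide(session, target) {
  let stopped = false;
  const handler = () => {
    if (stopped) return;
    stopped = true;
    session.stopSession();
  };
  target.addEventListener("pagehide", handler);
  return () => target.removeEventListener("pagehide", handler);
}
```

In a browser this would be used as `const unregister = stopSessionOnPageHide(session, window);`.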

Client Tool Calling

Define custom tools that the LLM can invoke:

import { ChatTool } from "perso-interactive-sdk-web/client";

const weatherTool = new ChatTool(
  "get_weather",
  "Get current weather for a location",
  {
    type: "object",
    properties: {
      location: { type: "string", description: "City name" },
    },
    required: ["location"],
  },
  async (args) => {
    // Your implementation
    return { temperature: 22, condition: "Sunny" };
  },
  false, // executeOnly: if true, no follow-up LLM response
);

// Pass the tool in the clientTools array when creating the session
const session = await createSession(
  apiServer,
  sessionId,
  1920, // width
  1080, // height
  [weatherTool]
);
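The tool handler receives the arguments produced by the LLM, so validating them before use is a reasonable precaution. The sketch below is illustrative only: `createWeatherHandler` and the injected `lookup` function are hypothetical, and returning an `{ error }` object (rather than throwing) is an assumption about what is useful to hand back to the LLM, not documented SDK behavior.

```javascript
// Hypothetical handler factory for the "get_weather" tool.
// `lookup(location)` stands in for a real weather API call and is assumed to
// resolve to { temperature, condition }.
function createWeatherHandler(lookup) {
  return async function handleGetWeather(args) {
    const location =
      typeof args?.location === "string" ? args.location.trim() : "";
    if (!location) {
      return { error: "location is required" };
    }
    try {
      const { temperature, condition } = await lookup(location);
      return { location, temperature, condition };
    } catch (err) {
      return { error: `weather lookup failed: ${err.message}` };
    }
  };
}
```

The returned function can be passed as the fourth argument of `new ChatTool(...)` in place of the inline `async (args) => { ... }` handler.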

Browser (IIFE)

For direct browser usage via <script> tag without a bundler. The SDK exposes a global PersoInteractive namespace:

<script src="https://cdn.jsdelivr.net/npm/perso-interactive-sdk-web@latest/dist/client/index.iife.js"></script>
<script>
  async function start() {
    const apiServer = "https://live-api.perso.ai";

    // Obtain sessionId from your server (see Express.js example above)
    const sessionId = await fetch("/api/session", { method: "POST" })
      .then((res) => res.json())
      .then((data) => data.sessionId);

    const session = await PersoInteractive.createSession(
      apiServer,
      sessionId,
      1920,
      1080,
      []
    );

    const videoEl = document.getElementById("video");
    if (videoEl instanceof HTMLVideoElement) {
      session.setSrc(videoEl);
    }
  }

  start();
</script>

Note: The browser examples above call POST /api/session on your server. See the Express.js example for the server implementation. Never expose your API key in client-side code.
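The `fetch` calls in the client examples read `data.sessionId` without checking the HTTP status, so a 500 response from the Express handler would yield `undefined`. A slightly more defensive sketch follows; `fetchSessionId` is a hypothetical helper, and the `fetchImpl` parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper, not part of the SDK: requests a sessionId from your own
// server and throws on HTTP errors or a malformed response body.
async function fetchSessionId(fetchImpl, url = "/api/session") {
  const res = await fetchImpl(url, { method: "POST" });
  if (!res.ok) {
    throw new Error(`Session request failed with status ${res.status}`);
  }
  const data = await res.json();
  if (!data || !data.sessionId) {
    throw new Error("Response did not include a sessionId");
  }
  return data.sessionId;
}
```

In the browser this would be called as `const sessionId = await fetchSessionId(fetch);` before `createSession(...)`.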

Example Guide

📖 Example Guide: English

API Reference

Server Exports

| Export | Description |
| --- | --- |
| `createSessionId(apiServer, apiKey, sessionTemplateId)` | Create a session ID from a SessionTemplate |
| `createSessionId(apiServer, apiKey, params)` | Create a new session ID |
| `getIntroMessage(apiServer, apiKey, promptId)` | Get intro message for a prompt |
| `getSessionTemplates(apiServer, apiKey)` | Get available session templates |
| `getSessionTemplate(apiServer, apiKey, sessionTemplateId)` | Get a single session template by ID |
| `PersoUtilServer` | Low-level API utilities |
| `ApiError` | Error class for API errors |

Client Exports

| Export | Description |
| --- | --- |
| `createSession(apiServer, sessionId, width, height, clientTools)` | Create a session |
| `Session` | Session class |
| `ChatTool` | Client tool class |
| `ChatState` | Enum for chat states (RECORDING, LLM, ANALYZING, SPEAKING, TTS) |
| `getLLMs(apiServer, apiKey)` | Get available LLM providers |
| `getTTSs(apiServer, apiKey)` | Get available TTS providers |
| `getSTTs(apiServer, apiKey)` | Get available STT providers |
| `getModelStyles(apiServer, apiKey)` | Get available avatar styles |
| `getBackgroundImages(apiServer, apiKey)` | Get available backgrounds |
| `getPrompts(apiServer, apiKey)` | Get available prompts |
| `getDocuments(apiServer, apiKey)` | Get available documents |
| `getMcpServers(apiServer, apiKey)` | Get available MCP servers |
| `getTextNormalizations(apiServer, apiKey)` | Get available text normalization configs |
| `getTextNormalization(apiServer, apiKey, configId)` | Download text normalization ruleset (pre-signed URL) |
| `getAllSettings(apiServer, apiKey)` | Get all settings at once |
| `getSessionInfo(apiServer, sessionId)` | Get session metadata |
| `makeTTS(apiServer, params)` | Generate TTS audio from text (standalone) |
| `createSessionId(apiServer, apiKey, sessionTemplateId)` | Create session ID from a SessionTemplate (exposes API key) |
| `createSessionId(apiServer, apiKey, params)` | Create session ID (exposes API key in browser) |
| `getSessionTemplates(apiServer, apiKey)` | Get available session templates |
| `ApiError` | Error class for API errors |
| `LLMError` | Error class for LLM errors |
| `LLMStreamingResponseError` | Error class for streaming errors |
| `STTError` | Error class for STT errors |
| `TTSError` | Error class for TTS errors |
| `TTSDecodeError` | Error class for TTS decode errors |
| `LlmProcessor` | Standalone LLM streaming processor |
| `WavRecorder` | Audio recorder producing WAV files |
| `createWavRecorder(options?)` | Factory function for WavRecorder |
| `getWavSampleRate(wavData)` | Extract sample rate from WAV data |
| `TTS_TARGET_SAMPLE_RATE` | TTS target sample rate constant (16000) |

Session Methods

| Method | Description |
| --- | --- |
| `setSrc(videoElement)` | Bind session to video element |
| `processChat(message)` | Send a message to the LLM |
| `processLLM(options)` | Stream LLM responses with full control |
| `processTTSTF(message)` | Speak a message without LLM |
| `processTTS(message, options?)` | Generate TTS audio from text (returns Blob). Options: resample, locale, output_format |
| `processSTF(file, format, message)` | Send audio/video to STF pipeline |
| `startProcessSTT(timeout?)` | Start recording voice for STT |
| `stopProcessSTT(language?)` | Stop recording and get text |
| `isSTTRecording()` | Check if STT recording is in progress |
| `transcribeAudio(audio, language?)` | Transcribe audio Blob/File to text |
| `getMessageHistory()` | Get LLM conversation history |
| `getRemoteStream()` | Get AI human's media stream |
| `getLocalStream()` | ~~Get user's audio stream~~ (Deprecated) |
| `getSessionId()` | Get session ID |
| `clearBuffer()` | Stop AI human speaking |
| `changeSize(width, height)` | Resize the avatar canvas |
| `logSessionEvent(detail?)` | Send a SESSION_LOG event (string or object) |
| `stopSession()` | Close the session |
| `subscribeChatStates(callback)` | Subscribe to state changes |
| `subscribeChatLog(callback)` | Subscribe to chat log updates |
| `setSttResultCallback(callback)` | Set STT result callback |
| `setErrorHandler(callback)` | Subscribe to errors |
| `onClose(callback)` | Subscribe to session close |

Session Properties

| Property | Type | Description |
| --- | --- | --- |
| `lastRecordedAudioFile` | `File \| null` | Last recorded WAV audio file from STT |

For detailed API documentation, see api-docs.md.

License

Apache-2.0
