ta-agent-sdk

v1.0.88

Published

a month ago

SDK for Cashew.ai Voice agent

sandeshveerani4

d-kar

TA Agent SDK

A lightweight TypeScript SDK for building realtime, voice-forward travel experiences in the browser. It wraps mic recording, audio playback, a resilient WebSocket session, and a simple event bus—so you can focus on your UI.

🔁 WebSocket session management (connect, mute/unmute, inactivity)
📡 Typed event bus (bi-directional custom events)
⚙️ Framework-agnostic (works fine in React, Vue, plain JS)

Package: `ta-agent-sdk`
Types: included
CSS: `ta-agent-sdk/styles.css`

Example:

Live Playground

Code of demo

Installation

npm i ta-agent-sdk
# or
yarn add ta-agent-sdk
# or
pnpm add ta-agent-sdk

Import the CSS once in your app (it includes styles for transcript/error UI the SDK may render):

import "ta-agent-sdk/styles.css";

Quick Start (React example)

import { useEffect, useMemo, useState } from "react";
import TAAudioAgent, { RecommendationData } from "ta-agent-sdk";
import "ta-agent-sdk/styles.css";
import { useTranscriptMessages } from "ta-agent-sdk/react";

function App() {
  const [connected, setConnected] = useState(false);
  const [muted, setMuted] = useState(true);
  const [recommendations, setRecommendations] = useState<RecommendationData[]>(
    []
  );
  const [events, setEvents] = useState<
    { type: string; payload: any; incoming: boolean }[]
  >([]);

  const agent = useMemo(() => {
    return new TAAudioAgent({
      backendHost: "[Backend host URL without specifying the protocol]",
      agent: "[Agent name which our company gave you]",
    });
  }, []);

  const { messages, appendMessage } = useTranscriptMessages({
    agent,
  });

  useEffect(() => {
    const playVideo = () => {
      const video = document.querySelector("video");
      video?.play();
    };
    const changeRecommendations = (data: RecommendationData[]) => {
      setRecommendations(data);
    };

    agent.onConnected = setConnected;
    agent.onMicStateChange = setMuted;

    agent.onEvent("switchRecommendations", changeRecommendations);
    agent.onEvent("playVideo", playVideo);

    agent.onAnyEvent = (type, payload, incoming) => {
      setEvents((prev) => [{ type, payload, incoming }, ...prev]);
    };

    return () => {
      agent.offEvent("switchRecommendations", changeRecommendations);
      agent.offEvent("playVideo", playVideo);
    };
  }, [agent]);

  // Render your UI here...
  return <button onClick={() => agent.toggleMic()}>Mic here</button>;
}

export default App;

Event System

The SDK provides a typed event bus for bidirectional communication with your backend.

Sending Events

// Send a changePlace event to the server.
// The payload is either a location object or a location ID (the ID must be a number).
agent.sendEvent("changePlace", location_obj);
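
The API section of this README states that sendEvent wraps the payload as { mime_type: "text/plain", event: { type, message } }. The sketch below builds that envelope by hand, which can help when logging traffic or writing backend tests. `EventEnvelope` and `buildEventEnvelope` are hypothetical names used for illustration only; they are not SDK exports, and the location ID is a made-up value.

```typescript
// Outgoing event names as listed under "Event types" in this README.
type OutgoingEvents = "switchRecommendations" | "changePlace";

// Envelope shape as documented for sendEvent; the interface name is hypothetical.
interface EventEnvelope {
  mime_type: "text/plain";
  event: { type: OutgoingEvents; message?: unknown };
}

// Hypothetical helper: reproduces the documented envelope for a given event.
function buildEventEnvelope(type: OutgoingEvents, message?: unknown): EventEnvelope {
  return { mime_type: "text/plain", event: { type, message } };
}

// Example: a changePlace event carrying a numeric location ID (illustrative value).
const envelope = buildEventEnvelope("changePlace", 12345);
console.log(JSON.stringify(envelope));
```

In application code you would normally call agent.sendEvent and let the SDK produce this structure.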

Receiving Events

// Listen for specific events from the server
agent.onEvent("switchRecommendations", (recommendations) => {
  setRecommendations(recommendations);
});

agent.onEvent("playVideo", () => {
  const video = document.querySelector("video");
  video?.play();
});
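
To make the subscribe/unsubscribe pattern concrete, here is a minimal, self-contained event bus. It is an illustration of the general technique, not the SDK's internal implementation; `TinyEventBus` and its method names are hypothetical. It assumes handlers are matched by reference on removal, which is why the Quick Start keeps named handler functions and passes the same references to offEvent.

```typescript
type Handler = (payload?: unknown) => void;

// Illustrative bus: maps an event type to the set of handlers subscribed to it.
class TinyEventBus {
  private handlers = new Map<string, Set<Handler>>();

  on(type: string, handler: Handler): void {
    let set = this.handlers.get(type);
    if (!set) {
      set = new Set<Handler>();
      this.handlers.set(type, set);
    }
    set.add(handler);
  }

  // Removal only works with the exact function reference that was registered.
  off(type: string, handler: Handler): void {
    this.handlers.get(type)?.delete(handler);
  }

  emit(type: string, payload?: unknown): void {
    this.handlers.get(type)?.forEach((handler) => handler(payload));
  }
}

const bus = new TinyEventBus();
const received: unknown[] = [];
const onRecommendations: Handler = (payload) => {
  received.push(payload);
};

bus.on("switchRecommendations", onRecommendations);
bus.emit("switchRecommendations", ["first"]); // delivered
bus.off("switchRecommendations", onRecommendations);
bus.emit("switchRecommendations", ["second"]); // ignored: handler was removed
console.log(received.length);
```

An inline arrow function passed to a subscribe call cannot be removed later under this scheme, because no reference to it is retained.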

Get started

Call agent.toggleMic() to open the WebSocket connection to your backend. The URL is derived from backendHost, or taken from wsUrl when a full URL is provided.

API

`new TAAudioAgent(options)`

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| backendHost | string | - | Hostname for secure WS (e.g. api.example.com). SDK resolves wss://{host}/ws/{sessionId}. Will be provided by the company. |
| wsUrl | string | - | Full WebSocket URL. If provided, overrides backendHost. |
| isAudio | boolean | true | Whether the connection should use audio (mic capture). Set to false for text-only usage. |
| transcript | boolean | false | Enable SDK-managed transcript UI and server hint (?transcript=true). |
| city | string | - | Optional city sent with every outbound user message payload to backend (unless that message already includes city). |
| excludeLocationIds | number[] | - | Array of location IDs to exclude. |
| threshold | number | 500 | RMS threshold for activity detection. |
| inactivityCheck | boolean | false | Enable inactivity auto-mute/auto-disconnect timers. |
| inactivityMuteMs | number | 30000 | Time to auto-mute when inactive. |
| inactivityRestartMs | number | 90000 | Time to auto-disconnect when inactive and already muted. |
| rmsTimeoutMs | number | 5000 | Time after which low RMS may auto-stop recording (when VAD is present). |
| autoMute | boolean | false | Start muted and auto-mute mic while agent is speaking (push-to-talk style). |
| speakerMuted | boolean | false | Start with speaker audio muted. |
| onAnalytics | (name, payload) => void | - | Hook for tracking analytics (connect/close/error). |
| onOpen | (ws: WebSocket) => void | - | Called when the socket opens. |
| onClose | (ev: CloseEvent) => void | - | Called when the socket closes. |
| onError | (ev: Event) => void | - | Called on socket error. |

If both wsUrl and backendHost are missing, the SDK defaults to ws://localhost:8000/ws/….
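
The precedence between wsUrl, backendHost, and the local default can be sketched as a small pure function. This is a reading of the rules stated above, not the SDK's actual code: `describeEndpoint`, `EndpointOptions`, and the `sessionId` parameter are hypothetical, and how the SDK generates session IDs is not documented here. The local default is reported without a URL because the README does not spell out its full path.

```typescript
// Only the two options that influence the endpoint are modelled here.
interface EndpointOptions {
  backendHost?: string;
  wsUrl?: string;
}

type Endpoint =
  | { source: "wsUrl"; url: string }
  | { source: "backendHost"; url: string }
  | { source: "localDefault" };

// Hypothetical helper: applies the documented precedence (wsUrl wins over backendHost).
function describeEndpoint(options: EndpointOptions, sessionId: string): Endpoint {
  if (options.wsUrl) {
    return { source: "wsUrl", url: options.wsUrl };
  }
  if (options.backendHost) {
    // Pattern taken from the options table: wss://{host}/ws/{sessionId}
    return { source: "backendHost", url: `wss://${options.backendHost}/ws/${sessionId}` };
  }
  // Neither option given: the SDK falls back to its localhost default.
  return { source: "localDefault" };
}

console.log(describeEndpoint({ backendHost: "api.example.com" }, "demo-session"));
```

Note that backendHost is given without a protocol, since the SDK prepends wss:// itself.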

Properties & Callbacks

- `onConnected?: (connected: boolean) => void`
  - Notifies when the WebSocket connects or disconnects. You can also poll agent.isConnected().
- `onMicStateChange?: (muted: boolean) => void`
  - Fires when the mic is muted or unmuted (via your UI, inactivity timeout, or VAD stop).
- `onAnyEvent?: (type: string, payload: any, incoming: boolean) => void`
  - Traces every custom event sent or received via the event bus.
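
Outside React, the same callbacks can feed a plain status object. The sketch below declares a local structural subset of the callback surface so it stays self-contained; `AgentCallbacks`, `AgentStatus`, and `bindStatus` are hypothetical names, and the initial muted value of true simply mirrors the Quick Start.

```typescript
// Local structural subset of the callbacks documented above.
interface AgentCallbacks {
  onConnected?: (connected: boolean) => void;
  onMicStateChange?: (muted: boolean) => void;
  onAnyEvent?: (type: string, payload: unknown, incoming: boolean) => void;
}

interface AgentStatus {
  connected: boolean;
  muted: boolean;
  eventLog: string[];
}

// Hypothetical helper: mirrors callback notifications into one object and re-renders.
function bindStatus(
  agent: AgentCallbacks,
  render: (status: AgentStatus) => void
): AgentStatus {
  const status: AgentStatus = { connected: false, muted: true, eventLog: [] };
  agent.onConnected = (connected) => {
    status.connected = connected;
    render(status);
  };
  agent.onMicStateChange = (muted) => {
    status.muted = muted;
    render(status);
  };
  agent.onAnyEvent = (type, _payload, incoming) => {
    status.eventLog.push(`${incoming ? "in" : "out"}: ${type}`);
    render(status);
  };
  return status;
}

// Usage with a stand-in object; a real TAAudioAgent instance would be passed instead.
const fakeAgent: AgentCallbacks = {};
const status = bindStatus(fakeAgent, (s) => console.log(s.connected, s.muted));
fakeAgent.onConnected?.(true);
```

Because each callback is a single assignable property, assigning a new function replaces the previous one rather than adding a second listener.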

Methods

- `toggleMic(): void`
  - If disconnected → connects.
  - If connected & unmuted → mutes (stops recorder).
  - If connected & muted → unmutes (starts recorder).
- `destroy(): void`
  - Closes the socket, stops worklets, and releases the mic. Safe to call multiple times.
- `isConnected(): boolean`
  - Current WS state.
- `sendMessage(message: any): void`
  - Low-level JSON send (auto-stringified). Prefer sendEvent for app-level messages.
- `sendEvent(type: OutgoingEvents, message?: any): void`
  - Sends an app event as { mime_type: "text/plain", event: { type, message } }.
- `onEvent(type: IncomingEvents, handler: (payload?: any) => void): void`
  - Subscribe to a specific app event.
- `offEvent(type: IncomingEvents, handler): void`
  - Unsubscribe.
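
The three toggleMic cases listed above amount to a small decision table. The sketch below models only that decision, which is handy for labelling a button before the user clicks it; `MicAction` and `nextToggleAction` are hypothetical names, and the function does not touch the SDK.

```typescript
type MicAction = "connect" | "mute" | "unmute";

// Hypothetical helper: returns what toggleMic is documented to do in a given state.
function nextToggleAction(state: { connected: boolean; muted: boolean }): MicAction {
  if (!state.connected) {
    return "connect"; // disconnected: toggleMic opens the session
  }
  return state.muted ? "unmute" : "mute";
}

// Example: derive a button label from state tracked via onConnected / onMicStateChange.
const labels: Record<MicAction, string> = {
  connect: "Start conversation",
  mute: "Mute mic",
  unmute: "Unmute mic",
};
console.log(labels[nextToggleAction({ connected: true, muted: true })]);
```

The label strings are placeholders; the point is that the muted flag is irrelevant until a connection exists.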

Event types

// Outgoing (client -> server) and also handled incoming by the SDK
type BiDiEvents = "switchRecommendations";
type IncomingEvents =
  | BiDiEvents
  | "playVideo"
  | "speechStarted"
  | "speechEnded";
type OutgoingEvents = BiDiEvents | "changePlace";

// Recommendation payloads (e.g. for switchRecommendations) are either LocationData or ExperienceData
export type RecommendationData = LocationData | ExperienceData;
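
Since RecommendationData is a union, consumer code needs a way to tell the two shapes apart. A minimal sketch, using trimmed-down local copies of the interfaces documented under Data Formats: it narrows on the presence of the `location` key rather than on the `type` string, because the README gives the `type` values only as examples. `isLocationData` and `recommendationId` are hypothetical helpers, not SDK exports.

```typescript
// Trimmed-down local copies; the real interfaces carry many more fields.
interface LocationData {
  type: string;
  location: { id: number };
}
interface ExperienceData {
  type: string;
  experience: { id: number };
}
type RecommendationData = LocationData | ExperienceData;

// Type guard: a LocationData item is the one that has a `location` property.
function isLocationData(item: RecommendationData): item is LocationData {
  return "location" in item;
}

// Hypothetical helper: reads the ID regardless of which variant was received.
function recommendationId(item: RecommendationData): number {
  return isLocationData(item) ? item.location.id : item.experience.id;
}

const items: RecommendationData[] = [
  { type: "location", location: { id: 1 } },
  { type: "experience", experience: { id: 2 } },
];
const ids = items.map((item) => recommendationId(item));
console.log(ids);
```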

Data Formats

The SDK uses two main data structures for recommendations: LocationData and ExperienceData.

LocationData

Represents a physical location (restaurant, attraction, hotel, etc.) with comprehensive details:

interface LocationData {
  location: Location; // Core location details
  type: string; // Type identifier (e.g., "location")
  review_sources?: ReviewSource[]; // Review snippets
  photos?: PhotosData; // Photo gallery data
  reviews?: ReviewData; // Detailed reviews
  enrichment_status?: string; // Data enrichment status
  enrichment_time_ms?: number; // Time taken to enrich data
}

interface Location {
  id: number;
  names: Name[]; // Multi-language names
  coordinates?: Coordinates; // GPS coordinates
  phone_numbers?: PhoneNumber[];
  addresses?: Address[];
  categories: Category[]; // POI categories/tags
  urls?: Urls; // TripAdvisor & official website URLs
  status?: Status; // Open/closed status
  opening_hours?: OpeningHours; // Business hours
  traveler_ratings?: TravelerRatings;
  overall_traveller_ratings?: OverallTravellerRatings;
  rankings?: Ranking[]; // TripAdvisor rankings
  awards?: Award[]; // Travel awards received
  descriptions?: Description[]; // Multi-language descriptions
  photos?: Photos; // Photo count info
  recommended_visit_length?: number; // Suggested visit duration
  neighborhoods?: Neighborhood[];
}

ExperienceData

Represents activities, tours, or experiences:

interface ExperienceData {
  experience: Experience; // Core experience details
  type: string; // Type identifier (e.g., "experience")
  photos?: PhotosData; // Photo gallery data
  reviews?: ReviewData; // Detailed reviews
  enrichment_status?: string; // Data enrichment status
  enrichment_time_ms?: number; // Time taken to enrich data
}

interface Experience {
  id: number;
  names: Name[]; // Multi-language names
  overall_traveller_ratings: OverallTravellerRatings;
  coordinates?: Coordinates; // GPS coordinates
  photo_url: string; // Main photo URL
  url: string; // TripAdvisor URL
  description: Description[]; // Multi-language descriptions
}

Common Sub-types

interface Name {
  language: string;
  value: string;
  primary?: boolean;
}

interface Coordinates {
  latitude: number;
  longitude: number;
}

interface OverallTravellerRatings {
  bubble_rating: number; // Rating out of 5
  total_review_count: number;
}

interface Description {
  value: string;
  language: string;
}

For complete type definitions, refer to the SDK's TypeScript declarations.
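
Because names come as a multi-language array, UI code usually has to pick one entry to display. The sketch below shows one possible selection rule: prefer the requested language, then an entry flagged primary, then the first entry. `pickName` is a hypothetical helper, and the assumption that `language` holds short codes such as "en" is not confirmed by this README.

```typescript
// Same shape as the Name sub-type documented above.
interface Name {
  language: string;
  value: string;
  primary?: boolean;
}

// Hypothetical helper: chooses a display name, or undefined when the list is empty.
function pickName(names: Name[], language: string): string | undefined {
  const inLanguage = names.filter((name) => name.language === language);
  // Fall back to all names when nothing matches the requested language.
  const pool = inLanguage.length > 0 ? inLanguage : names;
  const chosen = pool.find((name) => name.primary) ?? pool[0];
  return chosen?.value;
}

// Illustrative data only.
const sampleNames: Name[] = [
  { language: "en", value: "Eiffel Tower", primary: true },
  { language: "fr", value: "Tour Eiffel" },
];
console.log(pickName(sampleNames, "fr"));
```

The same approach applies to Description arrays, which carry the same language and value fields.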

Styling

Import once:

import "ta-agent-sdk/styles.css";

It includes:

- Basic transcript UI
- Basic flash error styling

You can override styles in your app CSS if needed.

Environment & Browser Requirements

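- A user gesture is required before audio playback (browsers block autoplay). Call agent.toggleMic() from a click/tap handler or the first user action.
- Serve the page over HTTPS: browsers only expose microphone access in secure contexts (localhost counts as secure during development).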
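
Before showing a mic button, an app can check these preconditions itself. The following sketch keeps the check as a pure function over a small environment object so it can run outside a browser; `BrowserEnv` and `micBlocker` are hypothetical names and not part of the SDK.

```typescript
// Facts about the current page, gathered by the caller.
interface BrowserEnv {
  isSecureContext: boolean;
  hasGetUserMedia: boolean;
}

// Hypothetical helper: returns a human-readable reason the mic cannot work, or null if none.
function micBlocker(env: BrowserEnv): string | null {
  if (!env.isSecureContext) {
    return "Page is not a secure context (use HTTPS or localhost).";
  }
  if (!env.hasGetUserMedia) {
    return "navigator.mediaDevices.getUserMedia is unavailable.";
  }
  return null;
}

// In a browser, the environment could be collected like this:
//   const env = {
//     isSecureContext: window.isSecureContext,
//     hasGetUserMedia: !!navigator.mediaDevices?.getUserMedia,
//   };
console.log(micBlocker({ isSecureContext: false, hasGetUserMedia: true }));
```

A null result only means the prerequisites are present; the user can still deny the permission prompt when toggleMic is called.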