ta-agent-sdk
v1.0.87
SDK for Cashew.ai Voice agent
TA Agent SDK
A lightweight TypeScript SDK for building realtime, voice-forward travel experiences in the browser. It wraps mic recording, audio playback, a resilient WebSocket session, and a simple event bus—so you can focus on your UI.
- 🔁 WebSocket session management (connect, mute/unmute, inactivity)
- 📡 Typed event bus (bi-directional custom events)
- ⚙️ Framework-agnostic (works fine in React, Vue, plain JS)
Package: ta-agent-sdk
Types: included
CSS: ta-agent-sdk/styles.css
Installation
```bash
npm i ta-agent-sdk
# or
yarn add ta-agent-sdk
# or
pnpm add ta-agent-sdk
```

Import the CSS once in your app (it includes styles for the transcript and error UI the SDK may render):

```ts
import "ta-agent-sdk/styles.css";
```

Quick Start (React example)
```tsx
import { useEffect, useMemo, useState } from "react";
import TAAudioAgent, { type RecommendationData } from "ta-agent-sdk";
import "ta-agent-sdk/styles.css";
import { useTranscriptMessages } from "ta-agent-sdk/react";

function App() {
  const [connected, setConnected] = useState(false);
  const [muted, setMuted] = useState(true);
  const [recommendations, setRecommendations] = useState<RecommendationData[]>([]);
  const [events, setEvents] = useState<
    { type: string; payload: any; incoming: boolean }[]
  >([]);

  // Create a single agent instance for the lifetime of the component.
  const agent = useMemo(() => {
    return new TAAudioAgent({
      backendHost: "[Backend host URL, without the protocol]",
      agent: "[Agent name provided by Cashew.ai]",
    });
  }, []);

  // Transcript messages for your own transcript UI.
  const { messages, appendMessage } = useTranscriptMessages({ agent });

  useEffect(() => {
    const playVideo = () => {
      const video = document.querySelector("video");
      // play() returns a promise that rejects if the browser blocks playback.
      video?.play().catch(() => {});
    };
    const changeRecommendations = (data: RecommendationData[]) => {
      setRecommendations(data);
    };

    // Connection and mic state callbacks.
    agent.onConnected = setConnected;
    agent.onMicStateChange = setMuted;

    // App events from the server.
    agent.onEvent("switchRecommendations", changeRecommendations);
    agent.onEvent("playVideo", playVideo);

    // Trace every event sent or received (newest first).
    agent.onAnyEvent = (type, payload, incoming) => {
      setEvents((prev) => [{ type, payload, incoming }, ...prev]);
    };

    return () => {
      agent.offEvent("switchRecommendations", changeRecommendations);
      agent.offEvent("playVideo", playVideo);
    };
  }, [agent]);

  // Render your UI here (recommendations, events, messages, ...).
  // toggleMic() connects on the first click, then toggles mute.
  return (
    <button onClick={() => agent.toggleMic()}>
      {!connected ? "Start" : muted ? "Unmute mic" : "Mute mic"}
    </button>
  );
}

export default App;
```
Event System
The SDK provides a typed event bus for bidirectional communication with your backend.
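Under Methods, `sendEvent` is documented as wrapping each app event as `{ mime_type: "text/plain", event: { type, message } }`, and `sendMessage` as JSON-stringifying what it sends. The minimal sketch below reproduces that wire format, which can help when writing or mocking the backend. `buildEventEnvelope` is a hypothetical helper, not an SDK export, and `12345` is a placeholder location ID.

```typescript
// Hypothetical helper (not part of the SDK): mirrors the envelope that
// sendEvent() is documented to produce.
interface EventEnvelope<T = unknown> {
  mime_type: "text/plain";
  event: { type: string; message?: T };
}

function buildEventEnvelope<T>(type: string, message?: T): EventEnvelope<T> {
  return { mime_type: "text/plain", event: { type, message } };
}

// JSON.stringify drops properties whose value is undefined, so an event sent
// without a message serializes with no "message" key. Whether the SDK
// behaves the same way is an assumption.
const changePlaceFrame = JSON.stringify(buildEventEnvelope("changePlace", 12345));
const noMessageFrame = JSON.stringify(buildEventEnvelope("switchRecommendations"));

console.log(changePlaceFrame);
console.log(noMessageFrame);
```

A backend that parses incoming frames can therefore switch on `event.type` and read `event.message` as optional.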
Sending Events
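```ts
// Send a changePlace event to the server. The payload is either a location
// object or a numeric location ID.
agent.sendEvent("changePlace", location_obj); // location_obj: a location object from your app
agent.sendEvent("changePlace", 12345); // or a location ID (placeholder value)
```

Receiving Events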
```ts
// Listen for specific events from the server
agent.onEvent("switchRecommendations", (recommendations) => {
  setRecommendations(recommendations);
});

agent.onEvent("playVideo", () => {
  const video = document.querySelector("video");
  video?.play();
});
```

Get started
Use agent.toggleMic() to open a WebSocket to your backend (derived from backendHost or a full wsUrl).
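The connection URL rules are spread across the options table: a full `wsUrl` wins, otherwise `backendHost` becomes `wss://{host}/ws/{sessionId}`, and `transcript: true` adds a `?transcript=true` hint. A minimal sketch of those rules, assuming the session ID is supplied by the caller (how the SDK generates it is not documented). `resolveWsUrl` is hypothetical, not the SDK's code, and the handling of the localhost fallback and of the transcript hint on a custom `wsUrl` are assumptions.

```typescript
// Hypothetical re-statement of the documented URL rules; not the SDK's code.
interface ConnectionOptions {
  backendHost?: string; // e.g. "api.example.com", no protocol
  wsUrl?: string; // full WebSocket URL; overrides backendHost
  transcript?: boolean; // adds the ?transcript=true server hint
}

function resolveWsUrl(opts: ConnectionOptions, sessionId: string): string {
  const id = encodeURIComponent(sessionId);
  let url: string;
  if (opts.wsUrl) {
    url = opts.wsUrl;
  } else if (opts.backendHost) {
    url = `wss://${opts.backendHost}/ws/${id}`;
  } else {
    // The README elides this path ("ws://localhost:8000/ws/…");
    // appending the session ID here is an assumption.
    url = `ws://localhost:8000/ws/${id}`;
  }
  if (opts.transcript) {
    // Assumption: the hint is appended as a query parameter, even to a custom wsUrl.
    url += (url.includes("?") ? "&" : "?") + "transcript=true";
  }
  return url;
}
```

Note that `backendHost` must not include `wss://`; the scheme is always added for you.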
API
new TAAudioAgent(options)
Options
| Option | Type | Default | Description |
| --------------------- | ------------------------ | ------- | --------------------------------------------------------------------------------------------------------------------------- |
| backendHost | string | — | Hostname for secure WS (e.g. api.example.com). SDK resolves wss://{host}/ws/{sessionId}. Will be provided by the company. |
| wsUrl | string | — | Full WebSocket URL. If provided, overrides backendHost. |
| isAudio | boolean | true | Whether the connection should use audio (mic capture). Set to false for text-only usage. |
| transcript | boolean | false | Enable SDK-managed transcript UI and server hint (?transcript=true). |
| city | string | — | Optional city sent with every outbound user message payload to backend (unless that message already includes city). |
| excludeLocationIds | number[] | — | Array of location IDs to exclude. |
| threshold | number | 500 | RMS threshold for activity detection. |
| inactivityCheck | boolean | false | Enable inactivity auto-mute/auto-disconnect timers. |
| inactivityMuteMs      | number                   | 30000   | Inactivity time (ms) before the mic is auto-muted.                                                                          |
| inactivityRestartMs   | number                   | 90000   | Inactivity time (ms) before auto-disconnect when the mic is already muted.                                                  |
| rmsTimeoutMs          | number                   | 5000    | Time (ms) of low RMS after which recording may auto-stop (when VAD is present).                                             |
| autoMute | boolean | false | Start muted and auto-mute mic while agent is speaking (push-to-talk style). |
| speakerMuted | boolean | false | Start with speaker audio muted. |
| onAnalytics | (name, payload) => void | — | Hook for tracking analytics (connect/close/error). |
| onOpen | (ws: WebSocket) => void | — | Called when the socket opens. |
| onClose | (ev: CloseEvent) => void | — | Called when the socket closes. |
| onError | (ev: Event) => void | — | Called on socket error. |
If both wsUrl and backendHost are missing, the SDK defaults to ws://localhost:8000/ws/….
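The `threshold` option is an RMS (root-mean-square) level used for activity detection. The sketch below shows the standard RMS calculation on one block of samples. It assumes 16-bit PCM samples (an inference from the default of 500, which Float32 samples in [-1, 1] could never reach) and a "greater than or equal" comparison; how the SDK computes and compares RMS internally is not documented, so `rms` and `isActive` are illustrative only.

```typescript
// Root-mean-square level of one block of 16-bit PCM samples.
function rms(samples: Int16Array): number {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

// Assumption: a block counts as activity when its RMS reaches the threshold
// (default 500, matching the documented `threshold` default).
function isActive(samples: Int16Array, threshold = 500): boolean {
  return rms(samples) >= threshold;
}

const quiet = new Int16Array([100, -100, 100, -100]); // RMS 100
const speech = new Int16Array([600, -600, 600, -600]); // RMS 600
console.log(isActive(quiet), isActive(speech)); // false true
```

Raising `threshold` makes detection less sensitive to background noise; lowering it picks up quieter speech.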
Properties & Callbacks
`onConnected?: (connected: boolean) => void`
Notifies when the WebSocket connects or disconnects. You can also poll `agent.isConnected()`.

`onMicStateChange?: (muted: boolean) => void`
Fires when the mic is muted or unmuted (via your UI, an inactivity timeout, or a VAD stop).

`onAnyEvent?: (type: string, payload: any, incoming: boolean) => void`
Traces every custom event sent or received via the event bus.
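Because `onAnyEvent` fires for every event in both directions, a trace log that only ever grows (as in the Quick Start's `[{ type, payload, incoming }, ...prev]`) keeps growing for the whole session. A minimal sketch of a capped log; `pushBounded` and the cap of 100 are hypothetical choices, not part of the SDK.

```typescript
// Shape of one traced event, matching the onAnyEvent callback arguments.
interface TracedEvent {
  type: string;
  payload: unknown;
  incoming: boolean;
}

// Returns a new array with `entry` first, keeping at most `max` entries.
// Returning a new array (instead of mutating) suits React state setters.
function pushBounded(log: TracedEvent[], entry: TracedEvent, max = 100): TracedEvent[] {
  return [entry, ...log].slice(0, max);
}

let log: TracedEvent[] = [];
for (const type of ["speechStarted", "playVideo", "speechEnded"]) {
  log = pushBounded(log, { type, payload: undefined, incoming: true }, 2);
}
console.log(log.map((e) => e.type)); // [ 'speechEnded', 'playVideo' ]
```

In the React example this becomes `setEvents((prev) => pushBounded(prev, { type, payload, incoming }))` inside `agent.onAnyEvent`.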
Methods
`toggleMic(): void`
- If disconnected → connects.
- If connected and unmuted → mutes (stops the recorder).
- If connected and muted → unmutes (starts the recorder).

`destroy(): void`
Closes the socket, stops the audio worklets, and releases the mic. Safe to call multiple times.

`isConnected(): boolean`
Returns the current WebSocket connection state.

`sendMessage(message: any): void`
Low-level JSON send (the message is stringified automatically). Prefer `sendEvent` for app-level messages.

`sendEvent(type: OutgoingEvents, message?: any): void`
Sends an app event as `{ mime_type: "text/plain", event: { type, message } }`.

`onEvent(type: IncomingEvents, handler: (payload?: any) => void): void`
Subscribes to a specific app event.

`offEvent(type: IncomingEvents, handler): void`
Unsubscribes a handler registered with `onEvent`.
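`onEvent` handlers receive `payload?: any`, so the compiler cannot check payload shapes. One option is a small typed wrapper that also returns an unsubscribe function, which fits React's effect cleanup. A minimal sketch under stated assumptions: `subscribe`, `IncomingPayloads`, and `EventSubscriber` are hypothetical names, only the `switchRecommendations` payload type is given in this README, and the real agent is assumed to match `EventSubscriber` structurally.

```typescript
// Stand-in for the SDK's RecommendationData (see Data Formats), trimmed so
// this sketch compiles on its own.
type RecommendationData = { type: string };

// Assumed payload type for each incoming event; check the SDK's declarations.
interface IncomingPayloads {
  switchRecommendations: RecommendationData[];
  playVideo: undefined;
  speechStarted: undefined;
  speechEnded: undefined;
}

// The two agent methods the helper relies on, described structurally.
interface EventSubscriber {
  onEvent(type: string, handler: (payload?: any) => void): void;
  offEvent(type: string, handler: (payload?: any) => void): void;
}

// Registers a typed handler and returns a function that removes it again.
// The wrapper is kept in a closure so offEvent receives the exact function
// that was passed to onEvent.
function subscribe<K extends keyof IncomingPayloads>(
  agent: EventSubscriber,
  type: K,
  handler: (payload: IncomingPayloads[K]) => void
): () => void {
  const wrapped = (payload?: any) => handler(payload as IncomingPayloads[K]);
  agent.onEvent(type, wrapped);
  return () => agent.offEvent(type, wrapped);
}
```

In a React component, `useEffect(() => subscribe(agent, "playVideo", playVideo), [agent])` registers the handler and uses the returned function as the effect cleanup.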
Event types
```ts
// Bidirectional events: sent by the client (client -> server) and also
// handled by the SDK when received from the server
type BiDiEvents = "switchRecommendations";

type IncomingEvents =
  | BiDiEvents
  | "playVideo"
  | "speechStarted"
  | "speechEnded";

type OutgoingEvents = BiDiEvents | "changePlace";

// Recommendation payloads (e.g. for switchRecommendations) are either
// LocationData or ExperienceData
export type RecommendationData = LocationData | ExperienceData;
```

Data Formats
The SDK uses two main data structures for recommendations: LocationData and ExperienceData.
LocationData
Represents a physical location (restaurant, attraction, hotel, etc.) with comprehensive details:
```ts
interface LocationData {
  location: Location; // Core location details
  type: string; // Type identifier (e.g., "location")
  review_sources?: ReviewSource[]; // Review snippets
  photos?: PhotosData; // Photo gallery data
  reviews?: ReviewData; // Detailed reviews
  enrichment_status?: string; // Data enrichment status
  enrichment_time_ms?: number; // Time taken to enrich data
}

interface Location {
  id: number;
  names: Name[]; // Multi-language names
  coordinates?: Coordinates; // GPS coordinates
  phone_numbers?: PhoneNumber[];
  addresses?: Address[];
  categories: Category[]; // POI categories/tags
  urls?: Urls; // TripAdvisor & official website URLs
  status?: Status; // Open/closed status
  opening_hours?: OpeningHours; // Business hours
  traveler_ratings?: TravelerRatings;
  overall_traveller_ratings?: OverallTravellerRatings;
  rankings?: Ranking[]; // TripAdvisor rankings
  awards?: Award[]; // Travel awards received
  descriptions?: Description[]; // Multi-language descriptions
  photos?: Photos; // Photo count info
  recommended_visit_length?: number; // Suggested visit duration
  neighborhoods?: Neighborhood[];
}
```

ExperienceData
Represents activities, tours, or experiences:
```ts
interface ExperienceData {
  experience: Experience; // Core experience details
  type: string; // Type identifier (e.g., "experience")
  photos?: PhotosData; // Photo gallery data
  reviews?: ReviewData; // Detailed reviews
  enrichment_status?: string; // Data enrichment status
  enrichment_time_ms?: number; // Time taken to enrich data
}

interface Experience {
  id: number;
  names: Name[]; // Multi-language names
  overall_traveller_ratings: OverallTravellerRatings;
  coordinates?: Coordinates; // GPS coordinates
  photo_url: string; // Main photo URL
  url: string; // TripAdvisor URL
  description: Description[]; // Multi-language descriptions
}
```

Common Sub-types
```ts
interface Name {
  language: string;
  value: string;
  primary?: boolean;
}

interface Coordinates {
  latitude: number;
  longitude: number;
}

interface OverallTravellerRatings {
  bubble_rating: number; // Rating out of 5
  total_review_count: number;
}

interface Description {
  value: string;
  language: string;
}
```

For complete type definitions, refer to the SDK's TypeScript declarations.
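Both recommendation shapes carry a `Name[]` list under a different core key (`location` or `experience`), so UI code usually needs to narrow the union and pick one name. A minimal sketch, assuming a preference order of exact language match, then the `primary` name, then the first entry; `isLocation` and `displayName` are hypothetical helpers, and the types are trimmed copies of the ones above. It narrows on the presence of the `location` key rather than on the `type` string, since the README only gives `type` values as examples.

```typescript
// Trimmed copies of the README's types (only the fields these helpers read).
interface Name {
  language: string;
  value: string;
  primary?: boolean;
}
interface LocationData {
  type: string;
  location: { id: number; names: Name[] };
}
interface ExperienceData {
  type: string;
  experience: { id: number; names: Name[] };
}
type RecommendationData = LocationData | ExperienceData;

// Narrow the union by checking which core key is present.
function isLocation(item: RecommendationData): item is LocationData {
  return "location" in item;
}

// Pick a name: exact language match, then the primary name, then the first entry.
function displayName(item: RecommendationData, language = "en"): string {
  const core = isLocation(item) ? item.location : item.experience;
  const names = core.names;
  const match =
    names.find((n) => n.language === language) ??
    names.find((n) => n.primary) ??
    names[0];
  return match?.value ?? "";
}
```

For example, a `switchRecommendations` handler could map each item to `displayName(item, navigator.language.slice(0, 2))` before rendering.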
Styling
Import once:

```ts
import "ta-agent-sdk/styles.css";
```

It includes:
- Basic transcript UI
- Basic flash error styling
You can override styles in your app CSS if needed.
Environment & Browser Requirements
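- A user gesture is required before audio playback (browsers block autoplay). Call `agent.toggleMic()` from a click/tap handler or the first user action.
- HTTPS is required for mic access: browsers only expose `navigator.mediaDevices` (and therefore `getUserMedia`) in secure contexts, i.e. HTTPS or `http://localhost`.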
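Checking these requirements before calling `toggleMic()` lets you show a clear message instead of a silent failure. A minimal sketch, assuming you pass in the browser's `window`; `micUnavailableReason` and `MicEnv` are hypothetical names, and the check covers only secure-context and API availability, not the user's permission decision.

```typescript
// Minimal structural view of the browser globals this check reads.
interface MicEnv {
  isSecureContext?: boolean;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Returns a human-readable reason the mic cannot be used, or null if the
// required APIs look available.
function micUnavailableReason(env: MicEnv): string | null {
  if (env.isSecureContext === false) {
    return "This page is not a secure context; serve it over HTTPS (or localhost).";
  }
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") {
    return "navigator.mediaDevices.getUserMedia is not available in this browser.";
  }
  return null;
}
```

Call `micUnavailableReason(window)` in the same click handler that calls `agent.toggleMic()`, so the user-gesture requirement is still met when the check passes.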
License
MIT © Cashew.ai Team
