# HU SDK (TypeScript)
TypeScript SDK for building voice agents on the Voice Gateway.
## Installation

```bash
npm install @eleven-am/hu-sdk
```

## Quick Start

```typescript
import { VoiceAgent, ConnectionModes } from '@eleven-am/hu-sdk';

const agent = new VoiceAgent({
  apiKey: 'sk-voice-xxx',
  gatewayUrl: 'wss://gateway.example.com',
  mode: ConnectionModes.WebSocket,
});

agent
  .onUtterance(async (ctx) => {
    console.log(`User said: ${ctx.text}`);

    // Stream response
    ctx.sendDelta('Hello ');
    ctx.sendDelta('World!');
    ctx.done();
  })
  .onInterrupt((sessionId, reason) => {
    console.log(`Interrupted: ${reason}`);
  })
  .onError((error) => {
    console.error('Error:', error);
  });

await agent.connect();
```

## Streaming with LLM
```typescript
import { VoiceAgent } from '@eleven-am/hu-sdk';
import OpenAI from 'openai';

const openai = new OpenAI();

const agent = new VoiceAgent({
  apiKey: process.env.VOICE_API_KEY!,
  gatewayUrl: process.env.GATEWAY_URL!,
});

agent.onUtterance(async (ctx) => {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: ctx.text }],
    stream: true,
  });

  for await (const chunk of stream) {
    // Stop generating as soon as the user interrupts
    if (ctx.abortSignal.aborted) break;

    const delta = chunk.choices[0]?.delta?.content;
    if (delta) {
      ctx.sendDelta(delta);
    }
  }

  ctx.done();
});

await agent.connect();
```

## Using Vision (Video Frames)
Agents with vision scope can request video frames from the user's session:
```typescript
agent.onUtterance(async (ctx) => {
  // Check if vision context is available
  if (ctx.vision?.available) {
    console.log('Auto-analyzed:', ctx.vision.description);
  }

  // Request raw frames for custom analysis
  const frames = await ctx.requestFrames({
    limit: 5,
    rawBase64: true,
  });

  if (frames.frames) {
    for (const frame of frames.frames) {
      // frame.base64 contains the image data
      // frame.timestamp is when it was captured
    }
  }

  // Or get pre-analyzed descriptions
  const analyzed = await ctx.requestFrames({ limit: 3 });
  if (analyzed.descriptions) {
    console.log('Frame descriptions:', analyzed.descriptions);
  }

  ctx.done('I can see what you\'re showing me!');
});
```

## Using Memory
Agents with memory scope can query the user's stored facts:
```typescript
agent.onUtterance(async (ctx) => {
  // Query relevant memories
  const memories = await ctx.queryMemory({
    query: ctx.text,
    topK: 5,
    threshold: 0.7,
    types: ['preference', 'fact'],
  });

  if (memories.facts && memories.facts.length > 0) {
    const context = memories.facts
      .map((f) => f.content)
      .join('\n');

    // Use memories as context for the LLM
    // (generateWithContext is a placeholder for your own generation logic)
    const response = await generateWithContext(ctx.text, context);
    ctx.done(response);
  } else {
    ctx.done('I don\'t have any relevant memories about that.');
  }
});
```

## Routing Filters
Agents can register filters to control which utterances are routed to them. Filters are evaluated server-side for efficient routing in multi-agent setups:
```typescript
await agent.connect();

// Register filters after connecting
agent.registerFilters({
  // Match utterances containing these entity types or values
  entities: ['PERSON', 'John'],
  // Match utterances about these topics
  topics: ['weather', 'travel'],
  // Match utterances containing these keywords
  keywords: ['urgent', 'help'],
  // Match specific speakers
  speakers: ['user'],
  // Number of previous utterances to include for context (used with the "filtered" tier)
  includeContext: 5,
  // Data access tier - controls what data the agent receives:
  //   "full": everything (whole conversation stream)
  //   "filtered": matching messages + context window (default)
  //   "summary": just {entities, topics} - no text
  tier: 'filtered',
});
```

Filters can be updated at any time while connected. The gateway will apply the new filters to subsequent utterances.
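For example, an agent could narrow its routing mid-session by calling `registerFilters` again (a minimal sketch; the topic and keyword values below are illustrative):

```typescript
// Later, while still connected: update the routing filters.
// The values here are illustrative only.
agent.registerFilters({
  topics: ['billing'],
  keywords: ['invoice', 'refund'],
  includeContext: 3,
  tier: 'filtered',
});
```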
## Handling Interrupts
When the user starts speaking, the gateway sends an interrupt. Use the abort signal to stop processing:
```typescript
agent.onUtterance(async (ctx) => {
  // streamResponse is a placeholder for your own streaming generation logic
  for await (const chunk of streamResponse(ctx.text)) {
    // Check before each operation
    if (ctx.abortSignal.aborted) {
      console.log('User interrupted, stopping');
      return;
    }
    ctx.sendDelta(chunk);
  }
  ctx.done();
});

agent.onInterrupt((sessionId, reason) => {
  // reason: "new_user_speech" | "lost_arbitration" | "supersede"
  console.log(`Session ${sessionId} interrupted: ${reason}`);
});
```

## Configuration
```typescript
interface VoiceAgentConfig {
  apiKey: string;                 // Your API key (sk-voice-xxx)
  gatewayUrl: string;             // Gateway WebSocket/HTTP URL
  mode?: ConnectionMode;          // 'websocket' (default) or 'sse'
  reconnect?: boolean;            // Auto-reconnect on disconnect (default: true)
  reconnectInterval?: number;     // Base reconnect delay in ms (default: 1000)
  maxReconnectAttempts?: number;  // Max reconnect attempts (default: unlimited)
  logger?: Logger;                // Custom logger (default: console)
}
```
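For example, a configuration that tunes the reconnect behaviour might look like this (a minimal sketch; the URL and retry values are placeholders):

```typescript
import { VoiceAgent, ConnectionModes } from '@eleven-am/hu-sdk';

const agent = new VoiceAgent({
  apiKey: process.env.VOICE_API_KEY!,
  gatewayUrl: 'wss://gateway.example.com',  // placeholder URL
  mode: ConnectionModes.WebSocket,
  reconnect: true,            // retry automatically on disconnect
  reconnectInterval: 2000,    // base delay between retries, in ms
  maxReconnectAttempts: 10,   // stop retrying after 10 attempts
});
```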
## Context API

The `UtteranceContext` provides:
| Property | Type | Description |
|----------|------|-------------|
| text | string | The user's utterance text |
| isFinal | boolean | Whether this is a final transcript |
| user | UserInfo \| undefined | User info (if profile/email/location scope) |
| vision | VisionContext \| undefined | Vision context (if vision scope) |
| entities | EntityInfo[] | Entities extracted from the utterance (NER) |
| topics | string[] | Topics detected in the utterance |
| context | ContextUtterance[] | Previous utterances (if includeContext filter set) |
| sessionId | string | Current session ID |
| requestId | string | Current request ID |
| userId | string \| undefined | User ID |
| timestamp | Date | When the utterance was received |
| abortSignal | AbortSignal | Signals when interrupted |

| Method | Description |
|--------|-------------|
| sendDelta(delta) | Stream a text chunk to the user |
| done(finalText?) | Complete the response |
| requestFrames(options?) | Request video frames (async) |
| queryMemory(options) | Query user memories (async) |
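
As a quick illustration of the read-only properties, a handler can inspect what the gateway extracted before responding (a minimal sketch; the log output is illustrative):

```typescript
agent.onUtterance(async (ctx) => {
  // Metadata extracted by the gateway for this utterance
  console.log('Topics:', ctx.topics);
  console.log('Entities:', JSON.stringify(ctx.entities));

  // Previous utterances are only populated when includeContext is set in the filters
  console.log('Context window size:', ctx.context.length);

  ctx.done(`Handled request ${ctx.requestId}.`);
});
```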
## Connection Modes

### WebSocket (recommended)
Full-duplex communication, lower latency:
```typescript
const agent = new VoiceAgent({
  mode: ConnectionModes.WebSocket,
  // ...
});
```

### Server-Sent Events (SSE)
One-way server push with HTTP POST for sending. Works in browser environments:
```typescript
const agent = new VoiceAgent({
  mode: ConnectionModes.SSE,
  // ...
});
```

## License
MIT
