# Voice Agent SDK (TypeScript)
TypeScript SDK for building voice agents on the Voice Gateway.
## Installation
```bash
npm install @eleven-am/voice-agent
```

## Quick Start

```ts
import { VoiceAgent, ConnectionModes } from '@eleven-am/voice-agent';

const agent = new VoiceAgent({
  apiKey: 'sk-voice-xxx',
  gatewayUrl: 'wss://gateway.example.com',
  mode: ConnectionModes.WebSocket,
});

agent
  .onUtterance(async (ctx) => {
    console.log(`User said: ${ctx.text}`);

    // Stream the response in chunks
    ctx.sendDelta('Hello ');
    ctx.sendDelta('World!');
    ctx.done();
  })
  .onInterrupt((sessionId, reason) => {
    console.log(`Interrupted: ${reason}`);
  })
  .onError((error) => {
    console.error('Error:', error);
  });

await agent.connect();
```

## Streaming with LLM

```ts
import { VoiceAgent } from '@eleven-am/voice-agent';
import OpenAI from 'openai';

const openai = new OpenAI();

const agent = new VoiceAgent({
  apiKey: process.env.VOICE_API_KEY!,
  gatewayUrl: process.env.GATEWAY_URL!,
});

agent.onUtterance(async (ctx) => {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: ctx.text }],
    stream: true,
  });

  for await (const chunk of stream) {
    // Stop streaming as soon as the user interrupts
    if (ctx.abortSignal.aborted) break;

    const delta = chunk.choices[0]?.delta?.content;
    if (delta) {
      ctx.sendDelta(delta);
    }
  }

  ctx.done();
});

await agent.connect();
```

## Using Vision (Video Frames)
Agents with vision scope can request video frames from the user's session:
```ts
agent.onUtterance(async (ctx) => {
  // Check if vision context is available
  if (ctx.vision?.available) {
    console.log('Auto-analyzed:', ctx.vision.description);
  }

  // Request raw frames for custom analysis
  const frames = await ctx.requestFrames({
    limit: 5,
    rawBase64: true,
  });

  if (frames.frames) {
    for (const frame of frames.frames) {
      // frame.base64 contains the image data
      // frame.timestamp is when it was captured
    }
  }

  // Or get pre-analyzed descriptions
  const analyzed = await ctx.requestFrames({ limit: 3 });
  if (analyzed.descriptions) {
    console.log('Frame descriptions:', analyzed.descriptions);
  }

  ctx.done("I can see what you're showing me!");
});
```
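The raw frames can be fed to any vision-capable model for custom analysis. A minimal sketch using the OpenAI client from the streaming example (the `gpt-4o` model choice and the JPEG MIME type are illustrative assumptions, not part of this SDK):

```ts
import OpenAI from 'openai';

const openai = new OpenAI();

agent.onUtterance(async (ctx) => {
  // Fetch a few raw frames to analyze ourselves
  const frames = await ctx.requestFrames({ limit: 3, rawBase64: true });

  if (!frames.frames?.length) {
    ctx.done("I can't see anything right now.");
    return;
  }

  // Attach the frames to the utterance as data URLs (assuming JPEG frames)
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: ctx.text },
          ...frames.frames.map((frame) => ({
            type: 'image_url' as const,
            image_url: { url: `data:image/jpeg;base64,${frame.base64}` },
          })),
        ],
      },
    ],
  });

  ctx.done(response.choices[0]?.message?.content ?? '');
});
```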
## Using Memory

Agents with memory scope can query the user's stored facts:

```ts
agent.onUtterance(async (ctx) => {
  // Query memories relevant to the current utterance
  const memories = await ctx.queryMemory({
    query: ctx.text,
    topK: 5,
    threshold: 0.7,
    types: ['preference', 'fact'],
  });

  if (memories.facts && memories.facts.length > 0) {
    const context = memories.facts
      .map((f) => f.content)
      .join('\n');

    // Use memories as context for the LLM
    const response = await generateWithContext(ctx.text, context);
    ctx.done(response);
  } else {
    ctx.done("I don't have any relevant memories about that.");
  }
});
```
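`generateWithContext` above is not part of the SDK. One possible implementation, reusing the OpenAI client from the streaming example (the helper name and prompt wording are illustrative):

```ts
import OpenAI from 'openai';

const openai = new OpenAI();

// Hypothetical helper: answer the utterance using retrieved memories as context.
async function generateWithContext(text: string, context: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      { role: 'system', content: `Known facts about the user:\n${context}` },
      { role: 'user', content: text },
    ],
  });
  return response.choices[0]?.message?.content ?? '';
}
```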
## Handling Interrupts

When the user starts speaking, the gateway sends an interrupt. Use the abort signal to stop processing:

```ts
agent.onUtterance(async (ctx) => {
  for await (const chunk of streamResponse(ctx.text)) {
    // Check before each operation
    if (ctx.abortSignal.aborted) {
      console.log('User interrupted, stopping');
      return;
    }
    ctx.sendDelta(chunk);
  }
  ctx.done();
});

agent.onInterrupt((sessionId, reason) => {
  // reason: "new_user_speech" | "lost_arbitration" | "supersede"
  console.log(`Session ${sessionId} interrupted: ${reason}`);
});
```
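`streamResponse` above stands in for whatever token stream your LLM produces. A sketch built on the OpenAI streaming example (the helper name and generator shape are illustrative):

```ts
import OpenAI from 'openai';

const openai = new OpenAI();

// Hypothetical helper: yield response chunks as the model streams them.
async function* streamResponse(text: string): AsyncGenerator<string> {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: text }],
    stream: true,
  });

  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) yield delta;
  }
}
```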
## Configuration

```ts
interface VoiceAgentConfig {
  apiKey: string;                 // Your API key (sk-voice-xxx)
  gatewayUrl: string;             // Gateway WebSocket/HTTP URL
  mode?: ConnectionMode;          // 'websocket' (default) or 'sse'
  reconnect?: boolean;            // Auto-reconnect on disconnect (default: true)
  reconnectInterval?: number;     // Base reconnect delay in ms (default: 1000)
  maxReconnectAttempts?: number;  // Max reconnect attempts (default: unlimited)
  logger?: Logger;                // Custom logger (default: console)
}
```
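For example, a sketch of an agent that retries up to ten times with a 2-second base delay (the values are illustrative):

```ts
const agent = new VoiceAgent({
  apiKey: process.env.VOICE_API_KEY!,
  gatewayUrl: process.env.GATEWAY_URL!,
  reconnect: true,            // retry automatically on disconnect
  reconnectInterval: 2000,    // base delay of 2s between attempts
  maxReconnectAttempts: 10,   // give up after 10 failed attempts
});
```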
## Context API

The `UtteranceContext` passed to `onUtterance` provides:
| Property | Type | Description |
|----------|------|-------------|
| `text` | `string` | The user's utterance text |
| `isFinal` | `boolean` | Whether this is a final transcript |
| `user` | `UserInfo \| undefined` | User info (if profile/email/location scope) |
| `vision` | `VisionContext \| undefined` | Vision context (if vision scope) |
| `sessionId` | `string` | Current session ID |
| `requestId` | `string` | Current request ID |
| `userId` | `string \| undefined` | User ID |
| `timestamp` | `Date` | When the utterance was received |
| `abortSignal` | `AbortSignal` | Signals when interrupted |

| Method | Description |
|--------|-------------|
| `sendDelta(delta)` | Stream a text chunk to the user |
| `done(finalText?)` | Complete the response |
| `requestFrames(options?)` | Request video frames (async) |
| `queryMemory(options)` | Query user memories (async) |
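A quick sketch tying these together (assuming non-final transcripts can simply be ignored, and that `UserInfo` carries a `name` field; both are illustrative):

```ts
agent.onUtterance(async (ctx) => {
  // Skip partial transcripts; only respond to the final one
  if (!ctx.isFinal) return;

  console.log(`[${ctx.sessionId}] utterance received at ${ctx.timestamp.toISOString()}`);

  // Personalize the reply when the profile scope is granted
  const name = ctx.user?.name ?? 'there';
  ctx.done(`Hi ${name}, you said: ${ctx.text}`);
});
```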
## Connection Modes
### WebSocket (recommended)

Full-duplex communication with lower latency:

```ts
const agent = new VoiceAgent({
  mode: ConnectionModes.WebSocket,
  // ...
});
```

### Server-Sent Events (SSE)
One-way server push, with the agent sending responses via HTTP POST. Works in browser environments:

```ts
const agent = new VoiceAgent({
  mode: ConnectionModes.SSE,
  // ...
});
```

## License
MIT
