@sammy-labs/sammy-three
v0.1.38
Sammy Agent Core: AI voice agent with screen capture, memory management, and live API integration
Sammy Agent Core
Core AI voice agent functionality with screen capture, memory management, and live API integration. For an example implementation, see this GitHub repo.
Installation
npm install @sammy-labs/sammy-three
For UI components, also install:
npm install @sammy-labs/sammy-three-ui-kit
Overview
@sammy-labs/sammy-three is a comprehensive AI voice agent package that provides:
- Real-time voice conversation with Google's Gemini Live API
- Advanced audio processing with noise suppression and noise gate
- Screen capture (render-based or video-based) with worker optimization
- Memory management with semantic search and context injection
- Interactive guides for walkthrough experiences
- Tool system with built-in tools and MCP integration
- Observability with comprehensive event tracking and analytics
- Worker-based architecture for optimal performance
- Audio debugging with built-in stutter analyzer
Table of Contents
- Quick Start
- Configuration Reference
- Authentication
- Screen Capture
- Audio Processing
- Memory Management
- Interactive Guides
- Custom Tools
- MCP Integration
- Observability
- Error Handling
- Performance Optimization
- Advanced Features
- Environment Variables
- Troubleshooting
- API Reference
Quick Start
1. Installation
npm install @sammy-labs/sammy-three
# or
pnpm add @sammy-labs/sammy-three
# or
yarn add @sammy-labs/sammy-three
2. Basic Setup
import {
SammyAgentProvider,
useSammyAgentContext,
SammyApiClient,
createMemoryServices,
createCoreServices,
} from '@sammy-labs/sammy-three';
import '@sammy-labs/sammy-three/styles.css';
// 1. Create your authentication hook
const useAuth = () => {
return {
token: 'your-jwt-token',
baseUrl: 'https://your-api-url.com',
onTokenExpired: async () => {
// Handle token refresh
await refreshYourToken();
},
};
};
// 2. Wrap your app
function App() {
const auth = useAuth();
if (!auth.token) {
return <div>Loading authentication...</div>;
}
return (
<SammyAgentProvider
config={{
auth: auth,
captureMethod: 'render', // or 'video'
debugLogs: true,
model: 'models/gemini-2.5-flash-preview-native-audio-dialog',
// Audio processing (NEW)
audioConfig: {
noiseSuppression: {
enabled: true,
enhancementLevel: 'medium',
},
noiseGate: {
enabled: true,
threshold: 0.04,
},
},
}}
onError={(error) => console.error('Agent error:', error)}
onTokenExpired={auth.onTokenExpired}
>
<YourApp />
</SammyAgentProvider>
);
}
// 3. Use in components
function ChatComponent() {
const {
startAgent,
stopAgent,
sendMessage,
toggleMuted,
agentStatus,
activeSession,
agentVolume,
userVolume,
} = useSammyAgentContext();
const handleStart = async () => {
const success = await startAgent({
agentMode: 'user', // 'admin', 'user', or 'sammy'
sammyThreeOrganisationFeatureId: 'your-org-id', // optional
guideId: 'guide-123', // optional - for guided experiences
});
if (success) {
console.log('Agent started successfully');
}
};
return (
<div>
<button onClick={handleStart} disabled={agentStatus === 'connecting'}>
{agentStatus === 'connecting' ? 'Starting...' : 'Start Agent'}
</button>
<button onClick={stopAgent} disabled={agentStatus === 'disconnected'}>
Stop Agent
</button>
<button onClick={() => sendMessage('Hello!')}>
Send Message
</button>
<button onClick={toggleMuted}>
{/* Toggle mute status */}
Mute/Unmute
</button>
<div>Status: {agentStatus}</div>
<div>Agent Volume: {Math.round(agentVolume * 100)}%</div>
<div>User Volume: {Math.round(userVolume * 100)}%</div>
</div>
);
}
Configuration Reference
Core Configuration Options
interface SammyAgentConfigProps {
// REQUIRED: Authentication configuration
auth: {
token: string; // Your JWT token
baseUrl?: string; // API base URL (defaults to production)
onTokenExpired?: () => void; // Token refresh callback
};
// Screen capture method
captureMethod?: 'render' | 'video'; // Default: 'render'
// AI model to use
model?: string; // Default: 'models/gemini-2.5-flash-preview-native-audio-dialog'
// Enable debug logging
debugLogs?: boolean; // Default: false
// Voice settings
defaultVoice?: string; // Default: 'aoede'
// Capture configuration
captureConfig?: {
frameRate?: number; // For video capture
quality?: number; // JPEG quality (0.0-1.0)
enableAudioAdaptation?: boolean; // Reduce capture during audio
};
// Audio processing configuration (NEW)
audioConfig?: {
noiseSuppression?: {
enabled?: boolean;
enhancementLevel?: 'light' | 'medium' | 'aggressive';
accessKey?: string; // For Koala noise suppression
modelPath?: string; // For Koala model
};
noiseGate?: {
enabled?: boolean;
threshold?: number; // 0-1, default 0.04
attackTime?: number; // ms, default 30
holdTime?: number; // ms, default 400
releaseTime?: number; // ms, default 150
};
environmentPreset?: 'office' | 'home' | 'noisy' | 'studio' | 'custom';
};
// Observability configuration
observability?: ObservabilityConfig;
// Audio performance debugging
debugAudioPerformance?: boolean;
// MCP (Model Context Protocol) configuration (NEW)
mcp?: MCPConfig;
// Guides configuration (NEW)
guides?: {
enabled?: boolean;
autoStartFromURL?: boolean;
queryParamName?: string; // Default: 'walkthrough'
};
}
Canvas Ref Requirements
⚠️ IMPORTANT: Regardless of which capture method you use ('render' or 'video'), you MUST include a canvas element in your component tree:
import { useRef } from 'react';
import { SammyAgentProvider } from '@sammy-labs/sammy-three';
function App() {
// Required: Canvas ref for all capture methods
const canvasRef = useRef<HTMLCanvasElement>(null);
return (
<SammyAgentProvider
config={{
captureMethod: 'render', // or 'video' - both need canvas
// ... other config
}}
>
{/* REQUIRED: Hidden canvas for processing */}
<canvas
ref={canvasRef}
style={{ display: 'none' }}
/>
<YourApp />
</SammyAgentProvider>
);
}
Why Canvas is Always Required
Both capture methods use the canvas element for:
- Frame processing: Scaling and optimizing captured frames
- Base64 encoding: Converting images for AI processing
- Performance optimization: Reusing canvas context for efficiency
- Fallback rendering: Backup processing when primary capture fails
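As a rough illustration of the frame-scaling step listed above, the sketch below computes downscaled dimensions that preserve the aspect ratio of a captured frame. The helper name `fitWithin` and the 1280x720 limit are assumptions made for this example; the package's actual processing is not documented in this README.

```typescript
// Hypothetical helper (not part of @sammy-labs/sammy-three): compute the size
// a captured frame should be drawn at so it fits inside a bounding box
// without changing its aspect ratio.
interface Size {
  width: number;
  height: number;
}

function fitWithin(frame: Size, max: Size): Size {
  // The scale factor is capped at 1 so frames are never upscaled.
  const scale = Math.min(1, max.width / frame.width, max.height / frame.height);
  return {
    width: Math.round(frame.width * scale),
    height: Math.round(frame.height * scale),
  };
}

// Example: a 2560x1440 capture reduced to fit an assumed 1280x720 limit.
const scaled = fitWithin({ width: 2560, height: 1440 }, { width: 1280, height: 720 });
console.log(scaled); // { width: 1280, height: 720 }
```

In a browser, a size like this would typically be assigned to the canvas before calling `drawImage`, after which `canvas.toDataURL('image/jpeg', quality)` yields the Base64 data URL.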
Common Setup Error:
// ❌ Wrong: Missing canvas
<SammyAgentProvider config={{ captureMethod: 'render' }}>
<App />
</SammyAgentProvider>
// ✅ Correct: Canvas included
<SammyAgentProvider config={{ captureMethod: 'render' }}>
<canvas style={{ display: 'none' }} />
<App />
</SammyAgentProvider>
Audio Processing
The package includes advanced audio processing capabilities for cleaner voice input:
Noise Suppression
Two noise suppression modes are available:
Browser Native (Default)
- Uses browser's built-in echo cancellation and noise suppression
- No additional configuration needed
Koala AI (Advanced)
- Requires Koala access key and model
- Superior noise removal for professional use
const config = {
audioConfig: {
noiseSuppression: {
enabled: true,
enhancementLevel: 'medium', // 'light', 'medium', 'aggressive'
// For Koala (optional)
accessKey: process.env.NEXT_PUBLIC_KOALA_ACCESS_KEY, // name as listed under Environment Variables
modelPath: '/models/koala_model.pv',
},
},
};
Noise Gate
Software-based noise gate to filter out background noise:
const config = {
audioConfig: {
noiseGate: {
enabled: true,
threshold: 0.04, // Volume threshold (0-1)
attackTime: 30, // How fast gate opens (ms)
holdTime: 400, // Hold open during pauses (ms)
releaseTime: 150, // How fast gate closes (ms)
},
},
};
Environment Presets
Pre-configured settings for different environments:
const config = {
audioConfig: {
environmentPreset: 'home', // 'office', 'home', 'noisy', 'studio', 'custom'
},
};
// Presets automatically configure:
// - office: Quiet office (low threshold, light suppression)
// - home: Home environment (balanced settings)
// - noisy: Cafe/public space (aggressive filtering)
// - studio: Professional recording (minimal processing)
Audio Stutter Debugging
Built-in analyzer for debugging audio issues:
// Enable audio debugging
const config = {
debugAudioPerformance: true,
};
// In browser console:
audioStutterAnalyzer.setDebugMode(true); // Enable collection
audioStutterAnalyzer.getAnalysis(); // View analysis
audioStutterAnalyzer.clear(); // Clear buffer
// The analyzer tracks:
// - Audio underruns (stuttering)
// - Long DOM captures during audio
// - Buffer statistics
// - Correlation between renders and stutters
Interactive Guides
The package includes a complete guided walkthrough system for interactive tutorials:
Basic Guide Setup
import { useGuidesManager, useSammyAgentContext } from '@sammy-labs/sammy-three';
function GuidedExperience() {
const { startAgent } = useSammyAgentContext();
// Initialize guides manager
const guides = useGuidesManager({
enabled: true,
authConfig: {
token: 'your-jwt-token',
baseUrl: 'https://your-api.com',
},
autoStartFromURL: true, // Auto-start from ?walkthrough=guide-id
onStartAgent: startAgent,
onWalkthroughStart: (guideId) => {
console.log('Starting walkthrough:', guideId);
},
});
if (!guides) return null;
return (
<div>
{/* Display available guides */}
{guides.userGuides.map(guide => (
<button
key={guide.guideId}
onClick={() => guides.startWalkthrough(guide.guideId)}
>
{guide.title} {guide.isCompleted && '✓'}
</button>
))}
{/* Current guide info */}
{guides.currentGuide && (
<div>
<h3>{guides.currentGuide.title}</h3>
<p>{guides.currentGuide.description}</p>
</div>
)}
</div>
);
}
URL-Based Guide Activation
Guides can be triggered via URL parameters:
https://yourapp.com?walkthrough=onboarding-guide-v1
Guide Lifecycle
// Manual guide start
const success = await guides.startWalkthrough('guide-id');
// Check for guide in URL
const hasGuide = await guides.checkForGuideInUrl();
// Mark as completed (automatic when conversation starts)
guides.markGuideAsCompleted('guide-id');
// Refresh guide data
guides.refreshGuide();
guides.refreshUserGuides();
// Clean up URL params after use
guides.cleanupUrlParams();
Custom Tools
Extend agent capabilities with custom tools:
Basic Tool Definition
import { ToolDefinition, ToolCategories } from '@sammy-labs/sammy-three';
const customTool: ToolDefinition = {
declaration: {
name: 'sendEmail',
description: 'Send an email to a user',
parameters: {
type: 'object',
properties: {
to: { type: 'string', description: 'Email recipient' },
subject: { type: 'string', description: 'Email subject' },
body: { type: 'string', description: 'Email body' },
},
required: ['to', 'subject', 'body'],
},
},
category: ToolCategories.ACTION,
handler: async (functionCall, context) => {
const { to, subject, body } = functionCall.args;
try {
// Your email sending logic
await sendEmail({ to, subject, body });
// Emit events for state management
context.emit('email:sent', { to, subject });
return {
success: true,
message: `Email sent to ${to}`,
};
} catch (error) {
return {
success: false,
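// A caught value is `unknown` in TypeScript, so narrow it before reading .message
error: error instanceof Error ? error.message : String(error),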
};
}
},
};
// Use with provider
<SammyAgentProvider
tools={[customTool]}
config={config}
>
{children}
</SammyAgentProvider>
Built-in Tools
The package includes several built-in tools:
- End Session Tool - Gracefully end the conversation
- Get Context Tool - Retrieve contextual information
- Escalate Tool - Escalate to human support
// Escalation is automatic when AI can't help
// The escalate tool is called by the AI when needed
// It will mark the conversation for human review
MCP Integration
Support for Model Context Protocol (MCP) enables dynamic tool discovery:
const config = {
mcp: {
enabled: true,
debug: true,
servers: [
{
name: 'hubspot',
type: 'sse',
sse: {
url: 'https://mcp-server.example.com/sse',
apiKey: process.env.NEXT_PUBLIC_MCP_API_KEY, // name as listed under Environment Variables
},
autoConnect: true,
toolPrefix: 'crm', // Tools will be prefixed: crm_create_contact
},
],
autoReconnect: true,
maxReconnectAttempts: 3,
},
};
// MCP tools are automatically discovered and registered
// They appear alongside your custom tools
Observability Configuration
Enable comprehensive event tracking and analytics:
const config = {
observability: {
enabled: true,
logToConsole: true, // Log events to console for debugging
// Worker mode for performance
useWorker: true,
workerConfig: {
batchSize: 50,
batchIntervalMs: 5000, // 5 seconds
},
// Data inclusion controls
includeSystemPrompt: false, // Exclude sensitive prompts
includeAudioData: false, // Exclude raw audio (large)
includeImageData: true, // Include screen captures
// Event filtering (reduce noise)
disableEventTypes: [
'audio.send',
'audio.receive',
],
// Custom metadata
metadata: {
environment: 'production',
version: '1.0.0',
userId: 'user-123',
},
// Audio aggregation
audioAggregation: {
flushIntervalMs: 30000, // 30 seconds
onFlush: async (data) => {
// Handle flushed audio data
await uploadToS3(data);
},
},
// Custom event callback
callback: async (event) => {
// Send to your analytics service
await sendToAnalytics(event);
},
},
};
Performance Optimization
Critical DOM Renders
The system automatically captures fresh visual context at conversation boundaries:
// Automatic critical renders happen at:
// - User starts/stops speaking
// - Agent starts/stops speaking
// - Conversation turns complete
// - User interrupts agent
// This ensures the agent always has current visual context
// even during audio playback when captures are throttled
Audio-Aware Capture
Screen capture automatically adapts during audio playback:
const config = {
captureConfig: {
enableAudioAdaptation: true, // Default: true
quality: 0.8,
},
};
// Normal mode: High-frequency captures (100ms-2s)
// Audio mode: Reduced captures (500ms-5s)
// Critical renders: Always execute at conversation boundaries
Worker Optimization
Multiple workers handle heavy processing:
// Workers are automatic - no configuration needed
// - Canvas processing worker (image encoding)
// - Observability worker (API batching)
// - DOM capture workers (screenshot generation)
// All workers use Data URLs for CSP compatibility
// Fallback to main thread if workers unavailable
Advanced Features
Context Management
Automatic context tracking and updates:
import { useContextUpdater } from '@sammy-labs/sammy-three';
function MyComponent() {
// Automatically track page changes
useContextUpdater({
trackPageChanges: true,
updateInterval: 5000,
includeMetadata: true,
});
return <div>Content tracked automatically</div>;
}
// Manual context updates
import { ContextStateManager } from '@sammy-labs/sammy-three';
const contextManager = new ContextStateManager();
contextManager.updatePageMetadata({
url: window.location.href,
title: document.title,
});
contextManager.updateUserPreferences({
name: 'John Doe',
email: 'john.doe@example.com',
});
Microphone Permission
The useMicrophonePermission hook reports the current permission state and lets you request access:
import { useMicrophonePermission } from '@sammy-labs/sammy-three';
function MicCheck() {
const {
permission,
isChecking,
checkPermission,
requestPermission
} = useMicrophonePermission();
if (permission === 'denied') {
return <div>Microphone access denied. Please enable in settings.</div>;
}
if (permission === 'prompt') {
return (
<button onClick={requestPermission}>
Grant Microphone Access
</button>
);
}
return <div>Microphone ready!</div>;
}
Direct Core Usage
For advanced use cases, you can construct the agent core directly instead of going through the provider:
import {
SammyAgentCore,
SammyApiClient,
createMemoryServices,
ObservabilityManager,
} from '@sammy-labs/sammy-three';
// Create services
const apiClient = new SammyApiClient({
token: 'your-jwt-token',
baseUrl: 'https://your-api.com',
});
const memoryService = createMemoryServices(apiClient);
// Create agent core directly
const agentCore = new SammyAgentCore({
config: {
auth: { token: 'your-jwt-token' },
captureMethod: 'render',
debugLogs: true,
},
tools: [/* your tools */],
callbacks: {
onError: (error) => console.error(error),
onTurnComplete: (summary) => console.log(summary),
},
services: {
memoryService,
},
});
// Start agent
await agentCore.start({
agentMode: 'user',
sammyThreeOrganisationFeatureId: 'org-123',
});
// Access observability
const trace = agentCore.getObservabilityTrace();
const stats = agentCore.getObservabilityStatistics();
Environment Variables
Common environment variables for configuration:
# API Configuration
NEXT_PUBLIC_SAMMY_API_BASE_URL=https://your-api.com
NEXT_PUBLIC_APP_VERSION=1.0.0
# Feature Flags
NEXT_PUBLIC_DISABLE_WORKER_MODE=false
# Audio Processing (optional)
NEXT_PUBLIC_KOALA_ACCESS_KEY=your-key
NEXT_PUBLIC_KOALA_MODEL_PATH=/models/koala.pv
# MCP Integration (optional)
NEXT_PUBLIC_MCP_API_KEY=your-mcp-key
# Debug Settings (development only)
NEXT_PUBLIC_DEBUG_AUDIO=true
NEXT_PUBLIC_DEBUG_OBSERVABILITY=true
Error Handling
Provider-Level Error Handling
<SammyAgentProvider
config={config}
onError={(error) => {
console.error('Agent error:', error);
// onError receives `unknown`, so normalize it to a message string first
const message = error instanceof Error ? error.message : String(error);
// Handle different error types
if (message.includes('token')) {
// Handle authentication errors
redirectToLogin();
} else if (message.includes('microphone')) {
// Handle microphone permission errors
showMicrophonePermissionDialog();
} else {
// Handle other errors
showErrorToast(message);
}
}}
onTokenExpired={() => {
// Handle token expiration
refreshAuthToken();
}}
>
{children}
</SammyAgentProvider>
Component-Level Error Handling
function ChatComponent() {
const { error, agentStatus, startAgent } = useSammyAgentContext();
const handleStart = async () => {
try {
const success = await startAgent({
agentMode: 'user',
});
if (!success) {
console.error('Failed to start agent');
}
} catch (err) {
// Named `err` so it does not shadow the `error` value from the context
console.error('Start agent error:', err);
}
};
if (error) {
return (
<div className="error-state">
<p>Agent Error: {error.message}</p>
<button onClick={() => window.location.reload()}>
Reload Page
</button>
</div>
);
}
return (
<div>
<button onClick={handleStart}>Start Agent</button>
<div>Status: {agentStatus}</div>
</div>
);
}
Troubleshooting
Common Issues
"Token expired" errors
// Ensure onTokenExpired is implemented
const config = {
  auth: {
    token: jwtToken,
    onTokenExpired: async () => {
      await refreshToken();
    },
  },
};
Microphone permission denied
// Use the microphone permission hook
const { permission, requestPermission } = useMicrophonePermission();
if (permission === 'denied') {
  // Show instructions to enable in browser settings
}
Audio stuttering
// Enable audio debugging
audioStutterAnalyzer.setDebugMode(true);
audioStutterAnalyzer.getAnalysis();
// Reduce capture quality
const config = {
  captureConfig: {
    quality: 0.7,
    enableAudioAdaptation: true,
  },
};
Noise issues
// Try different environment presets
const config = {
  audioConfig: {
    environmentPreset: 'noisy', // More aggressive filtering
  },
};
Worker not loading
// Disable workers if needed
const config = {
  observability: {
    useWorker: false, // Fallback to main thread
  },
};
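Rather than hard-coding `useWorker: false`, the decision can be derived at startup. The sketch below is one possible approach, assuming the `NEXT_PUBLIC_DISABLE_WORKER_MODE` flag from the Environment Variables section is read by your own code; `shouldUseWorker` is a hypothetical helper, and how the package itself treats that flag is not documented here.

```typescript
// Hypothetical helper: request worker mode only when Web Workers exist and
// the disable flag is not set to 'true'.
function shouldUseWorker(disableFlag: string | undefined, workerAvailable: boolean): boolean {
  const disabled = disableFlag === 'true';
  return workerAvailable && !disabled;
}

// Feature-detect Web Workers without assuming a browser global is present.
const globals = globalThis as unknown as Record<string, unknown>;
const workerAvailable = typeof globals['Worker'] === 'function';

// In an app the first argument could be process.env.NEXT_PUBLIC_DISABLE_WORKER_MODE.
const observabilitySketch = {
  enabled: true,
  useWorker: shouldUseWorker(undefined, workerAvailable),
};
console.log(observabilitySketch.useWorker);
```

The resulting object has the same shape as the `observability` block shown earlier, so it could be spread into the provider config.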
Debug Mode
Enable comprehensive debugging:
const config = {
debugLogs: true,
debugAudioPerformance: true,
observability: {
enabled: true,
logToConsole: true,
},
mcp: {
debug: true,
},
};
// In browser console:
audioStutterAnalyzer.setDebugMode(true);
API Reference
Core Exports
// Main Components
export { SammyAgentProvider, useSammyAgentContext }
// Configuration
export { getAgentConfig }
// Hooks
export {
useScreenCapture,
useRenderCapture,
useVideoCapture,
useContextUpdater,
useMicrophonePermission,
useGuides,
useGuidesManager,
useGuidesQueryParams,
}
// Core Classes
export {
SammyAgentCore,
SammyApiClient,
ScreenCaptureManager,
ObservabilityManager,
ToolManager,
MCPManager,
ContextStateManager,
ContextMemoryManager,
}
// Audio Classes
export {
AudioRecorder,
AudioStreamer,
audioStutterAnalyzer,
}
// Services
export {
createMemoryServices,
createCoreServices,
type MemoryService,
type CoreService,
}
// Types
export type {
SammyAgentConfig,
SammyAgentCallbacks,
AgentSession,
ObservabilityConfig,
ToolDefinition,
MemoryEntry,
TraceEvent,
MCPConfig,
AudioConfig,
NoiseGateConfig,
NoiseSuppressionConfig,
GuidesState,
}
// Utilities
export {
saveTextToFile,
audioContext,
setupAudioFlushOnUnload,
}
Provider Props
interface SammyAgentProviderProps {
children: ReactNode;
config: SammyAgentConfigProps;
tools?: ToolDefinition[];
autoStartFromURL?: boolean;
// Callbacks
onError?: (error: unknown) => void;
onTokenExpired?: () => void;
onTurnComplete?: (summary: { agent: string; user: string }) => void;
onConnectionStateChange?: (connected: boolean) => void;
onMemoryUpdate?: (result: unknown) => void;
onToolCall?: (tool: LiveServerToolCall) => void;
onWalkthroughStart?: (walkthroughGuideId: string) => void;
}
Context Methods
interface SammyAgentContextType {
// State
activeSession: AgentSession | null;
agentStatus: 'connected' | 'connecting' | 'disconnected' | 'disconnecting';
agentVolume: number; // 0-1
userVolume: number; // 0-1
config: SammyAgentConfig;
error: Error | null;
muted: boolean;
screenCapture: ScreenCapture;
// Methods
startAgent: (options: StartOptions) => Promise<boolean>;
stopAgent: () => void;
sendMessage: (message: string) => void;
toggleMuted: () => void;
}
// Start Options (declared separately; an interface cannot be nested inside another)
interface StartOptions {
agentMode: 'user' | 'admin' | 'sammy';
sammyThreeOrganisationFeatureId?: string;
guideId?: string; // For guided experiences
}
Best Practices
Always handle authentication properly
- Implement token refresh logic
- Handle token expiration gracefully
- Use environment variables for API URLs
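One way to keep token refresh logic well behaved is to make sure overlapping expiry signals share a single refresh request. The sketch below assumes your own refresh call returns a promise; `singleFlight` and the stubbed refresh function are hypothetical and not part of the package.

```typescript
// Hypothetical wrapper: concurrent callers share one pending refresh.
// `refresh` stands in for your own token refresh call.
function singleFlight<T>(refresh: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight === null) {
      // Clear the slot once the request settles so a later expiry refreshes again.
      inFlight = refresh().finally(() => {
        inFlight = null;
      });
    }
    return inFlight;
  };
}

// Example with a stubbed refresh call that counts invocations.
let refreshCalls = 0;
const refreshOnce = singleFlight(async () => {
  refreshCalls += 1;
  return 'new-jwt-token';
});

const firstRefresh = refreshOnce();
const secondRefresh = refreshOnce(); // reuses the pending request
console.log(firstRefresh === secondRefresh, refreshCalls); // true 1
```

A function like `refreshOnce` could then be called from the `onTokenExpired` callback, so repeated expiry events during one refresh do not trigger duplicate requests.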
Configure audio for your environment
- Use environment presets for quick setup
- Test noise gate threshold for your use case
- Enable noise suppression in noisy environments
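When testing a threshold, it can help to see how the four noise gate parameters interact. The sketch below is an illustrative model only, assuming linear gain ramps and one level measurement per fixed time step; `NoiseGateSketch` is a made-up class, and the package's real gate may behave differently.

```typescript
// Illustrative noise gate model (not the package's implementation).
interface GateOptions {
  threshold: number;   // 0-1 input level that opens the gate
  attackTime: number;  // ms to ramp gain from 0 to 1
  holdTime: number;    // ms to stay open after the level drops
  releaseTime: number; // ms to ramp gain from 1 to 0
}

class NoiseGateSketch {
  private gain = 0;
  private holdRemaining = 0;
  private readonly options: GateOptions;

  constructor(options: GateOptions) {
    this.options = options;
  }

  // Feed one level measurement covering `dtMs` milliseconds; returns the gain to apply.
  process(level: number, dtMs: number): number {
    const { threshold, attackTime, holdTime, releaseTime } = this.options;
    if (level >= threshold) {
      this.holdRemaining = holdTime; // restart the hold window
      this.gain = Math.min(1, this.gain + dtMs / attackTime);
    } else if (this.holdRemaining > 0) {
      this.holdRemaining -= dtMs; // keep the gate open through short pauses
    } else {
      this.gain = Math.max(0, this.gain - dtMs / releaseTime);
    }
    return this.gain;
  }
}

// Defaults from the Noise Gate section, stepped in 10 ms frames.
const gate = new NoiseGateSketch({ threshold: 0.04, attackTime: 30, holdTime: 400, releaseTime: 150 });
let gateGain = 0;
for (let i = 0; i < 4; i++) gateGain = gate.process(0.2, 10); // 40 ms of speech
console.log(gateGain); // 1
```

With these values the gate stays fully open for 400 ms after speech stops and then fades out over roughly 150 ms, which is why a longer `holdTime` avoids clipping the pauses between words.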
Enable worker mode for production
- Better performance for high-frequency events
- Prevents main thread blocking
- Automatic fallback if workers fail
Use appropriate capture methods
- render for web applications (recommended)
- video for full-screen capture needs
- Enable audio adaptation to prevent stuttering
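To make the effect of audio adaptation concrete, the sketch below clamps a requested capture interval to the ranges quoted in the Audio-Aware Capture section (100ms-2s normally, 500ms-5s during audio). The helper `captureIntervalMs` is hypothetical, and how the package chooses a value inside each range is an assumption.

```typescript
// Interval ranges quoted in the Audio-Aware Capture section of this README.
const CAPTURE_RANGES_MS = {
  normal: { min: 100, max: 2000 },
  audio: { min: 500, max: 5000 },
};

// Hypothetical helper: keep a requested capture interval inside the active range.
function captureIntervalMs(requestedMs: number, audioActive: boolean): number {
  const range = audioActive ? CAPTURE_RANGES_MS.audio : CAPTURE_RANGES_MS.normal;
  return Math.min(range.max, Math.max(range.min, requestedMs));
}

console.log(captureIntervalMs(250, false)); // 250
console.log(captureIntervalMs(250, true));  // 500
```

The same request is served less often while audio is playing, which is the trade-off that reduces stuttering.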
Implement proper error handling
- Provider-level error boundaries
- Component-level error states
- User-friendly error messages
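Because `onError` receives a value typed as `unknown`, it can be convenient to normalize errors in one place before deciding what to show. The sketch below mirrors the substring checks from the provider-level example; `classifyAgentError` and the category names are assumptions, since stable error codes are not documented in this README.

```typescript
type AgentErrorKind = 'auth' | 'microphone' | 'other';

// Hypothetical helper: turn any thrown value into a category plus a message.
function classifyAgentError(error: unknown): { kind: AgentErrorKind; message: string } {
  const message = error instanceof Error ? error.message : String(error);
  const lower = message.toLowerCase();
  if (lower.includes('token')) return { kind: 'auth', message };
  if (lower.includes('microphone')) return { kind: 'microphone', message };
  return { kind: 'other', message };
}

console.log(classifyAgentError(new Error('Token expired')).kind); // auth
console.log(classifyAgentError('network down').kind);             // other
```

Both the provider callback and component-level error states can then switch on `kind` instead of repeating string checks.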
Optimize for performance
- Enable audio-aware capture
- Use appropriate quality settings
- Filter unnecessary observability events
- Use worker mode for heavy operations
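The combined effect of event filtering and batching can be modelled in a few lines. The sketch below is a simplified stand-in, assuming events are flushed when the batch is full or the interval has elapsed; `EventBatcher` and the `tool.call` and `turn.complete` event names are invented for illustration, while `audio.send` and `audio.receive` come from the configuration example.

```typescript
interface TraceEventSketch {
  type: string;
  timestampMs: number;
}

// Hypothetical batcher: drops disabled event types, then flushes when the
// batch is full or the interval has elapsed since the last flush.
class EventBatcher {
  private buffer: TraceEventSketch[] = [];
  private lastFlushMs = 0;
  private readonly batchSize: number;
  private readonly batchIntervalMs: number;
  private readonly disabled: Set<string>;

  constructor(batchSize: number, batchIntervalMs: number, disableEventTypes: string[]) {
    this.batchSize = batchSize;
    this.batchIntervalMs = batchIntervalMs;
    this.disabled = new Set(disableEventTypes);
  }

  // Returns a batch to send, or null if nothing should be flushed yet.
  add(event: TraceEventSketch): TraceEventSketch[] | null {
    if (this.disabled.has(event.type)) return null;
    this.buffer.push(event);
    const full = this.buffer.length >= this.batchSize;
    const due = event.timestampMs - this.lastFlushMs >= this.batchIntervalMs;
    if (!full && !due) return null;
    const batch = this.buffer;
    this.buffer = [];
    this.lastFlushMs = event.timestampMs;
    return batch;
  }
}

// A batch size of 3 keeps the example short; the README example uses 50.
const batcher = new EventBatcher(3, 5000, ['audio.send', 'audio.receive']);
const batchResults = [
  batcher.add({ type: 'audio.send', timestampMs: 10 }),    // filtered out
  batcher.add({ type: 'tool.call', timestampMs: 20 }),
  batcher.add({ type: 'turn.complete', timestampMs: 30 }),
  batcher.add({ type: 'tool.call', timestampMs: 40 }),     // third kept event fills the batch
];
console.log(batchResults.map((r) => (r ? r.length : null))); // [ null, null, null, 3 ]
```

Filtering before buffering means high-frequency audio events never count toward the batch size, so fewer and smaller requests are sent.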
Monitor and debug
- Enable debug logs in development
- Use observability for production monitoring
- Use audio stutter analyzer for performance issues
- Check MCP debug logs for integration issues
Migration Guide
From Legacy Versions
If migrating from older versions:
Update imports:
// Old
import { SammyAgent } from '@sammy-labs/sammy-three-legacy';
// New
import { SammyAgentProvider, useSammyAgentContext } from '@sammy-labs/sammy-three';
Update configuration:
// Old
const agent = new SammyAgent({ apiKey: 'key' });
// New
<SammyAgentProvider config={{ auth: { token: 'jwt-token' } }}>
Update methods:
// Old
agent.connect();
// New
const { startAgent } = useSammyAgentContext();
await startAgent({ agentMode: 'user' });
New features to adopt:
- Audio processing with noise suppression
- Interactive guides system
- MCP tool integration
- Worker-based observability
- Critical DOM renders
