@sammy-labs/sammy-three

v0.1.38 · Published 7 months ago

Sammy Agent Core: AI voice agent with screen capture, memory management, and live API integration


sammy-labs

ai-agent voice-ai screen-capture live-api memory-management react-hooks gemini-live

Sammy Agent Core

Core AI voice agent functionality with screen capture, memory management, and live API integration. For an example implementation, see this GitHub repo.

Installation

npm install @sammy-labs/sammy-three

For UI components, also install:

npm install @sammy-labs/sammy-three-ui-kit

Overview

@sammy-labs/sammy-three is a comprehensive AI voice agent package that provides:

Real-time voice conversation with Google's Gemini Live API
Advanced audio processing with noise suppression and noise gate
Screen capture (render-based or video-based) with worker optimization
Memory management with semantic search and context injection
Interactive guides for walkthrough experiences
Tool system with built-in tools and MCP integration
Observability with comprehensive event tracking and analytics
Worker-based architecture that moves image encoding, observability batching, and DOM capture off the main thread
Audio debugging with built-in stutter analyzer

Quick Start

1. Installation

npm install @sammy-labs/sammy-three
# or
pnpm add @sammy-labs/sammy-three
# or
yarn add @sammy-labs/sammy-three

2. Basic Setup

import {
  SammyAgentProvider,
  useSammyAgentContext,
} from '@sammy-labs/sammy-three';
import '@sammy-labs/sammy-three/styles.css';

// 1. Create your authentication hook
const useAuth = () => {
  return {
    token: 'your-jwt-token',
    baseUrl: 'https://your-api-url.com',
    onTokenExpired: async () => {
      // Handle token refresh
      await refreshYourToken();
    },
  };
};

// 2. Wrap your app
function App() {
  const auth = useAuth();

  if (!auth.token) {
    return <div>Loading authentication...</div>;
  }

  return (
    <SammyAgentProvider
      config={{
        auth: auth,
        captureMethod: 'render', // or 'video'
        debugLogs: true,
        model: 'models/gemini-2.5-flash-preview-native-audio-dialog',
        // Audio processing (NEW)
        audioConfig: {
          noiseSuppression: {
            enabled: true,
            enhancementLevel: 'medium',
          },
          noiseGate: {
            enabled: true,
            threshold: 0.04,
          },
        },
      }}
      onError={(error) => console.error('Agent error:', error)}
      onTokenExpired={auth.onTokenExpired}
    >
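      {/* Required: hidden canvas used for frame processing (see Canvas Ref Requirements) */}
      <canvas style={{ display: 'none' }} />
      <YourApp />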
    </SammyAgentProvider>
  );
}

// 3. Use in components
function ChatComponent() {
  const {
    startAgent,
    stopAgent,
    sendMessage,
    toggleMuted,
    agentStatus,
    activeSession,
    agentVolume,
    userVolume,
  } = useSammyAgentContext();

  const handleStart = async () => {
    const success = await startAgent({
      agentMode: 'user', // 'admin', 'user', or 'sammy'
      sammyThreeOrganisationFeatureId: 'your-org-id', // optional
      guideId: 'guide-123', // optional - for guided experiences
    });
    if (success) {
      console.log('Agent started successfully');
    }
  };

  return (
    <div>
      <button onClick={handleStart} disabled={agentStatus === 'connecting'}>
        {agentStatus === 'connecting' ? 'Starting...' : 'Start Agent'}
      </button>
      
      <button onClick={stopAgent} disabled={agentStatus === 'disconnected'}>
        Stop Agent
      </button>
      
      <button onClick={() => sendMessage('Hello!')}>
        Send Message
      </button>
      
      <button onClick={toggleMuted}>
        {/* Toggle mute status */}
        Mute/Unmute
      </button>

      <div>Status: {agentStatus}</div>
      <div>Agent Volume: {Math.round(agentVolume * 100)}%</div>
      <div>User Volume: {Math.round(userVolume * 100)}%</div>
    </div>
  );
}

Configuration Reference

Core Configuration Options

interface SammyAgentConfigProps {
  // REQUIRED: Authentication configuration
  auth: {
    token: string;           // Your JWT token
    baseUrl?: string;        // API base URL (defaults to production)
    onTokenExpired?: () => void; // Token refresh callback
  };

  // Screen capture method
  captureMethod?: 'render' | 'video'; // Default: 'render'

  // AI model to use
  model?: string; // Default: 'models/gemini-2.5-flash-preview-native-audio-dialog'

  // Enable debug logging
  debugLogs?: boolean; // Default: false

  // Voice settings  
  defaultVoice?: string; // Default: 'aoede'

  // Capture configuration
  captureConfig?: {
    frameRate?: number;    // For video capture
    quality?: number;      // JPEG quality (0.0-1.0)
    enableAudioAdaptation?: boolean; // Reduce capture during audio
  };

  // Audio processing configuration (NEW)
  audioConfig?: {
    noiseSuppression?: {
      enabled?: boolean;
      enhancementLevel?: 'light' | 'medium' | 'aggressive';
      accessKey?: string;  // For Koala noise suppression
      modelPath?: string;  // For Koala model
    };
    noiseGate?: {
      enabled?: boolean;
      threshold?: number;      // 0-1, default 0.04
      attackTime?: number;     // ms, default 30
      holdTime?: number;       // ms, default 400
      releaseTime?: number;    // ms, default 150
    };
    environmentPreset?: 'office' | 'home' | 'noisy' | 'studio' | 'custom';
  };

  // Observability configuration
  observability?: ObservabilityConfig;

  // Audio performance debugging
  debugAudioPerformance?: boolean;

  // MCP (Model Context Protocol) configuration (NEW)
  mcp?: MCPConfig;

  // Guides configuration (NEW)
  guides?: {
    enabled?: boolean;
    autoStartFromURL?: boolean;
    queryParamName?: string; // Default: 'walkthrough'
  };
}

Canvas Ref Requirements

⚠️ IMPORTANT: Regardless of which capture method you use ('render' or 'video'), you MUST include a canvas element in your component tree:

import { useRef } from 'react';
import { SammyAgentProvider } from '@sammy-labs/sammy-three';

function App() {
  // Required: Canvas ref for all capture methods
  const canvasRef = useRef<HTMLCanvasElement>(null);

  return (
    <SammyAgentProvider
      config={{
        captureMethod: 'render', // or 'video' - both need canvas
        // ... other config
      }}
    >
      {/* REQUIRED: Hidden canvas for processing */}
      <canvas 
        ref={canvasRef}
        style={{ display: 'none' }} 
      />
      
      <YourApp />
    </SammyAgentProvider>
  );
}

Why Canvas is Always Required

Both capture methods use the canvas element for:

Frame processing: Scaling and optimizing captured frames
Base64 encoding: Converting images for AI processing
Performance optimization: Reusing canvas context for efficiency
Fallback rendering: Backup processing when primary capture fails

Common Setup Error:

// ❌ Wrong: Missing canvas
<SammyAgentProvider config={{ captureMethod: 'render' }}>
  <App />
</SammyAgentProvider>

// ✅ Correct: Canvas included
<SammyAgentProvider config={{ captureMethod: 'render' }}>
  <canvas style={{ display: 'none' }} />
  <App />
</SammyAgentProvider>

Audio Processing

The package includes advanced audio processing capabilities for cleaner voice input:

Noise Suppression

Two noise suppression modes are available:

Browser Native (Default)
- Uses browser's built-in echo cancellation and noise suppression
- No additional configuration needed
Koala (Picovoice, advanced)
- Requires a Koala access key and model file
- Stronger noise removal than the browser's built-in processing, suited to professional use

const config = {
  audioConfig: {
    noiseSuppression: {
      enabled: true,
      enhancementLevel: 'medium', // 'light', 'medium', 'aggressive'
      // For Koala (optional)
      accessKey: process.env.NEXT_PUBLIC_KOALA_ACCESS_KEY,
      modelPath: '/models/koala_model.pv',
    },
  },
};

Noise Gate

Software-based noise gate to filter out background noise:

const config = {
  audioConfig: {
    noiseGate: {
      enabled: true,
      threshold: 0.04,    // Volume threshold (0-1)
      attackTime: 30,     // How fast gate opens (ms)
      holdTime: 400,      // Hold open during pauses (ms)
      releaseTime: 150,   // How fast gate closes (ms)
    },
  },
};
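The four timing parameters interact like a classic attack/hold/release gate. The sketch below illustrates that behavior on raw sample blocks. It is a hypothetical createNoiseGate helper written only for this explanation, not the package's internal implementation. It assumes the threshold is compared against the absolute sample value.

```typescript
// Hypothetical noise gate sketch; NOT the package's internal implementation.
interface NoiseGateOptions {
  threshold: number;   // level (0-1) at or above which the gate opens
  attackTime: number;  // ms to ramp gain from 0 up to 1
  holdTime: number;    // ms to keep the gate open after the level drops
  releaseTime: number; // ms to ramp gain from 1 down to 0
}

export function createNoiseGate(opts: NoiseGateOptions, sampleRate: number) {
  // Convert millisecond timings into per-sample gain steps / sample counts.
  const attackStep = 1 / Math.max(1, (opts.attackTime / 1000) * sampleRate);
  const releaseStep = 1 / Math.max(1, (opts.releaseTime / 1000) * sampleRate);
  const holdSamples = Math.round((opts.holdTime / 1000) * sampleRate);
  let gain = 0;     // 0 = gate closed, 1 = fully open
  let holdLeft = 0; // samples remaining in the hold phase

  // Processes one block of samples in place and returns it.
  return function process(block: Float32Array): Float32Array {
    for (let i = 0; i < block.length; i++) {
      const aboveThreshold = Math.abs(block[i]) >= opts.threshold;
      if (aboveThreshold) {
        holdLeft = holdSamples;                 // signal present: restart hold timer
        gain = Math.min(1, gain + attackStep);  // attack: open gradually
      } else if (holdLeft > 0) {
        holdLeft--;                             // short pause: keep the gate open
        gain = Math.min(1, gain + attackStep);
      } else {
        gain = Math.max(0, gain - releaseStep); // release: close gradually
      }
      block[i] *= gain;
    }
    return block;
  };
}
```

In this sketch, background hiss below the threshold never opens the gate. The hold phase keeps short pauses between words from being cut off.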

Environment Presets

Pre-configured settings for different environments:

const config = {
  audioConfig: {
    environmentPreset: 'home', // 'office', 'home', 'noisy', 'studio', 'custom'
  },
};

// Presets automatically configure:
// - office: Quiet office (low threshold, light suppression)
// - home: Home environment (balanced settings)
// - noisy: Cafe/public space (aggressive filtering)
// - studio: Professional recording (minimal processing)

Audio Stutter Debugging

Built-in analyzer for debugging audio issues:

// Enable audio debugging
const config = {
  debugAudioPerformance: true,
};

// In browser console:
audioStutterAnalyzer.setDebugMode(true);  // Enable collection
audioStutterAnalyzer.getAnalysis();       // View analysis
audioStutterAnalyzer.clear();             // Clear buffer

// The analyzer tracks:
// - Audio underruns (stuttering)
// - Long DOM captures during audio
// - Buffer statistics
// - Correlation between renders and stutters

Interactive Guides

The package includes a complete guided walkthrough system for interactive tutorials:

Basic Guide Setup

import { useGuidesManager, useSammyAgentContext } from '@sammy-labs/sammy-three';

function GuidedExperience() {
  const { startAgent } = useSammyAgentContext();
  
  // Initialize guides manager
  const guides = useGuidesManager({
    enabled: true,
    authConfig: {
      token: 'your-jwt-token',
      baseUrl: 'https://your-api.com',
    },
    autoStartFromURL: true, // Auto-start from ?walkthrough=guide-id
    onStartAgent: startAgent,
    onWalkthroughStart: (guideId) => {
      console.log('Starting walkthrough:', guideId);
    },
  });

  if (!guides) return null;

  return (
    <div>
      {/* Display available guides */}
      {guides.userGuides.map(guide => (
        <button 
          key={guide.guideId}
          onClick={() => guides.startWalkthrough(guide.guideId)}
        >
          {guide.title} {guide.isCompleted && '✓'}
        </button>
      ))}

      {/* Current guide info */}
      {guides.currentGuide && (
        <div>
          <h3>{guides.currentGuide.title}</h3>
          <p>{guides.currentGuide.description}</p>
        </div>
      )}
    </div>
  );
}

URL-Based Guide Activation

Guides can be triggered via URL parameters:

https://yourapp.com?walkthrough=onboarding-guide-v1
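You can read and later remove that parameter with the standard URL API alone. The helpers below are a hypothetical sketch: getGuideIdFromUrl and removeGuideParam are not package exports. With autoStartFromURL enabled, useGuidesManager handles this itself. The queryParamName option changes the parameter name from its 'walkthrough' default.

```typescript
// Hypothetical helpers for ?walkthrough=<guide-id> handling using the
// standard WHATWG URL API; not the package's own implementation.
export function getGuideIdFromUrl(href: string, paramName = 'walkthrough'): string | null {
  const value = new URL(href).searchParams.get(paramName);
  // Treat a missing or empty value (?walkthrough=) as "no guide".
  return value && value.trim() !== '' ? value.trim() : null;
}

// Returns the same URL with the guide parameter removed, keeping other params.
export function removeGuideParam(href: string, paramName = 'walkthrough'): string {
  const url = new URL(href);
  url.searchParams.delete(paramName);
  return url.toString();
}
```

In the browser, pass window.location.href. Then apply the cleaned URL with history.replaceState(null, '', removeGuideParam(window.location.href)) so the guide does not start again on reload.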

Guide Lifecycle

// Manual guide start
const success = await guides.startWalkthrough('guide-id');

// Check for guide in URL
const hasGuide = await guides.checkForGuideInUrl();

// Mark as completed (automatic when conversation starts)
guides.markGuideAsCompleted('guide-id');

// Refresh guide data
guides.refreshGuide();
guides.refreshUserGuides();

// Clean up URL params after use
guides.cleanupUrlParams();

Custom Tools

Extend agent capabilities with custom tools:

Basic Tool Definition

import { ToolDefinition, ToolCategories } from '@sammy-labs/sammy-three';

const customTool: ToolDefinition = {
  declaration: {
    name: 'sendEmail',
    description: 'Send an email to a user',
    parameters: {
      type: 'object',
      properties: {
        to: { type: 'string', description: 'Email recipient' },
        subject: { type: 'string', description: 'Email subject' },
        body: { type: 'string', description: 'Email body' },
      },
      required: ['to', 'subject', 'body'],
    },
  },
  category: ToolCategories.ACTION,
  handler: async (functionCall, context) => {
    const { to, subject, body } = functionCall.args;
    
    try {
      // Your email sending logic
      await sendEmail({ to, subject, body });
      
      // Emit events for state management
      context.emit('email:sent', { to, subject });
      
      return {
        success: true,
        message: `Email sent to ${to}`,
      };
    } catch (error) {
      return {
        success: false,
        error: error instanceof Error ? error.message : String(error),
      };
    }
  },
};

// Use with provider
<SammyAgentProvider
  tools={[customTool]}
  config={config}
>
  {children}
</SammyAgentProvider>
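The following sketch shows how a declaration and a handler fit together. It is a minimal dispatcher that looks a tool up by declaration.name, checks the required arguments, and calls the handler. The types are simplified stand-ins whose shape is assumed from the example above, not the package's real ToolDefinition. dispatchToolCall is hypothetical, and the package's ToolManager performs this routing for you.

```typescript
// Simplified stand-ins for the shapes in the example above (assumed, not the real types).
interface FunctionCall { name: string; args: Record<string, unknown>; }
interface ToolResult { success: boolean; message?: string; error?: string; }
interface SimpleTool {
  declaration: { name: string; parameters: { required?: string[] } };
  handler: (call: FunctionCall) => Promise<ToolResult>;
}

// Hypothetical dispatcher: find the tool by name, validate required args, run it.
export async function dispatchToolCall(tools: SimpleTool[], call: FunctionCall): Promise<ToolResult> {
  const tool = tools.find((t) => t.declaration.name === call.name);
  if (!tool) return { success: false, error: `Unknown tool: ${call.name}` };

  const missing = (tool.declaration.parameters.required ?? []).filter(
    (key) => call.args[key] === undefined,
  );
  if (missing.length > 0) {
    return { success: false, error: `Missing required argument(s): ${missing.join(', ')}` };
  }

  try {
    return await tool.handler(call);
  } catch (err) {
    // Normalize thrown values so the model always gets a structured result.
    return { success: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Returning a structured failure instead of throwing lets the model see what went wrong and, for example, ask the user for the missing subject line.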

Built-in Tools

The package includes several built-in tools:

End Session Tool - Gracefully end the conversation
Get Context Tool - Retrieve contextual information
Escalate Tool - Escalate to human support

// Escalation is automatic when AI can't help
// The escalate tool is called by the AI when needed
// It will mark the conversation for human review

MCP Integration

Support for Model Context Protocol (MCP) enables dynamic tool discovery:

const config = {
  mcp: {
    enabled: true,
    debug: true,
    servers: [
      {
        name: 'hubspot',
        type: 'sse',
        sse: {
          url: 'https://mcp-server.example.com/sse',
          apiKey: process.env.NEXT_PUBLIC_MCP_API_KEY,
        },
        autoConnect: true,
        toolPrefix: 'crm', // Tools will be prefixed: crm_create_contact
      },
    ],
    autoReconnect: true,
    maxReconnectAttempts: 3,
  },
};

// MCP tools are automatically discovered and registered
// They appear alongside your custom tools
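The toolPrefix comment implies that names are joined with an underscore: crm plus create_contact gives crm_create_contact. Prefixing keeps tools from different servers from colliding with each other or with your custom tools. The sketch below shows that mapping. It also keeps the original name so a model-issued call can be routed back to the right server. prefixMcpTools, resolveMcpCall, and the record shape are hypothetical.

```typescript
// Hypothetical sketch of MCP tool-name prefixing. The underscore join follows
// the crm_create_contact example above; everything else is an assumption.
interface McpTool { name: string; description?: string; }
interface RegisteredTool { exposedName: string; serverName: string; originalName: string; }

export function prefixMcpTools(serverName: string, tools: McpTool[], toolPrefix?: string): RegisteredTool[] {
  return tools.map((tool) => ({
    exposedName: toolPrefix ? `${toolPrefix}_${tool.name}` : tool.name,
    serverName,
    originalName: tool.name, // needed when calling the tool on the MCP server itself
  }));
}

// Route a model-issued call back to its server and original tool name.
export function resolveMcpCall(registry: RegisteredTool[], exposedName: string): RegisteredTool | undefined {
  return registry.find((t) => t.exposedName === exposedName);
}
```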

Observability Configuration

Enable comprehensive event tracking and analytics:

const config = {
  observability: {
    enabled: true,
    logToConsole: true, // Log events to console for debugging
    
    // Worker mode for performance
    useWorker: true,
    workerConfig: {
      batchSize: 50,
      batchIntervalMs: 5000, // 5 seconds
    },
    
    // Data inclusion controls
    includeSystemPrompt: false,   // Exclude sensitive prompts
    includeAudioData: false,      // Exclude raw audio (large)
    includeImageData: true,       // Include screen captures
    
    // Event filtering (reduce noise)
    disableEventTypes: [
      'audio.send',
      'audio.receive',
    ],
    
    // Custom metadata
    metadata: {
      environment: 'production',
      version: '1.0.0',
      userId: 'user-123',
    },

    // Audio aggregation
    audioAggregation: {
      flushIntervalMs: 30000, // 30 seconds
      onFlush: async (data) => {
        // Handle flushed audio data
        await uploadToS3(data);
      },
    },

    // Custom event callback
    callback: async (event) => {
      // Send to your analytics service
      await sendToAnalytics(event);
    },
  },
};
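batchSize and batchIntervalMs suggest a common size-or-time flush policy. A batch would be sent when it reaches 50 events, or when 5 seconds have passed since the first buffered event, whichever comes first. The sketch below illustrates that assumed policy together with disableEventTypes filtering. It is not the package's worker code, and EventBatcher is hypothetical. The current time is passed in explicitly so the logic is easy to test.

```typescript
// Hypothetical size-or-time batcher illustrating batchSize / batchIntervalMs /
// disableEventTypes; not the package's actual observability worker.
interface TraceEventLike { type: string; timestamp: number; data?: unknown; }

export class EventBatcher {
  private buffer: TraceEventLike[] = [];
  private firstEventAt: number | null = null;
  private readonly batchSize: number;
  private readonly batchIntervalMs: number;
  private readonly disabledTypes: Set<string>;
  private readonly send: (batch: TraceEventLike[]) => void;

  constructor(batchSize: number, batchIntervalMs: number, disabledTypes: string[], send: (batch: TraceEventLike[]) => void) {
    this.batchSize = batchSize;
    this.batchIntervalMs = batchIntervalMs;
    this.disabledTypes = new Set(disabledTypes);
    this.send = send;
  }

  // Buffer an event unless its type is disabled; flush when the batch is full.
  add(event: TraceEventLike, now: number): void {
    if (this.disabledTypes.has(event.type)) return;
    if (this.buffer.length === 0) this.firstEventAt = now;
    this.buffer.push(event);
    if (this.buffer.length >= this.batchSize) this.flush();
  }

  // Call periodically (e.g. from setInterval); flushes once the oldest event is old enough.
  tick(now: number): void {
    if (this.firstEventAt !== null && now - this.firstEventAt >= this.batchIntervalMs) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.firstEventAt = null;
    this.send(batch);
  }
}
```

Filtering before buffering means high-volume types such as 'audio.send' never occupy batch slots.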

Performance Optimization

Critical DOM Renders

The system automatically captures fresh visual context at conversation boundaries:

// Automatic critical renders happen at:
// - User starts/stops speaking
// - Agent starts/stops speaking
// - Conversation turns complete
// - User interrupts agent

// This ensures the agent always has current visual context
// even during audio playback when captures are throttled

Audio-Aware Capture

Screen capture automatically adapts during audio playback:

const config = {
  captureConfig: {
    enableAudioAdaptation: true, // Default: true
    quality: 0.8,
  },
};

// Normal mode: High-frequency captures (100ms-2s)
// Audio mode: Reduced captures (500ms-5s)
// Critical renders: Always execute at conversation boundaries
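These rules fit a scheduler that widens its capture interval while audio plays but never delays a critical render. The sketch below expresses that idea using the millisecond ranges from the comments above. The clamping policy and CaptureScheduler are assumptions for illustration, not the package's scheduler.

```typescript
// Hypothetical capture scheduler illustrating audio-aware throttling.
// Interval ranges come from the comments above; the policy itself is assumed.
type CaptureMode = 'normal' | 'audio';

const INTERVAL_RANGES: Record<CaptureMode, { min: number; max: number }> = {
  normal: { min: 100, max: 2000 },
  audio: { min: 500, max: 5000 },
};

export class CaptureScheduler {
  private lastCaptureAt = Number.NEGATIVE_INFINITY;
  mode: CaptureMode = 'normal';

  // Clamp a requested interval into the current mode's allowed range.
  effectiveInterval(requestedMs: number): number {
    const { min, max } = INTERVAL_RANGES[this.mode];
    return Math.min(max, Math.max(min, requestedMs));
  }

  // Decide whether to capture now. Critical renders (turn boundaries,
  // interruptions) bypass the throttle entirely.
  shouldCapture(now: number, requestedMs: number, critical = false): boolean {
    if (critical || now - this.lastCaptureAt >= this.effectiveInterval(requestedMs)) {
      this.lastCaptureAt = now;
      return true;
    }
    return false;
  }
}
```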

Worker Optimization

Multiple workers handle heavy processing:

// Workers are automatic - no configuration needed
// - Canvas processing worker (image encoding)
// - Observability worker (API batching)
// - DOM capture workers (screenshot generation)

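// All workers are created from data: URLs, so no separate worker files need to be served
// (with a Content Security Policy, worker-src must allow data:)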
// Fallback to main thread if workers unavailable

Advanced Features

Context Management

Automatic context tracking and updates:

import { useContextUpdater } from '@sammy-labs/sammy-three';

function MyComponent() {
  // Automatically track page changes
  useContextUpdater({
    trackPageChanges: true,
    updateInterval: 5000,
    includeMetadata: true,
  });

  return <div>Content tracked automatically</div>;
}

// Manual context updates
import { ContextStateManager } from '@sammy-labs/sammy-three';

const contextManager = new ContextStateManager();
contextManager.updatePageMetadata({
  url: window.location.href,
  title: document.title,
});
contextManager.updateUserPreferences({
  name: 'John Doe',
  email: 'john.doe@example.com',
});

Microphone Permission

The useMicrophonePermission hook reports the current microphone permission state and lets you check or request access:

import { useMicrophonePermission } from '@sammy-labs/sammy-three';

function MicCheck() {
  const { 
    permission, 
    isChecking, 
    checkPermission,
    requestPermission 
  } = useMicrophonePermission();

  if (isChecking) {
    return <div>Checking microphone access...</div>;
  }

  if (permission === 'denied') {
    return <div>Microphone access denied. Please enable in settings.</div>;
  }

  if (permission === 'prompt') {
    return (
      <button onClick={requestPermission}>
        Grant Microphone Access
      </button>
    );
  }

  return <div>Microphone ready!</div>;
}
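Checks like this are typically built on the browser Permissions API. The sketch below is not the hook's implementation. checkMicrophonePermission and PermissionsLike are hypothetical, and the permissions object is passed in so the function also behaves sensibly where the API is missing. Some browsers reject the 'microphone' permission name, so this sketch maps that case to 'unsupported'. In that case, fall back to requesting access directly with getUserMedia.

```typescript
type MicPermission = 'granted' | 'denied' | 'prompt' | 'unsupported';

// Minimal structural type so the sketch does not depend on DOM typings.
interface PermissionsLike {
  query(descriptor: { name: string }): Promise<{ state: string }>;
}

// Hypothetical helper: resolves to the microphone permission state, or
// 'unsupported' when the Permissions API is missing or rejects the query.
export async function checkMicrophonePermission(permissions?: PermissionsLike): Promise<MicPermission> {
  if (!permissions) return 'unsupported';
  try {
    const status = await permissions.query({ name: 'microphone' });
    const state = status.state;
    return state === 'granted' || state === 'denied' || state === 'prompt' ? state : 'unsupported';
  } catch {
    return 'unsupported';
  }
}
```

In a browser, call it as checkMicrophonePermission(navigator.permissions as unknown as PermissionsLike). A 'denied' result means the user has to re-enable access in the browser's site settings.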

Direct Core Usage

For advanced use cases:

import { 
  SammyAgentCore, 
  SammyApiClient,
  createMemoryServices,
  ObservabilityManager,
} from '@sammy-labs/sammy-three';

// Create services
const apiClient = new SammyApiClient({
  token: 'your-jwt-token',
  baseUrl: 'https://your-api.com',
});

const memoryService = createMemoryServices(apiClient);

// Create agent core directly
const agentCore = new SammyAgentCore({
  config: {
    auth: { token: 'your-jwt-token' },
    captureMethod: 'render',
    debugLogs: true,
  },
  tools: [/* your tools */],
  callbacks: {
    onError: (error) => console.error(error),
    onTurnComplete: (summary) => console.log(summary),
  },
  services: {
    memoryService,
  },
});

// Start agent
await agentCore.start({ 
  agentMode: 'user',
  sammyThreeOrganisationFeatureId: 'org-123',
});

// Access observability
const trace = agentCore.getObservabilityTrace();
const stats = agentCore.getObservabilityStatistics();

Environment Variables

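Example environment variables for a Next.js app (the NEXT_PUBLIC_ prefix exposes a variable to browser code, so only use it for values that are safe to ship to clients):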

# API Configuration
NEXT_PUBLIC_SAMMY_API_BASE_URL=https://your-api.com
NEXT_PUBLIC_APP_VERSION=1.0.0

# Feature Flags
NEXT_PUBLIC_DISABLE_WORKER_MODE=false

# Audio Processing (optional)
NEXT_PUBLIC_KOALA_ACCESS_KEY=your-key
NEXT_PUBLIC_KOALA_MODEL_PATH=/models/koala.pv

# MCP Integration (optional)
NEXT_PUBLIC_MCP_API_KEY=your-mcp-key

# Debug Settings (development only)
NEXT_PUBLIC_DEBUG_AUDIO=true
NEXT_PUBLIC_DEBUG_OBSERVABILITY=true
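One possible way to turn these variables into the provider config is sketched below. The mapping is an assumption, and buildAgentConfigFromEnv is a hypothetical helper. For example, it treats NEXT_PUBLIC_DISABLE_WORKER_MODE=true as observability.useWorker: false. The package does not necessarily read these variables itself.

```typescript
// Hypothetical helper mapping the example variables above onto the config
// shape documented in this README. The mapping is an assumption.
type Env = Record<string, string | undefined>;

export function buildAgentConfigFromEnv(env: Env, token: string) {
  const isTrue = (name: string): boolean => env[name] === 'true';
  const koalaKey = env.NEXT_PUBLIC_KOALA_ACCESS_KEY;

  return {
    auth: { token, baseUrl: env.NEXT_PUBLIC_SAMMY_API_BASE_URL },
    debugAudioPerformance: isTrue('NEXT_PUBLIC_DEBUG_AUDIO'),
    observability: {
      enabled: true,
      useWorker: !isTrue('NEXT_PUBLIC_DISABLE_WORKER_MODE'),
      logToConsole: isTrue('NEXT_PUBLIC_DEBUG_OBSERVABILITY'),
      metadata: { version: env.NEXT_PUBLIC_APP_VERSION ?? 'unknown' },
    },
    audioConfig: {
      noiseSuppression: {
        enabled: true,
        // Koala settings are only set when an access key is present;
        // otherwise browser-native suppression (the default) applies.
        accessKey: koalaKey,
        modelPath: koalaKey ? env.NEXT_PUBLIC_KOALA_MODEL_PATH : undefined,
      },
    },
  };
}
```

Next.js only inlines NEXT_PUBLIC_ variables that are referenced by their full literal name. In a client component, build the env argument explicitly, for example { NEXT_PUBLIC_SAMMY_API_BASE_URL: process.env.NEXT_PUBLIC_SAMMY_API_BASE_URL }, rather than passing process.env as a whole.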

Error Handling

Provider-Level Error Handling

<SammyAgentProvider
  config={config}
  onError={(error) => {
    console.error('Agent error:', error);

    // onError receives unknown, so narrow it before reading .message
    const message = error instanceof Error ? error.message : String(error);

    // Handle different error types
    if (message.includes('token')) {
      // Handle authentication errors
      redirectToLogin();
    } else if (message.includes('microphone')) {
      // Handle microphone permission errors
      showMicrophonePermissionDialog();
    } else {
      // Handle other errors
      showErrorToast(message);
    }
  }}
  onTokenExpired={() => {
    // Handle token expiration
    refreshAuthToken();
  }}
>
  {children}
</SammyAgentProvider>

Component-Level Error Handling

function ChatComponent() {
  const { error, agentStatus, startAgent } = useSammyAgentContext();

  const handleStart = async () => {
    try {
      const success = await startAgent({
        agentMode: 'user',
      });
      
      if (!success) {
        console.error('Failed to start agent');
      }
    } catch (error) {
      console.error('Start agent error:', error);
    }
  };

  if (error) {
    return (
      <div className="error-state">
        <p>Agent Error: {error.message}</p>
        <button onClick={() => window.location.reload()}>
          Reload Page
        </button>
      </div>
    );
  }

  return (
    <div>
      <button onClick={handleStart}>Start Agent</button>
      <div>Status: {agentStatus}</div>
    </div>
  );
}

Troubleshooting

Common Issues

"Token expired" errors

// Ensure onTokenExpired is implemented
const config = {
  auth: {
    token: jwtToken,
    onTokenExpired: async () => {
      await refreshToken();
    },
  },
};
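A common pattern behind a token-refresh callback is to refresh once and retry the failed request. The sketch below shows that pattern in isolation. withTokenRefresh is hypothetical, and so is the convention that an expired token surfaces as an error with code 'TOKEN_EXPIRED'. How the package actually detects expiry is not documented here.

```typescript
// Hypothetical convention for this sketch: an error with code 'TOKEN_EXPIRED'.
function isTokenExpired(err: unknown): boolean {
  return typeof err === 'object' && err !== null && (err as { code?: unknown }).code === 'TOKEN_EXPIRED';
}

let refreshInFlight: Promise<void> | null = null;

// Coalesce concurrent refreshes so parallel failures trigger a single refresh.
function refreshOnce(refreshToken: () => Promise<void>): Promise<void> {
  if (!refreshInFlight) {
    refreshInFlight = refreshToken().then(
      () => { refreshInFlight = null; },
      (err) => { refreshInFlight = null; throw err; },
    );
  }
  return refreshInFlight;
}

// Run call(); on token expiry, refresh once and retry exactly once.
export async function withTokenRefresh<T>(call: () => Promise<T>, refreshToken: () => Promise<void>): Promise<T> {
  try {
    return await call();
  } catch (err) {
    if (!isTokenExpired(err)) throw err; // unrelated errors propagate unchanged
    await refreshOnce(refreshToken);
    return call(); // a second expiry propagates to the caller instead of looping
  }
}
```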

Microphone permission denied

// Use the microphone permission hook
const { permission, requestPermission } = useMicrophonePermission();
if (permission === 'denied') {
  // Show instructions to enable in browser settings
}

Audio stuttering

// Enable audio debugging
audioStutterAnalyzer.setDebugMode(true);
audioStutterAnalyzer.getAnalysis();
   
// Reduce capture quality
const config = {
  captureConfig: {
    quality: 0.7,
    enableAudioAdaptation: true,
  },
};

Noise issues

// Try different environment presets
const config = {
  audioConfig: {
    environmentPreset: 'noisy', // More aggressive filtering
  },
};

Worker not loading

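// Run observability on the main thread instead of in a worker
// (the other workers already fall back to the main thread automatically)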
const config = {
  observability: {
    useWorker: false, // Fallback to main thread
  },
};

Debug Mode

Enable comprehensive debugging:

const config = {
  debugLogs: true,
  debugAudioPerformance: true,
  observability: {
    enabled: true,
    logToConsole: true,
  },
  mcp: {
    debug: true,
  },
};

// In browser console:
audioStutterAnalyzer.setDebugMode(true);

API Reference

Core Exports

// Main Components
export { SammyAgentProvider, useSammyAgentContext }

// Configuration
export { getAgentConfig }

// Hooks
export { 
  useScreenCapture, 
  useRenderCapture, 
  useVideoCapture,
  useContextUpdater,
  useMicrophonePermission,
  useGuides,
  useGuidesManager,
  useGuidesQueryParams,
}

// Core Classes
export { 
  SammyAgentCore,
  SammyApiClient,
  ScreenCaptureManager,
  ObservabilityManager,
  ToolManager,
  MCPManager,
  ContextStateManager,
  ContextMemoryManager,
}

// Audio Classes
export {
  AudioRecorder,
  AudioStreamer,
  audioStutterAnalyzer,
}

// Services
export { 
  createMemoryServices,
  createCoreServices,
  type MemoryService,
  type CoreService,
}

// Types
export type {
  SammyAgentConfig,
  SammyAgentCallbacks,
  AgentSession,
  ObservabilityConfig,
  ToolDefinition,
  MemoryEntry,
  TraceEvent,
  MCPConfig,
  AudioConfig,
  NoiseGateConfig,
  NoiseSuppressionConfig,
  GuidesState,
}

// Utilities
export { 
  saveTextToFile,
  audioContext,
  setupAudioFlushOnUnload,
}

Provider Props

interface SammyAgentProviderProps {
  children: ReactNode;
  config: SammyAgentConfigProps;
  tools?: ToolDefinition[];
  autoStartFromURL?: boolean;
  
  // Callbacks
  onError?: (error: unknown) => void;
  onTokenExpired?: () => void;
  onTurnComplete?: (summary: { agent: string; user: string }) => void;
  onConnectionStateChange?: (connected: boolean) => void;
  onMemoryUpdate?: (result: unknown) => void;
  onToolCall?: (tool: LiveServerToolCall) => void;
  onWalkthroughStart?: (walkthroughGuideId: string) => void;
}

Context Methods

interface SammyAgentContextType {
  // State
  activeSession: AgentSession | null;
  agentStatus: 'connected' | 'connecting' | 'disconnected' | 'disconnecting';
  agentVolume: number; // 0-1
  userVolume: number; // 0-1
  config: SammyAgentConfig;
  error: Error | null;
  muted: boolean;
  screenCapture: ScreenCapture;

  // Methods
  startAgent: (options: StartOptions) => Promise<boolean>;
  stopAgent: () => void;
  sendMessage: (message: string) => void;
  toggleMuted: () => void;
}

// Start Options
interface StartOptions {
  agentMode: 'user' | 'admin' | 'sammy';
  sammyThreeOrganisationFeatureId?: string;
  guideId?: string; // For guided experiences
}

Best Practices

Always handle authentication properly
- Implement token refresh logic
- Handle token expiration gracefully
- Use environment variables for API URLs
Configure audio for your environment
- Use environment presets for quick setup
- Test noise gate threshold for your use case
- Enable noise suppression in noisy environments
Enable worker mode for production
- Better performance for high-frequency events
- Prevents main thread blocking
- Automatic fallback if workers fail
Use appropriate capture methods
- render for web applications (recommended)
- video for full screen capture needs
- Enable audio adaptation to prevent stuttering
Implement proper error handling
- Provider-level onError handler
- Component-level error states
- User-friendly error messages
Optimize for performance
- Enable audio-aware capture
- Use appropriate quality settings
- Filter unnecessary observability events
- Use worker mode for heavy operations
Monitor and debug
- Enable debug logs in development
- Use observability for production monitoring
- Use audio stutter analyzer for performance issues
- Check MCP debug logs for integration issues

Migration Guide

From Legacy Versions

If migrating from older versions:

Update imports:

// Old
import { SammyAgent } from '@sammy-labs/sammy-three-legacy';
   
// New
import { SammyAgentProvider, useSammyAgentContext } from '@sammy-labs/sammy-three';

Update configuration:

// Old
const agent = new SammyAgent({ apiKey: 'key' });
   
// New
<SammyAgentProvider config={{ auth: { token: 'jwt-token' } }}>
  <App />
</SammyAgentProvider>

Update methods:

// Old
agent.connect();
   
// New
const { startAgent } = useSammyAgentContext();
await startAgent({ agentMode: 'user' });

New features to adopt:
- Audio processing with noise suppression
- Interactive guides system
- MCP tool integration
- Worker-based observability
- Critical DOM renders