sarvam-conv-ai-sdk

v0.0.42

TypeScript SDK for Sarvam Conversational AI

Downloads

1,283

Sarvam Conv AI SDK

TypeScript SDK for building real-time voice-to-voice and text-based conversational AI applications across multiple platforms.

Features

  • Real-time voice-to-voice conversations in the browser
  • Text-based chat with streaming responses
  • Automatic microphone capture and speaker playback
  • Multi-language support (11 languages: 10 Indian languages plus English)
  • WebSocket-based real-time communication
  • Cross-platform: Browser, React Native, and Node.js support

Installation

For Web/Browser

npm install sarvam-conv-ai-sdk

For React Native

npm install sarvam-conv-ai-sdk
npm install react-native-audio-api

For Node.js

npm install sarvam-conv-ai-sdk ws

Platform-Specific Imports

⚠️ Important: Always use platform-specific imports to avoid bundling errors and reduce bundle size.

The SDK provides platform-optimized entry points:

Browser/Web

// ✅ Always use the /browser entry point for web applications
import { ConversationAgent, BrowserAudioInterface } from 'sarvam-conv-ai-sdk/browser';

Why? The browser entry point excludes React Native dependencies, preventing bundler errors like Cannot resolve 'react-native'.

React Native

// ✅ Always use the /react-native entry point for React Native apps
import { ConversationAgent, RNAudioInterface } from 'sarvam-conv-ai-sdk/react-native';

Why? The React Native entry point includes native module support for iOS and Android.

Node.js

// Use the default entry point for Node.js
import { ConversationAgent } from 'sarvam-conv-ai-sdk';

Quick Start

Voice-to-Voice Conversation (Browser)

import React, { useRef, useState } from 'react';
import {
  ConversationAgent,
  BrowserAudioInterface,
  InteractionType,
  type ServerTextMsgType,
} from 'sarvam-conv-ai-sdk/browser';

function VoiceChat() {
  const [isConnected, setIsConnected] = useState(false);
  const [transcript, setTranscript] = useState('');
  const agentRef = useRef<ConversationAgent | null>(null);

  const startConversation = async () => {
    try {
      const audioInterface = new BrowserAudioInterface();
      
      const agent = new ConversationAgent({
        apiKey: 'your_api_key',
        platform: 'browser',
        config: {
          user_identifier_type: 'custom',
          user_identifier: 'user123',
          org_id: 'your_org_id',
          workspace_id: 'your_workspace_id',
          app_id: 'your_app_id',
          interaction_type: InteractionType.CALL,
          input_sample_rate: 16000,
          output_sample_rate: 16000,
        },
        audioInterface,
        textCallback: async (msg: ServerTextMsgType) => {
          setTranscript(prev => prev + msg.text);
        },
        startCallback: async () => {
          setIsConnected(true);
        },
        endCallback: async () => {
          setIsConnected(false);
        },
      });

      agentRef.current = agent;
      await agent.start();
      await agent.waitForConnect(10);
    } catch (error) {
      console.error('Error:', error);
    }
  };

  const stopConversation = async () => {
    if (agentRef.current) {
      await agentRef.current.stop();
      agentRef.current = null;
    }
  };

  return (
    <div>
      <h2>Voice Chat</h2>
      {!isConnected ? (
        <button onClick={startConversation}>Start Voice Chat</button>
      ) : (
        <button onClick={stopConversation}>Stop Voice Chat</button>
      )}
      <div>Transcript: {transcript}</div>
    </div>
  );
}

export default VoiceChat;

Text-Based Conversation (Node.js)

const { ConversationAgent, InteractionType } = require('sarvam-conv-ai-sdk');

async function main() {
  const agent = new ConversationAgent({
    apiKey: 'your_api_key',
    config: {
      org_id: 'your_org_id',
      workspace_id: 'your_workspace_id',
      app_id: 'your_app_id',
      user_identifier: 'user@example.com',
      user_identifier_type: 'email',
      interaction_type: InteractionType.TEXT,
      input_sample_rate: 16000,
      output_sample_rate: 16000,
    },
    textCallback: async (msg) => {
      console.log('Agent:', msg.text);
    },
    startCallback: async () => {
      console.log('Conversation started!');
    },
  });

  await agent.start();
  const connected = await agent.waitForConnect(10);
  
  if (connected) {
    await agent.sendText('Hello, how are you?');
    await agent.waitForDisconnect();
  }
}

main().catch(console.error);

API Reference

ConversationAgent

The main class for managing conversational AI sessions.

Constructor Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| apiKey | string | Yes | API key for authentication |
| config | InteractionConfig | Yes | Interaction configuration |
| platform | 'browser' \| 'node' | No | Platform type (auto-detected) |
| audioInterface | AsyncAudioInterface | No | Audio interface for voice interactions |
| textCallback | (msg: ServerTextMsgType) => Promise<void> | No | Receives streaming text chunks |
| audioCallback | (msg: ServerAudioChunkMsg) => Promise<void> | No | Receives audio chunks |
| eventCallback | (event: ServerEventBase) => Promise<void> | No | Receives events |
| startCallback | () => Promise<void> | No | Called when conversation starts |
| endCallback | () => Promise<void> | No | Called when conversation ends |
| baseUrl | string | No | Override base URL |

Methods

  • async start() - Start the conversation session
  • async stop() - Stop the conversation and clean up
  • async waitForConnect(timeout?) - Wait for connection (returns boolean)
  • async waitForDisconnect() - Wait until disconnected
  • isConnected() - Check connection status
  • getInteractionId() - Get current interaction ID
  • async sendAudio(audioData) - Send raw audio (voice mode only)
  • async sendText(text) - Send text message (text mode only)
  • getAgentType() - Get agent type ('voice' or 'text')
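
Because sendText and sendAudio are each tied to one mode, a small guard can route input by agent type. The following is a minimal sketch, not part of the SDK: `ModeAwareAgent` and `sendUserInput` are hypothetical names, the interface is a structural stand-in for the three methods listed above, and the `Uint8Array` payload type for audio is an assumption.

```typescript
// Structural stand-in for the ConversationAgent methods this sketch relies on.
interface ModeAwareAgent {
  getAgentType(): 'voice' | 'text';
  sendText(text: string): Promise<void>;
  sendAudio(audioData: Uint8Array): Promise<void>; // payload type is an assumption
}

// Sends a string via sendText and binary data via sendAudio,
// failing early when the input does not match the agent's mode.
async function sendUserInput(
  agent: ModeAwareAgent,
  input: string | Uint8Array,
): Promise<void> {
  const agentType = agent.getAgentType();
  if (typeof input === 'string') {
    if (agentType !== 'text') {
      throw new Error('sendText is only available in text mode');
    }
    await agent.sendText(input);
  } else {
    if (agentType !== 'voice') {
      throw new Error('sendAudio is only available in voice mode');
    }
    await agent.sendAudio(input);
  }
}
```

Failing before the call keeps a mode mismatch visible at the call site instead of depending on how the server reacts to it.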

InteractionConfig

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| user_identifier_type | string | One of: 'custom', 'email', 'phone_number', 'unknown' |
| user_identifier | string | User identifier value |
| org_id | string | Your organization ID |
| workspace_id | string | Your workspace ID |
| app_id | string | The target application ID |
| interaction_type | InteractionType | InteractionType.CALL or InteractionType.TEXT |
| input_sample_rate | InputSampleRate | Input audio sample rate: 8000 or 16000 Hz |
| output_sample_rate | OutputSampleRate | Output audio sample rate: 16000 or 22050 Hz |

Optional Fields

| Field | Type | Description |
| --- | --- | --- |
| version | number | App version (uses latest if not provided) |
| agent_variables | Record<string, any> | Key-value pairs for agent context |
| initial_language_name | SarvamToolLanguageName | Starting language |
| initial_state_name | string | Starting state name |
| initial_bot_message | string | First message from agent |
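
When a config is assembled from user input or environment values, it can help to check the documented constraints before constructing the agent. The sketch below is illustrative only: `ConfigDraft` and `findConfigProblems` are hypothetical names, the allowed values are copied from the required-fields table, and the SDK may apply its own validation.

```typescript
// Allowed values copied from the required-fields table above.
const INPUT_SAMPLE_RATES: readonly number[] = [8000, 16000];
const OUTPUT_SAMPLE_RATES: readonly number[] = [16000, 22050];
const USER_IDENTIFIER_TYPES: readonly string[] = ['custom', 'email', 'phone_number', 'unknown'];

// Hypothetical shape covering only the fields this sketch checks.
interface ConfigDraft {
  user_identifier_type: string;
  user_identifier: string;
  org_id: string;
  workspace_id: string;
  app_id: string;
  input_sample_rate: number;
  output_sample_rate: number;
}

// Returns human-readable problems; an empty array means no problem was found.
function findConfigProblems(draft: ConfigDraft): string[] {
  const problems: string[] = [];
  if (USER_IDENTIFIER_TYPES.indexOf(draft.user_identifier_type) === -1) {
    problems.push('user_identifier_type must be one of: ' + USER_IDENTIFIER_TYPES.join(', '));
  }
  for (const field of ['user_identifier', 'org_id', 'workspace_id', 'app_id'] as const) {
    if (draft[field].trim() === '') {
      problems.push(field + ' must not be empty');
    }
  }
  if (INPUT_SAMPLE_RATES.indexOf(draft.input_sample_rate) === -1) {
    problems.push('input_sample_rate must be 8000 or 16000');
  }
  if (OUTPUT_SAMPLE_RATES.indexOf(draft.output_sample_rate) === -1) {
    problems.push('output_sample_rate must be 16000 or 22050');
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a form or CLI report every issue in one pass.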

BrowserAudioInterface

Handles microphone capture and speaker playback in browser environments.

import { BrowserAudioInterface } from 'sarvam-conv-ai-sdk/browser';

const audioInterface = new BrowserAudioInterface();

Features:

  • Automatic microphone access and audio capture
  • Real-time audio streaming at 16kHz
  • Automatic speaker playback
  • Handles user interruptions

Requirements:

  • HTTPS connection (required for microphone access)
  • Modern browser with WebAudio API support
  • User permission for microphone access
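
The first two requirements can be checked before creating the audio interface, so the UI can explain what is missing instead of failing on start. This is a rough sketch with hypothetical names (`BrowserLike`, `missingAudioRequirements`); it relies only on standard browser globals (`isSecureContext`, `AudioContext`, `navigator.mediaDevices.getUserMedia`) and takes the environment as a parameter so it can also run outside a browser.

```typescript
// The subset of the browser's global object that this check inspects.
interface BrowserLike {
  isSecureContext?: boolean;
  AudioContext?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Lists the requirements that the given environment does not satisfy.
function missingAudioRequirements(env: BrowserLike): string[] {
  const missing: string[] = [];
  if (!env.isSecureContext) {
    missing.push('secure context (HTTPS or localhost)');
  }
  if (typeof env.AudioContext !== 'function') {
    missing.push('Web Audio API');
  }
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== 'function') {
    missing.push('microphone capture (getUserMedia)');
  }
  return missing;
}
```

In a browser, `missingAudioRequirements(window)` returns an empty array when these prerequisites are present. The microphone permission itself is only requested once capture starts, so it is not covered here.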

Event Handling

Text Callback

Receives streaming text chunks from the agent:

textCallback: async (msg: ServerTextMsgType) => {
  console.log('Agent says:', msg.text);
}
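
Since the text arrives in chunks, a caller usually needs to join them before showing a full reply. The sketch below shows one way to do that; `TextChunk` and `TranscriptBuffer` are hypothetical names, and only the `text` field used in the callback above is assumed to exist on the message.

```typescript
// Minimal stand-in for the message shape: only the `text` field is assumed.
interface TextChunk {
  text: string;
}

// Collects streamed chunks and exposes the text received so far.
class TranscriptBuffer {
  private parts: string[] = [];

  // Adds one streamed chunk to the buffer.
  append(msg: TextChunk): void {
    this.parts.push(msg.text);
  }

  // Everything received since the last flush, in arrival order.
  get text(): string {
    return this.parts.join('');
  }

  // Returns the collected text and empties the buffer, e.g. between turns.
  flush(): string {
    const collected = this.text;
    this.parts = [];
    return collected;
  }
}
```

It can be wired in as `textCallback: async (msg) => buffer.append(msg)`, with `buffer.flush()` called wherever the application decides a turn is complete.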

Event Callback

Receives various events during conversation:

eventCallback: async (event: ServerEventBase) => {
  switch (event.type) {
    case 'server.action.interaction_connected':
      console.log('Connected');
      break;
    case 'server.event.user_interrupt':
      console.log('User interrupted');
      break;
    case 'server.action.interaction_end':
      console.log('Conversation ended');
      break;
    case 'server.event.user_speech_start':
      console.log('User started speaking');
      break;
    case 'server.event.user_speech_end':
      console.log('User stopped speaking');
      break;
  }
}
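
For UI state, it is often easier to fold these events into a single phase value than to scatter flags across callbacks. The following is a sketch under assumptions: `ConversationPhase` and `nextPhase` are hypothetical names, the event names are the ones from the switch statement above, and how the phases map onto events is an illustrative choice rather than documented SDK behaviour.

```typescript
type ConversationPhase = 'idle' | 'connected' | 'user_speaking' | 'ended';

// Pure transition function: given the current phase and an event type,
// returns the phase the UI should show next.
function nextPhase(current: ConversationPhase, eventType: string): ConversationPhase {
  switch (eventType) {
    case 'server.action.interaction_connected':
      return 'connected';
    case 'server.event.user_speech_start':
      return 'user_speaking';
    case 'server.event.user_speech_end':
      return 'connected';
    case 'server.action.interaction_end':
      return 'ended';
    default:
      // Events this sketch does not model (including user_interrupt) leave the phase unchanged.
      return current;
  }
}
```

Inside `eventCallback`, this could be used as `phase = nextPhase(phase, event.type)`, or with a React state setter such as `setPhase(p => nextPhase(p, event.type))`.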

Supported Languages

The SDK supports 11 languages, 10 Indian languages plus English:

import { SarvamToolLanguageName } from 'sarvam-conv-ai-sdk';

// Available: BENGALI, GUJARATI, KANNADA, MALAYALAM, TAMIL, 
// TELUGU, PUNJABI, ODIA, MARATHI, HINDI, ENGLISH

const config = {
  initial_language_name: SarvamToolLanguageName.HINDI,
};
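
If the starting language comes from user input or a stored preference, it can be normalised against the names listed above before it reaches the config. This is a minimal sketch: `LANGUAGE_NAMES` and `toLanguageName` are hypothetical, the list is copied from the comment above, and the enum's runtime values are not shown in this README, so the sketch works with member names only.

```typescript
// Member names copied from the "Available" comment above.
const LANGUAGE_NAMES = [
  'BENGALI', 'GUJARATI', 'KANNADA', 'MALAYALAM', 'TAMIL', 'TELUGU',
  'PUNJABI', 'ODIA', 'MARATHI', 'HINDI', 'ENGLISH',
] as const;

type LanguageName = (typeof LANGUAGE_NAMES)[number];

// Maps free-form input such as " hindi " to a known member name,
// falling back to a default when the input is not recognised.
function toLanguageName(input: string, fallback: LanguageName = 'ENGLISH'): LanguageName {
  const wanted = input.trim().toUpperCase();
  for (const candidate of LANGUAGE_NAMES) {
    if (candidate === wanted) {
      return candidate;
    }
  }
  return fallback;
}
```

Assuming SarvamToolLanguageName is a regular TypeScript enum, the result could then be used as `SarvamToolLanguageName[toLanguageName(userChoice)]`.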

Best Practices

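Resource Cleanup: Always clean up resources when the component unmounts

useEffect(() => {
  return () => {
    // Stop the agent on unmount; log instead of throwing if cleanup fails.
    // The braces keep the cleanup function from returning a Promise to React.
    agentRef.current?.stop().catch(console.error);
  };
}, []);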

Connection Timeout: Specify a timeout when waiting for a connection

const connected = await agent.waitForConnect(10); // 10 seconds
if (!connected) console.error('Connection timeout');

Error Handling: Wrap agent operations in try-catch blocks

try {
  await agent.start();
  await agent.waitForConnect(10);
} catch (error) {
  console.error('Error:', error);
  await agent.stop();
}
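
The timeout check and the error handling above can be combined into one helper so that every call site cleans up the same way. The sketch below is not part of the SDK: `StartableAgent` and `startOrCleanUp` are hypothetical names, and the interface is a structural stand-in for the documented `start`, `stop` and `waitForConnect` methods.

```typescript
// Structural stand-in for the ConversationAgent methods used here.
interface StartableAgent {
  start(): Promise<void>;
  stop(): Promise<void>;
  waitForConnect(timeoutSeconds?: number): Promise<boolean>;
}

// Starts the agent and resolves to true once it is connected.
// If the connection times out or an error is thrown, the agent is stopped
// so that no half-open session is left behind.
async function startOrCleanUp(agent: StartableAgent, timeoutSeconds = 10): Promise<boolean> {
  let connected = false;
  try {
    await agent.start();
    connected = await agent.waitForConnect(timeoutSeconds);
  } catch (error) {
    // Best-effort cleanup: a failure in stop() must not hide the original error.
    await agent.stop().catch(() => undefined);
    throw error;
  }
  if (!connected) {
    await agent.stop();
  }
  return connected;
}
```

A caller can then branch on the boolean for the timeout case and use a single try-catch for real failures.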

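Secure API Keys: Use environment variables or a backend proxy

// Use environment variables (note: Vite embeds VITE_* values in the client bundle,
// so this keeps the key out of source control, not out of the browser)
const apiKey = import.meta.env.VITE_SARVAM_API_KEY;

// Or route requests through a backend proxy
const agent = new ConversationAgent({ /* other options omitted */ baseUrl: '/api/proxy/' });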
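
Whichever source the key comes from, failing fast on a missing value tends to be easier to debug than an authentication error later. Here is a small sketch with a hypothetical helper, `requireSetting`; it takes the environment as a plain object so it works with `process.env` in Node.js or `import.meta.env` in a Vite build.

```typescript
// Reads a required setting from an env-like object and throws if it is absent or blank.
function requireSetting(env: Record<string, string | undefined>, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === '') {
    throw new Error('Missing required setting: ' + key);
  }
  return value;
}
```

For example, `requireSetting(process.env, 'SARVAM_API_KEY')` on a server, where the variable name is only an illustration.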

Examples

  • Web Example - See examples/web for a complete React + TypeScript application
  • Node.js Example - See examples/nodejs/simple-text-chat.js for a command-line text chat

Troubleshooting

Microphone Not Working: Make sure the page is served over HTTPS, check the browser's microphone permission, and verify that no other app is using the microphone.

Connection Timeout: Check network connectivity, verify that the API key is valid, and make sure the app_id exists and has a committed version.

Audio Quality Issues: Verify that the sample rate matches the configuration (8000 or 16000 Hz for input, 16000 or 22050 Hz for output) and that the audio format is LINEAR16 (16-bit PCM, mono).
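
When supplying raw audio yourself, a common source of distortion is sending Web Audio float samples without converting them to 16-bit PCM. The sketch below shows the usual conversion; `floatToLinear16` and `linear16ChunkBytes` are hypothetical helpers, and whether sendAudio expects exactly this buffer type is an assumption to check against the SDK's types.

```typescript
// Converts float samples in the range [-1, 1] to 16-bit signed PCM (LINEAR16).
function floatToLinear16(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  }
  return pcm;
}

// Size in bytes of a mono LINEAR16 chunk: 2 bytes per sample, 1 channel.
function linear16ChunkBytes(sampleRate: number, durationMs: number): number {
  return Math.round((sampleRate * durationMs) / 1000) * 2;
}
```

At 16000 Hz, a 20 ms chunk holds 320 samples, so `linear16ChunkBytes(16000, 20)` returns 640. A chunk whose size does not match the configured rate usually points to a sample rate mismatch.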

License

MIT