sarvam-conv-ai-sdk
v0.0.42
TypeScript SDK for Sarvam Conversational AI
Sarvam Conv AI SDK
TypeScript SDK for building real-time voice-to-voice and text-based conversational AI applications across multiple platforms.
Features
- Real-time voice-to-voice conversations in the browser
- Text-based chat with streaming responses
- Automatic microphone capture and speaker playback
- Multi-language support (10 Indian languages + English)
- WebSocket-based real-time communication
- Cross-platform: Browser, React Native, and Node.js support
Installation
For Web/Browser
npm install sarvam-conv-ai-sdk
For React Native
npm install sarvam-conv-ai-sdk
npm install react-native-audio-api
For Node.js
npm install sarvam-conv-ai-sdk ws
Platform-Specific Imports
⚠️ Important: Always use platform-specific imports to avoid bundling errors and reduce bundle size.
The SDK provides platform-optimized entry points:
Browser/Web
// ✅ Always use the /browser entry point for web applications
import { ConversationAgent, BrowserAudioInterface } from 'sarvam-conv-ai-sdk/browser';
Why? The browser entry point excludes React Native dependencies, preventing bundler errors like "Cannot resolve 'react-native'".
React Native
// ✅ Always use the /react-native entry point for React Native apps
import { ConversationAgent, RNAudioInterface } from 'sarvam-conv-ai-sdk/react-native';
Why? The React Native entry point includes native module support for iOS and Android.
Node.js
// Use the default entry point for Node.js
import { ConversationAgent } from 'sarvam-conv-ai-sdk';
Quick Start
Voice-to-Voice Conversation (Browser)
import React, { useRef, useState } from 'react';
import {
  ConversationAgent,
  BrowserAudioInterface,
  InteractionType,
  type ServerTextMsgType,
} from 'sarvam-conv-ai-sdk/browser';

function VoiceChat() {
  const [isConnected, setIsConnected] = useState(false);
  const [transcript, setTranscript] = useState('');
  const agentRef = useRef<ConversationAgent | null>(null);

  const startConversation = async () => {
    try {
      const audioInterface = new BrowserAudioInterface();
      const agent = new ConversationAgent({
        apiKey: 'your_api_key',
        platform: 'browser',
        config: {
          user_identifier_type: 'custom',
          user_identifier: 'user123',
          org_id: 'your_org_id',
          workspace_id: 'your_workspace_id',
          app_id: 'your_app_id',
          interaction_type: InteractionType.CALL,
          input_sample_rate: 16000,
          output_sample_rate: 16000,
        },
        audioInterface,
        // Append each streamed text chunk to the transcript
        textCallback: async (msg: ServerTextMsgType) => {
          setTranscript(prev => prev + msg.text);
        },
        startCallback: async () => {
          setIsConnected(true);
        },
        endCallback: async () => {
          setIsConnected(false);
        },
      });
      agentRef.current = agent;
      await agent.start();
      await agent.waitForConnect(10);
    } catch (error) {
      console.error('Error:', error);
    }
  };

  const stopConversation = async () => {
    if (agentRef.current) {
      await agentRef.current.stop();
      agentRef.current = null;
    }
  };

  return (
    <div>
      <h2>Voice Chat</h2>
      {!isConnected ? (
        <button onClick={startConversation}>Start Voice Chat</button>
      ) : (
        <button onClick={stopConversation}>Stop Voice Chat</button>
      )}
      <div>Transcript: {transcript}</div>
    </div>
  );
}

export default VoiceChat;
Text-Based Conversation (Node.js)
const { ConversationAgent, InteractionType } = require('sarvam-conv-ai-sdk');

async function main() {
  const agent = new ConversationAgent({
    apiKey: 'your_api_key',
    config: {
      org_id: 'your_org_id',
      workspace_id: 'your_workspace_id',
      app_id: 'your_app_id',
      user_identifier: 'user@example.com',
      user_identifier_type: 'email',
      interaction_type: InteractionType.TEXT,
      input_sample_rate: 16000,
      output_sample_rate: 16000,
    },
    // Print each streamed text chunk from the agent
    textCallback: async (msg) => {
      console.log('Agent:', msg.text);
    },
    startCallback: async () => {
      console.log('Conversation started!');
    },
  });

  await agent.start();
  // waitForConnect resolves to false if the connection times out (10 seconds here)
  const connected = await agent.waitForConnect(10);
  if (connected) {
    await agent.sendText('Hello, how are you?');
    await agent.waitForDisconnect();
  }
}

main().catch(console.error);
API Reference
ConversationAgent
The main class for managing conversational AI sessions.
Constructor Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| apiKey | string | Yes | API key for authentication |
| config | InteractionConfig | Yes | Interaction configuration |
| platform | 'browser' \| 'node' | No | Platform type (auto-detected) |
| audioInterface | AsyncAudioInterface | No | Audio interface for voice interactions |
| textCallback | (msg: ServerTextMsgType) => Promise<void> | No | Receives streaming text chunks |
| audioCallback | (msg: ServerAudioChunkMsg) => Promise<void> | No | Receives audio chunks |
| eventCallback | (event: ServerEventBase) => Promise<void> | No | Receives events |
| startCallback | () => Promise<void> | No | Called when conversation starts |
| endCallback | () => Promise<void> | No | Called when conversation ends |
| baseUrl | string | No | Override base URL |
Methods
- async start() - Start the conversation session
- async stop() - Stop the conversation and clean up
- async waitForConnect(timeout?) - Wait for connection (returns boolean)
- async waitForDisconnect() - Wait until disconnected
- isConnected() - Check connection status
- getInteractionId() - Get current interaction ID
- async sendAudio(audioData) - Send raw audio (voice mode only)
- async sendText(text) - Send text message (text mode only)
- getAgentType() - Get agent type ('voice' or 'text')
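
The start, waitForConnect, and stop methods are typically called together. The following is a minimal sketch of one way to combine them so that a session is never left half-open; `ConnectableAgent` and `connectOrStop` are hypothetical names introduced here for illustration and are not part of the SDK. It assumes only the method shapes listed above.

```typescript
// Structural stand-in for the subset of ConversationAgent methods used here.
// This interface is an assumption for illustration, not an SDK export.
interface ConnectableAgent {
  start(): Promise<void>;
  waitForConnect(timeout?: number): Promise<boolean>;
  stop(): Promise<void>;
}

// Starts the agent and waits for the connection. If the wait times out or
// any step throws, the agent is stopped before control returns to the caller.
async function connectOrStop(
  agent: ConnectableAgent,
  timeoutSeconds = 10,
): Promise<boolean> {
  let connected = false;
  try {
    await agent.start();
    connected = await agent.waitForConnect(timeoutSeconds);
  } finally {
    if (!connected) {
      // Ignore errors from stop() so the original failure is not masked.
      await agent.stop().catch(() => undefined);
    }
  }
  return connected;
}
```

A caller would then write `if (await connectOrStop(agent)) { ... }` and only send messages inside that branch.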
InteractionConfig
Required Fields
| Field | Type | Description |
| --- | --- | --- |
| user_identifier_type | string | One of: 'custom', 'email', 'phone_number', 'unknown' |
| user_identifier | string | User identifier value |
| org_id | string | Your organization ID |
| workspace_id | string | Your workspace ID |
| app_id | string | The target application ID |
| interaction_type | InteractionType | InteractionType.CALL or InteractionType.TEXT |
| input_sample_rate | InputSampleRate | Input audio sample rate: 8000 or 16000 Hz |
| output_sample_rate | OutputSampleRate | Output audio sample rate: 16000 or 22050 Hz |
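
Because several of these fields accept only a fixed set of values, it can help to check a configuration before opening a connection. The sketch below validates the constraints stated in the table above; `RequiredConfigFields` and `validateRequiredFields` are hypothetical names for illustration, the SDK's own exported types may differ, and `interaction_type` is left out because it relies on the SDK's enum.

```typescript
// Local stand-in mirroring the required fields documented above.
// This is an assumption for illustration, not the SDK's InteractionConfig type.
interface RequiredConfigFields {
  user_identifier_type: string;
  user_identifier: string;
  org_id: string;
  workspace_id: string;
  app_id: string;
  input_sample_rate: number;
  output_sample_rate: number;
}

const IDENTIFIER_TYPES: readonly string[] = ['custom', 'email', 'phone_number', 'unknown'];
const INPUT_RATES: readonly number[] = [8000, 16000];
const OUTPUT_RATES: readonly number[] = [16000, 22050];

// Returns a list of human-readable problems; an empty list means the
// documented constraints are satisfied.
function validateRequiredFields(config: RequiredConfigFields): string[] {
  const problems: string[] = [];
  if (!IDENTIFIER_TYPES.includes(config.user_identifier_type)) {
    problems.push(`user_identifier_type must be one of: ${IDENTIFIER_TYPES.join(', ')}`);
  }
  for (const key of ['user_identifier', 'org_id', 'workspace_id', 'app_id'] as const) {
    if (config[key].trim() === '') {
      problems.push(`${key} must not be empty`);
    }
  }
  if (!INPUT_RATES.includes(config.input_sample_rate)) {
    problems.push(`input_sample_rate must be one of: ${INPUT_RATES.join(', ')}`);
  }
  if (!OUTPUT_RATES.includes(config.output_sample_rate)) {
    problems.push(`output_sample_rate must be one of: ${OUTPUT_RATES.join(', ')}`);
  }
  return problems;
}
```

Note that 22050 is valid only as an output rate and 8000 only as an input rate, which is an easy pair to swap by mistake.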
Optional Fields
| Field | Type | Description |
| --- | --- | --- |
| version | number | App version (uses latest if not provided) |
| agent_variables | Record<string, any> | Key-value pairs for agent context |
| initial_language_name | SarvamToolLanguageName | Starting language |
| initial_state_name | string | Starting state name |
| initial_bot_message | string | First message from agent |
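
An application may want to derive initial_language_name from the user's locale. The following sketch maps ISO 639-1 language codes to the language names listed under Supported Languages in this README; `LanguageKey` and `languageKeyForLocale` are hypothetical helpers, and the sketch assumes the names in that list match the member names of `SarvamToolLanguageName`.

```typescript
// Language names as listed under "Supported Languages" in this README.
type LanguageKey =
  | 'BENGALI' | 'GUJARATI' | 'KANNADA' | 'MALAYALAM' | 'TAMIL' | 'TELUGU'
  | 'PUNJABI' | 'ODIA' | 'MARATHI' | 'HINDI' | 'ENGLISH';

// ISO 639-1 primary language subtags mapped to those names.
const LANGUAGE_BY_ISO_CODE = new Map<string, LanguageKey>([
  ['bn', 'BENGALI'],
  ['gu', 'GUJARATI'],
  ['kn', 'KANNADA'],
  ['ml', 'MALAYALAM'],
  ['ta', 'TAMIL'],
  ['te', 'TELUGU'],
  ['pa', 'PUNJABI'],
  ['or', 'ODIA'],
  ['mr', 'MARATHI'],
  ['hi', 'HINDI'],
  ['en', 'ENGLISH'],
]);

// Picks a language name for a locale tag such as "hi-IN" or "ta", falling
// back to the given default when the language is not in the list.
function languageKeyForLocale(
  locale: string,
  fallback: LanguageKey = 'ENGLISH',
): LanguageKey {
  const [primary = ''] = locale.trim().toLowerCase().split(/[-_]/);
  return LANGUAGE_BY_ISO_CODE.get(primary) ?? fallback;
}
```

Under the assumption above, the result could be used as `SarvamToolLanguageName[languageKeyForLocale(navigator.language)]` when building the config.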
BrowserAudioInterface
Handles microphone capture and speaker playback in browser environments.
import { BrowserAudioInterface } from 'sarvam-conv-ai-sdk/browser';
const audioInterface = new BrowserAudioInterface();
Features:
- Automatic microphone access and audio capture
- Real-time audio streaming at 16kHz
- Automatic speaker playback
- Handles user interruptions
Requirements:
- HTTPS connection (required for microphone access)
- Modern browser with WebAudio API support
- User permission for microphone access
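
These requirements can be checked before constructing the audio interface, so the user sees a clear message instead of a failed connection. The sketch below inspects standard browser globals (`isSecureContext`, `AudioContext`, `navigator.mediaDevices.getUserMedia`); `AudioEnvironment` and `findAudioBlockers` are hypothetical names introduced here, not SDK exports. It does not check whether permission has been granted, since the browser prompts for that only when the microphone is requested.

```typescript
// Minimal view of the browser globals this check reads. Passing the
// environment in as a parameter keeps the function testable outside a browser.
interface AudioEnvironment {
  isSecureContext?: boolean;
  AudioContext?: unknown;
  webkitAudioContext?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Returns the reasons voice mode is unlikely to work in this environment;
// an empty list means the basic requirements are met.
function findAudioBlockers(env: AudioEnvironment): string[] {
  const blockers: string[] = [];
  if (env.isSecureContext !== true) {
    blockers.push('Page is not in a secure context (HTTPS or localhost).');
  }
  const hasWebAudio =
    typeof env.AudioContext === 'function' ||
    typeof env.webkitAudioContext === 'function';
  if (!hasWebAudio) {
    blockers.push('Web Audio API is not available.');
  }
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== 'function') {
    blockers.push('Microphone capture (getUserMedia) is not available.');
  }
  return blockers;
}
```

In a browser this could be called as `findAudioBlockers(globalThis as AudioEnvironment)` before the Start button is enabled.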
Event Handling
Text Callback
Receives streaming text chunks from the agent:
textCallback: async (msg: ServerTextMsgType) => {
  console.log('Agent says:', msg.text);
}
Event Callback
Receives various events during conversation:
eventCallback: async (event: ServerEventBase) => {
  switch (event.type) {
    case 'server.action.interaction_connected':
      console.log('Connected');
      break;
    case 'server.event.user_interrupt':
      console.log('User interrupted');
      break;
    case 'server.action.interaction_end':
      console.log('Conversation ended');
      break;
    case 'server.event.user_speech_start':
      console.log('User started speaking');
      break;
    case 'server.event.user_speech_end':
      console.log('User stopped speaking');
      break;
  }
}
Supported Languages
The SDK supports 11 languages (10 Indian languages plus English):
import { SarvamToolLanguageName } from 'sarvam-conv-ai-sdk';
// Available: BENGALI, GUJARATI, KANNADA, MALAYALAM, TAMIL,
// TELUGU, PUNJABI, ODIA, MARATHI, HINDI, ENGLISH
const config = {
  initial_language_name: SarvamToolLanguageName.HINDI,
};
Best Practices
Resource Cleanup: Always clean up resources when the component unmounts
useEffect(() => {
  return () => {
    agentRef.current?.stop().catch(console.error);
  };
}, []);
Connection Timeout: Specify a timeout when waiting for a connection
const connected = await agent.waitForConnect(10); // 10 seconds
if (!connected) console.error('Connection timeout');
Error Handling: Wrap agent operations in try-catch blocks
try {
  await agent.start();
  await agent.waitForConnect(10);
} catch (error) {
  console.error('Error:', error);
  await agent.stop();
}
Secure API Keys: Keep keys out of source code by using environment variables or a backend proxy. Note that Vite inlines VITE_-prefixed variables into the client bundle, so an environment variable alone does not hide the key from browser users.
// Use environment variables
const apiKey = import.meta.env.VITE_SARVAM_API_KEY;
// Or route traffic through a backend proxy (other required options omitted for brevity)
const agent = new ConversationAgent({ baseUrl: '/api/proxy/' });
Examples
- Web Example - See examples/web for a complete React + TypeScript application
- Node.js Example - See examples/nodejs/simple-text-chat.js for a command-line text chat
Troubleshooting
Microphone Not Working: Ensure HTTPS connection, check browser permissions, verify microphone is not in use by another app
Connection Timeout: Check network connectivity, verify API key is valid, ensure app_id exists and has a committed version
Audio Quality Issues: Verify sample rate matches configuration (8000, 16000, or 22050), ensure audio format is LINEAR16 (16-bit PCM mono)
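
When audio is captured manually rather than through BrowserAudioInterface, Web Audio delivers samples as 32-bit floats in the range -1 to 1, which need converting to LINEAR16 first. The sketch below shows the standard conversion; `floatTo16BitPcm` is a hypothetical helper, and the exact container that sendAudio expects (Int16Array, ArrayBuffer, or something else) is not stated in this README, so treat the return type as an assumption.

```typescript
// Converts Web Audio float samples (nominally -1..1) into signed 16-bit PCM.
// Out-of-range samples are clamped rather than allowed to wrap around.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    // The negative range reaches -32768 while the positive range stops at 32767.
    pcm[i] = Math.round(clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff);
  });
  return pcm;
}
```

This handles only the sample format. The capture must still be mono and resampled to the configured input_sample_rate before it is sent.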
License
MIT
