soprano-ai

v0.1.7

Published

6 days ago

Browser-safe voice agent loop with ASR, TTS, memory, and tool-calling orchestration.

Vulnerabilities: 0 High, 0 Medium, 0 Low

lingosandi

ai voice-agent tts asr cartesia funasr agentic-loop

soprano-ai

Browser-safe voice agent runtime for ASR, LLM streaming, Cartesia TTS playback, memory, and tool-calling orchestration.

Secrets

This package does not ship API keys. Inject your own provider config, DashScope ASR key, and Cartesia key when creating VoiceAgentService. The example below passes the keys through createSopranoVoiceAgent:

import {
  AudioStreamPlayer,
  FunASRService,
  createSopranoVoiceAgent,
} from "soprano-ai"

const agent = createSopranoVoiceAgent({
  player: new AudioStreamPlayer(),
  asr: new FunASRService({ bridgeUrl: () => "ws://127.0.0.1:9231" }),
  apiKeys: {
    qwenApiKey: "YOUR_QWEN_DASHSCOPE_API_KEY",
    cartesiaApiKey: "YOUR_CARTESIA_API_KEY",
  },
})
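
Because the keys are injected at construction time, it can help to check them before building the agent. The following is a minimal sketch and not part of soprano-ai: the `ApiKeys` type, `findMissingKeys`, and `assertKeysProvided` are hypothetical helpers, and the rule that an unreplaced placeholder starts with `YOUR_` is an assumption taken only from the example above.

```typescript
// Hypothetical shape mirroring the apiKeys object in the example above.
type ApiKeys = {
  qwenApiKey: string
  cartesiaApiKey: string
}

// Hypothetical helper (not exported by soprano-ai): returns the names of keys
// that are empty or still hold a "YOUR_..." placeholder.
function findMissingKeys(keys: ApiKeys): string[] {
  return (Object.keys(keys) as (keyof ApiKeys)[]).filter((name) => {
    const value = keys[name].trim()
    return value === "" || value.startsWith("YOUR_")
  })
}

// Throws early so a misconfigured app fails before any network call is made.
function assertKeysProvided(keys: ApiKeys): ApiKeys {
  const missing = findMissingKeys(keys)
  if (missing.length > 0) {
    throw new Error(`Missing API keys: ${missing.join(", ")}`)
  }
  return keys
}
```

The validated object could then be passed as `apiKeys` when creating the agent.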

Prompt and greeting

Use systemPrompt to replace the default voice-agent persona. Use greetingMessage to make first-time init() speak an exact greeting line. If persisted memory shows the user is returning, init() asks the LLM to generate a brief returning-user greeting instead:

const agent = createSopranoVoiceAgent({
  player: new AudioStreamPlayer(),
  asr: new FunASRService({ bridgeUrl: () => "ws://127.0.0.1:9231" }),
  apiKeys: {
    qwenApiKey: "YOUR_QWEN_DASHSCOPE_API_KEY",
    cartesiaApiKey: "YOUR_CARTESIA_API_KEY",
  },
  systemPrompt: "You are a concise, warm voice assistant.",
  greetingMessage: "Hey, I'm ready when you are.",
})

TTS voice and quality

Cartesia TTS defaults to the package voice and the original lower-quality 8 kHz PCM stream. To choose a different Cartesia voice, pass its voice ID as ttsVoiceId. To opt into higher-quality 24 kHz PCM, pass the same quality value both as ttsQuality and to AudioStreamPlayer:

const ttsQuality = "high" as const
const ttsVoiceId = "f786b574-daa5-4673-aa0c-cbe3e8534c02"

const agent = createSopranoVoiceAgent({
  player: new AudioStreamPlayer({ quality: ttsQuality }),
  asr: new FunASRService({ bridgeUrl: () => "ws://127.0.0.1:9231" }),
  apiKeys: {
    qwenApiKey: "YOUR_QWEN_DASHSCOPE_API_KEY",
    cartesiaApiKey: "YOUR_CARTESIA_API_KEY",
  },
  ttsQuality,
  ttsVoiceId,
})
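
To see why the player and the TTS config should agree, consider what happens when PCM produced at one sample rate is played back at another. The sketch below is illustrative arithmetic only: `TtsQuality`, `SAMPLE_RATE_HZ`, and `playbackSeconds` are hypothetical names, `"low"` is an assumed label for the default quality, and the rates are the 8 kHz and 24 kHz figures mentioned above.

```typescript
// "high" appears in the example above; "low" is an assumed name for the default.
type TtsQuality = "low" | "high"

// Sample rates taken from the text above.
const SAMPLE_RATE_HZ: Record<TtsQuality, number> = { low: 8000, high: 24000 }

// Seconds of audio a player produces when it plays `sampleCount` samples,
// interpreting them at the rate that corresponds to `playerQuality`.
function playbackSeconds(sampleCount: number, playerQuality: TtsQuality): number {
  return sampleCount / SAMPLE_RATE_HZ[playerQuality]
}

// One second of speech synthesized at "high" quality is 24000 samples.
const samples = SAMPLE_RATE_HZ.high * 1

console.log(playbackSeconds(samples, "high")) // 1 (matched: normal speed)
console.log(playbackSeconds(samples, "low")) // 3 (mismatched: three times too slow)
```

Sharing a single quality constant between the TTS config and the player, as in the example above, avoids this kind of mismatch.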

The chat orb or any other UI can subscribe with agent.on(...) and drive the service through init(), startListening(), sendTextOnly(), interrupt(), and destroy().
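
As a rough sketch of that wiring, the snippet below connects an agent to a small UI controller. The method names come from the sentence above, but their signatures, the `VoiceAgentLike` interface, the `attachUi` helper, and the `"state"` event name with its payload are assumptions made for illustration; the package's type definitions are the place to check the real ones.

```typescript
// Assumed minimal surface of the agent; only the method names come from the README.
interface VoiceAgentLike {
  on(event: string, handler: (payload: unknown) => void): void
  init(): Promise<void> | void
  startListening(): Promise<void> | void
  sendTextOnly(text: string): Promise<void> | void
  interrupt(): void
  destroy(): void
}

// Hypothetical UI controller: subscribes first, then exposes simple actions
// that buttons or a chat orb could call.
function attachUi(agent: VoiceAgentLike, log: (line: string) => void) {
  // "state" is a placeholder event name, not a documented one.
  agent.on("state", (payload) => log(`state: ${String(payload)}`))

  return {
    start: () => agent.init(), // first call speaks the greeting
    talk: () => agent.startListening(), // begin capturing speech
    type: (text: string) => agent.sendTextOnly(text), // text input path
    stop: () => agent.interrupt(), // cut off the current response
    dispose: () => agent.destroy(), // release resources on unmount
  }
}
```

Subscribing before calling `init()` means the UI does not miss events emitted during startup, assuming the agent emits any at that stage.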
