@voicelayer/sdk
v0.1.2
Official SDK for VoiceLayer — define, run, and orchestrate voice AI agents on the VoiceLayer control plane.
Build voice AI agents that run real business processes — in one file.
# The SDK is lean. Providers (STT / TTS / LLM / VAD / turn detection) install
# separately, so you only pull in the vendors you actually use. These four are
# the default pipeline:
pnpm add @voicelayer/sdk \
  @voicelayer/agents-plugin-deepgram \
  @voicelayer/agents-plugin-openai \
  @voicelayer/agents-plugin-silero \
  @voicelayer/agents-plugin-livekit
See Providers for the full list. If you use a provider you haven't installed, the SDK throws a single, actionable error telling you what to add.
Hello, voice agent (30 seconds)
// agent.ts
import { defineAgent } from '@voicelayer/sdk';
export default await defineAgent({
  name: 'hello',
  prompt: 'You are a friendly receptionist. Ask the caller how you can help.',
}).start(import.meta.url);
Run it in dev mode:
node --import tsx agent.ts dev
Call your LiveKit number. The agent answers. That's it. Defaults for STT, LLM, TTS, VAD, and turn detection are picked for you. Swap them later with one config line.
A real agent (FNOL in 100 lines)
The voice plumbing isn't the hard part. The hard part is running a business process to completion: collecting required fields, validating them, escalating on legal threats, persisting memory across calls, handing off to a human, and writing the result back to your CRM.
That's what this SDK does. One declaration:
import { defineAgent, defineProcess } from '@voicelayer/sdk';
import { salesforce } from '@voicelayer/connector-salesforce';
export default await defineAgent({
  name: 'fnol',
  prompt: 'You are a calm First Notice of Loss intake agent.',
  // 1. The business process. Declarative: the SDK extracts fields from
  //    each utterance, validates them, and decides when the call can end.
  process: defineProcess({
    collect: {
      incidentDate: { type: 'datetime', required: true },
      vehiclesInvolved: { type: 'string[]', required: true, min: 1 },
      injuries: { type: 'boolean', required: true },
      policeReport: { type: 'boolean' },
      claimNumber: { type: 'string', required: true, pattern: /^[A-Z]{2}-\d{8}$/ },
    },
    completeWhen: 'all-required-captured',
    onComplete: async (data, ctx) => {
      const claim = await ctx.connectors.salesforce.createClaim(data);
      await ctx.say(`Your claim number is ${claim.id}. You will receive a text shortly.`);
    },
  }),
  // 2. Triggers: fire deterministically before the LLM sees the turn.
  triggers: {
    legalThreat: { on: /\b(attorney|lawyer|sue|lawsuit)\b/i, then: 'handoff' },
    severeInjury: { on: 'signal:medical_emergency', then: 'handoff' },
  },
  // 3. Handoff: real SIP transfer, with state briefing for the human agent.
  handoff: {
    fallback: '+15555550100',
    briefing: ({ data, transcript }) =>
      `Caller reported FNOL. Captured: ${JSON.stringify(data)}. Last 3 turns: ${transcript.tail(3)}`,
  },
  // 4. Memory: persisted across calls, scoped to the caller. PII auto-redacted.
  memory: {
    scope: 'caller',
    retentionDays: 90,
    piiFields: ['ssn', 'dob'],
  },
  // 5. Connectors: auto-wired tools, available to the LLM and in ctx.
  connectors: {
    salesforce: salesforce({ objects: ['Claim', 'Account'] }),
  },
  // 6. Compliance: opt-in posture, defaults are safe.
  compliance: ['tcpa'],
}).start(import.meta.url);
That's a complete production FNOL agent: under 60 lines of config, no boilerplate. The same file runs in dev and prod, and it is the file LiveKit forks for each call.
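To make the completeWhen: 'all-required-captured' rule concrete, here is a minimal sketch of how such a check could work. It is an illustration only, not the SDK's implementation: the FieldSpec type and the missingRequired and isComplete helpers are hypothetical names, and the spec below is a trimmed copy of the collect block from the example.

```typescript
// Hypothetical field spec, modeled on the `collect` block above.
// This is NOT the SDK's internal type; it only illustrates the idea.
type FieldSpec = {
  required?: boolean;
  pattern?: RegExp; // applies to string fields
  min?: number; // minimum length for array fields
};

type Captured = Record<string, unknown>;

// Returns the names of required fields that are missing or fail validation.
function missingRequired(spec: Record<string, FieldSpec>, data: Captured): string[] {
  const missing: string[] = [];
  for (const [name, field] of Object.entries(spec)) {
    if (!field.required) continue;
    const value = data[name];
    if (value === undefined || value === null) {
      missing.push(name);
      continue;
    }
    if (field.pattern && !(typeof value === 'string' && field.pattern.test(value))) {
      missing.push(name);
      continue;
    }
    if (field.min !== undefined && !(Array.isArray(value) && value.length >= field.min)) {
      missing.push(name);
    }
  }
  return missing;
}

// "all-required-captured" then reduces to "nothing is missing".
function isComplete(spec: Record<string, FieldSpec>, data: Captured): boolean {
  return missingRequired(spec, data).length === 0;
}

const fnolSpec: Record<string, FieldSpec> = {
  incidentDate: { required: true },
  vehiclesInvolved: { required: true, min: 1 },
  injuries: { required: true },
  policeReport: {},
  claimNumber: { required: true, pattern: /^[A-Z]{2}-\d{8}$/ },
};

const partial: Captured = { incidentDate: '2024-05-01T10:00:00Z', injuries: false };
console.log(missingRequired(fnolSpec, partial)); // [ 'vehiclesInvolved', 'claimNumber' ]
```

Note that injuries: false counts as captured. A check like this has to distinguish "not yet answered" from a legitimate falsy answer.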
Mental model
You declare one Agent. The Agent has six knobs:
| Knob | What it does |
|---------------|--------------------------------------------------------------------|
| prompt | What the LLM says when nothing else fires. |
| process | The structured outcome the call exists to produce. |
| triggers | Deterministic routes that run before the LLM (regex, signals). |
| handoff | How to transfer to a human, with what context. |
| memory | What persists across calls, scoped, with PII rules. |
| connectors | Typed external systems the agent can read/write. |
Everything else (STT, TTS, LLM, VAD, turn detection, transcript persistence, security guards, telemetry) is on by default with sane choices, and swappable in one line when you need to.
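The triggers row says deterministic routes run before the LLM. A minimal sketch of that ordering, assuming a trigger is either a regex over the utterance text or a 'signal:<name>' string, as in the FNOL example. The Trigger, Turn, and Route types and the routeTurn function are hypothetical and only illustrate the idea; the SDK's real types may differ.

```typescript
// Hypothetical shapes for illustration only.
type Trigger = { on: RegExp | `signal:${string}`; then: 'handoff' };
type Turn = { text: string; signals: string[] };
type Route =
  | { kind: 'trigger'; name: string; then: Trigger['then'] }
  | { kind: 'llm' };

// Checks every trigger in declaration order. Only when none matches does the
// turn fall through to the LLM.
function routeTurn(triggers: Record<string, Trigger>, turn: Turn): Route {
  for (const [name, trigger] of Object.entries(triggers)) {
    const matched =
      trigger.on instanceof RegExp
        ? trigger.on.test(turn.text)
        : turn.signals.includes(trigger.on.slice('signal:'.length));
    if (matched) return { kind: 'trigger', name, then: trigger.then };
  }
  return { kind: 'llm' };
}

const fnolTriggers: Record<string, Trigger> = {
  legalThreat: { on: /\b(attorney|lawyer|sue|lawsuit)\b/i, then: 'handoff' },
  severeInjury: { on: 'signal:medical_emergency', then: 'handoff' },
};

console.log(routeTurn(fnolTriggers, { text: 'I will call my lawyer.', signals: [] }).kind); // trigger
console.log(routeTurn(fnolTriggers, { text: 'My car was hit.', signals: [] }).kind); // llm
```

Because matching is plain regex and set membership, the same utterance always produces the same route, which is what makes these rules easy to unit-test.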
Three tiers of customization
Tier 1 — Config only. Covers ~80% of voice agents. The example above is Tier 1.
Tier 2 — Hooks. When config isn't enough, drop into hooks. Same agent, same file:
defineAgent({
  // ...
  async onUtterance(text, ctx) {
    if (text.includes('refund')) ctx.metadata.flagged = true;
  },
  async onFieldCaptured(field, value, ctx) {
    if (field === 'claimNumber') await ctx.connectors.salesforce.lookup(value);
  },
  async onCallEnd(outcome, ctx) {
    await ctx.connectors.slack.send('#claims', `Call ${ctx.call.id} ended: ${outcome.reason}`);
  },
});
Tier 3 — Full control. When hooks aren't enough, take the wheel:
defineAgent({
  // ...
  async onCall(ctx) {
    // You own the entire conversation loop.
    await ctx.say('Hi! What is your policy number?');
    const policy = await ctx.ask();
    const account = await ctx.connectors.salesforce.lookupPolicy(policy);
    // ... do whatever ...
  },
});
You never have to leave the SDK. You never import a primitive.
Swapping models
Each provider is a small factory you import from the SDK. Pick the pieces you want; the SDK wires them into the pipeline:
import { defineAgent, deepgram, openai, cartesia } from '@voicelayer/sdk';
defineAgent({
  // ...
  models: {
    stt: deepgram.stt({ model: 'nova-3' }),
    llm: openai.llm({ model: 'gpt-4o' }),
    tts: cartesia.tts({ voice: 'sonic-english' }), // pnpm add @voicelayer/agents-plugin-cartesia
  },
});
Provider plugins load lazily: the vendor SDK (and its native binaries) loads on first use, never at import time.
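One common way to get this behavior is a memoized loader around a dynamic import(). The sketch below is an assumption about the general technique, not a description of the SDK's internals. lazyProvider is a hypothetical helper, and the demo uses a fake loader so it runs without any plugin installed.

```typescript
// Wraps a loader so the underlying module is loaded on first use and cached
// afterwards. `packageName` is only used to build the error message.
function lazyProvider<T>(packageName: string, load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = load().catch(() => {
        cached = undefined; // allow a retry once the package is installed
        throw new Error(
          `Provider package "${packageName}" could not be loaded. Install it with: pnpm add ${packageName}`,
        );
      });
    }
    return cached;
  };
}

// Fake loader for the demo. In real code this would be something like
// () => import('@voicelayer/agents-plugin-deepgram').
let loads = 0;
const getFakeStt = lazyProvider('@voicelayer/agents-plugin-deepgram', async () => {
  loads += 1;
  return { name: 'fake-stt' };
});

async function demo(): Promise<void> {
  // Creating getFakeStt loaded nothing; only the first call runs the loader.
  await getFakeStt();
  await getFakeStt();
  console.log(loads); // 1
}

void demo();
```

Caching the promise rather than the resolved module means concurrent first calls share a single load.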
Providers
Install only what you use:
| Provider | Package | Factory |
|----------|---------|---------|
| Deepgram (STT, TTS) | @voicelayer/agents-plugin-deepgram | deepgram.stt(), deepgram.tts() |
| OpenAI (LLM, TTS, Realtime) | @voicelayer/agents-plugin-openai | openai.llm(), openai.tts(), openai.realtime() |
| Cartesia (TTS, STT) | @voicelayer/agents-plugin-cartesia | cartesia.tts(), cartesia.stt() |
| ElevenLabs (TTS) | @voicelayer/agents-plugin-elevenlabs | elevenlabs.tts() |
| AssemblyAI (STT) | @voicelayer/agents-plugin-assemblyai | assemblyai.stt() |
| Google Gemini (LLM, Realtime) | @voicelayer/agents-plugin-google | google.llm(), google.realtime() |
| Silero (VAD) | @voicelayer/agents-plugin-silero | silero.vad() |
| LiveKit (turn detection) | @voicelayer/agents-plugin-livekit | livekitTurn.english(), livekitTurn.multilingual() |
Each is a thin, versioned wrapper over the matching @livekit/agents-plugin-*
package. Need a provider that isn't listed? Wrap any LiveKit plugin instance
with wrap(...), or implement the STTProvider / LLMProvider / TTSProvider
interfaces directly.
Connectors
Connectors are typed integrations with external systems. They show up on ctx.connectors and (optionally) as LLM tools.
import { salesforce, twilio, slack, zendesk, webhook } from '@voicelayer/connectors';
defineAgent({
  // ...
  connectors: {
    crm: salesforce({ objects: ['Claim'] }),
    sms: twilio(),
    alerts: slack({ channel: '#claims' }),
    custom: webhook({ url: 'https://your-api.example.com/voicelayer' }),
  },
});
Available connectors: salesforce, twilio, slack, zendesk, webhook, epic (HIPAA-scoped), servicenow. More are on the roadmap.
Testing locally
import { testAgent } from '@voicelayer/sdk/testing';
import agent from './agent.js';
// `test` and `expect` are assumed to come from your test runner's globals (for example Vitest or Jest).
test('captures all required fields', async () => {
  const result = await testAgent(agent)
    .say('I was in a wreck yesterday with my Honda Civic, no injuries.')
    .say('Claim number is AB-12345678.')
    .end();
  expect(result.process.complete).toBe(true);
  expect(result.process.data.vehiclesInvolved).toEqual(['Honda Civic']);
});
No LiveKit, no audio, no API keys. Pure unit-test speed.
Production
await start(import.meta.url) boots a LiveKit worker in the parent process and exports the resolved LiveKit agent definition for the child job runner that re-imports the same file. The same file. No deploy pipeline differences between local and prod beyond environment variables.
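Since environment variables are the only difference between local and prod, a preflight check can fail fast when one is missing. The sketch below is a suggestion, not something the SDK is documented to provide. missingEnv and assertEnv are hypothetical helpers, and the variable names are the required ones from the export lines in this section.

```typescript
// Required variable names, copied from the export lines in this section.
const REQUIRED_ENV = [
  'VOICELAYER_API_KEY',
  'LIVEKIT_URL',
  'LIVEKIT_API_KEY',
  'LIVEKIT_API_SECRET',
] as const;

type Env = Record<string, string | undefined>;

// Returns the required variables that are unset or blank.
function missingEnv(env: Env, required: readonly string[] = REQUIRED_ENV): string[] {
  return required.filter((name) => !env[name]?.trim());
}

// Throws one error naming every missing variable, instead of failing later
// on the first one that happens to be read.
function assertEnv(env: Env): void {
  const missing = missingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}

// In agent.ts you might call assertEnv(process.env) before defineAgent(...).
console.log(missingEnv({ VOICELAYER_API_KEY: 'k', LIVEKIT_URL: 'wss://example.invalid' }));
// [ 'LIVEKIT_API_KEY', 'LIVEKIT_API_SECRET' ]
```

Provider keys such as OPENAI_API_KEY are left out of the required list here because they can also be set per project in the dashboard.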
# Required. The worker runs on YOUR infrastructure and connects directly to
# LiveKit, so it needs your own LiveKit project credentials (LiveKit Cloud or
# self-hosted) alongside your VoiceLayer project key.
export VOICELAYER_API_KEY=...
export LIVEKIT_URL=...
export LIVEKIT_API_KEY=...
export LIVEKIT_API_SECRET=...
# Provider keys for the models your agent uses (or set them per project in the
# VoiceLayer dashboard's connections, resolved at call time):
export OPENAI_API_KEY=...
export DEEPGRAM_API_KEY=...
node agent.ts start
What's not in this SDK
- No "primitive" packages to import. The internal modularity is the platform team's problem, not yours.
- No multi-file boilerplate. One file per agent.
- No state-machine framework you have to learn. process is a field bag with validation; triggers are if/then rules. If you need branching flows, use onUtterance or onCall.
- No "agent vs module" distinction. One declaration, one mental model.
Stability
The SDK is currently 0.x. APIs may change between minor versions while we find product-market fit. We will document every breaking change in the changelog.
At 1.0 we'll commit to semver and a stable public surface. Until then, expect movement.
Reference
- API: docs/API.md
- Architecture (what's underneath): docs/architecture.md
- Migration from 0.x → 0.next: CHANGELOG.md
