@voicelayer/sdk

v0.1.2

Published

8 days ago

Official SDK for VoiceLayer — define, run, and orchestrate voice AI agents on the VoiceLayer control plane.


paradox16

voicelayer voice voice-ai voice-agents ai-agents conversational-ai livekit telephony realtime stt tts sdk

@voicelayer/sdk

Build voice AI agents that run real business processes — in one file.

# The SDK is lean. Providers (STT / TTS / LLM / VAD / turn detection) install
# separately, so you only pull in the vendors you actually use. These four are
# the default pipeline:
pnpm add @voicelayer/sdk \
  @voicelayer/agents-plugin-deepgram \
  @voicelayer/agents-plugin-openai \
  @voicelayer/agents-plugin-silero \
  @voicelayer/agents-plugin-livekit

See Providers for the full list. Use a provider you haven't installed and the SDK throws a single, actionable error telling you what to add.
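The "single, actionable error" could be produced along these lines. This is a minimal sketch, not the SDK's actual code: `PLUGIN_PACKAGES`, `MissingProviderError`, and `requireProvider` are hypothetical names, and the set of installed packages is passed in rather than detected.

```typescript
// Hypothetical lookup from provider name to the plugin package that supplies it.
const PLUGIN_PACKAGES: Record<string, string> = {
  deepgram: '@voicelayer/agents-plugin-deepgram',
  openai: '@voicelayer/agents-plugin-openai',
  cartesia: '@voicelayer/agents-plugin-cartesia',
};

// Hypothetical error type: the message names the exact install command.
class MissingProviderError extends Error {
  readonly pkg: string;
  constructor(provider: string, pkg: string) {
    super(`Provider "${provider}" is not installed. Run: pnpm add ${pkg}`);
    this.name = 'MissingProviderError';
    this.pkg = pkg;
  }
}

// Returns the package name for a provider, or throws if it is not installed.
function requireProvider(provider: string, installed: ReadonlySet<string>): string {
  const pkg = PLUGIN_PACKAGES[provider];
  if (pkg === undefined) throw new Error(`Unknown provider "${provider}"`);
  if (!installed.has(pkg)) throw new MissingProviderError(provider, pkg);
  return pkg;
}
```

The point of the design is that the failure carries its own fix: the caller sees the package to add instead of a module-resolution stack trace.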

Hello, voice agent (30 seconds)

// agent.ts
import { defineAgent } from '@voicelayer/sdk';

export default await defineAgent({
  name: 'hello',
  prompt: 'You are a friendly receptionist. Ask the caller how you can help.',
}).start(import.meta.url);

# Requires tsx as a dev dependency: pnpm add -D tsx
node --import tsx agent.ts dev

Call your LiveKit number. The agent answers. That's it. STT, LLM, TTS, VAD, turn detection — defaults are picked for you. Swap them later with one config line.

A real agent (FNOL in under 100 lines)

The voice plumbing isn't the hard part. The hard part is running a business process to completion: collecting required fields, validating them, escalating on legal threats, persisting memory across calls, handing off to a human, and writing the result back to your CRM.

That's what this SDK does. One declaration:

import { defineAgent, defineProcess } from '@voicelayer/sdk';
import { salesforce } from '@voicelayer/connector-salesforce';

export default await defineAgent({
  name: 'fnol',
  prompt: 'You are a calm First Notice of Loss intake agent.',

  // 1. The business process. Declarative — the SDK extracts fields from
  //    each utterance, validates them, and decides when the call can end.
  process: defineProcess({
    collect: {
      incidentDate:     { type: 'datetime', required: true },
      vehiclesInvolved: { type: 'string[]', required: true, min: 1 },
      injuries:         { type: 'boolean',  required: true },
      policeReport:     { type: 'boolean' },
      claimNumber:      { type: 'string',   required: true, pattern: /^[A-Z]{2}-\d{8}$/ },
    },
    completeWhen: 'all-required-captured',
    onComplete: async (data, ctx) => {
      const claim = await ctx.connectors.salesforce.createClaim(data);
      await ctx.say(`Your claim number is ${claim.id}. You will receive a text shortly.`);
    },
  }),

  // 2. Triggers — fire deterministically before the LLM sees the turn.
  triggers: {
    legalThreat:  { on: /\b(attorney|lawyer|sue|lawsuit)\b/i, then: 'handoff' },
    severeInjury: { on: 'signal:medical_emergency',          then: 'handoff' },
  },

  // 3. Handoff — real SIP transfer, with state briefing for the human agent.
  handoff: {
    fallback: '+15555550100',
    briefing: ({ data, transcript }) =>
      `Caller reported FNOL. Captured: ${JSON.stringify(data)}. Last 3 turns: ${transcript.tail(3)}`,
  },

  // 4. Memory — persisted across calls, scoped to the caller. PII auto-redacted.
  memory: {
    scope: 'caller',
    retentionDays: 90,
    piiFields: ['ssn', 'dob'],
  },

  // 5. Connectors — auto-wired tools, available to the LLM and in ctx.
  connectors: {
    salesforce: salesforce({ objects: ['Claim', 'Account'] }),
  },

  // 6. Compliance — opt-in posture, defaults are safe.
  compliance: ['tcpa'],
}).start(import.meta.url);

That's a complete production FNOL agent in roughly 50 lines of config, with no boilerplate. The same file runs in dev and prod, and it is the file LiveKit forks for each call.
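To make `completeWhen: 'all-required-captured'` concrete, here is a minimal sketch of how such a check might be evaluated. It is an illustration under assumptions, not the SDK's internals: `FieldSpec`, `isValid`, and `missingRequired` are hypothetical names that mirror the shape of the `collect` block above.

```typescript
// Hypothetical field spec mirroring the entries used in `collect`.
type FieldSpec = {
  type: 'datetime' | 'string' | 'string[]' | 'boolean';
  required?: boolean;
  min?: number; // minimum item count for 'string[]'
  pattern?: RegExp; // format check for 'string'
};

type Captured = Record<string, unknown>;

// Returns true when a captured value satisfies its spec.
function isValid(spec: FieldSpec, value: unknown): boolean {
  switch (spec.type) {
    case 'boolean':
      return typeof value === 'boolean';
    case 'datetime':
      return value instanceof Date && !Number.isNaN(value.getTime());
    case 'string':
      return typeof value === 'string' && (spec.pattern?.test(value) ?? true);
    case 'string[]':
      return (
        Array.isArray(value) &&
        value.every((v) => typeof v === 'string') &&
        value.length >= (spec.min ?? 0)
      );
  }
}

// Lists required fields that are still missing or invalid, in declaration order.
function missingRequired(collect: Record<string, FieldSpec>, data: Captured): string[] {
  return Object.entries(collect)
    .filter(([name, spec]) => spec.required === true && !isValid(spec, data[name]))
    .map(([name]) => name);
}
```

Under this sketch the process is complete when `missingRequired` returns an empty array. Optional fields such as `policeReport` never block completion.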

Mental model

You declare one Agent. The Agent has six knobs:

| Knob | What it does |
|------|--------------|
| prompt | What the LLM says when nothing else fires. |
| process | The structured outcome the call exists to produce. |
| triggers | Deterministic routes that run before the LLM (regex, signals). |
| handoff | How to transfer to a human, with what context. |
| memory | What persists across calls, scoped, with PII rules. |
| connectors | Typed external systems the agent can read/write. |

Everything else (STT, TTS, LLM, VAD, turn detection, transcript persistence, security guards, telemetry) is on by default with sane choices, and swappable in one line when you need to.
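The idea that triggers run deterministically before the LLM can be sketched as a small router. This is an assumption-laden illustration, not the SDK's implementation: `Trigger`, `Route`, and `routeTurn` are hypothetical names, and active signals are passed in as a plain set.

```typescript
// Hypothetical trigger shape: a regex or a named signal, plus the action to take.
type Trigger = { on: RegExp | `signal:${string}`; then: 'handoff' };

type Route =
  | { kind: 'trigger'; name: string; action: 'handoff' }
  | { kind: 'llm' };

// Checks triggers in declaration order; falls through to the LLM if none fire.
function routeTurn(
  text: string,
  signals: ReadonlySet<string>,
  triggers: Record<string, Trigger>,
): Route {
  for (const [name, trigger] of Object.entries(triggers)) {
    const fired =
      trigger.on instanceof RegExp
        ? trigger.on.test(text)
        : signals.has(trigger.on.slice('signal:'.length));
    if (fired) return { kind: 'trigger', name, action: trigger.then };
  }
  return { kind: 'llm' };
}
```

Because the match is a regex test or a set lookup, the same utterance always takes the same route, which is what makes escalation rules auditable.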

Three tiers of customization

Tier 1 — Config only. Covers ~80% of voice agents. The example above is Tier 1.

Tier 2 — Hooks. When config isn't enough, drop into hooks. Same agent, same file:

defineAgent({
  // ...
  async onUtterance(text, ctx) {
    if (text.includes('refund')) ctx.metadata.flagged = true;
  },
  async onFieldCaptured(field, value, ctx) {
    if (field === 'claimNumber') await ctx.connectors.salesforce.lookup(value);
  },
  async onCallEnd(outcome, ctx) {
    await ctx.connectors.slack.send(`#claims`, `Call ${ctx.call.id} ended: ${outcome.reason}`);
  },
});

Tier 3 — Full control. When hooks aren't enough, take the wheel:

defineAgent({
  // ...
  async onCall(ctx) {
    // You own the entire conversation loop.
    await ctx.say('Hi! What is your policy number?');
    const policy = await ctx.ask();
    const account = await ctx.connectors.salesforce.lookupPolicy(policy);
    // ... do whatever ...
  },
});

You never have to leave the SDK. You never import a primitive.

Swapping models

Each provider is a small factory you import from the SDK. Pick the pieces you want; the SDK wires them into the pipeline:

import { defineAgent, deepgram, openai, cartesia } from '@voicelayer/sdk';

defineAgent({
  // ...
  models: {
    stt: deepgram.stt({ model: 'nova-3' }),
    llm: openai.llm({ model: 'gpt-4o' }),
    tts: cartesia.tts({ voice: 'sonic-english' }), // pnpm add @voicelayer/agents-plugin-cartesia
  },
});

Provider plugins load lazily — the vendor SDK (and its native binaries) only loads on first use, never at import time.
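Lazy loading of this kind is commonly built on a memoized async loader. The sketch below shows the pattern only; `lazy` and `getVendorSdk` are hypothetical names, and the loader body stands in for a dynamic `import()` of a vendor package.

```typescript
// Hypothetical helper: wraps an async loader so it runs at most once, on first use.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      // Cache the promise itself so concurrent callers share a single load.
      cached = load();
    }
    return cached;
  };
}

// Stand-in for a heavy vendor SDK; a real plugin would call `import(...)` here.
let loadCount = 0;
const getVendorSdk = lazy(async () => {
  loadCount += 1;
  return { transcribe: (audio: Uint8Array) => `${audio.length} bytes` };
});
```

Nothing is loaded when the module is imported; the first `getVendorSdk()` call triggers the load and later calls reuse it. Note that in this sketch a failed load stays cached, so a fuller version might clear the cache on rejection.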

Providers

Install only what you use:

| Provider | Package | Factory |
|----------|---------|---------|
| Deepgram (STT, TTS) | @voicelayer/agents-plugin-deepgram | deepgram.stt(), deepgram.tts() |
| OpenAI (LLM, TTS, Realtime) | @voicelayer/agents-plugin-openai | openai.llm(), openai.tts(), openai.realtime() |
| Cartesia (TTS, STT) | @voicelayer/agents-plugin-cartesia | cartesia.tts(), cartesia.stt() |
| ElevenLabs (TTS) | @voicelayer/agents-plugin-elevenlabs | elevenlabs.tts() |
| AssemblyAI (STT) | @voicelayer/agents-plugin-assemblyai | assemblyai.stt() |
| Google Gemini (LLM, Realtime) | @voicelayer/agents-plugin-google | google.llm(), google.realtime() |
| Silero (VAD) | @voicelayer/agents-plugin-silero | silero.vad() |
| LiveKit (turn detection) | @voicelayer/agents-plugin-livekit | livekitTurn.english(), livekitTurn.multilingual() |

Each is a thin, versioned wrapper over the matching @livekit/agents-plugin-* package. Need a provider that isn't listed? Wrap any LiveKit plugin instance with wrap(...), or implement the STTProvider / LLMProvider / TTSProvider interfaces directly.

Connectors

Connectors are typed integrations with external systems. They show up on ctx.connectors and (optionally) as LLM tools.

import { salesforce, twilio, slack, zendesk, webhook } from '@voicelayer/connectors';

defineAgent({
  // ...
  connectors: {
    crm:    salesforce({ objects: ['Claim'] }),
    sms:    twilio(),
    alerts: slack({ channel: '#claims' }),
    custom: webhook({ url: 'https://your-api.example.com/voicelayer' }),
  },
});

Available connectors: salesforce, twilio, slack, zendesk, webhook, epic (HIPAA-scoped), servicenow. More on the roadmap.

Testing locally

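// `test` and `expect` are assumed to come from your test runner (for example, Jest or Vitest globals).
import { testAgent } from '@voicelayer/sdk/testing';
import agent from './agent.js';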

test('captures all required fields', async () => {
  const result = await testAgent(agent)
    .say('I was in a wreck yesterday with my Honda Civic, no injuries.')
    .say('Claim number is AB-12345678.')
    .end();

  expect(result.process.complete).toBe(true);
  expect(result.process.data.vehiclesInvolved).toEqual(['Honda Civic']);
});

No LiveKit, no audio, no API keys. Pure unit-test speed.

Production

Awaiting .start(import.meta.url) boots a LiveKit worker in the parent process and exports the resolved LiveKit agent definition, which the child job runner picks up when it re-imports the same file. Because it is the same file, local and prod deployments differ only in their environment variables.

# Required. The worker runs on YOUR infrastructure and connects directly to
# LiveKit, so it needs your own LiveKit project credentials (LiveKit Cloud or
# self-hosted) alongside your VoiceLayer project key.
export VOICELAYER_API_KEY=...
export LIVEKIT_URL=...
export LIVEKIT_API_KEY=...
export LIVEKIT_API_SECRET=...

# Provider keys for the models your agent uses (or set them per project in the
# VoiceLayer dashboard's connections, resolved at call time):
export OPENAI_API_KEY=...
export DEEPGRAM_API_KEY=...

node --import tsx agent.ts start
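Since a missing variable would otherwise surface only when a call arrives, a preflight check before starting the worker can help. This is a minimal sketch and not part of the SDK: `missingEnv` and `assertEnv` are hypothetical helpers, and the required list is taken from the variables named above.

```typescript
// Variables the section above lists as required for every worker.
const REQUIRED_ENV = [
  'VOICELAYER_API_KEY',
  'LIVEKIT_URL',
  'LIVEKIT_API_KEY',
  'LIVEKIT_API_SECRET',
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingEnv(env: Env, extra: readonly string[] = []): string[] {
  return [...REQUIRED_ENV, ...extra].filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}

// Hypothetical preflight: fail fast with one message listing everything missing.
function assertEnv(env: Env, extra: readonly string[] = []): void {
  const missing = missingEnv(env, extra);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}
```

In a Node process this could be called as `assertEnv(process.env, ['OPENAI_API_KEY', 'DEEPGRAM_API_KEY'])` before the agent starts. The `extra` list is optional because provider keys may instead be set per project in the dashboard.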

What's not in this SDK

No "primitive" packages to import. The internal modularity is the platform team's problem, not yours.
No multi-file boilerplate. One file per agent.
No state-machine framework you have to learn. process is a field bag with validation; triggers are if/then rules. If you need branching flows, you use onUtterance or onCall.
No "agent vs module" distinction. One declaration, one mental model.

Stability

The SDK is currently 0.x. APIs may change between minor versions while we find product-market fit. We will document every breaking change in the changelog.

At 1.0 we'll commit to semver and a stable public surface. Until then, expect movement.

Reference

API: docs/API.md
Architecture (what's underneath): docs/architecture.md
Migration from 0.x → 0.next: CHANGELOG.md
