@pinecall/sdk

v0.1.4

Pinecall Voice SDK — connect to agents, handle calls, stream speech in real-time.

Install

npm install @pinecall/sdk

Node.js ≥ 18 required. Only runtime dependency: ws.


Quick Start

Server-side LLM (recommended)

The Pinecall server runs the LLM and handles STT/TTS. You configure the agent and handle tool calls locally.

import { Pinecall } from "@pinecall/sdk";

const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();

const agent = pc.agent("receptionist", {
  voice: "elevenlabs:h2cd3gvcqTp3m65Dysk7",
  language: "es",
  stt: "deepgram-flux",
  llm: {
    engine: "openai",
    model: "gpt-4.1-mini",
    enabled: true,
    prompt: "You are a helpful receptionist. Be concise.",
  },
  tools: [
    {
      type: "function",
      function: {
        name: "lookupOrder",
        description: "Look up an order by ID",
        parameters: {
          type: "object",
          properties: {
            orderId: { type: "string", description: "The order ID" },
          },
          required: ["orderId"],
        },
      },
    },
  ],
});

agent.addChannel("phone", "+18045551234");
agent.addChannel("phone", "sip:[email protected]");
agent.addChannel("webrtc");

// Per-channel overrides: different voice/language per number
agent.addChannel("phone", "+34911234567", {
  voice: "elevenlabs:spanishVoiceId",
  language: "es",
  stt: "deepgram-flux",
});

// Greet on call start
agent.on("call.started", (call) => {
  if (call.direction === "inbound") {
    call.say("Hello! How can I help you today?");
  }
});

// Handle tool calls from the server-side LLM
agent.on("llm.tool_call", async (call, data) => {
  if (!data.tool_calls) return; // skip re-emissions
  const results = [];
  for (const tc of data.tool_calls) {
    const args = JSON.parse(tc.arguments);
    const result = await myToolHandler(tc.name, args);
    results.push({ tool_call_id: tc.id, result });
  }
  agent.send({
    event: "llm.tool_result",
    call_id: call.id,
    msg_id: data.msg_id,
    results,
  });
});

agent.on("call.ended", (call, reason) => {
  console.log(`Call ended: ${reason} (${call.duration}s)`);
});
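The tool-call handler above passes tc.arguments straight to JSON.parse, so a malformed argument string from the LLM would throw inside the event handler. The sketch below shows one defensive way to parse it. parseToolArguments and its result shape are hypothetical helpers for illustration, not part of @pinecall/sdk.

```typescript
// Hypothetical helper (not part of @pinecall/sdk): parse a tool call's
// JSON-encoded arguments without letting malformed input throw.
type ParsedArgs =
  | { ok: true; args: Record<string, unknown> }
  | { ok: false; error: string };

function parseToolArguments(raw: string): ParsedArgs {
  try {
    const value: unknown = JSON.parse(raw);
    // Function-calling arguments are expected to be a JSON object.
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      return { ok: false, error: "arguments must be a JSON object" };
    }
    return { ok: true, args: value as Record<string, unknown> };
  } catch (err) {
    // JSON.parse throws a SyntaxError on malformed input.
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Inside the loop, one option is to push an error object as the result when parsing fails, so that every tool_call_id still receives an entry in results.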

Client-side LLM (bring your own)

You run the LLM yourself. The server handles STT → text and text → TTS.

import { Pinecall } from "@pinecall/sdk";
import OpenAI from "openai";

const pc = new Pinecall({ apiKey: "pk_..." });
await pc.connect();
const openai = new OpenAI();

const agent = pc.agent("my-bot", { voice: "cartesia:abc", language: "en" });
agent.addChannel("phone", "+13186330963");

agent.on("call.started", (call) => call.say("Hi there!"));

agent.on("turn.end", async (turn, call) => {
  const stream = call.replyStream(turn);

  const completion = await openai.chat.completions.create({
    model: "gpt-4.1-mini",
    messages: [
      { role: "system", content: "You are helpful. Be concise." },
      { role: "user", content: turn.text },
    ],
    stream: true,
  });

  for await (const chunk of completion) {
    if (stream.aborted) break;
    const token = chunk.choices[0]?.delta?.content;
    if (token) stream.write(token);
  }
  stream.end();
});
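The example above sends only the system prompt and the current turn, so the model has no memory of earlier turns. The sketch below builds a message list from an array shaped like call.transcript ([{ role, content }]). buildMessages and maxMessages are hypothetical names. Whether call.transcript already contains the current turn when turn.end fires is not stated in this README, so the helper drops a trailing duplicate.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: system prompt + the most recent transcript entries
// + the current user text.
function buildMessages(
  systemPrompt: string,
  transcript: ChatMessage[],
  userText: string,
  maxMessages = 20,
): ChatMessage[] {
  // filter() copies the array, so the caller's transcript is not mutated.
  const recent = transcript.filter((m) => m.role !== "system");
  const last = recent[recent.length - 1];
  // Avoid sending the current turn twice if the transcript already has it.
  if (last && last.role === "user" && last.content === userText) {
    recent.pop();
  }
  const window = maxMessages > 0 ? recent.slice(-maxMessages) : [];
  return [
    { role: "system", content: systemPrompt },
    ...window,
    { role: "user", content: userText },
  ];
}
```

In the turn.end handler, the messages array could then be built with buildMessages(systemPrompt, call.transcript, turn.text).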

Deploy (one-liner)

The fastest way to get an agent running. pc.deploy() combines agent creation, LLM config, and channel registration in a single call:

import { Pinecall } from "@pinecall/sdk";

const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();

const mara = pc.deploy("mara", {
  prompt: "You are Mara, a friendly voice assistant. Be concise.",
  model: "gpt-4.1-mini",
  voice: "elevenlabs:EXAVITQu4vr4xnSDxMaL",
  language: "es",
  channels: ["webrtc", "+13186330963"],
});

mara.on("call.started", (call) => {
  console.log(`📞 Call from ${call.from}`);
});

mara.on("call.ended", (call, reason) => {
  console.log(`Call ended: ${reason} (${call.duration}s)`);
});

DeployConfig fields:

| Field | Type | Description |
|-------|------|-------------|
| prompt | string | System prompt for the LLM |
| model | string | LLM model (default: gpt-4.1-mini) |
| voice | string | TTS voice shortcut (e.g. elevenlabs:voiceId) |
| language | string | BCP-47 language code |
| stt | string | STT provider (default: deepgram-flux) |
| tools | array | OpenAI function-calling tool definitions |
| channels | array | "webrtc", "mic", "chat", "whatsapp", or phone numbers |
| phones | string[] | Phone numbers (legacy, prefer channels) |

deploy() returns an Agent — you can attach event handlers, add more channels, or hot-reload config.

Greeting: Use call.say() in call.started to speak a greeting:

mara.on("call.started", (call) => call.say("¡Hola! ¿En qué puedo ayudarte?"));

API Reference

Pinecall (client)

WebSocket client. Manages auth, reconnection, and agent multiplexing.

const pc = new Pinecall({
  apiKey: "pk_...",                        // required
  url: "wss://voice.pinecall.io/client",  // default
  reconnect: true,                         // auto-reconnect (default: true)
  pingInterval: 30000,                     // keepalive ms (default: 30000)
});

await pc.connect();                // resolves on auth success
await pc.disconnect();             // graceful close

pc.on("connected", () => {});
pc.on("disconnected", (reason) => {});
pc.on("reconnecting", (attempt) => {});
pc.on("error", (err) => {});

Agent

Created via pc.agent(id, config?) or pc.deploy(id, config). Owns channels, routes call events, and stores defaults.

Creation

const agent = pc.agent("my-agent", {
  voice: "elevenlabs:abc",
  language: "es",
  stt: "deepgram-flux",
  llm: {
    engine: "openai",
    model: "gpt-4.1-mini",
    enabled: true,
    prompt: "System prompt with {{template_vars}}.",
  },
  tools: [/* OpenAI function-calling format */],
});

Channels

agent.addChannel("phone", "+18045551234");
agent.addChannel("phone", "sip:[email protected]");
agent.addChannel("webrtc");

// Per-channel config overrides
agent.addChannel("phone", "+34911234567", {
  voice: "elevenlabs:spanishVoiceId",
  language: "es",
});

// WhatsApp channel (see WhatsApp section for full setup)
agent.addChannel("whatsapp", {
  phoneNumberId: "123456789012345",
  accessToken: "EAABx...",
  verifyToken: "my-secret",
  appSecret: "abc123...",
});

// Update a channel's config at runtime
agent.configureChannel("+34911234567", { voice: "cartesia:newVoice" });

// Remove a channel
agent.removeChannel("+34911234567");

Agent Methods

| Method | Description |
|--------|-------------|
| agent.addChannel(type, ref?, config?) | Register a phone, webrtc, mic, chat, or whatsapp channel |
| agent.removeChannel(ref) | Unregister a channel |
| agent.configure(opts) | Hot-reload agent defaults (voice, language, STT, LLM); affects all future calls |
| agent.configureChannel(ref, config) | Update a specific channel's config |
| agent.configureSession(callId, opts) | Update config for a live call (equivalent to call.configure) |
| agent.dial(opts) | Make an outbound call; returns Promise<Call> |
| agent.call(callId) | Get a Call object by ID (undefined if not found) |
| agent.getConfig() | Returns the current AgentConfig |
| agent.stream() | SSE stream of this agent's events (see SSE) |
| agent.send(data) | Send a raw protocol message (low-level) |

agent.configure() — Hot-Reload

Update the agent's defaults at runtime. Changes take effect on all future calls — existing calls are not affected. Sends an agent.configure command over the WebSocket.

// Switch to French voice
agent.configure({ voice: "elevenlabs:frenchVoiceId", language: "fr" });

// Update LLM model
agent.configure({
  llm: { engine: "openai", model: "gpt-4.1", enabled: true,
         prompt: "Updated prompt." },
});

// Swap STT provider
agent.configure({ stt: "gladia" });

No REST call needed. agent.configure() uses the existing WebSocket — changes propagate instantly to the server.

agent.dial() — Outbound Calls

const call = await agent.dial({
  to: "+14155551234",
  from: "+13186330963",
  greeting: "Hi! This is a follow-up call.",  // server speaks via TTS
  metadata: { appointmentId: "appt_001" },
  config: { voice: "cartesia:uuid", language: "ar" }, // per-call override
});

call.on("call.ended", (_, reason) => console.log(`Done: ${reason}`));

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| to | string | Yes | Destination number (E.164) |
| from | string | Yes | Caller ID (must be a registered number) |
| greeting | string | No | Text the server speaks when callee picks up |
| metadata | object | No | Custom data attached to the call |
| config | object | No | Per-call config override (voice, STT, language) |
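Since to must be an E.164 number, a quick format check before calling agent.dial() can catch bad input early. This is a minimal sketch; isE164 and assertDialable are hypothetical helpers, and a format check cannot confirm that from is a number registered on your account.

```typescript
// E.164: "+" followed by at most 15 digits, the first of which is non-zero.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

function isE164(value: string): boolean {
  return E164_PATTERN.test(value);
}

// Hypothetical pre-flight check for the `to` field of agent.dial().
function assertDialable(to: string): void {
  if (!isE164(to)) {
    throw new Error(`"to" is not a valid E.164 number: ${to}`);
  }
}
```

For example, assertDialable("+14155551234") passes, while assertDialable("415-555-1234") throws.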

Pinecall (client) — Additional Methods

// Agent management
const agent = pc.getAgent("mara");       // get by ID (undefined if not found)
const removed = pc.removeAgent("mara");  // unregister agent (returns boolean)

// Token generation (for browser WebRTC/Chat connections)
const webrtcToken = await pc.createToken("webrtc", "mara");
const chatToken = await agent.createToken("chat");

// REST helpers (no WebSocket needed)
const voices = await pc.fetchVoices({ provider: "elevenlabs" });
const phones = await pc.fetchPhones();

Call

Per-session handle. Created automatically on call.started.

Speech

| Method | Description |
|--------|-------------|
| call.say(text) | Speak text immediately (standalone, no in_reply_to) |
| call.reply(text) | Reply to the latest user message (auto-tracks in_reply_to) |
| call.replyStream(turn?) | Open a token stream → returns ReplyStream |
| call.cancel(msgId?) | Cancel a specific or the current message |
| call.clear() | Flush all queued TTS audio |

Greeting pattern: Use call.say() on call.started for inbound greetings. For outbound calls, pass greeting in agent.dial() — the server speaks it via TTS automatically.

// Inbound — SDK speaks the greeting
agent.on("call.started", (call) => {
  if (call.direction === "inbound") {
    call.say("Hello! How can I help you today?");
  }
});

// Outbound — server speaks the greeting
const call = await agent.dial({
  to: "+14155551234",
  from: "+13186330963",
  greeting: "Hi! This is a follow-up call.",
});

Call Control

| Method | Description |
|--------|-------------|
| call.hangup() | End the call |
| call.forward(to, opts?) | Transfer to another number |
| call.sendDTMF(digits) | Send DTMF tones (e.g. "1234#") |
| call.hold() | Put on hold (plays hold music, mutes mic) |
| call.unhold() | Resume from hold |
| call.mute() | Mute mic (transcripts buffered) |
| call.unmute() | Unmute (emits call.unmuted with buffered transcript) |

Mid-Call Configuration

| Method | Description |
|--------|-------------|
| call.configure(opts) | Change voice, STT, language; takes effect immediately |
| call.setPrompt(text) | Replace the system prompt for this call |
| call.setPromptVars(vars) | Set {{variable}} values in the prompt template |
| call.addContext(text) | Append extra context after the system prompt |
| call.setPromptFile(path) | Load a prompt file and set it |

Conversation History

| Method | Description |
|--------|-------------|
| call.getHistory() | Fetch conversation messages (OpenAI format) |
| call.addHistory(msgs) | Inject messages into history (e.g. CRM context) |
| call.setHistory(msgs) | Replace entire conversation history |
| call.clearHistory() | Clear history (system prompt preserved) |

Properties

call.id          // "CA7ec979f5..." — unique call ID
call.from        // "+13186330963" or "sip:..."
call.to          // destination number/URI
call.direction   // "inbound" | "outbound"
call.transport   // "phone" | "webrtc" | "unknown"
call.metadata    // custom metadata from the channel
call.transcript  // [{ role: "user", content: "..." }, ...] — user + assistant only
call.messages    // full LLM history (populated on call.ended)
call.duration    // seconds (populated on call.ended)
call.startedAt   // epoch seconds
call.endedAt     // epoch seconds
call.reason      // "hangup" | "timeout" | ...

ReplyStream

Token-by-token streaming for LLM responses. TTS starts as soon as a sentence boundary is detected.

const stream = call.replyStream(turn);

for await (const token of llm.stream(prompt)) {
  if (stream.aborted) break;   // user interrupted
  stream.write(token);
}
stream.end();
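To illustrate why sentence boundaries matter for latency, here is a naive sketch of sentence-level buffering over a token stream. It is an illustration only: SentenceBuffer is a hypothetical class, and the server's actual segmentation logic is not documented here and is likely more sophisticated (this version would split on abbreviations such as "Dr. Smith").

```typescript
// Illustration only: buffer streamed tokens and release text each time a
// sentence terminator (. ! ?) is followed by whitespace.
class SentenceBuffer {
  private pending = "";

  // Append a token; return any sentences completed by it.
  write(token: string): string[] {
    this.pending += token;
    const sentences: string[] = [];
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next token.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Return whatever is left when the stream ends.
  flush(): string {
    const rest = this.pending.trim();
    this.pending = "";
    return rest;
  }
}
```

Each completed sentence can be synthesized while the LLM is still generating the next one, which is why speech can begin before the full reply exists.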

Events

Agent Events

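Subscribe via agent.on(event, handler). Most call-scoped events pass call as the last argument; call.started, call.ended, and llm.tool_call pass it first, and the WhatsApp events carry no call argument.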

| Event | Signature | When |
|-------|-----------|------|
| Lifecycle | | |
| call.started | (call) | New call connected |
| call.ended | (call, reason) | Call disconnected |
| User speech | | |
| speech.started | (event, call) | User began speaking (VAD) |
| speech.ended | (event, call) | User stopped speaking (VAD) |
| user.speaking | (event, call) | Interim STT transcript (updates live) |
| user.message | (event, call) | Final confirmed user text |
| Turns | | |
| eager.turn | (turn, call) | Early turn signal (low-latency response) |
| turn.end | (turn, call) | Final turn signal |
| turn.continued | (event, call) | User kept talking (auto-aborts active streams) |
| Bot speech | | |
| bot.speaking | (event, call) | Bot started speaking a message |
| bot.word | (event, call) | Individual word as TTS plays it |
| bot.finished | (event, call) | Bot finished speaking a message |
| bot.interrupted | (event, call) | Bot was cut off by user |
| Protocol | | |
| message.confirmed | (event, call) | Server acknowledged bot message |
| llm.tool_call | (call, data) | Server-side LLM requests a tool call |
| session.idle_warning | (event, call) | Idle warning: user hasn't spoken, call will time out soon |
| session.timeout | (event, call) | Session timeout warning (max duration / idle) |
| WhatsApp | | |
| whatsapp.session_started | (event) | New WhatsApp conversation started |
| whatsapp.message | (event) | Incoming WhatsApp message received |
| whatsapp.response | (event) | Agent sent a WhatsApp response |
| whatsapp.status | (event) | Message delivery status (sent/delivered/read) |

Real-Time Transcript Flow

User speaks    →  speech.started
               →  user.speaking  (interim, fires multiple times)
               →  speech.ended
               →  user.message   (final confirmed text)
               →  eager.turn / turn.end

Bot responds   →  bot.speaking   (message ID assigned)
               →  bot.word       (word-by-word as TTS plays)
               →  bot.finished   (done speaking)

Interruption   →  bot.interrupted
               →  turn.continued (active ReplyStreams auto-aborted)

bot.word Event

Build live transcripts word-by-word:

let currentMessage = "";
agent.on("bot.speaking", () => { currentMessage = ""; });
agent.on("bot.word", (event) => {
  currentMessage += event.word + " ";
  process.stdout.write(`\r🤖 ${currentMessage}`);
});
agent.on("bot.finished", () => console.log());

Hot-Reload: Live Configuration

Everything is hot-reloadable. Voice, language, STT, prompt, tools — all can change during an active call. The server applies changes on the next LLM turn.

Configuration Scopes

| Scope | Method | Affects |
|-------|--------|---------|
| Agent defaults | pc.agent("id", config) | All future calls |
| Agent hot-reload | agent.configure(updates) | Updates defaults, future calls |
| Session (mid-call) | call.configure(opts) | This call only |
| Prompt (mid-call) | call.setPrompt(text) | This call's system prompt |
| Template vars | call.setPromptVars(vars) | This call's {{var}} values |
| Context | call.addContext(text) | Appended after prompt |

Prompt Template Variables

Define a prompt with {{placeholders}}. The server resolves them before each LLM request. Built-in variables: {{date}}, {{time}}.

const agent = pc.agent("support", {
  llm: {
    engine: "openai",
    model: "gpt-4.1-mini",
    enabled: true,
    prompt: `You are {{agent_name}}, support agent at {{company}}.
Today is {{date}}, {{time}}.
Customer: {{customer_name}} ({{tier}} tier).`,
  },
});

agent.on("call.started", async (call) => {
  const customer = await lookupCaller(call.from);
  await call.setPromptVars({
    agent_name: "Nova",
    company: "Acme Corp",
    customer_name: customer.name,
    tier: customer.tier,
  });
  call.say(`Hi ${customer.name}! How can I help?`);
});
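For previewing a prompt locally before a call, the substitution can be approximated with a few lines. This is a sketch of the general {{placeholder}} technique, not the server's implementation: renderPrompt is a hypothetical helper, and leaving unknown placeholders untouched is an assumption (the server's handling of missing variables is not documented here). Built-ins such as {{date}} and {{time}} are resolved by the server, so they stay as-is in this preview.

```typescript
// Sketch of {{placeholder}} substitution for local previews.
// Assumption: unknown placeholders are left untouched.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) => {
    // Only substitute keys the caller actually provided.
    const value = Object.prototype.hasOwnProperty.call(vars, name)
      ? vars[name]
      : undefined;
    return value ?? match;
  });
}
```

For example, rendering "You are {{agent_name}} at {{company}}. Today is {{date}}." with agent_name and company set leaves only {{date}} unresolved.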

Adding Context Mid-Call

Append dynamic context without replacing the prompt:

agent.on("call.started", async (call) => {
  const orders = await getRecentOrders(call.from);
  await call.addContext(
    `Recent orders:\n${orders.map(o => `- ${o.id}: ${o.status}`).join("\n")}`
  );
});

Switching Voice or Language Mid-Call

// User asks for Spanish
call.configure({ voice: "elevenlabs:spanishVoiceId", language: "es" });
call.reply("¡Claro! Ahora hablo en español.");

Configuration Shortcuts

Voice and STT accept string shortcuts or full config objects:

// Shortcuts
{ voice: "elevenlabs:voiceId" }
{ stt: "deepgram-flux" }
{ stt: "deepgram:nova-3:fr" }         // provider:model:language

// Full config objects
{
  voice: { engine: "cartesia", voiceId: "abc", speed: 1.1 },
  stt: { engine: "deepgram", model: "nova-3", language: "fr" },
}

Note: Turn detection and VAD are auto-derived from the STT provider. deepgram-flux → native turn detection + native VAD. All others → smart_turn + silero VAD.
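As a rough mental model of the shortcut syntax, the sketch below expands shortcut strings into objects. expandStt and expandVoice are hypothetical helpers, not SDK functions. The field names (engine, voiceId) follow the full config objects shown just above; note that the Configuration Reference later in this README uses provider and voice_id instead, so treat the exact keys as an assumption.

```typescript
interface SttConfig {
  engine: string;
  model?: string;
  language?: string;
}

interface VoiceConfig {
  engine: string;
  voiceId: string;
}

// Accepts "provider", "provider:model" or "provider:model:language".
function expandStt(shortcut: string): SttConfig {
  const [engine, model, language] = shortcut.split(":");
  if (!engine) throw new Error(`Invalid STT shortcut: "${shortcut}"`);
  const config: SttConfig = { engine };
  if (model) config.model = model;
  if (language) config.language = language;
  return config;
}

// Accepts "provider:voiceId"; everything after the first colon is the ID.
function expandVoice(shortcut: string): VoiceConfig {
  const sep = shortcut.indexOf(":");
  if (sep <= 0 || sep === shortcut.length - 1) {
    throw new Error(`Invalid voice shortcut: "${shortcut}"`);
  }
  return { engine: shortcut.slice(0, sep), voiceId: shortcut.slice(sep + 1) };
}
```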


REST API

Static helpers for the Pinecall management API. No WebSocket connection needed.

fetchVoices(opts?)

List available TTS voices. Filter by provider and language.

import { fetchVoices } from "@pinecall/sdk";

// All ElevenLabs voices
const voices = await fetchVoices();

// Spanish Cartesia voices only
const esVoices = await fetchVoices({ provider: "cartesia", language: "es" });

voices.forEach(v => console.log(`${v.name} (${v.provider}:${v.id})`));
// → "Rachel (elevenlabs:21m00Tcm4TlvDq8ikWAM)"

Returns: Voice[] — each voice has id, name, provider, gender, style, languages[], preview_url.

fetchPhones(opts)

List phone numbers on your Pinecall account.

import { fetchPhones } from "@pinecall/sdk";

const phones = await fetchPhones({ apiKey: "pk_..." });
phones.forEach(p => console.log(`${p.name} → ${p.number}`));
// → "(318) 633-0963 → +13186330963"

Returns: Phone[] — each phone has number (E.164), name, sid, isSdk.

createToken(opts)

Generate a short-lived, single-use token for browser WebRTC or Chat connections. Requires API key — call this from your backend.

import { createToken } from "@pinecall/sdk";

// From your backend endpoint (API key stays server-side)
const token = await createToken({
  channel: "webrtc",      // "webrtc" or "chat"
  agentId: "florencia",
  apiKey: process.env.PINECALL_API_KEY!,
});

// Or via instance methods:
const tokenFromClient = await pc.createToken("webrtc", "florencia");
const tokenFromAgent = await agent.createToken("webrtc");

Returns: { token: string, server: string, expires_in: number }.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| channel | "webrtc" \| "chat" | Yes | Token type |
| agentId | string | Yes | Agent slug (wire ID) |
| apiKey | string | Yes | API key for authentication |
| apiUrl | string | No | Custom server URL |

See Security for the full token security model.

fetchWebRTCToken(opts) (deprecated)

⚠️ Deprecated. Use createToken() instead. fetchWebRTCToken only works when the agent has allowedOrigins configured.

Legacy helper — fetches a token from the public endpoint (requires allowedOrigins on the agent).

import { fetchWebRTCToken } from "@pinecall/sdk";

const { token, server } = await fetchWebRTCToken({
  agentId: "my-agent",
  apiKey: "pk_...",  // optional: authenticates the request
});

Returns: { token: string, server?: string }.

fetchTwilioBalance(opts?)

Check your Twilio account balance.

import { fetchTwilioBalance } from "@pinecall/sdk";

const balance = await fetchTwilioBalance({ apiKey: "pk_..." });
if (balance) console.log(`$${balance.balance} ${balance.currency}`);

Returns: { balance: string, currency: string } | null.

Options

All REST helpers accept an apiUrl option to point to a custom server:

fetchVoices({ apiUrl: "http://localhost:1337" });
fetchPhones({ apiKey: "pk_...", apiUrl: "http://localhost:1337" });

SSE Streaming

Stream real-time agent events over HTTP using Server-Sent Events. Works with any framework — returns a Web API Response or writes to a Node.js ServerResponse.

WebRTC vs SSE: If your frontend uses @pinecall/voice-widget or @pinecall/voice-core, events already arrive through the WebRTC DataChannel — you don't need SSE. SSE is for server-side dashboards, monitoring UIs, or backends that need to observe calls without being in the WebRTC session.

Single Agent Stream

// Web API (Remix, Next.js, Hono, Bun)
app.get("/events", () => agent.stream());

// Express / Node.js
app.get("/events", (req, res) => agent.stream(res));

Multi-Agent Stream

Stream events from all agents via pc.stream(), or filter to specific ones:

// All agents
app.get("/events", () => pc.stream());

// Filtered to specific agents
app.get("/events", () => pc.stream({ agents: ["mara", "julia"] }));

// Express
app.get("/events", (req, res) => pc.stream(res));
app.get("/events", (req, res) => pc.stream(res, { agents: ["mara"] }));

Filtering — Multi-Tenant Example

The agents filter lets you build per-user dashboards where each user only sees their own agents:

// Each user owns specific agents
const userAgents = {
  "user_1": ["mara", "julia"],
  "user_2": ["nova", "receptionist"],
};

// User-scoped SSE endpoint
app.get("/api/events", (req, res) => {
  const userId = req.auth.userId;              // from your auth middleware
  const allowed = userAgents[userId] || [];

  // Only streams events from agents this user owns
  pc.stream(res, { agents: allowed });
});

The filter works by subscribing only to the specified agents' event emitters — events from other agents never reach the stream. This is purely server-side filtering, so there's no data leakage.

Browser A (user_1)
    │
    └── EventSource("/api/events") ──► SSE: mara, julia events only

Browser B (user_2)
    │
    └── EventSource("/api/events") ──► SSE: nova, receptionist events only

Streamed Events

Each SSE message has an event: field and a JSON data: body with agent ID:

| Event | Data Fields | When |
|-------|------------|------|
| connected | agent or agents | Stream established |
| call.started | callId, from, to, direction, transport | Call begins |
| call.ended | callId, reason, duration | Call ends |
| user.speaking | callId, text | Interim STT transcript |
| user.message | callId, text, messageId | Final user text |
| turn.end | callId, text, probability | User turn ended |
| turn.pause | callId, probability | Turn pause detected |
| speech.started | callId | User began speaking |
| speech.ended | callId | User stopped speaking |
| bot.speaking | callId, messageId, text | Bot started speaking |
| bot.word | callId, messageId, word | Word-by-word playback |
| bot.finished | callId, messageId | Bot done speaking |
| bot.interrupted | callId, messageId | Bot cut off by user |

Wire format:

event: user.message
data: {"callId":"CA123","text":"Hello","messageId":"msg_abc","agent":"mara"}

event: bot.speaking
data: {"callId":"CA123","messageId":"msg_def","text":"Hi!","agent":"mara"}

A :ping comment is sent every 30s as keepalive.
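Browsers parse this format automatically through EventSource. For consumers without EventSource (for example a backend reading the stream with fetch), a frame can be parsed by hand. The sketch below handles the subset of the SSE format shown above; parseSseFrame is a hypothetical helper and assumes the stream has already been split into frames on blank lines.

```typescript
interface SseEvent {
  event: string;
  data: unknown;
}

// Parse one SSE frame (the lines between two blank lines).
// Returns null for frames that carry no data, e.g. ":ping" keepalives.
function parseSseFrame(frame: string): SseEvent | null {
  let event = "message"; // default event type in the SSE format
  const dataLines: string[] = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line === "" || line.startsWith(":")) continue; // blank or comment
    const sep = line.indexOf(":");
    const field = sep === -1 ? line : line.slice(0, sep);
    let value = sep === -1 ? "" : line.slice(sep + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "event") event = value;
    else if (field === "data") dataLines.push(value);
  }
  if (dataLines.length === 0) return null;
  // Multiple data lines are joined with newlines before decoding.
  return { event, data: JSON.parse(dataLines.join("\n")) };
}
```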

Client Example

const source = new EventSource("/api/events");

source.addEventListener("call.started", (e) => {
  const { agent, from, transport } = JSON.parse(e.data);
  console.log(`📞 [${agent}] Call from ${from} via ${transport}`);
});

source.addEventListener("user.message", (e) => {
  const { agent, text } = JSON.parse(e.data);
  console.log(`[${agent}] User: ${text}`);
});

source.addEventListener("bot.speaking", (e) => {
  const { agent, text } = JSON.parse(e.data);
  console.log(`[${agent}] Bot: ${text}`);
});

WhatsApp

WhatsApp is a text-based channel — no STT/TTS/VAD pipeline. Messages route directly to the server-side LLM. The agent receives text, generates a response, and sends it back as a WhatsApp message.

Requires server-side LLM. WhatsApp channels use the same llm config as voice channels. Client-side LLM (bring your own) is not supported for WhatsApp.

WhatsApp Setup

  1. Create a Meta Business App at developers.facebook.com
  2. Add the WhatsApp product to your app
  3. Get your credentials from the API Setup page:
    • Phone Number ID — numeric string (e.g. 123456789012345)
    • Permanent Access Token — generate a system user token with whatsapp_business_messaging permission
    • App Secret — from App Settings → Basic (for webhook signature verification)
  4. Configure the webhook URL in your Meta app:
    https://voice.pinecall.io/whatsapp/webhook
    Verification token: set to match your verifyToken (default: pinecall-wa-verify)
  5. Subscribe to messages — check messages in the webhook fields

WhatsApp Usage

import { Pinecall } from "@pinecall/sdk";

const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();

const agent = pc.agent("support", {
  language: "en",
  llm: {
    engine: "openai",
    model: "gpt-4.1-mini",
    enabled: true,
    prompt: "You are a helpful support agent on WhatsApp. Be concise.",
  },
  tools: [
    {
      type: "function",
      function: {
        name: "lookupOrder",
        description: "Look up an order by ID",
        parameters: {
          type: "object",
          properties: { orderId: { type: "string" } },
          required: ["orderId"],
        },
      },
    },
  ],
});

// Register WhatsApp channel
agent.addChannel("whatsapp", {
  phoneNumberId: "123456789012345",      // From Meta API Setup
  accessToken: process.env.WA_TOKEN!,    // Permanent Graph API token
  verifyToken: "my-verify-token",        // Must match Meta webhook config
  appSecret: process.env.WA_APP_SECRET!, // HMAC verification (recommended)
});

// Also register voice channels on the same agent
agent.addChannel("phone", "+13186330963");
agent.addChannel("webrtc");

// Voice greeting (WhatsApp doesn't use this)
agent.on("call.started", (call) => call.say("Hello!"));

// WhatsApp events
agent.on("whatsapp.session_started", (event) => {
  console.log(`💬 New WhatsApp chat: ${event.contact_name} (${event.contact_phone})`);
});

agent.on("whatsapp.message", (event) => {
  console.log(`📩 ${event.name}: ${event.text}`);
});

agent.on("whatsapp.status", (event) => {
  console.log(`✓ ${event.status} → ${event.recipient}`);
});

// Handle tool calls (works for both voice AND WhatsApp)
agent.on("llm.tool_call", async (call, data) => {
  const results = [];
  for (const tc of data.tool_calls) {
    const args = JSON.parse(tc.arguments);
    const result = await myToolHandler(tc.name, args);
    results.push({ tool_call_id: tc.id, result });
  }
  agent.send({ event: "llm.tool_result", call_id: call.id, msg_id: data.msg_id, results });
});

Multi-channel agent: The same agent can handle voice calls AND WhatsApp messages simultaneously. The LLM config, tools, and prompt are shared — only the transport differs.

WhatsApp Events

| Event | Data Fields | When |
|-------|------------|------|
| whatsapp.session_started | session_id, contact_phone, contact_name | First message from a new contact |
| whatsapp.message | session_id, from, name, type, text, message_id | Incoming message received |
| whatsapp.response | session_id, to, text | Agent sent a response |
| whatsapp.status | status, recipient, message_id | Delivery status update |

Status values: sent, delivered, read

WhatsAppChannelConfig

import type { WhatsAppChannelConfig } from "@pinecall/sdk";

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| phoneNumberId | string | Yes | Meta Phone Number ID from API Setup |
| accessToken | string | Yes | Permanent Graph API access token |
| verifyToken | string | No | Webhook verification token (default: pinecall-wa-verify) |
| appSecret | string | No | Meta App Secret for HMAC signature verification |

Voice Notes

When a user sends a voice note on WhatsApp, the server automatically:

  1. Downloads the audio (OGG/Opus format) via the Cloud API
  2. Transcribes it using Deepgram Nova-3
  3. Feeds the transcript to the LLM as text

The agent sees voice notes as regular text messages — no special handling needed.

Requires DEEPGRAM_API_KEY environment variable on the voice server.

24h Service Window

Meta enforces a 24-hour service window for free-form messaging:

  • Inside window: The agent can send any text message. Window refreshes on each inbound message.
  • Outside window: Only pre-approved template messages can be sent.

The SDK tracks this automatically. If the window is closed, the server logs a warning. Template message support is planned for a future release.
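The SDK handles this for you, but if your own code needs to know whether a free-form reply is still possible (for instance in a dashboard), the rule reduces to a timestamp comparison. isServiceWindowOpen is a hypothetical helper, and it assumes you record the time of each contact's last inbound message yourself, for example from the whatsapp.message event.

```typescript
const SERVICE_WINDOW_MS = 24 * 60 * 60 * 1000;

// True if a free-form message may still be sent, given the timestamp
// (ms since epoch) of the contact's most recent inbound message.
function isServiceWindowOpen(
  lastInboundAtMs: number,
  nowMs: number = Date.now(),
): boolean {
  return nowMs - lastInboundAtMs < SERVICE_WINDOW_MS;
}
```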

Environment Variables

Set these on the voice server (sdk-server):

| Variable | Required | Description |
|----------|----------|-------------|
| WHATSAPP_VERIFY_TOKEN | No | Hub verification token (default: pinecall-wa-verify) |
| WHATSAPP_APP_SECRET | No | Meta App Secret for webhook HMAC verification |
| DEEPGRAM_API_KEY | For voice notes | Required if you want audio message transcription |


Configuration Reference

STT Providers

Deepgram Flux (recommended)

Best for real-time voice agents. Turn detection and VAD are auto-derived — no configuration needed.

stt: {
  provider: "deepgram-flux",
  keyterms: ["pinecall"],      // boost recognition for specific terms
  eot_threshold: 0.5,          // end-of-turn sensitivity (0-1)
  eager_eot_threshold: 0.7,    // eager turn threshold
  eot_timeout_ms: 2000,
}

// Shortcut: "deepgram-flux"

Auto-derived: Flux → native turn detection + native VAD. No need to specify turnDetection.

Deepgram Nova

Classic STT — turn detection and VAD auto-derived (smart_turn + silero).

stt: {
  provider: "deepgram",
  model: "nova-3",
  language: "en",
  interim_results: true,
  smart_format: true,
  punctuate: true,
  profanity_filter: false,
  endpointing_ms: 300,
  utterance_end_ms: 1000,
  keywords: ["pinecall"],
}

// Shortcut: "deepgram" or "deepgram:nova-3" or "deepgram:nova-3:es"

Gladia

stt: {
  provider: "gladia",
  model: "accurate",
  language: "en",
  endpointing: 300,
  speech_threshold: 0.8,
  code_switching: false,
  audio_enhancer: true,
}

// Shortcut: "gladia"

AWS Transcribe

stt: { provider: "transcribe", language: "en-US" }

// Shortcut: "transcribe"

TTS Providers

ElevenLabs

voice: {
  provider: "elevenlabs",
  voice_id: "JBFqnCBsd6RMkjVDRZzb",
  model: "eleven_turbo_v2_5",
  speed: 1.0,
  stability: 0.5,
  similarity_boost: 0.75,
  style: 0,
  use_speaker_boost: true,
}

// Shortcut: "elevenlabs:JBFqnCBsd6RMkjVDRZzb"

Cartesia

voice: {
  provider: "cartesia",
  voice_id: "a0e99841-438c-4a64-b679-ae501e7d6091",
  model: "sonic",
  speed: 1.0,
  volume: 1.0,
  emotion: null,
  language: "en",
}

// Shortcut: "cartesia:a0e99841-438c-4a64-b679-ae501e7d6091"

AWS Polly

voice: {
  provider: "polly",
  voice_id: "Joanna",
  engine: "neural",
  language: "en-US",
}

// Shortcut: "polly:Joanna"

LLM Providers

OpenAI

llm: {
  engine: "openai",
  model: "gpt-4.1-mini",     // or "gpt-4.1", "gpt-4.1-nano"
  enabled: true,
  prompt: "System prompt here.",
  temperature: 0.7,
  max_tokens: 1024,
}

Mistral

llm: {
  engine: "mistral",
  model: "mistral-medium",
  enabled: true,
  prompt: "System prompt here.",
}

LLM shortcut: llm: "openai:gpt-4.1-mini" expands to { engine: "openai", model: "gpt-4.1-mini", enabled: true }.
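
The expansion described above could look roughly like this. `expandLlmShortcut` is a hypothetical helper written for illustration, not the SDK's own parser; rejecting a string without a colon is an assumption.

```javascript
// Hypothetical helper illustrating the documented "engine:model" expansion.
function expandLlmShortcut(llm) {
  if (typeof llm !== "string") return llm; // already a full config object
  const sep = llm.indexOf(":");
  if (sep === -1) throw new Error(`Expected "engine:model", got "${llm}"`);
  return {
    engine: llm.slice(0, sep),
    model: llm.slice(sep + 1),
    enabled: true,
  };
}

console.log(expandLlmShortcut("openai:gpt-4.1-mini"));
// { engine: 'openai', model: 'gpt-4.1-mini', enabled: true }
```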


Session Limits

Calls have built-in safety limits to prevent runaway sessions. The server enforces these defaults:

| Setting | Default | Description |
|---------|---------|-------------|
| max_duration_seconds | 600 (10 min) | Hard cap on total call length. Call is terminated after this time regardless of activity. |
| idle_timeout_seconds | 60 | Auto-hangup after this many seconds of no user speech. |
| idle_warning_seconds | 15 | Emit session.idle_warning event this many seconds before idle timeout. Use it to prompt the user or change the UI. 0 = no warning. |
| idle_grace_seconds | 10 | After idle timeout fires, the agent gets this many seconds to prompt the user before force-hangup. |

Override per-agent:

const agent = pc.agent("receptionist", {
  voice: "elevenlabs:abc",
  stt: "deepgram-flux",
  llm: { engine: "openai", model: "gpt-4.1-mini", enabled: true, prompt: "..." },
  session_limits: {
    max_duration_seconds: 1800,  // 30 minutes
    idle_timeout_seconds: 120,   // 2 minutes of silence
    idle_warning_seconds: 30,    // warn 30s before timeout
    idle_grace_seconds: 15,
  },
});

Disable limits (not recommended):

session_limits: {
  max_duration_seconds: 0,  // 0 = unlimited
  idle_timeout_seconds: 0,  // 0 = disabled
}

How it works:

  1. The server starts two watchdog tasks when a call begins.
  2. _watchdog_max_duration fires after max_duration_seconds. It emits session.timeout, then hangs up.
  3. _watchdog_idle tracks _last_user_activity. It emits session.idle_warning idle_warning_seconds before the idle timeout. Once the user hasn't spoken for idle_timeout_seconds, it emits session.timeout and starts the grace period.
  4. Both events fire before the actual hangup, giving you a chance to warn the user:

agent.on("session.idle_warning", (event, call) => {
  // event.remaining_seconds: seconds until timeout
  // event.idle_timeout_seconds: the configured idle timeout
  call.say("Are you still there?");
});

agent.on("session.timeout", (event, call) => {
  // event.reason: "max_duration" | "idle_timeout"
  call.say("Goodbye! The call is ending due to inactivity.");
});

Timeline:

[silence starts] ──── idle_warning fires ──── idle_timeout fires ──── hangup
     0s              (timeout - warning)s         timeout s        (timeout + grace)s
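
The arithmetic behind the timeline can be sketched as follows. `idleTimeline` is a hypothetical helper; the defaults come from the settings table, and placing the hangup at timeout plus grace is my reading of the idle_grace_seconds description rather than confirmed server behavior.

```javascript
// Hypothetical helper; field names follow the session_limits settings.
function idleTimeline({
  idle_timeout_seconds = 60,
  idle_warning_seconds = 15,
  idle_grace_seconds = 10,
} = {}) {
  if (idle_timeout_seconds === 0) return null; // 0 = idle timeout disabled
  return {
    // 0 = no warning; clamp so a large warning never yields a negative offset
    warningAt: idle_warning_seconds > 0
      ? Math.max(0, idle_timeout_seconds - idle_warning_seconds)
      : null,
    timeoutAt: idle_timeout_seconds,
    hangupAt: idle_timeout_seconds + idle_grace_seconds,
  };
}

console.log(idleTimeline());
// { warningAt: 45, timeoutAt: 60, hangupAt: 70 }
console.log(idleTimeline({ idle_timeout_seconds: 120, idle_warning_seconds: 30, idle_grace_seconds: 15 }));
// { warningAt: 90, timeoutAt: 120, hangupAt: 135 }
```

All values are seconds of continuous user silence.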

Note: Bot speech (e.g. "Are you still there?") pauses the idle counter but does not reset it. Only real user speech resets the timer. This prevents infinite warning loops.
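
A minimal model of the pause-versus-reset behavior, assuming the counter is advanced in discrete ticks. `IdleCounter` and its method names are hypothetical; the server's internals may differ.

```javascript
// Hypothetical model of the idle counter described in the note above.
class IdleCounter {
  constructor() {
    this.idleSeconds = 0;
    this.botSpeaking = false;
  }

  // Advance the clock by `seconds`. Idle time accumulates only while the bot is silent.
  tick(seconds) {
    if (!this.botSpeaking) this.idleSeconds += seconds;
    return this.idleSeconds;
  }

  onBotSpeechStart() { this.botSpeaking = true; }  // pause, keep accumulated time
  onBotSpeechEnd() { this.botSpeaking = false; }   // resume where it left off
  onUserSpeech() { this.idleSeconds = 0; }         // only the user resets the counter
}

const idle = new IdleCounter();
idle.tick(45);             // 45s of silence
idle.onBotSpeechStart();   // bot says "Are you still there?"
idle.tick(3);              // counter is paused while the bot talks
idle.onBotSpeechEnd();
console.log(idle.tick(5)); // 50
idle.onUserSpeech();
console.log(idle.tick(1)); // 1
```

Since the warning prompt never resets the counter, an unanswered prompt cannot push the timeout further out.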

WebRTC widget integration: The @pinecall/voice-widget automatically responds to session.idle_warning by switching the orb to a blinking amber state (.idle-warning CSS class, configurable via colorWarning theme prop). On session.timeout, the widget auto-disconnects.


Interruption

Controls whether users can interrupt the bot mid-speech.

interruption: {
  enabled: true,
  energy_threshold_db: -40,   // min energy to trigger interrupt
  min_duration_ms: 200,       // min speech duration to trigger
}

// Shortcut: false (disables interruption entirely)

Analysis & Audio Metrics

Real-time audio metrics for waveform visualization and energy monitoring.

config: {
  analysis: {
    send_audio_metrics: true,
    audio_metrics_interval_ms: 100,
    send_turn_audio: false,
    send_bot_audio: false,
  }
}

audio.metrics Event

Emitted per interval — one for user (mic) and one for bot (TTS):

agent.on("audio.metrics", (evt, call) => {
  // evt.source: "user" | "bot"
  // evt.energy_db: -60 to 0 (higher = louder)
  // evt.rms: 0 to 1 (normalized amplitude)
  // evt.peak: 0 to 1
  // evt.is_speech: boolean (VAD state)
  // evt.vad_prob: 0 to 1
});

| Field | Type | Description |
|-------|------|-------------|
| source | "user" \| "bot" | Audio source |
| energy_db | number | Energy in decibels (-60 to 0) |
| rms | number | Root mean square amplitude (0–1) |
| peak | number | Peak amplitude (0–1) |
| is_speech | boolean | VAD speech detection state |
| vad_prob | number | VAD probability (0–1) |
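
For a level meter or waveform bar, energy_db usually needs to be mapped to a 0 to 1 range. A minimal sketch, assuming a linear mapping over the documented -60 to 0 range; `energyToLevel` is a hypothetical helper, not an SDK function.

```javascript
// Hypothetical helper for drawing a level meter from audio.metrics events.
const MIN_DB = -60; // documented lower bound of energy_db

function energyToLevel(energyDb) {
  const clamped = Math.min(0, Math.max(MIN_DB, energyDb));
  return (clamped - MIN_DB) / -MIN_DB; // linear map: -60 dB -> 0, 0 dB -> 1
}

console.log(energyToLevel(-60)); // 0
console.log(energyToLevel(-30)); // 0.5
console.log(energyToLevel(0));   // 1
```

Inside an audio.metrics handler you would call it with evt.energy_db and keep one level per evt.source.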


Multi-Environment

Run dev, staging, and production agents simultaneously on the same voice server, sharing the same phone numbers. No extra Twilio costs. Each developer gets their own isolated agent instance.

How It Works

The SDK reads PINECALL_MODE from the environment and prefixes agent IDs automatically:

| PINECALL_MODE | Wire slug | Notes |
|---------------|-----------|-------|
| (empty/unset) | florencia | Production: all callers |
| dev | dev-berna-florencia | Dev: includes developer ID for isolation |
| staging | staging-florencia | Staging: shared environment, no dev ID |

The server routes phone calls based on the caller's phone number:

            Incoming call to +13186330963
                       │
              ┌────────┴────────┐
              │                 │
         Caller in          Caller NOT in
         DEV_CALLERS        DEV_CALLERS
              │                 │
    ┌─────────┴─────────┐  ┌────┴──────┐
    │  dev-berna-       │  │           │
    │  florencia        │  │ florencia │
    │  (your dev agent) │  │ (prod)    │
    └───────────────────┘  └───────────┘

Dev and prod coexist on the same phone number. The server's caller-based routing handles the split.

Setup

Set PINECALL_MODE before importing @pinecall/sdk. The SDK reads it at initialization time.

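// agent/index.js: set mode before loading the SDK
const ENV = process.env.NODE_ENV || "production";
if (ENV === "development") process.env.PINECALL_MODE = "dev";
else if (ENV === "staging") process.env.PINECALL_MODE = "staging";

// Static `import` declarations are hoisted and run before the lines above,
// so load the SDK with a dynamic import to guarantee the mode is already set.
const { Pinecall } = await import("@pinecall/sdk");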

const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY });
await pc.connect();

const agent = pc.deploy("florencia", { /* config */ });
// In dev: registers as "dev-berna-florencia"
// In prod: registers as "florencia"

// Configure caller-based routing for dev/staging
if (pc.mode) {
  const callers = process.env.DEV_CALLERS;
  if (callers) {
    agent.send({
      event: "dev.config",
      callers: callers.split(",").map(s => s.trim()),
    });
  }
}

Each developer creates a .env.local file (gitignored) with their personal config:

# .env.local — each developer sets their own
PINECALL_DEV_ID=berna
DEV_CALLERS=+34607827824

Multi-Developer Isolation

In dev mode, the SDK includes a developer identity in the agent slug to prevent collisions:

dev-{PINECALL_DEV_ID}-{agentName}

The developer ID is resolved in order:

  1. PINECALL_DEV_ID environment variable
  2. OS username (automatic fallback)
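
The slug rules and the developer ID fallback can be sketched together. `resolveDevId` and `wireSlug` are hypothetical helpers that mirror the documented behavior, not the SDK's source; in a real process the OS username would typically come from Node's `os.userInfo().username`.

```javascript
// Hypothetical helpers mirroring the documented rules.
function resolveDevId(env, osUsername) {
  return env.PINECALL_DEV_ID || osUsername; // 1. env var, 2. OS username
}

function wireSlug(agentName, mode, devId) {
  if (mode === "dev") return `dev-${devId}-${agentName}`;
  if (mode === "staging") return `staging-${agentName}`;
  return agentName; // empty or unset mode = production
}

console.log(wireSlug("florencia", "dev", resolveDevId({ PINECALL_DEV_ID: "berna" }, "bernardo")));
// dev-berna-florencia
console.log(wireSlug("florencia", "dev", resolveDevId({}, "juan")));
// dev-juan-florencia
console.log(wireSlug("florencia", "staging", "juan"));
// staging-florencia
console.log(wireSlug("florencia", "", "juan"));
// florencia
```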

This means multiple developers can run the same agent simultaneously without interfering:

| Developer | .env.local | Wire Slug | Phone Routing |
|-----------|------------|-----------|---------------|
| Berna | PINECALL_DEV_ID=berna | dev-berna-florencia | Calls from +34607... → Berna's agent |
| Juan | PINECALL_DEV_ID=juan | dev-juan-florencia | Calls from +34612... → Juan's agent |
| Production | (none) | florencia | All other callers |

Phone Routing

The voice server supports caller-based routing for non-production agents:

  1. Production agent registers +13186330963 → stored in the main phone map
  2. Dev agent registers the same number → stored in the dev override map
  3. On incoming call:
    • If the caller is in _dev_allowed_callers → routes to the dev agent
    • Otherwise → routes to the production agent
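
The routing steps above can be modeled with two maps. This is a sketch under assumptions: `createPhoneRouter` and its methods are hypothetical names, and the real server data structures are not documented here.

```javascript
// Hypothetical model of the main phone map and the dev override map.
function createPhoneRouter() {
  const prodByNumber = new Map(); // dialed number -> production slug
  const devByNumber = new Map();  // dialed number -> Map(caller number -> dev slug)

  return {
    registerProd(number, slug) {
      prodByNumber.set(number, slug);
    },
    registerDev(number, slug, callers) {
      if (!devByNumber.has(number)) devByNumber.set(number, new Map());
      for (const caller of callers) devByNumber.get(number).set(caller, slug);
    },
    // Dev override wins when the caller is allow-listed; otherwise production.
    route(dialedNumber, callerNumber) {
      const devSlug = devByNumber.get(dialedNumber)?.get(callerNumber);
      return devSlug ?? prodByNumber.get(dialedNumber) ?? null;
    },
  };
}

const router = createPhoneRouter();
router.registerProd("+13186330963", "florencia");
router.registerDev("+13186330963", "dev-berna-florencia", ["+34607827824"]);
console.log(router.route("+13186330963", "+34607827824")); // dev-berna-florencia
console.log(router.route("+13186330963", "+15550000000")); // florencia
```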

To set your dev callers, send a dev.config event after connecting:

if (pc.mode) {
  agent.send({
    event: "dev.config",
    callers: ["+34607827824"],  // your phone number(s)
  });
}

Multi-Developer Strategies

When multiple developers work on the same agent, there are two approaches for phone testing:

Option A: Shared number + caller override (recommended)

All developers share the same Twilio number. Each developer configures their personal phone number in DEV_CALLERS. The server routes based on who's calling:

+13186330963 (shared Twilio number)
    │
    ├── Call from +34607... → dev-berna-florencia
    ├── Call from +34612... → dev-juan-florencia
    ├── Call from +34699... → dev-flor-florencia
    └── Call from anyone else → florencia (production)

# Berna's .env.local
PINECALL_DEV_ID=berna
DEV_CALLERS=+34607827824

# Juan's .env.local
PINECALL_DEV_ID=juan
DEV_CALLERS=+34612345678

# Flor's .env.local
PINECALL_DEV_ID=flor
DEV_CALLERS=+34699887766

Zero extra Twilio cost. One number serves all environments simultaneously.

Option B: Dedicated number per developer

Each developer uses their own Twilio number. No caller override needed — all calls to that number go to the dev agent:

// Berna uses a dedicated dev number
agent.addChannel("phone", "+18005551001");  // Berna's dev number

// Production uses the main number
agent.addChannel("phone", "+13186330963");

Simpler routing, but requires extra Twilio numbers ($1/month each).

Comparison:

| | Shared + Override | Dedicated Numbers |
|---|---|---|
| Cost | No extra | $1/month per dev |
| Setup | DEV_CALLERS in .env.local | Separate Twilio number per dev |
| Routing | Caller-based | Number-based |
| External callers | Can't reach dev agent | Can reach dev agent |
| Best for | Internal testing | External/client testing |

WhatsApp Dev Routing

WhatsApp uses the same sender-based routing pattern as phone calls. Multiple developers can share the same WhatsApp Business number, with messages routed to dev agents based on the sender's phone number.

Meta WhatsApp Business Number (phone_number_id: 123456)
    │
    ├── Message from +34607... → dev-berna-florencia
    ├── Message from +34612... → dev-juan-florencia
    └── Message from anyone else → florencia (production)

The dev.config event configures both phone and WhatsApp routing in one call:

if (pc.mode) {
  agent.send({
    event: "dev.config",
    callers: ["+34607827824"],  // routes BOTH phone calls AND WhatsApp messages
  });
}

Same DEV_CALLERS, both channels. When your phone number sends a WhatsApp message to the business number, it routes to your dev agent. When your phone number calls the Twilio number, it also routes to your dev agent. One config, all channels.

Alternatively, each developer can register a separate Meta test number (from the Meta API console), avoiding the need for caller-based routing on WhatsApp.

WebRTC & Chat Dev Routing

WebRTC and Chat channels don't need caller-based routing — they use slug-based isolation automatically:

// Dev mode → agent registers as "dev-berna-florencia"
// The browser requests a token for "dev-berna-florencia" specifically
const { token } = await fetchWebRTCToken({ agentId: "dev-berna-florencia" });

Each developer gets their own slug, their own tokens, their own sessions. Multiple developers can test simultaneously without interference.

Any web app can connect. WebRTC and Chat connections go directly to voice.pinecall.io via DataChannel (audio) or WebSocket (text). The browser never needs access to the agent process. This means any number of web apps, mobile apps, or third-party integrations can connect to the same agent using tokens — without the developer exposing SSE endpoints, webhook URLs, or the agent's Node.js process. The voice server is the relay.

Staging

Staging uses a simple prefix without developer ID — it's a shared environment:

NODE_ENV=staging node agent/index.js
# → Agent slug: "staging-florencia"

Staging agents use the same caller-based override map. Useful for pre-production testing on a staging server.

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| PINECALL_MODE | "" | "dev", "staging", or empty for production |
| PINECALL_DEV_ID | OS username | Developer identity for slug isolation |
| DEV_CALLERS | (none) | Comma-separated phone numbers for caller-based routing |

Vite Integration

When using Vite as your dev server, agents can be embedded in the same process via a plugin:

// vite-agent-plugin.mjs
export default function agentPlugin() {
  return {
    name: "my-agent",
    async configureServer() {
      const { startAgent } = await import("./agent/index.js");
      await startAgent();
    },
  };
}

// vite.config.js
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import agentPlugin from "./vite-agent-plugin.mjs";

export default defineConfig({
  plugins: [react(), agentPlugin()],
});

npm run dev starts both the web server and the voice agent in a single process. Vite sets NODE_ENV=development automatically, so the agent runs in dev mode with no extra configuration.

npm run dev
  🟢 SDK connected
  🔧 DEV mode [berna] — calls from +34607827824 → dev-berna-florencia
  🌸 Florencia agent ready (Phone + WebRTC + WhatsApp) [dev]
  ➜  Local: http://localhost:5173/

Public API

const pc = new Pinecall({ apiKey: "pk_..." });

pc.mode;     // "dev" | "staging" | ""  — current environment mode
pc.devMode;  // true if mode === "dev"  — backward-compatible getter
pc.devId;    // "berna" — developer identity for slug isolation

Deployment Topologies

Pinecall uses different communication patterns depending on the channel. Understanding the distinction between observing and interacting is key to choosing the right deployment topology.

Observe vs Interact

There are three communication patterns in Pinecall. Which one you use depends on the channel and your use case.

1. Phone calls (inbound + outbound) — Backend only, EventEmitter

Phone calls are inherently backend-side. Registering an agent with pc.agent() requires a PINECALL_API_KEY — this must never be exposed in frontend code. The agent runs in your Node.js process and receives all call events via the SDK's WebSocket → in-memory EventEmitter.

         Twilio ──► voice.pinecall.io ──► SDK WebSocket ──► Your Node.js
                                                               │
                                                          EventEmitter
                                                      agent.on("call.started")
                                                      agent.on("user.message")
                                                      agent.on("llm.tool_call")

There is no browser involvement. The entire call lifecycle (STT → LLM → TTS → tool calls) happens server-side. If your agent is phone-only, your architecture is simple: a single Node.js process with the SDK.

2. Browser interaction (WebRTC / Chat) — Direct to voice server

When users interact from a web app (voice widget, chatbox), the browser connects directly to voice.pinecall.io — it never touches your backend:

Browser ──► GET  /webrtc/token?agent_id=mara   (public, no API key)
        ──► POST /webrtc/offer  { sdp, token }  → audio via DataChannel

Browser ──► GET  /chat/token?agent_id=mara     (public, no API key)
        ──► WS   /chat/ws?token=cht_xxx        → text via WebSocket

The token endpoints are public because they only verify that the agent is online — no secrets are exchanged. The browser gets a short-lived signed token, then opens a direct connection to the voice server. Your agent process can run anywhere.
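
A small sketch of how a browser client might build the URLs for this flow. The paths and query parameters are taken from the diagram above; the `https`/`wss` schemes and the helper names `tokenUrl` and `chatSocketUrl` are assumptions.

```javascript
// Assumption: the voice server is reachable over HTTPS/WSS at voice.pinecall.io.
const VOICE_SERVER = "voice.pinecall.io";

// channel: "webrtc" or "chat"
function tokenUrl(channel, agentId) {
  const url = new URL(`https://${VOICE_SERVER}/${channel}/token`);
  url.searchParams.set("agent_id", agentId);
  return url.toString();
}

function chatSocketUrl(token) {
  const url = new URL(`wss://${VOICE_SERVER}/chat/ws`);
  url.searchParams.set("token", token);
  return url.toString();
}

console.log(tokenUrl("chat", "mara"));  // https://voice.pinecall.io/chat/token?agent_id=mara
console.log(chatSocketUrl("cht_xxx"));  // wss://voice.pinecall.io/chat/ws?token=cht_xxx
```

A client would fetch the first URL, read the token from the JSON response, then open a WebSocket to the second.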

🔒 Origin restriction (recommended): By default, any website can request a token for your agent. To restrict which domains can embed your voice widget or chatbox, configure allowedOrigins:

const agent = pc.agent("mara", {
  allowedOrigins: ["https://yourdomain.com", "http://localhost:*"],
  // ...config
});

When set, the server validates the Origin header and rejects requests from unlisted domains. For maximum security (mobile apps, multi-tenant platforms), proxy token requests through your own backend with API key authentication.
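
To make the allow-list idea concrete, here is one way origin matching could work. The server's actual wildcard rules are not documented here, so this sketch assumes only two forms: an exact origin, and a trailing `:*` meaning "any port". `originAllowed` is a hypothetical name.

```javascript
// Hypothetical matcher; real server-side rules may be broader or stricter.
function originAllowed(origin, allowedOrigins) {
  return allowedOrigins.some((pattern) => {
    if (pattern.endsWith(":*")) {
      const base = pattern.slice(0, -1); // keep the colon, e.g. "http://localhost:"
      // Require a purely numeric port so "localhost:5173.evil.com" is rejected.
      return origin.startsWith(base) && /^\d+$/.test(origin.slice(base.length));
    }
    return origin === pattern; // exact match otherwise
  });
}

const allowed = ["https://yourdomain.com", "http://localhost:*"];
console.log(originAllowed("https://yourdomain.com", allowed)); // true
console.log(originAllowed("http://localhost:5173", allowed));  // true
console.log(originAllowed("https://evil.example", allowed));   // false
```

Keep in mind that the Origin header is only trustworthy for browser clients; non-browser clients can send any value, which is why the backend proxy is suggested for stricter setups.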

3. SSE — Observe events for dashboards and panels

SSE is for observing agent events from a web frontend — call center panels, admin dashboards, monitoring UIs. It requires the agent to run in the same Node.js process as your web server (embedded topology):

Browser ←── SSE ←── Your Express/Remix ←── agent.stream() ←── EventEmitter

This is how you build a call center panel without exposing API keys:

// Your backend — agent + SSE in the same process
const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();

const agent = pc.agent("support", { /* config */ });
agent.addChannel("phone", "+13186330963");

// SSE endpoint — filter by user role, no API key to the browser
app.get("/api/events", (req, res) => {
  const userId = req.auth.userId;
  const allowed = getUserAgents(userId);  // your auth logic
  pc.stream(res, { agents: allowed });    // only their agents
});

The browser sees real-time call events (who's calling, transcripts, tool calls) but has zero access to the API key or agent internals. You control exactly which events reach which user.
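
Conceptually, the filtering and framing behind an endpoint like this looks like the sketch below. It uses the standard Server-Sent Events wire format; what pc.stream() actually writes is not shown in this README, and the `agent_id` field on events plus the helper names are assumptions.

```javascript
// Hypothetical helpers; event objects are assumed to look like { event, agent_id, ... }.
function toSseFrame(evt) {
  // One SSE message: an event name line, a data line, then a blank line.
  return `event: ${evt.event}\ndata: ${JSON.stringify(evt)}\n\n`;
}

function filterForUser(events, allowedAgents) {
  const allowed = new Set(allowedAgents);
  return events.filter((evt) => allowed.has(evt.agent_id));
}

const events = [
  { event: "call.started", agent_id: "support", call_id: "c1" },
  { event: "call.started", agent_id: "billing", call_id: "c2" },
];

// Only the "support" event is written for a user limited to that agent.
for (const evt of filterForUser(events, ["support"])) {
  process.stdout.write(toSseFrame(evt));
}
```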

Summary:

| Channel | Who initiates | Where it runs | How events flow | API key exposed? |
|---------|---------------|---------------|-----------------|------------------|
| Phone (inbound) | Twilio | Backend only | SDK WebSocket → EventEmitter | ❌ Server-side only |
| Phone (outbound) | agent.dial() | Backend only | SDK WebSocket → EventEmitter | ❌ Server-side only |
| WebRTC | Browser user | Browser → voice server | DataChannel (direct) | ❌ Token-based |
| Chat | Browser user | Browser → voice server | WebSocket (direct) | ❌ Token-based |
| WhatsApp | Meta webhook | voice server | SDK WebSocket → EventEmitter | ❌ Server-side only |
| SSE | Browser (observe) | Your backend → browser | EventEmitter → agent.stream() | ❌ Your auth controls access |

Key insight: API keys never leave your backend. Phone calls and tool execution happen server-side. Browser users connect via tokens. SSE lets you build dashboards with your own auth layer on top.


With this in mind, your agent can run embedded inside your web server or as a standalone process:

Embedded Agent (same process)

The agent runs inside your web server (Express, Remix, Hono, etc.) or via a Vite plugin. Both the web app and the agent share the same Node.js process.

┌──────────────────────────────────────┐
│           Your Node process          │
│                                      │
│  ┌──────────┐     ┌──────────────┐   │
│  │ Web App  │     │ Agent (SDK)  │   │
│  │ Express  │◄────│ pc.agent()   │   │
│  │ /api/*   │     │ event bus    │   │
│  └──────────┘     └──────┬───────┘   │
│                          │           │
│    SSE ✅               WS          │
│    agent.stream()        │           │
│    pc.stream()           ▼           │
│                   voice.pinecall.io  │
└──────────────────────────────────────┘

What works:

  • SSE streaming: agent.stream() and pc.stream() pipe events directly from the in-memory EventEmitter
  • REST endpoints: access the agent via req.app.agent or a module-level reference
  • Hot reload: file watchers, Vite HMR
  • Single npm run dev: the Vite plugin boots the agent automatically

Example (Vite plugin — recommended for dev):

// vite-agent-plugin.mjs
export default function agentPlugin() {
  return {
    name: "my-agent",
    async configureServer() {
      const { startAgent } = await import("./agent/index.js");
      await startAgent();
    },
  };
}

Example (Express):

import express from "express";
import { Pinecall } from "@pinecall/sdk";

const app = express();
const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();

const agent = pc.agent("receptionist", { /* config */ });
agent.addChannel("phone", "+13186330963");
agent.addChannel("webrtc");
agent.addChannel("chat");

// SSE endpoint — works because agent is in the same process
app.get("/api/events", (req, res) => agent.stream(res));

// Custom API that reads agent state
app.get("/api/calls", (req, res) => {
  res.json({ activeCalls: agent.calls.size });
});

app.listen(3000);

Standalone Agent (separate process)

The agent runs as its own Node process, alongside a separate web server. Both connect to voice.pinecall.io independently.

┌──────────────┐     ┌──────────────────┐
│  Web App     │     │  Agent Process   │
│  (Next.js,   │     │  node agent.js   │
│  Remix, etc) │     │  pc.agent()      │
│              │     │                  │
│  SSE ❌      │     │  WS ────────►    │
│  No agent    │     │  voice.pinecall  │
│  reference   │     │  .io             │
└──────────────┘     └──────────────────┘
        │                     │
        │    ┌────────────────┘
        ▼    ▼
   voice.pinecall.io

Browser users (WebRTC, chat) connect directly to the voice server via tokens — they don't care where the agent process lives. SSE is the only thing that breaks because it needs in-process access to the EventEmitter.

Headless Agent (no web server)

The agent doesn't need a web server at all. Many agents are pure phone/SIP agents — they answer calls, run tools, and hang up. No frontend, no API, no UI. Just a Node process running 24/7.

┌─────────────────────────┐
│  node agent.js          │
│                         │
│  pc.agent("julia")      │
│  addChannel("phone")    │
│  addChannel("sip:...")  │
│                         │
│  WS ────────────────►   │
│  voice.pinecall.io      │
└─────────────────────────┘
       That's it.

// agent.js: a complete production agent, no web server needed
import { Pinecall } from "@pinecall/sdk";
import { openDoor, identifyVisitor } from "./tools.js";

const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();

const julia = pc.deploy("julia", {
  prompt: "You are Julia, the intercom concierge...",
  model: "gpt-4.1-mini",
  voice: "elevenlabs:abc",
  language: "es",
  channels: ["phone:+13186330963", "sip:[email protected]"],
  tools: [openDoor, identifyVisitor],
});

julia.on("call.started", (call) => call.say("¿Quién es?"));

julia.on("llm.tool_call", async (call, data) => {
  // Tools run locally: no webhooks, no exposed APIs.
  // handleTool is your own dispatcher that maps a tool name to a local function.
  for (const tc of data.tool_calls) {
    const result = await handleTool(tc.name, JSON.parse(tc.arguments));
    julia.send({
      event: "llm.tool_result",
      call_id: call.id,
      msg_id: data.msg_id,
      results: [{ tool_call_id: tc.id, result }],
    });
  }
});

console.log("Julia is live. Ctrl+C to stop.");
// Runs forever — PM2, Docker, systemd, whatever.

This is the simplest possible deployment. Deploy it with PM2, Docker, systemd — it connects to the voice server and waits for calls. The tool handlers (openDoor, identifyVisitor) call your internal APIs, databases, or hardware directly from the same process. No webhook URLs, no public endpoints, no attack surface.
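
The agent code calls a `handleTool` dispatcher without defining it. One possible shape is sketched below; the handler bodies are placeholders and the error-result format is an assumption, since the SDK's expected result shape is not specified here.

```javascript
// Hypothetical local tools; real handlers would call your database, APIs, or hardware.
const toolHandlers = {
  openDoor: async ({ door }) => ({ opened: true, door }),
  identifyVisitor: async ({ name }) => ({ known: name === "Ana" }),
};

// Dispatch a tool call by name. Unknown tools and thrown errors become
// error results so the LLM always receives an answer.
async function handleTool(name, args) {
  const handler = Object.hasOwn(toolHandlers, name) ? toolHandlers[name] : undefined;
  if (!handler) return { error: `Unknown tool: ${name}` };
  try {
    return await handler(args);
  } catch (err) {
    return { error: String(err?.message ?? err) };
  }
}

handleTool("openDoor", { door: "main" }).then((result) => console.log(result));
// { opened: true, door: 'main' }
```

Looking handlers up with `Object.hasOwn` avoids accidentally dispatching to inherited properties such as `toString` when the model produces an unexpected tool name.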

Comparison

| Feature | Embedded | Standalone | Headless |
|---------|----------|------------|----------|
| Web server | ✅ Same process | Separate process | ❌ None |
| SSE (agent.stream()) | ✅ Works | ❌ Not available | ❌ N/A |
| WebRTC (browser voice) | ✅ Via DataChannel | ✅ Via DataChannel | ✅ Via DataChannel |
| Chat (browser text) | ✅ Via /chat/ws | ✅ Via /chat/ws | ✅ Via /chat/ws |
| Phone / SIP | ✅ | ✅ | ✅ |
| WhatsApp | ✅ | ✅ | ✅ |
| Tool calls | ✅ In-process | ✅ In-process | ✅ In-process |
| Agent state in web API | ✅ Direct reference | ❌ No shared memory | ❌ N/A |
| Complexity | Medium | Medium | Lowest |
| Best for | Dev + dashboards | Web app + agent | Phone/SIP agents |

Recommendation:

  • Embedded for development (Vite plugin) and apps that need SSE dashboards
  • Standalone for production web apps where the agent and web server scale independently
  • Headless for phone/SIP agents, IoT, background services — anything without a UI

Philosophy

Pinecall SDK is designed around one idea: any existing app can add a voice agent without changing its architecture.

Traditional voice AI platforms (Vapi, Retell, Bland) are platform-first — you configure agents in their dashboard, define tools as JSON schemas, and expose webhook URLs for the platform to call. Your app adapts to the platform.

Pinecall is code-first — the agent is your code. It runs inside your app, uses your database, calls your internal APIs, and handles tool calls locally. The platform adapts to your app.

Platform-first (Vapi):
  Your App ──webhook──► Vapi Dashboard ──POST──► Your Webhook URL
                         (config UI)              (exposed endpoint)

Code-first (Pinecall):
  ┌─── Your App ──────────────────────┐
  │  your code + pc.agent() + tools   │──WS──► voice.pinecall.io
  │  everything runs here             │        (audio pipeline only)
  └───────────────────────────────────┘

This matters because:

  • Existing chatbots (Langchain, LlamaIndex, custom LLM pipelines) can become voice agents by hooking into turn.end and streaming to call.replyStream(). No rewrite needed.
  • Tool calls are local functions, not webhook URLs. Your agent can call db.query(), redis.get(), hardware.openDoor() — anything your process can reach. No exposed endpoints, no public API surface.
  • Multi-channel is native. The same agent instance handles phone calls, SIP intercoms, WebRTC voice widgets, text chat, and WhatsApp. One codebase, all channels.
  • No vendor lock-in on the LLM. Use server-side LLM (we run it) or bring your own (OpenAI, Anthropic, local Ollama). Switch mid-call if you want.

The voice server (voice.pinecall.io) handles the hard real-time parts — audio transport, STT, TTS, VAD, turn detection. Your code handles everything else — business logic, tools, prompts, history, state. Each side does what it's good at.


Security

Token Security Model

Browser connections (WebRTC and Chat) use short-lived tokens generated by the voice server. The recommended model: your backend generates tokens using your API key, and distributes them to browsers through your own auth layer.

This is the same model used by LiveKit, Twilio, Daily.co, and other real-time platforms.

Browser → Your Backend (your auth: session, JWT, OAuth)
              ↓
         pc.createToken("webrtc", "florencia")
              ↓  (API key in Authorization header)
         voice.pinecall.io → { token, server, expires_in }
              ↓
         Your Backend returns token to browser
              ↓
         Browser connects to voice.pinecall.io with token

Backend (Express, Next.js, Hono, etc.):

import { Pinecall } from "@pinecall/sdk";
const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();

const agent = pc.agent("florencia", { /* config */ });

// Token endpoint — protected by YOUR auth
app.get("/api/token", authMiddleware, async (req, res) => {
  const channel = req.query.channel as "webrtc" | "chat";
  const token = await agent.createToken(channel);
  res.json(token);
});

// Or if agent is in a separate process:
app.get("/api/token", authMiddleware, async (req, res) => {
  const token = await pc.createToken("webrtc", "florencia");
  res.json(token);
});

Frontend (VoiceWidget):

<VoiceWidget
  agent="florencia"
  tokenProvider={async () => {
    const res = await fetch("/api/token?channel=webrtc", {
      credentials: "include",  // send your session cookie
    });
    return res.json();
  }}
/>

Why Tokens Are Safe

Tokens have three security properties that make them safe to pass to browsers:

| Property | Value | Effect |
|----------|-------|--------|
| Single-use | Consumed on first connection | Can't be reused by an attacker |
| Short-lived | 60-second TTL | Leaves an attacker at most 60 seconds to use a stolen token |
| Scoped | Locked to agent + org | Can't be used for a different agent |
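
The three properties can be illustrated with a small in-memory store. This is a sketch of the concept, not how the voice server issues tokens: `createTokenStore` is a hypothetical name, and the counter-based token IDs are placeholders (real tokens must be unguessable and signed).

```javascript
// Hypothetical in-memory model of single-use, short-lived, scoped tokens.
function createTokenStore({ ttlMs = 60_000, now = () => Date.now() } = {}) {
  const tokens = new Map();
  let counter = 0;

  return {
    issue(agentId, orgId) {
      const token = `tok_${++counter}`; // placeholder ID, NOT secure
      tokens.set(token, { agentId, orgId, expiresAt: now() + ttlMs });
      return token;
    },
    consume(token, agentId, orgId) {
      const record = tokens.get(token);
      if (!record) return { ok: false, reason: "unknown_or_used" };
      tokens.delete(token); // single-use: gone after the first attempt
      if (now() >= record.expiresAt) return { ok: false, reason: "expired" };
      if (record.agentId !== agentId || record.orgId !== orgId) {
        return { ok: false, reason: "wrong_scope" };
      }
      return { ok: true };
    },
  };
}

let clock = 0; // injected clock so expiry can be shown without waiting
const store = createTokenStore({ now: () => clock });

const t1 = store.issue("florencia", "org_1");
console.log(store.consume(t1, "florencia", "org_1")); // { ok: true }
console.log(store.consume(t1, "florencia", "org_1")); // { ok: false, reason: 'unknown_or_used' }

const t2 = store.issue("florencia", "org_1");
clock += 61_000;
console.log(store.consume(t2, "florencia", "org_1")); // { ok: false, reason: 'expired' }
```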

The token is not the security boundary — your backend is. The token is a short-lived capability that proves "someone authorized gave me permission to connect." The security question is: who can call your /api/token endpoint?

  • Requires login → only authenticated users get tokens
  • Rate limited → can't bulk-generate tokens
  • Permission-checked