@pinecall/sdk
v0.1.4
Pinecall Voice SDK — connect to agents, handle calls, stream speech in real-time.
Table of Contents
- Install
- Quick Start
- API Reference
- Events
- Hot-Reload
- Configuration Shortcuts
- REST API
- SSE Streaming
- Configuration Reference
- Multi-Environment
- Philosophy
- Security
Install
npm install @pinecall/sdk
Node.js ≥ 18 required. Only runtime dependency: ws.
Quick Start
Server-side LLM (recommended)
The Pinecall server runs the LLM and handles STT/TTS. You configure the agent and handle tool calls locally.
import { Pinecall } from "@pinecall/sdk";
const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();
const agent = pc.agent("receptionist", {
voice: "elevenlabs:h2cd3gvcqTp3m65Dysk7",
language: "es",
stt: "deepgram-flux",
llm: {
engine: "openai",
model: "gpt-4.1-mini",
enabled: true,
prompt: "You are a helpful receptionist. Be concise.",
},
tools: [
{
type: "function",
function: {
name: "lookupOrder",
description: "Look up an order by ID",
parameters: {
type: "object",
properties: {
orderId: { type: "string", description: "The order ID" },
},
required: ["orderId"],
},
},
},
],
});
agent.addChannel("phone", "+18045551234");
agent.addChannel("phone", "sip:[email protected]");
agent.addChannel("webrtc");
// Per-channel overrides: different voice/language per number
agent.addChannel("phone", "+34911234567", {
voice: "elevenlabs:spanishVoiceId",
language: "es",
stt: "deepgram-flux",
});
// Greet on call start
agent.on("call.started", (call) => {
if (call.direction === "inbound") {
call.say("Hello! How can I help you today?");
}
});
// Handle tool calls from the server-side LLM
agent.on("llm.tool_call", async (call, data) => {
if (!data.tool_calls) return; // skip re-emissions
const results = [];
for (const tc of data.tool_calls) {
const args = JSON.parse(tc.arguments);
const result = await myToolHandler(tc.name, args);
results.push({ tool_call_id: tc.id, result });
}
agent.send({
event: "llm.tool_result",
call_id: call.id,
msg_id: data.msg_id,
results,
});
});
agent.on("call.ended", (call, reason) => {
console.log(`Call ended: ${reason} (${call.duration}s)`);
});
Client-side LLM (bring your own)
You run the LLM yourself. The server handles STT → text and text → TTS.
import { Pinecall } from "@pinecall/sdk";
import OpenAI from "openai";
const pc = new Pinecall({ apiKey: "pk_..." });
await pc.connect();
const openai = new OpenAI();
const agent = pc.agent("my-bot", { voice: "cartesia:abc", language: "en" });
agent.addChannel("phone", "+13186330963");
agent.on("call.started", (call) => call.say("Hi there!"));
agent.on("turn.end", async (turn, call) => {
const stream = call.replyStream(turn);
const completion = await openai.chat.completions.create({
model: "gpt-4.1-mini",
messages: [
{ role: "system", content: "You are helpful. Be concise." },
{ role: "user", content: turn.text },
],
stream: true,
});
for await (const chunk of completion) {
if (stream.aborted) break;
const token = chunk.choices[0]?.delta?.content;
if (token) stream.write(token);
}
stream.end();
});
Deploy (one-liner)
The fastest way to get an agent running. pc.deploy() combines agent creation, LLM config, and channel registration in a single call:
import { Pinecall } from "@pinecall/sdk";
const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();
const mara = pc.deploy("mara", {
prompt: "You are Mara, a friendly voice assistant. Be concise.",
model: "gpt-4.1-mini",
voice: "elevenlabs:EXAVITQu4vr4xnSDxMaL",
language: "es",
channels: ["webrtc", "+13186330963"],
});
mara.on("call.started", (call) => {
console.log(`📞 Call from ${call.from}`);
});
mara.on("call.ended", (call, reason) => {
console.log(`Call ended: ${reason} (${call.duration}s)`);
});
DeployConfig fields:
| Field | Type | Description |
|-------|------|-------------|
| prompt | string | System prompt for the LLM |
| model | string | LLM model (default: gpt-4.1-mini) |
| voice | string | TTS voice shortcut (e.g. elevenlabs:voiceId) |
| language | string | BCP-47 language code |
| stt | string | STT provider (default: deepgram-flux) |
| tools | array | OpenAI function-calling tool definitions |
| channels | array | "webrtc", "mic", "chat", "whatsapp", or phone numbers |
| phones | string[] | Phone numbers (legacy, prefer channels) |
deploy() returns an Agent — you can attach event handlers, add more channels, or hot-reload config.
Greeting: Use call.say() in call.started to speak a greeting:
mara.on("call.started", (call) => call.say("¡Hola! ¿En qué puedo ayudarte?"));
API Reference
Pinecall (client)
WebSocket client. Manages auth, reconnection, and agent multiplexing.
const pc = new Pinecall({
apiKey: "pk_...", // required
url: "wss://voice.pinecall.io/client", // default
reconnect: true, // auto-reconnect (default: true)
pingInterval: 30000, // keepalive ms (default: 30000)
});
await pc.connect(); // resolves on auth success
await pc.disconnect(); // graceful close
pc.on("connected", () => {});
pc.on("disconnected", (reason) => {});
pc.on("reconnecting", (attempt) => {});
pc.on("error", (err) => {});
Agent
Created via pc.agent(id, config?) or pc.deploy(id, config). Owns channels, routes call events, and stores defaults.
Creation
const agent = pc.agent("my-agent", {
voice: "elevenlabs:abc",
language: "es",
stt: "deepgram-flux",
llm: {
engine: "openai",
model: "gpt-4.1-mini",
enabled: true,
prompt: "System prompt with {{template_vars}}.",
},
tools: [/* OpenAI function-calling format */],
});
Channels
agent.addChannel("phone", "+18045551234");
agent.addChannel("phone", "sip:[email protected]");
agent.addChannel("webrtc");
// Per-channel config overrides
agent.addChannel("phone", "+34911234567", {
voice: "elevenlabs:spanishVoiceId",
language: "es",
});
// WhatsApp channel (see WhatsApp section for full setup)
agent.addChannel("whatsapp", {
phoneNumberId: "123456789012345",
accessToken: "EAABx...",
verifyToken: "my-secret",
appSecret: "abc123...",
});
// Update a channel's config at runtime
agent.configureChannel("+34911234567", { voice: "cartesia:newVoice" });
// Remove a channel
agent.removeChannel("+34911234567");
Agent Methods
| Method | Description |
|--------|-------------|
| agent.addChannel(type, ref?, config?) | Register a phone, webrtc, mic, chat, or whatsapp channel |
| agent.removeChannel(ref) | Unregister a channel |
| agent.configure(opts) | Hot-reload agent defaults (voice, language, STT, LLM) — affects all future calls |
| agent.configureChannel(ref, config) | Update a specific channel's config |
| agent.configureSession(callId, opts) | Update config for a live call (equivalent to call.configure) |
| agent.dial(opts) | Make an outbound call — returns Promise<Call> |
| agent.call(callId) | Get a Call object by ID (undefined if not found) |
| agent.getConfig() | Returns the current AgentConfig |
| agent.stream() | SSE stream of this agent's events (see SSE) |
| agent.send(data) | Send a raw protocol message (low-level) |
agent.configure() — Hot-Reload
Update the agent's defaults at runtime. Changes take effect on all future calls — existing calls are not affected. Sends an agent.configure command over the WebSocket.
// Switch to French voice
agent.configure({ voice: "elevenlabs:frenchVoiceId", language: "fr" });
// Update LLM model
agent.configure({
llm: { engine: "openai", model: "gpt-4.1", enabled: true,
prompt: "Updated prompt." },
});
// Swap STT provider
agent.configure({ stt: "gladia" });
No REST call needed. agent.configure() uses the existing WebSocket, so changes propagate to the server immediately.
agent.dial() — Outbound Calls
const call = await agent.dial({
to: "+14155551234",
from: "+13186330963",
greeting: "Hi! This is a follow-up call.", // server speaks via TTS
metadata: { appointmentId: "appt_001" },
config: { voice: "cartesia:uuid", language: "ar" }, // per-call override
});
call.on("call.ended", (_, reason) => console.log(`Done: ${reason}`));
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| to | string | ✅ | Destination number (E.164) |
| from | string | ✅ | Caller ID (must be a registered number) |
| greeting | string | — | Text the server speaks when callee picks up |
| metadata | object | — | Custom data attached to the call |
| config | object | — | Per-call config override (voice, STT, language) |
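Since `to` is expected in E.164 form, it can help to validate numbers before dialing. The following is a minimal sketch, not part of the SDK: `isE164` and `assertDialTarget` are hypothetical helper names, and the pattern only checks the general E.164 shape (a leading `+`, then up to 15 digits with a non-zero first digit).

```typescript
// E.164 shape: "+", then 1 to 15 digits, the first of which is not 0.
const E164_PATTERN = /^\+[1-9]\d{0,14}$/;

// Hypothetical helper: true when the value looks like an E.164 number.
function isE164(value: string): boolean {
  return E164_PATTERN.test(value);
}

// Hypothetical guard to run before agent.dial(); throws on a malformed destination.
function assertDialTarget(to: string): void {
  if (!isE164(to)) {
    throw new Error(`dial: "to" is not an E.164 number: ${to}`);
  }
}

assertDialTarget("+14155551234"); // passes silently
```

This only checks the format; whether `from` is a registered number is still decided by the server.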
Pinecall (client) — Additional Methods
// Agent management
const agent = pc.getAgent("mara"); // get by ID (undefined if not found)
const removed = pc.removeAgent("mara"); // unregister agent (returns boolean)
// Token generation (for browser WebRTC/Chat connections)
const webrtcToken = await pc.createToken("webrtc", "mara");
const chatToken = await agent.createToken("chat");
// REST helpers (no WebSocket needed)
const voices = await pc.fetchVoices({ provider: "elevenlabs" });
const phones = await pc.fetchPhones();
Call
Per-session handle. Created automatically on call.started.
Speech
| Method | Description |
|--------|-------------|
| call.say(text) | Speak text immediately (standalone, no in_reply_to) |
| call.reply(text) | Reply to the latest user message (auto-tracks in_reply_to) |
| call.replyStream(turn?) | Open a token stream → returns ReplyStream |
| call.cancel(msgId?) | Cancel a specific or the current message |
| call.clear() | Flush all queued TTS audio |
Greeting pattern: Use call.say() on call.started for inbound greetings. For outbound calls, pass greeting in agent.dial() — the server speaks it via TTS automatically.
// Inbound — SDK speaks the greeting
agent.on("call.started", (call) => {
if (call.direction === "inbound") {
call.say("Hello! How can I help you today?");
}
});
// Outbound — server speaks the greeting
const call = await agent.dial({
to: "+14155551234",
from: "+13186330963",
greeting: "Hi! This is a follow-up call.",
});
Call Control
| Method | Description |
|--------|-------------|
| call.hangup() | End the call |
| call.forward(to, opts?) | Transfer to another number |
| call.sendDTMF(digits) | Send DTMF tones (e.g. "1234#") |
| call.hold() | Put on hold (plays hold music, mutes mic) |
| call.unhold() | Resume from hold |
| call.mute() | Mute mic (transcripts buffered) |
| call.unmute() | Unmute (emits call.unmuted with buffered transcript) |
Mid-Call Configuration
| Method | Description |
|--------|-------------|
| call.configure(opts) | Change voice, STT, language — takes effect immediately |
| call.setPrompt(text) | Replace the system prompt for this call |
| call.setPromptVars(vars) | Set {{variable}} values in the prompt template |
| call.addContext(text) | Append extra context after the system prompt |
| call.setPromptFile(path) | Load a prompt file and set it |
Conversation History
| Method | Description |
|--------|-------------|
| call.getHistory() | Fetch conversation messages (OpenAI format) |
| call.addHistory(msgs) | Inject messages into history (e.g. CRM context) |
| call.setHistory(msgs) | Replace entire conversation history |
| call.clearHistory() | Clear history (system prompt preserved) |
Properties
call.id // "CA7ec979f5..." — unique call ID
call.from // "+13186330963" or "sip:..."
call.to // destination number/URI
call.direction // "inbound" | "outbound"
call.transport // "phone" | "webrtc" | "unknown"
call.metadata // custom metadata from the channel
call.transcript // [{ role: "user", content: "..." }, ...] — user + assistant only
call.messages // full LLM history (populated on call.ended)
call.duration // seconds (populated on call.ended)
call.startedAt // epoch seconds
call.endedAt // epoch seconds
call.reason // "hangup" | "timeout" | ...
ReplyStream
Token-by-token streaming for LLM responses. TTS starts as soon as a sentence boundary is detected.
const stream = call.replyStream(turn);
for await (const token of llm.stream(prompt)) {
if (stream.aborted) break; // user interrupted
stream.write(token);
}
stream.end();
Events
Agent Events
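Subscribe via agent.on(event, handler). Most call-scoped events pass call as the last argument; call.started, call.ended, and llm.tool_call pass it first, and WhatsApp events carry no call at all (see the signatures below).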
| Event | Signature | When |
|-------|-----------|------|
| Lifecycle | | |
| call.started | (call) | New call connected |
| call.ended | (call, reason) | Call disconnected |
| User speech | | |
| speech.started | (event, call) | User began speaking (VAD) |
| speech.ended | (event, call) | User stopped speaking (VAD) |
| user.speaking | (event, call) | Interim STT transcript (updates live) |
| user.message | (event, call) | Final confirmed user text |
| Turns | | |
| eager.turn | (turn, call) | Early turn signal (low-latency response) |
| turn.end | (turn, call) | Final turn signal |
| turn.continued | (event, call) | User kept talking (auto-aborts active streams) |
| Bot speech | | |
| bot.speaking | (event, call) | Bot started speaking a message |
| bot.word | (event, call) | Individual word as TTS plays it |
| bot.finished | (event, call) | Bot finished speaking a message |
| bot.interrupted | (event, call) | Bot was cut off by user |
| Protocol | | |
| message.confirmed | (event, call) | Server acknowledged bot message |
| llm.tool_call | (call, data) | Server-side LLM requests a tool call |
| session.idle_warning | (event, call) | Idle warning — user hasn't spoken, call will timeout soon |
| session.timeout | (event, call) | Session timeout warning (max duration / idle) |
| WhatsApp | | |
| whatsapp.session_started | (event) | New WhatsApp conversation started |
| whatsapp.message | (event) | Incoming WhatsApp message received |
| whatsapp.response | (event) | Agent sent a WhatsApp response |
| whatsapp.status | (event) | Message delivery status (sent/delivered/read) |
Real-Time Transcript Flow
User speaks → speech.started
→ user.speaking (interim, fires multiple times)
→ speech.ended
→ user.message (final confirmed text)
→ eager.turn / turn.end
Bot responds → bot.speaking (message ID assigned)
→ bot.word (word-by-word as TTS plays)
→ bot.finished (done speaking)
Interruption → bot.interrupted
→ turn.continued (active ReplyStreams auto-aborted)
bot.word Event
Build live transcripts word-by-word:
let currentMessage = "";
agent.on("bot.speaking", () => { currentMessage = ""; });
agent.on("bot.word", (event) => {
currentMessage += event.word + " ";
process.stdout.write(`\r🤖 ${currentMessage}`);
});
agent.on("bot.finished", () => console.log());
Hot-Reload: Live Configuration
Everything is hot-reloadable. Voice, language, STT, prompt, tools — all can change during an active call. The server applies changes on the next LLM turn.
Configuration Scopes
| Scope | Method | Affects |
|-------|--------|---------|
| Agent defaults | pc.agent("id", config) | All future calls |
| Agent hot-reload | agent.configure(updates) | Updates defaults, future calls |
| Session (mid-call) | call.configure(opts) | This call only |
| Prompt (mid-call) | call.setPrompt(text) | This call's system prompt |
| Template vars | call.setPromptVars(vars) | This call's {{var}} values |
| Context | call.addContext(text) | Appended after prompt |
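One way to picture how these scopes combine is as layers merged in order, with later layers winning. The sketch below is an illustration under that assumption, not the server's actual logic: `VoiceSettings` and `mergeLayers` are hypothetical names, and the precedence (agent defaults, then channel overrides, then session overrides) is inferred from the descriptions above.

```typescript
// Hypothetical subset of the settings that can be overridden per scope.
interface VoiceSettings {
  voice?: string;
  language?: string;
  stt?: string;
}

// Merge layers left to right; later layers win, undefined values are ignored.
function mergeLayers(...layers: VoiceSettings[]): VoiceSettings {
  const merged: VoiceSettings = {};
  for (const layer of layers) {
    for (const key of Object.keys(layer) as (keyof VoiceSettings)[]) {
      const value = layer[key];
      if (value !== undefined) merged[key] = value;
    }
  }
  return merged;
}

const effective = mergeLayers(
  { voice: "elevenlabs:abc", language: "en", stt: "deepgram-flux" }, // agent defaults
  { voice: "elevenlabs:spanishVoiceId", language: "es" },            // channel override
  { voice: "cartesia:newVoice" },                                    // call.configure()
);
// effective: { voice: "cartesia:newVoice", language: "es", stt: "deepgram-flux" }
```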
Prompt Template Variables
Define a prompt with {{placeholders}}. The server resolves them before each LLM request. Built-in variables: {{date}}, {{time}}.
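To make the substitution concrete, here is a small local sketch of how `{{name}}` placeholders could be resolved. It is an illustration only: `renderPrompt` is a hypothetical function, and leaving unknown placeholders untouched is an assumption, since the server's behaviour for missing variables is not documented here.

```typescript
// Replace each {{name}} with vars[name]; unknown placeholders are left as-is (assumption).
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (placeholder: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : undefined;
    return value ?? placeholder;
  });
}

const rendered = renderPrompt(
  "You are {{agent_name}} at {{company}}. Today is {{date}}.",
  { agent_name: "Nova", company: "Acme Corp" },
);
// rendered: "You are Nova at Acme Corp. Today is {{date}}."
```

In the SDK itself you only supply the values; the example that follows does this with call.setPromptVars().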
const agent = pc.agent("support", {
llm: {
engine: "openai",
model: "gpt-4.1-mini",
enabled: true,
prompt: `You are {{agent_name}}, support agent at {{company}}.
Today is {{date}}, {{time}}.
Customer: {{customer_name}} ({{tier}} tier).`,
},
});
agent.on("call.started", async (call) => {
const customer = await lookupCaller(call.from);
await call.setPromptVars({
agent_name: "Nova",
company: "Acme Corp",
customer_name: customer.name,
tier: customer.tier,
});
call.say(`Hi ${customer.name}! How can I help?`);
});
Adding Context Mid-Call
Append dynamic context without replacing the prompt:
agent.on("call.started", async (call) => {
const orders = await getRecentOrders(call.from);
await call.addContext(
`Recent orders:\n${orders.map(o => `- ${o.id}: ${o.status}`).join("\n")}`
);
});
Switching Voice or Language Mid-Call
// User asks for Spanish
call.configure({ voice: "elevenlabs:spanishVoiceId", language: "es" });
call.reply("¡Claro! Ahora hablo en español.");
Configuration Shortcuts
Voice and STT accept string shortcuts or full config objects:
// Shortcuts
{ voice: "elevenlabs:voiceId" }
{ stt: "deepgram-flux" }
{ stt: "deepgram:nova-3:fr" } // provider:model:language
// Full config objects
{
voice: { engine: "cartesia", voiceId: "abc", speed: 1.1 },
stt: { engine: "deepgram", model: "nova-3", language: "fr" },
}
Note: Turn detection and VAD are auto-derived from the STT provider. deepgram-flux → native turn detection + native VAD. All others → smart_turn + silero VAD.
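The shortcut strings follow a colon-separated pattern, which can be sketched as a small parser. This is an illustrative reading of the examples above, not the SDK's own parser: `parseSttShortcut` and `parseVoiceShortcut` are hypothetical names, and the output field names are placeholders (this README uses both `engine`/`voiceId` and `provider`/`voice_id` in different sections).

```typescript
interface SttShortcut {
  provider: string;
  model?: string;
  language?: string;
}

// "provider", "provider:model" or "provider:model:language"
function parseSttShortcut(shortcut: string): SttShortcut {
  const [provider, model, language] = shortcut.split(":");
  if (!provider) throw new Error(`Invalid STT shortcut: "${shortcut}"`);
  const parsed: SttShortcut = { provider };
  if (model) parsed.model = model;
  if (language) parsed.language = language;
  return parsed;
}

// "provider:voiceId"; split on the first colon only, in case the ID contains one.
function parseVoiceShortcut(shortcut: string): { provider: string; voiceId: string } {
  const idx = shortcut.indexOf(":");
  if (idx <= 0 || idx === shortcut.length - 1) {
    throw new Error(`Invalid voice shortcut: "${shortcut}"`);
  }
  return { provider: shortcut.slice(0, idx), voiceId: shortcut.slice(idx + 1) };
}

const frenchNova = parseSttShortcut("deepgram:nova-3:fr");
// frenchNova: { provider: "deepgram", model: "nova-3", language: "fr" }
```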
REST API
Static helpers for the Pinecall management API. No WebSocket connection needed.
fetchVoices(opts?)
List available TTS voices. Filter by provider and language.
import { fetchVoices } from "@pinecall/sdk";
// All ElevenLabs voices
const voices = await fetchVoices({ provider: "elevenlabs" });
// Spanish Cartesia voices only
const esVoices = await fetchVoices({ provider: "cartesia", language: "es" });
voices.forEach(v => console.log(`${v.name} (${v.provider}:${v.id})`));
// → "Rachel (elevenlabs:21m00Tcm4TlvDq8ikWAM)"
Returns: Voice[]. Each voice has id, name, provider, gender, style, languages[], preview_url.
fetchPhones(opts)
List phone numbers on your Pinecall account.
import { fetchPhones } from "@pinecall/sdk";
const phones = await fetchPhones({ apiKey: "pk_..." });
phones.forEach(p => console.log(`${p.name} → ${p.number}`));
// → "(318) 633-0963 → +13186330963"
Returns: Phone[]. Each phone has number (E.164), name, sid, isSdk.
createToken(opts)
Generate a short-lived, single-use token for browser WebRTC or Chat connections. Requires API key — call this from your backend.
import { createToken } from "@pinecall/sdk";
// From your backend endpoint (API key stays server-side)
const token = await createToken({
channel: "webrtc", // "webrtc" or "chat"
agentId: "florencia",
apiKey: process.env.PINECALL_API_KEY!,
});
// Or via instance methods:
const pcToken = await pc.createToken("webrtc", "florencia");
const agentToken = await agent.createToken("webrtc");
Returns: { token: string, server: string, expires_in: number }.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| channel | "webrtc" \| "chat" | ✅ | Token type |
| agentId | string | ✅ | Agent slug (wire ID) |
| apiKey | string | ✅ | API key for authentication |
| apiUrl | string | — | Custom server URL |
See Security for the full token security model.
fetchWebRTCToken(opts) (deprecated)
⚠️ Deprecated. Use createToken() instead. fetchWebRTCToken only works when the agent has allowedOrigins configured.
Legacy helper — fetches a token from the public endpoint (requires allowedOrigins on the agent).
import { fetchWebRTCToken } from "@pinecall/sdk";
const { token, server } = await fetchWebRTCToken({
agentId: "my-agent",
apiKey: "pk_...", // optional: authenticates the request
});
Returns: { token: string, server?: string }.
fetchTwilioBalance(opts?)
Check your Twilio account balance.
import { fetchTwilioBalance } from "@pinecall/sdk";
const balance = await fetchTwilioBalance({ apiKey: "pk_..." });
if (balance) console.log(`$${balance.balance} ${balance.currency}`);
Returns: { balance: string, currency: string } | null.
Options
All REST helpers accept an apiUrl option to point to a custom server:
fetchVoices({ apiUrl: "http://localhost:1337" });
fetchPhones({ apiKey: "pk_...", apiUrl: "http://localhost:1337" });
SSE Streaming
Stream real-time agent events over HTTP using Server-Sent Events. Works with any framework — returns a Web API Response or writes to a Node.js ServerResponse.
WebRTC vs SSE: If your frontend uses @pinecall/voice-widget or @pinecall/voice-core, events already arrive through the WebRTC DataChannel, so you don't need SSE. SSE is for server-side dashboards, monitoring UIs, or backends that need to observe calls without being in the WebRTC session.
Single Agent Stream
// Web API (Remix, Next.js, Hono, Bun)
app.get("/events", () => agent.stream());
// Express / Node.js
app.get("/events", (req, res) => agent.stream(res));
Multi-Agent Stream
Stream events from all agents via pc.stream(), or filter to specific ones:
// All agents
app.get("/events", () => pc.stream());
// Filtered to specific agents
app.get("/events", () => pc.stream({ agents: ["mara", "julia"] }));
// Express
app.get("/events", (req, res) => pc.stream(res));
app.get("/events", (req, res) => pc.stream(res, { agents: ["mara"] }));
Filtering — Multi-Tenant Example
The agents filter lets you build per-user dashboards where each user only sees their own agents:
// Each user owns specific agents
const userAgents = {
"user_1": ["mara", "julia"],
"user_2": ["nova", "receptionist"],
};
// User-scoped SSE endpoint
app.get("/api/events", (req, res) => {
const userId = req.auth.userId; // from your auth middleware
const allowed = userAgents[userId] || [];
// Only streams events from agents this user owns
pc.stream(res, { agents: allowed });
});
The filter works by subscribing only to the specified agents' event emitters, so events from other agents never reach the stream. This is purely server-side filtering, so there's no data leakage.
Browser A (user_1)                      Browser B (user_2)
    │                                       │
    └── EventSource("/api/events") ──► SSE: mara, julia events only
                                            │
                                            └── EventSource("/api/events") ──► SSE: nova, receptionist only
Streamed Events
Each SSE message has an event: field and a JSON data: body with agent ID:
| Event | Data Fields | When |
|-------|------------|------|
| connected | agent or agents | Stream established |
| call.started | callId, from, to, direction, transport | Call begins |
| call.ended | callId, reason, duration | Call ends |
| user.speaking | callId, text | Interim STT transcript |
| user.message | callId, text, messageId | Final user text |
| turn.end | callId, text, probability | User turn ended |
| turn.pause | callId, probability | Turn pause detected |
| speech.started | callId | User began speaking |
| speech.ended | callId | User stopped speaking |
| bot.speaking | callId, messageId, text | Bot started speaking |
| bot.word | callId, messageId, word | Word-by-word playback |
| bot.finished | callId, messageId | Bot done speaking |
| bot.interrupted | callId, messageId | Bot cut off by user |
Wire format:
event: user.message
data: {"callId":"CA123","text":"Hello","messageId":"msg_abc","agent":"mara"}
event: bot.speaking
data: {"callId":"CA123","messageId":"msg_def","text":"Hi!","agent":"mara"}
A :ping comment is sent every 30s as keepalive.
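The wire format above is standard Server-Sent Events framing, so it can be produced and read with a few lines of code. The sketch below is a simplified illustration rather than the SDK's implementation: `formatSseMessage` and `parseSseMessage` are hypothetical names, and the parser only handles the `event:` and `data:` fields plus comment lines such as `:ping`.

```typescript
// Serialize one event as an SSE frame: "event:" line, "data:" line, blank line.
function formatSseMessage(event: string, data: Record<string, unknown>): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Parse a single frame; returns null for frames without data (e.g. a ":ping" comment).
function parseSseMessage(frame: string): { event: string; data: unknown } | null {
  let event = "message"; // SSE default when no "event:" field is present
  const dataLines: string[] = [];
  for (const line of frame.split("\n")) {
    if (line === "" || line.startsWith(":")) continue; // blank line or comment
    const sep = line.indexOf(":");
    const field = sep === -1 ? line : line.slice(0, sep);
    let value = sep === -1 ? "" : line.slice(sep + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one optional leading space
    if (field === "event") event = value;
    else if (field === "data") dataLines.push(value);
  }
  if (dataLines.length === 0) return null;
  return { event, data: JSON.parse(dataLines.join("\n")) };
}

const frame = formatSseMessage("user.message", {
  callId: "CA123", text: "Hello", messageId: "msg_abc", agent: "mara",
});
const parsed = parseSseMessage(frame);
```

In a browser, EventSource does this parsing for you; a hand-written parser is only needed when consuming the stream from a plain HTTP client.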
Client Example
const source = new EventSource("/api/events");
source.addEventListener("call.started", (e) => {
const { agent, from, transport } = JSON.parse(e.data);
console.log(`📞 [${agent}] Call from ${from} via ${transport}`);
});
source.addEventListener("user.message", (e) => {
const { agent, text } = JSON.parse(e.data);
console.log(`[${agent}] User: ${text}`);
});
source.addEventListener("bot.speaking", (e) => {
const { agent, text } = JSON.parse(e.data);
console.log(`[${agent}] Bot: ${text}`);
});
WhatsApp
WhatsApp is a text-based channel with no STT/TTS/VAD pipeline. Messages route directly to the server-side LLM. The agent receives text, generates a response, and sends it back as a WhatsApp message.
Requires server-side LLM. WhatsApp channels use the same llm config as voice channels. Client-side LLM (bring your own) is not supported for WhatsApp.
WhatsApp Setup
- Create a Meta Business App at developers.facebook.com
- Add the WhatsApp product to your app
- Get your credentials from the API Setup page:
  - Phone Number ID: numeric string (e.g. 123456789012345)
  - Permanent Access Token: generate a system user token with whatsapp_business_messaging permission
  - App Secret: from App Settings → Basic (for webhook signature verification)
- Configure the webhook URL in your Meta app: https://voice.pinecall.io/whatsapp/webhook
  Verification token: set to match your verifyToken (default: pinecall-wa-verify)
- Subscribe to messages: check messages in the webhook fields
WhatsApp Usage
import { Pinecall } from "@pinecall/sdk";
const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();
const agent = pc.agent("support", {
language: "en",
llm: {
engine: "openai",
model: "gpt-4.1-mini",
enabled: true,
prompt: "You are a helpful support agent on WhatsApp. Be concise.",
},
tools: [
{
type: "function",
function: {
name: "lookupOrder",
description: "Look up an order by ID",
parameters: {
type: "object",
properties: { orderId: { type: "string" } },
required: ["orderId"],
},
},
},
],
});
// Register WhatsApp channel
agent.addChannel("whatsapp", {
phoneNumberId: "123456789012345", // From Meta API Setup
accessToken: process.env.WA_TOKEN!, // Permanent Graph API token
verifyToken: "my-verify-token", // Must match Meta webhook config
appSecret: process.env.WA_APP_SECRET!, // HMAC verification (recommended)
});
// Also register voice channels on the same agent
agent.addChannel("phone", "+13186330963");
agent.addChannel("webrtc");
// Voice greeting (WhatsApp doesn't use this)
agent.on("call.started", (call) => call.say("Hello!"));
// WhatsApp events
agent.on("whatsapp.session_started", (event) => {
console.log(`💬 New WhatsApp chat: ${event.contact_name} (${event.contact_phone})`);
});
agent.on("whatsapp.message", (event) => {
console.log(`📩 ${event.name}: ${event.text}`);
});
agent.on("whatsapp.status", (event) => {
console.log(`✓ ${event.status} → ${event.recipient}`);
});
// Handle tool calls (works for both voice AND WhatsApp)
agent.on("llm.tool_call", async (call, data) => {
if (!data.tool_calls) return; // skip re-emissions, as in the Quick Start handler
const results = [];
for (const tc of data.tool_calls) {
const args = JSON.parse(tc.arguments);
const result = await myToolHandler(tc.name, args);
results.push({ tool_call_id: tc.id, result });
}
agent.send({ event: "llm.tool_result", call_id: call.id, msg_id: data.msg_id, results });
});
Multi-channel agent: The same agent can handle voice calls AND WhatsApp messages simultaneously. The LLM config, tools, and prompt are shared; only the transport differs.
WhatsApp Events
| Event | Data Fields | When |
|-------|------------|------|
| whatsapp.session_started | session_id, contact_phone, contact_name | First message from a new contact |
| whatsapp.message | session_id, from, name, type, text, message_id | Incoming message received |
| whatsapp.response | session_id, to, text | Agent sent a response |
| whatsapp.status | status, recipient, message_id | Delivery status update |
Status values: sent → delivered → read
WhatsAppChannelConfig
import type { WhatsAppChannelConfig } from "@pinecall/sdk";
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| phoneNumberId | string | ✅ | Meta Phone Number ID from API Setup |
| accessToken | string | ✅ | Permanent Graph API access token |
| verifyToken | string | — | Webhook verification token (default: pinecall-wa-verify) |
| appSecret | string | — | Meta App Secret for HMAC signature verification |
Voice Notes
When a user sends a voice note on WhatsApp, the server automatically:
- Downloads the audio (OGG/Opus format) via the Cloud API
- Transcribes it using Deepgram Nova-3
- Feeds the transcript to the LLM as text
The agent sees voice notes as regular text messages — no special handling needed.
Requires DEEPGRAM_API_KEY environment variable on the voice server.
24h Service Window
Meta enforces a 24-hour service window for free-form messaging:
- Inside window: The agent can send any text message. Window refreshes on each inbound message.
- Outside window: Only pre-approved template messages can be sent.
The SDK tracks this automatically. If the window is closed, the server logs a warning. Template message support is planned for a future release.
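If you want to mirror the window check in your own code (for example, to decide whether to queue a message), it comes down to a timestamp comparison. The sketch below is an assumption-laden illustration: `isServiceWindowOpen` is a hypothetical helper, timestamps are epoch milliseconds, and treating the window as closed at exactly 24 hours is a choice made here, not something this README specifies.

```typescript
const SERVICE_WINDOW_MS = 24 * 60 * 60 * 1000;

// True when the last inbound message arrived less than 24 hours before `nowMs`.
// `lastInboundAtMs` is null when the contact has never written to the agent.
function isServiceWindowOpen(lastInboundAtMs: number | null, nowMs: number): boolean {
  if (lastInboundAtMs === null) return false;
  return nowMs - lastInboundAtMs < SERVICE_WINDOW_MS;
}

const lastInbound = Date.UTC(2025, 0, 1, 12, 0, 0);
const oneHourLater = lastInbound + 60 * 60 * 1000;
const canReplyFreely = isServiceWindowOpen(lastInbound, oneHourLater); // true
```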
Environment Variables
Set these on the voice server (sdk-server):
| Variable | Required | Description |
|----------|----------|-------------|
| WHATSAPP_VERIFY_TOKEN | No | Hub verification token (default: pinecall-wa-verify) |
| WHATSAPP_APP_SECRET | No | Meta App Secret for webhook HMAC verification |
| DEEPGRAM_API_KEY | For voice notes | Required if you want audio message transcription |
Configuration Reference
STT Providers
Deepgram Flux (recommended)
Best for real-time voice agents. Turn detection and VAD are auto-derived — no configuration needed.
stt: {
provider: "deepgram-flux",
keyterms: ["pinecall"], // boost recognition for specific terms
eot_threshold: 0.5, // end-of-turn sensitivity (0-1)
eager_eot_threshold: 0.7, // eager turn threshold
eot_timeout_ms: 2000,
}
// Shortcut: "deepgram-flux"
Auto-derived: Flux → native turn detection + native VAD. No need to specify turnDetection.
Deepgram Nova
Classic STT — turn detection and VAD auto-derived (smart_turn + silero).
stt: {
provider: "deepgram",
model: "nova-3",
language: "en",
interim_results: true,
smart_format: true,
punctuate: true,
profanity_filter: false,
endpointing_ms: 300,
utterance_end_ms: 1000,
keywords: ["pinecall"],
}
// Shortcut: "deepgram" or "deepgram:nova-3" or "deepgram:nova-3:es"
Gladia
stt: {
provider: "gladia",
model: "accurate",
language: "en",
endpointing: 300,
speech_threshold: 0.8,
code_switching: false,
audio_enhancer: true,
}
// Shortcut: "gladia"
AWS Transcribe
stt: { provider: "transcribe", language: "en-US" }
// Shortcut: "transcribe"
TTS Providers
ElevenLabs
voice: {
provider: "elevenlabs",
voice_id: "JBFqnCBsd6RMkjVDRZzb",
model: "eleven_turbo_v2_5",
speed: 1.0,
stability: 0.5,
similarity_boost: 0.75,
style: 0,
use_speaker_boost: true,
}
// Shortcut: "elevenlabs:JBFqnCBsd6RMkjVDRZzb"
Cartesia
voice: {
provider: "cartesia",
voice_id: "a0e99841-438c-4a64-b679-ae501e7d6091",
model: "sonic",
speed: 1.0,
volume: 1.0,
emotion: null,
language: "en",
}
// Shortcut: "cartesia:a0e99841-438c-4a64-b679-ae501e7d6091"
AWS Polly
voice: {
provider: "polly",
voice_id: "Joanna",
engine: "neural",
language: "en-US",
}
// Shortcut: "polly:Joanna"
LLM Providers
OpenAI
llm: {
engine: "openai",
model: "gpt-4.1-mini", // or "gpt-4.1", "gpt-4.1-nano"
enabled: true,
prompt: "System prompt here.",
temperature: 0.7,
max_tokens: 1024,
}
Mistral
llm: {
engine: "mistral",
model: "mistral-medium",
enabled: true,
prompt: "System prompt here.",
}
LLM shortcut: llm: "openai:gpt-4.1-mini" expands to { engine: "openai", model: "gpt-4.1-mini", enabled: true }.
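The expansion described above can be sketched in a few lines. This mirrors only the documented mapping: `expandLlmShortcut` and the `LlmConfig` interface are hypothetical names for illustration, and the SDK's real normalization may accept more forms.

```typescript
interface LlmConfig {
  engine: string;
  model: string;
  enabled: boolean;
  prompt?: string;
}

// Accept either a full config object or an "engine:model" shortcut string.
function expandLlmShortcut(llm: string | LlmConfig): LlmConfig {
  if (typeof llm !== "string") return llm;
  const idx = llm.indexOf(":");
  if (idx <= 0 || idx === llm.length - 1) {
    throw new Error(`Expected "engine:model", got "${llm}"`);
  }
  return { engine: llm.slice(0, idx), model: llm.slice(idx + 1), enabled: true };
}

const expanded = expandLlmShortcut("openai:gpt-4.1-mini");
// expanded: { engine: "openai", model: "gpt-4.1-mini", enabled: true }
```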
Session Limits
Calls have built-in safety limits to prevent runaway sessions. The server enforces these defaults:
| Setting | Default | Description |
|---------|---------|-------------|
| max_duration_seconds | 600 (10 min) | Hard cap on total call length. Call is terminated after this time regardless of activity. |
| idle_timeout_seconds | 60 | Auto-hangup after this many seconds of no user speech. |
| idle_warning_seconds | 15 | Emit session.idle_warning event this many seconds before idle timeout. Use it to prompt the user or change the UI. 0 = no warning. |
| idle_grace_seconds | 10 | After idle timeout fires, the agent gets this many seconds to prompt the user before force-hangup. |
Override per-agent:
const agent = pc.agent("receptionist", {
voice: "elevenlabs:abc",
stt: "deepgram-flux",
llm: { engine: "openai", model: "gpt-4.1-mini", enabled: true, prompt: "..." },
session_limits: {
max_duration_seconds: 1800, // 30 minutes
idle_timeout_seconds: 120, // 2 minutes of silence
idle_warning_seconds: 30, // warn 30s before timeout
idle_grace_seconds: 15,
},
});
Disable limits (not recommended):
session_limits: {
max_duration_seconds: 0, // 0 = unlimited
idle_timeout_seconds: 0, // 0 = disabled
}
How it works:
- The server starts two watchdog tasks when a call begins.
- _watchdog_max_duration fires after max_duration_seconds, emits session.timeout, then hangs up.
- _watchdog_idle tracks _last_user_activity. When the user hasn't spoken for idle_timeout_seconds, it emits session.timeout with a grace period.
- The session.timeout event fires before the actual hangup, giving you a chance to warn the user:
agent.on("session.idle_warning", (event, call) => {
// event.remaining_seconds: seconds until timeout
// event.idle_timeout_seconds: the configured idle timeout
call.say("Are you still there?");
});
agent.on("session.timeout", (event, call) => {
// event.reason: "max_duration" | "idle_timeout"
call.say("Goodbye! The call is ending due to inactivity.");
});
Timeline:
[silence starts] ──── idle_warning fires ──── idle_timeout fires ──── hangup
0s                    (timeout - warning)s    timeout s
Note: Bot speech (e.g. "Are you still there?") pauses the idle counter but does not reset it. Only real user speech resets the timer. This prevents infinite warning loops.
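To make the timeline concrete, the sketch below computes when each idle event would fire for a given configuration. It is illustrative only: idleTimeline is a hypothetical helper, and placing the forced hangup at timeout + grace is an assumption based on the idle_grace_seconds description in the table above.

```typescript
// Hypothetical helper, not part of @pinecall/sdk.
interface IdleLimits {
  idle_timeout_seconds: number;
  idle_warning_seconds: number;
  idle_grace_seconds: number;
}

interface IdleTimeline {
  warningAt: number | null; // seconds of silence before session.idle_warning
  timeoutAt: number | null; // seconds of silence before session.timeout
  hangupAt: number | null;  // seconds of silence before the forced hangup
}

function idleTimeline(limits: IdleLimits): IdleTimeline {
  const timeout = limits.idle_timeout_seconds;
  // 0 disables the idle watchdog entirely.
  if (timeout === 0) return { warningAt: null, timeoutAt: null, hangupAt: null };
  // 0 means no warning; otherwise warn that many seconds before the timeout.
  const warningAt =
    limits.idle_warning_seconds > 0
      ? Math.max(0, timeout - limits.idle_warning_seconds)
      : null;
  return { warningAt, timeoutAt: timeout, hangupAt: timeout + limits.idle_grace_seconds };
}
```

With the server defaults (60 / 15 / 10) this gives a warning at 45 s of silence, session.timeout at 60 s, and the hangup at 70 s.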
WebRTC widget integration: The @pinecall/voice-widget automatically responds to session.idle_warning by switching the orb to a blinking amber state (.idle-warning CSS class, configurable via colorWarning theme prop). On session.timeout, the widget auto-disconnects.
Interruption
Controls whether users can interrupt the bot mid-speech.
interruption: {
enabled: true,
energy_threshold_db: -40, // min energy to trigger interrupt
min_duration_ms: 200, // min speech duration to trigger
}
// Shortcut: false (disables interruption entirely)
Analysis & Audio Metrics
Real-time audio metrics for waveform visualization and energy monitoring.
config: {
analysis: {
send_audio_metrics: true,
audio_metrics_interval_ms: 100,
send_turn_audio: false,
send_bot_audio: false,
}
}
audio.metrics Event
Emitted per interval — one for user (mic) and one for bot (TTS):
agent.on("audio.metrics", (evt, call) => {
// evt.source: "user" | "bot"
// evt.energy_db: -60 to 0 (higher = louder)
// evt.rms: 0 to 1 (normalized amplitude)
// evt.peak: 0 to 1
// evt.is_speech: boolean (VAD state)
// evt.vad_prob: 0 to 1
});
| Field | Type | Description |
|-------|------|-------------|
| source | "user" \| "bot" | Audio source |
| energy_db | number | Energy in decibels (-60 to 0) |
| rms | number | Root mean square amplitude (0–1) |
| peak | number | Peak amplitude (0–1) |
| is_speech | boolean | VAD speech detection state |
| vad_prob | number | VAD probability (0–1) |
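For intuition about how rms, peak, and energy_db relate, here is a sketch that derives them from one frame of normalized samples. The server's exact computation is not documented here; frameMetrics is a hypothetical helper, and the 20·log10(rms) conversion with a -60 dB floor is an assumption chosen to match the ranges in the table.

```typescript
// Hypothetical helper, not part of @pinecall/sdk.
interface FrameMetrics {
  rms: number;       // 0 to 1
  peak: number;      // 0 to 1
  energy_db: number; // floorDb to 0
}

function frameMetrics(samples: Float32Array, floorDb: number = -60): FrameMetrics {
  let sumSquares = 0;
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i];
    sumSquares += s * s;
    peak = Math.max(peak, Math.abs(s));
  }
  const rms = samples.length > 0 ? Math.sqrt(sumSquares / samples.length) : 0;
  // Convert amplitude to decibels relative to full scale, then clamp.
  const db = rms > 0 ? 20 * Math.log10(rms) : floorDb;
  const energy_db = Math.min(0, Math.max(floorDb, db));
  return { rms, peak, energy_db };
}
```

A frame at half of full scale (rms 0.5) lands at roughly -6 dB, and silence sits at the floor, which is why a waveform UI can map energy_db linearly onto a bar height.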
Multi-Environment
Run dev, staging, and production agents simultaneously on the same voice server, sharing the same phone numbers. No extra Twilio costs. Each developer gets their own isolated agent instance.
How It Works
The SDK reads PINECALL_MODE from the environment and prefixes agent IDs automatically:
| PINECALL_MODE | Wire slug | Notes |
|-----------------|-----------|-------|
| (empty/unset) | florencia | Production — all callers |
| dev | dev-berna-florencia | Dev — includes developer ID for isolation |
| staging | staging-florencia | Staging — shared environment, no dev ID |
The server routes phone calls based on the caller's phone number:
       Incoming call to +13186330963
                     │
            ┌────────┴────────┐
            │                 │
        Caller in       Caller NOT in
       DEV_CALLERS       DEV_CALLERS
            │                 │
  ┌─────────┴─────────┐  ┌────┴──────┐
  │ dev-berna-        │  │           │
  │ florencia         │  │ florencia │
  │ (your dev agent)  │  │ (prod)    │
  └───────────────────┘  └───────────┘
Dev and prod coexist on the same phone number. The server's caller-based routing handles the split.
Setup
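Set PINECALL_MODE before @pinecall/sdk is loaded. The SDK reads it at initialization time. In an ES module, static import statements are hoisted and run before any other code in the file, so load the SDK with a dynamic import() after setting the variable (or set the variable in the shell).
// agent/index.js: set mode before the SDK is loaded
const ENV = process.env.NODE_ENV || "production";
if (ENV === "development") process.env.PINECALL_MODE = "dev";
else if (ENV === "staging") process.env.PINECALL_MODE = "staging";
// A static import would be hoisted above the assignments, so import dynamically
const { Pinecall } = await import("@pinecall/sdk");
const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY });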
await pc.connect();
const agent = pc.deploy("florencia", { /* config */ });
// In dev: registers as "dev-berna-florencia"
// In prod: registers as "florencia"
// Configure caller-based routing for dev/staging
if (pc.mode) {
const callers = process.env.DEV_CALLERS;
if (callers) {
agent.send({
event: "dev.config",
callers: callers.split(",").map(s => s.trim()),
});
}
}
Each developer creates a .env.local file (gitignored) with their personal config:
# .env.local — each developer sets their own
PINECALL_DEV_ID=berna
DEV_CALLERS=+34607827824
Multi-Developer Isolation
In dev mode, the SDK includes a developer identity in the agent slug to prevent collisions:
dev-{PINECALL_DEV_ID}-{agentName}
The developer ID is resolved in order:
- PINECALL_DEV_ID environment variable
- OS username (automatic fallback)
This means multiple developers can run the same agent simultaneously without interfering:
| Developer | .env.local | Wire Slug | Phone Routing |
|-----------|-------------|-----------|---------------|
| Berna | PINECALL_DEV_ID=berna | dev-berna-florencia | Calls from +34607... → Berna's agent |
| Juan | PINECALL_DEV_ID=juan | dev-juan-florencia | Calls from +34612... → Juan's agent |
| Production | (none) | florencia | All other callers |
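The slug rules from the tables above can be summarized in a few lines. This is a sketch of the documented behavior, not the SDK's implementation; wireSlug and its parameters are hypothetical names.

```typescript
// Hypothetical helper, not part of @pinecall/sdk.
type PinecallMode = "" | "dev" | "staging";

function wireSlug(
  agentName: string,
  mode: PinecallMode,
  devId?: string,      // value of PINECALL_DEV_ID, if set
  osUsername?: string, // fallback identity
): string {
  if (mode === "dev") {
    // PINECALL_DEV_ID wins; the OS username is the automatic fallback.
    const id = devId || osUsername;
    if (!id) throw new Error("dev mode needs a developer ID");
    return `dev-${id}-${agentName}`;
  }
  // Staging is shared, so it carries no developer ID.
  if (mode === "staging") return `staging-${agentName}`;
  // Empty mode means production: the name is used as-is.
  return agentName;
}
```

Because the developer ID is part of the slug, two developers running the same agent name in dev mode never register under the same identifier.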
Phone Routing
The voice server supports caller-based routing for non-production agents:
- Production agent registers +13186330963 → stored in the main phone map
- Dev agent registers the same number → stored in the dev override map
- On incoming call:
  - If the caller is in _dev_allowed_callers → routes to the dev agent
  - Otherwise → routes to the production agent
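The lookup order described above can be modeled with two maps. This is a simplified sketch of the idea, not the server's code: PhoneRoutes and routeCall are hypothetical, and keying the override map by dialed number and then by caller is an assumption.

```typescript
// Hypothetical model of the server-side routing tables.
interface PhoneRoutes {
  // dialed number -> production agent slug
  main: Map<string, string>;
  // dialed number -> (allowed caller -> dev agent slug)
  devOverrides: Map<string, Map<string, string>>;
}

function routeCall(routes: PhoneRoutes, dialed: string, caller: string): string | undefined {
  // A dev override only applies when this exact caller is allow-listed.
  const devSlug = routes.devOverrides.get(dialed)?.get(caller);
  // Everyone else falls through to the production agent.
  return devSlug ?? routes.main.get(dialed);
}
```

The production map is never modified by dev registrations, so an unlisted caller always reaches production.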
To set your dev callers, send a dev.config event after connecting:
if (pc.mode) {
agent.send({
event: "dev.config",
callers: ["+34607827824"], // your phone number(s)
});
}
Multi-Developer Strategies
When multiple developers work on the same agent, there are two approaches for phone testing:
Option A: Shared number + caller override (recommended)
All developers share the same Twilio number. Each developer configures their personal phone number in DEV_CALLERS. The server routes based on who's calling:
+13186330963 (shared Twilio number)
│
├── Call from +34607... → dev-berna-florencia
├── Call from +34612... → dev-juan-florencia
├── Call from +34699... → dev-flor-florencia
└── Call from anyone else → florencia (production)
# Berna's .env.local
PINECALL_DEV_ID=berna
DEV_CALLERS=+34607827824
# Juan's .env.local
PINECALL_DEV_ID=juan
DEV_CALLERS=+34612345678
# Flor's .env.local
PINECALL_DEV_ID=flor
DEV_CALLERS=+34699887766
Zero extra Twilio cost. One number serves all environments simultaneously.
Option B: Dedicated number per developer
Each developer uses their own Twilio number. No caller override needed — all calls to that number go to the dev agent:
// Berna uses a dedicated dev number
agent.addChannel("phone", "+18005551001"); // Berna's dev number
// Production uses the main number
agent.addChannel("phone", "+13186330963");
Simpler routing, but requires extra Twilio numbers ($1/month each).
Comparison:
| | Shared + Override | Dedicated Numbers |
|---|---|---|
| Cost | No extra | $1/month per dev |
| Setup | DEV_CALLERS in .env.local | Separate Twilio number per dev |
| Routing | Caller-based | Number-based |
| External callers | Can't reach dev agent | Can reach dev agent |
| Best for | Internal testing | External/client testing |
WhatsApp Dev Routing
WhatsApp uses the same sender-based routing pattern as phone calls. Multiple developers can share the same WhatsApp Business number, with messages routed to dev agents based on the sender's phone number.
Meta WhatsApp Business Number (phone_number_id: 123456)
│
├── Message from +34607... → dev-berna-florencia
├── Message from +34612... → dev-juan-florencia
└── Message from anyone else → florencia (production)
The dev.config event configures both phone and WhatsApp routing in one call:
if (pc.mode) {
agent.send({
event: "dev.config",
callers: ["+34607827824"], // routes BOTH phone calls AND WhatsApp messages
});
}
Same DEV_CALLERS, both channels. When your phone number sends a WhatsApp message to the business number, it routes to your dev agent. When your phone number calls the Twilio number, it also routes to your dev agent. One config, all channels.
Alternatively, each developer can register a separate Meta test number (from the Meta API console), avoiding the need for caller-based routing on WhatsApp.
WebRTC & Chat Dev Routing
WebRTC and Chat channels don't need caller-based routing — they use slug-based isolation automatically:
// Dev mode → agent registers as "dev-berna-florencia"
// The browser requests a token for "dev-berna-florencia" specifically
const { token } = await fetchWebRTCToken({ agentId: "dev-berna-florencia" });
Each developer gets their own slug, their own tokens, their own sessions. Multiple developers can test simultaneously without interference.
Any web app can connect. WebRTC and Chat connections go directly to voice.pinecall.io via DataChannel (audio) or WebSocket (text). The browser never needs access to the agent process. This means any number of web apps, mobile apps, or third-party integrations can connect to the same agent using tokens, without the developer exposing SSE endpoints, webhook URLs, or the agent's Node.js process. The voice server is the relay.
Staging
Staging uses a simple prefix without developer ID — it's a shared environment:
NODE_ENV=staging node agent/index.js
# → Agent slug: "staging-florencia"
Staging agents use the same caller-based override map. Useful for pre-production testing on a staging server.
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| PINECALL_MODE | "" | "dev", "staging", or empty for production |
| PINECALL_DEV_ID | OS username | Developer identity for slug isolation |
| DEV_CALLERS | — | Comma-separated phone numbers for caller-based routing |
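Since DEV_CALLERS is a free-form comma-separated string, it can help to validate it before sending dev.config. The sketch below is one way to do that; parseDevCallers is a hypothetical helper, and requiring E.164 format (a leading + followed by up to 15 digits) is an assumption based on the numbers used in this document.

```typescript
// Hypothetical helper, not part of @pinecall/sdk.
function parseDevCallers(raw: string | undefined): string[] {
  if (!raw) return [];
  // E.164: "+" then 2 to 15 digits, the first of which is not 0.
  const e164 = /^\+[1-9]\d{1,14}$/;
  const callers = raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0); // tolerate trailing commas
  const invalid = callers.filter((c) => !e164.test(c));
  if (invalid.length > 0) {
    throw new Error(`DEV_CALLERS has invalid numbers: ${invalid.join(", ")}`);
  }
  // Drop duplicates while keeping the original order.
  return callers.filter((c, i) => callers.indexOf(c) === i);
}
```

Failing fast on a malformed number is deliberate: a typo would otherwise silently leave your calls routed to production.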
Vite Integration
When using Vite as your dev server, agents can be embedded in the same process via a plugin:
// vite-agent-plugin.mjs
export default function agentPlugin() {
return {
name: "my-agent",
async configureServer() {
const { startAgent } = await import("./agent/index.js");
await startAgent();
},
};
}
// vite.config.js
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import agentPlugin from "./vite-agent-plugin.mjs";
export default defineConfig({
  plugins: [react(), agentPlugin()],
});
npm run dev starts both the web server and the voice agent in a single process. Vite sets NODE_ENV=development automatically, so the agent runs in dev mode with no extra configuration.
npm run dev
🟢 SDK connected
🔧 DEV mode [berna] — calls from +34607827824 → dev-berna-florencia
🌸 Florencia agent ready (Phone + WebRTC + WhatsApp) [dev]
➜ Local: http://localhost:5173/
Public API
const pc = new Pinecall({ apiKey: "pk_..." });
pc.mode; // "dev" | "staging" | "" — current environment mode
pc.devMode; // true if mode === "dev" — backward-compatible getter
pc.devId; // "berna" (developer identity for slug isolation)
Deployment Topologies
Pinecall's channels use different communication patterns. Understanding the distinction is key to choosing the right deployment topology.
Observe vs Interact
There are three communication patterns in Pinecall. Which one you use depends on the channel and your use case.
1. Phone calls (inbound + outbound) — Backend only, EventEmitter
Phone calls are inherently backend-side. Registering an agent with pc.agent() requires a PINECALL_API_KEY — this must never be exposed in frontend code. The agent runs in your Node.js process and receives all call events via the SDK's WebSocket → in-memory EventEmitter.
Twilio ──► voice.pinecall.io ──► SDK WebSocket ──► Your Node.js
                                                        │
                                                   EventEmitter
                                                   agent.on("call.started")
                                                   agent.on("user.message")
                                                   agent.on("llm.tool_call")
There is no browser involvement. The entire call lifecycle (STT → LLM → TTS → tool calls) happens server-side. If your agent is phone-only, your architecture is simple: a single Node.js process with the SDK.
2. Browser interaction (WebRTC / Chat) — Direct to voice server
When users interact from a web app (voice widget, chatbox), the browser connects directly to voice.pinecall.io — it never touches your backend:
Browser ──► GET /webrtc/token?agent_id=mara (public, no API key)
        ──► POST /webrtc/offer { sdp, token } → audio via DataChannel
Browser ──► GET /chat/token?agent_id=mara (public, no API key)
        ──► WS /chat/ws?token=cht_xxx → text via WebSocket
The token endpoints are public because they only verify that the agent is online; no secrets are exchanged. The browser gets a short-lived signed token, then opens a direct connection to the voice server. Your agent process can run anywhere.
🔒 Origin restriction (recommended): By default, any website can request a token for your agent. To restrict which domains can embed your voice widget or chatbox, configure allowedOrigins:
const agent = pc.agent("mara", {
  allowedOrigins: ["https://yourdomain.com", "http://localhost:*"],
  // ...config
});
When set, the server validates the Origin header and rejects requests from unlisted domains. For maximum security (mobile apps, multi-tenant platforms), proxy token requests through your own backend with API key authentication.
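To illustrate how an allowedOrigins list with a port wildcard might be evaluated, here is a minimal matcher. The server's real matching rules are not specified in this document; isOriginAllowed is a hypothetical function, and treating a trailing ":*" as "any numeric port (or none)" is an assumption.

```typescript
// Hypothetical matcher, not the voice server's implementation.
function isOriginAllowed(origin: string, allowedOrigins: string[]): boolean {
  return allowedOrigins.some((pattern) => {
    if (pattern.endsWith(":*")) {
      // "http://localhost:*" matches the bare host or the host plus a numeric port.
      const base = pattern.slice(0, -2);
      if (origin === base) return true;
      if (!origin.startsWith(base + ":")) return false;
      return /^\d+$/.test(origin.slice(base.length + 1));
    }
    // Everything else must match exactly (scheme, host, and port).
    return origin === pattern;
  });
}
```

Requiring the remainder to be digits only is what keeps a look-alike origin such as http://localhost:5173.evil.com from slipping through a prefix check.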
3. SSE — Observe events for dashboards and panels
SSE is for observing agent events from a web frontend — call center panels, admin dashboards, monitoring UIs. It requires the agent to run in the same Node.js process as your web server (embedded topology):
Browser ←── SSE ←── Your Express/Remix ←── agent.stream() ←── EventEmitter
This is how you build a call center panel without exposing API keys:
// Your backend — agent + SSE in the same process
const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();
const agent = pc.agent("support", { /* config */ });
agent.addChannel("phone", "+13186330963");
// SSE endpoint — filter by user role, no API key to the browser
app.get("/api/events", (req, res) => {
const userId = req.auth.userId;
const allowed = getUserAgents(userId); // your auth logic
pc.stream(res, { agents: allowed }); // only their agents
});
The browser sees real-time call events (who's calling, transcripts, tool calls) but has zero access to the API key or agent internals. You control exactly which events reach which user.
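Conceptually, the filtering in the endpoint above boils down to dropping events for agents the user may not see and serializing the rest in the Server-Sent Events wire format. The sketch below shows that idea only; AgentEvent and toSseFrame are hypothetical, and the event shape that pc.stream() actually writes is an assumption.

```typescript
// Hypothetical event shape and serializer, not the SDK's internals.
interface AgentEvent {
  agent: string;    // agent slug the event belongs to
  event: string;    // e.g. "call.started"
  payload: unknown; // JSON-serializable event data
}

function toSseFrame(evt: AgentEvent, allowedAgents: string[]): string | null {
  // Events for agents outside the user's allow-list never reach the browser.
  if (allowedAgents.indexOf(evt.agent) === -1) return null;
  // SSE frame: an "event:" line, a "data:" line, then a blank line.
  // JSON.stringify without indentation emits no raw newlines, so the
  // payload always fits on a single data line.
  return `event: ${evt.event}\ndata: ${JSON.stringify(evt.payload)}\n\n`;
}
```

On the browser side, an EventSource pointed at /api/events would receive each frame as a named event.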
Summary:
| Channel | Who initiates | Where it runs | How events flow | API key exposed? |
|---------|--------------|---------------|----------------|-----------------|
| Phone (inbound) | Twilio | Backend only | SDK WebSocket → EventEmitter | ❌ Server-side only |
| Phone (outbound) | agent.dial() | Backend only | SDK WebSocket → EventEmitter | ❌ Server-side only |
| WebRTC | Browser user | Browser → voice server | DataChannel (direct) | ❌ Token-based |
| Chat | Browser user | Browser → voice server | WebSocket (direct) | ❌ Token-based |
| WhatsApp | Meta webhook | voice server | SDK WebSocket → EventEmitter | ❌ Server-side only |
| SSE | Browser (observe) | Your backend → browser | EventEmitter → agent.stream() | ❌ Your auth controls access |
Key insight: API keys never leave your backend. Phone calls and tool execution happen server-side. Browser users connect via tokens. SSE lets you build dashboards with your own auth layer on top.
With this in mind, your agent can run embedded inside your web server, as a standalone process, or headless with no web server at all:
Embedded Agent (same process)
The agent runs inside your web server (Express, Remix, Hono, etc.) or via a Vite plugin. Both the web app and the agent share the same Node.js process.
┌──────────────────────────────────────┐
│  Your Node process                   │
│                                      │
│  ┌──────────┐     ┌──────────────┐   │
│  │ Web App  │     │ Agent (SDK)  │   │
│  │ Express  │◄────│ pc.agent()   │   │
│  │ /api/*   │     │ event bus    │   │
│  └──────────┘     └──────┬───────┘   │
│                          │           │
│  SSE ✅                 WS           │
│  agent.stream()          │           │
│  pc.stream()             ▼           │
│                  voice.pinecall.io   │
└──────────────────────────────────────┘
What works:
- ✅ SSE Streaming: agent.stream() and pc.stream() pipe events directly from the in-memory EventEmitter
- ✅ REST endpoints: req.app.agent or module-level reference
- ✅ Hot-reload: file watchers, Vite HMR
- ✅ Single npm run dev: Vite plugin boots the agent automatically
Example (Vite plugin — recommended for dev):
// vite-agent-plugin.mjs
export default function agentPlugin() {
return {
name: "my-agent",
async configureServer() {
const { startAgent } = await import("./agent/index.js");
await startAgent();
},
};
}
Example (Express):
import express from "express";
import { Pinecall } from "@pinecall/sdk";
const app = express();
const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();
const agent = pc.agent("receptionist", { /* config */ });
agent.addChannel("phone", "+13186330963");
agent.addChannel("webrtc");
agent.addChannel("chat");
// SSE endpoint — works because agent is in the same process
app.get("/api/events", (req, res) => agent.stream(res));
// Custom API that reads agent state
app.get("/api/calls", (req, res) => {
res.json({ activeCalls: agent.calls.size });
});
app.listen(3000);
Standalone Agent (separate process)
The agent runs as its own Node process, alongside a separate web server. Both connect to voice.pinecall.io independently.
┌──────────────┐     ┌──────────────────┐
│ Web App      │     │ Agent Process    │
│ (Next.js,    │     │ node agent.js    │
│ Remix, etc)  │     │ pc.agent()       │
│              │     │                  │
│ SSE ❌       │     │ WS ────────►     │
│ No agent     │     │ voice.pinecall   │
│ reference    │     │ .io              │
└──────────────┘     └──────────────────┘
       │                      │
       │     ┌────────────────┘
       ▼     ▼
    voice.pinecall.io
Browser users (WebRTC, chat) connect directly to the voice server via tokens; they don't care where the agent process lives. SSE is the only thing that breaks because it needs in-process access to the EventEmitter.
Headless Agent (no web server)
The agent doesn't need a web server at all. Many agents are pure phone/SIP agents — they answer calls, run tools, and hang up. No frontend, no API, no UI. Just a Node process running 24/7.
┌─────────────────────────┐
│ node agent.js           │
│                         │
│ pc.agent("julia")       │
│ addChannel("phone")     │
│ addChannel("sip:...")   │
│                         │
│ WS ────────────────►    │
│ voice.pinecall.io       │
└─────────────────────────┘
That's it.
// agent.js: a complete production agent, no web server needed
import { Pinecall } from "@pinecall/sdk";
// handleTool is your own dispatcher: it maps a tool name to a local function
import { openDoor, identifyVisitor, handleTool } from "./tools.js";
const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY });
await pc.connect();
const julia = pc.deploy("julia", {
  prompt: "You are Julia, the intercom concierge...",
  model: "gpt-4.1-mini",
  voice: "elevenlabs:abc",
  language: "es",
  channels: ["phone:+13186330963", "sip:[email protected]"],
  tools: [openDoor, identifyVisitor],
});
julia.on("call.started", (call) => call.say("¿Quién es?"));
julia.on("llm.tool_call", async (call, data) => {
  // Tools run locally: no webhooks, no exposed APIs
  if (!data.tool_calls) return; // skip re-emissions
  for (const tc of data.tool_calls) {
    const result = await handleTool(tc.name, JSON.parse(tc.arguments));
    julia.send({ event: "llm.tool_result", call_id: call.id, msg_id: data.msg_id, results: [{ tool_call_id: tc.id, result }] });
  }
});
console.log("Julia is live. Ctrl+C to stop.");
// Runs forever under PM2, Docker, systemd, whatever.
This is the simplest possible deployment. Deploy it with PM2, Docker, or systemd; it connects to the voice server and waits for calls. The tool handlers (openDoor, identifyVisitor) call your internal APIs, databases, or hardware directly from the same process. No webhook URLs, no public endpoints, no attack surface.
Comparison
| Feature | Embedded | Standalone | Headless |
|---------|----------|------------|----------|
| Web server | ✅ Same process | Separate process | ❌ None |
| SSE (agent.stream()) | ✅ Works | ❌ Not available | ❌ N/A |
| WebRTC (browser voice) | ✅ Via DataChannel | ✅ Via DataChannel | ✅ Via DataChannel |
| Chat (browser text) | ✅ Via /chat/ws | ✅ Via /chat/ws | ✅ Via /chat/ws |
| Phone / SIP | ✅ | ✅ | ✅ |
| WhatsApp | ✅ | ✅ | ✅ |
| Tool calls | ✅ In-process | ✅ In-process | ✅ In-process |
| Agent state in web API | ✅ Direct reference | ❌ No shared memory | ❌ N/A |
| Complexity | Medium | Medium | Lowest |
| Best for | Dev + dashboards | Web app + agent | Phone/SIP agents |
Recommendation:
- Embedded for development (Vite plugin) and apps that need SSE dashboards
- Standalone for production web apps where the agent and web server scale independently
- Headless for phone/SIP agents, IoT, background services — anything without a UI
Philosophy
Pinecall SDK is designed around one idea: any existing app can add a voice agent without changing its architecture.
Traditional voice AI platforms (Vapi, Retell, Bland) are platform-first — you configure agents in their dashboard, define tools as JSON schemas, and expose webhook URLs for the platform to call. Your app adapts to the platform.
Pinecall is code-first — the agent is your code. It runs inside your app, uses your database, calls your internal APIs, and handles tool calls locally. The platform adapts to your app.
Platform-first (Vapi):
Your App ──webhook──► Vapi Dashboard ──POST──► Your Webhook URL
                       (config UI)            (exposed endpoint)
Code-first (Pinecall):
┌─── Your App ──────────────────────┐
│ your code + pc.agent() + tools    │──WS──► voice.pinecall.io
│ everything runs here              │        (audio pipeline only)
└───────────────────────────────────┘
This matters because:
- Existing chatbots (LangChain, LlamaIndex, custom LLM pipelines) can become voice agents by hooking into turn.end and streaming to call.replyStream(). No rewrite needed.
- Tool calls are local functions, not webhook URLs. Your agent can call db.query(), redis.get(), hardware.openDoor(), or anything else your process can reach. No exposed endpoints, no public API surface.
- Multi-channel is native. The same agent instance handles phone calls, SIP intercoms, WebRTC voice widgets, text chat, and WhatsApp. One codebase, all channels.
- No vendor lock-in on the LLM. Use server-side LLM (we run it) or bring your own (OpenAI, Anthropic, local Ollama). Switch mid-call if you want.
The voice server (voice.pinecall.io) handles the hard real-time parts — audio transport, STT, TTS, VAD, turn detection. Your code handles everything else — business logic, tools, prompts, history, state. Each side does what it's good at.
Security
Token Security Model
Browser connections (WebRTC and Chat) use short-lived tokens generated by the voice server. The recommended model: your backend generates tokens using your API key, and distributes them to browsers through your own auth layer.
This is the same model used by LiveKit, Twilio, Daily.co, and every major real-time platform.
Browser → Your Backend (your auth: session, JWT, OAuth)
↓
pc.createToken("webrtc", "florencia")
↓ (API key in Authorization header)
voice.pinecall.io → { token, server, expires_in }
↓
Your Backend returns token to browser
↓
Browser connects to voice.pinecall.io with token
Backend (Express, Next.js, Hono, etc.):
import { Pinecall } from "@pinecall/sdk";
const pc = new Pinecall({ apiKey: process.env.PINECALL_API_KEY! });
await pc.connect();
const agent = pc.agent("florencia", { /* config */ });
// Token endpoint — protected by YOUR auth
app.get("/api/token", authMiddleware, async (req, res) => {
const channel = req.query.channel as "webrtc" | "chat";
const token = await agent.createToken(channel);
res.json(token);
});
// Or if agent is in a separate process:
app.get("/api/token", authMiddleware, async (req, res) => {
const token = await pc.createToken("webrtc", "florencia");
res.json(token);
});
Frontend (VoiceWidget):
<VoiceWidget
agent="florencia"
tokenProvider={async () => {
const res = await fetch("/api/token?channel=webrtc", {
credentials: "include", // send your session cookie
});
return res.json();
}}
/>
Why Tokens Are Safe
Tokens have three security properties that make them safe to pass to browsers:
| Property | Value | Effect |
|----------|-------|--------|
| Single-use | Consumed on first connection | Can't be reused by an attacker |
| Short-lived | 60 second TTL | Expires before anyone can steal it |
| Scoped | Locked to agent + org | Can't be used for a different agent |
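The three properties are easy to see in a toy in-memory model. This is not how the voice server stores tokens; TokenStore is a hypothetical class written only to show single-use, TTL, and agent scoping together (org scoping is left out for brevity), and the injectable clock exists purely so the expiry can be tested.

```typescript
// Toy model, not the voice server's implementation.
interface IssuedToken {
  agentId: string;
  expiresAt: number; // epoch milliseconds
}

class TokenStore {
  private tokens = new Map<string, IssuedToken>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number = 60000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  issue(token: string, agentId: string): void {
    this.tokens.set(token, { agentId, expiresAt: this.now() + this.ttlMs });
  }

  consume(token: string, agentId: string): boolean {
    const entry = this.tokens.get(token);
    if (!entry) return false; // unknown or already used
    if (this.now() >= entry.expiresAt) {
      this.tokens.delete(token); // short-lived: expired tokens are purged
      return false;
    }
    if (entry.agentId !== agentId) return false; // scoped to one agent
    this.tokens.delete(token); // single-use: consumed on first connection
    return true;
  }
}
```

A second consume() with the same token returns false, which is the behavior the "Single-use" row describes.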
The token is not the security boundary — your backend is. The token is a short-lived capability that proves "someone authorized gave me permission to connect." The security question is: who can call your /api/token endpoint?
- Requires login → only authenticated users get tokens
- Rate limited → can't bulk-generate tokens
- Permission-checked
