@aihumanity/voice-sdk
v0.1.1
JavaScript SDK for AIHumanity / eimi voice AI calls — wraps Ultravox with call-state tracking, transcripts, and emotion detection.
@aihumanity/voice-sdk
A small, batteries-included JavaScript SDK for embedding AIHumanity / eimi voice AI calls on any web page.
It wraps ultravox-client and
adds the things you almost always end up writing yourself:
- One-call setup: the SDK fetches the joinUrl from your eimi backend and joins the call for you.
- A semantic call-state machine: idle → connecting → connected → listening / speaking / thinking → disconnecting → idle.
- Live transcripts, with transcript / transcripts events and a snapshot getter.
- Vocal-emotion extraction from [EMOTION_CONTEXT] data messages produced by the eimi emotion bridge (configurable regex).
- Mic / speaker mute helpers.
- A pre-built React hook (@aihumanity/voice-sdk/react).
- A pre-built floating-button widget (@aihumanity/voice-sdk/widget): drop a single <script> on any site.
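The call-state machine in the list above can be pictured as a transition table. The sketch below is illustrative only: the Status type, NEXT table, and canTransition helper are hypothetical names, not SDK exports, and it assumes any live state may move to any other live state or straight to disconnecting.

```typescript
// Local stand-in for the SDK's status values (hypothetical, not an SDK export).
type Status =
  | "idle" | "connecting" | "connected"
  | "listening" | "speaking" | "thinking"
  | "disconnecting";

// States in which the call is up. Assumption: they can interleave freely.
const LIVE: Status[] = ["connected", "listening", "speaking", "thinking"];
const LIVE_OR_END: Status[] = [...LIVE, "disconnecting"];

// Allowed next states, transcribed from the arrow diagram in the feature list.
const NEXT: Record<Status, Status[]> = {
  idle: ["connecting"],
  connecting: ["connected"],
  connected: LIVE_OR_END,
  listening: LIVE_OR_END,
  speaking: LIVE_OR_END,
  thinking: LIVE_OR_END,
  disconnecting: ["idle"],
};

// True when the table permits moving from one status to the other.
function canTransition(from: Status, to: Status): boolean {
  return NEXT[from].indexOf(to) !== -1;
}

console.log(canTransition("idle", "connecting")); // true
console.log(canTransition("idle", "speaking"));   // false
```

The real SDK may allow extra edges (for example, connecting back to idle when the join request fails), so treat the table as a reading aid rather than a specification.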
Installation
npm install @aihumanity/voice-sdk ultravox-client
# or
pnpm add @aihumanity/voice-sdk ultravox-client
ultravox-client is a hard runtime dependency; it ships separately so multiple
SDKs / apps can dedupe it. React is an optional peer dependency — only needed
if you import the React adapter.
The SDK is ESM-only because ultravox-client is ESM-only. Use import syntax
or a bundler that supports ESM packages.
For zero-build <script>-tag use you can also load the IIFE bundle directly
from dist/aihumanity-voice.iife.js (see the demo).
Getting your credentials
Before writing any code you need a developer account. The whole process takes about two minutes and is self-service.
1 — Sign up at the developer portal
Go to portal.eimi.ai and create an account. Once verified you land on your dashboard.
2 — Note your Key ID
In the API Keys tab you'll see two values:
| Field | What it is | Where you use it |
| --- | --- | --- |
| SDK Key ID | Identifies your developer account | publicKey option or as the key ID in HMAC signing |
| SDK Key Secret | Signs server-to-server requests | Never put this in browser code |
The Key ID is the same value regardless of which auth mode you choose.
3 — Choose your integration path
No backend (simplest)
Use your Key ID directly as publicKey. You also need to tell the server
which origins are allowed to use it — otherwise every request is rejected.
In the portal under API Keys → Allowed Origins, add the exact origin(s) your site runs on:
https://myapp.com
https://staging.myapp.com
http://localhost:5173 ← add this while developing locally
An origin is scheme + host + port, with no path and no trailing slash.
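If you are unsure what to paste into the Allowed Origins list, the standard URL API can reduce any page address to its origin. This is a minimal sketch; toOrigin is a hypothetical helper name, not part of the SDK.

```typescript
// Reduce any URL to its origin: scheme + host + port, no path, no trailing slash.
function toOrigin(url: string): string {
  return new URL(url).origin;
}

console.log(toOrigin("https://myapp.com/pricing/"));       // https://myapp.com
console.log(toOrigin("http://localhost:5173/index.html")); // http://localhost:5173
console.log(toOrigin("https://myapp.com:443/"));           // https://myapp.com (default port is dropped)
```

In the browser, window.location.origin gives the same string for the current page.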
Then in your code:
import { VoiceCall } from "@aihumanity/voice-sdk";
const call = new VoiceCall({
  apiUrl: "https://api.eimi.ai",
  publicKey: "YOUR_KEY_ID", // from the portal; safe to commit
  agentName: "YourAgent",
  username: "visitor",
});
With a backend (more control)
Keep your Key ID and Key Secret on your server and build a small proxy endpoint
that HMAC-signs the join request. The browser calls your endpoint via
fetchJoinUrl and never touches the eimi API directly:
// In your frontend:
const call = new VoiceCall({
  fetchJoinUrl: async () => {
    const res = await fetch("/api/create-voice-call", { method: "POST" });
    if (!res.ok) throw new Error("Could not start call");
    return res.json(); // { joinUrl, callId, sessionToken }
  },
  agentName: "YourAgent",
});
// On your server (/api/create-voice-call):
// Sign the request with your Key ID + Key Secret using HMAC-SHA256.
// See the Authentication section below for the exact signing scheme.
You don't need to register any Allowed Origins when using the server-side path, because the HMAC signature, not the browser Origin, is what authenticates the request.
Authentication — choosing the right method
The SDK supports three auth patterns. Pick the one that matches your deployment.
Option A — fetchJoinUrl (full control)
Supply your own async function that returns { joinUrl, callId?, sessionToken? }.
Use this when your backend already has an endpoint that creates the Ultravox call
session and you want the SDK to stay out of the request entirely.
import { VoiceCall } from "@aihumanity/voice-sdk";
const call = new VoiceCall({
  fetchJoinUrl: async () => {
    const res = await fetch("/api/create-call", { method: "POST" });
    if (!res.ok) throw new Error("Could not start call");
    return res.json(); // { joinUrl, callId, sessionToken? }
  },
  agentName: "DavidChiu",
});
This is the recommended approach for production web apps. Your server holds the credentials; the browser never sees them.
sessionToken: when your backend returns a short-lived, call-scoped JWT alongside joinUrl / callId, include it in the response object. The SDK forwards it to pollEmotion(callId, sessionToken) so emotion polling can authenticate without a long-lived secret in the browser.
Option B — publicKey (browser-direct, no backend)
Use your Key ID from the developer portal directly in browser code. The
server validates requests using the browser's Origin header against your
registered Allowed Origins list — see Getting your credentials
for the signup and origin registration steps.
const call = new VoiceCall({
  apiUrl: "https://api.eimi.ai",
  publicKey: "YOUR_KEY_ID", // Key ID from developer portal; safe to commit
  agentName: "YourAgent",
  username: "visitor",
});
The SDK sends X-Public-Key: <publicKey> and POSTs to
${apiUrl}/v1/voice/joinurl. Override the path with joinUrlPath if needed.
Requests from origins not in your Allowed Origins list are rejected with 403. Add
http://localhost:PORT while developing locally.
Option C — fetchJoinUrl with HMAC backend proxy
Keep your Key ID and Key Secret on your server. Your backend endpoint signs the
join request; the browser calls your endpoint via fetchJoinUrl.
// Frontend: no credentials in the browser at all.
const call = new VoiceCall({
  fetchJoinUrl: async () => {
    const res = await fetch("/api/create-voice-call", { method: "POST" });
    if (!res.ok) throw new Error("Could not start call");
    return res.json(); // { joinUrl, callId, sessionToken }
  },
  agentName: "YourAgent",
});
Your server endpoint signs requests to POST /v1/voice/joinurl using
HMAC-SHA256:
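// Server-side signing (Node example):
const crypto = require("crypto");
// Illustrative wrapper (the function name is not part of the SDK). Call it
// from your /api/create-voice-call handler, e.g. with req.user.id.
async function createVoiceCall(username) {
  const timestamp = Date.now().toString();
  const method = "POST";
  const path = "/v1/voice/joinurl";
  // Canonical string: timestamp, method, and path joined by newlines.
  const canonical = `${timestamp}\n${method}\n${path}`;
  const signature = crypto
    .createHmac("sha256", YOUR_KEY_SECRET)
    .update(canonical)
    .digest("base64");
  const response = await fetch(`https://api.eimi.ai${path}`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-SDK-Key-Id": YOUR_KEY_ID,
      "X-SDK-Timestamp": timestamp,
      "X-SDK-Signature": signature,
    },
    body: JSON.stringify({ agentName: "YourAgent", username }),
  });
  // Surface upstream failures instead of forwarding an error body as a join payload.
  if (!response.ok) throw new Error(`Join request failed: ${response.status}`);
  return response.json(); // forward { joinUrl, callId, sessionToken } to the browser
}
YOUR_KEY_ID and YOUR_KEY_SECRET come from the developer portal. The secret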
never leaves your server.
The authToken option (Bearer JWT) also maps to this server-side path but is intended for internal operator use. External developers should use fetchJoinUrl with HMAC signing as shown above.
Quick start (vanilla TypeScript / JavaScript)
import { VoiceCall, CallStatus } from "@aihumanity/voice-sdk";
// Option A: recommended for production
const call = new VoiceCall({
  fetchJoinUrl: async () => {
    const res = await fetch("/.netlify/functions/create-call", { method: "POST" });
    if (!res.ok) throw new Error("Could not create call session.");
    return res.json(); // { joinUrl, callId, sessionToken }
  },
  // Poll server-side emotion every 15 s using the call-scoped session token.
  pollEmotion: async (callId, sessionToken) => {
    const params = new URLSearchParams({ callId });
    if (sessionToken) params.set("sessionToken", sessionToken);
    const res = await fetch(`/.netlify/functions/get-emotion?${params}`);
    if (!res.ok) return null;
    const data = await res.json();
    return data?.emotion ?? null;
  },
  emotionPollIntervalMs: 15_000,
  agentName: "DavidChiu",
});
call.on("status", (s) => console.log("call status:", s));
call.on("transcript", (t) => console.log(t.speaker, t.text));
call.on("emotion", (e) => console.log("emotion:", e.label));
call.on("error", (err) => console.error(err));
document.querySelector("#start")!.addEventListener("click", () => call.start());
document.querySelector("#stop")!.addEventListener("click", () => call.end());
How the join URL is fetched
The SDK resolves credentials in this order:
1. fetchJoinUrl: calls your function; skips all built-in request logic.
2. publicKey: POSTs to ${apiUrl}/v1/voice/joinurl with X-Public-Key.
3. authToken: POSTs to ${apiUrl}/ultravox/secure/joinurl with Authorization: Bearer.
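The resolution order above can be sketched as a small pure function. This is not the SDK's internal code: AuthOptions, JoinPlan, and planJoinRequest are hypothetical names, and the option subset is limited to the fields named in the list.

```typescript
// Hypothetical subset of the constructor options, limited to fields named above.
interface AuthOptions {
  apiUrl?: string;
  fetchJoinUrl?: () => Promise<{ joinUrl: string }>;
  publicKey?: string;
  authToken?: string;
}

// What happens next: call the custom function, or send a built-in POST.
type JoinPlan =
  | { kind: "custom" }
  | { kind: "post"; url: string; headers: Record<string, string> };

function planJoinRequest(opts: AuthOptions): JoinPlan {
  // 1. A custom fetcher wins and bypasses all built-in request logic.
  if (opts.fetchJoinUrl) return { kind: "custom" };
  const base = opts.apiUrl ?? "";
  // 2. Browser-direct public key.
  if (opts.publicKey) {
    return {
      kind: "post",
      url: `${base}/v1/voice/joinurl`,
      headers: { "X-Public-Key": opts.publicKey },
    };
  }
  // 3. Bearer token.
  if (opts.authToken) {
    return {
      kind: "post",
      url: `${base}/ultravox/secure/joinurl`,
      headers: { Authorization: `Bearer ${opts.authToken}` },
    };
  }
  throw new Error("Provide fetchJoinUrl, publicKey, or authToken.");
}

const plan = planJoinRequest({ apiUrl: "https://api.eimi.ai", publicKey: "YOUR_KEY_ID" });
console.log(plan.kind); // post
```

Because fetchJoinUrl is checked first, passing it together with publicKey means the key is never sent.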
The backend response must contain at least joinUrl. Optional fields:
{
  "joinUrl": "https://...", // required
  "callId": "uuid", // forwarded to pollEmotion
  "sessionToken": "eyJ...", // short-lived JWT for emotion polling
  "emotion": { "dataConnectionEnabled": true, ... }
}
Override the default path for the publicKey or authToken modes with joinUrlPath:
new VoiceCall({ publicKey: "pk_...", joinUrlPath: "/v1/voice/joinurl", ... })
Session tokens and emotion polling
When the backend returns a sessionToken alongside the join URL, the SDK stores
it for the duration of the call. If you provide a pollEmotion callback, the SDK
passes both (callId, sessionToken) so your function can authenticate the polling
request without embedding a service credential in browser code:
pollEmotion: async (callId, sessionToken) => {
  const headers: Record<string, string> = {};
  if (sessionToken) headers["Authorization"] = `Bearer ${sessionToken}`;
  const res = await fetch(`/api/calls/${callId}/emotion`, { headers });
  if (!res.ok) return null;
  const { emotion } = await res.json();
  return emotion ?? null;
},
React
import { useVoiceCall, CallStatus } from "@aihumanity/voice-sdk/react";
// Define stable callbacks outside the component so the hook doesn't re-run.
async function fetchJoinUrl() {
  const res = await fetch("/api/create-call", { method: "POST" });
  if (!res.ok) throw new Error("Could not start call");
  return res.json(); // { joinUrl, callId, sessionToken }
}
async function pollEmotion(callId: string, sessionToken?: string) {
  const params = new URLSearchParams({ callId });
  if (sessionToken) params.set("sessionToken", sessionToken);
  const res = await fetch(`/api/emotion?${params}`);
  if (!res.ok) return null;
  const data = await res.json();
  return data?.emotion ?? null;
}
const VOICE_OPTS = { fetchJoinUrl, pollEmotion, emotionPollIntervalMs: 15_000 };
function TalkButton() {
  const {
    status, isLive, isBusy, transcripts, lastEmotion,
    micMuted, error, start, end, toggleMicMute,
  } = useVoiceCall(VOICE_OPTS);
  return (
    <div>
      <button onClick={isLive || isBusy ? end : start}>
        {isLive ? "End" : isBusy ? "Connecting…" : "Talk"}
      </button>
      <button onClick={toggleMicMute} disabled={!isLive}>
        {micMuted ? "Unmute" : "Mute"}
      </button>
      {error && <p style={{ color: "tomato" }}>{error.message}</p>}
      {lastEmotion && <p>Vocal emotion: {lastEmotion}</p>}
      <ul>
        {transcripts.map((t, i) => (
          <li key={i}><b>{t.speaker}:</b> {t.text}</li>
        ))}
      </ul>
    </div>
  );
}
Status values map directly onto CallStatus:
| CallStatus | When you'll see it |
| ---------------- | --------------------------------------------------------------- |
| IDLE | Before start() and after the call has fully ended. |
| CONNECTING | Fetching the join URL or running WebRTC handshake. |
| CONNECTED | Call is live and the agent is waiting (no one is talking). |
| LISTENING | Mic is open and capturing user audio. |
| THINKING | Agent is reasoning about the user's last utterance. |
| SPEAKING | Agent is generating audio. |
| DISCONNECTING | end() was called; teardown in progress. |
| DISCONNECTED | Terminal state from ultravox-client; SDK normalises back to IDLE. |
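The React example reads isLive and isBusy from the hook without defining them. One plausible grouping of the statuses in the table, offered as an assumption rather than the hook's actual logic, is sketched below. Status, StatusView, and viewFor are hypothetical names, and the lower-case status strings are a guess at the enum's values.

```typescript
// Local stand-in for CallStatus (assumed lower-case string values).
type Status =
  | "idle" | "connecting" | "connected" | "listening"
  | "thinking" | "speaking" | "disconnecting" | "disconnected";

interface StatusView {
  isLive: boolean; // the call is up and audio can flow
  isBusy: boolean; // a transition is in progress
  label: string;   // text for the main button
}

function viewFor(status: Status): StatusView {
  switch (status) {
    case "connected":
    case "listening":
    case "thinking":
    case "speaking":
      return { isLive: true, isBusy: false, label: "End" };
    case "connecting":
      return { isLive: false, isBusy: true, label: "Connecting…" };
    case "disconnecting":
      return { isLive: false, isBusy: true, label: "Ending…" };
    default:
      // "idle", plus "disconnected", which the SDK normalises back to idle.
      return { isLive: false, isBusy: false, label: "Talk" };
  }
}

console.log(viewFor("speaking").label); // End
console.log(viewFor("idle").label);     // Talk
```

Keeping this mapping in one function makes it easy to reuse outside React, for example with the status event on a plain VoiceCall instance.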
Floating widget
Mount a self-contained mic button + call panel anywhere:
import { mountFloatingWidget } from "@aihumanity/voice-sdk/widget";
// Option A: server-side proxy (recommended)
mountFloatingWidget({
  fetchJoinUrl: () =>
    fetch("/api/create-call", { method: "POST" }).then((r) => r.json()),
  agentName: "DavidChiu",
  persona: {
    name: "David Chiu",
    title: "Founder & CEO · AIHumanity",
    initials: "DC",
    intro: "Have a real-time voice conversation with David — ask anything.",
  },
});
// Option B: browser-direct with a public key
mountFloatingWidget({
  apiUrl: "https://api.eimi.ai",
  publicKey: "pk_live_abc123", // register your origin in the developer portal first
  agentName: "DavidChiu",
  persona: { name: "David Chiu", initials: "DC" },
});
Or via plain <script> (IIFE build):
<script src="https://your.cdn/aihumanity-voice.iife.js"></script>
<script>
  // Browser-direct with public key
  AIHVoice.mountFloatingWidget({
    apiUrl: "https://api.eimi.ai",
    publicKey: "pk_live_abc123",
    agentName: "DavidChiu",
    persona: { name: "David Chiu", initials: "DC" },
  });
</script>
The widget renders inside a Shadow DOM, so its CSS won't fight your site's.
Events reference
| Event | Payload | Notes |
| --------------- | ---------------------------------------- | ------------------------------------------- |
| status | CallStatus | Coarse semantic status. |
| raw_status | string | Underlying ultravox-client status string. |
| transcript | Transcript | Fired per added/updated entry. |
| transcripts | Transcript[] | Snapshot after each transcript update. |
| emotion | { label: string, raw: unknown } | Emitted when emotion regex matches a data message. |
| data_message | unknown | Every experimental_message payload. |
| mic_muted | boolean | |
| speaker_muted | boolean | |
| contact_saved | void | Heuristic on agent transcript. |
| warning | string | E.g. emotion bridge not configured. |
| error | Error | Fatal during start/operation. |
| ended | void | Fires once the underlying session disconnects. |
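The on and once methods return an unsubscribe function, which is convenient for cleanup in components. The sketch below shows that pattern with a tiny stand-alone emitter. TinyEmitter and the Events map are hypothetical and cover only two of the events in the table; this is not the SDK's implementation.

```typescript
type Listener<T> = (payload: T) => void;

// Hypothetical event map: a small subset of the table above.
interface Events {
  status: string;
  mic_muted: boolean;
}

class TinyEmitter<M> {
  private listeners = new Map<keyof M, Set<Listener<any>>>();

  // Register a listener and return a function that removes it again.
  on<E extends keyof M>(event: E, listener: Listener<M[E]>): () => void {
    let set = this.listeners.get(event);
    if (!set) {
      set = new Set();
      this.listeners.set(event, set);
    }
    set.add(listener);
    return () => this.off(event, listener);
  }

  off<E extends keyof M>(event: E, listener: Listener<M[E]>): void {
    this.listeners.get(event)?.delete(listener);
  }

  // Like on(), but the listener removes itself before its first call.
  once<E extends keyof M>(event: E, listener: Listener<M[E]>): () => void {
    const unsubscribe = this.on(event, (payload) => {
      unsubscribe();
      listener(payload);
    });
    return unsubscribe;
  }

  emit<E extends keyof M>(event: E, payload: M[E]): void {
    const set = this.listeners.get(event);
    if (!set) return;
    // Copy first so listeners may unsubscribe while being called.
    for (const l of Array.from(set)) l(payload);
  }
}

const emitter = new TinyEmitter<Events>();
const seen: string[] = [];
const stop = emitter.on("status", (s) => { seen.push(s); });
emitter.emit("status", "connecting");
stop(); // unsubscribe
emitter.emit("status", "connected"); // not delivered
console.log(seen); // only "connecting" was recorded

let muteEvents = 0;
emitter.once("mic_muted", () => { muteEvents += 1; });
emitter.emit("mic_muted", true);
emitter.emit("mic_muted", false); // ignored: the once-listener is gone
```

With the real SDK the same shape applies: keep the function returned by call.on(...) and invoke it when the listener is no longer needed.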
API surface
class VoiceCall {
  constructor(options: VoiceCallOptions);
  // Read-only state
  readonly status: CallStatus;
  readonly callId: string | null;
  readonly transcripts: Transcript[];
  readonly lastEmotion: string | null;
  readonly contactSaved: boolean;
  readonly isMicMuted: boolean;
  readonly isSpeakerMuted: boolean;
  readonly emotionMeta: ServerEmotionMeta | null;
  readonly rawSession: UltravoxSession | null;
  // Events
  on<E>(event, listener): () => void; // returns unsubscribe
  off<E>(event, listener): void;
  once<E>(event, listener): () => void;
  // Control
  start(): Promise<void>;
  end(): Promise<void>;
  muteMic(): void; unmuteMic(): void; toggleMicMute(): boolean;
  muteSpeaker(): void; unmuteSpeaker(): void; toggleSpeakerMute(): boolean;
  sendText(text: string, deferResponse?: boolean): void;
  sendData(obj: unknown): void;
  dispose(): void;
}
Building from source
npm install
npm run build # ESM + CJS + .d.ts (library mode)
npm run build:iife # bundled <script> tag build
npm run build:all
npm run typecheck
The examples/demo.html page loads dist/aihumanity-voice.iife.js, so run
npm run build:all once before opening it. The npm run demo script does
both for you.
Publishing
Before publishing, verify the package still builds and the tarball contents are what npm should receive:
npm whoami
npm pack --dry-run
Publish the scoped package publicly:
npm publish --access public
If npm returns E403 with Two-factor authentication or granular access token
with bypass 2fa enabled is required, the package metadata is usually not the
problem. Use one of these auth paths:
# Interactive publish with a current 2FA code.
npm publish --access public --otp 123456
# Token publish: configure a granular npm token with read/write package access
# for @aihumanity and "bypass 2FA" enabled.
npm config set //registry.npmjs.org/:_authToken npm_xxx
npm publish --access public
Newer npm versions protect token reads, so npm config get
//registry.npmjs.org/:_authToken may fail even when a token is configured. Use
npm config list --location=user to confirm the token entry exists without
printing the secret.
Roadmap
- Streaming partial-emotion confidences (instead of just last label).
- Pluggable transcript renderers (Markdown, ReactMarkdown).
- Server-side helper to mint short-lived per-user JWTs.
- Unit tests for emotion-pattern matching and status mapping.
License
MIT
