@aihumanity/voice-sdk

v0.1.1

Published

a month ago

JavaScript SDK for AIHumanity / eimi voice AI calls — wraps Ultravox with call-state tracking, transcripts, and emotion detection.


fdchiu

ultravox voice ai sdk webrtc transcript emotion aihumanity eimi

@aihumanity/voice-sdk

A small, batteries-included JavaScript SDK for embedding AIHumanity / eimi voice AI calls on any web page.

It wraps ultravox-client and adds the things you almost always end up writing yourself:

One-call setup: the SDK fetches the joinUrl from your eimi backend and joins the call for you.
A semantic call-state machine: idle → connecting → connected → listening / speaking / thinking → disconnecting → idle.
Live transcripts, with transcript / transcripts events and a snapshot getter.
Vocal-emotion extraction from [EMOTION_CONTEXT] data messages produced by the eimi emotion bridge (configurable regex).
Mic / speaker mute helpers.
A pre-built React hook (@aihumanity/voice-sdk/react).
A pre-built floating-button widget (@aihumanity/voice-sdk/widget) — drop a single <script> on any site.

Installation

npm install @aihumanity/voice-sdk ultravox-client
# or
pnpm add @aihumanity/voice-sdk ultravox-client

ultravox-client is a hard runtime dependency; it ships separately so multiple SDKs / apps can dedupe it. React is an optional peer dependency — only needed if you import the React adapter.

The SDK is ESM-only because ultravox-client is ESM-only. Use import syntax or a bundler that supports ESM packages.

For zero-build <script>-tag use you can also load the IIFE bundle directly from dist/aihumanity-voice.iife.js (see the demo).

Getting your credentials

Before writing any code you need a developer account. The whole process takes about two minutes and is self-service.

1 — Sign up at the developer portal

Go to portal.eimi.ai and create an account. Once verified you land on your dashboard.

2 — Note your Key ID

In the API Keys tab you'll see two values:

| Field | What it is | Where you use it |
| --- | --- | --- |
| SDK Key ID | Identifies your developer account | publicKey option or as the key ID in HMAC signing |
| SDK Key Secret | Signs server-to-server requests | Never put this in browser code |

The Key ID is the same value regardless of which auth mode you choose.

3 — Choose your integration path

No backend (simplest)

Use your Key ID directly as publicKey. You also need to tell the server which origins are allowed to use it — otherwise every request is rejected.

In the portal under API Keys → Allowed Origins, add the exact origin(s) your site runs on:

https://myapp.com
https://staging.myapp.com
http://localhost:5173     ← add this while developing locally

An origin is scheme + host + port — no path, no trailing slash.
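If you are unsure what to paste into the Allowed Origins list, the standard URL API can reduce any page address to its origin. The helper below is a hypothetical convenience function, not part of the SDK:

```typescript
// Hypothetical helper (not an SDK export): reduce a page URL to the origin
// string expected by the Allowed Origins list.
function toOrigin(pageUrl: string): string {
  // URL#origin is scheme + host + port, with no path and no trailing slash.
  // Default ports (80 for http, 443 for https) are omitted.
  return new URL(pageUrl).origin;
}

console.log(toOrigin("https://myapp.com/pricing/"));       // https://myapp.com
console.log(toOrigin("http://localhost:5173/index.html")); // http://localhost:5173
```

In a browser console, window.location.origin gives the same value for the page you are currently on.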

Then in your code:

import { VoiceCall } from "@aihumanity/voice-sdk";

const call = new VoiceCall({
  apiUrl:    "https://api.eimi.ai",
  publicKey: "YOUR_KEY_ID",   // from the portal — safe to commit
  agentName: "YourAgent",
  username:  "visitor",
});

With a backend (more control)

Keep your Key ID and Key Secret on your server and build a small proxy endpoint that HMAC-signs the join request. The browser calls your endpoint via fetchJoinUrl and never touches the eimi API directly:

// In your frontend:
const call = new VoiceCall({
  fetchJoinUrl: async () => {
    const res = await fetch("/api/create-voice-call", { method: "POST" });
    if (!res.ok) throw new Error("Could not start call");
    return res.json(); // { joinUrl, callId, sessionToken }
  },
  agentName: "YourAgent",
});

// On your server (/api/create-voice-call):
// Sign the request with your Key ID + Key Secret using HMAC-SHA256.
// See the Authentication section below for the exact signing scheme.

You don't need to register any Allowed Origins when using the server-side path, because the HMAC signature — not the browser Origin — is what authenticates the request.

Authentication — choosing the right method

The SDK supports three auth patterns. Pick the one that matches your deployment.

Option A — `fetchJoinUrl` (full control)

Supply your own async function that returns { joinUrl, callId?, sessionToken? }. Use this when your backend already has an endpoint that creates the Ultravox call session and you want the SDK to stay out of the request entirely.

import { VoiceCall } from "@aihumanity/voice-sdk";

const call = new VoiceCall({
  fetchJoinUrl: async () => {
    const res = await fetch("/api/create-call", { method: "POST" });
    if (!res.ok) throw new Error("Could not start call");
    return res.json(); // { joinUrl, callId, sessionToken? }
  },
  agentName: "DavidChiu",
});

This is the recommended approach for production web apps. Your server holds the credentials; the browser never sees them.

sessionToken — When your backend returns a short-lived, call-scoped JWT alongside joinUrl / callId, include it in the response object. The SDK forwards it to pollEmotion(callId, sessionToken) so emotion polling can authenticate without a long-lived secret in the browser.

Option B — `publicKey` (browser-direct, no backend)

Use your Key ID from the developer portal directly in browser code. The server validates requests using the browser's Origin header against your registered Allowed Origins list — see Getting your credentials for the signup and origin registration steps.

const call = new VoiceCall({
  apiUrl:    "https://api.eimi.ai",
  publicKey: "YOUR_KEY_ID",   // Key ID from developer portal — safe to commit
  agentName: "YourAgent",
  username:  "visitor",
});

The SDK sends X-Public-Key: <publicKey> and POSTs to ${apiUrl}/v1/voice/joinurl. Override the path with joinUrlPath if needed.

Requests from origins not in your Allowed Origins list are rejected with 403. Add http://localhost:PORT while developing locally.

Option C — `fetchJoinUrl` with HMAC backend proxy

Keep your Key ID and Key Secret on your server. Your backend endpoint signs the join request; the browser calls your endpoint via fetchJoinUrl.

// Frontend — no credentials in the browser at all:
const call = new VoiceCall({
  fetchJoinUrl: async () => {
    const res = await fetch("/api/create-voice-call", { method: "POST" });
    if (!res.ok) throw new Error("Could not start call");
    return res.json(); // { joinUrl, callId, sessionToken }
  },
  agentName: "YourAgent",
});

Your server endpoint signs requests to POST /v1/voice/joinurl using HMAC-SHA256:

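// Server-side signing (Node 18+ example; uses the built-in global fetch):
const crypto = require("crypto");

// Call this from your /api/create-voice-call route handler.
// `req.user.id` assumes your own auth middleware has populated `req.user`.
async function createVoiceCall(req) {
  const timestamp = Date.now().toString();
  const method    = "POST";
  const path      = "/v1/voice/joinurl";

  // Canonical string: timestamp, method, and path joined by newlines.
  const canonical = `${timestamp}\n${method}\n${path}`;
  const signature = crypto
    .createHmac("sha256", YOUR_KEY_SECRET)
    .update(canonical)
    .digest("base64");

  const response = await fetch(`https://api.eimi.ai${path}`, {
    method,
    headers: {
      "Content-Type":    "application/json",
      "X-SDK-Key-Id":    YOUR_KEY_ID,
      "X-SDK-Timestamp": timestamp,
      "X-SDK-Signature": signature,
    },
    body: JSON.stringify({ agentName: "YourAgent", username: req.user.id }),
  });
  if (!response.ok) throw new Error(`Join request failed with status ${response.status}`);
  return response.json(); // forward { joinUrl, callId, sessionToken } to the browser
}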

YOUR_KEY_ID and YOUR_KEY_SECRET come from the developer portal. The secret never leaves your server.

The authToken option (Bearer JWT) also maps to this server-side path but is intended for internal operator use. External developers should use fetchJoinUrl with HMAC signing as shown above.

Quick start (vanilla TypeScript / JavaScript)

import { VoiceCall, CallStatus } from "@aihumanity/voice-sdk";

// Option A — recommended for production
const call = new VoiceCall({
  fetchJoinUrl: async () => {
    const res = await fetch("/.netlify/functions/create-call", { method: "POST" });
    if (!res.ok) throw new Error("Could not create call session.");
    return res.json(); // { joinUrl, callId, sessionToken }
  },
  // Poll server-side emotion every 15 s using the call-scoped session token.
  pollEmotion: async (callId, sessionToken) => {
    const params = new URLSearchParams({ callId });
    if (sessionToken) params.set("sessionToken", sessionToken);
    const res = await fetch(`/.netlify/functions/get-emotion?${params}`);
    if (!res.ok) return null;
    const data = await res.json();
    return data?.emotion ?? null;
  },
  emotionPollIntervalMs: 15_000,
  agentName: "DavidChiu",
});

call.on("status",     (s) => console.log("call status:", s));
call.on("transcript", (t) => console.log(t.speaker, t.text));
call.on("emotion",    (e) => console.log("emotion:", e.label));
call.on("error",      (err) => console.error(err));

document.querySelector("#start")!.addEventListener("click", () => call.start());
document.querySelector("#stop")!.addEventListener("click",  () => call.end());

How the join URL is fetched

The SDK resolves credentials in this order:

fetchJoinUrl — calls your function; skips all built-in request logic.
publicKey — POSTs to ${apiUrl}/v1/voice/joinurl with X-Public-Key.
authToken — POSTs to ${apiUrl}/ultravox/secure/joinurl with Authorization: Bearer.
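The precedence above can be sketched as a small pure function. This is an illustration of the documented order only: the SDK's internal implementation is not shown in this README, and planJoinRequest, JoinOptions, and JoinPlan are hypothetical names. Treating a missing apiUrl as an empty (same-origin) base is also an assumption.

```typescript
// Illustrative only: option names follow this README; the helper and its
// types are hypothetical and not exported by the SDK.
interface JoinOptions {
  apiUrl?: string;
  publicKey?: string;
  authToken?: string;
  joinUrlPath?: string;
  fetchJoinUrl?: () => Promise<{ joinUrl: string }>;
}

type JoinPlan =
  | { mode: "fetchJoinUrl" }
  | { mode: "publicKey" | "authToken"; url: string; headers: Record<string, string> };

function planJoinRequest(opts: JoinOptions): JoinPlan {
  // 1. A custom fetcher wins and bypasses all built-in request logic.
  if (opts.fetchJoinUrl) return { mode: "fetchJoinUrl" };

  const base = opts.apiUrl ?? ""; // assumption: no apiUrl means same-origin
  // 2. Public key: sent as X-Public-Key to the public join endpoint.
  if (opts.publicKey) {
    return {
      mode: "publicKey",
      url: base + (opts.joinUrlPath ?? "/v1/voice/joinurl"),
      headers: { "X-Public-Key": opts.publicKey },
    };
  }
  // 3. Bearer token: sent to the secure join endpoint.
  if (opts.authToken) {
    return {
      mode: "authToken",
      url: base + (opts.joinUrlPath ?? "/ultravox/secure/joinurl"),
      headers: { Authorization: `Bearer ${opts.authToken}` },
    };
  }
  throw new Error("Provide fetchJoinUrl, publicKey, or authToken.");
}
```

Writing the order out this way makes it clear that supplying fetchJoinUrl together with publicKey means the key is never used.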

The backend response must contain at least joinUrl. Optional fields:

{
  "joinUrl":      "https://...",          // required
  "callId":       "uuid",                 // forwarded to pollEmotion
  "sessionToken": "eyJ...",              // short-lived JWT for emotion polling
  "emotion":      { "dataConnectionEnabled": true, ... }
}
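If you write your own fetchJoinUrl, it can help to validate the backend response before handing it to the SDK. The guard below is a minimal sketch based only on the fields listed above; parseJoinResponse and JoinResponse are hypothetical names, and the emotion field is left out because its full shape is not documented here.

```typescript
// Shape taken from the response documented above; the guard itself is a
// hypothetical helper, not an SDK export.
interface JoinResponse {
  joinUrl: string;
  callId?: string;
  sessionToken?: string;
}

function parseJoinResponse(data: unknown): JoinResponse {
  if (typeof data !== "object" || data === null) {
    throw new Error("Join response must be a JSON object.");
  }
  const { joinUrl, callId, sessionToken } = data as Record<string, unknown>;
  if (typeof joinUrl !== "string" || joinUrl.length === 0) {
    throw new Error("Join response is missing joinUrl.");
  }
  return {
    joinUrl,
    // Optional fields are kept only when they are strings.
    ...(typeof callId === "string" ? { callId } : {}),
    ...(typeof sessionToken === "string" ? { sessionToken } : {}),
  };
}
```

Inside fetchJoinUrl this would be used as `return parseJoinResponse(await res.json());`, so a malformed response fails early with a readable message.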

Override the default path for the publicKey or authToken modes with joinUrlPath:

new VoiceCall({ publicKey: "pk_...", joinUrlPath: "/v1/voice/joinurl", ... })

Session tokens and emotion polling

When the backend returns a sessionToken alongside the join URL, the SDK stores it for the duration of the call. If you provide a pollEmotion callback, the SDK passes both (callId, sessionToken) so your function can authenticate the polling request without embedding a service credential in browser code:

pollEmotion: async (callId, sessionToken) => {
  const headers: Record<string, string> = {};
  if (sessionToken) headers["Authorization"] = `Bearer ${sessionToken}`;
  const res = await fetch(`/api/calls/${callId}/emotion`, { headers });
  if (!res.ok) return null;
  const { emotion } = await res.json();
  return emotion ?? null;
},

React

import { useVoiceCall, CallStatus } from "@aihumanity/voice-sdk/react";

// Define stable callbacks outside the component so the hook doesn't re-run.
async function fetchJoinUrl() {
  const res = await fetch("/api/create-call", { method: "POST" });
  if (!res.ok) throw new Error("Could not start call");
  return res.json(); // { joinUrl, callId, sessionToken }
}

async function pollEmotion(callId: string, sessionToken?: string) {
  const params = new URLSearchParams({ callId });
  if (sessionToken) params.set("sessionToken", sessionToken);
  const res = await fetch(`/api/emotion?${params}`);
  if (!res.ok) return null;
  const data = await res.json();
  return data?.emotion ?? null;
}

const VOICE_OPTS = { fetchJoinUrl, pollEmotion, emotionPollIntervalMs: 15_000 };

function TalkButton() {
  const {
    status, isLive, isBusy, transcripts, lastEmotion,
    micMuted, error, start, end, toggleMicMute,
  } = useVoiceCall(VOICE_OPTS);

  return (
    <div>
      <button onClick={isLive || isBusy ? end : start}>
        {isLive ? "End" : isBusy ? "Connecting…" : "Talk"}
      </button>
      <button onClick={toggleMicMute} disabled={!isLive}>
        {micMuted ? "Unmute" : "Mute"}
      </button>
      {error && <p style={{ color: "tomato" }}>{error.message}</p>}
      {lastEmotion && <p>Vocal emotion: {lastEmotion}</p>}
      <ul>
        {transcripts.map((t, i) => (
          <li key={i}><b>{t.speaker}:</b> {t.text}</li>
        ))}
      </ul>
    </div>
  );
}

Status values map directly onto CallStatus:

| CallStatus | When you'll see it |
| --- | --- |
| IDLE | Before start() and after the call has fully ended. |
| CONNECTING | Fetching the join URL or running WebRTC handshake. |
| CONNECTED | Call is live and the agent is waiting (no one is talking). |
| LISTENING | Mic is open and capturing user audio. |
| THINKING | Agent is reasoning about the user's last utterance. |
| SPEAKING | Agent is generating audio. |
| DISCONNECTING | end() was called; teardown in progress. |
| DISCONNECTED | Terminal state from ultravox-client; SDK normalises back to IDLE. |
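For UI code it is often convenient to collapse these statuses into "live", "busy", and "idle" groups. The sketch below shows one plausible grouping. It uses a local stand-in type whose string values are assumptions, and this README does not document how the React hook actually derives isLive and isBusy.

```typescript
// Local stand-in for the SDK's CallStatus. Member names mirror the table
// above; the lowercase string values are assumptions made for this sketch.
type Status =
  | "idle" | "connecting" | "connected" | "listening"
  | "thinking" | "speaking" | "disconnecting" | "disconnected";

const LIVE: ReadonlySet<Status> = new Set<Status>([
  "connected", "listening", "thinking", "speaking",
]);
const BUSY: ReadonlySet<Status> = new Set<Status>(["connecting", "disconnecting"]);

function describeStatus(status: Status): { live: boolean; busy: boolean; label: string } {
  const live = LIVE.has(status);
  const busy = BUSY.has(status);
  // Pick a button label for each group.
  const label = live
    ? "End"
    : status === "connecting"
      ? "Connecting…"
      : status === "disconnecting"
        ? "Ending…"
        : "Talk";
  return { live, busy, label };
}

console.log(describeStatus("speaking")); // { live: true, busy: false, label: 'End' }
```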

Floating widget

Mount a self-contained mic button + call panel anywhere:

import { mountFloatingWidget } from "@aihumanity/voice-sdk/widget";

// Option A — server-side proxy (recommended)
mountFloatingWidget({
  fetchJoinUrl: () =>
    fetch("/api/create-call", { method: "POST" }).then((r) => r.json()),
  agentName: "DavidChiu",
  persona: {
    name: "David Chiu",
    title: "Founder & CEO · AIHumanity",
    initials: "DC",
    intro: "Have a real-time voice conversation with David — ask anything.",
  },
});

// Option B — browser-direct with a public key
mountFloatingWidget({
  apiUrl:    "https://api.eimi.ai",
  publicKey: "pk_live_abc123",  // register your origin in the developer portal first
  agentName: "DavidChiu",
  persona:   { name: "David Chiu", initials: "DC" },
});

Or via plain <script> (IIFE build):

<script src="https://your.cdn/aihumanity-voice.iife.js"></script>
<script>
  // Browser-direct with public key
  AIHVoice.mountFloatingWidget({
    apiUrl:    "https://api.eimi.ai",
    publicKey: "pk_live_abc123",
    agentName: "DavidChiu",
    persona:   { name: "David Chiu", initials: "DC" },
  });
</script>

The widget renders inside a Shadow DOM, so its CSS won't fight your site's.

Events reference

| Event | Payload | Notes |
| --- | --- | --- |
| status | CallStatus | Coarse semantic status. |
| raw_status | string | Underlying ultravox-client status string. |
| transcript | Transcript | Fired per added/updated entry. |
| transcripts | Transcript[] | Snapshot after each transcript update. |
| emotion | { label: string, raw: unknown } | Emitted when emotion regex matches a data message. |
| data_message | unknown | Every experimental_message payload. |
| mic_muted | boolean | |
| speaker_muted | boolean | |
| contact_saved | void | Heuristic on agent transcript. |
| warning | string | E.g. emotion bridge not configured. |
| error | Error | Fatal during start/operation. |
| ended | void | Fires once the underlying session disconnects. |
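To make the emotion row more concrete, here is a rough sketch of regex-based label extraction. The actual [EMOTION_CONTEXT] payload format produced by the eimi emotion bridge is not documented in this README, so both the sample message and the pattern below are hypothetical. Adjust them to whatever your bridge really sends.

```typescript
// Hypothetical pattern: assumes the message looks something like
// "[EMOTION_CONTEXT] emotion: calm". The real format may differ.
const EMOTION_PATTERN =
  /\[EMOTION_CONTEXT\][^\n]*?\bemotion\s*[:=]\s*"?([A-Za-z_-]+)"?/i;

function extractEmotion(message: string, pattern: RegExp = EMOTION_PATTERN): string | null {
  const match = pattern.exec(message);
  // Capture group 1 is treated as the label; lower-case it for comparisons.
  return match?.[1]?.toLowerCase() ?? null;
}

console.log(extractEmotion("[EMOTION_CONTEXT] emotion: Calm")); // calm
console.log(extractEmotion("plain data message"));              // null
```

Passing the pattern as a parameter mirrors the idea of a configurable regex: a caller can swap in a different expression without touching the extraction logic.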

API surface

class VoiceCall {
  constructor(options: VoiceCallOptions);

  // Read-only state
  readonly status: CallStatus;
  readonly callId: string | null;
  readonly transcripts: Transcript[];
  readonly lastEmotion: string | null;
  readonly contactSaved: boolean;
  readonly isMicMuted: boolean;
  readonly isSpeakerMuted: boolean;
  readonly emotionMeta: ServerEmotionMeta | null;
  readonly rawSession: UltravoxSession | null;

  // Events
  on<E>(event, listener): () => void;     // returns unsubscribe
  off<E>(event, listener): void;
  once<E>(event, listener): () => void;

  // Control
  start(): Promise<void>;
  end(): Promise<void>;
  muteMic(): void;          unmuteMic(): void;          toggleMicMute(): boolean;
  muteSpeaker(): void;      unmuteSpeaker(): void;      toggleSpeakerMute(): boolean;
  sendText(text: string, deferResponse?: boolean): void;
  sendData(obj: unknown): void;
  dispose(): void;
}
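The on / off / once signatures above follow the common pattern in which subscribing returns an unsubscribe function. The class below is a minimal stand-alone sketch of that pattern for readers who have not used it. TinyEmitter is a hypothetical name and this is not the SDK's implementation.

```typescript
// Minimal emitter with the same on/off/once shape as the API surface above.
type Listener<T> = (payload: T) => void;

class TinyEmitter<Events extends Record<string, unknown>> {
  private listeners = new Map<keyof Events, Set<Listener<any>>>();

  on<E extends keyof Events>(event: E, listener: Listener<Events[E]>): () => void {
    let set = this.listeners.get(event);
    if (!set) {
      set = new Set();
      this.listeners.set(event, set);
    }
    set.add(listener);
    return () => this.off(event, listener); // caller keeps this to unsubscribe
  }

  off<E extends keyof Events>(event: E, listener: Listener<Events[E]>): void {
    this.listeners.get(event)?.delete(listener);
  }

  once<E extends keyof Events>(event: E, listener: Listener<Events[E]>): () => void {
    // Wrap the listener so it removes itself before its first invocation.
    const unsubscribe = this.on(event, (payload) => {
      unsubscribe();
      listener(payload);
    });
    return unsubscribe;
  }

  emit<E extends keyof Events>(event: E, payload: Events[E]): void {
    // Copy first so listeners can unsubscribe while we iterate.
    for (const listener of Array.from(this.listeners.get(event) ?? [])) {
      listener(payload);
    }
  }
}
```

With VoiceCall the same idea applies: keep the function returned by call.on("status", ...) and invoke it when the listener is no longer needed, for example when a component unmounts.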

Building from source

npm install
npm run build         # ESM + CJS + .d.ts (library mode)
npm run build:iife    # bundled <script> tag build
npm run build:all
npm run typecheck

The examples/demo.html page loads dist/aihumanity-voice.iife.js, so run npm run build:all once before opening it. The npm run demo script does both for you.

Publishing

Before publishing, verify the package still builds and the tarball contents are what npm should receive:

npm whoami
npm pack --dry-run

Publish the scoped package publicly:

npm publish --access public

If npm returns E403 with the message "Two-factor authentication or granular access token with bypass 2fa enabled is required", the package metadata is usually not the problem. Use one of these auth paths:

# Interactive publish with a current 2FA code.
npm publish --access public --otp 123456

# Token publish: configure a granular npm token with read/write package access
# for @aihumanity and "bypass 2FA" enabled.
npm config set //registry.npmjs.org/:_authToken npm_xxx
npm publish --access public

Newer npm versions protect token reads, so npm config get //registry.npmjs.org/:_authToken may fail even when a token is configured. Use npm config list --location=user to confirm the token entry exists without printing the secret.

Roadmap

Streaming partial-emotion confidences (instead of just last label).
Pluggable transcript renderers (Markdown, ReactMarkdown).
Server-side helper to mint short-lived per-user JWTs.
Unit tests for emotion-pattern matching and status mapping.

License

MIT
