@pyai/sdk

v0.2.1

Official TypeScript/JavaScript SDK for PyAI — speech-to-text (Hear), text-to-speech (Speak), realtime voice agents (Omni), and call compliance (Trace).

Official TypeScript/JavaScript SDK for PyAI — the all-in-one voice AI platform: lightning-fast speech-to-text, ultra-realistic text-to-speech, end-to-end realtime voice agents, and automatic call compliance. Zero dependencies; runs in the browser and Node 18+.

PyAI products

  • Hear — Lightning-fast, telephony-native speech-to-text. Whisper-compatible transcription tuned for real phone-call audio, with live streaming partials so your app reacts mid-sentence, plus async batch transcription for big archives. POST /v1/audio/transcriptions
  • Speak — Ultra-realistic text-to-speech that starts speaking in tens of milliseconds. Stream lifelike, expressive voices, choose from 36 studio-quality presets, or clone any voice instantly — for free. POST /v1/audio/speech
  • Omni (flagship) — One API for a complete, end-to-end voice AI agent. A single WebSocket where your agent listens, thinks, and speaks — grounded in your knowledge bases and tools, with human-like turn-taking and instant barge-in — no STT, LLM, or TTS to stitch together yourself. wss://api.pyai.com/v1/omni
  • Trace (flagship) — The compliance API that keeps your AI agents safe. Trace automatically checks every call for HIPAA, TCPA, and PII risks (plus your own brand-voice rules), flags the exact rule broken, redacts sensitive data, and seals each call with a tamper-evident audit trail — so a risky conversation never slips through. GET /v1/trace/interactions
  • Cue — Realtime turn detection + knowledge-grounded context for your own stack. Bring your own LLM and voice; Cue nails the hard part — knowing the instant a speaker finishes and surfacing the right context. wss://api.pyai.com/v1/audio/transcriptions/stream
  • Telephony — Instant managed phone numbers for your voice agents. Provision a US number and route live calls straight into an Omni agent — no carrier contracts, no telephony glue. POST /v1/telephony/numbers

The API contract is the OpenAPI spec at https://api.pyai.com/openapi.json. This SDK wraps it ergonomically with typed errors, automatic retries, and realtime helpers.

Install

npm install @pyai/sdk

Quickstart

import PyAI from "@pyai/sdk";
import { writeFile } from "node:fs/promises";
import { randomUUID } from "node:crypto";

const pyai = new PyAI({ apiKey: process.env.PYAI_API_KEY! });

// Text-to-speech. The default format is mp3, so request wav explicitly
// when writing a .wav file.
const audio = await pyai.audio.speech({ input: "Hello from PyAI.", voice: "stock_emma_en_gb", response_format: "wav" });
// Save to disk in Node (in Bun: await Bun.write("hello.wav", audio)).
// Wrapping in a Response normalizes a Blob / ArrayBuffer / Uint8Array to bytes.
await writeFile("hello.wav", Buffer.from(await new Response(audio).arrayBuffer()));

// Text-to-speech, streamed: start playing/forwarding at the first chunk
// (tens of ms) instead of waiting for the whole clip. Use mp3 for smooth
// progressive playback.
const stream = await pyai.audio.speechStream({ input: "Hello from PyAI.", voice: "stock_emma_en_gb", response_format: "mp3" });
for await (const chunk of stream) writeToSpeakerOrResponse(chunk); // your own audio sink

// Voices
const { data: voices } = await pyai.voices.list({ gender: "female" });

// Async transcription (safe to retry with an idempotency key)
const job = await pyai.transcriptionJobs.create(
  { audio_url: "https://example.com/call.wav", diarize: true },
  { idempotencyKey: randomUUID() },
);
// The job runs asynchronously: fetch it again until it has finished.
const status = await pyai.transcriptionJobs.get(job.job_id);
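Right after `create`, `transcriptionJobs.get` may still report a job in progress, so batch transcripts usually need polling. The README does not document the job's status field, so this minimal sketch keeps the completion check as a caller-supplied predicate; `pollUntil` is a hypothetical helper, not part of @pyai/sdk.

```typescript
// Hypothetical helper (not part of @pyai/sdk): re-run `fetchOnce` until
// `isDone` accepts the result, waiting `intervalMs` between attempts and
// giving up once another wait would pass `timeoutMs`.
async function pollUntil<T>(
  fetchOnce: () => Promise<T>,
  isDone: (value: T) => boolean,
  { intervalMs = 2000, timeoutMs = 10 * 60_000 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await fetchOnce();
    if (isDone(value)) return value;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`pollUntil: not done after ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

For example, `pollUntil(() => pyai.transcriptionJobs.get(job.job_id), (j) => j.status === "completed")`, where `status` and `"completed"` are placeholders to replace with the field and value from the OpenAPI contract.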

Realtime (Omni)

pyai.omni.connect() opens an agentic-voice session and hides the wire protocol, including its frame-key asymmetry: your control frames are keyed on type, while the server's frames are keyed on event. It sends a type-keyed configure frame the instant the socket opens and routes server frames to typed callbacks, so you can't trip the #1 Omni integration bug: a hand-rolled {"event":"configure"} frame is acked but silently dropped, leaving you with a connected session and zero turns:

// Omni is zero-state: the key's org authorizes the session — nothing to create.
const omni = pyai.omni.connect({
  rate: 16000, // 24000 browser · 16000 wideband telephony · 8000 G.711/Twilio
  configure: { voice_id: "stock_emma_en_gb", persona: "You are a receptionist." },
  onAudio: (chunk) => speaker.write(chunk), // binary agent audio — play it out
  onTranscript: (f) => console.log(f.text),
  onError: (e) => console.error(e),
});

omni.sendAudio(pcm16Chunk); // stream caller audio continuously (server-side VAD)
omni.sendDtmf("5");
omni.close();
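To make the frame-key asymmetry concrete, here is a minimal sketch of what connect() handles for you: outgoing control frames carry a `type` key and incoming server frames carry an `event` key. Only the `configure` frame and its `voice_id`/`persona` fields come from this README; the server event names (`transcript`, `error`) and the helper names are hypothetical placeholders, not the documented Omni protocol.

```typescript
// Outgoing control frames are keyed on `type`. A frame keyed on `event`
// would be acked but ignored, per the README.
type ControlFrame = { type: "configure"; voice_id: string; persona: string };

// Incoming server frames are keyed on `event`. These event names are
// hypothetical placeholders for illustration.
type ServerFrame =
  | { event: "transcript"; text: string }
  | { event: "error"; message: string };

function configureFrame(voice_id: string, persona: string): string {
  const frame: ControlFrame = { type: "configure", voice_id, persona };
  return JSON.stringify(frame);
}

// Route a text frame from the socket to a callback by its `event` key.
// No runtime validation of the JSON in this sketch.
function routeServerFrame(
  raw: string,
  handlers: { onTranscript: (text: string) => void; onError: (message: string) => void },
): void {
  const frame = JSON.parse(raw) as ServerFrame;
  switch (frame.event) {
    case "transcript":
      handlers.onTranscript(frame.text);
      break;
    case "error":
      handlers.onError(frame.message);
      break;
  }
}
```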

From the browser, mint an ephemeral token server-side with pyai.omni.createSession({ allowedOrigins }) and pass it as token so the page never holds a secret key:

const omni = pyai.omni.connect({ token: session.token, configure: { voice_id, persona } });

Omni uses the native wss://api.pyai.com/v1/omni surface and is zero-state: there is no agent to create, and sessionLabel is an optional opaque tag. Need the raw socket? pyai.realtimeURL({ product: "omni" }) + pyai.realtimeSubprotocol() (or pyai.connectRealtime()) still work; product: "flow" uses /v1/realtime. The older /v2/omni/chat URL and the agentId option are deprecated but still work.

Streaming speech-to-text (Hear / Cue)

transcriptions.stream() hides the WebSocket frame protocol behind callbacks. It opens wss://api.pyai.com/v1/audio/transcriptions/stream (key carried as the WS subprotocol, so it works in the browser), routes the wire frames to onPartial/onFinal/onError, and gives you sendAudio, commit(), and close():

const hear = pyai.audio.transcriptions.stream({
  sampleRate: 16000,
  onPartial: (f) => console.log("…", f.text),
  onFinal: (f) => console.log("✓", f.text, `(${f.audio_ms}ms)`),
  onError: (e) => console.error(e),
});

micChunks.on("data", (pcm16) => hear.sendAudio(pcm16)); // binary frames
vad.on("end", () => hear.commit());                     // force-finalize an utterance
// hear.close() also flushes a final for any buffered audio
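Both sendAudio calls take 16-bit PCM, while browser and Web Audio capture usually yields Float32 samples in [-1, 1], so a conversion step is typically needed. A minimal sketch, assuming little-endian int16 mono (matching the raw `pcm` format listed for Speak); the README does not state a required chunk size, so the 20 ms `frameMs` default is an assumption.

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit little-endian PCM bytes.
function floatToPcm16(samples: Float32Array): Uint8Array {
  const out = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale: -1 maps to -32768 and +1 maps to 32767.
    const s = Math.max(-1, Math.min(1, samples[i]));
    out.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return new Uint8Array(out.buffer);
}

// Split PCM16 bytes into fixed-duration chunks (20 ms at 16 kHz = 640 bytes).
function chunkPcm16(pcm: Uint8Array, sampleRate: number, frameMs = 20): Uint8Array[] {
  const bytesPerChunk = Math.floor((sampleRate * frameMs) / 1000) * 2; // 2 bytes per sample
  const chunks: Uint8Array[] = [];
  for (let off = 0; off < pcm.length; off += bytesPerChunk) {
    chunks.push(pcm.subarray(off, off + bytesPerChunk));
  }
  return chunks;
}
```

For example, `for (const c of chunkPcm16(floatToPcm16(float32Block), 16000)) hear.sendAudio(c);`, where `float32Block` stands in for whatever your capture pipeline produces.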

Frame types, WS close codes, and error codes are exported as named constants so you never hardcode a magic string:

import { HearFrameType, WSCloseCode, ErrorCode } from "@pyai/sdk";
HearFrameType.SpeechFinal; // "speech_final"
WSCloseCode.OverCapacity;  // 4429
ErrorCode.CreditExhausted; // "credit_exhausted"
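Because close codes are stable numbers, reconnect logic can branch on them instead of parsing reason strings. A minimal sketch of one possible policy, assuming you retry only on over-capacity (4429, the one application close code this README names) and on the standard WebSocket 1006 abnormal closure, with capped exponential backoff; the local constants mirror WSCloseCode rather than importing it, and the policy itself is not something the SDK prescribes.

```typescript
// Codes this sketch treats as retryable (an assumed policy).
const RETRYABLE_CLOSE_CODES = new Set<number>([
  4429, // WSCloseCode.OverCapacity, per the README
  1006, // standard abnormal closure, e.g. a dropped network connection
]);

// Delay before reconnect attempt `attempt` (0-based), or null to stay closed
// (normal closure 1000, auth failures, and anything else not listed above).
function reconnectDelayMs(closeCode: number, attempt: number, baseMs = 500, maxMs = 30_000): number | null {
  if (!RETRYABLE_CLOSE_CODES.has(closeCode)) return null;
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

In a raw-socket `onclose` handler this would read `const d = reconnectDelayMs(e.code, attempt++); if (d !== null) setTimeout(reconnect, d);`, with `reconnect` being your own (hypothetical) function.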

Set grounding: true to turn the stream into Cue (turn detection + KB context): the SDK sends the grounding config when the socket opens, and final/speech_final frames then carry a grounding array of the top KB passages.

In Node, pass a WebSocket implementation if there's no global one: transcriptions.stream({ webSocket: (await import("ws")).WebSocket }).
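A global `WebSocket` exists in browsers and in Node 22+, but not in Node 18 or 20, so code targeting all of them can pick an implementation at runtime and load `ws` only when needed. A minimal sketch; `resolveWebSocket` and `WebSocketCtor` are hypothetical names, and the result is what you would pass as the `webSocket` option.

```typescript
// Loose constructor type shared by the browser WebSocket and the `ws` package.
type WebSocketCtor = new (url: string, protocols?: string | string[]) => unknown;

// Prefer the global WebSocket; otherwise call `loadFallback`, e.g.
// () => import("ws").then((m) => m.WebSocket)
async function resolveWebSocket(loadFallback: () => Promise<WebSocketCtor>): Promise<WebSocketCtor> {
  const globalCtor = (globalThis as unknown as { WebSocket?: WebSocketCtor }).WebSocket;
  return globalCtor ?? loadFallback();
}
```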

Speak audio formats (incl. telephony G.711)

audio.speech encodes server-side into any of eight formats via response_format — the audio comes back already in the shape you need, so telephony callers can drop the hand-rolled resampler + μ-law encoder entirely:

// Twilio/SIP-ready in one param: raw 8 kHz mono μ-law, no client-side DSP.
const ulaw = await pyai.audio.speech({
  input: "Your appointment is confirmed.",
  voice: "stock_emma_en_gb",
  response_format: "g711_ulaw", // -> audio/basic, forced 8 kHz
});
// base64-encode `ulaw` straight into a Twilio media frame.
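The Twilio side of that last comment can be sketched in a few lines. This assumes Twilio Media Streams' bidirectional `media` message shape (`{ event: "media", streamSid, media: { payload } }` with a base64 μ-law/8000 payload), Node's `Buffer`, and 20 ms chunks (160 bytes, since μ-law at 8 kHz is one byte per sample), which is a common choice rather than a documented requirement. `toTwilioMediaMessages` is a hypothetical helper; `streamSid` comes from Twilio's `start` message, and if `ulaw` arrives as an ArrayBuffer, wrap it with `new Uint8Array(ulaw)` first.

```typescript
// μ-law at 8 kHz is one byte per sample, so 20 ms = 160 bytes.
const BYTES_PER_20MS = 160;

// Split raw μ-law bytes into Twilio Media Streams "media" messages (JSON strings).
function toTwilioMediaMessages(ulaw: Uint8Array, streamSid: string, chunkBytes = BYTES_PER_20MS): string[] {
  const messages: string[] = [];
  for (let off = 0; off < ulaw.length; off += chunkBytes) {
    const payload = Buffer.from(ulaw.subarray(off, off + chunkBytes)).toString("base64");
    messages.push(JSON.stringify({ event: "media", streamSid, media: { payload } }));
  }
  return messages;
}
```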

| response_format | sample rates (Hz) | Content-Type |
|---|---|---|
| mp3 (default) | 8000 / 16000 / 24000 / 48000 | audio/mpeg |
| wav | 8000 / 16000 / 24000 / 48000 | audio/wav |
| opus | 8000 / 16000 / 24000 / 48000 | audio/ogg |
| aac | 8000 / 16000 / 24000 / 48000 | audio/aac |
| flac | 8000 / 16000 / 24000 / 48000 | audio/flac |
| pcm (raw int16 LE, no header) | 8000 / 16000 / 24000 / 48000 | audio/pcm |
| g711_ulaw | 8000 (forced) | audio/basic |
| g711_alaw | 8000 (forced) | audio/basic |

sample_rate is optional — omit it for the engine's native 24 kHz (g711_* is always 8 kHz). The set is typed (SpeechFormat) and exported as SPEECH_FORMATS / SPEECH_SAMPLE_RATES for dropdowns and validation. Any other value is a 400 unsupported_format; omit response_format for the default mp3.
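The table reduces to a small client-side check that can catch an unsupported_format before a request is sent, and tells you which rate you will actually get back. A sketch that mirrors the table with local literals; in real code the exported SPEECH_FORMATS / SPEECH_SAMPLE_RATES would replace them, and throwing for an off-table sample_rate is an assumption (the README only documents the 400 for unknown formats).

```typescript
// Local mirror of the table above.
const FORMATS = ["mp3", "wav", "opus", "aac", "flac", "pcm", "g711_ulaw", "g711_alaw"] as const;
const RATES = [8000, 16000, 24000, 48000] as const;

// Effective output rate per the table: g711_* is forced to 8 kHz, everything
// else defaults to the engine's native 24 kHz. Throws on values the table omits.
function effectiveSampleRate(format: string = "mp3", sampleRate?: number): number {
  if (!(FORMATS as readonly string[]).includes(format)) {
    throw new Error(`unsupported_format: ${format}`);
  }
  if (format === "g711_ulaw" || format === "g711_alaw") return 8000;
  if (sampleRate === undefined) return 24000;
  if (!(RATES as readonly number[]).includes(sampleRate)) {
    throw new Error(`sample_rate ${sampleRate} is not one of ${RATES.join(" / ")}`);
  }
  return sampleRate;
}
```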

See examples/speak-telephony-formats for the full before/after: ~120 lines of resampler + μ-law replaced by one param, with Node (@pyai/twilio), Python, and raw-curl snippets.

More APIs: clones, telephony, trace

// Voice clones (Speak)
const { data: clones } = await pyai.clones.list();
const clone = await pyai.clones.create({ name: "Brand VO", file: refAudioBlob });
await pyai.clones.delete(clone.id);

// Managed phone numbers (Telephony)
const { data: avail } = await pyai.telephony.numbers.available({ areaCode: "415" });
const num = await pyai.telephony.numbers.buy({ phone_number: avail[0]!.phone_number, agent_id: "agent_123" });
await pyai.telephony.numbers.assign(num.id, "agent_123");
await pyai.telephony.numbers.release(num.id);

// Compliance (Trace)
const { data: calls } = await pyai.trace.interactions.list({ verdict: "FAIL" });
const detail = await pyai.trace.interactions.get(calls[0]!.id);
await pyai.trace.config.set({ agent_id: "agent_123", enabled: true });
const exposure = await pyai.trace.exposure(30);

// Per-call eval scorecard (timeline + quality metrics). These are additive and
// forward-compatible — present once the engine emits them, so reading them is
// always safe (the timeline reader returns [] until then).
const timeline = await pyai.trace.callTimeline(detail.id); // TraceTimelineTurn[]
const quality = detail.quality_metrics;                    // { wer?, ttfb_ms?, turn_p95_ms?, vaqi?, … }
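Since each quality metric is absent until the engine emits it, display code should treat every field as optional. A minimal sketch; the `QualityMetrics` interface mirrors only the four fields named in the comment above (the real type may carry more), and `formatQuality` is a hypothetical helper that deliberately prints raw values because the README does not state their units or scales.

```typescript
// Mirrors the optional fields named above.
interface QualityMetrics {
  wer?: number;
  ttfb_ms?: number;
  turn_p95_ms?: number;
  vaqi?: number;
}

// Render whichever metrics are present, skipping missing (or null) ones.
function formatQuality(q: QualityMetrics | undefined): string {
  if (!q) return "no quality metrics yet";
  const parts: string[] = [];
  if (q.wer != null) parts.push(`wer=${q.wer}`);
  if (q.ttfb_ms != null) parts.push(`ttfb=${q.ttfb_ms}ms`);
  if (q.turn_p95_ms != null) parts.push(`turn_p95=${q.turn_p95_ms}ms`);
  if (q.vaqi != null) parts.push(`vaqi=${q.vaqi}`);
  return parts.length > 0 ? parts.join(", ") : "no quality metrics yet";
}
```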

Reproducible runs (evals)

audio.speech and audio.transcriptions.create take optional seed and temperature for deterministic eval runs. They're forward-compatible — honored once the engine supports them and otherwise ignored — so it's always safe to send:

await pyai.audio.speech({ input: "Hello", voice: "stock_emma_en_gb", seed: 42, temperature: 0 });
await pyai.audio.transcriptions.create({ file: wavBlob, seed: 42 });
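Once the engine honors seed, one way to confirm a seeded run is reproducible is to compare digests of the outputs; storing digests also lets you check against a baseline from an earlier eval run without keeping the audio. A sketch using Node's built-in `node:crypto`; `audioDigest` and `sameAudio` are hypothetical helpers, and byte-identical output is a stricter bar than the API necessarily promises.

```typescript
import { createHash } from "node:crypto";

// SHA-256 hex digest of an audio payload (or any bytes).
function audioDigest(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// True when two runs produced byte-identical output.
function sameAudio(a: Uint8Array, b: Uint8Array): boolean {
  return audioDigest(a) === audioDigest(b);
}
```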

Errors

Failures throw PyAIError with a stable code (branch on it, not the message):

import { PyAIError } from "@pyai/sdk";

try {
  await pyai.audio.speech({ input: "hi" });
} catch (err) {
  if (err instanceof PyAIError && err.code === "credit_exhausted") {
    // out of prepaid credit — add credit or use a sandbox key
  }
}

Common codes: unauthorized, forbidden, credit_exhausted, rate_limit_exceeded, concurrency_limit_exceeded, idempotency_conflict. 429/5xx are retried automatically (honoring Retry-After); tune with new PyAI({ apiKey, maxRetries }).
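The SDK already retries 429/5xx for you; the sketch below only illustrates what honoring Retry-After typically involves, for example if you queue calls yourself. It is not the SDK's code: per RFC 9110, Retry-After is either delay-seconds or an HTTP-date, and the capped exponential backoff with full jitter used when the header is absent is an assumed policy. `parseRetryAfterMs` and `retryDelayMs` are hypothetical names.

```typescript
// Parse a Retry-After value into milliseconds: delay-seconds ("120") or an HTTP-date.
function parseRetryAfterMs(value: string | null, now: number = Date.now()): number | null {
  if (value === null) return null;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const at = Date.parse(trimmed);
  return Number.isNaN(at) ? null : Math.max(0, at - now);
}

interface BackoffOptions {
  baseMs?: number;
  capMs?: number;
  random?: () => number; // injectable for deterministic tests
}

// Delay before retry number `attempt` (0-based), or null if the status is not
// retryable. The server's hint wins; otherwise use capped exponential backoff
// with full jitter.
function retryDelayMs(
  status: number,
  retryAfter: string | null,
  attempt: number,
  { baseMs = 500, capMs = 20_000, random = Math.random }: BackoffOptions = {},
): number | null {
  if (status !== 429 && status < 500) return null;
  const hinted = parseRetryAfterMs(retryAfter);
  if (hinted !== null) return hinted;
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt));
}
```

With `fetch`, the header value is `response.headers.get("retry-after")`, which already has the `string | null` type these helpers expect.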

CLI (pyai)

Installing the package also provides a pyai command. pyai doctor runs a deeper diagnosis — it introspects your key/scopes via GET /v1/me (skipped gracefully if the route isn't deployed yet), checks endpoint liveness, runs a Speak→Hear round-trip, and prints remediation hints for any failure:

export PYAI_API_KEY=pyai_test_...
npx pyai doctor
# PASS  key (/v1/me)  — env=test; 3 scope(s): hear:transcribe, voice:synthesize, hear:stream
# PASS  models.list  — 12 models
# PASS  voices.list  — 38 voices
# PASS  speak→hear round-trip  — synth 45210 bytes → "the quick brown fox…"
# Diagnosis: healthy. Key, endpoint, and a Speak→Hear round-trip all work.

npx pyai smoke   # lighter: models + voices + speak

Other commands:

pyai models
pyai voices --gender female --region en_us
pyai speak --text "Hello" --voice stock_emma_en_gb --out hello.wav
pyai transcribe --url https://example.com/call.wav --diarize --poll

Auth comes from PYAI_API_KEY / PYAI_BASE_URL (or --api-key / --base-url).

Develop

npm install
npm test         # node --test, fetch injected (no network)
npm run build    # emits dist/ (incl. the pyai CLI bin)