@apicity/elevenlabs

v0.1.0

ElevenLabs provider for sound effect generation, text-to-speech, and audio APIs.

Downloads

309

npm · zero dependencies · TypeScript

Installation

npm install @apicity/elevenlabs
# or
pnpm add @apicity/elevenlabs

Quick Start

import { elevenlabs as createElevenlabs } from "@apicity/elevenlabs";

const elevenlabs = createElevenlabs({ apiKey: process.env.ELEVENLABS_API_KEY! });

Real-world example: generate a sound effect, then transcribe a clip with Scribe v2

ElevenLabs' two flagship audio surfaces fit together cleanly: text-to-sound-effects returns raw MP3 bytes, and Scribe v2 returns a typed transcript with word-level timestamps plus tagged audio events. The example below generates a UI click, then transcribes a separate clip with tag_audio_events: true. It mirrors what tests/integration/elevenlabs-sound-generation.test.ts and tests/integration/elevenlabs-speech-to-text.test.ts replay against tests/recordings/elevenlabs_2379486140/, so every payload, response field, and byte count below comes straight from the recorded HARs.

import { readFileSync, writeFileSync } from "node:fs";
import { elevenlabs as createElevenlabs } from "@apicity/elevenlabs";
import type { ElevenLabsTranscript } from "@apicity/elevenlabs";

const elevenlabs = createElevenlabs({ apiKey: process.env.ELEVENLABS_API_KEY! });

// 1. Generate a 0.5s UI click. soundGeneration returns the raw MP3 as
//    an ArrayBuffer — there's no JSON wrapper, the response body is
//    audio/mpeg straight off the wire. duration_seconds (0.5–30) caps
//    the clip length and prompt_influence (0–1) trades prompt-fidelity
//    for creative variation. The factory also accepts `output_format`
//    on the same request object and silently moves it to the URL query.
const audio = await elevenlabs.v1.soundGeneration({
  text: "soft ui click",
  duration_seconds: 0.5,
  prompt_influence: 0.3,
});

writeFileSync("./click.mp3", new Uint8Array(audio));
console.log(`Generated ${audio.byteLength} bytes of audio/mpeg`);
// → "Generated 11764 bytes of audio/mpeg"
//   ElevenLabs charged 10 characters for this call (visible in the
//   `character-cost` response header on the original request).

// 2. Transcribe a separate audio clip with Scribe v2. The request goes
//    up as multipart/form-data — pass a Blob and the rest as ergonomic
//    fields; the factory packs the form, sets xi-api-key, and parses
//    the JSON response. tag_audio_events: true tells Scribe to surface
//    non-speech events ([phone beeping], [laughter], [applause]) inline
//    with words instead of dropping them.
const phoneBeep = readFileSync("./phone-beeping.mp3"); // 2,528 bytes
const file = new Blob([phoneBeep], { type: "audio/mp3" });

const result = (await elevenlabs.v1.speechToText({
  file,
  model_id: "scribe_v2",
  language_code: "eng",
  tag_audio_events: true,
})) as ElevenLabsTranscript;

// 3. The transcript is rich. `text` is the human-readable form;
//    `words` is the per-token breakdown with absolute timestamps and a
//    `type` discriminator ("word" | "spacing" | "audio_event") plus a
//    `logprob` confidence. `transcription_id` is durable — you can
//    retrieve the same transcript later through the history API.
console.log(
  `${result.language_code} · ${(result.language_probability * 100).toFixed(0)}% confident`,
);
// → "eng · 100% confident"
console.log(
  `${result.audio_duration_secs}s · transcription_id=${result.transcription_id}`,
);
// → "0.5s · transcription_id=CeeidI2QJ8kkN1mcq8HX"
console.log(result.text);
// → "[phone beeping]"

// 4. Walk the words array, splitting audio events from spoken words
//    via the `type` discriminator. On a clip with no speech every
//    entry is an audio_event; on real speech you'll see "word" and
//    "spacing" entries interleaved with bracketed events.
for (const w of result.words) {
  const tag =
    w.type === "audio_event"
      ? "event"
      : w.type === "word"
        ? "word "
        : "space";
  console.log(
    `  ${tag}  [${w.start.toFixed(2)}–${w.end.toFixed(2)}s]  ${w.text}` +
      (w.logprob !== undefined ? ` (logprob ${w.logprob.toFixed(3)})` : ""),
  );
}
// → "  event  [0.00–0.44s]  [phone beeping] (logprob -0.335)"
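The example casts the response with as ElevenLabsTranscript. If a call may also run with webhook: true, a type guard built on an in check for text avoids the cast. The sketch below uses minimal local stand-in types so it runs on its own. The word and transcript fields come from the example output. The acknowledgement's message field is a hypothetical placeholder, not the real ElevenLabsWebhookAcknowledgement shape.

```typescript
// Minimal local stand-ins for the package's types (assumption: only the
// fields used in the README example are modelled here).
interface WordSketch {
  text: string;
  start: number;
  end: number;
  type: "word" | "spacing" | "audio_event";
  logprob?: number;
}

interface TranscriptSketch {
  text: string;
  language_code: string;
  words: WordSketch[];
}

interface WebhookAckSketch {
  message: string; // hypothetical field, check the real acknowledgement type
}

type SpeechToTextResultSketch = TranscriptSketch | WebhookAckSketch;

// Type guard: a transcript carries `text`, the acknowledgement does not.
function isTranscript(r: SpeechToTextResultSketch): r is TranscriptSketch {
  return "text" in r;
}

// Collect only the bracketed audio events, e.g. "[phone beeping]".
function audioEvents(r: SpeechToTextResultSketch): string[] {
  if (!isTranscript(r)) return []; // webhook mode: the transcript arrives later
  return r.words.filter((w) => w.type === "audio_event").map((w) => w.text);
}

// Shaped like the recorded Scribe v2 response from the example.
const recorded: SpeechToTextResultSketch = {
  text: "[phone beeping]",
  language_code: "eng",
  words: [
    { text: "[phone beeping]", start: 0, end: 0.44, type: "audio_event", logprob: -0.335 },
  ],
};

console.log(audioEvents(recorded)); // → [ '[phone beeping]' ]
console.log(audioEvents({ message: "queued" })); // → []
```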

Notes

  • soundGeneration returns binary, not JSON — the provider already reads it as arrayBuffer() and hands you an ArrayBuffer. Pass output_format: "mp3_44100_128" (or any other ElevenLabs codec string) on the request object and the factory will strip it from the body and move it to the ?output_format= URL query.
  • speechToText accepts either a file Blob or a cloud_storage_url (S3/GCS/HTTP). For long-form audio set webhook: true — the call returns a small ElevenLabsWebhookAcknowledgement instead of the transcript, and the finished result is delivered to your registered webhook. Type-narrow the union with "text" in result before reading transcript fields.
  • Set diarize: true and num_speakers to label words by speaker; the per-word speaker_id field gets populated in that mode. Combine with use_multi_channel: true for stereo audio and the response switches to ElevenLabsMultichannelTranscript (one transcript per channel under transcripts[]).
  • Errors throw ElevenLabsError with status, code, and the parsed body attached. ElevenLabs returns either FastAPI's { detail: [{msg, ...}] } shape or { detail: { status, message } }; the client normalises both into error.message.
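ElevenLabs error bodies arrive in one of two shapes. Here is a rough sketch of folding both into a single message string. errorMessageSketch is a hypothetical helper, not the package's ElevenLabsError code, and the example bodies are made up for illustration. The plain-string branch covers FastAPI's default HTTPException body ({ detail: "..." }), which the notes do not mention.

```typescript
// Hypothetical sketch, not the package's implementation.
type FastApiDetail = { detail: Array<{ msg: string; loc?: (string | number)[] }> };
type ElevenLabsDetail = { detail: { status: string; message: string } };

function errorMessageSketch(status: number, body: unknown): string {
  const detail = (body as { detail?: unknown } | null)?.detail;
  if (Array.isArray(detail)) {
    // FastAPI validation shape: join each entry's `msg`.
    return detail
      .map((d) => (typeof d?.msg === "string" ? d.msg : JSON.stringify(d)))
      .join("; ");
  }
  if (detail && typeof detail === "object" && "message" in detail) {
    // ElevenLabs shape: { detail: { status, message } }.
    return String((detail as { message: unknown }).message);
  }
  if (typeof detail === "string") return detail; // plain FastAPI HTTPException
  return `HTTP ${status}`; // unknown shape: fall back to the status code
}

// Made-up example bodies, one per shape.
const validation: FastApiDetail = { detail: [{ msg: "Field required", loc: ["body", "text"] }] };
const keyError: ElevenLabsDetail = { detail: { status: "invalid_api_key", message: "Invalid API key" } };

console.log(errorMessageSketch(422, validation)); // → Field required
console.log(errorMessageSketch(401, keyError)); // → Invalid API key
console.log(errorMessageSketch(500, null)); // → HTTP 500
```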

API Reference

2 endpoints across 2 groups. Each method mirrors an upstream URL path.

soundGeneration

POST https://api.elevenlabs.io/v1/sound-generation

const res = await elevenlabs.v1.soundGeneration({ /* ... */ });

Source: packages/provider/elevenlabs/src/elevenlabs.ts
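soundGeneration sends output_format in the URL query and everything else in the JSON body. The sketch below shows one way to do that split. buildSoundGenerationRequest is a hypothetical helper, not the package's source. The ranges (duration_seconds 0.5 to 30, prompt_influence 0 to 1) come from the README comments. Checking them locally is an assumption of this sketch, not confirmed package behaviour.

```typescript
// Hypothetical sketch: split soundGeneration params into URL query + JSON body.
interface SoundGenerationParamsSketch {
  text: string;
  duration_seconds?: number; // 0.5 to 30
  prompt_influence?: number; // 0 to 1
  output_format?: string; // e.g. "mp3_44100_128"
}

function buildSoundGenerationRequest(params: SoundGenerationParamsSketch) {
  // Pull output_format out so it never reaches the JSON body.
  const { output_format, ...body } = params;

  if (
    body.duration_seconds !== undefined &&
    (body.duration_seconds < 0.5 || body.duration_seconds > 30)
  ) {
    throw new RangeError("duration_seconds must be between 0.5 and 30");
  }
  if (
    body.prompt_influence !== undefined &&
    (body.prompt_influence < 0 || body.prompt_influence > 1)
  ) {
    throw new RangeError("prompt_influence must be between 0 and 1");
  }

  const url = new URL("https://api.elevenlabs.io/v1/sound-generation");
  if (output_format !== undefined) url.searchParams.set("output_format", output_format);

  return { url: url.toString(), body: JSON.stringify(body) };
}

const req = buildSoundGenerationRequest({
  text: "soft ui click",
  duration_seconds: 0.5,
  prompt_influence: 0.3,
  output_format: "mp3_44100_128",
});
console.log(req.url);
// → https://api.elevenlabs.io/v1/sound-generation?output_format=mp3_44100_128
console.log(req.body);
// → {"text":"soft ui click","duration_seconds":0.5,"prompt_influence":0.3}
```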

speechToText

POST https://api.elevenlabs.io/v1/speech-to-text

const res = await elevenlabs.v1.speechToText({ /* ... */ });

Source: packages/provider/elevenlabs/src/elevenlabs.ts
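speechToText goes up as multipart/form-data, and the factory packs the form for you. To show roughly what that involves, here is a hypothetical buildSpeechToTextForm using Node 18+ global Blob and FormData. It is not the package's code. The exactly-one-source check, the placeholder filename, and the String() coercion of booleans and numbers are assumptions of the sketch. The two-byte Blob is placeholder data, not real audio.

```typescript
// Hypothetical sketch, not the package's implementation: pack speechToText
// params into multipart/form-data using Node 18+ globals (Blob, FormData).
interface SpeechToTextParamsSketch {
  file?: Blob;
  cloud_storage_url?: string;
  model_id: string;
  language_code?: string;
  tag_audio_events?: boolean;
  diarize?: boolean;
  num_speakers?: number;
}

function buildSpeechToTextForm(params: SpeechToTextParamsSketch): FormData {
  // Assumption: exactly one audio source (file or cloud_storage_url) per request.
  if (!params.file === !params.cloud_storage_url) {
    throw new Error("pass exactly one of file or cloud_storage_url");
  }
  const form = new FormData();
  for (const [key, value] of Object.entries(params)) {
    if (value === undefined) continue; // omit unset optional fields
    if (value instanceof Blob) {
      form.append(key, value, "audio.mp3"); // placeholder filename
    } else {
      form.append(key, String(value)); // true -> "true", 2 -> "2"
    }
  }
  return form;
}

const form = buildSpeechToTextForm({
  file: new Blob([new Uint8Array([0xff, 0xfb])], { type: "audio/mpeg" }),
  model_id: "scribe_v2",
  language_code: "eng",
  tag_audio_events: true,
});
console.log(form.get("model_id"), form.get("tag_audio_events")); // → scribe_v2 true
```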

Middleware

import { elevenlabs as createElevenlabs, withRetry } from "@apicity/elevenlabs";

const elevenlabs = createElevenlabs({ apiKey: process.env.ELEVENLABS_API_KEY! });
const soundGeneration = withRetry(elevenlabs.v1.soundGeneration, { retries: 3 });
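withRetry's exact behaviour, such as which errors it retries and its backoff timing, is not documented here. For a rough picture of what a wrapper of this kind does, here is a hypothetical retrySketch with exponential backoff. It is not the package's implementation. A production wrapper would typically retry only transient failures such as 429 or 5xx responses. This sketch retries every rejection for brevity.

```typescript
// Hypothetical retry wrapper with exponential backoff; not the package's withRetry.
interface RetryOptionsSketch {
  retries: number; // extra attempts after the first call
  baseDelayMs?: number; // first backoff delay; doubles after each failure
}

function retrySketch<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  { retries, baseDelayMs = 250 }: RetryOptionsSketch,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    let lastError: unknown;
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        return await fn(...args);
      } catch (err) {
        lastError = err;
        if (attempt < retries) {
          // Wait baseDelayMs, 2x, 4x, ... before the next attempt.
          const delay = baseDelayMs * 2 ** attempt;
          await new Promise((resolve) => setTimeout(resolve, delay));
        }
      }
    }
    throw lastError; // every attempt failed: surface the last error
  };
}

// Usage: a stand-in call that fails twice, then succeeds.
let calls = 0;
const flaky = async (text: string): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error(`transient failure #${calls}`);
  return `ok: ${text}`;
};

retrySketch(flaky, { retries: 3, baseDelayMs: 10 })("soft ui click").then((result) => {
  console.log(`${result} after ${calls} calls`); // → ok: soft ui click after 3 calls
});
```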

Part of the apicity monorepo.

License

MIT — see LICENSE.