convex-elevenlabs (v0.1.0)
An ElevenLabs component for Convex.
Convex ElevenLabs
A Convex component for async speech-to-text transcription using ElevenLabs. Transcriptions are processed asynchronously via webhooks, so your functions return immediately while ElevenLabs processes the audio in the background.
Found a bug? Feature request? File it here.
Installation
```bash
npm install convex-elevenlabs
```

Add the component to your Convex app:
```ts
// convex/convex.config.ts
import { defineApp } from "convex/server";
import elevenlabs from "convex-elevenlabs/convex.config";

const app = defineApp();
app.use(elevenlabs);

export default app;
```

ElevenLabs Setup
1. Get your API key from the ElevenLabs dashboard.
2. Create a webhook in the ElevenLabs webhooks settings:
   - Set the URL to `https://<your-convex-deployment>.convex.site/elevenlabs/webhook`
   - Enable the "Transcription completed" event
   - Copy the Webhook ID and Webhook Secret
3. Set environment variables in your Convex dashboard:
```bash
ELEVENLABS_API_KEY=your_api_key
ELEVENLABS_WEBHOOK_ID=your_webhook_id
ELEVENLABS_WEBHOOK_SECRET=your_webhook_secret
```

Usage
Initialize the Client
```ts
// convex/elevenlabs.ts
import { ElevenLabs } from "convex-elevenlabs";
import { components } from "./_generated/api";

const elevenlabs = new ElevenLabs(components.elevenlabs);
```

Start a Transcription
Call `startTranscription` from an action with either a URL or file bytes. It returns immediately with a `jobId` while ElevenLabs processes the audio asynchronously.
From URL:
```ts
import { action } from "./_generated/server";
import { v } from "convex/values";

export const transcribeFromUrl = action({
  args: { url: v.string() },
  handler: async (ctx, args) => {
    return await elevenlabs.startTranscription(ctx, {
      url: args.url,
      modelId: "scribe_v1",
      options: {},
    });
  },
});
```

From file:
```ts
export const transcribeFromFile = action({
  args: {
    file: v.bytes(),
    fileName: v.string(),
  },
  handler: async (ctx, args) => {
    return await elevenlabs.startTranscription(ctx, {
      file: args.file,
      fileName: args.fileName,
      modelId: "scribe_v1",
      options: {},
    });
  },
});
```

When using file upload, the file is sent directly to ElevenLabs and is not stored in your Convex database. Supported formats include MP3, WAV, M4A, MP4, and other major audio/video formats. The maximum file size is 3GB.
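The format and size constraints above can be checked before uploading. A minimal sketch, assuming a client-side helper; the helper name and the (partial) extension list are illustrative and not part of this component's API:

```typescript
// Hypothetical pre-upload check. The extension list is a partial sample of the
// formats named above; the 3GB cap mirrors the documented limit.
const SUPPORTED_EXTENSIONS = new Set(["mp3", "wav", "m4a", "mp4"]);
const MAX_FILE_BYTES = 3 * 1024 * 1024 * 1024; // 3GB

function validateAudioFile(
  fileName: string,
  sizeBytes: number,
): { ok: boolean; reason?: string } {
  const ext = fileName.split(".").pop()?.toLowerCase() ?? "";
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `Unsupported extension: .${ext}` };
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return { ok: false, reason: "File exceeds the 3GB limit" };
  }
  return { ok: true };
}
```

Rejecting bad inputs before calling `startTranscription` saves a round trip to ElevenLabs and a wasted webhook delivery.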
Handle Webhook Callbacks
Register the webhook handler in your http.ts to receive transcription results:
```ts
// convex/http.ts
import { httpRouter } from "convex/server";
import { ElevenLabs } from "convex-elevenlabs";
import { components } from "./_generated/api";

const http = httpRouter();
const elevenlabs = new ElevenLabs(components.elevenlabs);

elevenlabs.registerWebhook(http, {
  path: "/elevenlabs/webhook",
  onComplete: async (ctx, result) => {
    console.log("Transcription completed:", result.transcription.text);
    // Save the transcription to your own table
    // await ctx.runMutation(internal.transcriptions.save, {
    //   text: result.transcription.text,
    // });
  },
  onError: async (ctx, result) => {
    console.error("Transcription failed:", result.job);
  },
});

export default http;
```

Typed Metadata
Pass custom metadata with your transcription request and receive it back in the webhook callback with full type safety:
```ts
// Define your metadata type when creating the client
const elevenlabs = new ElevenLabs<{ documentId: string; userId: string }>(
  components.elevenlabs,
);

// Pass metadata when starting transcription
await elevenlabs.startTranscription(ctx, {
  url: args.url,
  modelId: "scribe_v1",
  options: {
    metadata: {
      documentId: "doc_123",
      userId: "user_456",
    },
  },
});

// Access typed metadata in the webhook callback
elevenlabs.registerWebhook(http, {
  onComplete: async (ctx, result) => {
    // result.requestMetadata is typed as { documentId: string; userId: string }
    console.log(result.requestMetadata.documentId);
  },
});
```

Transcription Options
All options are optional:
| Option | Type | Description |
| ----------------------- | --------------------------------- | -------------------------------------------------------------------------------------- |
| languageCode | string | ISO-639-1/3 language code. Auto-detected if not provided. |
| diarize | boolean | Annotate which speaker is talking. Default: false |
| numSpeakers | number | Max speakers in the file (up to 32). |
| timestampsGranularity | "none" \| "word" \| "character" | Timestamp detail level. Default: "word" |
| diarizationThreshold | number | Speaker diarization threshold (when diarize=true). |
| tagAudioEvents | boolean | Tag events like (laughter), (footsteps). Default: true |
| temperature | number | Randomness control (0.0 to 2.0). |
| seed | number | For deterministic sampling (0 to 2147483647). |
| useMultiChannel | boolean | Transcribe each audio channel independently. Max 5 channels. |
| entityDetection | string \| string[] | Detect entities: "all", "pii", "phi", "pci", "other", "offensive_language" |
| keyterms | string[] | Words/phrases to bias transcription towards (max 100 terms). |
| metadata | object | Custom metadata included in webhook response. |
Example with Options
```ts
await elevenlabs.startTranscription(ctx, {
  url: args.url,
  modelId: "scribe_v2",
  options: {
    diarize: true,
    numSpeakers: 2,
    languageCode: "en",
    entityDetection: ["pii", "phi"],
    keyterms: ["Convex", "ElevenLabs"],
  },
});
```

Webhook Response
The onComplete callback receives a result object with:
- `status`: "completed"
- `job`: The transcription job record
- `transcription`: The full transcription result, including:
  - `text`: The transcribed text
  - `words`: Array of words with timestamps and speaker IDs
  - `language_code`: Detected language
  - `language_probability`: Confidence score
- `requestMetadata`: Your custom metadata (if provided)
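When diarization is enabled, the `words` array can be folded into a speaker-labeled transcript inside `onComplete`. A minimal sketch, assuming each word object carries `text` and `speaker_id` fields (field names inferred from the description above, so verify them against an actual payload):

```typescript
// Hypothetical word shape; field names are assumed from the response
// description above, not confirmed against the ElevenLabs payload.
interface TranscriptWord {
  text: string;
  speaker_id?: string;
}

// Group consecutive words by speaker into lines like "speaker_0: Hello there".
function formatBySpeaker(words: TranscriptWord[]): string[] {
  const lines: string[] = [];
  let currentSpeaker: string | undefined;
  for (const word of words) {
    const speaker = word.speaker_id ?? "unknown";
    if (speaker !== currentSpeaker) {
      lines.push(`${speaker}: ${word.text}`);
      currentSpeaker = speaker;
    } else {
      lines[lines.length - 1] += ` ${word.text}`;
    }
  }
  return lines;
}
```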
Using with Convex Workflows
This component integrates with `@convex-dev/workflow` using `awaitEvent` to pause a workflow until transcription completes.
1. Type your metadata to include the event ID:
```ts
type TranscriptionMetadata = { eventId: string };

const elevenlabs = new ElevenLabs<TranscriptionMetadata>(components.elevenlabs);
```

2. In your workflow, create an event and pass its ID via metadata:
```ts
const eventId = await workflow.createEvent(ctx, {
  name: "transcriptionComplete",
  workflowId: ctx.workflowId,
});

// Start transcription, passing the event ID in metadata
await ctx.runAction(internal.example.startTranscription, {
  url: args.audioUrl,
  eventId,
});

// Workflow pauses here until the webhook sends the event
const result = await ctx.awaitEvent({ id: eventId });
```

3. In the webhook callback, send the event to resume the workflow:
```ts
elevenlabs.registerWebhook(http, {
  onComplete: async (ctx, result) => {
    await workflow.sendEvent(ctx, {
      id: result.requestMetadata.eventId,
      value: {
        text: result.transcription.text,
        languageCode: result.transcription.language_code,
      },
    });
  },
});
```

This pattern allows transcription to run asynchronously while the workflow durably waits, surviving server restarts.
Local Development
```bash
npm install
npm run dev
```