convex-elevenlabs (v0.1.0)
An ElevenLabs component for Convex.
Convex ElevenLabs
A Convex component for async speech-to-text transcription using ElevenLabs. Transcriptions are processed asynchronously via webhooks, so your functions return immediately while ElevenLabs processes the audio in the background.
Found a bug? Feature request? File it here.
Installation
```bash
npm install convex-elevenlabs
```

Add the component to your Convex app:
```ts
// convex/convex.config.ts
import { defineApp } from "convex/server";
import elevenlabs from "convex-elevenlabs/convex.config";

const app = defineApp();
app.use(elevenlabs);

export default app;
```

ElevenLabs Setup
1. Get your API key from the ElevenLabs dashboard.
2. Create a webhook in the ElevenLabs webhooks settings:
   - Set the URL to `https://<your-convex-deployment>.convex.site/elevenlabs/webhook`
   - Enable the "Transcription completed" event
   - Copy the Webhook ID and Webhook Secret
3. Set environment variables in your Convex dashboard:
```bash
ELEVENLABS_API_KEY=your_api_key
ELEVENLABS_WEBHOOK_ID=your_webhook_id
ELEVENLABS_WEBHOOK_SECRET=your_webhook_secret
```

Usage
Initialize the Client
```ts
// convex/elevenlabs.ts
import { ElevenLabs } from "convex-elevenlabs";
import { components } from "./_generated/api";

const elevenlabs = new ElevenLabs(components.elevenlabs);
```

Start a Transcription
Call `startTranscription` from an action with either a URL or file bytes. It returns immediately with a `jobId` while ElevenLabs processes the audio asynchronously.
From URL:
```ts
import { action } from "./_generated/server";
import { v } from "convex/values";

export const transcribeFromUrl = action({
  args: { url: v.string() },
  handler: async (ctx, args) => {
    return await elevenlabs.startTranscription(ctx, {
      url: args.url,
      modelId: "scribe_v1",
      options: {},
    });
  },
});
```

From file:
```ts
export const transcribeFromFile = action({
  args: {
    file: v.bytes(),
    fileName: v.string(),
  },
  handler: async (ctx, args) => {
    return await elevenlabs.startTranscription(ctx, {
      file: args.file,
      fileName: args.fileName,
      modelId: "scribe_v1",
      options: {},
    });
  },
});
```

When using file upload, the file is sent directly to ElevenLabs and is not stored in your Convex database. Supported formats include MP3, WAV, M4A, MP4, and other major audio/video formats. The maximum file size is 3GB.
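The format and size constraints above can be checked before uploading. A minimal sketch, assuming a client-side helper; the helper name and the (partial) extension list are illustrative and not part of this component's API:

```typescript
// Hypothetical pre-upload check. The extension list is a partial sample of the
// formats named above; the 3GB cap mirrors the documented limit.
const SUPPORTED_EXTENSIONS = new Set(["mp3", "wav", "m4a", "mp4"]);
const MAX_FILE_BYTES = 3 * 1024 * 1024 * 1024; // 3GB

function validateAudioFile(
  fileName: string,
  sizeBytes: number,
): { ok: boolean; reason?: string } {
  const ext = fileName.split(".").pop()?.toLowerCase() ?? "";
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `Unsupported extension: .${ext}` };
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return { ok: false, reason: "File exceeds the 3GB limit" };
  }
  return { ok: true };
}
```

Rejecting bad inputs before calling `startTranscription` saves a round trip to ElevenLabs and a wasted webhook delivery.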
Handle Webhook Callbacks
Register the webhook handler in your http.ts to receive transcription results:
```ts
// convex/http.ts
import { httpRouter } from "convex/server";
import { ElevenLabs } from "convex-elevenlabs";
import { components } from "./_generated/api";

const http = httpRouter();
const elevenlabs = new ElevenLabs(components.elevenlabs);

elevenlabs.registerWebhook(http, {
  path: "/elevenlabs/webhook",
  onComplete: async (ctx, result) => {
    console.log("Transcription completed:", result.transcription.text);
    // Save the transcription to your own table
    // await ctx.runMutation(internal.transcriptions.save, {
    //   text: result.transcription.text,
    // });
  },
  onError: async (ctx, result) => {
    console.error("Transcription failed:", result.job);
  },
});

export default http;
```

Typed Metadata
Pass custom metadata with your transcription request and receive it back in the webhook callback with full type safety:
```ts
// Define your metadata type when creating the client
const elevenlabs = new ElevenLabs<{ documentId: string; userId: string }>(
  components.elevenlabs,
);

// Pass metadata when starting transcription
await elevenlabs.startTranscription(ctx, {
  url: args.url,
  modelId: "scribe_v1",
  options: {
    metadata: {
      documentId: "doc_123",
      userId: "user_456",
    },
  },
});

// Access typed metadata in the webhook callback
elevenlabs.registerWebhook(http, {
  onComplete: async (ctx, result) => {
    // result.requestMetadata is typed as { documentId: string; userId: string }
    console.log(result.requestMetadata.documentId);
  },
});
```

Transcription Options
All options are optional:
| Option | Type | Description |
| ----------------------- | --------------------------------- | -------------------------------------------------------------------------------------- |
| languageCode | string | ISO-639-1/3 language code. Auto-detected if not provided. |
| diarize | boolean | Annotate which speaker is talking. Default: false |
| numSpeakers | number | Max speakers in the file (up to 32). |
| timestampsGranularity | "none" \| "word" \| "character" | Timestamp detail level. Default: "word" |
| diarizationThreshold | number | Speaker diarization threshold (when diarize=true). |
| tagAudioEvents | boolean | Tag events like (laughter), (footsteps). Default: true |
| temperature | number | Randomness control (0.0 to 2.0). |
| seed | number | For deterministic sampling (0 to 2147483647). |
| useMultiChannel | boolean | Transcribe each audio channel independently. Max 5 channels. |
| entityDetection | string \| string[] | Detect entities: "all", "pii", "phi", "pci", "other", "offensive_language" |
| keyterms | string[] | Words/phrases to bias transcription towards (max 100 terms). |
| metadata | object | Custom metadata included in webhook response. |
Example with Options
```ts
await elevenlabs.startTranscription(ctx, {
  url: args.url,
  modelId: "scribe_v2",
  options: {
    diarize: true,
    numSpeakers: 2,
    languageCode: "en",
    entityDetection: ["pii", "phi"],
    keyterms: ["Convex", "ElevenLabs"],
  },
});
```

Webhook Response
The onComplete callback receives a result object with:
- `status`: "completed"
- `job`: The transcription job record
- `transcription`: The full transcription result, including:
  - `text`: The transcribed text
  - `words`: Array of words with timestamps and speaker IDs
  - `language_code`: Detected language
  - `language_probability`: Confidence score
- `requestMetadata`: Your custom metadata (if provided)
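When diarization is enabled, the `words` array can be folded into a speaker-labeled transcript inside `onComplete`. A minimal sketch, assuming each word object carries `text` and `speaker_id` fields (field names inferred from the description above, so verify them against an actual payload):

```typescript
// Hypothetical word shape; field names are assumed from the response
// description above, not confirmed against the ElevenLabs payload.
interface TranscriptWord {
  text: string;
  speaker_id?: string;
}

// Group consecutive words by speaker into lines like "speaker_0: Hello there".
function formatBySpeaker(words: TranscriptWord[]): string[] {
  const lines: string[] = [];
  let currentSpeaker: string | undefined;
  for (const word of words) {
    const speaker = word.speaker_id ?? "unknown";
    if (speaker !== currentSpeaker) {
      lines.push(`${speaker}: ${word.text}`);
      currentSpeaker = speaker;
    } else {
      lines[lines.length - 1] += ` ${word.text}`;
    }
  }
  return lines;
}
```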
Using with Convex Workflows
This component integrates with `@convex-dev/workflow` using `awaitEvent` to pause a workflow until transcription completes.
1. Type your metadata to include the event ID:
```ts
type TranscriptionMetadata = { eventId: string };

const elevenlabs = new ElevenLabs<TranscriptionMetadata>(components.elevenlabs);
```

2. In your workflow, create an event and pass its ID via metadata:
```ts
const eventId = await workflow.createEvent(ctx, {
  name: "transcriptionComplete",
  workflowId: ctx.workflowId,
});

// Start transcription, passing the event ID in metadata
await ctx.runAction(internal.example.startTranscription, {
  url: args.audioUrl,
  eventId,
});

// Workflow pauses here until the webhook sends the event
const result = await ctx.awaitEvent({ id: eventId });
```

3. In the webhook callback, send the event to resume the workflow:
```ts
elevenlabs.registerWebhook(http, {
  onComplete: async (ctx, result) => {
    await workflow.sendEvent(ctx, {
      id: result.requestMetadata.eventId,
      value: {
        text: result.transcription.text,
        languageCode: result.transcription.language_code,
      },
    });
  },
});
```

This pattern allows transcription to run asynchronously while the workflow durably waits, surviving server restarts.
Local Development
```bash
npm install
npm run dev
```