@reaatech/media-pipeline-mcp-audio-gen

v0.3.0

Audio generation operations — TTS, STT, diarization, source separation, music generation, sound effects (provider delegation)

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Audio generation and processing operations — text-to-speech, speech-to-text, speaker diarization, source separation, music generation, and sound effects — all via provider delegation with multi-provider routing.

Installation

npm install @reaatech/media-pipeline-mcp-audio-gen
# or
pnpm add @reaatech/media-pipeline-mcp-audio-gen

Feature Overview

  • Text-to-speech — convert text to natural speech with voice, speed, and format options
  • Speech-to-text — transcribe audio with optional language detection and diarization
  • Speaker diarization — identify and label individual speakers in multi-speaker audio
  • Source separation — isolate audio stems (vocals, instruments, drums, bass)
  • Music generation — generate music from a text prompt with style, tempo, and instrumentation control
  • Sound effects — generate sound effects from a text prompt with configurable duration
  • Multi-provider routing — operation-based lookup with preferred provider selection; falls back to first capable provider
  • Realtime STT streaming — WebSocket-based streaming transcription via Deepgram with interim results and speaker diarization (via TranscribeStream)
  • Provider-agnostic — works with OpenAI, ElevenLabs, Deepgram, and any conformant provider
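The routing rule described in the list above (use the preferred provider when possible, otherwise the first capable one) could be sketched roughly as follows. This is an illustration only: `SketchProvider`, its `supports` field, and `pickProvider` are hypothetical names, and the choice to skip a preferred provider that cannot handle the operation is an assumption rather than documented behavior.

```typescript
// Hypothetical provider shape; the package's real MediaProvider may differ.
interface SketchProvider {
  name: string;
  supports: string[]; // operation ids such as "audio.tts" or "audio.stt"
}

// Use the preferred provider when it is registered and capable;
// otherwise fall back to the first registered provider that is capable.
function pickProvider(
  providers: SketchProvider[],
  operation: string,
  preferred?: string,
): SketchProvider {
  const capable = providers.filter((p) => p.supports.includes(operation));
  if (capable.length === 0) {
    throw new Error(`No registered provider supports ${operation}`);
  }
  return capable.find((p) => p.name === preferred) ?? capable[0];
}

// Registration order matters for the fallback case.
const registered: SketchProvider[] = [
  { name: "openai", supports: ["audio.tts", "audio.stt"] },
  { name: "elevenlabs", supports: ["audio.tts"] },
  { name: "deepgram", supports: ["audio.stt"] },
];

const ttsChoice = pickProvider(registered, "audio.tts", "elevenlabs").name;
const sttChoice = pickProvider(registered, "audio.stt").name;
```

In this sketch `ttsChoice` honors the explicit preference, while `sttChoice` has no preference and therefore resolves to whichever capable provider was registered first.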

Quick Start

import { createAudioGenOperations } from "@reaatech/media-pipeline-mcp-audio-gen";
import { ElevenLabsProvider } from "@reaatech/media-pipeline-mcp-elevenlabs";
import { DeepgramProvider } from "@reaatech/media-pipeline-mcp-deepgram";
import { OpenAIProvider } from "@reaatech/media-pipeline-mcp-openai";

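// artifactRegistry (ArtifactRegistry) and storage (ArtifactStore) are assumed to be created elsewhere
const ops = createAudioGenOperations(artifactRegistry, storage);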

// Register providers — operations auto-route to the right one
ops.registerProvider("openai", new OpenAIProvider({ apiKey: process.env.OPENAI_API_KEY! }));
ops.registerProvider("elevenlabs", new ElevenLabsProvider({ apiKey: process.env.ELEVENLABS_API_KEY! }));
ops.registerProvider("deepgram", new DeepgramProvider({ apiKey: process.env.DEEPGRAM_API_KEY! }));

// Text to speech — routes to OpenAI or ElevenLabs
const speech = await ops.textToSpeech({
  text: "Welcome to the media pipeline.",
  voice: "alloy",
  speed: 1.0,
  format: "mp3",
  provider: "openai", // optional: force a specific provider
});

// Transcribe audio — routes to Deepgram or OpenAI
const transcript = await ops.speechToText("audio-123", {
  language: "en",
  diarize: true,
  model: "whisper-1",
});

// Identify speakers
const speakers = await ops.diarize("meeting-456", {
  language: "en",
});

// Separate audio stems
const vocals = await ops.isolate("song-789", {
  target: "vocals",
  model: "demucs",
});

// Generate music
const music = await ops.generateMusic({
  prompt: "Upbeat electronic pop with a driving beat",
  duration: 60,
  instrumental: false,
  style: "electronic-pop",
  tempo: 128,
  format: "mp3",
});

// Generate a sound effect
const sfx = await ops.generateSoundEffect({
  prompt: "Heavy wooden door creaking open",
  duration: 3,
  format: "wav",
});

API Reference

createAudioGenOperations(artifactRegistry, storage)

Factory function that creates an AudioGenOperations instance bound to the given artifact registry and artifact store.

function createAudioGenOperations(
  artifactRegistry: ArtifactRegistry,
  storage: ArtifactStore,
): AudioGenOperations;

AudioGenOperations

Main class providing all audio generation and processing capabilities. Operations delegate to registered providers based on operation type.

class AudioGenOperations {
  constructor(artifactRegistry: ArtifactRegistry, storage: ArtifactStore);

  registerProvider(name: string, provider: MediaProvider): void;

  textToSpeech(config: TTSConfig): Promise<Artifact>;
  speechToText(artifactId: string, config?: STTConfig): Promise<Artifact>;
  diarize(artifactId: string, config?: DiarizeConfig): Promise<Artifact>;
  isolate(artifactId: string, config: IsolateConfig): Promise<Artifact>;
  generateMusic(config: MusicConfig): Promise<Artifact>;
  generateSoundEffect(config: SoundEffectConfig): Promise<Artifact>;
}

Operation Configs

TTSConfig

interface TTSConfig {
  text: string;                     // Text to convert to speech
  voice?: string;                   // Voice ID (default: "alloy")
  speed?: number;                   // Speech speed 0.5–2.0 (default: 1.0)
  format?: "mp3" | "wav" | "ogg" | "flac";  // Output format (default: "mp3")
  model?: string;                   // Model override (default: "tts-1")
  provider?: string;                // Force specific provider (e.g., "openai", "elevenlabs")
}
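To make the documented defaults concrete, here is a minimal sketch of how a TTS config might be normalized before it reaches a provider. `TTSConfigSketch` and `resolveTTSConfig` are hypothetical names, and rejecting an out-of-range speed with a `RangeError` is an assumption; the package may clamp or pass the value through instead.

```typescript
type AudioFormat = "mp3" | "wav" | "ogg" | "flac";

// Local copy of the documented fields so the sketch is self-contained.
interface TTSConfigSketch {
  text: string;
  voice?: string;
  speed?: number;
  format?: AudioFormat;
  model?: string;
  provider?: string;
}

// Fill in the documented defaults and reject speeds outside 0.5 to 2.0.
function resolveTTSConfig(config: TTSConfigSketch) {
  const speed = config.speed ?? 1.0;
  if (!Number.isFinite(speed) || speed < 0.5 || speed > 2.0) {
    throw new RangeError(`speed must be between 0.5 and 2.0, got ${speed}`);
  }
  const format: AudioFormat = config.format ?? "mp3";
  return {
    text: config.text,
    voice: config.voice ?? "alloy",
    speed,
    format,
    model: config.model ?? "tts-1",
    provider: config.provider, // undefined means routing decides
  };
}

const resolved = resolveTTSConfig({ text: "Hello", speed: 1.25 });
```

Only `text` and `speed` were supplied here, so `resolved` carries the documented defaults for `voice`, `format`, and `model`.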

STTConfig

interface STTConfig {
  language?: string;                // Language code (e.g., "en", "es")
  diarize?: boolean;                // Enable speaker diarization (default: false)
  model?: string;                   // Model override (default: "whisper-1")
  provider?: string;                // Force specific provider (e.g., "openai", "deepgram")
}

DiarizeConfig

interface DiarizeConfig {
  language?: string;                // Language code
  model?: string;                   // Model override (default: "pyannote")
  provider?: string;                // Force specific provider (e.g., "deepgram")
}

IsolateConfig

interface IsolateConfig {
  target: "vocals" | "instruments" | "drums" | "bass";  // Stem to isolate
  model?: string;                   // Model override (default: "demucs")
  provider?: string;                // Force specific provider (e.g., "replicate")
}

MusicConfig

interface MusicConfig {
  prompt: string;                   // Text description of music to generate
  duration?: number;                // Duration in seconds (default: 30)
  instrumental?: boolean;           // Instrumental only (default: true)
  style?: string;                   // Musical style (e.g., "pop", "rock", "classical")
  tempo?: number;                   // BPM tempo (e.g., 120)
  format?: "mp3" | "wav" | "ogg" | "flac";  // Output format (default: "mp3")
  model?: string;                   // Model override (default: "music-gen")
  provider?: string;                // Force specific provider
}

SoundEffectConfig

interface SoundEffectConfig {
  prompt: string;                   // Text description of the sound effect
  duration?: number;                // Duration in seconds (default: 5)
  format?: "mp3" | "wav" | "ogg" | "flac";  // Output format (default: "mp3")
  model?: string;                   // Model override (default: "sfx-gen")
  provider?: string;                // Force specific provider
}

TranscribeStream (Real-time STT)

WebSocket-based streaming transcription for real-time audio. Supports Deepgram with interim results, word-level timings, and speaker diarization.

class TranscribeStream extends EventEmitter {
  constructor(options: TranscribeStreamOptions);

  start(request: TranscribeStreamRequest): Promise<void>;
  sendAudio(data: Buffer): void;
  close(): Promise<TranscribeStreamResult>;

  on(event: "event", listener: (event: TranscribeStreamEvent) => void): this;
}

TranscribeStreamRequest

interface TranscribeStreamRequest {
  source: {
    kind: "inline" | "url" | "mic" | "inline-sample";
    encoding?: "linear16" | "opus" | "mulaw";
    sampleRateHz?: number;
    data?: string;                  // Base64-encoded audio for inline mode
    url?: string;                   // Audio URL for url mode
  };
  language?: string;
  model?: string;
  provider?: string;
  interim?: boolean;                // Return interim results (default: false)
  diarize?: boolean;                // Enable speaker diarization (default: false)
  endpointingMs?: number;           // Endpointing sensitivity in ms
}
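The comments on `data` and `url` suggest that each source kind expects a matching field. A small pre-flight check along those lines might look like the sketch below. `StreamSource` and `checkSource` are hypothetical, and the rules are an assumption about sensible inputs, not behavior the package is documented to enforce.

```typescript
// Mirrors the `source` field documented above.
interface StreamSource {
  kind: "inline" | "url" | "mic" | "inline-sample";
  encoding?: "linear16" | "opus" | "mulaw";
  sampleRateHz?: number;
  data?: string;
  url?: string;
}

// Return a human-readable problem, or null when the source looks usable.
function checkSource(source: StreamSource): string | null {
  if (source.kind === "inline" && !source.data) {
    return "inline sources need base64 audio in `data`";
  }
  if (source.kind === "url" && !source.url) {
    return "url sources need `url`";
  }
  if (source.sampleRateHz !== undefined && source.sampleRateHz <= 0) {
    return "sampleRateHz must be positive";
  }
  return null;
}

const okSource = checkSource({ kind: "url", url: "https://example.com/live-audio" });
const badSource = checkSource({ kind: "inline", encoding: "linear16", sampleRateHz: 16000 });
```

Running a check like this before `start()` gives a clearer message than a provider-side failure would, at the cost of duplicating rules the provider may already apply.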

TranscribeStreamEvent

type TranscribeStreamEvent =
  | { kind: "interim"; transcript: string; confidence?: number; words?: WordTiming[] }
  | { kind: "final"; transcript: string; confidence?: number; words?: WordTiming[]; startMs: number; endMs: number; speaker?: string }
  | { kind: "metadata"; languageDetected?: string; sampleRateHz?: number }
  | { kind: "error"; code: string; message: string };
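Because the event type is a union discriminated by `kind`, a `switch` can narrow each case and the compiler can verify that none is missed. The sketch below illustrates that pattern. `WordTiming` is not defined in this README, so its shape here is a guess, and `describeEvent` and `joinFinals` are hypothetical helpers.

```typescript
// WordTiming is not defined in this README; this shape is an assumption.
interface WordTiming { word: string; startMs: number; endMs: number }

type StreamEvent =
  | { kind: "interim"; transcript: string; confidence?: number; words?: WordTiming[] }
  | { kind: "final"; transcript: string; confidence?: number; words?: WordTiming[]; startMs: number; endMs: number; speaker?: string }
  | { kind: "metadata"; languageDetected?: string; sampleRateHz?: number }
  | { kind: "error"; code: string; message: string };

// Turn one event into a log line; the switch is exhaustive over `kind`.
function describeEvent(event: StreamEvent): string {
  switch (event.kind) {
    case "interim":
      return `partial: ${event.transcript}`;
    case "final":
      return `[${event.startMs}-${event.endMs} ms] ${event.speaker ?? "unknown"}: ${event.transcript}`;
    case "metadata":
      return `language: ${event.languageDetected ?? "n/a"}`;
    case "error":
      return `error ${event.code}: ${event.message}`;
    default: {
      // Compile-time check: this fails to type-check if a new kind is added.
      const unreachable: never = event;
      return unreachable;
    }
  }
}

// Only "final" events contribute to the assembled transcript.
function joinFinals(events: StreamEvent[]): string {
  const finals: string[] = [];
  for (const e of events) {
    if (e.kind === "final") finals.push(e.transcript);
  }
  return finals.join(" ");
}
```

Interim results are skipped when assembling the transcript because a later final event for the same span supersedes them.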

ProviderUnsupportedError

Thrown when a provider does not support streaming STT (e.g., OpenAI Whisper is batch-only).

MicNotAvailableError

Thrown when microphone capture is requested but node-record-lpcm16 is not installed.
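One way to react to these two errors is an `instanceof` check in a `catch` block. The sketch below uses local stand-in classes so it runs on its own. In real code the classes would be imported from the package, and the suggested remedies are illustrative assumptions.

```typescript
// Local stand-ins; the real classes come from the package.
class ProviderUnsupportedError extends Error {}
class MicNotAvailableError extends Error {}

// Map a caught error to a hint for the caller.
function explainStreamError(err: unknown): string {
  if (err instanceof ProviderUnsupportedError) {
    return "Choose a provider that supports streaming, such as Deepgram.";
  }
  if (err instanceof MicNotAvailableError) {
    return "Install node-record-lpcm16 or switch to a url or inline source.";
  }
  return "Unexpected error; log it or rethrow.";
}

const hint = explainStreamError(new MicNotAvailableError("mic missing"));
```

Anything that is not one of the two known classes falls through to the generic branch, so unrelated failures are not mislabeled.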

Usage Patterns

Multi-Provider Setup with Routing

const ops = createAudioGenOperations(artifactRegistry, storage);
ops.registerProvider("openai", new OpenAIProvider({ apiKey: process.env.OPENAI_API_KEY! }));
ops.registerProvider("deepgram", new DeepgramProvider({ apiKey: process.env.DEEPGRAM_API_KEY! }));
ops.registerProvider("elevenlabs", new ElevenLabsProvider({ apiKey: process.env.ELEVENLABS_API_KEY! }));

// Automatic routing: TTS (audio.tts) prefers ElevenLabs, STT (audio.stt) prefers Deepgram.
// Pass the `provider` param on a call to override the default routing.

Transcription with Diarization

const result = await ops.speechToText("meeting-audio", {
  language: "en",
  diarize: true,
  model: "whisper-1",
});

const segments = JSON.parse(
  (await storage.get(result.id)).data.toString()
);
// segments = [
//   { speaker: "Speaker 1", text: "...", start: 0.0, end: 2.5, confidence: 0.97 },
//   { speaker: "Speaker 2", text: "...", start: 3.0, end: 5.8, confidence: 0.94 },
// ]
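Once the segments are parsed, they are plain objects and can be aggregated directly. As a small sketch, the helper below totals speaking time per speaker. It assumes `start` and `end` are in seconds, as the sample values suggest. `Segment` and `talkTimeBySpeaker` are hypothetical names, and the sample data is made up.

```typescript
// Shape taken from the example segments shown above.
interface Segment {
  speaker: string;
  text: string;
  start: number;
  end: number;
  confidence: number;
}

// Total speaking time per speaker, in the same unit as start/end.
function talkTimeBySpeaker(segments: Segment[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const s of segments) {
    totals[s.speaker] = (totals[s.speaker] ?? 0) + (s.end - s.start);
  }
  return totals;
}

// Made-up sample data for illustration.
const sample: Segment[] = [
  { speaker: "Speaker 1", text: "Hello", start: 0.0, end: 2.5, confidence: 0.97 },
  { speaker: "Speaker 2", text: "Hi there", start: 3.0, end: 5.5, confidence: 0.94 },
  { speaker: "Speaker 1", text: "Shall we start?", start: 6.0, end: 7.5, confidence: 0.95 },
];

const totals = talkTimeBySpeaker(sample);
```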

Dedicated Diarization with Fallback

// If a dedicated diarization provider exists, uses it.
// Falls back to STT provider with diarize: true if not available.
const result = await ops.diarize("meeting-audio", {
  language: "en",
  model: "pyannote",
});

console.log(result.metadata.speakers); // number of speakers detected, e.g. 3
for (const segment of result.metadata.segments) {
  console.log(`${segment.speaker}: ${segment.text} (${segment.confidence})`);
}
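The fallback described in the comments above can be pictured as a two-step lookup: try a dedicated diarization provider first, then an STT provider with diarization switched on. The sketch below is a guess at that decision, not the package's code. `ProviderCaps`, `DiarizePlan`, `planDiarization`, the `"audio.diarize"` operation id, and the provider name `"dedicated-diarizer"` are all hypothetical.

```typescript
// Hypothetical description of what each registered provider can do.
interface ProviderCaps {
  name: string;
  operations: string[];
}

type DiarizePlan =
  | { via: "diarization"; provider: string }
  | { via: "stt"; provider: string; diarize: true };

// Prefer a dedicated diarization provider; otherwise use STT with diarize: true.
function planDiarization(providers: ProviderCaps[]): DiarizePlan {
  const dedicated = providers.find((p) => p.operations.includes("audio.diarize"));
  if (dedicated) {
    return { via: "diarization", provider: dedicated.name };
  }
  const stt = providers.find((p) => p.operations.includes("audio.stt"));
  if (stt) {
    return { via: "stt", provider: stt.name, diarize: true };
  }
  throw new Error("No registered provider can diarize");
}

const fallbackPlan = planDiarization([
  { name: "elevenlabs", operations: ["audio.tts"] },
  { name: "deepgram", operations: ["audio.stt"] },
]);
```

With only TTS and STT providers registered, as in this example, the plan resolves to the STT route through `deepgram`.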

Source Separation (Audio Stems)

const vocals = await ops.isolate("song-123", { target: "vocals" });
const drums = await ops.isolate("song-123", { target: "drums" });
const bass = await ops.isolate("song-123", { target: "bass" });
const instruments = await ops.isolate("song-123", { target: "instruments" });

Music Generation

const music = await ops.generateMusic({
  prompt: "Upbeat electronic pop with a driving beat and synth melody",
  duration: 60,
  instrumental: false,
  style: "electronic-pop",
  tempo: 128,
  format: "mp3",
  provider: "elevenlabs",  // optional provider override
});

Sound Effects

const sfx = await ops.generateSoundEffect({
  prompt: "Heavy wooden door creaking open slowly",
  duration: 3,
  format: "mp3",
});

Real-time STT Streaming

import { TranscribeStream, ProviderUnsupportedError } from "@reaatech/media-pipeline-mcp-audio-gen";

const ts = new TranscribeStream({ apiKey: process.env.DEEPGRAM_API_KEY! });

// Listen for streaming events
ts.on("event", (event) => {
  if (event.kind === "interim") {
    console.log("Partial:", event.transcript);
  } else if (event.kind === "final") {
    console.log("Final:", event.transcript, `(${event.speaker ?? "unknown"})`);
  } else if (event.kind === "error") {
    console.error(event.code, event.message);
  }
});

// Stream from URL
await ts.start({
  source: { kind: "url", url: "https://example.com/live-audio" },
  language: "en",
  interim: true,
  diarize: true,
  endpointingMs: 800,
});

const result = await ts.close();
console.log("Full transcript:", result.transcript);
console.log("Duration:", result.durationMs, "ms");
console.log("Audio bytes:", result.bytes);

Related Packages

License

MIT