
@agent-orcha/node-omni-orcha

v2026.324.2209

Published

Unified native Node.js bindings — single omni.node for LLM, STT, TTS, and Image/Video generation

node-omni-orcha

Unified native Node.js inference engine — LLM, Image/Video Generation, Speech-to-Text, and Text-to-Speech in a single omni.node binary.

Built on a llama.cpp fork with stable-diffusion.cpp, whisper.cpp, and qwen3-tts.cpp compiled against a shared ggml backend.

Features

  • LLM — Chat completion, streaming, embeddings, tool calling, reasoning budget control
  • Image Generation — FLUX 2, Wan 2.2, SD/SDXL (text-to-image)
  • Video Generation — Wan 2.2 (text-to-video)
  • Speech-to-Text — Whisper (language detection, timestamps)
  • Text-to-Speech — Qwen3-TTS with voice cloning from 3s reference audio
  • GPU accelerated — Metal (macOS), CUDA (NVIDIA), CPU fallback
  • Single binary — One omni.node for all engines, one shared ggml, no symbol conflicts
  • Native N-API — In-process inference, no child processes or HTTP servers
  • Node 25 — Native TypeScript, ESM, no build step

Quick Start

npm install node-omni-orcha
npm run build:metal   # macOS
npm run build:cuda    # NVIDIA
npm run build:cpu     # CPU only

API

import { loadModel, createModel, detectGpu, readGGUFMetadata } from 'node-omni-orcha'

LLM

const llm = await loadModel('qwen3.5-4b.gguf', { type: 'llm', contextSize: 4096 })

// With reasoning (default)
const result = await llm.complete([
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Hello!' },
], { temperature: 0.7, maxTokens: 512 })

console.log(result.content)    // response text
console.log(result.reasoning)  // thinking/reasoning (if model supports it)
console.log(result.usage)      // { inputTokens, outputTokens, totalTokens }

// Without reasoning (direct response, faster)
const fast = await llm.complete(messages, { thinkingBudget: 0, maxTokens: 256 })

// With capped reasoning (N tokens of thinking, then respond)
const capped = await llm.complete(messages, { thinkingBudget: 64, maxTokens: 512 })

// Streaming
for await (const chunk of llm.stream(messages, { thinkingBudget: 0 })) {
  process.stdout.write(chunk.content ?? '')
  if (chunk.done) break
}
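When you want the assembled text rather than incremental writes, the streaming loop above can be wrapped in a small helper. This is an illustrative utility, not part of the package API; it assumes only the `{ content, done }` chunk shape shown in the `stream()` example.

```javascript
// Drain a stream of { content, done } chunks into one string.
// Works with any async iterable yielding that shape, e.g. llm.stream(...).
async function collectStream(stream) {
  let text = ''
  for await (const chunk of stream) {
    text += chunk.content ?? ''
    if (chunk.done) break
  }
  return text
}
```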

// Embeddings
const embedding = await llm.embed('some text')

// Tool calling
const toolResult = await llm.complete(messages, {
  tools: [{ name: 'get_weather', description: '...', parameters: { ... } }],
  toolChoice: 'auto',
})
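The README does not document the shape of tool-call results, so the dispatcher below is a sketch under an assumption: that the completion result exposes a `toolCalls` array of `{ name, arguments }` objects. Verify the actual result shape returned by `complete()` in your version before relying on this.

```javascript
// Hypothetical dispatcher: route each tool call the model emitted to a
// local handler function. The { name, arguments } shape is an assumption,
// not a documented part of the node-omni-orcha API.
function dispatchToolCalls(toolCalls, handlers) {
  return toolCalls.map(({ name, arguments: args }) => {
    const handler = handlers[name]
    if (!handler) throw new Error(`no handler for tool: ${name}`)
    return { name, result: handler(args) }
  })
}

// Example: a single local handler for the get_weather tool declared above.
const results = dispatchToolCalls(
  [{ name: 'get_weather', arguments: { city: 'Berlin' } }],
  { get_weather: ({ city }) => ({ city, tempC: 21 }) },
)
```

The handler results would then typically be appended to `messages` as tool-role messages and fed back into `complete()` for the final answer.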

Image Generation (FLUX 2)

const img = createModel('flux-2-klein-4b.gguf', 'image')
await img.load({
  llmPath: 'qwen3-4b.gguf',        // text encoder for FLUX 2
  vaePath: 'flux2-vae.safetensors', // VAE decoder
  keepVaeOnCpu: true,
})

const png = await img.generate('a sunset over mountains', {
  width: 512, height: 512, steps: 4, cfgScale: 1.0,
})
fs.writeFileSync('output.png', png)

Video Generation (Wan 2.2)

const vid = createModel('wan2.2-5b.gguf', 'image')
await vid.load({ t5xxlPath: 'umt5-xxl.gguf', vaePath: 'wan-vae.safetensors' })

const frames = await vid.generateVideo('a dog playing fetch', {
  width: 832, height: 480, videoFrames: 33, steps: 30,
})
// frames is Buffer[] — one PNG per frame
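Since `generateVideo()` returns one PNG buffer per frame, a common next step is writing them out as a numbered sequence. A minimal sketch (the helper name and layout are my own, not part of the package):

```javascript
import { writeFileSync } from 'node:fs'
import { join } from 'node:path'

// Persist PNG frame buffers as zero-padded numbered files so tools that
// read image sequences (e.g. ffmpeg with frame-%04d.png) pick them up in order.
function writeFrames(frames, dir = '.') {
  return frames.map((png, i) => {
    const file = join(dir, `frame-${String(i).padStart(4, '0')}.png`)
    writeFileSync(file, png)
    return file
  })
}
```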

Speech-to-Text (Whisper)

const stt = await loadModel('whisper-tiny.bin', { type: 'stt' })

const result = await stt.transcribe(pcmBuffer, { language: 'en' })
// { text: "Hello world", language: "en", segments: [{ start: 0.0, end: 1.5, text: "..." }] }

const lang = await stt.detectLanguage(pcmBuffer)
// "en"

Audio format: 16-bit PCM, 16kHz, mono.
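If your audio arrives as Float32 samples in [-1, 1] (the usual Web Audio / DSP representation), it needs converting to that 16-bit little-endian layout before calling `transcribe()`. A minimal sketch; resampling to 16kHz and downmixing to mono must happen before this step:

```javascript
// Convert Float32 samples in [-1, 1] to a 16-bit little-endian PCM Buffer.
// Values are clamped first so out-of-range samples don't wrap around.
function floatTo16BitPCM(samples) {
  const buf = Buffer.alloc(samples.length * 2)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]))
    buf.writeInt16LE(Math.round(s * 32767), i * 2)
  }
  return buf
}
```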

Text-to-Speech with Voice Cloning (Qwen3-TTS)

const tts = createModel('/path/to/qwen3-tts-models/', 'tts')
await tts.load()

// Clone any voice from a short WAV reference
const wav = await tts.speak('Text to speak in cloned voice.', {
  referenceAudioPath: '/path/to/reference.wav', // 24kHz mono, 3-10s recommended
})

// Or generate with default voice (no cloning)
const wav2 = await tts.speak('Default voice synthesis.')

Utilities

const gpu = detectGpu()
// { backend: 'metal' | 'cuda' | 'cpu' }

const meta = await readGGUFMetadata('model.gguf')
// { architecture, contextLength, blockCount, embeddingLength, ... }
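For a sense of what `readGGUFMetadata` is parsing: a GGUF file begins with the ASCII magic `GGUF` followed by a little-endian uint32 format version. The probe below is an independent sketch of that header check, not the package's implementation:

```javascript
import { openSync, readSync, closeSync } from 'node:fs'

// Read the first 8 bytes of a file and check the GGUF header:
// bytes 0-3 are the ASCII magic "GGUF", bytes 4-7 the uint32 version (LE).
function probeGGUF(path) {
  const fd = openSync(path, 'r')
  const header = Buffer.alloc(8)
  readSync(fd, header, 0, 8, 0)
  closeSync(fd)
  return {
    isGGUF: header.toString('ascii', 0, 4) === 'GGUF',
    version: header.readUInt32LE(4),
  }
}
```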

Architecture

All engines compile against a single shared ggml (tensor library + GPU backends) and link into one N-API addon:

engine/
  ggml/        ← shared tensor ops, Metal/CUDA/CPU backends
  src/         ← llama.cpp LLM core
  stt/         ← whisper.cpp (compiled against shared ggml)
  tts/         ← qwen3-tts.cpp (compiled against shared ggml)
  diffusion/   ← stable-diffusion.cpp (compiled against shared ggml)

→ build/Release/omni.node (single output, ~10-30MB depending on platform)

Build

npm install
npm run build          # auto-detect GPU
npm run build:metal    # macOS Metal
npm run build:cuda     # NVIDIA CUDA
npm run build:cpu      # CPU only

Test

# Unit tests (no models needed)
npm test

# Download test models
bash scripts/download-test-models.sh              # LLM + STT (~745MB)

# Integration tests (requires models in ~/.orcha/workspace/.models/)
node scripts/full-integration-test.ts
node scripts/samuel-jackson-test.ts

Platforms

| Platform | GPU | Status |
|----------|-----|--------|
| macOS arm64 | Metal | Tested |
| Linux x64 | CPU | CI |
| Linux x64 | CUDA | CI |
| Linux arm64 | CPU | CI |
| Windows x64 | CPU | CI |
| Windows x64 | CUDA | CI |

Requirements

  • Node.js >= 25.0.0
  • CMake >= 3.15
  • C++17 compiler (Clang, GCC, MSVC)
  • macOS: Xcode Command Line Tools (for Metal)
  • Linux/Windows: CUDA Toolkit 12.6+ (for NVIDIA GPU support)

License

MIT