
@ruvector/ruvllm

v2.4.1


Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, RLM recursive retrieval, FastGRNN routing, and SIMD inference


@ruvector/ruvllm

The First Purpose-Built LLM Runtime for Claude Code Agent Orchestration

100% Routing Accuracy | Sub-Millisecond Inference | Self-Learning


Quick Start | RLM | Training | Models | API


What is @ruvector/ruvllm?

@ruvector/ruvllm is a TypeScript/JavaScript SDK for intelligent LLM orchestration, specifically designed for Claude Code and multi-agent systems. It provides:

  • RLM (Recursive Language Model) - Break complex queries into sub-queries, synthesize coherent answers
  • 100% Routing Accuracy - Hybrid keyword + embedding strategy for perfect agent selection
  • SONA Self-Learning - Model improves with every successful interaction
  • SIMD Acceleration - AVX2/NEON optimized inference

Why @ruvector/ruvllm?

| Challenge | Traditional Approach | @ruvector/ruvllm Solution |
|-----------|---------------------|---------------------------|
| Agent selection | Manual or keyword-based | Semantic + keyword hybrid = 100% |
| Complex queries | Single-shot RAG | Recursive decomposition + synthesis |
| Response latency | 2-5 seconds | <1ms cache, 50-200ms full |
| Learning | Static models | Self-improving (SONA) |
| Cost per route | $0.01+ (API call) | $0 (local inference) |


Installation

npm install @ruvector/ruvllm

Quick Start

import { RuvLLM, RlmController } from '@ruvector/ruvllm';

// Simple LLM inference
const llm = new RuvLLM({
  modelPath: '~/.ruvllm/models/ruvltra-claude-code-0.5b-q4_k_m.gguf',
  sonaEnabled: true,
});

const response = await llm.query('Explain quantum computing');
console.log(response.text);

// Recursive Language Model for complex queries
const rlm = new RlmController({ maxDepth: 5 });
const answer = await rlm.query('What are the causes AND solutions for slow API responses?');
// Automatically decomposes into sub-queries, retrieves context, synthesizes answer

Core Features

1. Claude Code Native Routing

Built by Claude Code, for Claude Code. Routes tasks to 60+ agent types:

import { RuvLLM } from '@ruvector/ruvllm';

const llm = new RuvLLM({ model: 'ruv/ruvltra' });

// Intelligent routing
const route = await llm.route('implement OAuth2 authentication');
console.log(route.agent);      // 'security-architect'
console.log(route.confidence); // 0.98
console.log(route.tier);       // 2 (Haiku-level complexity)

// Multi-agent teams for complex tasks
const team = await llm.routeComplex('build full-stack app with auth');
// Returns: [system-architect, backend-dev, coder, security-architect, tester]

2. 3-Tier Intelligent Routing

┌─────────────────────────────────────────────────────────┐
│                    User Request                         │
└─────────────────────┬───────────────────────────────────┘
                      ↓
              [RuvLTRA Routing]
                      ↓
        ┌─────────────┼─────────────┐
        ↓             ↓             ↓
┌───────────┐  ┌───────────┐  ┌───────────┐
│  Tier 1   │  │  Tier 2   │  │  Tier 3   │
│  Booster  │  │   Haiku   │  │   Opus    │
│   <1ms    │  │  ~500ms   │  │   2-5s    │
│    $0     │  │  $0.0002  │  │  $0.015   │
└───────────┘  └───────────┘  └───────────┘

3. Self-Learning (SONA)

Every successful interaction improves the model:

// First routing: Full inference
llm.route('implement OAuth2') → security-architect (97%)

// Later: Pattern hit in <25μs (learned from success)
llm.route('add OAuth2 flow') → security-architect (99%, cached pattern)
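The pattern-cache idea behind this can be sketched as a map from normalized task text to a previously successful route, so repeated tasks skip full inference. This is an illustrative sketch only: the class name, normalization scheme, and exact-match lookup are assumptions, not SONA's documented internals.

```typescript
interface CachedRoute {
  agent: string;
  confidence: number;
}

// Hypothetical learned-pattern cache: successful routings are stored under a
// normalized key so near-identical future tasks can be answered without
// running the model again.
class SonaPatternCache {
  private patterns = new Map<string, CachedRoute>();

  // Lowercase, tokenize, and sort so word order doesn't defeat a lookup.
  private normalize(task: string): string {
    return task.toLowerCase().split(/\W+/).filter(Boolean).sort().join(' ');
  }

  record(task: string, route: CachedRoute): void {
    this.patterns.set(this.normalize(task), route);
  }

  lookup(task: string): CachedRoute | undefined {
    return this.patterns.get(this.normalize(task));
  }
}
```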

RLM (Recursive Language Model)

RLM provides recursive query decomposition: unlike traditional RAG, which retrieves once, RLM breaks complex questions into sub-queries and synthesizes coherent answers.

How It Works

Query: "What are the causes AND solutions for slow API responses?"
                              ↓
                    [Decomposition]
                    /            \
    "Causes of slow API?"    "Solutions for slow API?"
           ↓                        ↓
    [Sub-answers]            [Sub-answers]
           \                        /
                    [Synthesis]
                         ↓
            Coherent combined answer with sources
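The decompose/answer/synthesize flow above can be sketched as a generic recursive driver. The function names and signatures here are illustrative stand-ins; the real RlmController performs retrieval and LLM calls at each step rather than taking plain callbacks.

```typescript
type Decompose = (q: string) => string[]; // sub-queries, or [] if atomic
type Answer = (q: string) => Promise<string>; // answer an atomic query
type Synthesize = (q: string, parts: string[]) => string; // merge sub-answers

// Recursively split a query until atomic, answer the leaves, then synthesize
// upward. maxDepth bounds the recursion, mirroring RlmConfig.maxDepth.
async function recursiveQuery(
  query: string,
  depth: number,
  maxDepth: number,
  decompose: Decompose,
  answer: Answer,
  synthesize: Synthesize,
): Promise<string> {
  const subs = depth < maxDepth ? decompose(query) : [];
  if (subs.length === 0) return answer(query); // leaf: answer directly
  const parts = await Promise.all(
    subs.map((s) => recursiveQuery(s, depth + 1, maxDepth, decompose, answer, synthesize)),
  );
  return synthesize(query, parts); // combine sub-answers into one response
}
```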

Basic Usage

import { RlmController } from '@ruvector/ruvllm';

const rlm = new RlmController({
  maxDepth: 5,
  retrievalTopK: 10,
  enableCache: true,
});

// Add knowledge to memory
await rlm.addMemory('TypeScript adds static typing to JavaScript.');
await rlm.addMemory('React is a library for building user interfaces.');

// Query with recursive retrieval
const answer = await rlm.query('What are causes and solutions for type errors in React?');
console.log(answer.text);           // Comprehensive synthesized answer
console.log(answer.sources);        // Source attributions
console.log(answer.qualityScore);   // 0.0-1.0
console.log(answer.confidence);     // Routing confidence

Streaming

for await (const event of rlm.queryStream('Explain machine learning')) {
  if (event.type === 'token') {
    process.stdout.write(event.text);
  } else {
    console.log('\n\nQuality:', event.answer.qualityScore);
  }
}

With Self-Reflection

const rlm = new RlmController({
  enableReflection: true,
  maxReflectionIterations: 2,
  minQualityScore: 0.8,
});

// Answers are iteratively refined until quality >= 0.8
const answer = await rlm.query('Complex multi-part technical question...');

RLM Configuration

interface RlmConfig {
  maxDepth?: number;              // Max recursion depth (default: 3)
  maxSubQueries?: number;         // Max sub-queries per level (default: 5)
  tokenBudget?: number;           // Token budget (default: 4096)
  enableCache?: boolean;          // Enable caching (default: true)
  cacheTtl?: number;              // Cache TTL in ms (default: 300000)
  retrievalTopK?: number;         // Memory spans to retrieve (default: 10)
  minQualityScore?: number;       // Min quality threshold (default: 0.7)
  enableReflection?: boolean;     // Enable self-reflection (default: false)
  maxReflectionIterations?: number; // Max reflection loops (default: 2)
}

Unique Capabilities

1. Memory-Augmented Routing

Every successful routing is stored in HNSW-indexed memory for instant recall:

// First time: Full inference (~50ms)
route("implement OAuth2") → security-architect (97% confidence)

// Later: Memory hit (<25μs)
route("add OAuth2 flow") → security-architect (99% confidence, cached)

2. Confidence-Aware Escalation

// Low confidence automatically escalates
Confidence > 0.9  → Use recommended agent
Confidence 0.7-0.9 → Use with human confirmation
Confidence < 0.7  → Escalate to higher tier
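The policy above is simple enough to express as a pure function. A minimal sketch, with the thresholds taken directly from the table (the decision names are placeholders, not the SDK's API):

```typescript
type EscalationDecision = 'use-agent' | 'confirm-with-human' | 'escalate-tier';

// Map a routing confidence score to an action, mirroring the thresholds above.
function escalationDecision(confidence: number): EscalationDecision {
  if (confidence > 0.9) return 'use-agent'; // high confidence: proceed
  if (confidence >= 0.7) return 'confirm-with-human'; // medium: ask first
  return 'escalate-tier'; // low: hand off to a stronger model
}
```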

3. Batch SIMD Operations

import { simd } from '@ruvector/ruvllm/simd';

// 4x faster vector operations with AVX2/NEON
const similarity = simd.batchCosineSimilarity(query, targets);
const attended = simd.flashAttention(q, k, v, scale);

4. Zero-Copy Caching

Arc-based string interning for 100-1000x faster cache hits on large responses.
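In JavaScript terms, the interning idea can be sketched as a pool that stores each distinct string once and hands back the stored copy on every subsequent request, so equal cache values share storage instead of being duplicated. This is a conceptual sketch only; the actual implementation is Rust `Arc`-based reference counting, not a JS `Map`.

```typescript
// Hypothetical intern pool: each distinct string is stored once; repeated
// interning of an equal value returns the pooled copy rather than keeping
// another instance alive.
class InternPool {
  private pool = new Map<string, string>();

  intern(value: string): string {
    const existing = this.pool.get(value);
    if (existing !== undefined) return existing; // reuse the pooled copy
    this.pool.set(value, value);
    return value;
  }

  get size(): number {
    return this.pool.size;
  }
}
```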


Performance

Benchmarks (M4 Pro)

| Operation | Latency | Throughput |
|-----------|---------|------------|
| Query decomposition | 340 ns | 2.9M/s |
| Cache lookup | 23.5 ns | 42.5M/s |
| Embedding (384d) | 293 ns | 3.4M/s |
| Memory search (10k) | 0.4 ms | 2.5K/s |
| End-to-end routing | <1 ms | 1K+/s |
| Full RLM query | 50-200 ms | 5-20/s |

Routing Accuracy

| Strategy | RuvLTRA | Qwen Base | OpenAI |
|----------|---------|-----------|--------|
| Embedding Only | 45% | 40% | 52% |
| Keyword Only | 78% | 78% | N/A |
| Hybrid | 100% | 95% | N/A |
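A hybrid score of the kind compared above can be sketched as a weighted blend of keyword overlap and embedding cosine similarity. The 0.6/0.4 split and the overlap metric are placeholders for illustration; the README does not document the shipped blend.

```typescript
// Fraction of an agent's keywords that appear in the task text.
function keywordOverlap(task: string, agentKeywords: string[]): number {
  const tokens = new Set(task.toLowerCase().split(/\W+/).filter(Boolean));
  if (agentKeywords.length === 0) return 0;
  const hits = agentKeywords.filter((k) => tokens.has(k.toLowerCase())).length;
  return hits / agentKeywords.length;
}

// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Blend the two signals; the weights here are assumptions, not the SDK's.
function hybridScore(
  task: string,
  agentKeywords: string[],
  taskEmb: number[],
  agentEmb: number[],
): number {
  return 0.6 * keywordOverlap(task, agentKeywords) + 0.4 * cosineSimilarity(taskEmb, agentEmb);
}
```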

Test Results

145 tests passing
  - RLM Controller: 24 tests
  - Routing Accuracy: 18 tests
  - Contrastive Training: 15 tests
  - SIMD Operations: 22 tests
  - SONA Learning: 19 tests
  - Memory/HNSW: 21 tests
  - Benchmarks: 26 tests

Models

HuggingFace Repository

URL: https://huggingface.co/ruv/ruvltra

Available Models

| Model | Size | Purpose | Accuracy |
|-------|------|---------|----------|
| ruvltra-claude-code-0.5b-q4_k_m | 398 MB | Agent routing | 100% (hybrid) |
| ruvltra-small-0.5b-q4_k_m | ~400 MB | Embeddings | - |
| ruvltra-medium-1.1b-q4_k_m | ~1 GB | Full inference | - |

Download Models

// Programmatic
import { downloadModel } from '@ruvector/ruvllm';
await downloadModel('ruv/ruvltra', { quantization: 'q4_k_m' });

// CLI
ruvllm download ruv/ruvltra

Auto-Download

Models are automatically downloaded on first use:

const llm = new RuvLLM({ model: 'ruv/ruvltra' });
// Downloads to ~/.ruvllm/models/ if not present

Training

Generate Routing Dataset

node scripts/training/routing-dataset.js
# Output: 381 examples, 793 contrastive pairs, 156 hard negatives

Contrastive Fine-tuning

import { ContrastiveTrainer } from '@ruvector/ruvllm';

const trainer = new ContrastiveTrainer({
  modelPath: './models/base.gguf',
  loraRank: 8,
  loraAlpha: 16,
  learningRate: 1e-4,
});

const pairs = [
  { anchor: 'Fix auth bug', positive: 'coder', negative: 'researcher' },
  // ... more pairs
];

await trainer.train(pairs, { epochs: 10 });
await trainer.save('./adapters/routing-lora');
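The contrastive objective can be sketched as a triplet margin loss over cosine similarities: pull the task embedding toward the positive agent's embedding and push it away from the negative's by at least a margin. This is an illustrative formulation; ContrastiveTrainer's actual loss is not documented here.

```typescript
function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Triplet margin loss: zero once the anchor is closer to the positive than
// to the negative by at least `margin`; positive otherwise.
function tripletLoss(
  anchor: number[],
  positive: number[],
  negative: number[],
  margin = 0.2,
): number {
  return Math.max(0, margin - cosineSim(anchor, positive) + cosineSim(anchor, negative));
}
```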

Training Scripts

| Script | Description |
|--------|-------------|
| routing-dataset.js | Generate 381 routing examples |
| claude-code-synth.js | Synthetic data generation |
| contrastive-finetune.js | LoRA fine-tuning pipeline |
| rlm-dataset.js | RLM training data (500 examples) |


API Reference

RuvLLM Class

class RuvLLM {
  constructor(config?: RuvLLMConfig);

  query(prompt: string, params?: GenerateParams): Promise<Response>;
  stream(prompt: string, params?: GenerateParams): AsyncIterable<string>;
  route(task: string): Promise<RoutingResult>;
  routeComplex(task: string): Promise<AgentTeam[]>;

  loadModel(path: string): Promise<void>;
  addMemory(text: string, metadata?: object): number;
  searchMemory(query: string, topK?: number): MemoryResult[];

  sonaStats(): SonaStats | null;
  adapt(input: Float32Array, quality: number): void;
}

RlmController Class

class RlmController {
  constructor(config?: RlmConfig, engine?: RuvLLM);

  query(input: string): Promise<RlmAnswer>;
  queryStream(input: string): AsyncGenerator<StreamToken>;

  addMemory(text: string, metadata?: object): Promise<string>;
  searchMemory(query: string, topK?: number): Promise<MemorySpan[]>;

  clearCache(): void;
  getCacheStats(): { size: number; entries: number };

  updateConfig(config: Partial<RlmConfig>): void;
  getConfig(): Required<RlmConfig>;
}

All Exports

import {
  // Core
  RuvLLM, RuvLLMConfig,

  // RLM
  RlmController, RlmConfig, RlmAnswer, MemorySpan, StreamToken,

  // Training
  RlmTrainer, ContrastiveTrainer, createRlmTrainer,
  DEFAULT_RLM_CONFIG, FAST_RLM_CONFIG, THOROUGH_RLM_CONFIG,

  // SONA Learning
  SonaCoordinator, TrajectoryBuilder,

  // LoRA
  LoraAdapter, LoraManager,

  // Benchmarks
  ModelComparisonBenchmark, RoutingBenchmark, EmbeddingBenchmark,
} from '@ruvector/ruvllm';

CLI

# Route a task
ruvllm route "add unit tests for auth module"
# → Agent: tester | Confidence: 0.96 | Tier: 2

# Query with streaming
ruvllm query --stream "Explain machine learning"

# Download models
ruvllm download ruv/ruvltra

# Run benchmarks
ruvllm bench ./models/model.gguf

# Evaluate (SWE-Bench)
ruvllm eval --model ./models/model.gguf --subset lite

Platform Support

| Platform | Architecture | Status |
|----------|--------------|--------|
| macOS | arm64 (M1-M4) | Full support |
| macOS | x64 | Supported |
| Linux | x64 | Supported |
| Linux | arm64 | Supported |
| Windows | x64 | Supported |


Links

| Resource | URL |
|----------|-----|
| npm | npmjs.com/package/@ruvector/ruvllm |
| HuggingFace | huggingface.co/ruv/ruvltra |
| Crate (Rust) | crates.io/crates/ruvllm |
| Documentation | docs.rs/ruvllm |
| GitHub | github.com/ruvnet/ruvector |
| Claude Flow | github.com/ruvnet/claude-flow |


License

MIT OR Apache-2.0


Built for Claude Code. Optimized for agents. Designed for speed.

Get Started | View on GitHub