llm-runner-router

v1.0.0

Universal LLM model loader and inference router: agnostic, fast, and intelligent

🧠 LLM-Runner-Router: The Universal Model Orchestration System

Where models transcend their formats, engines dance across dimensions, and inference becomes art

Built by Echo AI Systems · License: MIT · Quantum Ready

🌌 What Is This Sorcery?

LLM-Runner-Router is not just another model loader: it's a fully format-agnostic, full-stack neural orchestration system that adapts to ANY model format, ANY runtime environment, and ANY deployment scenario. Think of it as the Swiss Army knife of AI inference, but cooler and with more quantum entanglement.

✨ Core Superpowers

  • 🔮 Universal Format Support: GGUF, ONNX, Safetensors, HuggingFace, and whatever format the future invents
  • ⚡ Multi-Engine Madness: WebGPU for speed demons, WASM for universalists, Node for servers, Edge for the distributed
  • 🧭 Intelligent Routing: Automatically selects the perfect model based on quality, cost, speed, or pure chaos
  • 🚀 Streaming Everything: Real-time token generation that flows like digital poetry
  • 💰 Cost Optimization: Because your wallet deserves love too
  • 🎯 Zero-Config Magic: Works out of the box, customizable to the quantum level

🎮 Quick Start (For The Impatient)

# Clone the quantum repository
git clone https://github.com/echoaisystems/llm-runner-router
cd llm-runner-router

# Install interdimensional dependencies
npm install

# Launch the neural matrix
npm start
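
Prefer it as a dependency? The usage examples below import the published package, which you can install straight from npm:

# Summon the package into your own project
npm install llm-runner-router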

🎭 Usage Examples (Where Magic Happens)

Simple Mode - For Mortals

import { quick } from 'llm-runner-router';

// Just ask, and ye shall receive
const response = await quick("Explain quantum computing to a goldfish");
console.log(response.text);

Advanced Mode - For Wizards

import LLMRouter from 'llm-runner-router';

const router = new LLMRouter({
  strategy: 'quality-first',
  enableQuantumMode: true // (Not actually quantum, but sounds cool)
});

// Load multiple models
await router.load('huggingface:meta-llama/Llama-2-7b');
await router.load('local:./models/mistral-7b.gguf');

// Let the router choose the best model
const response = await router.advanced({
  prompt: "Write a haiku about JavaScript",
  temperature: 0.8,
  maxTokens: 50,
  fallbacks: ['gpt-3.5', 'local-llama']
});

Streaming Mode - For The Real-Time Addicts

const stream = router.stream("Tell me a story about a debugging dragon");

for await (const token of stream) {
  process.stdout.write(token);
}

Ensemble Mode - For The Overachievers

const result = await router.ensemble([
  { model: 'gpt-4', weight: 0.5 },
  { model: 'claude', weight: 0.3 },
  { model: 'llama', weight: 0.2 }
], "What is the meaning of life?");

// Get wisdom from multiple AI perspectives!

🏗️ Architecture (For The Curious)

┌─────────────────────────────────────────────┐
│              Your Application               │
├─────────────────────────────────────────────┤
│              LLM-Runner-Router              │
├──────────────┬──────────────┬───────────────┤
│    Router    │   Pipeline   │   Registry    │
├──────────────┴──────────────┴───────────────┤
│        Engines (WebGPU, WASM, Node)         │
├─────────────────────────────────────────────┤
│      Loaders (GGUF, ONNX, Safetensors)      │
└─────────────────────────────────────────────┘

🎯 Routing Strategies

Choose your destiny (a sketch of picking one follows the list):

  • 🏆 Quality First: Only the finest neural outputs shall pass
  • 💵 Cost Optimized: Your accountant will love you
  • ⚡ Speed Priority: Gotta go fast!
  • ⚖️ Balanced: The zen master approach
  • 🎲 Random: Embrace chaos, trust the universe
  • 🔄 Round Robin: Everyone gets a turn
  • 📊 Least Loaded: Fair distribution of neural labor
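
A minimal sketch of choosing a strategy at construction time. 'quality-first' and 'balanced' appear elsewhere in this README; the kebab-case identifiers for the others (e.g. 'cost-optimized', 'round-robin') are assumptions for illustration:

import LLMRouter from 'llm-runner-router';

// Your accountant's favorite ('cost-optimized' is an assumed identifier)
const thrifty = new LLMRouter({ strategy: 'cost-optimized' });

// Everyone gets a turn ('round-robin' is likewise an assumption)
const fair = new LLMRouter({ strategy: 'round-robin' });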

🛠️ Configuration

{
  "routingStrategy": "balanced",
  "maxModels": 100,
  "enableCaching": true,
  "quantization": "dynamic",
  "preferredEngine": "webgpu",
  "maxTokens": 4096,
  "cosmicAlignment": true
}

(cosmicAlignment is optional but recommended; the note sits out here because strict JSON forbids inline comments.)
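
Assuming the constructor accepts the same options object as the Advanced example above, one way to feed it a JSON config file (the llm-router.config.json name is just an example):

import { readFileSync } from 'node:fs';
import LLMRouter from 'llm-runner-router';

// Read the JSON config above and hand it straight to the router.
const config = JSON.parse(readFileSync('./llm-router.config.json', 'utf8'));
const router = new LLMRouter(config);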

📊 Performance Metrics

  • Model Load Time: < 500ms ⚡
  • First Token: < 100ms 🚀
  • Throughput: > 100 tokens/sec 💨
  • Memory Usage: < 50% of model size 🧠
  • Quantum Entanglement: Yes ✨

🔧 Advanced Features

Custom Model Loaders

router.registerLoader('my-format', MyCustomLoader);
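
What a custom loader might look like. The supports()/load() method names are assumptions for illustration, not a documented interface; check the loader base class in the repo for the real contract:

import { readFile } from 'node:fs/promises';

// Hypothetical loader shape: supports() and load() are assumed method names.
class MyCustomLoader {
  // Say whether this loader recognizes a given model source.
  supports(source) {
    return source.endsWith('.myformat');
  }

  // Parse the source into something the router can run inference on.
  async load(source, options = {}) {
    const bytes = await readFile(source);
    return { format: 'my-format', bytes, options };
  }
}

router.registerLoader('my-format', MyCustomLoader);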

Cost Optimization

const budget = 0.10; // $0.10 per request
const models = router.optimizeForBudget(availableModels, budget);

Quality Scoring

const scores = await router.rankModelsByQuality(models, prompt);
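
Assuming the call resolves to an array of { model, score } entries sorted best-first (an assumed shape, purely illustrative), picking a winner would be:

const [best] = scores; // assumed: results sorted best-first
console.log(`Top pick: ${best.model} (quality ${best.score})`);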

🌐 Deployment Options

  • Browser: Full client-side inference with WebGPU
  • Node.js: Server-side with native bindings
  • Edge: Cloudflare Workers, Deno Deploy
  • Docker: Container-ready out of the box
  • Kubernetes: Scale to infinity and beyond

🤝 Contributing

We welcome contributions from all dimensions! Whether you're fixing bugs, adding features, or improving documentation, your quantum entanglement with this project is appreciated.

  1. Fork the repository (in this dimension)
  2. Create your feature branch (git checkout -b feature/quantum-enhancement)
  3. Commit with meaningful messages (git commit -m 'Add quantum tunneling support')
  4. Push to your branch (git push origin feature/quantum-enhancement)
  5. Open a Pull Request (and hope it doesn't collapse the wave function)

📜 License

MIT License: because sharing is caring, and AI should be for everyone.

🙏 Acknowledgments

  • The Quantum Field for probabilistic inspiration
  • Coffee for keeping us in a superposition of awake and asleep
  • You, for reading this far and joining our neural revolution

🚀 What's Next?

  • [ ] Actual quantum computing support (when available)
  • [ ] Time-travel debugging (work in progress)
  • [ ] Telepathic model loading (pending FDA approval)
  • [ ] Integration with alien AI systems (awaiting first contact)

Built with 💙 and ☕ by Echo AI Systems

"Because every business deserves an AI brain, and every AI brain deserves a proper orchestration system"


📞 Support

Remember: With great model power comes great computational responsibility. Use wisely! 🧙‍♂️