
gpu-orchestrator

v0.5.0

Cross-platform GPU/CPU tuning, VRAM management, and live monitoring dashboard for local AI runtimes.

Why this exists

Local AI runtimes (Ollama, llama.cpp, vLLM) ship with zero guidance on tuning for your hardware. Users guess at batch sizes, thread counts, VRAM limits, and backend selection — and get it wrong. This tool inspects your machine, detects your compute stack, and generates correct runtime configs automatically.

What it does

  • Detects hardware — CPU cores, RAM, GPUs, VRAM per device, temperatures
  • Probes compute backends — CUDA, ROCm, Vulkan, Metal, CPU
  • Recommends backends — AMD → ROCm primary, Vulkan fallback; NVIDIA → CUDA; Apple → Metal
  • VRAM soft caps — suggest memory limits per GPU (strict / balanced / lenient policies)
  • Model recommendations — curated database of 35+ models, ranked by what fits your VRAM
  • Quickstart wizard — zero to running in one command
  • Model fit checker — tell it a model size, it tells you if it fits your VRAM
  • GPU/CPU split planner — visualize how a model distributes across GPUs + CPU RAM
  • Model management — list, load, unload, pull, remove, inspect Ollama models with GPU/CPU split bars
  • Profile-based tuning — latency, throughput, balanced, low-power
  • Generates runtime configs — env files for Ollama, llama.cpp flags, vLLM args
  • Live ASCII dashboard — CPU trend, RAM gauge, GPU table, VRAM bars, temperatures, live inference
  • Web GUI dashboard — browser-based live monitoring with Chart.js graphs, health score, model management
  • Install guides — platform-aware setup instructions for ROCm, Vulkan, CUDA, Metal, Ollama
  • Quick benchmarks — tok/s measurement with history tracking, comparison, trend sparklines
  • Cost savings — compare local inference electricity cost vs cloud API pricing
  • Health score — A-F system grade with category breakdown
  • Temperature monitoring — GPU and CPU temps via nvidia-smi, rocm-smi, systeminformation
  • Process viewer — find running AI runtime processes
  • Persistent config — save preferences to ~/.gpu-orchestrator/config.json

Install

npm install -g gpu-orchestrator

Commands

quickstart — Zero to running

gpu-orchestrator quickstart

Interactive wizard: detect hardware → check Ollama → recommend models → pull → generate config → load → benchmark → done.

status — Quick overview

gpu-orchestrator status
⚡ GPU Orchestrator — Status

  CPU:  [██░░░░░░░░░░░░░░░░░░] 4.9% (12 cores)
  RAM:  [████████░░░░░░░░░░░░] 41.9% (13.1/31.3 GB)
  GPU:  AMD Radeon RX 5500 (4.0 GB VRAM)
  Cap:  3.38 GB usable / 3.98 GB total (balanced)
  Temp: CPU 52°C | AMD Radeon RX 5500 61°C
  API:  VULKAN
  Proc: 2 AI process(es) running

  ● Ollama — 3 models, 1 running
  ○ llama.cpp
  ○ vLLM

recommend — What model should I run?

gpu-orchestrator recommend
gpu-orchestrator recommend --category code
gpu-orchestrator recommend --json

Scans your hardware and shows a ranked table of models that fit your VRAM, with medal emojis and install status.

doctor — Full inspection

gpu-orchestrator doctor
gpu-orchestrator doctor --json
gpu-orchestrator doctor --vram-policy strict
gpu-orchestrator doctor --export report.json

Detailed report: health score (A-F grade), system specs, temperatures, per-GPU VRAM caps, backend detection table, process list, runtime probing, profile recommendations.

models — Ollama model management

gpu-orchestrator models list              # all installed + running status
gpu-orchestrator models running           # loaded models with GPU/CPU split + last bench tok/s
gpu-orchestrator models info llama3:8b    # detailed model specs
gpu-orchestrator models load llama3:8b    # load into memory
gpu-orchestrator models unload llama3:8b  # free memory
gpu-orchestrator models pull llama3:8b    # pull from Ollama library (streaming progress)
gpu-orchestrator models rm llama3:8b      # remove a model

model-fit — Will it run?

gpu-orchestrator model-fit 7b
gpu-orchestrator model-fit 13b --quant q8
gpu-orchestrator model-fit 3b --quant fp16

Presets: 1b 3b 7b 8b 13b 14b 24b 30b 34b 70b 72b — or pass raw GB.
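The fit check presumably compares an estimated model footprint against your usable VRAM. A rough sketch of one common estimate — parameters × effective bits per weight, plus an overhead margin for KV cache and runtime buffers. The bit widths and the 20% margin are illustrative assumptions, not the tool's documented heuristics:

```javascript
// Rough VRAM estimate for a quantized model: weights plus ~20% overhead
// for KV cache and runtime buffers. Numbers are purely illustrative.
const BITS = { q4: 4.5, q8: 8.5, fp16: 16 }; // effective bits per weight

function estimateGB(paramsB, quant = "q4") {
  const bits = BITS[quant];
  if (!bits) throw new Error(`unknown quant: ${quant}`);
  const weightsGB = (paramsB * 1e9 * bits) / 8 / 1e9;
  return weightsGB * 1.2; // overhead margin
}

function fits(paramsB, usableVramGB, quant = "q4") {
  return estimateGB(paramsB, quant) <= usableVramGB;
}
```

By this estimate a 7B model at q4 needs roughly 4.7 GB, so it would not fit a 4 GB card's balanced cap, while a 3B model would.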

split — GPU/CPU layer split planner

gpu-orchestrator split 4.5
gpu-orchestrator split 14 --runtime llamacpp
gpu-orchestrator split 8 --policy strict
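Planning a split comes down to filling GPU VRAM first and spilling the rest to CPU RAM. A toy sketch that assumes uniformly sized layers (the real planner may weight layers differently):

```javascript
// Split a model of `modelGB` across a GPU cap and CPU RAM, assuming
// `layers` equally sized layers (illustrative; real layer sizes vary).
function planSplit(modelGB, gpuCapGB, layers = 32) {
  const perLayer = modelGB / layers;
  const gpuLayers = Math.min(layers, Math.floor(gpuCapGB / perLayer));
  return {
    gpuLayers,
    cpuLayers: layers - gpuLayers,
    gpuGB: +(gpuLayers * perLayer).toFixed(2),
    cpuGB: +((layers - gpuLayers) * perLayer).toFixed(2),
  };
}
```

For a 4.5 GB model against a 3.38 GB cap, this puts 24 of 32 layers on the GPU and the remaining 8 in CPU RAM.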

optimize — Generate runtime config

gpu-orchestrator optimize --runtime ollama --profile balanced
gpu-orchestrator optimize --runtime llamacpp --profile latency --backend vulkan
gpu-orchestrator optimize --runtime ollama --profile balanced --apply    # show how to apply
gpu-orchestrator optimize --runtime ollama --profile balanced --dry-run

| Option | Values |
|---|---|
| --runtime | ollama · llamacpp · vllm |
| --profile | latency · throughput · balanced · low-power |
| --vram-cap | Manual cap in GB |
| --vram-policy | strict (25%) · balanced (15%) · lenient (5%) |
| --backend | cuda · rocm · vulkan · metal · cpu · auto |
| --apply | Show platform-specific apply instructions |

bench — Runtime benchmark

gpu-orchestrator bench --model llama3:8b --runs 3
gpu-orchestrator bench history                        # past runs table
gpu-orchestrator bench compare llama3:8b mistral:7b   # side-by-side
gpu-orchestrator bench trend llama3:8b                # ASCII sparkline over time

Results auto-save to history. Use --no-save to skip.
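The tok/s figure is just generated tokens over generation time. Ollama's /api/generate final response reports eval_count (tokens) and eval_duration (nanoseconds), so a single run reduces to one division — a minimal sketch, assuming those response fields:

```javascript
// tok/s from Ollama's reported eval_count and eval_duration (nanoseconds).
function tokensPerSecond(evalCount, evalDurationNs) {
  return evalCount / (evalDurationNs / 1e9);
}

// Averaging several runs, as `bench --runs 3` presumably does.
function averageTokS(runs) {
  const total = runs.reduce(
    (sum, r) => sum + tokensPerSecond(r.evalCount, r.evalDurationNs),
    0
  );
  return total / runs.length;
}
```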

savings — Cost calculator

gpu-orchestrator savings --tokens 100000
gpu-orchestrator savings --tokens 500000 --electricity 0.15
gpu-orchestrator savings --tokens 100000 --json

Compares your local electricity cost vs GPT-4o, GPT-4o Mini, Claude Sonnet, Claude Haiku.
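The local side of the comparison is electricity: watts drawn for however long the generation takes at your measured tok/s, versus flat per-token cloud pricing. A sketch of the arithmetic — wattage, throughput, and the cloud price here are illustrative assumptions, not the tool's built-in figures:

```javascript
// Local cost: run time at a given tok/s, times system wattage, times
// the per-kWh electricity rate.
function localCostUSD(tokens, tokPerSec, watts, ratePerKWh) {
  const hours = tokens / tokPerSec / 3600;
  return (watts / 1000) * hours * ratePerKWh;
}

// Cloud cost: flat price per million tokens.
function cloudCostUSD(tokens, pricePerMTok) {
  return (tokens / 1e6) * pricePerMTok;
}
```

For example, 100k tokens at 30 tok/s on a 200 W machine at $0.15/kWh costs about $0.03 of electricity, versus $1.00 at a hypothetical $10/MTok cloud rate.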

ps — AI process viewer

gpu-orchestrator ps
gpu-orchestrator ps --json

Shows running Ollama, llama.cpp, vLLM processes with PID, CPU%, memory.

config — Persistent configuration

gpu-orchestrator config show                        # print current config
gpu-orchestrator config set electricityRate 0.15    # update a value
gpu-orchestrator config set hosts.ollama http://192.168.1.50:11434
gpu-orchestrator config reset                       # restore defaults
gpu-orchestrator config path                        # show config file location

monitor — Live ASCII dashboard (TUI)

gpu-orchestrator monitor
gpu-orchestrator monitor --runtime all --interval 2000
gpu-orchestrator monitor --new-window

Terminal UI panels: CPU load trend (60 points) · Memory donut · GPU table · Temperature panel · Live inference metrics · Runtime status · Event log. Press q to exit.

web — Web GUI dashboard

gpu-orchestrator web
gpu-orchestrator web --port 8080 --open
gpu-orchestrator web --host http://192.168.1.50:11434

Browser-based live dashboard with:

  • DARKSOL gold/dark branded theme
  • Chart.js real-time CPU line chart
  • Temperature gauges (color-coded)
  • Health score badge (A-F)
  • GPU/VRAM panels with cap indicators
  • Ollama model list with load/unload buttons
  • Model pull input field
  • Cost savings calculator widget
  • Runtime status (Ollama, llama.cpp, vLLM)
  • 2-second auto-refresh

install-guide — Setup help

gpu-orchestrator install-guide
gpu-orchestrator install-guide rocm
gpu-orchestrator install-guide cuda
gpu-orchestrator install-guide ollama

Platform-aware guides for Windows, Linux, and macOS.

VRAM Soft Caps

Non-aggressive by design — suggestions only, never kills processes.

| Policy | VRAM Reserve | Use Case |
|---|---|---|
| strict | 25% | Desktop multitasking, stability first |
| balanced | 15% | Default — good performance + headroom |
| lenient | 5% | Max performance, may cause stuttering |
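The cap arithmetic is simple: usable VRAM is total VRAM minus the policy's reserve. A minimal sketch (function name is illustrative; the real implementation may differ):

```javascript
// Suggested usable VRAM = total VRAM minus the policy's reserve fraction.
const RESERVE = { strict: 0.25, balanced: 0.15, lenient: 0.05 };

function vramCap(totalGB, policy = "balanced") {
  const reserve = RESERVE[policy];
  if (reserve === undefined) throw new Error(`unknown policy: ${policy}`);
  // Round to two decimals for display, e.g. 3.98 GB * 0.85 ≈ 3.38 GB.
  return Math.round(totalGB * (1 - reserve) * 100) / 100;
}
```

This matches the status output above: a 3.98 GB card under the balanced policy yields a 3.38 GB usable cap.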

Backend Detection

Auto-probes and recommends:

| Hardware | Primary | Fallback |
|---|---|---|
| NVIDIA + CUDA | CUDA | Vulkan |
| AMD + ROCm | ROCm | Vulkan |
| AMD (no ROCm) | Vulkan | CPU |
| Apple Silicon | Metal | — |
| Intel | Vulkan | CPU |

Health Score

System health grading with 6 categories:

| Category | Max Points | Criteria |
|---|---|---|
| GPU | 20 | Discrete GPU detected |
| Backend | 20 | GPU compute backend available |
| Runtime | 20 | AI runtime running |
| VRAM | 15 | 8 GB+ = 15, 4 GB+ = 10, 2 GB+ = 5 |
| RAM | 15 | 32 GB+ = 15, 16 GB+ = 10 |
| Cores | 10 | 8+ = 10, 4+ = 5 |

Grades: A (90+), B (75+), C (60+), D (45+), F (<45)
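The grade is a straight re-derivation of the point table; a sketch following the thresholds above (the actual scoring code may differ):

```javascript
// Sum category points per the table above, then map total to a letter grade.
function healthScore({ discreteGpu, backend, runtime, vramGB, ramGB, cores }) {
  let pts = 0;
  if (discreteGpu) pts += 20;
  if (backend) pts += 20; // a GPU compute backend is available
  if (runtime) pts += 20; // an AI runtime is running
  pts += vramGB >= 8 ? 15 : vramGB >= 4 ? 10 : vramGB >= 2 ? 5 : 0;
  pts += ramGB >= 32 ? 15 : ramGB >= 16 ? 10 : 0;
  pts += cores >= 8 ? 10 : cores >= 4 ? 5 : 0;
  const grade =
    pts >= 90 ? "A" : pts >= 75 ? "B" : pts >= 60 ? "C" : pts >= 45 ? "D" : "F";
  return { pts, grade };
}
```

The example machine from the status output (12 cores, 31.3 GB RAM, one 4 GB discrete GPU, backend available, Ollama running) scores 90 points — an A.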

Supported Runtimes

  • Ollama — full integration: model list, load/unload, pull/remove, running status, GPU/CPU split, env generation, benchmarks
  • llama.cpp — flag generation (--threads, --n-gpu-layers, --tensor-split, --batch-size)
  • vLLM — arg generation (--gpu-memory-utilization, --max-num-seqs)

Testing

npm test

8 test files, 25+ tests using Node's built-in test runner. All pure unit tests with mock data.

Roadmap

  • [ ] Runtime adapters: text-generation-webui, exllamav2
  • [ ] Model-aware tuning (param size × quant → layer allocation)
  • [ ] Remote node collector + multi-machine dashboard
  • [ ] Thermal guardrails with auto profile switching
  • [ ] Webhook alerts on VRAM cap breach
  • [ ] Config presets per GPU family
  • [ ] SSH tunnel support for remote GPU nodes

License

MIT


Built with teeth. 🌑