gpu-orchestrator

Built by DARKSOL 🌑

Cross-platform GPU/CPU tuning, VRAM management, and live monitoring dashboard for local AI runtimes.


Why this exists

Running local AI too often means throwing models at hardware until something works. gpu-orchestrator gives you actual visibility into what your machine is doing: detect your backends, cap your VRAM intelligently, generate tuned runtime configs, and watch everything live in a terminal dashboard. Built for real local inference setups, not cloud demos.

What it does

  • Auto-detects compute backends: CUDA, ROCm, Vulkan, Metal, CPU
  • Inspects hardware and reports VRAM, RAM, CPU with ASCII bars
  • Generates tuned runtime configs for Ollama, llama.cpp, and vLLM
  • Live ASCII terminal dashboard with CPU trends, memory gauges, GPU tables, VRAM bars, per-core sparklines, and runtime status
  • Quick benchmark runner — tokens/sec avg/min/max across runs
  • Non-aggressive VRAM soft caps: suggestions only, never kills your processes

Install

npm install -g gpu-orchestrator

Commands

doctor — Full hardware inspection

gpu-orchestrator doctor
gpu-orchestrator doctor --json
gpu-orchestrator doctor --vram-policy strict

Reports:

  • CPU, RAM, GPU hardware
  • VRAM soft caps with ASCII bars per GPU
  • Compute backends (CUDA / ROCm / Vulkan / Metal / CPU)
  • Backend recommendations (e.g. AMD → ROCm primary, Vulkan fallback)
  • Runtime status (Ollama, llama.cpp, vLLM — online/offline)
  • Profile recommendations
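
The --json flag makes doctor scriptable. Below is a minimal sketch of consuming that output from Node; the field names picked out (backends, gpus) are illustrative assumptions, not a documented schema, so inspect the real output on your machine first.

```ts
// Sketch: shell out to `gpu-orchestrator doctor --json` and inspect it.
// `report.backends` / `report.gpus` are assumed field names for
// illustration, not a documented schema.
import { execFileSync } from 'node:child_process';

const raw = execFileSync('gpu-orchestrator', ['doctor', '--json'], {
  encoding: 'utf8',
});
const report = JSON.parse(raw);

// e.g. decide whether a GPU backend is available before launching a runtime
console.log('backends:', report.backends);
console.log('gpus:', report.gpus);
```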

optimize — Generate tuned runtime config

# Ollama
gpu-orchestrator optimize --runtime ollama --profile balanced
gpu-orchestrator optimize --runtime ollama --profile balanced --vram-cap 3.0

# llama.cpp with Vulkan backend
gpu-orchestrator optimize --runtime llamacpp --profile latency --backend vulkan

# vLLM with custom VRAM cap
gpu-orchestrator optimize --runtime vllm --profile throughput --vram-cap 6.0

# Dry run (preview without writing)
gpu-orchestrator optimize --runtime ollama --profile balanced --dry-run

Options:

| Flag | Values | Description |
|---|---|---|
| --runtime | ollama / llamacpp / vllm | Target runtime |
| --profile | latency / throughput / balanced / low-power | Tuning profile |
| --vram-cap <GB> | number | Manual VRAM soft cap in GB |
| --vram-policy | strict / balanced / lenient | VRAM reserve policy |
| --backend | cuda / rocm / vulkan / metal / cpu | Force backend (default: auto) |
| --output <path> | string | Custom output path |
| --dry-run | — | Preview without writing |


monitor — Live ASCII terminal dashboard

gpu-orchestrator monitor
gpu-orchestrator monitor --runtime all --interval 2000

Dashboard panels:

  • CPU load trend line
  • Memory donut gauge (color-coded)
  • GPU table (model/vendor/VRAM/driver)
  • VRAM bar chart
  • Per-core CPU sparklines
  • Runtime status panel (Ollama / llama.cpp / vLLM online status)
  • Event log

Press q to exit.
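
The panel layout maps naturally onto blessed-contrib's grid widgets. Here is a minimal skeleton of that rendering pattern, not the package's actual source: a CPU trend line in a 12x12 grid, with q bound to quit as in monitor.

```ts
// Sketch of a blessed-contrib dashboard skeleton, as described above.
// Not gpu-orchestrator's actual source -- just the rendering pattern.
const blessed = require('blessed');
const contrib = require('blessed-contrib');

const screen = blessed.screen();
const grid = new contrib.grid({ rows: 12, cols: 12, screen });

// CPU load trend line across the top half of the screen
const cpuLine = grid.set(0, 0, 6, 12, contrib.line, { label: 'CPU load %' });

screen.key(['q', 'C-c'], () => process.exit(0)); // q exits, as in monitor

const samples: number[] = [];
setInterval(() => {
  samples.push(Math.random() * 100); // stand-in for a real CPU sample
  cpuLine.setData([
    { title: 'cpu', x: samples.map((_, i) => String(i)), y: samples },
  ]);
  screen.render();
}, 2000);
```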


bench — Quick runtime benchmark

gpu-orchestrator bench --runtime ollama --model lfm2:latest --runs 3

Reports avg/min/max tokens per second.
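
For Ollama specifically, tokens/sec can be derived from the eval_count and eval_duration (nanoseconds) fields in the non-streaming /api/generate response. A rough sketch of how such a benchmark loop might compute the stats, assuming a local Ollama on the default port; this is one plausible approach, not necessarily how gpu-orchestrator implements it:

```ts
// Sketch: time N generation runs against a local Ollama and report
// avg/min/max tokens/sec, using Ollama's /api/generate response fields
// eval_count (tokens) and eval_duration (nanoseconds). Requires Node 18+.
async function benchOllama(model: string, runs: number): Promise<void> {
  const tps: number[] = [];
  for (let i = 0; i < runs; i++) {
    const res = await fetch('http://localhost:11434/api/generate', {
      method: 'POST',
      body: JSON.stringify({ model, prompt: 'Hello', stream: false }),
    });
    const body = await res.json();
    tps.push(body.eval_count / (body.eval_duration / 1e9));
  }
  const avg = tps.reduce((a, b) => a + b, 0) / tps.length;
  console.log({ avg, min: Math.min(...tps), max: Math.max(...tps) });
}

benchOllama('lfm2:latest', 3);
```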

VRAM Soft Caps

Non-aggressive by design — suggestions only, never kills processes.

| Policy | Reserve | Use case |
|---|---|---|
| strict | 25% | Safety margin for OS/desktop |
| balanced | 15% | Default; works for most setups |
| lenient | 5% | Max performance, minimal headroom |

Manual override: --vram-cap 3.0 sets a hard GB ceiling.

Generated env vars per runtime:

  • Ollama: OLLAMA_MAX_VRAM
  • llama.cpp: --n-gpu-layers hints + LLAMA_VRAM_CAP_MB
  • vLLM: --gpu-memory-utilization
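
Putting the policy table and the env vars together: on an 8 GB GPU, the balanced policy reserves 15%, leaving a 6.8 GB soft cap, which would map to roughly LLAMA_VRAM_CAP_MB=6963 or a vLLM --gpu-memory-utilization of 0.85. A sketch of that arithmetic; the cap formula and the exact units are assumptions about the policy, not lifted from the package:

```ts
// Sketch of the soft-cap arithmetic implied by the policy table above.
// cap = total VRAM * (1 - reserve). The exact formula and units used by
// gpu-orchestrator are assumptions here; check the generated configs.
const RESERVES = { strict: 0.25, balanced: 0.15, lenient: 0.05 } as const;

function softCapGB(totalVramGB: number, policy: keyof typeof RESERVES): number {
  return totalVramGB * (1 - RESERVES[policy]);
}

const totalGB = 8;
const capGB = softCapGB(totalGB, 'balanced'); // 6.8 GB on an 8 GB card
console.log(`LLAMA_VRAM_CAP_MB=${Math.floor(capGB * 1024)}`); // llama.cpp, in MB
console.log(`--gpu-memory-utilization ${(capGB / totalGB).toFixed(2)}`); // vLLM fraction
```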

Backend Detection

Auto-probes in priority order:

| Backend | Probe method |
|---|---|
| CUDA | nvcc / nvidia-smi |
| ROCm | rocminfo / rocm-smi |
| Vulkan | vulkaninfo |
| Metal | macOS native |
| CPU | Always available |

AMD example: ROCm installed → ROCm primary. No ROCm → Vulkan fallback. Neither → CPU.
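
The priority order translates to a simple first-hit-wins check. A sketch of that logic using the probe commands from the table above; this mirrors the documented probes, not gpu-orchestrator's actual detection code:

```ts
// Sketch of first-hit-wins backend detection per the table above.
// Probes mirror the documented commands; not gpu-orchestrator's source.
import { spawnSync } from 'node:child_process';

function has(cmd: string, args: string[] = []): boolean {
  // spawnSync sets .error (e.g. ENOENT) when the command isn't found
  return spawnSync(cmd, args, { stdio: 'ignore' }).error === undefined;
}

function detectBackend(): string {
  if (has('nvcc', ['--version']) || has('nvidia-smi')) return 'cuda';
  if (has('rocminfo') || has('rocm-smi')) return 'rocm';
  if (has('vulkaninfo')) return 'vulkan';
  if (process.platform === 'darwin') return 'metal'; // Metal ships with macOS
  return 'cpu'; // always available
}

console.log(detectBackend());
```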

Architecture / flow

  • doctor probes hardware and backend availability
  • optimize maps hardware profile → runtime config → writes env vars or config files
  • monitor polls system metrics via systeminformation and renders live in blessed-contrib
  • bench runs timed inference loops and reports throughput stats
  • All commands respect VRAM policy before generating any output
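
monitor's polling step maps directly onto the systeminformation API named above. A minimal sketch of one poll tick; currentLoad, mem, and graphics are real systeminformation calls, while what gets rendered from them is up to the dashboard:

```ts
// Sketch of a monitor poll tick using the systeminformation package
// named above: currentLoad, mem, and graphics are real API calls.
import si from 'systeminformation';

async function pollOnce(): Promise<void> {
  const [load, mem, gfx] = await Promise.all([
    si.currentLoad(), // overall + per-core CPU load (for sparklines)
    si.mem(),         // RAM totals (for the memory gauge)
    si.graphics(),    // GPU controllers: model, vendor, vram (for the GPU table)
  ]);
  console.log('cpu %', load.currentLoad.toFixed(1));
  console.log('ram active GB', (mem.active / 1024 ** 3).toFixed(1));
  for (const gpu of gfx.controllers) {
    console.log(gpu.vendor, gpu.model, gpu.vram);
  }
}

setInterval(pollOnce, 2000); // 2 s cadence, as in the --interval 2000 example
```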

Roadmap

  • Runtime adapters: text-generation-webui, exllamav2
  • Model-aware tuning (param size × quantization → layer allocation)
  • Remote node collector + multi-machine dashboard
  • Thermal guardrails with auto profile switching
  • Config presets per GPU vendor family
  • Webhook alerts when VRAM usage nears cap

Security notes

  • No telemetry, no phone-home. Runs fully local.
  • VRAM caps are advisory — the tool does not kill or modify running processes.
  • Generated configs may contain system-specific values; review before deploying to shared machines.

License + links