badgr-gpu-doctor
v0.1.1
Published
CLI preflight checks for model VRAM, CUDA, GPU compatibility, and quantization risk.
Downloads
292
Maintainers
Readme
badgr-gpu-doctor
Check whether your GPU can run a model — VRAM fit, CUDA version, quantization options — before you waste time loading it.
npx badgr-gpu-doctor --model meta-llama/Llama-3.1-8B-Instruct
npx badgr-gpu-doctor --model Qwen/Qwen3-32B --gpu RTX4090Free. No signup required. Runs entirely on your machine.
The problem it solves
You pick a model, start loading it, and 10 minutes later it OOMs. Or you're not sure whether int4 quantization is safe for your use case. badgr-gpu-doctor answers these questions in under a second — before you waste time or money.
Quick start
# Auto-detect your GPU via nvidia-smi and check a model
npx badgr-gpu-doctor --model meta-llama/Llama-3.1-8B-Instruct
# Specify a GPU manually (no nvidia-smi needed)
npx badgr-gpu-doctor --model Qwen/Qwen3-32B --gpu RTX4090
# Machine-readable JSON (for CI or scripts)
npx badgr-gpu-doctor --model Qwen/Qwen3-32B --gpu A100 --json
# GPU won't fit the model? Show the AI Badgr cloud handoff
npx badgr-gpu-doctor --model Qwen/Qwen3-32B --run-with-badgrCLI flags
| Flag | Description |
|------|-------------|
| --model <id> | HuggingFace model ID, e.g. Qwen/Qwen3-32B or meta-llama/Llama-3.1-70B-Instruct |
| --gpu <name> | GPU name, e.g. RTX4090, A100, H100, L40S, T4 |
| --vram-gb <n> | Override VRAM in GB — skips nvidia-smi lookup |
| --cuda-version <v> | CUDA version string, e.g. 12.1 |
| --quantization <type> | auto | fp16 | int8 | int4 (default: auto) |
| --run-with-badgr | Print the AI Badgr cloud GPU command and exit |
| --json | Output machine-readable JSON |
How VRAM estimation works
Parameter count is inferred from the model name, then VRAM is calculated per quantization:
| Quantization | Bytes/param | 70B model needs |
|---|---|---|
| fp16 | 2.0 | ~140 GB |
| int8 | 1.1 | ~77 GB |
| auto | 0.8 | ~56 GB |
| int4 | 0.6 | ~42 GB |
KV cache and runtime overhead (8–32%) is added automatically.
Known GPU VRAM: RTX 3090 (24 GB), RTX 4090 (24 GB), L4 (24 GB), T4 (16 GB), L40S (48 GB), A100 (40/80 GB), H100 (80 GB).
Example output
badgr-gpu-doctor — model: Qwen/Qwen3-32B gpu: RTX4090
✓ Estimated VRAM needed: 19–26 GB (int4 quantization)
✗ Available VRAM: 24 GB (RTX 4090) — borderline, may OOM at runtime
Suggestion: cap max_model_len to reduce KV cache memory, or use a larger GPU.
To run this model in the cloud:
npx badgr-gpu-doctor --model Qwen/Qwen3-32B --run-with-badgrExit codes: 0 = compatible, 1 = incompatible
TypeScript API
import { runGpuDoctor, detectLocalGpu, estimateModelVramGb } from "badgr-gpu-doctor";
// Full preflight check
const result = runGpuDoctor({
model: "Qwen/Qwen3-32B",
gpu: "RTX4090",
quantization: "int4",
});
console.log(result.compatible); // false
console.log(result.estimatedModelVramGb); // { min: 19, max: 26 }
console.log(result.suggestions); // ["Cap max_model_len to reduce KV cache..."]
// Auto-detect the local GPU (calls nvidia-smi)
const gpu = detectLocalGpu();
// → { name: "NVIDIA GeForce RTX 4090", vramGb: 24, cudaVersion: "12.4" }
// Estimate VRAM without running a full check
const vram = estimateModelVramGb("meta-llama/Llama-3.1-70B-Instruct", "int4");
// → { min: 35, max: 45 }Types:
interface GpuDoctorOptions {
model: string;
gpu?: string;
vramGb?: number;
cudaVersion?: string;
quantization?: "auto" | "fp16" | "int8" | "int4" | "none";
runWithBadgr?: boolean;
}
interface GpuDoctorResult {
compatible: boolean;
estimatedModelVramGb: { min: number; max: number };
availableVramGb?: number;
checks: DiagnosticCheck[];
suggestions: string[];
nextCommand?: string;
report: JsonReport;
}Optional: run on AI Badgr instead
If your local GPU can't fit the model, run it in the cloud with no setup:
npm install -g badgr-cli
badgr login
badgr serve Qwen/Qwen3-32B --gpu L40S
# → https://dep-a1b2c3.aibadgr.com/v1 (OpenAI-compatible endpoint)See badgr-cli for full options.
Requirements
- Node.js 18+
nvidia-smiin PATH (optional — only needed for auto-detection)
