ts-nvml
TypeScript bindings for NVIDIA Management Library (NVML) using Koffi FFI.
Overview
ts-nvml provides direct access to NVIDIA GPU monitoring capabilities from Node.js, removing the need to spawn nvidia-smi processes and parse their text output. This results in significantly better performance for GPU monitoring applications.
Features
- Direct FFI bindings to libnvidia-ml.so
- Full TypeScript support with comprehensive types
- Result-based error handling (no exceptions for expected failures)
- Zero external runtime dependencies (Koffi is the only dependency)
- Matches the API style of NVIDIA go-nvml
Documentation
For complete API documentation with all functions, types, enums, and examples, see docs/API.md.
Requirements
- Node.js >= 22.0.0
- Linux with NVIDIA GPU and drivers installed
- libnvidia-ml.so.1 available (installed with the NVIDIA drivers)
Installation
npm install ts-nvml
Quick Start
import { Nvml, unwrap } from 'ts-nvml';
// Initialize NVML (required before any operations)
Nvml.init();
// Get device count
const count = Nvml.getDeviceCount();
console.log(`Found ${count} GPU(s)`);
// Query a specific device
const device = Nvml.getDevice(0);
const name = unwrap(device.getName());
const memory = unwrap(device.getMemoryInfo());
const temp = unwrap(device.getTemperature());
console.log(`GPU: ${name}`);
console.log(`Memory: ${Number(memory.used) / 1e9} / ${Number(memory.total) / 1e9} GB`);
console.log(`Temperature: ${temp}°C`);
// Always shutdown when done
Nvml.shutdown();
NVML Lifecycle Management
When to Initialize and Shutdown
NVML is designed to be initialized once and kept running for the lifetime of your application. You do not need to call init() and shutdown() for each query.
For long-running services (backends, monitoring daemons):
import { Nvml, unwrap } from 'ts-nvml';
// Initialize once at startup
Nvml.init();
// Register cleanup for graceful shutdown
process.on('SIGINT', () => {
Nvml.shutdown();
process.exit();
});
process.on('SIGTERM', () => {
Nvml.shutdown();
process.exit();
});
// Poll as often as needed - no init/shutdown required
setInterval(() => {
const statuses = unwrap(Nvml.getAllGpuStatus());
// ... use statuses
}, 5000);
For one-off scripts:
Nvml.init();
try {
// ... do work
} finally {
Nvml.shutdown();
}
Why Keep NVML Initialized?
- Performance: init() and shutdown() each take ~1-2ms, so calling them repeatedly adds unnecessary overhead (see the sketch after this list).
- Design: NVML maintains internal state and is designed for persistent usage.
- Concurrency: The library handles concurrent queries safely when initialized.
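As a concrete illustration of the performance point, a minimal sketch (assuming a GPU at index 0, and using only the documented init/shutdown/getDevice/getTemperature calls):
import { Nvml, unwrap } from 'ts-nvml';
// Anti-pattern: pays the init/shutdown cost on every query.
function temperatureSlow(): number {
  Nvml.init();
  const temp = unwrap(Nvml.getDevice(0).getTemperature());
  Nvml.shutdown();
  return temp;
}
// Preferred: initialize once and reuse the device handle.
Nvml.init();
const gpu0 = Nvml.getDevice(0);
function temperatureFast(): number {
  return unwrap(gpu0.getTemperature());
}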
When to Call Shutdown
- Process exit (SIGINT, SIGTERM, beforeExit)
- Graceful application shutdown
- When you truly won't need GPU queries anymore
Shutdown releases NVML resources and should always be called before your process exits.
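A minimal lifecycle helper, assuming only the documented Nvml.init(), Nvml.isInitialized(), and Nvml.shutdown() calls (the ensureNvml/shutdownNvml names are illustrative, not part of the library):
import { Nvml } from 'ts-nvml';
// Idempotent init: safe to call from any code path.
function ensureNvml(): void {
  if (!Nvml.isInitialized()) {
    Nvml.init();
  }
}
// Idempotent shutdown: safe even if several exit hooks fire.
function shutdownNvml(): void {
  if (Nvml.isInitialized()) {
    Nvml.shutdown();
  }
}
process.on('SIGINT', () => { shutdownNvml(); process.exit(); });
process.on('SIGTERM', () => { shutdownNvml(); process.exit(); });
process.on('beforeExit', shutdownNvml);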
API Reference
Initialization
// Initialize NVML library
Nvml.init(): void
// Shutdown NVML and release resources
Nvml.shutdown(): void
// Check if initialized
Nvml.isInitialized(): boolean
System Queries
// Get number of GPUs
Nvml.getDeviceCount(): number
// Get all devices
Nvml.getAllDevices(): Device[]
// Get device by index (0-based)
Nvml.getDevice(index: number): Device
// Get device by UUID
Nvml.getDeviceByUUID(uuid: string): Device
// Get driver and version information
Nvml.getDriverInfo(): Result<DriverInfo>
// Returns: { driverVersion, nvmlVersion, cudaVersion }
// Get complete system snapshot
Nvml.getSystemSnapshot(): Result<SystemSnapshot>
// Returns: { timestamp, driver, gpus, processes }
Device Queries
All device methods return Result<T> to handle potential failures gracefully.
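For example, a minimal sketch (assuming a GPU at index 0) of handling a query that can fail or legitimately return null, such as fan speed on a passively cooled board:
import { Nvml } from 'ts-nvml';
Nvml.init();
const gpu = Nvml.getDevice(0);
// getFanSpeed() returns Result<number | null>: the call can fail,
// or succeed with null when the board has no fan.
const fan = gpu.getFanSpeed();
if (!fan.ok) {
  console.error(`Fan speed query failed: ${fan.error.message}`);
} else if (fan.value === null) {
  console.log('Passively cooled (no fan speed reported)');
} else {
  console.log(`Fan: ${fan.value}%`);
}
Nvml.shutdown();
The full set of device queries: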
const device = Nvml.getDevice(0);
// Basic info
device.getName(): Result<string>
device.getUUID(): Result<string>
// Memory
device.getMemoryInfo(): Result<MemoryInfo>
// Returns: { total: bigint, free: bigint, used: bigint } (bytes)
// Utilization
device.getUtilizationRates(): Result<UtilizationRates>
// Returns: { gpu: number, memory: number } (0-100%)
// Thermal
device.getTemperature(): Result<number> // Celsius
device.getFanSpeed(): Result<number | null> // 0-100%, or null if passively cooled (no fan)
// Power
device.getPowerUsage(): Result<number> // Watts
device.getPowerLimit(): Result<number> // Watts
// Performance
device.getPerformanceState(): Result<NvmlPState> // P0-P15
// PCI
device.getPciInfo(): Result<PciInfo>
// Modes
device.getPersistenceMode(): Result<boolean>
device.getDisplayActive(): Result<boolean>
device.getComputeMode(): Result<NvmlComputeMode>
device.getMigMode(): Result<boolean | null>
// ECC
device.getEccErrorsCorrected(): Result<number | null>
// Processes
device.getProcesses(): Result<GpuProcess[]>
// Returns: [{ gpuIndex, pid, processName, usedMemoryMiB }]
// Convenience methods
device.getBasicInfo(): Result<GpuBasicInfo>
device.getStatus(): Result<GpuStatus> // All fields in one call
Result Type
All fallible operations return a Result<T> type for explicit error handling:
import { unwrap, unwrapOr, map, andThen } from 'ts-nvml';
const result = device.getTemperature();
// Pattern matching
if (result.ok) {
console.log(`Temperature: ${result.value}°C`);
} else {
console.error(`Error: ${result.error.message}`);
}
// Unwrap (throws on error)
const temp = unwrap(result);
// Unwrap with default
const tempOrDefault = unwrapOr(result, 0);
// Map values
const fahrenheit = map(result, (c) => c * 9/5 + 32);
// Chain operations
const status = andThen(device.getName(), (name) =>
  map(device.getTemperature(), (temp) => `${name}: ${temp}°C`)
);
Types
interface MemoryInfo {
total: bigint; // bytes
free: bigint;
used: bigint;
}
interface UtilizationRates {
gpu: number; // 0-100
memory: number; // 0-100
}
interface GpuStatus {
index: number;
name: string;
persistenceMode: boolean;
pciBusId: string;
displayActive: boolean;
eccErrorsCorrected: number | null;
fanSpeed: number | null;
temperature: number;
pstate: NvmlPState;
powerDraw: number;
powerLimit: number;
memoryUsedMiB: number;
memoryTotalMiB: number;
utilizationGpu: number;
utilizationMemory: number;
computeMode: NvmlComputeMode;
migMode: boolean | null;
}
interface GpuProcess {
gpuIndex: number;
pid: number;
processName: string;
usedMemoryMiB: number;
}
interface DriverInfo {
driverVersion: string;
nvmlVersion: string;
cudaVersion: string;
}
interface SystemSnapshot {
timestamp: Date;
driver: DriverInfo;
gpus: GpuStatus[];
processes: GpuProcess[];
}
Migration from nvidia-smi
If you're currently parsing nvidia-smi output, here's how to migrate:
Before (nvidia-smi)
import { execSync } from 'child_process';
function getGpuInfo() {
const output = execSync('nvidia-smi --query-gpu=name,memory.total,memory.used,temperature.gpu --format=csv,noheader,nounits');
const lines = output.toString().trim().split('\n');
return lines.map(line => {
const [name, total, used, temp] = line.split(', ');
return { name, memoryTotal: parseInt(total), memoryUsed: parseInt(used), temperature: parseInt(temp) };
});
}
After (ts-nvml)
import { Nvml, unwrap } from 'ts-nvml';
function getGpuInfo() {
Nvml.init();
try {
const devices = Nvml.getAllDevices();
return devices.map(device => {
const name = unwrap(device.getName());
const memory = unwrap(device.getMemoryInfo());
const temp = unwrap(device.getTemperature());
return {
name,
memoryTotal: Number(memory.total / (1024n * 1024n)),
memoryUsed: Number(memory.used / (1024n * 1024n)),
temperature: temp,
};
});
} finally {
Nvml.shutdown();
}
}
Performance Comparison
| Metric | nvidia-smi | ts-nvml |
|--------------|----------------------|---------------------|
| Latency | ~50-100ms | ~1-2ms |
| CPU overhead | High (process spawn) | Minimal (FFI call) |
| Memory | New process per call | Shared library |
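If you want to verify the latency numbers on your own hardware, here is a minimal sketch using only documented calls (assumes a GPU at index 0; results vary by GPU, driver, and system):
import { Nvml, unwrap } from 'ts-nvml';
Nvml.init();
const device = Nvml.getDevice(0);
// Warm up once, then time a batch of FFI queries.
unwrap(device.getTemperature());
const iterations = 100;
const start = process.hrtime.bigint();
for (let i = 0; i < iterations; i++) {
  unwrap(device.getUtilizationRates());
}
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`Average ts-nvml query latency: ${(elapsedMs / iterations).toFixed(3)} ms`);
Nvml.shutdown();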
Examples
Get Complete System Snapshot
import { Nvml, unwrap } from 'ts-nvml';
Nvml.init();
const snapshot = unwrap(Nvml.getSystemSnapshot());
console.log(`Driver: ${snapshot.driver.driverVersion}`);
console.log(`CUDA: ${snapshot.driver.cudaVersion}`);
for (const gpu of snapshot.gpus) {
console.log(`\nGPU ${gpu.index}: ${gpu.name}`);
console.log(` Temperature: ${gpu.temperature}°C`);
console.log(` Power: ${gpu.powerDraw}W / ${gpu.powerLimit}W`);
console.log(` Memory: ${gpu.memoryUsedMiB} / ${gpu.memoryTotalMiB} MiB`);
console.log(` Utilization: GPU ${gpu.utilizationGpu}%, Memory ${gpu.utilizationMemory}%`);
}
console.log(`\nProcesses: ${snapshot.processes.length}`);
for (const proc of snapshot.processes) {
console.log(` GPU ${proc.gpuIndex} - PID ${proc.pid}: ${proc.processName} (${proc.usedMemoryMiB} MiB)`);
}
Nvml.shutdown();
Monitor GPU in a Loop
import { Nvml, unwrap } from 'ts-nvml';
Nvml.init();
const device = Nvml.getDevice(0);
setInterval(() => {
const util = unwrap(device.getUtilizationRates());
const temp = unwrap(device.getTemperature());
const power = unwrap(device.getPowerUsage());
console.log(`GPU: ${util.gpu}% | Mem: ${util.memory}% | Temp: ${temp}°C | Power: ${power.toFixed(1)}W`);
}, 1000);
// Remember to call Nvml.shutdown() on exit
process.on('SIGINT', () => {
Nvml.shutdown();
process.exit();
});
Low-Level Bindings
For advanced use cases, you can access the raw NVML bindings:
import {
nvmlInit,
nvmlShutdown,
nvmlDeviceGetCount,
nvmlDeviceGetHandleByIndex,
nvmlDeviceGetMemoryInfo,
// ... etc
} from 'ts-nvml/bindings';
Testing
# Run all tests (unit tests don't require GPU)
npm test
# Run only unit tests
npm run test:unit
# Run integration tests (requires GPU)
npm run test:integration
License
Apache-2.0
Acknowledgments
- NVIDIA NVML - The underlying library
- Koffi - Excellent FFI library for Node.js
- NVIDIA go-nvml - API design inspiration
