offline-intelligence
v0.1.7
Offline Intelligence
Private On-Device Inference Engine — run LLMs locally with zero configuration.
Installation
```bash
npm install offline-intelligence
```

Quick Start
```javascript
const { OfflineIntelligence } = require("offline-intelligence");

// Create the SDK; it auto-detects your hardware (NVIDIA, AMD, Intel, Apple Silicon)
const sdk = new OfflineIntelligence();

// Download the inference engine (first run only, ~200MB, stored in AppData)
sdk.ensureReady();

// Load a GGUF model and start inference
sdk.loadModel("path/to/model.gguf");

// Chat
const response = sdk.chat("What is the capital of France?");
console.log(response.content);

// Cleanup
sdk.close();
```

Configuration
```javascript
const sdk = new OfflineIntelligence({
  ctx_size: 4096,    // Context window size
  gpu_layers: 28,    // GPU layers to offload (0 = CPU only)
  threads: 8,        // CPU threads
  batch_size: 512,   // Batch size for prompt processing
  env_file: ".env",  // Optional .env file (does NOT pollute your environment)
});
```

Full Message Format
```javascript
const response = sdk.chat([
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing in one sentence." }
], {
  maxTokens: 500,
  temperature: 0.5
});
console.log(response.content);
```

How It Works
1. `new OfflineIntelligence()`: Detects GPU hardware, creates data directories. Instant.
2. `ensureReady()`: Downloads the best inference engine for your hardware. First run only.
3. `loadModel(path)`: Spawns a local inference server on a free port. Manages all DLLs automatically.
4. `chat(messages)`: Sends requests to the local server. All inference stays on your machine.
No data leaves your device. No API keys. No internet required after initial setup.
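Because `loadModel` spawns a local server process, it can help to guarantee that `close()` runs even when a later call throws. The following is a minimal sketch of that lifecycle, not part of the package: `withModel` is a hypothetical helper, and it assumes the SDK methods are synchronous (as the Quick Start suggests) and that `close()` is safe to call at any stage.

```javascript
// Hypothetical helper, not provided by offline-intelligence.
// Runs the documented lifecycle in order and always closes the SDK,
// even if downloading, loading, or the task itself throws.
// Assumption: the SDK methods are synchronous and close() may be
// called regardless of how far setup progressed.
function withModel(sdk, modelPath, task) {
  try {
    sdk.ensureReady();        // fetch the inference engine on first run
    sdk.loadModel(modelPath); // start the local inference server
    return task(sdk);         // caller issues one or more chat() requests
  } finally {
    sdk.close();              // release the server process
  }
}
```

With the real SDK this might be called as `withModel(new OfflineIntelligence(), "path/to/model.gguf", (sdk) => sdk.chat("Hello").content)`. Passing the SDK instance in as an argument keeps the helper independent of how the instance is configured.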
Status
```javascript
console.log(sdk.status); // 0 = NotStarted, 1 = Degraded, 2 = Ready
```

Platform Support
| Platform | GPU Support |
|----------|------------|
| Windows x64 | NVIDIA (CUDA), AMD (HIP), Intel (SYCL), Vulkan, CPU |
| macOS ARM64 | Apple Metal |
| macOS x64 | CPU |
| Linux x64 | NVIDIA (CUDA), AMD (ROCm), Vulkan, CPU |
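The table above can be mirrored in code for a quick pre-install check. This is an illustrative sketch only: the package performs its own hardware detection, and `SUPPORTED_BACKENDS` and `supportedBackends` are hypothetical names, not part of its API. The keys use Node's own `process.platform` and `process.arch` values.

```javascript
// Backends per platform, copied from the Platform Support table.
// Keys follow Node's `${process.platform}-${process.arch}` naming.
const SUPPORTED_BACKENDS = {
  "win32-x64": ["NVIDIA (CUDA)", "AMD (HIP)", "Intel (SYCL)", "Vulkan", "CPU"],
  "darwin-arm64": ["Apple Metal"],
  "darwin-x64": ["CPU"],
  "linux-x64": ["NVIDIA (CUDA)", "AMD (ROCm)", "Vulkan", "CPU"],
};

// Hypothetical helper: returns the backends listed for a platform and
// architecture pair, or an empty array when the pair is not in the table.
function supportedBackends(platform = process.platform, arch = process.arch) {
  return SUPPORTED_BACKENDS[`${platform}-${arch}`] ?? [];
}
```

Calling `supportedBackends()` with no arguments checks the current machine; an empty result means the combination does not appear in the table.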
License
Apache 2.0
