foundry-local-sdk v1.1.0

Foundry Local JavaScript SDK

Downloads: 21,208

Foundry Local JS SDK

The Foundry Local JS SDK provides a JavaScript/TypeScript interface for running AI models locally on your machine. Discover, download, load, and run inference — all without cloud dependencies.

Features

  • Local-first AI — Run models entirely on your machine with no cloud calls
  • Model catalog — Browse and discover available models, check what's cached or loaded
  • Automatic model management — Download, load, unload, and remove models from cache
  • Chat completions — OpenAI-compatible chat API with both synchronous and streaming responses
  • Embeddings — Generate text embeddings via OpenAI-compatible API
  • Audio transcription — Transcribe audio files locally with streaming support
  • Multi-variant models — Models can have multiple variants (e.g., different quantizations) with automatic selection of the best cached variant
  • Embedded web service — Start a local HTTP service for OpenAI-compatible API access
  • WinML support — Automatic execution provider download on Windows for NPU/GPU acceleration
  • Configurable inference — Control temperature, max tokens, top-k, top-p, frequency penalty, and more

Installation

npm install foundry-local-sdk

TypeScript support

The package is authored in TypeScript and ships with bundled type declarations (.d.ts files) alongside the compiled JavaScript. No @types/foundry-local-sdk package or manual ambient declarations are needed.

Importing from foundry-local-sdk in a TypeScript project gives you full type information and IntelliSense for every public API, including FoundryLocalManager, Catalog, ChatClient, AudioClient, EmbeddingClient, ResponsesClient, LiveAudioTranscriptionSession, and all of their associated option and response types.

WinML: Automatic Hardware Acceleration (Windows)

On Windows, install the WinML package to enable automatic execution provider management. The SDK will automatically discover, download, and register hardware-specific execution providers (e.g., Qualcomm QNN for NPU acceleration) via the Windows App Runtime — no manual driver or EP setup required.

Note: foundry-local-sdk-winml is a Windows-only package. Its install script downloads WinML artifacts during installation and may fail on macOS or Linux.

npm install foundry-local-sdk-winml

When WinML is enabled:

  • Execution providers such as QNNExecutionProvider and OpenVINOExecutionProvider are downloaded and registered on the fly, enabling NPU/GPU acceleration without manual configuration
  • No code changes needed — your application code stays the same whether WinML is enabled or not

Explicit EP Management

You can explicitly discover and download execution providers using the discoverEps() and downloadAndRegisterEps() methods:

// Discover available EPs and their status
const eps = manager.discoverEps();
for (const ep of eps) {
    console.log(`${ep.name} — registered: ${ep.isRegistered}`);
}

// Download and register all available EPs
const result = await manager.downloadAndRegisterEps();
console.log(`Success: ${result.success}, Status: ${result.status}`);

// Download only specific EPs
const result2 = await manager.downloadAndRegisterEps([eps[0].name]);
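
To download only the EPs that are not yet registered, the name list can be derived from the discoverEps() result. The following is a minimal sketch that relies only on the name and isRegistered fields shown above; the EpInfoLike type and the unregisteredEpNames helper are hypothetical names used for illustration and are not part of the SDK:

```typescript
// Hypothetical structural type covering only the two fields used above.
interface EpInfoLike {
    name: string;
    isRegistered: boolean;
}

// Hypothetical helper: returns the names of EPs that still need registering.
function unregisteredEpNames(eps: readonly EpInfoLike[]): string[] {
    return eps
        .filter((ep) => !ep.isRegistered)
        .map((ep) => ep.name);
}
```

The returned array could then be passed to downloadAndRegisterEps() in the same way as [eps[0].name] above.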

Per-EP download progress

Pass an optional progressCallback to receive (epName, percent) updates as each EP downloads (percent is 0–100):

let currentEp = '';
await manager.downloadAndRegisterEps((epName, percent) => {
    if (epName !== currentEp) {
        if (currentEp !== '') {
            process.stdout.write('\n');
        }
        currentEp = epName;
    }
    process.stdout.write(`\r  ${epName}  ${percent.toFixed(1)}%`);
});
process.stdout.write('\n');

Catalog access does not block on EP downloads. Call downloadAndRegisterEps() when you need hardware-accelerated execution providers.

Quick Start

import { FoundryLocalManager } from 'foundry-local-sdk';

const manager = FoundryLocalManager.create({
    appName: 'foundry_local_samples',
    logLevel: 'info'
});

// Get the model object
const modelAlias = 'qwen2.5-0.5b';
const model = await manager.catalog.getModel(modelAlias);

// Download the model
console.log(`\nDownloading model ${modelAlias}...`);
await model.download((progress) => {
    process.stdout.write(`\rDownloading... ${progress.toFixed(2)}%`);
});

// Load the model
await model.load();

// Create chat client
const chatClient = model.createChatClient();

// Example chat completion
console.log('\nTesting chat completion...');
const completion = await chatClient.completeChat([
    { role: 'user', content: 'Why is the sky blue?' }
]);
console.log(completion.choices[0]?.message?.content);

// Example streaming completion
console.log('\nTesting streaming completion...');
for await (const chunk of chatClient.completeStreamingChat(
    [{ role: 'user', content: 'Write a short poem about programming.' }]
)) {
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
        process.stdout.write(content);
    }
}
console.log('\n');

// Unload the model
await model.unload();
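
In the Quick Start, model.unload() runs only if every earlier step succeeds. One way to make sure the model is always unloaded is a try/finally wrapper. This is a minimal sketch that assumes only the load() and unload() methods used above; the Loadable type and the withLoadedModel helper are hypothetical and not part of the SDK:

```typescript
// Hypothetical structural type: anything exposing the load()/unload() pair
// used in the Quick Start. This is not an SDK type.
interface Loadable {
    load(): Promise<unknown>;
    unload(): Promise<unknown>;
}

// Hypothetical helper: loads the model, runs `work`, and always unloads,
// even when `work` throws.
async function withLoadedModel<M extends Loadable, T>(
    model: M,
    work: (model: M) => Promise<T>
): Promise<T> {
    await model.load();
    try {
        return await work(model);
    } finally {
        await model.unload();
    }
}
```

With this in place, the chat calls from the Quick Start would go inside the work callback, and the explicit unload() at the end would no longer be needed.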

Usage

Browsing the Model Catalog

The Catalog lets you discover what models are available, which are already cached locally, and which are currently loaded in memory.

const catalog = manager.catalog;

// List all available models
const models = await catalog.getModels();
models.forEach(model => {
    console.log(`${model.alias} — cached: ${model.isCached}`);
});

// See what's already downloaded
const cached = await catalog.getCachedModels();

// See what's currently loaded in memory
const loaded = await catalog.getLoadedModels();

Loading and Running Models

Each model can have multiple variants (different quantizations or formats). The SDK automatically selects the best available variant, preferring cached versions. All models implement the IModel interface.

const model = await catalog.getModel('qwen2.5-0.5b');

// Download if not cached (with optional progress tracking)
if (!model.isCached) {
    await model.download((progress) => {
        console.log(`Download: ${progress}%`);
    });
}

// Load into memory and run inference
await model.load();
const chatClient = model.createChatClient();

You can also select a specific variant manually:

const variants = model.variants;
model.selectVariant(variants[0]);
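
The "preferring cached versions" rule described in this section can be pictured roughly as follows. This is only an illustration of the idea, not the SDK's actual selection algorithm, which may weigh other factors such as hardware or quantization; the VariantInfo type and the pickPreferredVariant helper are hypothetical:

```typescript
// Hypothetical variant shape used only for this illustration.
interface VariantInfo {
    id: string;
    isCached: boolean;
}

// Returns the first cached variant if there is one, otherwise the first
// variant in the list, or undefined when the list is empty.
function pickPreferredVariant<V extends VariantInfo>(
    variants: readonly V[]
): V | undefined {
    for (const variant of variants) {
        if (variant.isCached) {
            return variant;
        }
    }
    return variants[0];
}
```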

Chat Completions

The ChatClient follows the OpenAI Chat Completion API structure.

const chatClient = model.createChatClient();

// Configure settings
chatClient.settings.temperature = 0.7;
chatClient.settings.maxTokens = 800;
chatClient.settings.topP = 0.9;

// Non-streaming completion (resolves with the full response)
const response = await chatClient.completeChat([
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing in simple terms.' }
]);
console.log(response.choices[0].message.content);

Streaming Responses

For real-time output, use streaming:

for await (const chunk of chatClient.completeStreamingChat(
    [{ role: 'user', content: 'Write a short poem about programming.' }]
)) {
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
        process.stdout.write(content);
    }
}
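
If the full text is needed after streaming finishes, the chunks can be accumulated while they arrive. This is a minimal sketch that reads only the choices[0].delta.content field used in the loop above; the ChatChunkLike type and the collectStreamedText helper are hypothetical and not part of the SDK:

```typescript
// Minimal chunk shape, limited to the fields read in the streaming loop.
// Hypothetical type, not the SDK's own chunk type.
interface ChatChunkLike {
    choices?: Array<{ delta?: { content?: string | null } }>;
}

// Hypothetical helper: consumes an async stream of chunks and returns the
// concatenated text, skipping chunks that carry no content.
async function collectStreamedText(
    chunks: AsyncIterable<ChatChunkLike>
): Promise<string> {
    let text = '';
    for await (const chunk of chunks) {
        const content = chunk.choices?.[0]?.delta?.content;
        if (content) {
            text += content;
        }
    }
    return text;
}
```

Since completeStreamingChat() is consumed with for await above, its return value should be usable as the chunks argument, though the exact chunk type is an assumption here.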

Embeddings

Generate text embeddings using the EmbeddingClient:

const embeddingClient = model.createEmbeddingClient();

// Single input
const response = await embeddingClient.generateEmbedding(
    'The quick brown fox jumps over the lazy dog'
);
const embedding = response.data[0].embedding; // number[]
console.log(`Dimensions: ${embedding.length}`);

// Batch input
const batchResponse = await embeddingClient.generateEmbeddings([
    'The quick brown fox',
    'The capital of France is Paris'
]);
// batchResponse.data[0].embedding, batchResponse.data[1].embedding
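
A common use of embeddings is comparing two texts by cosine similarity. The sketch below works on plain number[] vectors like the ones returned above; cosineSimilarity is an illustrative helper written for this example and not an SDK function:

```typescript
// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
    if (a.length !== b.length) {
        throw new Error('Embedding dimensions must match');
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    const denominator = Math.sqrt(normA) * Math.sqrt(normB);
    return denominator === 0 ? 0 : dot / denominator;
}
```

For the batch example, the two vectors would be batchResponse.data[0].embedding and batchResponse.data[1].embedding.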

Audio Transcription

Transcribe audio files locally using the AudioClient:

const audioClient = model.createAudioClient();
audioClient.settings.language = 'en';

// Non-streaming transcription (resolves with the full result)
const result = await audioClient.transcribe('/path/to/audio.wav');

// Streaming transcription
for await (const chunk of audioClient.transcribeStreaming('/path/to/audio.wav')) {
    console.log(chunk);
}

Embedded Web Service

Start a local HTTP server that exposes an OpenAI-compatible API:

manager.startWebService();
console.log('Service running at:', manager.urls);

// Use with any OpenAI-compatible client library
// ...

manager.stopWebService();
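
While the service is running, a plain HTTP client can talk to it. The sketch below only builds the request. It assumes the conventional OpenAI route /v1/chat/completions, which this README does not confirm for the embedded service, and it takes the base URL as a plain string because the shape of manager.urls is not described here. The buildChatRequest helper and the ChatMessage type are hypothetical:

```typescript
// Hypothetical message shape following the OpenAI chat format.
interface ChatMessage {
    role: 'system' | 'user' | 'assistant';
    content: string;
}

// Hypothetical helper: builds the URL and fetch options for a chat request.
// The route is an assumption based on the OpenAI API convention.
function buildChatRequest(baseUrl: string, model: string, messages: ChatMessage[]) {
    const trimmedBase = baseUrl.replace(/\/+$/, '');
    return {
        url: `${trimmedBase}/v1/chat/completions`,
        init: {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ model, messages }),
        },
    };
}
```

The result can be passed to the global fetch available in Node.js 18 and later, as fetch(request.url, request.init), before stopWebService() is called.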

Configuration

The SDK is configured via FoundryLocalConfig when creating the manager:

| Option | Description | Default |
|--------|-------------|---------|
| appName | Required. Application name for logs and telemetry. | (none) |
| appDataDir | Directory where application data should be stored | ~/.{appName} |
| logLevel | Logging level: trace, debug, info, warn, error, fatal | warn |
| modelCacheDir | Directory for downloaded models | ~/.{appName}/cache/models |
| logsDir | Directory for log files | ~/.{appName}/logs |
| libraryPath | Path to native Foundry Local Core libraries | Auto-discovered |
| serviceEndpoint | URL of an existing external service to connect to | (none) |
| webServiceUrls | URL(s) for the embedded web service to bind to | (none) |
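
To make the directory defaults concrete, the sketch below spells out the three paths from the Default column for a given app name. It is an illustration of the documented defaults, not the SDK's own resolution code. The resolveDefaultDirs helper is hypothetical, the home directory is passed in explicitly, and paths are joined with a forward slash for simplicity. Whether the SDK derives the cache and log paths from a custom appDataDir is not stated here, so the sketch always uses ~/.{appName} as their base:

```typescript
// Hypothetical result type for the three directory options in the table.
interface ResolvedDirs {
    appDataDir: string;
    modelCacheDir: string;
    logsDir: string;
}

// Hypothetical helper: applies the documented defaults unless an explicit
// value is supplied for a given directory.
function resolveDefaultDirs(
    appName: string,
    homeDir: string,
    overrides: Partial<ResolvedDirs> = {}
): ResolvedDirs {
    const base = `${homeDir}/.${appName}`; // "~/.{appName}" from the table
    return {
        appDataDir: overrides.appDataDir ?? base,
        modelCacheDir: overrides.modelCacheDir ?? `${base}/cache/models`,
        logsDir: overrides.logsDir ?? `${base}/logs`,
    };
}
```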

API Reference

Auto-generated class documentation lives in docs/classes/.

Contributing: Building from Source

Prerequisites

  • Node.js 20+
  • Python 3.x — required by node-gyp for compiling the native addon
  • C/C++ toolchain:
    • Windows: Visual Studio Build Tools (the "Desktop development with C++" workload)
    • Linux: build-essential (apt install build-essential)
    • macOS: Xcode Command Line Tools (xcode-select --install)

Build Steps

# 1. Install JS dependencies (also downloads native core binaries)
npm install

# 2. Build the Node-API native addon (compiles C code and copies to prebuilds/)
npm run build:native

# 3. Build the TypeScript source
npm run build

# 4. Run tests
npm test

# 5. Pack the SDK into a .tgz (includes prebuilt addon for your platform)
npm run pack

Note: npm run build:native compiles the addon only for your current platform. The published npm package includes prebuilt addons for all supported platforms (win32-x64, win32-arm64, linux-x64, darwin-arm64), which are compiled in CI.
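
A quick way to tell whether a machine is covered by the published prebuilt addons is to compare its platform and architecture against the list in the note above. This is a minimal sketch; the PREBUILT_TARGETS constant and the hasPrebuiltAddon helper are hypothetical, and it assumes the target names combine Node's process.platform and process.arch values:

```typescript
// Targets copied from the note above.
const PREBUILT_TARGETS: readonly string[] = [
    'win32-x64',
    'win32-arm64',
    'linux-x64',
    'darwin-arm64',
];

// Hypothetical helper: true when a prebuilt addon is listed for the
// given platform/architecture pair.
function hasPrebuiltAddon(platform: string, arch: string): boolean {
    return PREBUILT_TARGETS.indexOf(`${platform}-${arch}`) !== -1;
}
```

In a Node.js script this would be called as hasPrebuiltAddon(process.platform, process.arch). A false result suggests the addon has to be built locally with npm run build:native.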

Running Tests

npm test

See test/README.md for details on prerequisites and setup.

Running Examples

npm run example

This runs the chat completion example in examples/chat-completion.ts.