web-llm-middleware

v1.4.0

OpenAI-compatible middleware for running WebLLM models locally with offline support

https://github.com/user-attachments/assets/4d5a6160-9985-4e63-b812-fe595e84c0af

🚀 Usage

Basic Middleware Integration

The WebLLM middleware provides an OpenAI-compatible API for running large language models locally in the browser.

Node.js HTTP Server

import { createServer } from 'node:http';
import { parse } from 'node:url';
import { WebLLMMiddleware } from 'web-llm-middleware';

const webllm = new WebLLMMiddleware({
  dev: true, // Enable development logging
  model: 'Llama-3.2-1B-Instruct-q4f32_1-MLC',
});

const handler = webllm.getRequestHandler();

const server = createServer((req, res) => {
  const parsedUrl = parse(req.url ?? '/', true);
  handler(req, res, parsedUrl);
});

server.listen(15408, () => {
  console.log('WebLLM server running on http://localhost:15408');
});

Express.js Integration

import express from 'express';
import { WebLLMMiddleware } from 'web-llm-middleware';

const app = express();
const webllm = new WebLLMMiddleware({
  dev: process.env.NODE_ENV === 'development',
  dir: './public',
  model: 'Llama-3.2-1B-Instruct-q4f32_1-MLC',
});

const handler = webllm.getRequestHandler();

// Use WebLLM middleware for all requests
app.use((req, res, next) => {
  handler(req, res);
});

app.listen(15408, () => {
  console.log('Express + WebLLM server running on http://localhost:15408');
});

Next.js API Route

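// pages/api/chat.ts (Pages Router). The default-export (req, res) handler below
// does not apply to App Router route handlers such as app/api/chat/route.ts.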
import { WebLLMMiddleware } from 'web-llm-middleware';

const webllm = new WebLLMMiddleware({
  dev: process.env.NODE_ENV === 'development',
  dir: './public',
  model: 'Llama-3.2-1B-Instruct-q4f32_1-MLC',
});

const handler = webllm.getRequestHandler();

export default function chatHandler(req: any, res: any) {
  return handler(req, res);
}

Configuration Options

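interface WebLLMMiddlewareOptions {
  model: string; // Model ID to initialize
  dev?: boolean; // Enable development logging (default: false)
  dir?: string; // Directory passed in the Express and Next.js examples above (not described further here)
}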
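
As a rough sketch of how these options might be assembled in practice, the helper below builds the options object from environment variables. The `optionsFromEnv` function and the `WEBLLM_MODEL` variable name are hypothetical and not part of the package; the interface restates only the `model` and `dev` fields so the snippet stands alone.

```typescript
// Local restatement of the `model` and `dev` options so this sketch compiles alone.
interface WebLLMMiddlewareOptions {
  model: string;
  dev?: boolean;
}

// Model ID used throughout this README.
const DEFAULT_MODEL = 'Llama-3.2-1B-Instruct-q4f32_1-MLC';

// Hypothetical helper: WEBLLM_MODEL is an assumed variable name, not one the
// package is known to read. Falls back to DEFAULT_MODEL when unset or blank.
function optionsFromEnv(
  env: Record<string, string | undefined>,
): WebLLMMiddlewareOptions {
  const model = env.WEBLLM_MODEL?.trim() || DEFAULT_MODEL;
  return {
    model,
    dev: env.NODE_ENV === 'development',
  };
}
```

The result could then be passed to the constructor, for example `new WebLLMMiddleware(optionsFromEnv(process.env))`.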

Available Models

The middleware supports 36+ models including:

  • Llama Series: 3, 3.1, 3.2 (1B, 3B, 8B, 70B)
  • Qwen Series: 1.5, 2, 2.5, 3 with Math/Coder variants
  • Phi Series: 3, 3.5 mini and vision models
  • SmolLM: Lightweight 135M, 360M, 1.7B models
  • Gemma, Hermes, Mistral: Various sizes and specializations

See the /v1/models endpoint for the complete list.
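
To browse that list programmatically, something like the following sketch could fetch it and group the IDs by family. It assumes the response follows the OpenAI convention of a `data` array whose entries carry an `id` string; `groupByFamily` and `listModels` are illustrative names, and grouping by the text before the first hyphen is only a heuristic.

```typescript
// Assumed response shape: OpenAI-style list with an `id` per entry.
interface ModelList {
  data: Array<{ id: string }>;
}

// Groups model IDs by the text before the first '-' (e.g. "Llama", "Qwen2.5").
function groupByFamily(ids: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const id of ids) {
    const family = id.split('-')[0] ?? id;
    const bucket = groups.get(family) ?? [];
    bucket.push(id);
    groups.set(family, bucket);
  }
  return groups;
}

// Fetches the model IDs from a locally running server (Node 18+ global fetch).
async function listModels(baseUrl = 'http://localhost:15408'): Promise<string[]> {
  const res = await fetch(`${baseUrl}/v1/models`);
  if (!res.ok) throw new Error(`GET /v1/models failed: ${res.status}`);
  const body = (await res.json()) as ModelList;
  return body.data.map((m) => m.id);
}
```

With the server running, `groupByFamily(await listModels())` would give one entry per family prefix.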

🤖 Vercel AI SDK Integration

The middleware is fully compatible with Vercel AI SDK's generateText and streamText functions:

import { generateText, streamText } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';

const openai = createOpenAI({
  baseURL: 'http://localhost:15408/v1',
  apiKey: 'not-needed',
});

// Non-streaming text generation
const { text } = await generateText({
  model: openai('Llama-3.2-1B-Instruct-q4f32_1-MLC'),
  prompt: 'Write a short story about a robot.',
});

// Streaming text generation
const { textStream } = await streamText({
  model: openai('Llama-3.2-1B-Instruct-q4f32_1-MLC'),
  prompt: 'Write a creative story...',
});

for await (const textPart of textStream) {
  process.stdout.write(textPart);
}

Both functions use the standard OpenAI /v1/chat/completions endpoint with automatic streaming detection.
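
For readers calling the endpoint without the SDK, here is a minimal sketch of a streaming request. It assumes the middleware follows the OpenAI streaming convention: `stream: true` in the request body, server-sent `data: {...}` lines carrying `choices[0].delta.content`, and a final `data: [DONE]` line. `extractDeltas` and `streamChat` are illustrative names, not part of the package.

```typescript
// Pulls text deltas out of complete lines of an OpenAI-style SSE stream.
// Assumption: each event is a single `data: <json>` line.
function extractDeltas(lines: string): string[] {
  const deltas: string[] = [];
  for (const line of lines.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue; // skip blanks and SSE comments
    const payload = trimmed.slice('data:'.length).trim();
    if (payload === '' || payload === '[DONE]') continue;
    const event = JSON.parse(payload) as {
      choices?: Array<{ delta?: { content?: string } }>;
    };
    const content = event.choices?.[0]?.delta?.content;
    if (content) deltas.push(content);
  }
  return deltas;
}

// Sends a streaming request and forwards each text delta to `onText`.
async function streamChat(
  prompt: string,
  onText: (text: string) => void,
): Promise<void> {
  const res = await fetch('http://localhost:15408/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'Llama-3.2-1B-Instruct-q4f32_1-MLC',
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  });
  if (!res.ok || !res.body) throw new Error(`request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Only parse up to the last newline; keep any partial line for the next read.
    const lastNewline = buffer.lastIndexOf('\n');
    if (lastNewline === -1) continue;
    for (const text of extractDeltas(buffer.slice(0, lastNewline))) onText(text);
    buffer = buffer.slice(lastNewline + 1);
  }
  // Flush a final line that arrived without a trailing newline.
  for (const text of extractDeltas(buffer)) onText(text);
}
```

A call such as `await streamChat('Write a haiku', (t) => process.stdout.write(t))` would print tokens as they arrive, provided the stream format matches the assumption above.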

🛠️ Development

This project uses:

  • TypeScript for type safety
  • ES Modules for modern JavaScript
  • tsx for running TypeScript files directly
  • Strict mode enabled in TypeScript for better type checking

Building

To build the project:

pnpm run build

This will compile TypeScript files from src/ to JavaScript in dist/.

Development Mode

For development with automatic reloading:

pnpm run dev

🧪 Testing

Quick Start Testing

  1. Start the test server:

    pnpm test:server

  2. Test the Vercel AI SDK integration:

    pnpm test:ai-sdk

  3. Test the chat completions endpoint with curl:

    curl -X POST http://localhost:15408/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d @./example/hello.json | jq .choices

API Endpoints Testing

1. Health Check

curl -X GET http://localhost:15408/health | jq

Expected response:

{
  "status": "healthy",
  "webllm_initialized": true,
  "timestamp": "2024-06-19T..."
}
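
A script that depends on the server may want to wait for this response before sending requests. The sketch below polls /health until both fields look ready. `isReady` and `waitUntilReady` are illustrative names, and the idea that `webllm_initialized` can be false while the model is still loading is an assumption based on the field name.

```typescript
// Shape of the /health response shown above.
interface HealthResponse {
  status: string;
  webllm_initialized: boolean;
  timestamp: string;
}

// True only when the body reports "healthy" and WebLLM is initialized.
function isReady(body: unknown): boolean {
  if (typeof body !== 'object' || body === null) return false;
  const health = body as Partial<HealthResponse>;
  return health.status === 'healthy' && health.webllm_initialized === true;
}

// Polls /health until ready or until `attempts` is exhausted.
async function waitUntilReady(
  baseUrl = 'http://localhost:15408',
  attempts = 30,
  delayMs = 1000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(`${baseUrl}/health`);
      if (res.ok && isReady(await res.json())) return true;
    } catch {
      // Server is not accepting connections yet; retry after the delay.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```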

2. List Available Models

curl -X GET http://localhost:15408/v1/models | jq .data

Returns an array of 36 supported models, including the Llama, Phi, Qwen, and other series.

3. Chat Completions

Using example file:

curl -X POST http://localhost:15408/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @./example/hello.json

Custom request:

curl -X POST http://localhost:15408/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "Llama-3.2-1B-Instruct-q4f32_1-MLC",
    "max_tokens": 50,
    "temperature": 0.7
  }'
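
The same custom request can be sent from TypeScript with the global `fetch` available in Node 18+. This is a sketch rather than an official client: `buildChatRequest`, `firstReply`, and `ask` are illustrative names, and reading the answer from `choices[0].message.content` assumes the response follows the OpenAI chat completion shape.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  max_tokens?: number;
  temperature?: number;
}

// Builds the same body as the curl example above.
function buildChatRequest(
  question: string,
  model = 'Llama-3.2-1B-Instruct-q4f32_1-MLC',
): ChatRequest {
  return {
    model,
    messages: [{ role: 'user', content: question }],
    max_tokens: 50,
    temperature: 0.7,
  };
}

// Assumed OpenAI-style response: the reply text lives in choices[0].message.content.
function firstReply(response: {
  choices?: Array<{ message?: { content?: string } }>;
}): string | undefined {
  return response.choices?.[0]?.message?.content;
}

// Sends the request to a locally running server and returns the reply text.
async function ask(question: string): Promise<string | undefined> {
  const res = await fetch('http://localhost:15408/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest(question)),
  });
  if (!res.ok) throw new Error(`chat completion failed: ${res.status}`);
  return firstReply(await res.json());
}
```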

Offline Functionality Verification

  1. Disconnect from the internet or block external requests
  2. Start the server: pnpm test:server
  3. Verify that WebLLM loads: check that lib/web-llm.js (5.6MB) is served locally
  4. Test a completion: use any of the curl commands above
  5. Check the logs: the server should show WebLLM initialization without external requests

Testing Different Models

Test various model families:

# Small model (fast)
curl -X POST http://localhost:15408/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hi"}], "model": "SmolLM-135M-Instruct-q4f16_1-MLC"}'

# Math specialist
curl -X POST http://localhost:15408/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is 15 * 23?"}], "model": "Qwen2-Math-7B-Instruct-q4f16_1-MLC"}'

# Code specialist
curl -X POST http://localhost:15408/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a hello world in Python"}], "model": "Qwen2.5-Coder-7B-Instruct-q4f16_1-MLC"}'

Performance Testing

Monitor initialization and response times:

time curl -X POST http://localhost:15408/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @./example/hello.json
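
A single `time` run mixes one-off initialization cost with steady-state latency, so it can help to take several samples. The sketch below times repeated calls and reports the minimum, median, and maximum. `timeRequests` and `summarize` are illustrative helpers; the `send` callback stands in for whatever request is being measured.

```typescript
// Runs `send` sequentially `runs` times and records each duration in milliseconds.
async function timeRequests(
  runs: number,
  send: () => Promise<unknown>,
): Promise<number[]> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await send();
    samples.push(performance.now() - start);
  }
  return samples;
}

// Summarizes samples as min / median / max.
function summarize(samplesMs: number[]): { min: number; median: number; max: number } {
  if (samplesMs.length === 0) throw new Error('no samples');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return { min: sorted[0], median, max: sorted[sorted.length - 1] };
}
```

If the first sample is much slower than the rest, that gap is likely the model initialization the section title refers to.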

Terminate Server Process

lsof -ti:15408 | xargs kill -9

📝 License

MIT