litellm-pyodide
Run LiteLLM inside Pyodide workers from one JavaScript client for Node and the browser.
This package keeps the Python runtime, wheel install, and worker boot path inside the package. You call LiteLLM-style endpoints from JavaScript, and the package handles Pyodide, the local wheelhouse, and callback forwarding behind the scenes.
Audience
This library is intended for harness developers and advanced users who want to run LiteLLM-compatible providers in JS environments without reinventing the wheel.
Features
- Single public ESM entrypoint.
- Browser and Node runtime detection.
- Local Pyodide assets. No CDN fallback.
- LiteLLM-compatible endpoint surface for chat completions, messages, responses, and embeddings.
- ReadableStream support for streamed chat, messages, and responses calls.
- EventEmitter-style callback forwarding on the main thread.
- Worker-backed execution so Python startup and request work stay off the main thread.
- Optional browser-side corsBusterUrl rewriting for cross-origin requests that need a CORS proxy.
- A React + WebLLM GitHub Pages demo with one shared service-worker engine and IndexedDB-backed model reuse.
Installation
npm install litellm-pyodide
Node 18+ is required.
Browser usage assumes a normal npm build pipeline that resolves package dependencies and serves the built package assets over HTTP.
Quick Start
import { createClient } from "litellm-pyodide";
const client = createClient({
maxWorkers: 1,
warmup: true,
corsBusterUrl: "https://cors-anywhere.example/",
});
const response = await client.chatCompletions.create({
model: "openai/gpt-4o-mini",
api_key: process.env.OPENAI_API_KEY,
messages: [{ role: "user", content: "Hello from Pyodide" }],
});
console.log(response);
await client.close();
Supported APIs
client.chatCompletions.create(...)
client.messages.create(...)
client.responses.create(...)
client.embeddings.create(...)
Non-stream requests resolve to JSON-safe LiteLLM response payloads.
Stream requests resolve to ReadableStream values whose chunks preserve endpoint-native event shapes.
Chat Completions
const response = await client.chatCompletions.create({
model: "openai/gpt-4o-mini",
api_key: process.env.OPENAI_API_KEY,
messages: [
{ role: "system", content: "Be concise" },
{ role: "user", content: "Explain Pyodide in one sentence" },
],
});
Messages
const response = await client.messages.create({
model: "anthropic/claude-3-5-sonnet-20241022",
api_key: process.env.ANTHROPIC_API_KEY,
messages: [{ role: "user", content: "Hello" }],
max_tokens: 256,
});
Responses
const response = await client.responses.create({
model: "openai/gpt-4.1-mini",
api_key: process.env.OPENAI_API_KEY,
input: [{ role: "user", content: "Summarize this repository" }],
});
Embeddings
const response = await client.embeddings.create({
model: "openai/text-embedding-3-small",
api_key: process.env.OPENAI_API_KEY,
input: "embed this string",
});
Streaming
Streaming requests (stream: true) return a ReadableStream.
const stream = await client.responses.create({
model: "openai/gpt-4.1-mini",
api_key: process.env.OPENAI_API_KEY,
input: [{ role: "user", content: "Stream a short answer" }],
stream: true,
});
const reader = stream.getReader();
for (;;) {
const next = await reader.read();
if (next.done) break;
console.log(next.value);
}
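If you stop consuming a stream early, the standard Web Streams API lets you release it. The snippet below is a minimal sketch using only ReadableStream primitives and the documented request parameters; how the upstream provider request is aborted is handled inside the package (its tests cover abort behavior).
const stream = await client.chatCompletions.create({
  model: "openai/gpt-4o-mini",
  api_key: process.env.OPENAI_API_KEY,
  messages: [{ role: "user", content: "Stream, but stop early" }],
  stream: true,
});
const reader = stream.getReader();
const first = await reader.read();
if (!first.done) console.log(first.value);
// Stop consuming early; cancel() signals that no further chunks are needed.
await reader.cancel();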
Events
The client exposes one shared EventEmitter-compatible event surface:
client.events.on("callback:pre_api_call", (payload) => {
console.log(payload);
});
client.events.on("callback:success", (payload) => {
console.log(payload);
});
client.events.on("request:stream_chunk", (payload) => {
console.log(payload);
});
client.events.on("worker:ready", (payload) => {
console.log(payload);
});
Current worker and request events:
- callback:pre_api_call
- callback:post_api_call
- callback:stream_event
- callback:success
- callback:failure
- request:stream_chunk
- request:completed
- worker:boot
- worker:ready
- worker:error
- worker:shutdown
Event payloads are serialized JavaScript objects, not raw Python objects.
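As a minimal sketch using only the documented event names and the on method shown above (it assumes nothing about payload shapes and does not scope counts per request), you can track streamed chunks like this:
let chunkCount = 0;
client.events.on("request:stream_chunk", () => {
  chunkCount += 1;
});
client.events.on("request:completed", () => {
  console.log(`request finished after ${chunkCount} stream chunks`);
  chunkCount = 0;
});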
Browser Proxy Support
Browser requests can optionally rewrite their final outbound URL through a CORS proxy.
const client = createClient({
maxWorkers: 1,
corsBusterUrl: "https://cors-anywhere.example/",
});
await client.chatCompletions.create({
model: "openai/gpt-4o-mini",
api_base: "https://api.openai.com",
api_key: "browser-key",
messages: [{ role: "user", content: "hello" }],
});
The runtime applies the proxy only after the final provider URL is resolved. It deliberately bypasses proxying when:
- no proxy base is configured
- the target is same-origin with the current browser page
- the target has already been prefixed with the proxy base
When the proxy is used, the runtime also adds X-Requested-With: litellm-pyodide.
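The rewrite logic itself is internal to the package. The sketch below only illustrates the documented bypass rules; the helper name is hypothetical, and simple prefix concatenation is an assumption modeled on cors-anywhere-style proxies.
// Hypothetical illustration of the documented bypass rules, not the package's internal code.
function applyCorsBuster(targetUrl, corsBusterUrl, pageOrigin) {
  if (!corsBusterUrl) return targetUrl; // no proxy base configured
  if (new URL(targetUrl).origin === pageOrigin) return targetUrl; // same-origin with the page
  if (targetUrl.startsWith(corsBusterUrl)) return targetUrl; // already prefixed with the proxy base
  return corsBusterUrl + targetUrl; // assumed cors-anywhere-style prefixing
}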
GitHub Pages Demo
This repository now includes a React demo under demo/ that exercises the built package artifact against same-origin service-worker routes backed by WebLLM.
Fixed demo models:
- chat: SmolLM2-360M-Instruct-q4f16_1-MLC
- embeddings: snowflake-arctic-embed-m-q0f32-MLC-b4
The demo uses one shared WebLLM service-worker engine with cacheBackend: "indexeddb". The UI explicitly shows whether the browser is still downloading model assets or reusing cached artifacts from previous visits.
Local demo routes:
- POST /demo-openai/v1/chat/completions
- POST /demo-openai/v1/responses
- POST /demo-anthropic/v1/messages
- POST /demo-openai/v1/embeddings
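For illustration only: the api_base value below and the way the demo maps its fixed WebLLM models onto these routes are assumptions, not documented behavior. A call against the demo chat route from the demo page could look roughly like this:
const response = await client.chatCompletions.create({
  model: "openai/SmolLM2-360M-Instruct-q4f16_1-MLC", // assumed model id for the demo backend
  api_base: `${location.origin}/demo-openai/v1`, // assumed mapping of the route prefix to api_base
  api_key: "demo", // placeholder; the local demo backend does not use a real provider key
  messages: [{ role: "user", content: "hello" }],
});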
Demo Commands
npm run demo:install
npm run demo:build
npm run demo:preview
npm run demo:test
demo:build rebuilds the package dist output, copies it into demo/public/litellm-pyodide, and then builds the Vite app.
demo:test runs the demo unit tests and a browser smoke path against the built demo in mock mode so the Pages workflow can validate the transport path without requiring WebGPU in CI.
Warmup And Health
Pyodide startup is the expensive path. If you want to pay that cost before the first request, call warmup explicitly.
await client.warmup();
const health = await client.health();
console.log(health);
How It Works
- build.ts bundles the public client and worker runtime with webpack.
- The build copies Pyodide assets into dist/internal/pyodide.
- The build stages the Python bridge into dist/internal/python.
- The build produces a local wheelhouse and runtime manifest under dist/internal.
- The worker boots Pyodide from package-local assets, installs the local wheels, imports the bridge, and then serves requests.
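An illustrative view of the resulting dist layout, based only on the steps above; the documented directories are shown, while individual file names inside them are not documented here.
dist/
  internal/
    pyodide/   # package-local Pyodide runtime assets
    python/    # staged Python bridge
    ...        # local wheelhouse and runtime manifest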
Debugging
This package uses the debug package for runtime logging.
DEBUG=litellmPyodide:* npm test
Useful namespaces:
- litellmPyodide:build
- litellmPyodide:client
- litellmPyodide:pool
- litellmPyodide:worker
- litellmPyodide:bridge
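In the browser, the debug package reads localStorage rather than an environment variable, so the same namespaces can be enabled from the devtools console and take effect after a reload (worker-side namespaces may surface differently depending on how worker logs are forwarded):
localStorage.debug = "litellmPyodide:*";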
Limitations
- This is an SDK-side runtime. Browser usage does not hide provider credentials from end users.
- Cold start is slower than a JavaScript-only SDK because Pyodide and Python wheels must boot first.
- The package ships a Pyodide-specific LiteLLM overlay for the endpoint families exposed here, not full upstream LiteLLM parity.
- Browser usage requires correct static asset serving for the internal Pyodide, wheel, and worker assets.
- Native CPython extension wheels are out of scope unless they are Pyodide-compatible.
Build And Test
npm run build
npm run typecheck
npm test
npm run demo:test
The test suite covers:
- runtime detection
- metadata injection
- Node integration for all four endpoint families
- stream handling and abort behavior
- callback forwarding and request scoping
- browser worker smoke coverage against the built dist output
- demo startup-state helpers and route adapters
- demo browser smoke coverage against the built Pages app in mock mode
Repository Layout
- build.ts: build orchestration
- src/: TypeScript client, pool adapter, worker runtime, and helpers
- python/bridge.py: package-owned Python bridge
- python/litellm_overlay/: Pyodide-compatible LiteLLM overlay
- python/wheels-source.json: pinned wheel source manifest
- tests/: unit, integration, and browser smoke tests
