phantomllm

v1.0.3

Mock server for OpenAI-compatible APIs. Test your LLM integrations with a real HTTP server.

import { MockLLM } from 'phantomllm';

const mock = new MockLLM();
await mock.start();
mock.given.chatCompletion.willReturn('Hello from the mock!');

// mock.apiBaseUrl => "http://127.0.0.1:55123/v1" (ready to plug into any client)
await mock.stop();

Why phantomllm?

  • Real HTTP server — no monkey-patching fetch or http. Your SDK makes actual network calls.
  • Zero config: npm install phantomllm and go. No Docker, no external services, no setup steps.
  • Works with any client — OpenAI SDK, Vercel AI SDK, LangChain, opencode, Python, curl.
  • Responses API — supports both /v1/chat/completions and /v1/responses (used by Vercel AI SDK, opencode, and newer OpenAI clients).
  • Streaming support — SSE chunked responses work exactly like the real OpenAI API.
  • Fast — in-process Fastify server, sub-millisecond response latency.
  • Simple API — fluent given/expect pattern: mock.given.chatCompletion.forModel('gpt-4').willReturn('Hello').

Installation

npm install phantomllm --save-dev
pnpm add -D phantomllm
yarn add -D phantomllm

That's it. No Docker, no image builds, no extra setup.

Getting the Server URL

MockLLM provides two URL getters:

const mock = new MockLLM();
await mock.start();

mock.baseUrl      // "http://127.0.0.1:55123"     — raw host:port
mock.apiBaseUrl   // "http://127.0.0.1:55123/v1"   — includes /v1 prefix

Which one to use:

| Client / Tool | Property | Example |
|---------------|----------|---------|
| OpenAI SDK (baseURL) | mock.apiBaseUrl | new OpenAI({ baseURL: mock.apiBaseUrl }) |
| Vercel AI SDK (baseURL) | mock.apiBaseUrl | createOpenAI({ baseURL: mock.apiBaseUrl }) |
| LangChain (configuration.baseURL) | mock.apiBaseUrl | new ChatOpenAI({ configuration: { baseURL: mock.apiBaseUrl } }) |
| Plain fetch | mock.baseUrl | fetch(`${mock.baseUrl}/v1/chat/completions`) |

Most SDK clients expect the URL to end with /v1. Use mock.apiBaseUrl and you won't need to think about it.
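
When the URL is passed around by hand (for example, copied into a Python script or a config file), it is easy to drop or double the /v1 suffix. The helper below is a small sketch of normalizing a URL so it ends with exactly one /v1. The name withV1 is hypothetical and not part of phantomllm.

```typescript
// Hypothetical helper (not part of phantomllm): make sure a server URL
// ends with exactly one "/v1" segment.
function withV1(url: string): string {
  const trimmed = url.replace(/\/+$/, ''); // drop trailing slashes
  return trimmed.endsWith('/v1') ? trimmed : `${trimmed}/v1`;
}

const fromRawUrl = withV1('http://127.0.0.1:55123');     // "http://127.0.0.1:55123/v1"
const fromApiUrl = withV1('http://127.0.0.1:55123/v1/'); // "http://127.0.0.1:55123/v1"
```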

API Reference

MockLLM

The main class. Starts an in-process HTTP server that implements the OpenAI API.

import { MockLLM } from 'phantomllm';

const mock = new MockLLM();

| Method / Property | Returns | Description |
|---|---|---|
| await mock.start() | void | Start the mock server. Idempotent; safe to call twice. |
| await mock.stop() | void | Stop the server. Idempotent. |
| mock.baseUrl | string | Server URL without /v1, e.g. http://127.0.0.1:55123. |
| mock.apiBaseUrl | string | Server URL with /v1, e.g. http://127.0.0.1:55123/v1. Pass this to SDK clients. |
| mock.given | GivenStubs | Entry point for stubbing responses. |
| mock.expect | ExpectConditions | Entry point for configuring server behavior (API key validation, etc.). |
| await mock.clear() | void | Remove all stubs and reset server config. Call between tests. |

MockLLM implements Symbol.asyncDispose for automatic cleanup:

{
  await using mock = new MockLLM();
  await mock.start();
  // ... use mock ...
} // mock.stop() called automatically

Chat Completions

Stub POST /v1/chat/completions responses.

// Any request returns this content
mock.given.chatCompletion.willReturn('Hello!');

// Match by model
mock.given.chatCompletion
  .forModel('gpt-4o')
  .willReturn('I am GPT-4o.');

// Match by message content (case-insensitive substring)
mock.given.chatCompletion
  .withMessageContaining('weather')
  .willReturn('Sunny, 72F.');

// Combine matchers — both must match
mock.given.chatCompletion
  .forModel('gpt-4o')
  .withMessageContaining('translate')
  .willReturn('Bonjour!');

| Method | Description |
|---|---|
| .forModel(model) | Only match requests with this exact model name. |
| .withMessageContaining(text) | Only match when any user message contains this substring (case-insensitive). |
| .willReturn(content) | Return a chat.completion response with this content. |

Responses API

Stub POST /v1/responses — the newer OpenAI Responses API used by @ai-sdk/openai, opencode, and recent versions of the OpenAI SDK.

// Stubs are shared — chatCompletion stubs automatically work with /v1/responses
mock.given.chatCompletion.willReturn('Works on both endpoints!');

// Or use the dedicated response builder
mock.given.response
  .forModel('gpt-4o')
  .willReturn('Hello from responses!');

// Match by input content (case-insensitive substring)
mock.given.response
  .withInputContaining('weather')
  .willReturn('Sunny, 72F.');

The input field accepts a plain string or an array of messages:

// String input
const fromString = await client.responses.create({
  model: 'gpt-4o',
  input: 'Hello',
});

// Array input
const fromArray = await client.responses.create({
  model: 'gpt-4o',
  input: [{ role: 'user', content: 'Hello' }],
});
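
To illustrate how both input forms can feed the same case-insensitive substring match, here is a minimal sketch. It is not phantomllm's implementation: the type names, the helper names, and the choice to search every message in the array are assumptions made for the example.

```typescript
// Illustrative types; the real request schema has more fields.
type InputMessage = { role: string; content: string };
type ResponsesInput = string | InputMessage[];

// Flatten either input form into one searchable string.
// Assumption: all messages are searched, regardless of role.
function inputToText(input: ResponsesInput): string {
  return typeof input === 'string'
    ? input
    : input.map((m) => m.content).join('\n');
}

// Case-insensitive substring check, in the spirit of withInputContaining.
function inputContains(input: ResponsesInput, needle: string): boolean {
  return inputToText(input).toLowerCase().includes(needle.toLowerCase());
}

const stringHit = inputContains('What is the WEATHER today?', 'weather');
const arrayHit = inputContains([{ role: 'user', content: 'Weather in Paris?' }], 'weather');
const noHit = inputContains('Hello', 'weather');
```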

Streaming works with the Responses API event format (response.output_text.delta, etc.):

mock.given.response.willStream(['Hello', ' world']);

const stream = await client.responses.create({
  model: 'gpt-4o',
  input: 'Hi',
  stream: true,
});

for await (const event of stream) {
  if (event.type === 'response.output_text.delta') {
    process.stdout.write(event.delta);
  }
}

| Method | Description |
|---|---|
| .forModel(model) | Only match requests with this exact model name. |
| .withInputContaining(text) | Only match when the input contains this substring (case-insensitive). |
| .willReturn(content) | Return a response object with this content. |
| .willStream(chunks) | Return a streamed response with proper Responses API SSE events. |

Note: given.chatCompletion and given.response stubs are interchangeable — both match requests on either endpoint.

Streaming

Return SSE-streamed responses, matching the real OpenAI streaming format.

mock.given.chatCompletion
  .forModel('gpt-4o')
  .willStream(['Hello', ', ', 'world', '!']);

Each string becomes a separate chat.completion.chunk SSE event with delta.content. The stream ends with a chunk containing finish_reason: "stop" followed by data: [DONE].

| Method | Description |
|---|---|
| .willStream(chunks) | Return a stream of SSE events, one per string in the array. |
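
If you read the stream without an SDK, the wire format described above can be parsed by hand. The sketch below collects delta.content values from a raw SSE body until it reaches data: [DONE]. The sample body is hand-written to follow that description rather than captured from the server, and parseSseBody is a hypothetical helper name.

```typescript
// Minimal shape of a streamed chunk; only the fields used here are modeled.
interface ChatChunk {
  choices: { delta: { content?: string }; finish_reason: string | null }[];
}

// Collect delta.content strings from a raw SSE body until the [DONE] sentinel.
function parseSseBody(body: string): string[] {
  const parts: string[] = [];
  for (const line of body.split('\n')) {
    if (!line.startsWith('data: ')) continue; // skip blank separator lines
    const payload = line.slice('data: '.length).trim();
    if (payload === '[DONE]') break;
    const chunk = JSON.parse(payload) as ChatChunk;
    const content = chunk.choices[0]?.delta?.content;
    if (content) parts.push(content);
  }
  return parts;
}

// Hand-written sample in the shape described above.
const sampleBody = [
  'data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}',
  '',
  'data: {"choices":[{"delta":{"content":" world"},"finish_reason":null}]}',
  '',
  'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}',
  '',
  'data: [DONE]',
  '',
].join('\n');

const streamedParts = parseSseBody(sampleBody);
```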

Embeddings

Stub POST /v1/embeddings responses.

// Single embedding
mock.given.embedding
  .forModel('text-embedding-3-small')
  .willReturn([0.1, 0.2, 0.3]);

// Batch — multiple vectors for multiple inputs
mock.given.embedding
  .willReturn([
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
  ]);

| Method | Description |
|---|---|
| .forModel(model) | Only match requests with this model. |
| .willReturn(vector) | Return a single vector (number[]) or batch of vectors (number[][]). |
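
Because willReturn accepts either a single vector or a batch, test helpers that build fixtures may want to treat both forms uniformly. The following is one possible sketch of telling the two apart. The toBatch helper is hypothetical and says nothing about how phantomllm does this internally.

```typescript
type Vector = number[];

// Hypothetical helper: normalize a single vector or a batch into a batch.
function toBatch(value: Vector | Vector[]): Vector[] {
  if (value.length === 0) return [];
  // A batch has arrays as elements; a single vector has numbers.
  return Array.isArray(value[0]) ? (value as Vector[]) : [value as Vector];
}

const singleAsBatch = toBatch([0.1, 0.2, 0.3]);
const batchAsBatch = toBatch([
  [0.1, 0.2, 0.3],
  [0.4, 0.5, 0.6],
]);
```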

Error Simulation

Force error responses to test retry logic, error handling, and fallbacks.

mock.given.chatCompletion.willError(429, 'Rate limit exceeded');
mock.given.chatCompletion.willError(500, 'Internal server error');
mock.given.embedding.willError(400, 'Invalid input');

// Scoped to a specific model
mock.given.chatCompletion
  .forModel('gpt-4o')
  .willError(403, 'Model access denied');

Error responses follow the OpenAI error format:

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "api_error",
    "param": null,
    "code": null
  }
}
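
For code that inspects error bodies directly instead of relying on SDK exceptions, a type guard over this shape can be handy. The sketch below models only the fields shown in the JSON above. The retry policy in isRetryableStatus (429 and 5xx) is an assumption chosen for illustration, not something phantomllm prescribes.

```typescript
// Shape taken from the JSON example above.
interface OpenAIErrorBody {
  error: {
    message: string;
    type: string;
    param: string | null;
    code: string | null;
  };
}

// Narrow an unknown parsed JSON value to the error shape.
function isOpenAIErrorBody(value: unknown): value is OpenAIErrorBody {
  if (typeof value !== 'object' || value === null) return false;
  const err = (value as { error?: unknown }).error;
  if (typeof err !== 'object' || err === null) return false;
  const fields = err as Record<string, unknown>;
  return typeof fields['message'] === 'string' && typeof fields['type'] === 'string';
}

// Assumed policy for the example: retry on rate limits and server errors only.
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status < 600);
}

const parsedError: unknown = JSON.parse(
  '{"error":{"message":"Rate limit exceeded","type":"api_error","param":null,"code":null}}',
);
const errorMessage = isOpenAIErrorBody(parsedError) ? parsedError.error.message : null;
```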

API Key Validation

Test that your code sends the correct API key.

mock.expect.apiKey('sk-test-key-123');

// Requests without a key or with the wrong key get 401
// { error: { message: "...", type: "authentication_error", code: "invalid_api_key" } }

// Only requests with the correct key succeed
const openai = new OpenAI({
  baseURL: mock.apiBaseUrl,
  apiKey: 'sk-test-key-123', // must match exactly
});

mock.expect configures server constraints at runtime. API key validation applies to all /v1/* endpoints. Admin endpoints (/_admin/*) are always accessible.

Calling mock.clear() resets the API key requirement along with all stubs.

| Method | Description |
|---|---|
| mock.expect.apiKey(key) | Require Authorization: Bearer <key> on all /v1/* requests. |
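
When calling the mock with plain fetch instead of an SDK, the key has to be placed in the Authorization header yourself. The sketch below shows the header format and an equivalent exact-match check. Both function names are hypothetical, and the check is only an illustration of the rule, not phantomllm's actual code.

```typescript
// Build request headers for a plain fetch call; the key travels as a Bearer token.
function buildHeaders(apiKey: string): Record<string, string> {
  return {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${apiKey}`,
  };
}

// Illustrative check (an assumption): the header must match the expected key exactly.
function isAuthorized(headers: Record<string, string>, expectedKey: string): boolean {
  return headers['Authorization'] === `Bearer ${expectedKey}`;
}

const goodResult = isAuthorized(buildHeaders('sk-test-key-123'), 'sk-test-key-123');
const badResult = isAuthorized(buildHeaders('wrong-key'), 'sk-test-key-123');
const missingResult = isAuthorized({ 'Content-Type': 'application/json' }, 'sk-test-key-123');
```

The object returned by buildHeaders can be passed as the headers option of a fetch call against `${mock.baseUrl}/v1/chat/completions`.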

Testing auth error handling:

it('handles invalid API key', async () => {
  mock.expect.apiKey('correct-key');
  mock.given.chatCompletion.willReturn('Hello');

  const badClient = new OpenAI({
    baseURL: mock.apiBaseUrl,
    apiKey: 'wrong-key',
  });

  await expect(
    badClient.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Hi' }],
    }),
  ).rejects.toThrow();
});

Stub Matching

When multiple stubs are registered, the most specific match wins:

  1. Specificity: a stub matching both model and content (specificity 2) beats one matching only model (specificity 1), which beats a catch-all (specificity 0).
  2. Registration order: among stubs with equal specificity, the first registered wins.

// Catch-all (specificity 0)
mock.given.chatCompletion.willReturn('Default response');

// Model-specific (specificity 1) — wins over catch-all for gpt-4o
mock.given.chatCompletion
  .forModel('gpt-4o')
  .willReturn('GPT-4o response');

// Model + content (specificity 2) — wins over model-only for matching messages
mock.given.chatCompletion
  .forModel('gpt-4o')
  .withMessageContaining('weather')
  .willReturn('Weather-specific GPT-4o response');

When no stub matches a request, the server returns HTTP 418 with a descriptive error message showing what was requested.
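
The two rules above can be sketched as a small selection function. This is an illustration of the documented behavior, not phantomllm's internals: the StubSketch and RequestSketch types, their field names, and pickStub are all hypothetical.

```typescript
// Hypothetical stub record; field names are illustrative only.
interface StubSketch {
  model?: string;
  contains?: string;
  reply: string;
}

interface RequestSketch {
  model: string;
  userText: string;
}

// Number of matchers set on a stub: 0 (catch-all), 1, or 2.
function specificity(stub: StubSketch): number {
  return (stub.model !== undefined ? 1 : 0) + (stub.contains !== undefined ? 1 : 0);
}

// Exact model match plus case-insensitive substring match on the text.
function matches(stub: StubSketch, req: RequestSketch): boolean {
  if (stub.model !== undefined && stub.model !== req.model) return false;
  if (
    stub.contains !== undefined &&
    !req.userText.toLowerCase().includes(stub.contains.toLowerCase())
  ) {
    return false;
  }
  return true;
}

// Highest specificity wins; the strict ">" keeps the earliest stub on ties.
function pickStub(stubs: StubSketch[], req: RequestSketch): StubSketch | undefined {
  let best: StubSketch | undefined;
  for (const stub of stubs) {
    if (!matches(stub, req)) continue;
    if (best === undefined || specificity(stub) > specificity(best)) best = stub;
  }
  return best;
}

const sketchStubs: StubSketch[] = [
  { reply: 'Default response' },
  { model: 'gpt-4o', reply: 'GPT-4o response' },
  { model: 'gpt-4o', contains: 'weather', reply: 'Weather-specific GPT-4o response' },
];

const weatherPick = pickStub(sketchStubs, { model: 'gpt-4o', userText: 'What is the Weather?' });
const plainPick = pickStub(sketchStubs, { model: 'gpt-4o', userText: 'Hi' });
const otherPick = pickStub(sketchStubs, { model: 'gpt-3.5-turbo', userText: 'Hi' });
```

An undefined result from pickStub corresponds to the no-match case, where the server answers with HTTP 418.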

Integration Examples

OpenAI Node.js SDK

import OpenAI from 'openai';
import { MockLLM } from 'phantomllm';

const mock = new MockLLM();
await mock.start();

const openai = new OpenAI({
  baseURL: mock.apiBaseUrl,
  apiKey: 'test-key',
});

// Non-streaming
mock.given.chatCompletion.willReturn('The capital of France is Paris.');

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is the capital of France?' }],
});
console.log(response.choices[0].message.content);
// => "The capital of France is Paris."

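// Streaming: clear the earlier catch-all first, since among equal-specificity
// stubs the first registered wins
await mock.clear();
mock.given.chatCompletion.willStream(['The capital', ' of France', ' is Paris.']);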

const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Capital of France?' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

// Embeddings
mock.given.embedding
  .forModel('text-embedding-3-small')
  .willReturn([0.1, 0.2, 0.3]);

const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Hello world',
});
console.log(embedding.data[0].embedding); // => [0.1, 0.2, 0.3]

await mock.stop();

Vercel AI SDK

import { generateText, streamText } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
import { MockLLM } from 'phantomllm';

const mock = new MockLLM();
await mock.start();

const provider = createOpenAI({
  baseURL: mock.apiBaseUrl,
  apiKey: 'test-key',
});

// generateText
mock.given.chatCompletion.willReturn('Paris');

const { text } = await generateText({
  model: provider.chat('gpt-4o'),
  prompt: 'Capital of France?',
});
console.log(text); // => "Paris"

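// streamText: clear the earlier catch-all first, since among equal-specificity
// stubs the first registered wins
await mock.clear();
mock.given.chatCompletion.willStream(['Par', 'is']);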

const result = streamText({
  model: provider.chat('gpt-4o'),
  prompt: 'Capital of France?',
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

await mock.stop();

Note: Use provider.chat('model') instead of provider('model') to ensure requests go through /v1/chat/completions.

opencode

Add a provider entry to your opencode.json pointing at the mock:

{
  "provider": {
    "mock": {
      "api": "openai",
      "baseURL": "http://127.0.0.1:PORT/v1",
      "apiKey": "test-key",
      "models": {
        "gpt-4o": { "id": "gpt-4o" }
      }
    }
  }
}

Start the mock and print the URL:

const mock = new MockLLM();
await mock.start();
console.log(`Set baseURL to: ${mock.apiBaseUrl}`);

LangChain

import { ChatOpenAI } from '@langchain/openai';
import { MockLLM } from 'phantomllm';

const mock = new MockLLM();
await mock.start();

mock.given.chatCompletion.willReturn('Hello from LangChain!');

const model = new ChatOpenAI({
  modelName: 'gpt-4o',
  configuration: {
    baseURL: mock.apiBaseUrl,
    apiKey: 'test-key',
  },
});

const response = await model.invoke('Say hello');
console.log(response.content); // => "Hello from LangChain!"

await mock.stop();

Python openai package

The mock server is a real HTTP server — any language can connect to it. Start the mock from Node.js, then use it from Python:

import openai

client = openai.OpenAI(
    base_url="http://127.0.0.1:55123/v1",  # use mock.apiBaseUrl
    api_key="test-key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

Plain fetch

const response = await fetch(`${mock.baseUrl}/v1/chat/completions`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);

curl

curl http://127.0.0.1:55123/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Test Framework Integration

Vitest

import { describe, it, expect, beforeAll, afterAll, beforeEach } from 'vitest';
import { MockLLM } from 'phantomllm';
import OpenAI from 'openai';

describe('my LLM feature', () => {
  const mock = new MockLLM();
  let openai: OpenAI;

  beforeAll(async () => {
    await mock.start();
    openai = new OpenAI({ baseURL: mock.apiBaseUrl, apiKey: 'test' });
  });

  afterAll(async () => {
    await mock.stop();
  });

  beforeEach(async () => {
    await mock.clear();
  });

  it('should summarize text', async () => {
    mock.given.chatCompletion
      .withMessageContaining('summarize')
      .willReturn('This is a summary.');

    const res = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Please summarize this article.' }],
    });

    expect(res.choices[0].message.content).toBe('This is a summary.');
  });

  it('should handle rate limits', async () => {
    mock.given.chatCompletion.willError(429, 'Rate limit exceeded');

    await expect(
      openai.chat.completions.create({
        model: 'gpt-4o',
        messages: [{ role: 'user', content: 'Hello' }],
      }),
    ).rejects.toThrow();
  });

  it('should stream responses', async () => {
    mock.given.chatCompletion.willStream(['Hello', ' World']);

    const stream = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Hi' }],
      stream: true,
    });

    const chunks: string[] = [];
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) chunks.push(content);
    }
    expect(chunks).toEqual(['Hello', ' World']);
  });
});

Jest

import { MockLLM } from 'phantomllm';
import OpenAI from 'openai';

describe('my LLM feature', () => {
  const mock = new MockLLM();
  let openai: OpenAI;

  beforeAll(async () => {
    await mock.start();
    openai = new OpenAI({ baseURL: mock.apiBaseUrl, apiKey: 'test' });
  });

  afterAll(() => mock.stop());
  beforeEach(() => mock.clear());

  test('returns stubbed response', async () => {
    mock.given.chatCompletion.willReturn('Mocked!');

    const res = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Hi' }],
    });

    expect(res.choices[0].message.content).toBe('Mocked!');
  });
});

Shared Fixture for Multi-File Suites

Start one server for your entire test suite:

tests/support/mock.ts

import { MockLLM } from 'phantomllm';

export const mock = new MockLLM();

export async function setup() {
  await mock.start();
  process.env.PHANTOMLLM_URL = mock.apiBaseUrl;
}

export async function teardown() {
  await mock.stop();
}

vitest.config.ts

import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globalSetup: ['./tests/support/mock.ts'],
  },
});

Individual test files

import { mock } from '../support/mock.js';

beforeEach(() => mock.clear());

it('works', async () => {
  mock.given.chatCompletion.willReturn('Hello!');
  // ...
});

Performance

The mock server runs in-process using Fastify — no Docker overhead:

| Metric | Value |
|--------|-------|
| Server startup | < 5ms |
| Chat completion response | ~0.2ms median |
| Streaming (8 chunks) | ~0.2ms total |
| Embedding (1536-dim) | ~0.3ms median |
| Throughput | ~11,000 req/s |

Tips:

  • Start the server once in beforeAll, call mock.clear() between tests.
  • Use a shared fixture for multi-file test suites.

Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| ServerNotStartedError | Using baseUrl, given, or clear() before start(). | Call await mock.start() first. |
| Stubs leaking between tests | Stubs persist until cleared. | Call await mock.clear() in beforeEach. |
| 418 response | No stub matches the request. | Register a stub matching the model/content, or add a catch-all: mock.given.chatCompletion.willReturn('...'). |
| AI SDK uses Responses API | provider('model') defaults to /v1/responses in v3+. | Both endpoints are supported. Use provider.chat('model') if you specifically need /v1/chat/completions. |

License

MIT