@copilotkit/llmock
v1.6.0
Deterministic mock LLM server for testing (OpenAI, Anthropic, Gemini)

Deterministic mock LLM server for testing. A real HTTP server on a real port — not an in-process interceptor — so every process in your stack (Playwright, Next.js, agent workers, microservices) can point at it via OPENAI_BASE_URL / ANTHROPIC_BASE_URL and get reproducible, instant responses. Streams SSE in real OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, and Cohere API formats, driven entirely by fixtures. Zero runtime dependencies.
Quick Start
npm install @copilotkit/llmock
import { LLMock } from "@copilotkit/llmock";
const mock = new LLMock({ port: 5555 });
mock.onMessage("hello", { content: "Hi there!" });
const url = await mock.start();
// Point your OpenAI client at `url` instead of https://api.openai.com
// ... run your tests ...
await mock.stop();
When to Use This vs MSW
MSW (Mock Service Worker) is a popular API mocking library, but it solves a different problem.
The key difference is architecture. llmock runs a real HTTP server on a port. MSW patches http/https/fetch modules inside a single Node.js process. MSW can only intercept requests from the process that calls server.listen() — child processes, separate services, and workers are unaffected.
This matters for E2E tests where multiple processes make LLM API calls:
Playwright test runner (Node)
└─ controls browser → Next.js app (separate process)
   └─ OPENAI_BASE_URL → llmock :5555
      ├─ Mastra agent workers
      ├─ LangGraph workers
      └─ CopilotKit runtime
MSW can't intercept any of those calls. llmock can, because it is a real server on a real port.
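As a rough illustration of the multi-process setup above, the sketch below builds the environment that a test harness could hand to each child process so that its SDK clients resolve to the mock. The helper name `envForMock` is hypothetical and not part of llmock. Whether the URL needs a path suffix such as `/v1` depends on the client SDK, so treat the value shown as an assumption.

```typescript
// Hypothetical helper (not part of llmock): returns a copy of the parent
// environment with the provider base URLs redirected to the mock server.
type Env = Record<string, string | undefined>;

function envForMock(parentEnv: Env, mockUrl: string): Env {
  return {
    ...parentEnv,
    // Assumption: the client SDKs in the child process read these variables.
    OPENAI_BASE_URL: mockUrl,
    ANTHROPIC_BASE_URL: mockUrl,
  };
}

// Example: the URL matches the port used in the Quick Start.
const childEnv = envForMock({ PATH: "/usr/bin" }, "http://127.0.0.1:5555");
console.log(childEnv.OPENAI_BASE_URL);
```

The resulting object can be passed as the `env` option when spawning the app server or agent workers (for example with Node's `child_process.spawn`), so every process in the tree points at the same mock.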
Use llmock when:
- Multiple processes need to hit the same mock (E2E tests, agent frameworks, microservices)
- You want multi-provider SSE format out of the box (OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, Cohere)
- You prefer defining fixtures as JSON files rather than code
- You need a standalone CLI server
Use MSW when:
- All API calls originate from a single Node.js process (unit tests, SDK client tests)
- You're mocking many different APIs, not just LLM providers
- You want in-process interception without running a server
| Capability | llmock | MSW |
| ---------------------------- | --------------------- | ------------------------------------------------------------------------- |
| Cross-process interception | Yes (real server) | No (in-process only) |
| OpenAI Chat Completions SSE | Built-in | Manual — build data: {json}\n\n + [DONE] yourself |
| OpenAI Responses API SSE | Built-in | Manual — MSW's sse() sends data: events, not OpenAI's event: format |
| Claude Messages API SSE | Built-in | Manual — build event:/data: SSE yourself |
| Gemini streaming | Built-in | Manual — build data: SSE yourself |
| WebSocket APIs | Built-in | No |
| Fixture file loading (JSON) | Yes | No — handlers are code-only |
| Request journal / inspection | Yes | No — track requests manually |
| Non-streaming responses | Yes | Yes |
| Error injection (one-shot) | Yes | Yes (via server.use()) |
| CLI for standalone use | Yes | No |
| Zero dependencies | Yes | No (~300KB) |
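To make the "Manual" entries in the table concrete, here is a minimal sketch of what hand-building an OpenAI Chat Completions SSE body involves: one `data: {json}` frame per chunk, each followed by a blank line, then a final `data: [DONE]` frame. The function name, the placeholder `id`, and the `mock-model` name are assumptions for illustration. This is not llmock's internal implementation.

```typescript
// Simplified shape of a streamed Chat Completions chunk.
interface ChatChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: {
    index: number;
    delta: { role?: string; content?: string };
    finish_reason: string | null;
  }[];
}

// Hypothetical helper: turns text pieces into a complete SSE response body.
function toChatCompletionSSE(pieces: string[], model = "mock-model"): string {
  const base = {
    id: "chatcmpl-mock",
    object: "chat.completion.chunk" as const,
    created: 0,
    model,
  };
  // One chunk per text piece, carrying the content in `delta`.
  const chunks: ChatChunk[] = pieces.map((content) => ({
    ...base,
    choices: [{ index: 0, delta: { content }, finish_reason: null }],
  }));
  // Closing chunk: empty delta with a finish reason.
  chunks.push({
    ...base,
    choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
  });
  // Each SSE frame is "data: <json>" terminated by a blank line.
  const frames = chunks.map((chunk) => `data: ${JSON.stringify(chunk)}\n\n`);
  return frames.join("") + "data: [DONE]\n\n";
}

console.log(toChatCompletionSSE(["Hi", " there!"]));
```

The Responses and Claude formats add named `event:` lines on top of this, which is where hand-rolled handlers tend to grow.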
Features
- Multi-provider support: OpenAI Chat Completions, OpenAI Responses, Anthropic Claude, Google Gemini, AWS Bedrock (streaming + Converse), Azure OpenAI, Vertex AI, Ollama, Cohere
- Embeddings API: OpenAI-compatible embedding responses with configurable dimensions
- Structured output / JSON mode: response_format, json_schema, and function calling
- Sequential responses: Stateful multi-turn fixtures that return different responses on each call
- Streaming physics: Configurable ttft, tps, and jitter for realistic timing
- WebSocket APIs: OpenAI Responses WS, Realtime API, and Gemini Live
- Error injection: One-shot errors, rate limiting, and provider-specific error formats
- Chaos testing: Probabilistic failure injection: 500 errors, malformed JSON, mid-stream disconnects
- Prometheus metrics: Request counts, latencies, and fixture match rates at /metrics
- Request journal: Record, inspect, and assert on every request
- Fixture validation: Schema validation at load time with --validate-on-load
- CLI with hot-reload: Standalone server with --watch for live fixture editing
- Docker + Helm: Container image and Helm chart for CI/CD pipelines
- Record-and-replay: VCR-style proxy-on-miss records real API responses as fixtures for deterministic replay
- Drift detection: Daily CI runs against real APIs to catch response format changes
- Claude Code integration: /write-fixtures skill teaches your AI assistant how to write fixtures correctly
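The streaming physics item in the list above can be pictured with a small timing model. The sketch below assumes that ttft is the delay in milliseconds before the first chunk, tps is a tokens-per-second rate that sets the gap between later chunks, and jitter is a fractional random variation applied to each delay. These semantics and the `chunkDelays` helper are assumptions for illustration. llmock's actual behavior may differ, so check the documentation.

```typescript
// Assumed meaning of the three knobs (not confirmed by the README).
interface StreamingProfile {
  ttft: number;   // milliseconds before the first chunk
  tps: number;    // chunks ("tokens") per second after the first
  jitter: number; // fractional variation, e.g. 0.2 for plus or minus 20%
}

// Hypothetical helper: computes the delay before each chunk is sent.
// `random` is injectable so tests can make the schedule deterministic.
function chunkDelays(
  chunkCount: number,
  profile: StreamingProfile,
  random: () => number = Math.random,
): number[] {
  const interval = 1000 / profile.tps;
  const delays: number[] = [];
  for (let i = 0; i < chunkCount; i++) {
    const base = i === 0 ? profile.ttft : interval;
    // Map random() in [0, 1) to a factor in [1 - jitter, 1 + jitter).
    const factor = 1 + profile.jitter * (2 * random() - 1);
    delays.push(base * factor);
  }
  return delays;
}

// With no jitter the schedule is fully deterministic.
console.log(chunkDelays(3, { ttft: 200, tps: 50, jitter: 0 })); // [ 200, 20, 20 ]
```

Injecting the random source keeps the sketch testable, which matters for a tool whose selling point is reproducible output.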
CLI Quick Reference
llmock [options]
| Option | Short | Default | Description |
| -------------------- | ----- | ------------ | ------------------------------------------- |
| --port | -p | 4010 | Port to listen on |
| --host | -h | 127.0.0.1 | Host to bind to |
| --fixtures | -f | ./fixtures | Path to fixtures directory or file |
| --latency | -l | 0 | Latency between SSE chunks (ms) |
| --chunk-size | -c | 20 | Characters per SSE chunk |
| --watch | -w | | Watch fixture path for changes and reload |
| --log-level | | info | Log verbosity: silent, info, debug |
| --validate-on-load | | | Validate fixture schemas at startup |
| --chaos-drop | | 0 | Chaos: probability of 500 errors (0-1) |
| --chaos-malformed | | 0 | Chaos: probability of malformed JSON (0-1) |
| --chaos-disconnect | | 0 | Chaos: probability of disconnect (0-1) |
| --metrics | | | Enable Prometheus metrics at /metrics |
| --record | | | Record mode: proxy unmatched to real APIs |
| --strict | | | Strict mode: fail on unmatched requests |
| --provider-* | | | Upstream URL per provider (with --record) |
| --help | | | Show help |
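The --chunk-size option in the table above is described as "characters per SSE chunk". A minimal sketch of that kind of splitting follows, assuming a plain fixed-width split over the response text. The `chunkText` helper is hypothetical, and llmock's real chunking (for example around multi-byte characters) may differ.

```typescript
// Hypothetical helper: splits response text into fixed-size pieces,
// one piece per SSE chunk. The last piece may be shorter.
function chunkText(text: string, chunkSize: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: string[] = [];
  // Note: slice() counts UTF-16 code units, not user-perceived characters.
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

console.log(chunkText("Hello, world!", 5)); // [ 'Hello', ', wor', 'ld!' ]
```

Under this model, a smaller chunk size combined with a non-zero --latency yields more chunks and a longer overall stream, which is useful for exercising incremental rendering in a UI.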
# Start with bundled example fixtures
llmock
# Custom fixtures on a specific port
llmock -p 8080 -f ./my-fixtures
# Simulate slow responses
llmock --latency 100 --chunk-size 5
# Record mode: proxy unmatched requests to real APIs and save as fixtures
llmock --record --provider-openai https://api.openai.com --provider-anthropic https://api.anthropic.com
# Strict mode in CI: fail if any request doesn't match a fixture
llmock --strict -f ./fixtures
Documentation
Full API reference, fixture format, E2E patterns, and provider-specific guides:
https://llmock.copilotkit.dev/docs.html
Real-World Usage
CopilotKit uses llmock across its test suite to verify AI agent behavior across multiple LLM providers without hitting real APIs.
License
MIT
