llmock

v3.3.2

Published

20 days ago

A configurable mock LLM API server for testing and development


piyook

llm mock api server testing development openai chatgpt gemini embeddings

LLMock — Local Mock LLM API

A lightweight local server that simulates LLM APIs for development and testing. Build and test AI-powered applications without API costs or an internet connection.

Why LLMock?

Free and fast — no API costs, instant responses for rapid prototyping
Consistent testing — predictable, repeatable responses for testing UI logic
Offline capable — works without internet connectivity
Full visibility — complete request logging and a live dashboard
Realistic simulation — configurable delays, SSE streaming, and mock embeddings
OpenAI-compatible — works with ChatGPT, Grok, Llama, DeepSeek, Gemini, and any OpenAI-style API

Built on Fastify for high performance and reliability.

Quick Start

Prerequisites: Node.js 20+

The fastest way to get started is with the scaffolding tool, which creates a complete project with configuration files and example templates:

npm create llmock@latest my-project
cd my-project
npm install
npm run llmock:start

Open http://localhost:8001 to see the live dashboard.

That's it. To run against a specific model preset:

npm run llmock:chatgpt      # OpenAI ChatGPT-style (default)
npm run llmock:gemini       # Google Gemini format
npm run llmock:streaming    # OpenAI-style with SSE streaming
npm run llmock:embeddings   # Optimised for embeddings/RAG testing

Scaffolded project layout

my-project/
├── package.json
├── .llmockrc.json
├── README.md
├── requests/
│   ├── openai-chat.json
│   └── gemini-chat.json
└── responses/
    ├── openai-chat-response.json
    └── gemini-chat-response.json

Installation Options

Option 1: Scaffolded project (recommended)

npm create llmock@latest my-project

Generates a ready-to-use project with configuration, templates, and Docker support.

Option 2: Add to an existing project

npm install llmock
# or globally
npm install -g llmock

Then use the CLI directly:

llmock start                        # default: ChatGPT model, port 8001
llmock start --model=gemini
llmock start --port=3000 --stream=true
llmock stop
llmock config                       # show current settings
llmock help

All CLI flags support both --key=value and --key value formats and override .llmockrc.json at runtime.
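To illustrate how flags in either format could be layered over file settings, here is a minimal sketch. `parseFlags` and `applyOverrides` are hypothetical helpers written for this example, not LLMock's actual implementation, and treating a bare flag such as `--foreground` as `'true'` is an assumption.

```javascript
// Hypothetical sketch: parse "--key=value" and "--key value" flags,
// then layer them over settings loaded from .llmockrc.json.
function parseFlags(argv) {
  const flags = {};
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (!arg.startsWith('--')) continue; // ignore positionals such as "start"
    const body = arg.slice(2);
    const eq = body.indexOf('=');
    if (eq !== -1) {
      // --key=value form
      flags[body.slice(0, eq)] = body.slice(eq + 1);
    } else if (i + 1 < argv.length && !argv[i + 1].startsWith('--')) {
      // --key value form: consume the next argument as the value
      flags[body] = argv[i + 1];
      i++;
    } else {
      // bare flag (assumed to mean "true" in this sketch)
      flags[body] = 'true';
    }
  }
  return flags;
}

// CLI values win over values read from the config file.
function applyOverrides(fileSettings, flags) {
  return { ...fileSettings, ...flags };
}
```

For example, `applyOverrides({ port: '8001' }, parseFlags(['start', '--port', '3000']))` yields a settings object whose `port` is `'3000'`.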

Foreground Mode

For Docker containers or when you want the server to stay attached to your terminal:

llmock start --foreground

The --foreground flag keeps the server process attached and forwards all output to your console. This is essential for Docker containers and useful for debugging. Without this flag, the server runs as a detached background process.

Windows users: In normal mode, the server may create a minimized terminal window. This is expected behavior for background processes on Windows. Use llmock start --foreground if you want to keep the server visible in your terminal.

Option 3: Docker

npm create llmock@latest my-project
cd my-project
npm run docker:start

See Docker Support for full details.

Configuration

Configuration file (`.llmockrc.json`)

All settings live in .llmockrc.json in your project root. CLI flags always override these values.

{
  "defaultModel": "chatgpt",
  "models": {
    "chatgpt": {
      "name": "openai",
      "model": "gpt-4o",
      "endpoint": "chatgpt/chat/completions",
      "responseType": "lorem",
      "maxLoremParas": 8,
      "validateRequests": true,
      "logRequests": true,
      "debug": false,
      "stream": false,
      "responseDelay": {
        "min": 3000,
        "max": 5000
      },
      "embeddings": {
        "enabled": true,
        "dimensions": 128
      }
    }
  },
  "server": {
    "port": 8001,
    "host": "0.0.0.0"
  }
}

Configuration reference:

| Option | Description |
|---|---|
| name | LLM provider name (used for template loading) |
| model | Model identifier (e.g. gpt-4o, gemini-pro) |
| endpoint | API endpoint path |
| responseType | "lorem" (random text) or "stored" (predefined responses) |
| maxLoremParas | Max sentences in lorem ipsum responses |
| validateRequests | Validate incoming requests against templates |
| logRequests | Save requests to the log file |
| debug | Enable verbose console logging |
| stream | Return SSE streaming responses |
| responseDelay.min/max | Response delay range in milliseconds |
| embeddings.enabled | Enable the /v1/embeddings endpoint |
| embeddings.dimensions | Embedding vector size |

Adding custom models

Extend the models object with any additional preset, then start with --model=<name>:

{
  "models": {
    "my-model": {
      "name": "openai",
      "model": "gpt-3.5-turbo",
      "endpoint": "api/v1/chat/completions",
      "responseType": "stored",
      "validateRequests": true,
      "logRequests": false,
      "debug": true,
      "stream": false,
      "responseDelay": { "min": 1000, "max": 2000 },
      "embeddings": { "enabled": false, "dimensions": 64 }
    }
  }
}

llmock start --model=my-model

Response types

Lorem ipsum — generates random placeholder text, good for testing variable-length content in the UI:

{ "responseType": "lorem", "maxLoremParas": 8 }

Stored responses — returns predefined answers from src/data/data.json, useful for domain-specific or reproducible testing. The server randomly selects from this list on each request:

{
  "responseType": "stored"
}

{
  "responses": [
    "This is a custom response for testing.",
    "Another predefined response for consistency."
  ]
}

Streaming responses

Enable OpenAI-style Server-Sent Events (SSE) streaming in your config or via CLI:

{ "stream": true }

llmock start --stream=true

When enabled, the endpoint returns a chunked SSE stream. The first few chunks arrive immediately (mimicking real LLM behaviour), with subsequent chunks following the configured delay.

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"}}]}

data: {"id":"chatcmpl-124","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}

data: [DONE]
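On the client side, a stream in this shape can be reduced to plain text by reading each `data:` line and concatenating the `delta.content` fields. The sketch below assumes the whole SSE body has already been buffered into one string; a real client would process chunks as they arrive. `collectStreamText` is a name invented for this example.

```javascript
// Sketch: collect the assistant text from an OpenAI-style SSE body.
function collectStreamText(sseBody) {
  let text = '';
  for (const line of sseBody.split('\n')) {
    if (!line.startsWith('data:')) continue; // skip blank separator lines
    const payload = line.slice('data:'.length).trim();
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    // The first chunk carries only the role, so content may be absent.
    text += chunk.choices?.[0]?.delta?.content ?? '';
  }
  return text;
}
```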

Response delay simulation

Simulate realistic API latency to test loading states, timeout handling, and UX:

{
  "responseDelay": { "min": 800, "max": 2500 }
}

Set both values to 0 for instant responses. The server picks a random value in the range for each request.

| Profile | min | max |
|---|---|---|
| Instant (development) | 0 | 0 |
| Fast | 100 | 300 |
| Realistic production | 800 | 2500 |
| Slow / timeout testing | 3000 | 8000 |
| Fixed delay | 1000 | 1000 |
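The delay behaviour described above can be pictured with a short sketch. `pickDelay` and `respondAfterDelay` are hypothetical names, and the uniform distribution over whole milliseconds is an assumption; the server's actual selection logic may differ.

```javascript
// Hypothetical sketch: pick a delay uniformly from [min, max] milliseconds.
function pickDelay({ min, max }, random = Math.random) {
  if (min > max) throw new RangeError('responseDelay.min must not exceed max');
  // random() is in [0, 1), so the result is an integer in [min, max].
  return Math.floor(min + random() * (max - min + 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Wait for the chosen delay, then build and return the response.
async function respondAfterDelay(responseDelay, buildResponse) {
  await sleep(pickDelay(responseDelay));
  return buildResponse();
}
```

With `min` and `max` both 0 the function always returns 0, and with equal non-zero values it returns that fixed delay, matching the "Instant" and "Fixed delay" profiles.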

Custom API paths

Set the endpoint to match any provider's path structure:

{ "endpoint": "chatgpt/chat/completions" }
// → http://localhost:8001/chatgpt/chat/completions

{ "endpoint": "models/gemini-pro:generateContent" }
// → http://localhost:8001/models/gemini-pro:generateContent

Environment variables

For CI/CD pipelines, set these to switch between mock and production:

TEST_MODE=true
TEST_BASE_URL=http://localhost:8001/chatgpt
TEST_EMBEDDING_URL=http://localhost:8001/v1/embeddings

Setting TEST_MODE=false switches back to real LLM services.
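In application code, the switch can be kept in one place. The following is a minimal sketch that assumes your application reads these variables itself, as the LangChain example in this README does; `resolveLlmEndpoints` is a hypothetical helper name.

```javascript
// Sketch: derive client settings from the environment variables above.
function resolveLlmEndpoints(env = process.env) {
  const useMock = env.TEST_MODE === 'true';
  return {
    useMock,
    // undefined means "fall back to the provider SDK's default URL"
    chatBaseUrl: useMock ? env.TEST_BASE_URL : undefined,
    embeddingUrl: useMock ? env.TEST_EMBEDDING_URL : undefined,
  };
}
```

Passing the environment as a parameter keeps the helper easy to unit test without mutating `process.env`.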

Features

Dashboard

Once running, open http://localhost:8001 for the live dashboard:


| URL | Purpose |
|---|---|
| http://localhost:8001 | Main dashboard |
| http://localhost:8001/logs | Request log history |
| http://localhost:8001/ping | Health check |

The dashboard shows server status, current configuration, available endpoints, and recent request logs. It refreshes automatically every 2 seconds.
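In a test setup script, the /ping health check can be polled until the server is ready. This is a sketch only: it assumes /ping answers with a 2xx status once the server is up, and `waitForServer` with its `retries`, `intervalMs`, and `fetchImpl` options is a hypothetical helper, not part of LLMock.

```javascript
// Sketch: poll the /ping health check until the server answers.
async function waitForServer(
  baseUrl,
  { retries = 20, intervalMs = 250, fetchImpl = fetch } = {}
) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetchImpl(`${baseUrl}/ping`);
      if (res.ok) return attempt; // number of attempts it took
    } catch {
      // Connection refused: the server is not listening yet.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`LLMock did not respond at ${baseUrl}/ping after ${retries} attempts`);
}
```

A test runner could call `await waitForServer('http://localhost:8001')` before the first request is sent.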

Available endpoints

| Endpoint | Description |
|---|---|
| Configurable (default: /chatgpt/chat/completions) | Chat completions |
| /v1/embeddings | OpenAI-compatible mock embeddings |

Request validation

Validate incoming requests against templates to confirm API compatibility:

Add a template to the request-templates/ folder
Enable validation: "validateRequests": true

Invalid requests return a detailed error describing the mismatch.
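To make the idea concrete, here is a hypothetical sketch of shape validation against a template written in the `"string"` placeholder style used by the request templates in this README. It is not LLMock's validator: the rules (type names as leaf values, the first array element as the pattern for all items, extra request keys allowed) are assumptions chosen for illustration.

```javascript
// Hypothetical sketch: check a request body against a template whose
// leaf values name the expected typeof result, e.g. "string".
function validateShape(template, value, path = 'body') {
  if (typeof template === 'string') {
    return typeof value === template
      ? []
      : [`${path}: expected ${template}, got ${typeof value}`];
  }
  if (Array.isArray(template)) {
    if (!Array.isArray(value)) return [`${path}: expected array`];
    // Every element must match the first template element.
    return value.flatMap((item, i) => validateShape(template[0], item, `${path}[${i}]`));
  }
  if (value === null || typeof value !== 'object') return [`${path}: expected object`];
  return Object.keys(template).flatMap((key) =>
    key in value
      ? validateShape(template[key], value[key], `${path}.${key}`)
      : [`${path}.${key}: missing`]
  );
}
```

The function returns a list of mismatch messages, so an empty array means the request matches the template.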

Request logging

Enable with "logRequests": true and view at http://localhost:8001/logs, or find the log file at:

| Platform | Log location |
|---|---|
| Windows | C:\Users\{name}\AppData\Local\llmock-nodejs\Log\ |
| macOS | ~/Library/Logs/llmock-nodejs/ |
| Linux | ~/.local/share/llmock-nodejs/log/ |

Debug mode

Enable verbose console output to see incoming request details, validation results, response generation steps, and timing:

llmock start --debug=true

Integration Guide

Chat completions

Standard (non-streaming)

curl http://localhost:8001/chatgpt/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 1,
    "stream": false
  }'

{
  "id": "chatcmpl-6sf37lXn5paUcuf8UaurpMIKRMsTe",
  "object": "chat.completion",
  "created": 1678485525,
  "model": "gpt-3.5-turbo-0301",
  "choices": [{"message": {"role": "assistant", "content": "Generated response"}}]
}
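The same request can be made from JavaScript. The sketch below assumes the default chatgpt preset and port shown above and the global `fetch` available in Node.js 20+; `askMock` and its `baseUrl` and `fetchImpl` options are hypothetical names used only in this example.

```javascript
// Sketch: send one user message and return the assistant's reply text.
async function askMock(
  prompt,
  { baseUrl = 'http://localhost:8001/chatgpt', fetchImpl = fetch } = {}
) {
  const res = await fetchImpl(`${baseUrl}/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }],
      stream: false,
    }),
  });
  if (!res.ok) throw new Error(`Mock server returned HTTP ${res.status}`);
  const data = await res.json();
  // Same path as in the JSON response shown above.
  return data.choices[0].message.content;
}
```

Accepting `fetchImpl` as an option lets unit tests substitute a stub instead of starting the server.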

Streaming

Enable stream: true in your config, then use the same endpoint:

curl -N http://localhost:8001/chatgpt/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'

Embeddings API

The mock server provides an OpenAI-compatible embeddings endpoint at /v1/embeddings:

curl http://localhost:8001/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Your text string goes here"
  }'

Pass an array of strings for multiple embeddings in one call:

-d '{"model": "text-embedding-ada-002", "input": ["First text", "Second text"]}'

Response format:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.1234, -0.5678, 0.9012]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 6, "total_tokens": 6 }
}

Key characteristics of mock embeddings: deterministic (same input always returns the same vector), configurable dimensions, model-name-sensitive, and OpenAI-compatible in shape. Note that vectors are pseudo-random — they have the correct shape for testing but are not real semantic embeddings.
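One way to obtain vectors with these properties is to seed a small pseudo-random generator from a hash of the model name and input text. The sketch below illustrates that general technique only; the hash (FNV-1a style), the generator (mulberry32), and the value range are assumptions, and LLMock's own algorithm may be entirely different.

```javascript
// Hypothetical sketch: derive a repeatable pseudo-random vector
// from the model name and the input text.
function mockEmbedding(text, model, dimensions = 128) {
  // FNV-1a style hash over code points gives a 32-bit seed.
  let seed = 0x811c9dc5;
  for (const ch of `${model}:${text}`) {
    seed ^= ch.codePointAt(0);
    seed = Math.imul(seed, 0x01000193) >>> 0;
  }
  // mulberry32 turns the seed into a stream of floats.
  const vector = [];
  for (let i = 0; i < dimensions; i++) {
    seed = (seed + 0x6d2b79f5) >>> 0;
    let t = seed;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    const unit = ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
    vector.push(unit * 2 - 1); // map to [-1, 1)
  }
  return vector;
}
```

Because nothing but the inputs feeds the generator, calling the function twice with the same text and model gives identical vectors, while the values carry no semantic meaning.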

Using with LangChain

Point your ChatOpenAI client at the mock server when TEST_MODE is enabled:

import { ChatOpenAI } from '@langchain/openai';

const chatModel = new ChatOpenAI({
  openAIApiKey: process.env.OPENAI_API_KEY,
  modelName: 'gpt-3.5-turbo',
  configuration:
    process.env.TEST_MODE === 'true'
      ? { baseURL: process.env.TEST_BASE_URL } // http://localhost:8001/chatgpt
      : {},
});

For embeddings, use LangChain's built-in fake embeddings or call the mock endpoint directly:

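import { OpenAIEmbeddings } from '@langchain/openai';

// Minimal embeddings client that calls the mock /v1/embeddings endpoint
class MockEmbeddingsAPI {
  // Embeds each document with one request per text
  async embedDocuments(texts) {
    return Promise.all(texts.map(text => this.embedQuery(text)));
  }

  async embedQuery(text) {
    const response = await fetch(process.env.TEST_EMBEDDING_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ input: text, model: 'text-embedding-ada-002' }),
    });
    if (!response.ok) {
      throw new Error(`Embedding request failed: HTTP ${response.status}`);
    }
    const data = await response.json();
    return data.data[0].embedding;
  }
}

// Use the mock in test mode and the real OpenAI embeddings otherwise
const embeddings =
  process.env.TEST_MODE === 'true'
    ? new MockEmbeddingsAPI()
    : new OpenAIEmbeddings({ openAIApiKey: process.env.OPENAI_API_KEY });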

Supporting Different LLM Providers

LLMock supports any provider that uses the OpenAI chat completion format: ChatGPT, Grok, Llama, DeepSeek, Mistral, Claude, Gemini, and more. For providers with different request/response shapes, create custom templates.

Template locations

The framework checks two locations, in priority order:

./request-templates/ and ./response-templates/ in your project root
src/request-templates/ and src/response-templates/ in the package source

Project-level templates take priority, so you can add custom templates without modifying the package.
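The lookup order above can be sketched as follows. `resolveTemplate` is a hypothetical helper, the `_req.json` and `_res.json` suffixes follow the naming used in the custom provider steps of this README, and the `exists` callback stands in for a file-system check such as `fs.existsSync`.

```javascript
// Hypothetical sketch: return the first template path that exists,
// checking the project root before the package source.
function resolveTemplate(name, kind, { projectRoot, packageRoot, exists }) {
  const suffix = kind === 'request' ? 'req' : 'res';
  const file = `${name}_${suffix}.json`;
  const candidates = [
    `${projectRoot}/${kind}-templates/${file}`, // project-level override
    `${packageRoot}/src/${kind}-templates/${file}`, // bundled default
  ];
  return candidates.find((candidate) => exists(candidate)) ?? null;
}
```

Returning `null` when neither location has the file leaves it to the caller to decide whether a missing template is an error.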

Creating a custom provider template

Step 1 — Request template (request-templates/<LLM_NAME>_req.json):

{
  "model": "string",
  "messages": [
    { "role": "string", "content": "string" }
  ]
}

Step 2 — Response template (response-templates/<LLM_NAME>_res.json):

Use DYNAMIC_CONTENT_HERE as the placeholder for generated content:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "DYNAMIC_CONTENT_HERE"
      },
      "finish_reason": "stop"
    }
  ]
}

Step 3 — Model preset (.llmockrc.json):

The name field must match your template filename prefix:

{
  "models": {
    "mymodel": {
      "name": "mymodel",
      "model": "my-custom-model-v1",
      "endpoint": "api/v1/chat/completions",
      "responseType": "lorem",
      "maxLoremParas": 8,
      "validateRequests": true,
      "stream": false,
      "responseDelay": { "min": 1000, "max": 2000 },
      "embeddings": { "enabled": true, "dimensions": 128 }
    }
  }
}

Step 4 — Start and test:

llmock start --model=mymodel

curl http://localhost:8001/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-custom-model-v1", "messages": [{"role": "user", "content": "Hello"}]}'
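The role of the DYNAMIC_CONTENT_HERE placeholder from Step 2 can be illustrated with a short sketch that walks a parsed template and swaps in generated text. `fillTemplate` is a hypothetical function, and matching only string values that equal the placeholder exactly is an assumption about how substitution works.

```javascript
// Hypothetical sketch: replace every DYNAMIC_CONTENT_HERE string
// in a parsed response template with the generated content.
function fillTemplate(template, content) {
  if (typeof template === 'string') {
    return template === 'DYNAMIC_CONTENT_HERE' ? content : template;
  }
  if (Array.isArray(template)) {
    return template.map((item) => fillTemplate(item, content));
  }
  if (template !== null && typeof template === 'object') {
    return Object.fromEntries(
      Object.entries(template).map(([key, value]) => [key, fillTemplate(value, content)])
    );
  }
  return template; // numbers, booleans, and null pass through unchanged
}
```

The function builds a new object rather than mutating its input, so one loaded template can be reused across requests.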

Docker Support

Docker is included when you use the scaffolding tool and is useful for CI/CD pipelines and consistent team environments.

Available scripts

| Script | Description |
|---|---|
| npm run docker:start | Start the container in detached mode |
| npm run docker:stop | Stop the container and remove volumes |
| npm run docker:rebuild | Rebuild and restart the container |
| npm run docker:restart | Stop and start the container |

Configuration

The Docker container uses the same .llmockrc.json as the local setup, mounted as a read-only volume. Update settings and restart to apply changes — no rebuild required:

vim .llmockrc.json
npm run docker:restart

How Docker Works

The Docker container starts the server with the --foreground flag so that the LLMock process stays attached as the container's main process. Without the flag, the server would detach into the background, the start command would exit, and the container would stop and restart continuously. The container includes:

Dockerfile: Multi-stage Node.js build with security best practices
docker-compose.yml: Port 8001 exposed, config file mounted, health checks
docker-start script: Runs llmock start --foreground to keep the server attached

Manual Docker commands

docker compose up -d --force-recreate       # build and start
docker compose logs -f                      # view logs
docker compose down --volumes               # stop and clean up
docker compose down --volumes && docker compose up -d --force-recreate --build  # rebuild

Note: LLMock is intended for local development and testing only.

Troubleshooting

Server not responding

Confirm the server is running and the port matches .llmockrc.json. Open http://localhost:8001 — if it's unreachable, the server may not have started.


Port already in use

Change the port in .llmockrc.json or pass it as a flag:

llmock start --port=8002

Request validation failures

Confirm your request template matches the provider's API format
Check the request shape at http://localhost:8001/logs
Verify the name field in your model config matches the template filename prefix

Response delays not applied

Ensure both responseDelay.min and responseDelay.max are set and greater than 0, then restart the server.

License

MIT — see LICENSE for details.
