# Kimi Proxy

An agent-friendly LLM proxy that cleans up Kimi K2 output, fixes tool calls, and normalizes responses for OpenAI/Anthropic-style clients.
> [!WARNING]
> ⚠️ Experimental: This project is still in development. Use with caution in production.
It makes `kimi-k2-thinking` usable across multiple LLM providers by normalizing API formats, fixing tool call and thinking format issues, and optionally ensuring the model always replies with a tool call for agentic workflows.
The proxy and transformation pipelines are written generically and can be extended to support additional models and providers.
## Features
Seamlessly route requests to OpenAI-compatible APIs, OpenRouter, or Vertex AI using a single client-facing model name.
For some providers, kimi-k2-thinking returns tool calls and thinking content in non-standard formats. The proxy normalizes these to the standard Anthropic format that clients expect.
**Example: Tool call normalization from content**

What the kimi-k2 provider returns (tool calls embedded in content with `<|tool_call_begin|>` markers):

```json
{
  "content": "Let me search for that. <|tool_call_begin|> functions.lookup:42 <|tool_call_argument_begin|> {\"term\":\"express\"} <|tool_call_end|> "
}
```

What clients receive (normalized):

```json
{
  "content": "Let me search for that.",
  "tool_calls": [
    {
      "id": "42",
      "type": "function",
      "function": {
        "name": "lookup",
        "arguments": "{\"term\":\"express\"}"
      }
    }
  ],
  "finish_reason": "tool_calls"
}
```
**Example: Thinking tags extraction and cleanup**

What kimi-k2 returns:
```
<think> Let me break down... </think> The answer is 42.
```

What clients receive:

```json
{
  "content": "The answer is 42.",
  "thinking": "Let me break down..."
}
```
Enable with `ensure_tool_call: true` in the model config. The proxy detects missing tool calls and re-prompts the model with a reminder.
When enabled, the proxy also injects a termination tool named `done` and a system instruction telling the model to call it when finished (optionally with `{"final_answer":"..."}`), then strips that termination tool call from the final response.

Control the maximum number of re-prompt attempts with `ENSURE_TOOL_CALL_MAX_ATTEMPTS` (default: 3, max: 5).
Example enforcement flow:
```
System: You are a helpful assistant with access to tools.
Always reply with at least one tool call so the client can continue.
User: What's the weather in SF?
Assistant: Let me check that for you.
System: Reminder: The client will not continue unless you reply with a tool call.
Assistant: {
  "tool_calls": [{
    "id": "get_weather:0",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"SF\"}"
    }
  }]
}
```
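
Conceptually, the enforcement is a bounded retry loop around the upstream call. The sketch below is an assumption about the control flow, not the proxy's real code; `callUpstream` is a hypothetical helper passed in for illustration.

```ts
// Illustrative sketch only. Re-prompts with a reminder until the model returns a
// tool call or ENSURE_TOOL_CALL_MAX_ATTEMPTS is exhausted (default 3, per above).
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type ChatResponse = { content: string; tool_calls?: unknown[] };

const MAX_ATTEMPTS = Number(process.env.ENSURE_TOOL_CALL_MAX_ATTEMPTS ?? "3");

async function ensureToolCall(
  messages: ChatMessage[],
  callUpstream: (msgs: ChatMessage[]) => Promise<ChatResponse>, // hypothetical helper
): Promise<ChatResponse> {
  let response = await callUpstream(messages);
  for (let attempt = 0; attempt < MAX_ATTEMPTS && !response.tool_calls?.length; attempt++) {
    messages = [
      ...messages,
      { role: "assistant", content: response.content },
      {
        role: "system",
        content: "Reminder: The client will not continue unless you reply with a tool call.",
      },
    ];
    response = await callUpstream(messages);
  }
  // The real proxy also injects a `done` termination tool and strips its call
  // from the final response; that step is omitted here.
  return response;
}
```
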
All requests and responses are logged to SQLite and viewable through a built-in web dashboard at the root path.

Distribute traffic across providers using round-robin, weighted random, random, or first strategies.
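
As a rough illustration of how the `weight` field from the model config feeds into the weighted random strategy (a generic sketch of the technique, not the proxy's implementation):

```ts
// Illustrative sketch only: weighted random pick over configured upstream entries,
// where a missing `weight` defaults to 1.
type Upstream = { provider: string; model: string; weight?: number };

function pickWeighted(upstreams: Upstream[]): Upstream {
  const total = upstreams.reduce((sum, u) => sum + (u.weight ?? 1), 0);
  let roll = Math.random() * total;
  for (const u of upstreams) {
    roll -= u.weight ?? 1;
    if (roll <= 0) return u;
  }
  return upstreams[upstreams.length - 1]; // guard against floating-point drift
}
```
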
- Extensible architecture for adding new models and providers
- Provider support: OpenAI-compatible APIs, OpenRouter, Vertex AI
- Hybrid logging pipeline: SQLite metadata with filesystem blobs, LiveStore-backed dashboard with virtualized/lazy blob loading
## Quick Start (Bun)
```bash
bun install
cp .env.example .env
cp model-config.example.yaml model-config.yaml
# Edit .env and model-config.yaml with your provider keys and models
bun run dev              # backend
bun --cwd frontend dev   # dashboard (Vite)
```

The API runs on http://127.0.0.1:8000 and serves the dashboard (built assets) at `/`. The dev dashboard uses `VITE_API_URL` to point at the backend (defaults to same origin).
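
For a quick smoke test you can point any OpenAI-style client at the proxy. The snippet below assumes an OpenAI-compatible `/v1/chat/completions` route, which this README does not spell out, so adjust the path to match your setup.

```ts
// Illustrative sketch only: the /v1/chat/completions path is an assumption.
// The model name must match a `name` entry in model-config.yaml.
const res = await fetch("http://127.0.0.1:8000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "kimi-k2-thinking",
    messages: [{ role: "user", content: "What's the weather in SF?" }],
  }),
});
console.log(await res.json());
```
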
## Configuration
### Dashboard & LiveStore
Control LiveStore sync behavior via environment variables:
| Variable                | Default | Description                                                                                         |
| ----------------------- | ------- | --------------------------------------------------------------------------------------------------- |
| `LIVESTORE_BATCH`       | 50      | Batch size for dashboard sync (range: 1-500)                                                         |
| `LIVESTORE_MAX_RECORDS` | 500     | Memory sliding window: maximum records to keep in LiveStore. Set to 0 to disable (not recommended)   |
### Providers
Set environment variables in `.env`:

- Generic OpenAI: `OPENAI_BASE_URL`, `OPENAI_API_KEY`
- Anthropic: `ANTHROPIC_API_KEY`, optional `ANTHROPIC_BASE_URL`
- OpenRouter: `OPENROUTER_API_KEY`, `OPENROUTER_PROVIDERS` (optional), `OPENROUTER_ORDER` (optional)
- Vertex AI: `VERTEX_PROJECT_ID`, `VERTEX_LOCATION`, `GOOGLE_APPLICATION_CREDENTIALS`

`GOOGLE_APPLICATION_CREDENTIALS` can be a path to the JSON key file or the JSON payload itself. Use `VERTEX_CHAT_ENDPOINT` to point at a private MaaS endpoint if needed.
### Models
Edit `model-config.yaml` to map client model names to upstream providers:

```yaml
default_strategy: round_robin
models:
  - name: kimi-k2-thinking
    provider: vertex
    model: moonshotai/kimi-k2-thinking-maas
    # Optional: enforce tool call consistency for reliable agentic workflows
    ensure_tool_call: true
  - name: kimi-k2-thinking
    provider: openrouter
    model: moonshot-ai/kimi-k2-thinking
    weight: 2
```

## Dashboard
The web dashboard shows request/response logs and metrics. Access it at the root path when running the proxy. LiveStore metadata sync pulls from `/api/livestore/pull` in batches (size controlled by `LIVESTORE_BATCH`) and lazily fetches blobs on expansion. Build the dashboard with `bun run build:all` to serve static assets from the backend.
## Performance Features
- Reverse-chronological loading: Data loads from newest to oldest, providing immediate access to recent logs
- Memory-efficient virtualization: Uses TanStack Virtual to render only visible rows
- Configurable sliding window: Limit browser memory usage by setting `LIVESTORE_MAX_RECORDS` (see `.env.example`)
- Automatic garbage collection: Old records beyond the window limit are automatically purged

The dashboard uses reactive queries with TanStack Table and TanStack Virtual for fast, efficient rendering of large datasets.
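
The sliding window amounts to keeping only the newest `LIVESTORE_MAX_RECORDS` entries in memory; a rough sketch of that idea (illustrative only, not the dashboard's actual LiveStore code):

```ts
// Illustrative sketch only: retain at most `maxRecords` of the newest entries;
// 0 disables the window, matching the LIVESTORE_MAX_RECORDS behavior above.
type LogRecord = { id: string; createdAt: number };

function applySlidingWindow(records: LogRecord[], maxRecords: number): LogRecord[] {
  if (maxRecords <= 0) return records;
  return [...records].sort((a, b) => b.createdAt - a.createdAt).slice(0, maxRecords);
}
```
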
## Development
```bash
bun run dev              # Run backend with hot reload
bun --cwd frontend dev   # Run dashboard
bun test                 # Run tests
bun run build:all        # Build server + dashboard
```

## Docker
```bash
docker compose up --build -d                     # Production stack with web dashboard
docker compose -f docker-compose.dev.yml watch   # Development with hot reload
```