@ziuchen/deepseek-api
v2.1.4
OpenAI-compatible API server for DeepSeek
deepseek-api
An OpenAI-compatible DeepSeek API proxy server implemented with Node.js 20+.
Features
- ✅ Fully compatible with OpenAI API format
- ✅ Supports streaming responses (SSE)
- ✅ Supports deep thinking / reasoning chain (`reasoning_content`)
- ✅ Supports online search mode
- ✅ OpenAI tools / function-calling shim (translates `tools` into a prompt and parses the text output back into `tool_calls`)
- ✅ Local PoW computation (WASM)
- ✅ Server-side conversation reuse (reduces redundant sessions on DeepSeek)
- ✅ Zero production dependencies
Supported Models
DeepSeek V4 introduced two modes, Fast mode (快速模式, "Flash") and Expert mode (专家模式, "Pro"), each of which independently supports deep thinking and web search.
| Model ID | Mode | Deep Thinking | Web Search |
|---|---|:---:|:---:|
| deepseek-v4-flash | Fast (Flash) | ❌ | ❌ |
| deepseek-v4-flash-thinking | Fast (Flash) | ✅ | ❌ |
| deepseek-v4-flash-search | Fast (Flash) | ❌ | ✅ |
| deepseek-v4-flash-thinking-search | Fast (Flash) | ✅ | ✅ |
| deepseek-v4-pro | Expert (Pro) | ❌ | ❌ |
| deepseek-v4-pro-thinking | Expert (Pro) | ✅ | ❌ |
| deepseek-v4-pro-search | Expert (Pro) | ❌ | ✅ |
| deepseek-v4-pro-thinking-search | Expert (Pro) | ✅ | ✅ |
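The model IDs above follow a regular `deepseek-v4-{flash|pro}[-thinking][-search]` pattern. As an illustration only, a client could decode an ID into its flags with a small helper like the one below; `parseModelId` and `ModelFlags` are hypothetical names, not exports of this package, and the proxy's own parsing may differ.

```typescript
// Hypothetical helper: decode a documented model ID into mode + feature flags.
// Mirrors the naming pattern deepseek-v4-{flash|pro}[-thinking][-search].

type Mode = 'flash' | 'pro'

interface ModelFlags {
  mode: Mode
  thinking: boolean
  search: boolean
}

// Suffix order is fixed: "-thinking" must come before "-search".
const MODEL_ID_PATTERN = /^deepseek-v4-(flash|pro)(-thinking)?(-search)?$/

function parseModelId(id: string): ModelFlags | null {
  const match = MODEL_ID_PATTERN.exec(id)
  if (!match) return null // not one of the eight documented IDs
  return {
    mode: match[1] as Mode,
    thinking: match[2] !== undefined, // unmatched optional groups are undefined
    search: match[3] !== undefined
  }
}

// Usage
console.log(parseModelId('deepseek-v4-pro-thinking-search'))
// { mode: 'pro', thinking: true, search: true }
console.log(parseModelId('deepseek-v4-flash-search-thinking')) // null: suffix order matters
```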
Quick Start
Install dependencies
```bash
pnpm install
```
Configure environment variables
```bash
cp .env.example .env
# Edit the .env file (optional, defaults work out of the box)
```
Development mode
```bash
pnpm dev
```
Production build
```bash
pnpm build
pnpm start
```
Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| LISTEN_HOST | Listening address | 127.0.0.1 |
| LISTEN_PORT | Listening port | 5001 |
| DATA_DIR | Data directory for persistent storage (conversations, etc.). Unset = memory only | (unset) |
| DEBUG_LOG_OUTPUT | Enable debug logging (1 or true to enable) | (disabled) |
| ENABLE_TOOLS | Enable the OpenAI tools/function-calling shim (true or 1 to enable) | false |
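As a sketch of how these variables combine, the hypothetical `loadConfig` below applies the documented defaults and treats `1` or `true` as enabled. It is not the code in `src/constants.ts`; case-insensitive matching, treating an empty `DATA_DIR` as unset, and the port range check are assumptions.

```typescript
// Hypothetical config loader for the variables in the table above.

interface ProxyConfig {
  host: string
  port: number
  dataDir: string | undefined // undefined => conversations kept in memory only
  debugLog: boolean
  enableTools: boolean
}

// Both flags accept "1" or "true"; case-insensitivity is an assumption.
function isEnabled(value: string | undefined): boolean {
  if (value === undefined) return false
  const v = value.trim().toLowerCase()
  return v === '1' || v === 'true'
}

function loadConfig(env: Record<string, string | undefined>): ProxyConfig {
  const port = Number.parseInt(env.LISTEN_PORT ?? '5001', 10)
  // parseInt yields NaN for non-numeric input, which fails isInteger
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid LISTEN_PORT: ${env.LISTEN_PORT}`)
  }
  return {
    host: env.LISTEN_HOST ?? '127.0.0.1',
    port,
    dataDir: env.DATA_DIR || undefined, // empty string treated as unset
    debugLog: isEnabled(env.DEBUG_LOG_OUTPUT),
    enableTools: isEnabled(env.ENABLE_TOOLS)
  }
}

// Usage in Node.js: const config = loadConfig(process.env)
```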
API Endpoints
GET /v1/models
List available models.
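A client can call `GET /v1/models` with Node.js 20's built-in `fetch`. This sketch assumes the proxy returns OpenAI's list shape (`{ object: 'list', data: [{ id, ... }] }`), which this README does not spell out. `listModelIds` is a hypothetical helper, and the `Authorization` header is sent only for consistency with the chat endpoint; the README does not say whether this route requires it.

```typescript
// Hypothetical client for GET /v1/models, assuming an OpenAI-style list response.

interface ModelList {
  data: Array<{ id: string }>
}

type FetchLike = (url: string, init?: RequestInit) => Promise<Response>

async function listModelIds(
  baseUrl: string, // e.g. 'http://127.0.0.1:5001/v1'
  token: string,
  fetchImpl: FetchLike = fetch // injectable so the function can be tested offline
): Promise<string[]> {
  const res = await fetchImpl(`${baseUrl}/models`, {
    headers: { Authorization: `Bearer ${token}` }
  })
  if (!res.ok) throw new Error(`GET /v1/models failed: HTTP ${res.status}`)
  const body = (await res.json()) as ModelList
  return body.data.map((m) => m.id)
}

// Usage
// const ids = await listModelIds('http://127.0.0.1:5001/v1', 'YOUR_DEEPSEEK_TOKEN')
```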
POST /v1/chat/completions
Create a chat completion.
Request example:
```bash
curl http://127.0.0.1:5001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_DEEPSEEK_TOKEN" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
Authentication
Pass your DeepSeek bearer token directly in the `Authorization` header.
You can obtain the token from the DeepSeek web app (chat.deepseek.com):
- Open browser DevTools (F12)
- Go to Application → Local Storage → https://chat.deepseek.com
- Copy the value of the `userToken` key
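Putting the endpoint and the token together, the `curl` request from the chat completions section could be issued from TypeScript as a non-streaming call. `askOnce` is a hypothetical helper, and the response is assumed to follow OpenAI's `chat.completion` shape; the `fetchImpl` parameter exists only so the function can be exercised without a running server.

```typescript
// Hypothetical helper: one non-streaming chat completion through the proxy.

type FetchLike = (url: string, init?: RequestInit) => Promise<Response>

interface ChatCompletion {
  choices: Array<{ message: { content: string | null }; finish_reason: string }>
}

async function askOnce(
  baseUrl: string, // e.g. 'http://127.0.0.1:5001/v1'
  token: string, // the userToken value copied from chat.deepseek.com
  prompt: string,
  model = 'deepseek-v4-flash',
  fetchImpl: FetchLike = fetch
): Promise<string> {
  const res = await fetchImpl(`${baseUrl}/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}` // passed through to DeepSeek as the Bearer token
    },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: prompt }],
      stream: false
    })
  })
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`)
  const body = (await res.json()) as ChatCompletion
  return body.choices[0]?.message.content ?? ''
}

// Usage
// console.log(await askOnce('http://127.0.0.1:5001/v1', 'YOUR_DEEPSEEK_TOKEN', 'Hello!'))
```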
Project Structure
```
src/
├── index.ts                # Main entry - HTTP server
├── types.ts                # Type definitions
├── constants.ts            # Constants and env config
├── logger.ts               # Unified logger (DEBUG_LOG_OUTPUT control)
├── utils.ts                # Utility functions (incl. genToolCallId)
├── pow.ts                  # PoW WASM computation
├── account.ts              # Token extraction from request
├── stream-parser.ts        # DeepSeek stream response parser
├── tool-call-parser.ts     # Text-stream → tool_calls state machine
├── deepseek.ts             # DeepSeek API calls + messagesPrepareWithTools
├── conversation-store.ts   # Server-side conversation state management
└── routes.ts               # Route handlers
test/
├── messagesPrepareWithTools.test.ts
└── tool-call-parser.test.ts
public/
└── sha3_wasm_bg.*.wasm     # PoW computation WASM file
```
Tools / Function Calling
Requires `ENABLE_TOOLS=true` (off by default).
This proxy implements an OpenAI tools/function-calling protocol adapter: it translates tools definitions into a text prompt for DeepSeek, then parses the model's plain-text output back into structured tool_calls. No tools are executed server-side — the client remains in control of the ReAct loop, exactly like the standard OpenAI protocol.
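The adapter described above has two halves: render `tools` into prompt text, and recover a structured call from the reply. The sketch below illustrates that round trip under a made-up convention in which the model wraps a JSON object in `<tool_call>...</tool_call>` markers. The marker format, prompt wording, `call_` ID prefix, and function names are all hypothetical; the proxy's real prompt (built in `src/deepseek.ts`) and its streaming parser (`src/tool-call-parser.ts`) are not reproduced here.

```typescript
// Illustrative only: the <tool_call> marker format and prompt text are hypothetical.

interface ToolDef {
  type: 'function'
  function: { name: string; description?: string; parameters?: Record<string, unknown> }
}

interface ToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string } // arguments is a JSON string, as in OpenAI's format
}

// Step 1: tools[] -> plain-text instructions for the model.
function renderToolsPrompt(tools: ToolDef[]): string {
  const lines = tools.map(
    (t) => `- ${t.function.name}: ${t.function.description ?? ''}\n  parameters: ${JSON.stringify(t.function.parameters ?? {})}`
  )
  return [
    'You can call these tools:',
    ...lines,
    'To call one, reply with exactly: <tool_call>{"name": "...", "arguments": {...}}</tool_call>'
  ].join('\n')
}

// Step 2: model text -> at most one structured call (serial, one per turn),
// or null if the model answered in prose or named an unknown tool.
function extractToolCall(text: string, knownNames: Set<string>): ToolCall | null {
  const match = /<tool_call>([\s\S]*?)<\/tool_call>/.exec(text)
  if (!match) return null
  let parsed: unknown
  try {
    parsed = JSON.parse(match[1])
  } catch {
    return null // malformed JSON: treat the reply as plain text
  }
  if (typeof parsed !== 'object' || parsed === null) return null
  const { name, arguments: args } = parsed as { name?: unknown; arguments?: unknown }
  if (typeof name !== 'string' || !knownNames.has(name)) return null
  return {
    id: `call_${Math.random().toString(36).slice(2, 10)}`, // hypothetical ID scheme
    type: 'function',
    function: { name, arguments: JSON.stringify(args ?? {}) }
  }
}
```

Rejecting unknown tool names keeps a hallucinated call from reaching the client; whether the proxy does the same is not stated.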
Supported features
| Feature | Status |
|---|---|
| tools[] / tool_choice (new protocol) | ✅ |
| functions[] / function_call (legacy protocol) | ✅ (auto-converted) |
| Streaming tool_calls delta | ✅ |
| Non-streaming message.tool_calls | ✅ |
| Multi-turn (client sends role: "tool" result) | ✅ |
| Serial single tool call per turn | ✅ (v1) |
| Parallel tool calls | ❌ (v2 backlog) |
| response_format: json_schema | ❌ (out of scope) |
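For the legacy protocol, the mapping itself is defined by OpenAI's API: each entry of `functions[]` becomes a `tools[]` entry of `type: 'function'`, and `function_call` maps onto `tool_choice`. The sketch below shows that mapping for the request fields only; `convertLegacy` is a hypothetical name, and how the proxy handles legacy `role: "function"` messages is not covered.

```typescript
// Sketch of OpenAI's legacy -> current request-field mapping.

interface FunctionDef {
  name: string
  description?: string
  parameters?: Record<string, unknown>
}

type LegacyFunctionCall = 'none' | 'auto' | { name: string }
type ToolChoice = 'none' | 'auto' | 'required' | { type: 'function'; function: { name: string } }

interface LegacyRequest {
  functions?: FunctionDef[]
  function_call?: LegacyFunctionCall
}

interface ToolsRequest {
  tools?: Array<{ type: 'function'; function: FunctionDef }>
  tool_choice?: ToolChoice
}

function convertLegacy(req: LegacyRequest): ToolsRequest {
  const out: ToolsRequest = {}
  if (req.functions) {
    // Each function definition is wrapped unchanged in a tool entry.
    out.tools = req.functions.map((fn) => ({ type: 'function' as const, function: fn }))
  }
  const fc = req.function_call
  if (fc !== undefined) {
    // "none"/"auto" carry over as-is; { name } forces that specific function.
    out.tool_choice = typeof fc === 'string' ? fc : { type: 'function', function: { name: fc.name } }
  }
  return out
}
```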
Known limitations
- Thinking models (`-thinking` variants): the model's `THINK` segment is not parsed for tool calls; only the `RESPONSE` segment is. Because tool calls can only surface after the thinking phase ends, the first tool-call response may be significantly delayed.
- Search models (`-search` variants): the model may answer the query via web search instead of calling the tool. A warning is logged. Prefer non-search models for tool-heavy workloads.
- `tool_choice: "required"`: if the model ignores the instruction, the proxy returns the model's text response with `finish_reason: "stop"` and logs a warning (a single retry is not yet implemented in v1).
- Tool result size: tool results larger than 32 KB are rejected with HTTP 400.
- Session reuse: requests containing `tools` always start a fresh DeepSeek session (no conversation reuse) to avoid fingerprint mismatches.
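Since an oversized tool result fails the whole request, a client may want to check before sending. This hypothetical pre-flight helper reports the `tool_call_id` of every `role: "tool"` message over the limit. It assumes "32 KB" means 32 * 1024 bytes of UTF-8-encoded `content` and only handles string content; the proxy's exact measurement may differ.

```typescript
// Hypothetical pre-flight check for the documented 32 KB tool-result limit.
// Assumption: the limit is 32 * 1024 bytes of UTF-8, measured on string content.

const MAX_TOOL_RESULT_BYTES = 32 * 1024

interface ChatMessage {
  role: string
  content?: string | null
  tool_call_id?: string
}

function oversizedToolResults(messages: ChatMessage[]): string[] {
  const encoder = new TextEncoder()
  return messages
    .filter((m) => m.role === 'tool' && typeof m.content === 'string')
    // Byte length, not character count: CJK characters take 3 bytes each in UTF-8.
    .filter((m) => encoder.encode(m.content as string).length > MAX_TOOL_RESULT_BYTES)
    .map((m) => m.tool_call_id ?? '(missing tool_call_id)')
}

// Usage: truncate or summarize the flagged results before calling the proxy.
```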
OpenAI SDK usage example
```typescript
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'http://127.0.0.1:5001/v1',
  apiKey: 'YOUR_DEEPSEEK_TOKEN' // passed as Bearer token to DeepSeek
})

const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a city',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string', description: 'City name' } },
        required: ['city']
      }
    }
  }
]

// --- First turn: model requests a tool call ---
const response = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  messages: [{ role: 'user', content: "What's the weather in Beijing?" }],
  tools,
  tool_choice: 'auto',
  stream: false
})

const choice = response.choices[0]
if (choice.finish_reason === 'tool_calls' && choice.message.tool_calls) {
  const tc = choice.message.tool_calls[0]
  const args = JSON.parse(tc.function.arguments) as { city: string }

  // Execute the tool locally (stubbed result for this example)
  const weatherResult = { city: args.city, temp: 22, condition: 'sunny' }

  // --- Second turn: send tool result back ---
  const final = await client.chat.completions.create({
    model: 'deepseek-v4-flash',
    messages: [
      { role: 'user', content: "What's the weather in Beijing?" },
      choice.message, // assistant message with tool_calls
      {
        role: 'tool',
        tool_call_id: tc.id,
        content: JSON.stringify(weatherResult)
      }
    ],
    tools,
    tool_choice: 'auto',
    stream: false
  })

  console.log(final.choices[0].message.content)
  // e.g. "The weather in Beijing is sunny, 22°C."
}
```
Observability
The proxy exposes the number of internal tool-related retries via:
- Response header `x-deepseek-proxy-tool-retries` (non-streaming responses).
- SSE comment `: x-deepseek-proxy-tool-retries: N` just before `[DONE]` (streaming responses).
A value of 0 means the tool call succeeded on the first attempt.
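On the client side, a streamed body can be read with standard OpenAI chunk handling plus one extra rule for the retries comment. The sketch below assumes OpenAI-style chunks (`choices[0].delta` carrying `content`, `reasoning_content`, or `tool_calls` fragments keyed by `index`); `readSseBody` is a hypothetical helper. For brevity it parses an already-buffered body, whereas a real client would process complete lines as chunks arrive.

```typescript
// Hypothetical reader for the proxy's SSE stream (OpenAI-style chunks assumed).

interface ToolCallDelta {
  index: number
  id?: string
  function?: { name?: string; arguments?: string }
}

interface StreamResult {
  content: string
  reasoning: string
  toolCalls: Array<{ id: string; name: string; arguments: string }>
  toolRetries: number | null // null if the retries comment was not present
}

function readSseBody(body: string): StreamResult {
  const result: StreamResult = { content: '', reasoning: '', toolCalls: [], toolRetries: null }
  for (const rawLine of body.split('\n')) {
    const line = rawLine.trimEnd() // tolerate \r\n line endings
    if (line.startsWith(':')) {
      // SSE comment line, e.g. ": x-deepseek-proxy-tool-retries: 0"
      const m = /^:\s*x-deepseek-proxy-tool-retries:\s*(\d+)$/.exec(line)
      if (m) result.toolRetries = Number(m[1])
      continue
    }
    if (!line.startsWith('data:')) continue // blank separators, other fields
    const payload = line.slice(5).trim()
    if (payload === '[DONE]') break
    const delta = JSON.parse(payload).choices?.[0]?.delta ?? {}
    if (typeof delta.content === 'string') result.content += delta.content
    if (typeof delta.reasoning_content === 'string') result.reasoning += delta.reasoning_content
    for (const tc of (delta.tool_calls ?? []) as ToolCallDelta[]) {
      // Arguments arrive as JSON fragments; concatenate them per call index.
      const slot = (result.toolCalls[tc.index] ??= { id: '', name: '', arguments: '' })
      if (tc.id) slot.id = tc.id
      if (tc.function?.name) slot.name += tc.function.name
      if (tc.function?.arguments) slot.arguments += tc.function.arguments
    }
  }
  return result
}
```

For non-streaming responses, the same count can be read from a `fetch` `Response` with `res.headers.get('x-deepseek-proxy-tool-retries')`.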
