qwen-3-coder-proxy
v1.0.1
Published
A proxy server for routing between Cerebras and Chutes for Qwen-3-Coder
Maintainers
Readme
Qwen-3-Coder Proxy Server
A proxy server that routes between Cerebras and Chutes for the Qwen-3-Coder model. This proxy provides an OpenAI-compatible API interface while automatically handling provider selection based on rate limits and availability.
Features
- OpenAI-compatible API endpoints
- Automatic routing between Cerebras (preferred) and Chutes (fallback)
- Rate limit monitoring and management
- Automatic fallback when Cerebras rate limits are approached
- Model name mapping between providers
- Function calling support
- Comprehensive logging and error handling
Prerequisites
- Node.js (v14 or higher)
- npm or yarn
- Cerebras API key
- Chutes API key
Installation
Clone the repository:
git clone <repository-url> cd qwen-3-coder-proxyInstall dependencies:
npm installConfigure environment variables:
cp .env.example .env # Edit .env file with your API keys and configuration
Configuration
The proxy can be configured using environment variables. Create a .env file based on the .env.example template:
# Server configuration
PORT=3000
LOG_LEVEL=INFO
# API keys (required)
CEREBRAS_API_KEY=your_cerebras_api_key
CHUTES_API_KEY=your_chutes_api_key
# Model names (optional, defaults provided)
CEREBRAS_MODEL_NAME=qwen-3-coder-480b
CHUTES_MODEL_NAME=Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
PROXY_MODEL_NAME=qwen-3-coder
# Rate limits (optional, defaults provided)
CEREBRAS_REQUESTS_PER_MINUTE=12
CEREBRAS_TOKENS_PER_MINUTE=132000
CEREBRAS_TOKENS_PER_DAY=19200000
COOLDOWN_PERIOD=300000
REQUEST_TIMEOUT=30000Usage
Start the server:
npm startFor development with auto-reload:
npm run devThe server will start on the configured port (default: 3000)
OpenAI-Compatible Base URL
To use this proxy with OpenAI-compatible tools (like roocode), set the base URL to:
http://localhost:3000/v1For example, with roocode or other tools that support custom OpenAI endpoints:
- Base URL:
http://localhost:3000/v1 - API Key: Any non-empty string (the proxy doesn't validate the API key)
- Model:
qwen-3-coder
Example Configuration for roocode
In your roocode configuration, set:
{
"baseUrl": "http://localhost:3000/v1",
"model": "qwen-3-coder"
}Note: The API key can be any value as the proxy doesn't validate it, but it must be provided as some tools require it.
API Endpoints
The proxy provides OpenAI-compatible endpoints:
GET /v1/models
List available models.
curl http://localhost:3000/v1/modelsResponse:
{
"object": "list",
"data": [
{
"id": "qwen-3-coder",
"object": "model",
"created": 1723670000,
"owned_by": "qwen-3-coder-proxy"
}
]
}POST /v1/chat/completions
Create a chat completion. The proxy will automatically route the request to either Cerebras or Chutes based on availability and rate limits.
curl http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-3-coder",
"messages": [
{
"role": "user",
"content": "Tell me a 250 word story."
}
],
"stream": false,
"max_tokens": 1024,
"temperature": 0.7
}'Function Calling
The proxy supports function calling with both the tools parameter (newer format) and functions parameter (legacy format). These parameters are forwarded as-is to the providers:
curl http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-3-coder",
"messages": [
{
"role": "user",
"content": "What is the weather like in New York?"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto"
}'How It Works
- Provider Selection: The proxy prefers Cerebras as the primary provider.
- Rate Limit Monitoring: It monitors usage to ensure Cerebras rate limits are not exceeded:
- Requests per minute: 12 (80% of 15)
- Tokens per minute: 132,000 (80% of 165,000)
- Tokens per day: 19,200,000 (80% of 24,000,000)
- Automatic Fallback: When Cerebras rate limits are approached or exceeded, the proxy automatically switches to Chutes.
- Cooldown Period: After hitting a rate limit, the proxy waits for a cooldown period (default: 5 minutes) before trying Cerebras again.
- Model Mapping: The proxy handles model name mapping between the providers.
- Function Calling: The proxy forwards function calling parameters (
toolsandfunctions) to the providers without modification.
Rate Limit Handling
The proxy implements the following rate limit handling strategies:
- Proactive Monitoring: Tracks requests and tokens to prevent hitting limits
- Reactive Handling: Responds to 429 (rate limit) errors from providers
- Cooldown Period: Implements a cooldown period after rate limit errors
- Automatic Switching: Automatically switches between providers based on availability
Function Calling Support
The proxy supports function calling for both Cerebras and Chutes:
- Tools Parameter: The newer format using the
toolsparameter is supported - Functions Parameter: The legacy format using the
functionsparameter is supported - Forwarding: Function calling parameters are forwarded as-is to the providers
- Response Handling: Responses with function calling data are returned unchanged
Note: The exact behavior of function calling may vary between providers. The proxy forwards requests and responses without modification, so any provider-specific differences in function calling behavior will be preserved.
Logging
The proxy provides comprehensive logging for monitoring and debugging:
- Request/response logging
- Provider selection decisions
- Rate limit status
- Function calling parameters and responses
- Error conditions
Log level can be configured using the LOG_LEVEL environment variable (ERROR, WARN, INFO, DEBUG).
Testing
To test the proxy:
- Ensure you have valid API keys in your
.envfile - Start the server:
npm start - Test the models endpoint:
curl http://localhost:3000/v1/models - Test the chat completions endpoint with the example curl command above
License
MIT
