mobygate
v0.9.2
OpenAI-compatible local proxy for Claude Max. The Möbius-strip gateway: OpenAI shape in, Claude Max out.
OpenAI-compatible local proxy for Claude Max. The Möbius-strip gateway: OpenAI shape in, Claude Max out, on a single continuous loop.
Point any OpenAI-shaped client (Hermes, OpenClaw, custom tools, SDKs) at http://localhost:3456 and you get Claude Max inference out the other side — without hitting the paid Anthropic API.
- ✓ Real streaming (SSE)
- ✓ Multimodal (image URLs + base64 data URLs)
- ✓ OpenAI-style function calling (`tools`, `tool_choice`-compatible `tool_calls` response)
- ✓ Opus 4.7 with native 1M context variant
- ✓ Session resume (map a client key → SDK session ID)
- ✓ OAuth auto-refresh (no more 8-hour-cliff 401 storms)
- ✓ Live web dashboard with per-request tracing
- ✓ Cross-platform service install (macOS / Linux / Windows — one command)
Current release: v0.2.0 · see CHANGELOG.md for history.
Why
The older claude-max-api-proxy spawned a new Claude Code CLI subprocess for every request — ~500 ms overhead per call, Windows stdin pipe hacks, and patches that got nuked on every npm update. mobygate uses the Claude Agent SDK directly: no subprocess spawning, no patches, no maintenance. Same subscription, real streaming, multimodal, tool calling.
Quick start
```bash
npm install -g mobygate
mobygate init   # interactive setup: config + service install + smoke test
```

Or from source (for hacking on mobygate itself):
```bash
git clone https://github.com/khnfrhn/mobygate.git
cd mobygate
npm install
npm link        # makes the `mobygate` command available globally
mobygate init
```

That single `init` does the full cross-platform install:
| Step | Mac | Linux | Windows |
|---|---|---|---|
| Verify Node ≥ 18, claude CLI on PATH, claude auth login done | ✓ | ✓ | ✓ |
| Write config to ~/.mobygate/config.yaml | ✓ | ✓ | ✓ |
| Install long-running server as user-level service | launchd (ai.mobygate.server) | systemd user unit (mobygate-server.service) | Task Scheduler (mobygate-server) |
| Install 4-hour auth-refresh cron | launchd plist | systemd .timer | Task Scheduler (4h repetition) |
| Redirect stdout/stderr to logs/server.log | ✓ | ✓ | ✓ (via .cmd launcher) |
| Auto-restart on crash | KeepAlive | Restart=on-failure | Task Scheduler RestartCount=3 |
| Smoke-test /health | ✓ | ✓ | ✓ |
No sudo required on any platform. No nssm on Windows. If the auto-install fails for any reason, mobygate init falls back to printing the exact commands to run manually.
Once installed, the service survives reboots and the daily driver commands are:
```bash
mobygate status      # service state, auth state, /health probe
mobygate logs        # tail logs/server.log
mobygate auth        # check + force-refresh OAuth token
mobygate start       # start service (if stopped)
mobygate stop        # stop service
mobygate restart     # stop + start
mobygate uninstall   # remove services (leaves the repo in place)
mobygate version
```

Open http://localhost:3456/ in your browser for the live dashboard (see below).
Linux headless tip: user systemd units stop when you log out. For a mobygate that stays up on a server, run this once:

```bash
sudo loginctl enable-linger $USER
```

After that it runs whether you're logged in or not.
After `git pull`, always re-run `npm install`: new commits can bump the SDK or add packages. If you skip this, the server exits at startup with a boxed "Missing package" error pointing at `npm install` (or `npm run up`, which installs dependencies and starts the server in one step).
Dashboard
Open http://localhost:3456/ after install for a live, zero-config dashboard:
- Header: whale ASCII · `mobygate vX.Y.Z` · "healthy · live" pill that turns red on disconnect · `clear log` / `force refresh auth` buttons.
- KPI strip: Uptime (live-ticking clock), Requests (total + stream/tool/image breakdown), Success rate (with 7-segment progress bar), Avg latency (p50 headline + p95 secondary + 14-bar color-thresholded sparkline).
- Server / Auth / Traffic row: default model, active sessions, context window, build (`v0.2.0 · darwin-arm64`); email, plan, auth method, last probe, refresh count; 15-minute rolling req/min column chart.
- Live requests: the table auto-updates as requests come in. Chips for `stream` / `tool` / `img` / `sync`. Inline latency bar (green < 3 s, blue < 15 s, orange > 15 s). Rounded status pill. Click any row to open a modal with the full start + end event JSON. Filter buttons: `ALL` / `ERRORS` / `SLOW > 15 s`.
- Sessions panel: active session-key map, per-row `expire`, `expire all`. Live-refreshes when a session is created, updated, or expires.
- Server log tail: last 200 lines of `logs/server.log`. Auto-refreshes every 2.5 s, with smart auto-scroll that doesn't yank you back if you've scrolled up to read.
- Footer: clickable endpoint pills and a terminal-style `stream · connected | mobygate · tty0 · 0.2.0` status line.
Design ported from the Paper artboard (01KPFE5G6MJGMT5E5MGA94DQRF/C1-0) via get_jsx — exact colors, typography, and ASCII.
Run (without the CLI)
If you just want a foreground process without installing services:
```bash
node server.js   # normal start
npm run dev      # auto-reload on changes
npm run up       # install deps + start in one command (use after git pull)
```

The server starts on port 3456 (same as the old proxy).
How It Works
```
Discord / Hermes / OpenClaw → POST localhost:3456/v1/chat/completions → Agent SDK query() → Claude Max
```

- Receives OpenAI-format chat completion requests
- Converts the `messages[]` array to a single prompt string
- Calls `query()` from `@anthropic-ai/claude-agent-sdk`
- Streams responses back as SSE (Server-Sent Events) in OpenAI format
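The flattening step is not spelled out above. A minimal sketch of one way to do it, assuming a simple role-labelled transcript: the `messagesToPrompt` name, the `User:`/`Assistant:` labels, and pulling `system` messages out separately are illustrative assumptions, not mobygate's actual format.

```javascript
// Hypothetical sketch: flatten OpenAI chat messages into one prompt string.
// The "User:"/"Assistant:" labels are illustrative, not mobygate's real format.
function contentToText(content) {
  // OpenAI content is either a plain string or an array of typed parts.
  if (typeof content === "string") return content;
  if (Array.isArray(content)) {
    return content
      .filter((part) => part.type === "text") // images are skipped in this text-only sketch
      .map((part) => part.text)
      .join("\n");
  }
  return "";
}

function messagesToPrompt(messages) {
  const system = [];
  const turns = [];
  for (const msg of messages) {
    const text = contentToText(msg.content);
    if (msg.role === "system" || msg.role === "developer") {
      // Kept apart so it can be passed as a system prompt rather than inlined.
      system.push(text);
    } else {
      // Every non-assistant role (user, tool) is labelled "User" here.
      const label = msg.role === "assistant" ? "Assistant" : "User";
      turns.push(`${label}: ${text}`);
    }
  }
  return { systemPrompt: system.join("\n\n"), prompt: turns.join("\n\n") };
}
```

The resulting `prompt` string is the kind of value that would be handed to `query()`; a real implementation also has to decide how to represent tool results and images, which this sketch ignores.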
Endpoints
OpenAI-compatible:
| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/chat/completions | Chat completions (streaming + non-streaming) |
| GET | /v1/models | List available models with context lengths |
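For streaming, OpenAI's SSE framing sends one `data: {...}` line per `chat.completion.chunk`, with the next piece of text in `choices[0].delta.content`, and ends with `data: [DONE]`. A minimal client-side sketch, assuming the proxy mirrors that framing, Node 18+ (global `fetch`), and the default port; `streamChat` and `extractDeltas` are hypothetical helper names.

```javascript
// Sketch of an OpenAI-style SSE consumer for POST /v1/chat/completions.
// Assumes OpenAI framing: "data: {...}" lines terminated by "data: [DONE]".

// Extract text deltas from buffered SSE text. Returns the deltas plus any
// trailing partial line, which must be prepended to the next read.
function extractDeltas(buffer) {
  const lines = buffer.split("\n");
  const rest = lines.pop(); // the last element may be an incomplete line
  const deltas = [];
  let done = false;
  for (const raw of lines) {
    const line = raw.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines, comments, event: lines
    const payload = line.slice(5).trim();
    if (payload === "[DONE]") {
      done = true;
      continue;
    }
    const chunk = JSON.parse(payload);
    const text = chunk.choices?.[0]?.delta?.content;
    if (text) deltas.push(text);
  }
  return { deltas, rest, done };
}

async function streamChat(prompt, baseUrl = "http://localhost:3456/v1") {
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer claude-max", // any non-empty key works
    },
    body: JSON.stringify({
      model: "claude-opus-4-7",
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const decoder = new TextDecoder();
  let buffer = "";
  let out = "";
  for await (const bytes of res.body) {
    buffer += decoder.decode(bytes, { stream: true });
    const { deltas, rest, done } = extractDeltas(buffer);
    buffer = rest;
    out += deltas.join("");
    process.stdout.write(deltas.join("")); // echo tokens as they arrive
    if (done) break;
  }
  return out;
}
```

With the server running, `streamChat("Say hi").then(console.log)` prints tokens as they stream and then the full reply. Keeping the parser separate from the network code makes the buffering logic easy to test offline.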
Operations:
| Method | Path | Description |
|--------|------|-------------|
| GET | /health | Liveness + active session count |
| GET | /auth/status | OAuth state (add ?quick=1 to skip the live probe) |
| POST | /auth/refresh | Force an OAuth refresh probe (cron hook) |
Session management:
| Method | Path | Description |
|--------|------|-------------|
| GET | /sessions | List all active sessions |
| GET | /sessions/:key | Inspect one session |
| DELETE | /sessions/:key | Expire a single session |
| DELETE | /sessions | Expire all sessions |
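Sessions pair a client key with an Agent SDK session ID and lapse after `SESSION_TTL_MINUTES` of idleness (60 by default). A minimal in-memory sketch of that idea. The `SessionStore` class, its method names, and evict-on-read expiry are hypothetical. How a client actually supplies its session key is not covered here.

```javascript
// Hypothetical in-memory session map: client key -> SDK session ID, with idle TTL.
class SessionStore {
  constructor(ttlMinutes = 60, now = () => Date.now()) {
    this.ttlMs = ttlMinutes * 60_000;
    this.now = now; // injectable clock, handy for tests
    this.sessions = new Map();
  }

  // Record (or refresh) the SDK session ID for a client key.
  set(key, sessionId) {
    this.sessions.set(key, { sessionId, lastUsed: this.now() });
  }

  // Look up a live session; expired entries are evicted on read.
  get(key) {
    const entry = this.sessions.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.lastUsed > this.ttlMs) {
      this.sessions.delete(key);
      return undefined;
    }
    entry.lastUsed = this.now(); // using a session resets its idle timer
    return entry.sessionId;
  }

  // Analogous to DELETE /sessions/:key and DELETE /sessions.
  expire(key) {
    return this.sessions.delete(key);
  }
  expireAll() {
    this.sessions.clear();
  }

  // Idle and TTL-remaining times, similar to what /dashboard/sessions reports.
  describe() {
    const t = this.now();
    return [...this.sessions].map(([key, e]) => ({
      key,
      sessionId: e.sessionId,
      idleMs: t - e.lastUsed,
      ttlRemainingMs: Math.max(0, this.ttlMs - (t - e.lastUsed)),
    }));
  }
}
```

Evicting lazily on read keeps the sketch timer-free; a long-running server would also want a periodic sweep so that idle entries disappear from the dashboard without being touched.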
Dashboard feed:
| Method | Path | Description |
|--------|------|-------------|
| GET | / | The live dashboard (HTML) |
| GET | /events | SSE stream of all dashboard events (request.start, request.end, auth.refresh, session.*, server.boot) with 15 s heartbeat |
| GET | /dashboard/recent?limit=N | Ring-buffer snapshot + stats + build meta for initial page load |
| GET | /dashboard/sessions | Per-session detail with idle + TTL-remaining times |
| GET | /dashboard/logs?lines=N | Last N lines of logs/server.log |
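`/dashboard/recent` serves a ring-buffer snapshot plus stats, and the KPI strip headlines p50 latency with p95 as a secondary figure. A minimal sketch of a fixed-size ring buffer with nearest-rank percentiles. The `RequestRing` name, the capacity, the entry shape (`latencyMs`, `ok`), and the nearest-rank method are assumptions, not necessarily what mobygate computes.

```javascript
// Hypothetical fixed-capacity ring buffer of finished requests.
class RequestRing {
  constructor(capacity = 200) {
    this.capacity = capacity;
    this.items = new Array(capacity);
    this.next = 0; // slot the next push will overwrite
    this.size = 0;
  }

  push(entry) {
    this.items[this.next] = entry;
    this.next = (this.next + 1) % this.capacity;
    this.size = Math.min(this.size + 1, this.capacity);
  }

  // Newest-first snapshot, optionally limited (like ?limit=N).
  recent(limit = this.size) {
    const out = [];
    for (let i = 1; i <= Math.min(limit, this.size); i++) {
      out.push(this.items[(this.next - i + this.capacity) % this.capacity]);
    }
    return out;
  }

  // Nearest-rank percentile over the latencies currently in the buffer.
  percentile(p) {
    const xs = this.recent().map((e) => e.latencyMs).sort((a, b) => a - b);
    if (xs.length === 0) return null;
    const rank = Math.ceil((p / 100) * xs.length);
    return xs[Math.max(0, rank - 1)];
  }

  stats() {
    const all = this.recent();
    const ok = all.filter((e) => e.ok).length;
    return {
      total: all.length,
      successRate: all.length ? ok / all.length : null,
      p50: this.percentile(50),
      p95: this.percentile(95),
    };
  }
}
```

Once the buffer is full, old entries are overwritten, so the stats describe only the most recent `capacity` requests rather than the whole uptime.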
Model Mapping
| Input | Resolves To |
|-------|------------|
| claude-opus-4, claude-opus-4-7, opus | claude-opus-4-7[1m] (1M context) |
| claude-opus-4-7-200k | claude-opus-4-7 (standard 200k) |
| claude-opus-4-6 | claude-opus-4-6 |
| claude-sonnet-4, claude-sonnet-4-5, claude-sonnet-4-6, sonnet | claude-sonnet-4-5-20250929 |
| claude-haiku-4, claude-haiku-4-5, haiku | claude-haiku-4-5-20251001 |
Provider prefixes are stripped automatically (e.g., claude-max-proxy/claude-opus-4-7 → claude-opus-4-7).
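The mapping table and prefix stripping can be expressed as a lookup. A minimal sketch: `resolveModel` and `MODEL_MAP` are hypothetical names, the prefix is taken to be everything up to the last `/`, and passing unknown IDs through unchanged is an assumption the table does not cover.

```javascript
// Sketch of the model-mapping table above.
const MODEL_MAP = {
  "claude-opus-4": "claude-opus-4-7[1m]",
  "claude-opus-4-7": "claude-opus-4-7[1m]",
  opus: "claude-opus-4-7[1m]",
  "claude-opus-4-7-200k": "claude-opus-4-7",
  "claude-opus-4-6": "claude-opus-4-6",
  "claude-sonnet-4": "claude-sonnet-4-5-20250929",
  "claude-sonnet-4-5": "claude-sonnet-4-5-20250929",
  "claude-sonnet-4-6": "claude-sonnet-4-5-20250929",
  sonnet: "claude-sonnet-4-5-20250929",
  "claude-haiku-4": "claude-haiku-4-5-20251001",
  "claude-haiku-4-5": "claude-haiku-4-5-20251001",
  haiku: "claude-haiku-4-5-20251001",
};

function resolveModel(requested, defaultModel = "claude-opus-4-7[1m]") {
  // No model in the request: fall back to DEFAULT_MODEL.
  if (!requested) return defaultModel;
  // Strip a provider prefix such as "claude-max-proxy/".
  const bare = requested.slice(requested.lastIndexOf("/") + 1);
  // Unknown IDs pass through unchanged (assumption).
  return MODEL_MAP[bare] ?? bare;
}
```

For example, `resolveModel("claude-max-proxy/claude-opus-4-7")` strips the prefix and then maps the bare ID to the 1M-context variant.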
Client Configuration
OpenClaw (~/.openclaw/openclaw.json)
Add under `models.providers`:

```json
"claude-max-proxy": {
  "baseUrl": "http://localhost:3456/v1",
  "apiKey": "claude-max",
  "api": "openai-completions",
  "models": [
    { "id": "claude-opus-4-7", "contextWindow": 1000000, "maxTokens": 16384 },
    { "id": "claude-opus-4-6", "contextWindow": 200000, "maxTokens": 16384 },
    { "id": "claude-sonnet-4-6", "contextWindow": 200000, "maxTokens": 16384 },
    { "id": "claude-haiku-4-5", "contextWindow": 200000, "maxTokens": 16384 }
  ]
}
```

Set as default in `agents.defaults.model`:

```json
"primary": "claude-max-proxy/claude-opus-4-7"
```

Hermes Agent (~/.hermes/config.yaml)
```yaml
model:
  default: claude-opus-4-7
  provider: custom            # MUST be "custom", not "openai" or "custom:name"
  api_key: claude-max
  base_url: http://127.0.0.1:3456/v1
  context_length: 1000000     # explicit override, ensures 1M context
providers:
  claude-max-proxy:
    api: http://127.0.0.1:3456/v1
    name: Claude Max Proxy
    api_key: claude-max
    default_model: claude-opus-4-7
```

Also add to the `credential_pool` in `~/.hermes/auth.json`:

```json
"custom:claude-max-proxy": [{
  "id": "a1b2c3",
  "label": "Claude Max Proxy",
  "auth_type": "api_key",
  "priority": 0,
  "source": "config:Claude Max Proxy",
  "access_token": "claude-max",
  "base_url": "http://127.0.0.1:3456/v1",
  "request_count": 0
}]
```

Hermes provider caveat: the top-level `model.provider` must be `custom`. Hermes doesn't recognize `openai` as a provider, and `custom:name` only works in `delegation` blocks, not at the model level. The `custom` keyword tells Hermes to read `base_url` and `api_key` from the `model:` config. Aliases that also work: `ollama`, `lmstudio`, `vllm`, `llamacpp`.
Any OpenAI-compatible client
```yaml
base_url: http://localhost:3456/v1
api_key: claude-max   # any non-empty string works
model: claude-opus-4-7
```

Configuration
Precedence, highest wins: env var → ~/.mobygate/config.yaml → built-in default.
mobygate init writes a commented YAML file you can hand-edit. Env vars always override the file, so you can set one-off values (e.g. a different port per shell) without editing config.
| Variable | Config field | Default | Description |
|----------|-------------|---------|-------------|
| PORT | port | 3456 | Server port |
| DEFAULT_MODEL | default_model | claude-opus-4-7[1m] | Fallback model when none specified |
| SESSION_TTL_MINUTES | session_ttl_minutes | 60 | Idle timeout for session keys mapped to SDK sessions |
| AUTH_REFRESH_INTERVAL_HOURS | auth_refresh_interval_hours | 4 | How often the proactive refresh cron fires |
| CLAUDE_BIN | claude_bin | (empty → PATH lookup) | Absolute path to the claude binary if not on PATH |
| LOG_LEVEL | log_level | info | Reserved; currently informational only |
| MOBYGATE_HOME | — | ~/.mobygate | Directory for config + state files |
| MOBYGATE_NODE_BIN | — | process.execPath | Node binary baked into service definitions (launchd/systemd/Task Scheduler) |
| NO_COLOR | — | unset | Disable ANSI color in CLI banner output |
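The precedence rule (env var, then `config.yaml`, then built-in default) can be sketched as two override passes. A minimal sketch: `resolveConfig` is a hypothetical name, `fileConfig` is assumed to be the already-parsed YAML (for example from js-yaml's `load()`), empty env vars are treated as unset, and `MOBYGATE_HOME`, `MOBYGATE_NODE_BIN`, and `NO_COLOR` are left out.

```javascript
// Sketch of env var > config.yaml > built-in default precedence.
const DEFAULTS = {
  port: 3456,
  default_model: "claude-opus-4-7[1m]",
  session_ttl_minutes: 60,
  auth_refresh_interval_hours: 4,
  claude_bin: "", // empty means "look up claude on PATH"
  log_level: "info",
};

// Env var name -> [config field, coercion from the env string].
const ENV_FIELDS = {
  PORT: ["port", Number],
  DEFAULT_MODEL: ["default_model", String],
  SESSION_TTL_MINUTES: ["session_ttl_minutes", Number],
  AUTH_REFRESH_INTERVAL_HOURS: ["auth_refresh_interval_hours", Number],
  CLAUDE_BIN: ["claude_bin", String],
  LOG_LEVEL: ["log_level", String],
};

function resolveConfig(env = {}, fileConfig = {}) {
  const config = { ...DEFAULTS };
  // Middle layer: values from ~/.mobygate/config.yaml override defaults.
  for (const key of Object.keys(DEFAULTS)) {
    if (fileConfig[key] !== undefined && fileConfig[key] !== null) {
      config[key] = fileConfig[key];
    }
  }
  // Top layer: non-empty env vars override everything.
  // Note: Number() yields NaN for non-numeric input; no validation here.
  for (const [name, [field, coerce]] of Object.entries(ENV_FIELDS)) {
    if (env[name] !== undefined && env[name] !== "") config[field] = coerce(env[name]);
  }
  return config;
}
```

Called as `resolveConfig(process.env, parsedYaml)`, a one-off `PORT=4000 node server.js` wins over whatever port the YAML file sets, which matches the per-shell override described above.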
Diagnosing MCP Image Drops
If a client (e.g. Hermes) reports that an MCP tool returned an empty screenshot or image, use mcp-inspect.mjs to bypass the client and talk to the MCP server directly — this isolates whether the image is being dropped in the MCP server itself or in the client's normalization layer.
```bash
# stdio transport: spawn the MCP server as a subprocess
node mcp-inspect.mjs --cmd "<server-exe>" --args '["<arg1>"]' --list
node mcp-inspect.mjs --cmd "<server-exe>" --args '["<arg1>"]' \
  --tool get_screenshot --params '{"nodeId":"WL-0"}'

# HTTP (StreamableHTTP) transport, e.g. Paper running at localhost:29979/mcp
node mcp-inspect.mjs --url "http://127.0.0.1:29979/mcp" --list
node mcp-inspect.mjs --url "http://127.0.0.1:29979/mcp" \
  --tool get_screenshot --params '{"nodeId":"WL-0"}'

# Legacy SSE transport
node mcp-inspect.mjs --url "http://127.0.0.1:1234/sse" --transport sse --list
```

If the output shows a non-empty image content block with hundreds of KB of base64, the MCP server is fine and the client is stripping the image. If the image block is missing or empty, the MCP server itself is the culprit.
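MCP tool results carry images as `{ type: "image", data, mimeType }` entries in their `content` array, with `data` base64-encoded. If you save a tool result as JSON, the same check can be done programmatically. A minimal sketch: `summarizeImages` and `diagnose` are hypothetical helpers, not part of `mcp-inspect.mjs`.

```javascript
// Sketch: summarize the image blocks in an MCP tools/call result so you can
// see whether the server actually returned pixel data.
function summarizeImages(toolResult) {
  const blocks = Array.isArray(toolResult?.content) ? toolResult.content : [];
  return blocks
    .filter((b) => b.type === "image")
    .map((b) => ({
      mimeType: b.mimeType,
      base64Chars: (b.data || "").length,
      // Decoded byte count; 0 means the server sent an empty image.
      bytes: Buffer.from(b.data || "", "base64").length,
    }));
}

function diagnose(toolResult) {
  const images = summarizeImages(toolResult);
  if (images.length === 0) return "no image block: the MCP server is dropping it";
  if (images.every((i) => i.bytes === 0)) return "empty image block: the MCP server is dropping it";
  return "image present: look at the client's normalization layer";
}
```

A real screenshot should decode to many kilobytes; a handful of bytes usually means a placeholder rather than an actual image.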
Auth & Token Refresh
The proxy inherits Claude Max OAuth credentials from the local CLI keychain (macOS: Claude Code-credentials; Windows: Credential Manager; Linux: libsecret / GNOME Keyring). Access tokens last ~8 hours and are supposed to refresh silently, but in practice the SDK occasionally surfaces 401 Invalid authentication credentials — either as a thrown error, or as the literal text of a result message on long-uptime processes.
mobygate init installs both defenses automatically; you shouldn't need to touch any of this. Reference only:
1. Reactive retry on 401. Both streaming and non-streaming handlers wrap the SDK query in `runWithAuthRetry` (see `scripts/auth-helper.js`). Exception-form 401s AND result-text-form 401s (`Failed to authenticate. API Error: 401 ...`) trigger a shell-out to `claude -p`, which forces a token refresh via the still-valid refresh token; the query is then retried once. Every step is logged: `[auth] 401 on sync call — refreshing`, `[auth] refreshed in 1234 ms — retrying sync call`.
2. Proactive 4-hour cron. scripts/auth-refresh.js is cross-platform. mobygate init wires it up via launchd (macOS), systemd .timer (Linux), or Task Scheduler (Windows). Access tokens last ~8 hours, so a 4-hour cadence keeps us comfortably inside the valid window even if one run fails.
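The reactive retry in item 1 can be sketched as follows. This is a hypothetical reconstruction, not the code in `scripts/auth-helper.js`. `runQuery` and `refreshAuth` are caller-supplied functions; the real helper shells out to `claude -p` for the refresh. The match patterns come from the error strings quoted above.

```javascript
// Hypothetical sketch of a reactive 401 retry; NOT the code in scripts/auth-helper.js.
// Patterns come from the error strings quoted in this README.
const AUTH_FAILURE = /API Error: 401|Invalid authentication credentials|Failed to authenticate/;

function isAuthFailure(value) {
  if (value instanceof Error) {
    // `status` is only present if whatever threw attached an HTTP status.
    return value.status === 401 || AUTH_FAILURE.test(value.message);
  }
  return typeof value === "string" && AUTH_FAILURE.test(value);
}

// runQuery: () => Promise<string>; refreshAuth: () => Promise<void>.
async function runWithAuthRetry(runQuery, refreshAuth, label = "call", log = console.log) {
  let result;
  let needsRefresh = false;
  try {
    result = await runQuery();
    needsRefresh = isAuthFailure(result); // result-text-form 401
  } catch (err) {
    if (!isAuthFailure(err)) throw err; // other errors are not retried
    needsRefresh = true; // exception-form 401
  }
  if (!needsRefresh) return result;

  log(`[auth] 401 on ${label}, refreshing`);
  const started = Date.now();
  await refreshAuth(); // e.g. shell out to `claude -p` to force a token refresh
  log(`[auth] refreshed in ${Date.now() - started} ms, retrying ${label}`);
  return runQuery(); // retry exactly once; the second outcome goes to the caller
}
```

The sketch deliberately does not match a bare `401`, so an ordinary answer that mentions HTTP 401 does not trigger a refresh.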
CLI helpers:
```bash
mobygate auth              # show status + run a live probe
npm run auth:status        # same via npm script (prints JSON)
npm run auth:status:quick  # keychain-only, no live probe (instant)
npm run auth:refresh       # force a refresh probe, print JSON result
```

Escape hatch, full re-auth required: if `claude auth status --json` reports `loggedIn: true` but you're still getting 401s after `mobygate auth` successfully refreshes, the refresh token itself has been revoked. Run `claude auth login` to do a full OAuth re-auth, then `mobygate restart`. This is rare; it happens if you've signed out of Claude from another device.
Manual cron setup (only needed if you're not using `mobygate init`):

macOS (launchd):

```bash
cp launchd/ai.mobygate.auth-refresh.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/ai.mobygate.auth-refresh.plist
```

Linux (cron):

```
0 */4 * * * cd /path/to/mobygate && /usr/bin/node scripts/auth-refresh.js >> logs/auth-refresh.log 2>&1
```

Or use a systemd timer (`mobygate init` generates these by default). To do it by hand, create `~/.config/systemd/user/mobygate-auth.{service,timer}`: the service runs `/usr/bin/node /path/to/mobygate/scripts/auth-refresh.js`, and the timer sets `OnUnitActiveSec=4h` and `OnBootSec=1min`. Then run `systemctl --user enable --now mobygate-auth.timer`.
Windows (Task Scheduler):
```powershell
$A = New-ScheduledTaskAction -Execute "node.exe" `
  -Argument "scripts\auth-refresh.js" `
  -WorkingDirectory "C:\path\to\mobygate"
$T = New-ScheduledTaskTrigger -Once -At (Get-Date) `
  -RepetitionInterval (New-TimeSpan -Hours 4)
Register-ScheduledTask -TaskName "mobygate-auth-refresh" -Action $A -Trigger $T
```

Multimodal
OpenAI image_url content parts are translated to Anthropic image content blocks. Both base64 data URLs and remote https: URLs work:
```json
{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "data:image/png;base64,iVBORw0KGgo..." } }
  ]
}
```

When images are present in the request, the proxy switches from a plain-string prompt to an async-iterable `SDKUserMessage` with mixed-content blocks. Nothing else in the OpenAI shape changes. The dashboard shows an `img` chip on any request that carried images.
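The part-by-part translation can be sketched using the Anthropic Messages API block shapes: `base64` sources for data URLs and `url` sources for remote images. A minimal sketch: `toAnthropicBlock` and `toAnthropicContent` are hypothetical names. Whether mobygate passes remote URLs through or downloads and inlines them is not stated, so the `url` branch is an assumption.

```javascript
// Sketch: translate OpenAI content parts into Anthropic content blocks.
const DATA_URL = /^data:([^;,]+);base64,(.*)$/s;

function toAnthropicBlock(part) {
  if (part.type === "text") return { type: "text", text: part.text };
  if (part.type === "image_url") {
    // Some clients send image_url as a bare string instead of { url }.
    const url = typeof part.image_url === "string" ? part.image_url : part.image_url.url;
    const m = DATA_URL.exec(url);
    if (m) {
      // data:image/png;base64,... becomes an inline base64 image
      return { type: "image", source: { type: "base64", media_type: m[1], data: m[2] } };
    }
    if (url.startsWith("https:")) {
      // Remote image referenced by URL (assumed pass-through)
      return { type: "image", source: { type: "url", url } };
    }
    throw new Error(`unsupported image URL scheme: ${url.slice(0, 16)}`);
  }
  throw new Error(`unsupported content part type: ${part.type}`);
}

function toAnthropicContent(content) {
  return typeof content === "string"
    ? [{ type: "text", text: content }]
    : content.map(toAnthropicBlock);
}
```

The resulting block array is what would go into the `message.content` of the user message yielded to the SDK; plain-string content stays a single text block.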
Tool Calling
OpenAI-style function calling is supported via a prompt-embedded protocol. The Agent SDK's native MCP mechanism doesn't fit OpenAI's "emit call, client executes, send result back" flow: it pollutes session state on abort and gates tools behind ToolSearch.
How it works:
- Client sends `tools: [{type: "function", function: {...}}]` in the OpenAI request.
- Proxy injects the tool schemas into the system prompt and instructs the model to emit `<tool_call>{"name":"...","arguments":{...}}</tool_call>` tags.
- When a complete `<tool_call>` tag is detected in the model's stream, the SDK query is aborted, the tags are parsed, and the response is emitted as OpenAI `tool_calls` with `finish_reason: "tool_calls"`.
- On the follow-up request, `role: "tool"` messages are translated into `<tool_result id="..." name="...">...</tool_result>` blocks for the model.
- Parallel calls are supported: the model can emit multiple `<tool_call>` tags in one turn.
- Streaming responses with tools are buffered and emitted as a single chunk (OpenAI tool-call streaming deltas are not currently exposed piecewise).
- Built-in SDK tools (Read, Bash, Grep, etc.) are disabled via `allowedTools: []` during tool-calling requests so the model can only use client-defined tools.
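The parsing step can be sketched as a regex pass over the buffered model output. A minimal sketch: `parseToolCalls` and `toChoice` are hypothetical names, the random `call_…` ID scheme is an assumption, and any prose before the first tag becomes the message `content`.

```javascript
// Sketch: turn <tool_call>{...}</tool_call> tags in model output into OpenAI tool_calls.
const TOOL_CALL_TAG = /<tool_call>([\s\S]*?)<\/tool_call>/g;

function parseToolCalls(text) {
  const toolCalls = [];
  for (const match of text.matchAll(TOOL_CALL_TAG)) {
    let call;
    try {
      call = JSON.parse(match[1].trim());
    } catch {
      continue; // malformed JSON is silently dropped
    }
    if (!call || typeof call.name !== "string") continue;
    toolCalls.push({
      id: `call_${Math.random().toString(36).slice(2, 12)}`, // hypothetical ID scheme
      type: "function",
      function: {
        name: call.name,
        // OpenAI expects arguments as a JSON-encoded string, not an object.
        arguments: JSON.stringify(call.arguments ?? {}),
      },
    });
  }
  // Any prose before the first tag becomes the message content.
  const firstTag = text.indexOf("<tool_call>");
  const content = (firstTag === -1 ? text : text.slice(0, firstTag)).trim() || null;
  return { content, toolCalls };
}

// Build the non-streaming OpenAI choice for a turn that may contain tool calls.
function toChoice(text) {
  const { content, toolCalls } = parseToolCalls(text);
  return toolCalls.length
    ? { index: 0, message: { role: "assistant", content, tool_calls: toolCalls }, finish_reason: "tool_calls" }
    : { index: 0, message: { role: "assistant", content: text }, finish_reason: "stop" };
}
```

Because every tag in the buffer is parsed, several `<tool_call>` tags in one turn become several entries in `tool_calls`.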
Limitations:
- Relies on model format compliance (~95% in practice). Malformed JSON inside a `<tool_call>` tag is silently dropped.
- `tool_choice` (force-tool, specific-tool) is not yet honored; the model decides whether to call a tool based on prompt cues.
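The prompt side of the protocol (advertising tool schemas and feeding `role: "tool"` results back as `<tool_result>` blocks) can be sketched the same way. A minimal sketch: the instruction wording, the function names, and the XML escaping are assumptions; only the tag formats come from the steps above. OpenAI tool messages carry just `tool_call_id`, so the tool name is recovered from the assistant message that issued the call.

```javascript
// Sketch of the prompt side of the tool protocol; instruction wording is hypothetical.
function buildToolSystemPrompt(tools) {
  const schemas = tools
    .filter((t) => t.type === "function")
    .map((t) => JSON.stringify(t.function))
    .join("\n");
  return [
    "You can call these tools. Each line is a JSON function schema:",
    schemas,
    'To call a tool, reply with <tool_call>{"name":"...","arguments":{...}}</tool_call>.',
    "You may emit several <tool_call> tags in one turn.",
  ].join("\n");
}

// Escaping (an assumption) keeps tool output from closing the tag early.
const escapeText = (s) => s.replace(/&/g, "&amp;").replace(/</g, "&lt;");
const escapeAttr = (s) => escapeText(s).replace(/"/g, "&quot;");

// Translate role:"tool" messages into <tool_result> blocks.
function toolResultsToText(messages) {
  const nameById = new Map();
  const blocks = [];
  for (const msg of messages) {
    if (msg.role === "assistant" && Array.isArray(msg.tool_calls)) {
      for (const call of msg.tool_calls) nameById.set(call.id, call.function.name);
    } else if (msg.role === "tool") {
      const id = String(msg.tool_call_id);
      const name = nameById.get(id) ?? "unknown";
      const body = typeof msg.content === "string" ? msg.content : JSON.stringify(msg.content);
      blocks.push(`<tool_result id="${escapeAttr(id)}" name="${escapeAttr(name)}">${escapeText(body)}</tool_result>`);
    }
  }
  return blocks.join("\n");
}
```

Escaping changes what the model sees (for example `&` becomes `&amp;`); the alternative is to pass tool output raw and accept that a result containing `</tool_result>` could confuse the parse.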
Gotchas & Fixes
Things we learned getting this working:
| Issue | Fix |
|-------|-----|
| claude-sonnet-4-6 invalid | SDK resolves it to claude-sonnet-4-6-20250514 which doesn't exist. Mapped to claude-sonnet-4-5-20250929 |
| Old proxy still on port 3456 | Kill stale processes: lsof -ti :3456 \| xargs kill (Mac) or netstat -ano \| findstr 3456 then taskkill /PID <pid> /F (Win) |
| startup aborted — Missing package box on start | You pulled new commits but didn't run npm install yet. Run npm install (or npm run up to do both in one step). Most common cause of "network connection error" / ECONNREFUSED on :3456 — the proxy wasn't running because startup bailed |
| SDK message structure | Assistant text is at message.message.content[] (nested), NOT message.content |
| Double/duplicate responses | SDK emits text in assistant events AND again in result. Only use result as fallback when no assistant content was already sent |
| maxTurns: 1 blocks tools | Set maxTurns: 200 for full agent capability. Use 1 only for pure text responses |
| Rate limiting | Each query() spawns a Claude Code session. Avoid running Claude Code CLI alongside the proxy |
| OpenClaw agents failing | Remove all anthropic fallbacks from openclaw.json — route everything through claude-max-proxy |
| Hermes Unknown provider | Use provider: custom in config.yaml. openai is NOT a valid Hermes provider. custom:name fails at model level — only works in delegation blocks |
| Context shows 0/128K in Hermes | Hermes calls /v1/models to detect context window. Proxy must return context_length in each model object. Without it, Hermes falls back to 128K which can truncate memory injection. Also set model.context_length: 1000000 in config.yaml as explicit override |
| Hermes memories not loading | Caused by 128K context fallback truncating system prompt before memories get injected. Fixing context_length to 1M resolves this |
| Empty result after rate limit | SDK emits rate_limit_event then returns empty result. First request usually succeeds |
| node_modules cross-platform | Delete node_modules and npm install fresh when moving between Windows and Mac |
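The "SDK message structure" and "Double/duplicate responses" rows can be handled together. A minimal sketch, assuming messages shaped like the Agent SDK's `assistant` and `result` messages; `collectText` is a hypothetical helper. It uses `for await`, so it accepts either the async iterable returned by `query()` or a plain array.

```javascript
// Sketch: collect assistant text from Agent SDK messages, falling back to the
// result text only when no assistant text was seen (avoids double responses).
async function collectText(sdkMessages) {
  let text = "";
  let resultText = null;
  for await (const msg of sdkMessages) {
    if (msg.type === "assistant") {
      // Text lives at msg.message.content[], not msg.content.
      for (const block of msg.message?.content ?? []) {
        if (block.type === "text") text += block.text;
      }
    } else if (msg.type === "result" && typeof msg.result === "string") {
      resultText = msg.result; // same text again; kept only as a fallback
    }
  }
  // An empty string here matches the "empty result after rate limit" row.
  return text || resultText || "";
}
```

For a text-only reply this would be used as `await collectText(query({ prompt, options: { maxTurns: 1 } }))`, with `query` imported from `@anthropic-ai/claude-agent-sdk`.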
Testing
```bash
node test.js
```

Runs health, models, validation, non-streaming, and streaming tests.
What This Replaces
| Old (CLI Proxy) | New (SDK Proxy) |
|-----------------|-----------------|
| Spawns CLI subprocess per request | Native SDK query() call |
| ~500ms process overhead | Near-zero overhead |
| Patches nuked on npm update | No patches needed |
| --dangerously-skip-permissions flag | permissionMode: 'bypassPermissions' |
| Windows stdin pipe hack | Not needed |
| manager.js + openai-to-cli.js patches | Single server.js |
Dependencies
Runtime:
- `@anthropic-ai/claude-agent-sdk`: Claude Agent SDK (talks to Claude Max through the CLI keychain)
- `express`: HTTP server
- `js-yaml`: parses `~/.mobygate/config.yaml`
- `uuid`: request ID generation
Transitive (used in mcp-inspect.mjs):
- `@modelcontextprotocol/sdk`: MCP client for diagnosing image-drop bugs in MCP servers
Frontend (loaded via CDN, no build step):
- Tailwind CSS via `cdn.tailwindcss.com`
- JetBrains Mono + VT323 via Google Fonts
Releases
Tagged releases live at github.com/khnfrhn/mobygate/releases. Pin by version when cloning for a reproducible install:
```bash
git clone https://github.com/khnfrhn/mobygate.git
cd mobygate
git checkout v0.2.0   # or any other tag
npm install && npm link && mobygate init
```

See CHANGELOG.md for per-version change lists.
Contributing
Designs live in Paper (artboard 01KPFE5G6MJGMT5E5MGA94DQRF). To port a new design into the dashboard:
- Select the node in Paper.
- Export its JSX via the Paper MCP `get_jsx` tool.
- Hand the JSX to a Claude session along with the current `index.html`.
- Colors, fonts, spacing, and any ASCII art will translate character-accurately.
This is how the v0.2.0 dashboard was built. Screenshots are fine for review; JSX is the source of truth for implementation.
