mobygate

v0.9.2

Published

8 days ago

OpenAI-compatible local proxy for Claude Max. The Möbius-strip gateway: OpenAI shape in, Claude Max out.


khnfrhn

claude claude-max anthropic openai-compatible proxy gateway llm ai agent-sdk localhost dashboard

mobygate

OpenAI-compatible local proxy for Claude Max. The Möbius-strip gateway: OpenAI shape in, Claude Max out, on a single continuous loop.

Point any OpenAI-shaped client (Hermes, OpenClaw, custom tools, SDKs) at http://localhost:3456 and you get Claude Max inference out the other side — without hitting the paid Anthropic API.

✓ Real streaming (SSE)
✓ Multimodal (image URLs + base64 data URLs)
✓ OpenAI-style function calling (tools, tool_choice-compatible tool_calls response)
✓ Opus 4.7 with native 1M context variant
✓ Session resume (map a client key → SDK session ID)
✓ OAuth auto-refresh (no more 8-hour-cliff 401 storms)
✓ Live web dashboard with per-request tracing
✓ Cross-platform service install (macOS / Linux / Windows — one command)

Current release: v0.2.0 · see CHANGELOG.md for history.

Why

The older claude-max-api-proxy spawned a new Claude Code CLI subprocess for every request — ~500 ms overhead per call, Windows stdin pipe hacks, and patches that got nuked on every npm update. mobygate uses the Claude Agent SDK directly: no subprocess spawning, no patches, no maintenance. Same subscription, real streaming, multimodal, tool calling.

Quick start

npm install -g mobygate
mobygate init      # interactive setup: config + service install + smoke test

Or from source (for hacking on mobygate itself):

git clone https://github.com/khnfrhn/mobygate.git
cd mobygate
npm install
npm link            # makes the `mobygate` command available globally
mobygate init

That single init does the full cross-platform install:

| Step | Mac | Linux | Windows |
|---|---|---|---|
| Verify Node ≥ 18, claude CLI on PATH, claude auth login done | ✓ | ✓ | ✓ |
| Write config to ~/.mobygate/config.yaml | ✓ | ✓ | ✓ |
| Install long-running server as user-level service | launchd (ai.mobygate.server) | systemd user unit (mobygate-server.service) | Task Scheduler (mobygate-server) |
| Install 4-hour auth-refresh cron | launchd plist | systemd .timer | Task Scheduler (4h repetition) |
| Redirect stdout/stderr to logs/server.log | ✓ | ✓ | ✓ (via .cmd launcher) |
| Auto-restart on crash | KeepAlive | Restart=on-failure | Task Scheduler RestartCount=3 |
| Smoke-test /health | ✓ | ✓ | ✓ |

No sudo required on any platform. No nssm on Windows. If the auto-install fails for any reason, mobygate init falls back to printing the exact commands to run manually.

Once installed, the service survives reboots and the daily driver commands are:

mobygate status     # service state, auth state, /health probe
mobygate logs       # tail logs/server.log
mobygate auth       # check + force-refresh OAuth token
mobygate start      # start service (if stopped)
mobygate stop       # stop service
mobygate restart    # stop + start
mobygate uninstall  # remove services (leaves the repo in place)
mobygate version

Open http://localhost:3456/ in your browser for the live dashboard (see below).

Linux headless tip: user systemd units stop when you log out. For a mobygate that stays up on a server, run sudo loginctl enable-linger $USER once. Then it runs whether you're logged in or not.

After git pull: always re-run npm install — new commits can bump the SDK or add packages. If you skip this, the server dies with a readable boxed "Missing package" error pointing at npm install (or npm run up which does both in one step).

Dashboard

Open http://localhost:3456/ after install for a live, zero-config dashboard:

Header — whale ASCII · mobygate vX.Y.Z · "healthy · live" pill that turns red on disconnect · clear log / force refresh auth buttons.
KPI strip — Uptime (live-ticking clock), Requests (total + stream/tool/image breakdown), Success rate (with 7-segment progress bar), Avg latency (p50 headline + p95 secondary + 14-bar color-thresholded sparkline).
Server / Auth / Traffic row — default model, active sessions, context window, build (v0.2.0 · darwin-arm64); email, plan, auth method, last probe, refresh count; 15-minute rolling req/min column chart.
Live requests — table auto-updates as requests come in. Chips for stream / tool / img / sync. Inline latency bar (green < 3 s, blue < 15 s, orange > 15 s). Rounded status pill. Click any row → full start + end event JSON modal. Filter buttons: ALL / ERRORS / SLOW > 15 s.
Sessions panel — active session-key map, per-row expire, expire all. Live-refreshes when a session is created / updated / expires.
Server log tail — last 200 lines of logs/server.log. Auto-refresh every 2.5 s, smart auto-scroll that doesn't yank you back if you've scrolled up to read.
Footer — clickable endpoint pills, terminal-style stream · connected | mobygate · tty0 · 0.2.0 status line.

Design ported from the Paper artboard (01KPFE5G6MJGMT5E5MGA94DQRF/C1-0) via get_jsx — exact colors, typography, and ASCII.

Run (without the CLI)

If you just want a foreground process without installing services:

node server.js         # normal start
npm run dev            # auto-reload on changes
npm run up             # install deps + start (one command — use after git pull)

The server starts on port 3456 (same as the old proxy).

How It Works

Discord / Hermes / OpenClaw → POST localhost:3456/v1/chat/completions → Agent SDK query() → Claude Max

Receives OpenAI-format chat completion requests
Converts messages[] array to a single prompt string
Calls query() from @anthropic-ai/claude-agent-sdk
Streams responses back as SSE (Server-Sent Events) in OpenAI format
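
The second step above (messages[] to a single prompt string) can be sketched as follows. This is a minimal illustration, not mobygate's actual converter: the function names and the "Role: text" layout are assumptions made for the example.

```javascript
// Hypothetical sketch: flatten an OpenAI messages[] array into one prompt string.
// The "Role: text" layout is an assumption, not mobygate's confirmed format.
function contentToText(content) {
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  // Keep only text parts; image parts need separate handling (see Multimodal).
  return content
    .filter((part) => part && part.type === "text")
    .map((part) => part.text)
    .join("\n");
}

function messagesToPrompt(messages) {
  const labels = { system: "System", user: "User", assistant: "Assistant" };
  return messages
    .map((m) => `${labels[m.role] ?? m.role}: ${contentToText(m.content)}`)
    .join("\n\n");
}

const prompt = messagesToPrompt([
  { role: "system", content: "Be brief." },
  { role: "user", content: [{ type: "text", text: "Hello" }] },
]);
console.log(prompt);
```

Both string content and array-of-parts content are accepted because OpenAI clients send either shape.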

Endpoints

OpenAI-compatible:

| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/chat/completions | Chat completions (streaming + non-streaming) |
| GET | /v1/models | List available models with context lengths |
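
For streaming responses from /v1/chat/completions, a client has to reassemble SSE events that may be split across network chunks. The sketch below assumes the usual OpenAI conventions (one `data: <json>` line per event, events separated by a blank line, a final `data: [DONE]`); `createSseParser` is a hypothetical helper name, and `\r\n` line endings are not handled.

```javascript
// Sketch: incrementally parse OpenAI-style SSE text into content deltas.
// Assumes `data: <json>` events separated by a blank line and a closing
// `data: [DONE]` sentinel, as in the OpenAI streaming format.
function createSseParser() {
  let buffer = "";
  return function feed(text) {
    buffer += text;
    const deltas = [];
    let done = false;
    let idx;
    // Only complete events (terminated by a blank line) are consumed;
    // a partial event stays in the buffer until the next feed() call.
    while ((idx = buffer.indexOf("\n\n")) !== -1) {
      const event = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 2);
      for (const line of event.split("\n")) {
        if (!line.startsWith("data:")) continue; // skip comments and heartbeats
        const payload = line.slice(5).trim();
        if (payload === "[DONE]") {
          done = true;
          continue;
        }
        const chunk = JSON.parse(payload);
        const piece = chunk.choices?.[0]?.delta?.content;
        if (piece) deltas.push(piece);
      }
    }
    return { deltas, done };
  };
}
```

With Node 18+ `fetch`, each chunk of `res.body` can be decoded with a `TextDecoder` and passed to the returned function until `done` is true.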

Operations:

| Method | Path | Description |
|--------|------|-------------|
| GET | /health | Liveness + active session count |
| GET | /auth/status | OAuth state (add ?quick=1 to skip the live probe) |
| POST | /auth/refresh | Force an OAuth refresh probe (cron hook) |

Session management:

| Method | Path | Description |
|--------|------|-------------|
| GET | /sessions | List all active sessions |
| GET | /sessions/:key | Inspect one session |
| DELETE | /sessions/:key | Expire a single session |
| DELETE | /sessions | Expire all sessions |
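
The session endpoints operate on a map from client keys to SDK session IDs with an idle timeout. A rough sketch of such a store is shown here; the class and method names are illustrative assumptions, not mobygate's internals, and the 60-minute default mirrors SESSION_TTL_MINUTES.

```javascript
// Hypothetical sketch of a client-key -> SDK-session-ID map with idle expiry.
// Names are illustrative; mobygate's real session store may differ.
class SessionStore {
  constructor(ttlMinutes = 60, now = () => Date.now()) {
    this.ttlMs = ttlMinutes * 60_000;
    this.now = now; // injectable clock, handy for tests
    this.sessions = new Map(); // key -> { sessionId, lastUsed }
  }

  set(key, sessionId) {
    this.sessions.set(key, { sessionId, lastUsed: this.now() });
  }

  // Returns the SDK session ID to resume, or undefined if unknown or expired.
  get(key) {
    const entry = this.sessions.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.lastUsed > this.ttlMs) {
      this.sessions.delete(key);
      return undefined;
    }
    entry.lastUsed = this.now(); // each use resets the idle timer
    return entry.sessionId;
  }

  expire(key) {
    return this.sessions.delete(key);
  }

  expireAll() {
    this.sessions.clear();
  }
}
```

Because the timeout is measured from the last use, an active conversation keeps its session alive indefinitely while idle keys age out.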

Dashboard feed:

| Method | Path | Description |
|--------|------|-------------|
| GET | / | The live dashboard (HTML) |
| GET | /events | SSE stream of all dashboard events (request.start, request.end, auth.refresh, session.*, server.boot) with 15 s heartbeat |
| GET | /dashboard/recent?limit=N | Ring-buffer snapshot + stats + build meta for initial page load |
| GET | /dashboard/sessions | Per-session detail with idle + TTL-remaining times |
| GET | /dashboard/logs?lines=N | Last N lines of logs/server.log |
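
The "ring-buffer snapshot" behind /dashboard/recent can be pictured as a fixed-capacity buffer that overwrites its oldest entry. The class below is a generic sketch of that data structure under assumed names; the source does not show mobygate's actual buffer or its capacity.

```javascript
// Hypothetical sketch of a fixed-capacity ring buffer for recent events.
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = new Array(capacity);
    this.next = 0; // index of the slot that will be written next
    this.size = 0; // number of valid items, never above capacity
  }

  push(item) {
    this.items[this.next] = item; // overwrites the oldest item once full
    this.next = (this.next + 1) % this.capacity;
    this.size = Math.min(this.size + 1, this.capacity);
  }

  // Returns up to `limit` most recent items, oldest first.
  recent(limit = this.size) {
    const n = Math.min(limit, this.size);
    const out = [];
    for (let i = n; i >= 1; i--) {
      out.push(this.items[(this.next - i + this.capacity) % this.capacity]);
    }
    return out;
  }
}
```

Memory stays bounded no matter how long the server runs, which suits a dashboard that only needs the latest N requests on page load.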

Model Mapping

| Input | Resolves To |
|-------|------------|
| claude-opus-4, claude-opus-4-7, opus | claude-opus-4-7[1m] (1M context) |
| claude-opus-4-7-200k | claude-opus-4-7 (standard 200k) |
| claude-opus-4-6 | claude-opus-4-6 |
| claude-sonnet-4, claude-sonnet-4-5, claude-sonnet-4-6, sonnet | claude-sonnet-4-5-20250929 |
| claude-haiku-4, claude-haiku-4-5, haiku | claude-haiku-4-5-20251001 |

Provider prefixes are stripped automatically (e.g., claude-max-proxy/claude-opus-4-7 → claude-opus-4-7).
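
The mapping table and prefix stripping can be expressed as a small lookup. This is a sketch built from the table above; `resolveModel` is a hypothetical name, and passing unknown model names through unchanged is an assumption rather than documented behavior.

```javascript
// Sketch of the alias table above as a lookup with provider-prefix stripping.
const MODEL_MAP = new Map([
  ["claude-opus-4", "claude-opus-4-7[1m]"],
  ["claude-opus-4-7", "claude-opus-4-7[1m]"],
  ["opus", "claude-opus-4-7[1m]"],
  ["claude-opus-4-7-200k", "claude-opus-4-7"],
  ["claude-opus-4-6", "claude-opus-4-6"],
  ["claude-sonnet-4", "claude-sonnet-4-5-20250929"],
  ["claude-sonnet-4-5", "claude-sonnet-4-5-20250929"],
  ["claude-sonnet-4-6", "claude-sonnet-4-5-20250929"],
  ["sonnet", "claude-sonnet-4-5-20250929"],
  ["claude-haiku-4", "claude-haiku-4-5-20251001"],
  ["claude-haiku-4-5", "claude-haiku-4-5-20251001"],
  ["haiku", "claude-haiku-4-5-20251001"],
]);

const DEFAULT_MODEL = "claude-opus-4-7[1m]";

function resolveModel(input) {
  if (!input) return DEFAULT_MODEL; // fallback when the request names no model
  // Strip a provider prefix such as "claude-max-proxy/".
  const bare = input.includes("/") ? input.slice(input.lastIndexOf("/") + 1) : input;
  // Assumption: names missing from the table are passed through unchanged.
  return MODEL_MAP.get(bare) ?? bare;
}

console.log(resolveModel("claude-max-proxy/claude-opus-4-7"));
```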

Client Configuration

OpenClaw (`~/.openclaw/openclaw.json`)

Add under models.providers:

"claude-max-proxy": {
  "baseUrl": "http://localhost:3456/v1",
  "apiKey": "claude-max",
  "api": "openai-completions",
  "models": [
    { "id": "claude-opus-4-7", "contextWindow": 1000000, "maxTokens": 16384 },
    { "id": "claude-opus-4-6", "contextWindow": 200000, "maxTokens": 16384 },
    { "id": "claude-sonnet-4-6", "contextWindow": 200000, "maxTokens": 16384 },
    { "id": "claude-haiku-4-5", "contextWindow": 200000, "maxTokens": 16384 }
  ]
}

Set as default in agents.defaults.model:

"primary": "claude-max-proxy/claude-opus-4-7"

Hermes Agent (`~/.hermes/config.yaml`)

model:
  default: claude-opus-4-7
  provider: custom          # MUST be "custom", not "openai" or "custom:name"
  api_key: claude-max
  base_url: http://127.0.0.1:3456/v1
  context_length: 1000000   # explicit override — ensures 1M context

providers:
  claude-max-proxy:
    api: http://127.0.0.1:3456/v1
    name: Claude Max Proxy
    api_key: claude-max
    default_model: claude-opus-4-7

Also add to ~/.hermes/auth.json credential_pool:

"custom:claude-max-proxy": [{
  "id": "a1b2c3",
  "label": "Claude Max Proxy",
  "auth_type": "api_key",
  "priority": 0,
  "source": "config:Claude Max Proxy",
  "access_token": "claude-max",
  "base_url": "http://127.0.0.1:3456/v1",
  "request_count": 0
}]

Hermes provider caveat: The top-level model.provider must be custom. Hermes doesn't recognize openai as a provider, and custom:name only works in delegation blocks, not at the model level. The custom keyword tells Hermes to read base_url and api_key from the model: config. Aliases that also work: ollama, lmstudio, vllm, llamacpp.

Any OpenAI-compatible client

base_url: http://localhost:3456/v1
api_key:  claude-max   (any non-empty string works)
model:    claude-opus-4-7
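
As a concrete case of the settings above, a plain Node 18+ script can call the proxy with the built-in `fetch`. The helper names here are hypothetical, and the response handling assumes the standard OpenAI non-streaming shape (`choices[0].message.content`).

```javascript
// Sketch: a minimal non-streaming chat call using Node 18+ global fetch.
// buildChatRequest and chat are illustrative helper names.
function buildChatRequest(prompt, options = {}) {
  const {
    baseUrl = "http://localhost:3456/v1",
    model = "claude-opus-4-7",
    apiKey = "claude-max", // any non-empty string works
  } = options;
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
        stream: false,
      }),
    },
  };
}

// Requires the proxy to be running; not invoked automatically here.
async function chat(prompt, options) {
  const { url, init } = buildChatRequest(prompt, options);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

With the service up, `chat("Say hi").then(console.log)` should print the model's reply.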

Configuration

Precedence, highest wins: env var → ~/.mobygate/config.yaml → built-in default.

mobygate init writes a commented YAML file you can hand-edit. Env vars always override the file, so you can set one-off values (e.g. a different port per shell) without editing config.

| Variable | Config field | Default | Description |
|----------|-------------|---------|-------------|
| PORT | port | 3456 | Server port |
| DEFAULT_MODEL | default_model | claude-opus-4-7[1m] | Fallback model when none specified |
| SESSION_TTL_MINUTES | session_ttl_minutes | 60 | Idle timeout for session keys mapped to SDK sessions |
| AUTH_REFRESH_INTERVAL_HOURS | auth_refresh_interval_hours | 4 | How often the proactive refresh cron fires |
| CLAUDE_BIN | claude_bin | (empty → PATH lookup) | Absolute path to the claude binary if not on PATH |
| LOG_LEVEL | log_level | info | Reserved; currently informational only |
| MOBYGATE_HOME | — | ~/.mobygate | Directory for config + state files |
| MOBYGATE_NODE_BIN | — | process.execPath | Node binary baked into service definitions (launchd/systemd/Task Scheduler) |
| NO_COLOR | — | unset | Disable ANSI color in CLI banner output |
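
The precedence rule (env var, then config file, then built-in default) can be sketched as a merge over the fields in the table. `resolveConfig` is a hypothetical name, and treating an empty env var as unset is an assumption of this sketch.

```javascript
// Sketch of "env var > config.yaml > built-in default" for the file-backed fields.
const DEFAULTS = {
  port: 3456,
  default_model: "claude-opus-4-7[1m]",
  session_ttl_minutes: 60,
  auth_refresh_interval_hours: 4,
  claude_bin: "",
  log_level: "info",
};

const ENV_NAMES = {
  port: "PORT",
  default_model: "DEFAULT_MODEL",
  session_ttl_minutes: "SESSION_TTL_MINUTES",
  auth_refresh_interval_hours: "AUTH_REFRESH_INTERVAL_HOURS",
  claude_bin: "CLAUDE_BIN",
  log_level: "LOG_LEVEL",
};

const NUMERIC = new Set(["port", "session_ttl_minutes", "auth_refresh_interval_hours"]);

// env: an object like process.env; fileConfig: the parsed YAML (or {}).
function resolveConfig(env = {}, fileConfig = {}) {
  const out = {};
  for (const [field, fallback] of Object.entries(DEFAULTS)) {
    const fromEnv = env[ENV_NAMES[field]];
    let value;
    if (fromEnv !== undefined && fromEnv !== "") value = fromEnv; // highest precedence
    else if (fileConfig[field] !== undefined) value = fileConfig[field];
    else value = fallback;
    out[field] = NUMERIC.has(field) ? Number(value) : value; // env values arrive as strings
  }
  return out;
}

console.log(resolveConfig({ PORT: "4000" }, { port: 3500, log_level: "debug" }));
```

In the example call, `PORT=4000` wins over the file's `port: 3500`, while `log_level` comes from the file and the remaining fields fall back to defaults.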

Diagnosing MCP Image Drops

If a client (e.g. Hermes) reports that an MCP tool returned an empty screenshot or image, use mcp-inspect.mjs to bypass the client and talk to the MCP server directly — this isolates whether the image is being dropped in the MCP server itself or in the client's normalization layer.

# stdio transport — spawn the MCP server as a subprocess
node mcp-inspect.mjs --cmd "<server-exe>" --args '["<arg1>"]' --list
node mcp-inspect.mjs --cmd "<server-exe>" --args '["<arg1>"]' \
  --tool get_screenshot --params '{"nodeId":"WL-0"}'

# HTTP (StreamableHTTP) transport — e.g. Paper running at localhost:29979/mcp
node mcp-inspect.mjs --url "http://127.0.0.1:29979/mcp" --list
node mcp-inspect.mjs --url "http://127.0.0.1:29979/mcp" \
  --tool get_screenshot --params '{"nodeId":"WL-0"}'

# Legacy SSE transport
node mcp-inspect.mjs --url "http://127.0.0.1:1234/sse" --transport sse --list

If the output shows a non-empty image content block with hundreds of KB of base64, the MCP server is fine and the client is stripping the image. If the image block is missing or empty, the MCP server itself is the culprit.
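
The decision rule in the previous paragraph can also be applied programmatically to a saved tool result. The sketch assumes the MCP tool-result shape in which image blocks look like `{ type: "image", data: <base64>, mimeType }`; the function names are hypothetical and not part of mcp-inspect.mjs.

```javascript
// Sketch: summarize image blocks in an MCP tool result and guess where an
// image was lost. Assumes image blocks of the form
// { type: "image", data: "<base64>", mimeType: "image/png" }.
function summarizeImages(result) {
  const blocks = Array.isArray(result?.content) ? result.content : [];
  return blocks
    .filter((block) => block.type === "image")
    .map((block) => {
      const data = typeof block.data === "string" ? block.data : "";
      return {
        mimeType: block.mimeType ?? "unknown",
        // base64 carries 3 bytes per 4 characters; padding is ignored here
        approxBytes: Math.floor((data.length * 3) / 4),
        empty: data.length === 0,
      };
    });
}

// "server": the MCP server returned no usable image.
// "client": the server returned image data, so the drop happens downstream.
function diagnose(result) {
  const images = summarizeImages(result);
  if (images.length === 0 || images.every((img) => img.empty)) return "server";
  return "client";
}
```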

Auth & Token Refresh

The proxy inherits Claude Max OAuth credentials from the local CLI keychain (macOS: Claude Code-credentials; Windows: Credential Manager; Linux: libsecret / GNOME Keyring). Access tokens last ~8 hours and are supposed to refresh silently, but in practice the SDK occasionally surfaces 401 Invalid authentication credentials — either as a thrown error, or as the literal text of a result message on long-uptime processes.

mobygate init installs both defenses automatically; you shouldn't need to touch any of this. Reference only:

1. Reactive retry on 401. Both streaming and non-streaming handlers wrap the SDK query in runWithAuthRetry (see scripts/auth-helper.js). Both exception-form 401s and result-text-form 401s (Failed to authenticate. API Error: 401 ...) trigger a shell-out to claude -p, which forces a token refresh using the still-valid refresh token; the query is then retried once. Every step is logged: [auth] 401 on sync call — refreshing, [auth] refreshed in 1234 ms — retrying sync call.

2. Proactive 4-hour cron. scripts/auth-refresh.js is cross-platform. mobygate init wires it up via launchd (macOS), systemd .timer (Linux), or Task Scheduler (Windows). Access tokens last ~8 hours, so a 4-hour cadence keeps us comfortably inside the valid window even if one run fails.
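
The reactive defense follows a retry-once pattern that can be sketched as below. This is not the code in scripts/auth-helper.js: the signature, the `looksLike401` heuristic, and the injected `refreshToken` callback are assumptions chosen to keep the example self-contained.

```javascript
// Hypothetical sketch of the retry-once-on-401 pattern; the real
// runWithAuthRetry in scripts/auth-helper.js may differ.
function looksLike401(value) {
  const text = value instanceof Error ? value.message : String(value ?? "");
  return /\b401\b/.test(text) && /authenticat/i.test(text);
}

// runQuery: async function performing the SDK call and returning its text.
// refreshToken: async function that forces a token refresh
// (for instance by shelling out to `claude -p`).
async function runWithAuthRetry(runQuery, refreshToken) {
  const attempt = async () => {
    const result = await runQuery();
    // Result-text form: the 401 arrives as ordinary response text.
    if (typeof result === "string" && looksLike401(result)) {
      throw new Error(result);
    }
    return result;
  };

  try {
    return await attempt();
  } catch (err) {
    if (!looksLike401(err)) throw err; // unrelated errors are not retried
    await refreshToken();
    return attempt(); // exactly one retry; a second 401 propagates to the caller
  }
}
```

Normalizing the result-text form into a thrown error lets a single catch branch handle both ways the 401 can surface.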

CLI helpers:

mobygate auth           # show status + run a live probe
npm run auth:status     # same via npm script (prints JSON)
npm run auth:status:quick  # keychain-only, no live probe (instant)
npm run auth:refresh    # force a refresh probe, print JSON result

Escape hatch — full re-auth required: if claude auth status --json reports loggedIn: true but you're still getting 401s after mobygate auth successfully refreshes, the refresh token itself has been revoked. Run claude auth login to do a full OAuth reauth, then mobygate restart. Rare; happens if you've signed out of Claude from another device.

macOS (launchd):

cp launchd/ai.mobygate.auth-refresh.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/ai.mobygate.auth-refresh.plist

Linux (cron):

0 */4 * * * cd /path/to/mobygate && /usr/bin/node scripts/auth-refresh.js >> logs/auth-refresh.log 2>&1

Or systemd timer: mobygate init generates these by default. To do it by hand, create ~/.config/systemd/user/mobygate-auth.{service,timer} — service runs /usr/bin/node /path/to/mobygate/scripts/auth-refresh.js, timer has OnUnitActiveSec=4h and OnBootSec=1min. Then systemctl --user enable --now mobygate-auth.timer.

Windows (Task Scheduler):

$A = New-ScheduledTaskAction -Execute "node.exe" `
  -Argument "scripts\auth-refresh.js" `
  -WorkingDirectory "C:\path\to\mobygate"
$T = New-ScheduledTaskTrigger -Once -At (Get-Date) `
  -RepetitionInterval (New-TimeSpan -Hours 4)
Register-ScheduledTask -TaskName "mobygate-auth-refresh" -Action $A -Trigger $T

Multimodal

OpenAI image_url content parts are translated to Anthropic image content blocks. Both base64 data URLs and remote https: URLs work:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "data:image/png;base64,iVBORw0KGgo..." } }
  ]
}

When images are present in the request, the proxy switches from a plain-string prompt to an async-iterable SDKUserMessage with mixed-content blocks. Nothing else in the OpenAI shape changes. The dashboard shows an img chip on any request that carried images.
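
The translation of content parts can be sketched as follows, using the Anthropic image block shapes (`source.type` of `"base64"` or `"url"`). The function name is hypothetical, and whether mobygate forwards remote URLs as `url` sources or downloads them first is not stated in this README, so that branch is an assumption.

```javascript
// Sketch: translate one OpenAI content part into an Anthropic content block.
function toAnthropicPart(part) {
  if (part.type === "text") return { type: "text", text: part.text };
  if (part.type !== "image_url") {
    throw new Error(`unsupported content part type: ${part.type}`);
  }
  const url = part.image_url.url;
  // data:<media type>;base64,<payload>
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(url);
  if (match) {
    return {
      type: "image",
      source: { type: "base64", media_type: match[1], data: match[2] },
    };
  }
  // Assumption: remote URLs are passed through as URL sources.
  return { type: "image", source: { type: "url", url } };
}

const blocks = [
  { type: "text", text: "What's in this image?" },
  { type: "image_url", image_url: { url: "data:image/png;base64,iVBORw0KGgo" } },
].map(toAnthropicPart);
console.log(JSON.stringify(blocks, null, 2));
```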

Tool Calling

OpenAI-style function calling is supported via a prompt-embedded protocol rather than the Agent SDK's native MCP mechanism. The native mechanism pollutes session state on abort and gates tools behind ToolSearch, and neither behavior fits OpenAI's "emit call, client executes, send result back" flow.

How it works:

Client sends tools: [{type: "function", function: {...}}] in the OpenAI request.
Proxy injects the tool schemas into the system prompt and instructs the model to emit <tool_call>{"name":"...","arguments":{...}}</tool_call> tags.
When a complete <tool_call> tag is detected in the model's stream, the SDK query is aborted, tags are parsed, and the response is emitted as OpenAI tool_calls with finish_reason: "tool_calls".
On the follow-up request, role: "tool" messages are translated into <tool_result id="..." name="...">...</tool_result> blocks for the model.
Parallel calls supported — the model can emit multiple <tool_call> tags in one turn.
Streaming responses with tools are buffered and emitted as a single chunk (OpenAI tool-call streaming deltas are not currently exposed piecewise).
Built-in SDK tools (Read, Bash, Grep, etc.) are disabled via allowedTools: [] during tool-calling requests so the model can only use client-defined tools.

Limitations:

Relies on model format compliance (~95% in practice). Malformed JSON inside a <tool_call> tag is silently dropped.
tool_choice (force-tool, specific-tool) is not yet honored — the model decides whether to call a tool based on prompt cues.
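
The tag protocol described above can be sketched in two small functions: one that turns `<tool_call>` tags into OpenAI `tool_calls`, and one that renders a `role: "tool"` message as a `<tool_result>` block. The names and the `call_N` ID scheme are assumptions for illustration; mobygate's own parser may differ.

```javascript
// Hypothetical sketch: extract <tool_call> tags into OpenAI-style tool_calls.
function parseToolCalls(text) {
  const calls = [];
  const re = /<tool_call>([\s\S]*?)<\/tool_call>/g;
  let match;
  while ((match = re.exec(text)) !== null) {
    let parsed;
    try {
      parsed = JSON.parse(match[1]);
    } catch {
      continue; // malformed JSON is dropped silently, as noted under Limitations
    }
    if (typeof parsed?.name !== "string") continue;
    calls.push({
      id: `call_${calls.length}`, // illustrative ID scheme
      type: "function",
      function: {
        name: parsed.name,
        // OpenAI expects arguments as a JSON-encoded string
        arguments: JSON.stringify(parsed.arguments ?? {}),
      },
    });
  }
  return calls;
}

// Render a follow-up role:"tool" message for the model.
// `name` is the function name of the call being answered.
function toolMessageToBlock(message, name) {
  return `<tool_result id="${message.tool_call_id}" name="${name}">${message.content}</tool_result>`;
}
```

A response containing at least one parsed call would then be returned with `finish_reason: "tool_calls"`, and multiple tags in one turn yield multiple entries.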

Gotchas & Fixes

Things we learned getting this working:

| Issue | Fix |
|-------|-----|
| claude-sonnet-4-6 invalid | SDK resolves it to claude-sonnet-4-6-20250514 which doesn't exist. Mapped to claude-sonnet-4-5-20250929 |
| Old proxy still on port 3456 | Kill stale processes: lsof -ti :3456 \| xargs kill (Mac) or netstat -ano \| findstr 3456 then taskkill /PID <pid> /F (Win) |
| startup aborted — Missing package box on start | You pulled new commits but didn't run npm install yet. Run npm install (or npm run up to do both in one step). Most common cause of "network connection error" / ECONNREFUSED on :3456 — the proxy wasn't running because startup bailed |
| SDK message structure | Assistant text is at message.message.content[] (nested), NOT message.content |
| Double/duplicate responses | SDK emits text in assistant events AND again in result. Only use result as fallback when no assistant content was already sent |
| maxTurns: 1 blocks tools | Set maxTurns: 200 for full agent capability. Use 1 only for pure text responses |
| Rate limiting | Each query() spawns a Claude Code session. Avoid running Claude Code CLI alongside the proxy |
| OpenClaw agents failing | Remove all anthropic fallbacks from openclaw.json — route everything through claude-max-proxy |
| Hermes Unknown provider | Use provider: custom in config.yaml. openai is NOT a valid Hermes provider. custom:name fails at model level — only works in delegation blocks |
| Context shows 0/128K in Hermes | Hermes calls /v1/models to detect context window. Proxy must return context_length in each model object. Without it, Hermes falls back to 128K which can truncate memory injection. Also set model.context_length: 1000000 in config.yaml as explicit override |
| Hermes memories not loading | Caused by 128K context fallback truncating system prompt before memories get injected. Fixing context_length to 1M resolves this |
| Empty result after rate limit | SDK emits rate_limit_event then returns empty result. First request usually succeeds |
| node_modules cross-platform | Delete node_modules and npm install fresh when moving between Windows and Mac |
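
The "SDK message structure" and "Double/duplicate responses" rows combine into one rule: read text from the nested assistant content, and use the result text only if no assistant text was seen. A sketch over an array of already-received messages, with a hypothetical function name and message objects reduced to the fields mentioned above:

```javascript
// Sketch: collect response text from SDK messages without duplicating it.
function collectText(messages) {
  let text = "";
  let sawAssistantText = false;
  for (const msg of messages) {
    if (msg.type === "assistant") {
      // Text lives at message.message.content[], not message.content.
      for (const block of msg.message?.content ?? []) {
        if (block.type === "text") {
          text += block.text;
          sawAssistantText = true;
        }
      }
    } else if (msg.type === "result" && !sawAssistantText && typeof msg.result === "string") {
      // Fallback only: the result repeats text already sent in assistant events.
      text += msg.result;
    }
  }
  return text;
}

console.log(
  collectText([
    { type: "assistant", message: { content: [{ type: "text", text: "Hello" }] } },
    { type: "result", result: "Hello" },
  ])
);
```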

Testing

node test.js

Runs health, models, validation, non-streaming, and streaming tests.

What This Replaces

| Old (CLI Proxy) | New (SDK Proxy) |
|-----------------|-----------------|
| Spawns CLI subprocess per request | Native SDK query() call |
| ~500ms process overhead | Near-zero overhead |
| Patches nuked on npm update | No patches needed |
| --dangerously-skip-permissions flag | permissionMode: 'bypassPermissions' |
| Windows stdin pipe hack | Not needed |
| manager.js + openai-to-cli.js patches | Single server.js |

Dependencies

Runtime:

@anthropic-ai/claude-agent-sdk — Claude Agent SDK (talks to Claude Max through the CLI keychain)
express — HTTP server
js-yaml — Parses ~/.mobygate/config.yaml
uuid — Request ID generation

Transitive (used in mcp-inspect.mjs):

@modelcontextprotocol/sdk — MCP client for diagnosing image-drop bugs in MCP servers

Frontend (loaded via CDN, no build step):

Tailwind CSS via cdn.tailwindcss.com
JetBrains Mono + VT323 via Google Fonts

Releases

Tagged releases live at github.com/khnfrhn/mobygate/releases. Pin by version when cloning for a reproducible install:

git clone https://github.com/khnfrhn/mobygate.git
cd mobygate
git checkout v0.2.0   # or any other tag
npm install && npm link && mobygate init

See CHANGELOG.md for per-version change lists.

Contributing

Designs live in Paper (artboard 01KPFE5G6MJGMT5E5MGA94DQRF). To port a new design into the dashboard:

Select the node in Paper.
Export its JSX via the Paper MCP get_jsx tool.
Hand the JSX to a Claude session along with the current index.html.
Colors, fonts, spacing, and any ASCII art will translate character-accurately.

This is how the v0.2.0 dashboard was built. Screenshots are fine for review; JSX is the source of truth for implementation.
