anymodel

v1.17.0

Published

a month ago

Universal AI model proxy — route any coding tool through OpenRouter, Ollama, LMStudio, llama.cpp, or any LLM provider

0High
0Medium
0Low

aabyzov

openrouter ollama lmstudio llama.cpp llm proxy ai cli model-router

AnyModel

Universal AI coding tool — use GPT-5.4, Gemini 3.1, DeepSeek R1, Codex, Llama, and 300+ models through one interface.

AnyModel is an AI coding assistant that works with any model. It includes a proxy that routes requests to OpenRouter (300+ cloud models), local backends (Ollama, LMStudio, llama.cpp), or any OpenAI-compatible API — with smart retries, format translation, and zero dependencies.

anymodel.dev — full docs, presets, and FAQ.

Watch the Demo

Quick Start

# Terminal 1 — start AnyModel proxy with a model:
OPENROUTER_API_KEY=sk-or-v1-your-key npx anymodel proxy deepseek

# Terminal 2 — launch AnyModel:
npx anymodel

The model is set on the proxy via preset or --model. Connecting is always just npx anymodel.

Get your free OpenRouter key at openrouter.ai/keys — no credit card for free models.

Presets

# Paid models:
npx anymodel proxy gpt        # → openai/gpt-5.4                       (paid)
npx anymodel proxy codex      # → openai/gpt-5.3-codex                 (paid, coding)
npx anymodel proxy gemini     # → google/gemini-3.1-flash-lite-preview  (paid)
npx anymodel proxy deepseek   # → deepseek/deepseek-r1-0528            (paid)
npx anymodel proxy mistral    # → mistralai/devstral-2512               (paid, coding)
npx anymodel proxy gemma      # → google/gemma-4-31b-it                (paid, coding)

# Free models:
npx anymodel proxy qwen       # → qwen/qwen3-coder:free                (free)
npx anymodel proxy nemotron   # → nvidia/nemotron-3-super-120b-a12b:free (free)
npx anymodel proxy llama      # → meta-llama/llama-3.3-70b-instruct:free (free)

Or any of 300+ models: npx anymodel proxy --model mistralai/codestral-2508

How It Works

AnyModel client → anymodel proxy (:9090) → OpenRouter / Ollama / LMStudio / llama.cpp

The proxy intercepts requests, strips provider-specific fields, handles retries with exponential backoff, and streams responses back.

Multiple Models at Once

Run separate instances on different ports:

npx anymodel proxy --port 9090 --model openai/gpt-5.4
npx anymodel proxy --port 9091 --model deepseek/deepseek-r1-0528
npx anymodel proxy --port 9092 --model google/gemini-3.1-flash-lite-preview

Local Backends

No internet, no API key — run everything on your machine. AnyModel treats Ollama, LMStudio, and llama.cpp as first-class backends, each with its own preset:

npx anymodel proxy ollama --model gemma3n            # Ollama    (:11434)
npx anymodel proxy lmstudio --model qwen3-coder      # LMStudio  (:1234/v1)
npx anymodel proxy llamacpp --model my-model         # llama.cpp (:8080/v1)

| Backend | Port | API | Best for | |---------|------|-----|----------| | Ollama | 11434 | Native (think:false suppresses reasoning-token waste on qwen3/deepseek) | One-line model pulls, managed model library | | LMStudio | 1234/v1 | OpenAI-compatible | GUI model browser, easy swapping between loaded models | | llama.cpp | 8080/v1 | OpenAI-compatible | Rawest/smallest footprint, max control (context, GPU layers, batch, quantization) |

GGUF portability: The same GGUF model file runs across all three — only the wrapper UX differs. Download once, use anywhere. llama.cpp is the inference engine under Ollama and LMStudio.

Override endpoints via env:

LMSTUDIO_BASE_URL=http://192.168.1.50:1234/v1 npx anymodel proxy lmstudio
LLAMACPP_BASE_URL=http://localhost:9000/v1    npx anymodel proxy llamacpp

Auto-detection priority when no preset is given: OpenRouter key → OpenAI key → Ollama → LMStudio → llama.cpp.

Local-provider smart defaults (1.11.0+)

When you connect to a local provider, AnyModel automatically suppresses your globally-configured MCP servers — which are usually the single biggest cause of slow first-response times (50–60 K tokens of tool schemas that local models can't handle).

npx anymodel on a local provider → loads project ./.claude/.mcp.json if present, else no MCP
Keeps project skills, agents, CLAUDE.md
Remote providers (openrouter, openai) unchanged
Opt out: --full-mcp flag or ANYMODEL_FULL_MCP=1

See LOCAL_SETUP.md for the full guide, including 32 K context setup and full isolation.

Universal Skills (1.16.0+)

SKILL.md is one shared open standard — Claude Code, OpenAI/Codex, Gemini/Antigravity, Cursor, and Copilot all read the same format (a <name>/SKILL.md directory with YAML frontmatter + Markdown body). AnyModel auto-discovers your skills no matter which tool's convention you used, with zero format translation.

At launch, AnyModel scans these roots in both the project working directory and $HOME:

.claude/skills/    .agents/skills/    .codex/skills/    .gemini/skills/    .agent/skills/

Each discovered skill is symlinked into a per-session temp .claude/skills shadow that is passed to the client via --add-dir, so the client's native SKILL.md reader and progressive disclosure handle everything.

Project wins on collision — a project .claude/skills/<name> shadows a foreign-root skill of the same name.
Duplicates and unlinkable skills are logged — foreign-root name collisions and any skills that can't be symlinked are surfaced, not silently dropped.
Add or override roots with ANYMODEL_SKILL_ROOTS — a colon-separated list of absolute paths merged into discovery.

ANYMODEL_SKILL_ROOTS=/opt/shared/skills:/Users/me/extra/skills npx anymodel

OpenAI-Compatible APIs

Works with OpenAI, Azure, Together, Groq, vLLM, and any OpenAI-compatible endpoint:

OPENAI_API_KEY=sk-your-key npx anymodel proxy openai --model gpt-4o

# Terminal 2:
npx anymodel

Bidirectional translation: Anthropic Messages API ↔ OpenAI Chat Completions.

Claude Code --effort / /effort is forwarded as OpenAI reasoning_effort for compatible OpenAI reasoning/codex models on the official OpenAI API. Local OpenAI-compatible servers do not receive it by default; set ANYMODEL_FORWARD_EFFORT=1 only if your endpoint accepts that field.

CLI Reference

anymodel                              # launch AnyModel (connect to proxy)
anymodel proxy <preset>               # start proxy with preset
anymodel proxy --model <id>           # start proxy with any model
anymodel proxy ollama --model <name>  # proxy with local Ollama    (:11434)
anymodel proxy lmstudio --model <id>  # proxy with LMStudio        (:1234/v1)
anymodel proxy llamacpp --model <id>  # proxy with llama.cpp       (:8080/v1)
anymodel claude                       # run with native Claude (no proxy)

Options:
  --model, -m     Model ID
  --port, -p      Port (default: 9090)
  --free-only     Block paid models
  --token, -t     Require auth token for requests
  --rpm           Rate limit requests/min (default: 60)
  --help, -h      Help

Ollama Performance Optimizations

When proxying to Ollama, AnyModel automatically applies several optimizations to make local models work well with coding tools:

System prompt condensing — AI tool prompts are 50-100KB; AnyModel condenses them to fit Ollama's context window (OLLAMA_MAX_SYSTEM_CHARS)
Tool description trimming — truncates verbose tool descriptions to save context (OLLAMA_MAX_TOOL_DESC, default 100 chars)
Tool count limiting — limits tools sent to the model, always keeping core tools (Bash/Read/Write/Edit/Grep/Glob) (OLLAMA_MAX_TOOLS)
Prefix-aware caching — stabilizes system prompt + tool ordering for Ollama KV cache reuse across requests, with date normalization and description-independent hashing
HTTP keep-alive — reuses TCP connections to Ollama
count_tokens mock — responds to /v1/messages/count_tokens locally, preventing cascading 500 errors

Environment Variables

| Variable | Default | Description | |----------|---------|-------------| | OPENROUTER_API_KEY | — | Your OpenRouter key (get one free) | | OPENROUTER_MODEL | — | Default model override | | OPENAI_API_KEY | — | Key for OpenAI-compatible APIs | | OPENAI_BASE_URL | https://api.openai.com/v1 | Custom endpoint for the openai provider | | LMSTUDIO_BASE_URL | http://localhost:1234/v1 | LMStudio endpoint override | | LLAMACPP_BASE_URL | http://localhost:8080/v1 | llama.cpp (llama-server) endpoint override | | PROXY_PORT | 9090 | Proxy port | | ANYMODEL_CLIENT | — | Path to custom Claude-compatible client; otherwise AnyModel uses bundled cli.js, cwd cli.js, then global claude | | ANYMODEL_TOKEN | — | Auth token for remote mode | | ANYMODEL_SKILL_ROOTS | — | Colon-separated absolute paths added to skill discovery roots | | ANYMODEL_FORWARD_EFFORT | auto | 1/0 override for forwarding Claude effort as OpenAI reasoning_effort | | OLLAMA_NUM_CTX | 8192 | Ollama context window size | | OLLAMA_KEEP_ALIVE | 30m | How long Ollama keeps model in GPU memory | | OLLAMA_MAX_SYSTEM_CHARS | 4000 | System prompt condensing threshold | | OLLAMA_MAX_MSG_CHARS | max(4000, num_ctx*3) | Message history threshold | | OLLAMA_TOOLS | auto | Tool capability: auto/on/off | | OLLAMA_MAX_TOOLS | 0 (unlimited) | Max tools to send (core tools always kept) | | OLLAMA_MAX_TOOL_DESC | 100 | Max tool description length in chars |

OPENROUTER_API_KEY is only needed when starting the proxy. OLLAMA_* variables only apply to the Ollama provider.

License

MIT — Anton Abyzov