@auwra/n8n-nodes-ai-router
v0.1.6
Published
Automatically route AI tasks to the most appropriate and cost-effective model across multiple providers
Readme
@auwra/n8n-nodes-ai-router
An n8n community node that automatically routes each prompt to the best AI model across Anthropic, OpenAI, Google Gemini, Mistral AI, Groq, and local Ollama — based on what the task actually needs.
Instead of hardcoding one model, the AI Router detects whether a prompt is a coding task, analysis, creative writing, summarization, vision, or chat, then picks the optimal model for your priority: cheapest, fastest, highest quality, or balanced. It falls back to the next-best model automatically if the first one fails.
Table of contents
- Quick start
- Installation
- Configuration
- How routing works
- Choosing the right mode
- Model registry
- Keeping the registry up to date
- Adding a custom model
- Example workflows
- Changelog
- Contributing
- License
Quick start
- Install the node (see Installation)
- Add the AI Router node to any workflow
- Set up credentials — paste in at least one API key (Groq has a free tier)
- Connect your prompt source and run
That's it. The node detects the task, picks the best model, calls it, and returns response. No configuration required for basic use.
Installation
Via n8n Community Nodes UI (recommended)
- Go to Settings → Community Nodes
- Click Install
- Enter
@auwra/n8n-nodes-ai-router - Click Install and restart if prompted
Via npm (self-hosted)
cd ~/.n8n
npm install @auwra/n8n-nodes-ai-router
# Restart n8nCredentials setup
The node uses a single credential object called AI Router Credentials that holds all your API keys in one place. Fill in only the providers you have — the router automatically skips providers with no key.
| Field | Where to get it |
|---|---|
| Anthropic API Key | console.anthropic.com |
| OpenAI API Key | platform.openai.com |
| Google Gemini API Key | aistudio.google.com |
| Mistral AI API Key | console.mistral.ai |
| Groq API Key (free tier) | console.groq.com |
| Ollama Base URL | http://localhost:11434 (no key needed) |
One provider is enough to get started. Groq is the easiest: free tier, no credit card.
Configuration
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| Prompt | string | — | The user message to send to the AI model |
| System Prompt | string | — | Optional system-level instruction: persona, output format, constraints |
| Temperature | number | 0.7 | Sampling temperature 0–2. 0 = deterministic, 2 = very creative. Ignored by reasoning models. |
Routing
| Parameter | Type | Default | Description |
|---|---|---|---|
| Routing Mode | enum | auto | How to prioritise model selection |
| Task Hint | enum | auto-detect | Override automatic task detection |
Filtering / budget
| Parameter | Type | Default | Description |
|---|---|---|---|
| Allowed Providers | multiselect | all | Which providers are eligible |
| Max Cost Per 1K Tokens | number | 0 (no limit) | Hard budget cap in USD — models above this are excluded |
Generation
| Parameter | Type | Default | Description |
|---|---|---|---|
| Max Tokens | number | 0 (provider default) | Maximum tokens to generate |
Behaviour
| Parameter | Type | Default | Description |
|---|---|---|---|
| Enable Fallback | boolean | true | Retry with next-best model on 429/5xx errors (up to 3 attempts) |
| Dry Run (Routing Only) | boolean | false | Select the best model and return routing info but do NOT call any API — no tokens spent |
| Max Items Per Execution | number | 10 | Hard cap on items per run. Set to 0 to disable. |
Output options
| Parameter | Type | Default | What it adds to output |
|---|---|---|---|
| Include Model Info | boolean | false | modelUsed, providerUsed, attemptsTaken, inputTokens, outputTokens |
| Include Detected Task | boolean | false | detectedTask, detectedTaskConfidence |
| Include Score Breakdown | boolean | false | scoreBreakdown — top-3 candidates with final score and per-criterion sub-scores |
| Include Estimated Cost | boolean | false | estimatedCostUSD — calculated from token counts × registry pricing |
Routing modes
| Mode | Best for | What it optimises |
|---|---|---|
| auto | General-purpose workflows | Balanced mix of quality, cost, and speed |
| quality | Critical outputs, production content | Task-specific model quality above all else |
| cost | High-volume, budget-sensitive workflows | Cheapest model that can do the job |
| speed | Real-time, latency-sensitive workflows | Lowest-latency model first |
| local | Privacy-sensitive data, offline use | Ollama only — zero cost, no data leaves your machine |
Task hint values
| Value | Auto-detected when prompt contains |
|---|---|
| coding | Code snippets, language names, file extensions, debug/refactor/implement |
| writing | write/draft/compose + document type (email, blog, essay, story, ad copy…) |
| analysis | analyze, evaluate, compare, pros and cons, explain why, root cause |
| summarization | summarize, tl;dr, key points, in N bullets, executive summary |
| classification | classify, categorize, sentiment, true/false, spam detection |
| vision | Image URLs, base64 image data, OCR, visual content |
| embeddings | embed, vector, semantic search, RAG, cosine similarity |
| chat | Greetings, open-ended questions (default fallback) |
How routing works
flowchart TD
A([Prompt received]) --> B{Task hint set?}
B -- Yes --> D[Use hint as task type]
B -- No --> C[taskDetector\nweighted regex patterns]
C --> D
D --> E[scoreModels\nfilter + score all candidates]
E --> F{allowedProviders filter}
F --> G{maxCostPer1K budget cap}
G --> H{capability requirements\nvision / embeddings}
H --> I{context window\n≥ prompt length}
I --> J[Score each model\ntaskFit · cost · latency · contextSize]
J --> K[Sort descending — best first]
K --> L[executeWithFallback\nattempt 1: top model]
L -- success --> M([Output])
L -- 429 / 5xx / network --> N{fallback enabled?}
N -- Yes --> O[attempt 2: next model]
O -- success --> M
O -- fail --> P[attempt 3: next model]
P -- success --> M
P -- all fail --> Q([Error])
N -- No --> Q
L -- 400 / 401 / 403 --> QScoring formula
Each candidate model gets a score (0–1):
score = w_taskFit × taskAffinity[task]
+ w_cost × (1 − blendedPer1K / maxInPool)
+ w_latency × (1 − (latencyTier − 1) / 2)
+ w_context × log(contextWindow + 1) / log(maxInPool + 1)Context uses log normalization so a single model with a huge context window (e.g. 10M tokens) doesn't collapse every other model's score to near zero.
Weights by mode:
| Mode | taskFit | cost | latency | contextSize | |---|---|---|---|---| | auto | 0.35 | 0.25 | 0.20 | 0.20 | | quality | 0.70 | 0.05 | 0.05 | 0.20 | | cost | 0.20 | 0.60 | 0.10 | 0.10 | | speed | 0.25 | 0.15 | 0.50 | 0.10 | | local | 0.40 | 0.40 | 0.10 | 0.10 |
Choosing the right mode
Use quality when: output accuracy matters (production content, customer-facing responses, complex reasoning). The router will pick the model most specialised for the detected task — Claude Opus for analysis, Devstral for code, Gemini Pro for vision.
Use cost when: you're running high volume and the task is simple (classification, summarization, short chat). Expect Groq or Gemini Flash Lite to win most of the time.
Use speed when: you need sub-second responses (real-time chat, live autocomplete). All tier-1 models are fast; the router picks the most capable one among them.
Use auto when: you're unsure. It's a sensible middle ground — it won't pick the most expensive model for a simple greeting, but it won't use the cheapest one for a complex analysis either.
Use local when: prompts contain sensitive data you can't send to cloud APIs, or you're working offline.
Combine mode with Allowed Providers for precise control: quality mode with only anthropic + openai ensures only flagship models are used.
Model registry
Pricing verified April 2026. blendedPer1K = (input×0.7 + output×0.3) / 1000.
Anthropic
| Model ID | Input/1M | Output/1M | Context | Best for |
|---|---|---|---|---|
| claude-opus-4-6 | $5.00 | $25.00 | 1M | Complex analysis, deep reasoning |
| claude-sonnet-4-6 | $3.00 | $15.00 | 1M | Balanced quality across all tasks |
| claude-haiku-4-5-20251001 | $1.00 | $5.00 | 200K | Fast chat, classification, vision |
OpenAI
| Model ID | Input/1M | Output/1M | Context | Best for |
|---|---|---|---|---|
| gpt-4.1 | $2.00 | $8.00 | 1M | General chat, coding, vision |
| gpt-4o | $2.50 | $10.00 | 128K | Multimodal, vision-heavy tasks |
| o3 | $2.00 | $8.00 | 200K | Deep reasoning, complex analysis (no streaming) |
| o4-mini | $1.10 | $4.40 | 200K | Cheaper reasoning, STEM, code |
| gpt-4o-mini | $0.15 | $0.60 | 128K | Cheap chat, classification, vision |
Google Gemini
| Model ID | Input/1M | Output/1M | Context | Best for |
|---|---|---|---|---|
| gemini-3.1-pro-preview | $2.00 | $12.00 | 1M | Cutting-edge quality (preview) |
| gemini-2.5-pro | $1.25 | $10.00 | 1M | Long-context analysis, vision |
| gemini-3-flash-preview | $0.50 | $3.00 | 1M | Fast next-gen tasks (preview) |
| gemini-2.5-flash | $0.30 | $2.50 | 1M | Fast summarization, cheap vision |
| gemini-2.5-flash-lite | $0.10 | $0.40 | 1M | Ultra-cheap classification |
Mistral
| Model ID | Input/1M | Output/1M | Context | Best for |
|---|---|---|---|---|
| mistral-large-2512 | $0.50 | $1.50 | 262K | Cost-efficient coding, analysis |
| mistral-medium-3 | $0.40 | $2.00 | 131K | Balanced general tasks |
| mistral-small-4-0-26-03 | $0.10 | $0.30 | 262K | Creative writing, chat |
| devstral-2-25-12 | $0.10 | $0.30 | 256K | Code generation (SWE-bench 72.2%) |
Groq (ultra-fast inference)
| Model ID | Input/1M | Output/1M | Context | Best for |
|---|---|---|---|---|
| moonshotai/kimi-k2-instruct | $1.00 | $3.00 | 1M | Long-context analysis, agentic |
| llama-3.3-70b-versatile | $0.59 | $0.79 | 128K | Low-latency general tasks |
| qwen/qwen3-32b | $0.29 | $0.59 | 128K | Coding, multilingual, reasoning |
| openai/gpt-oss-120b | $0.15 | $0.60 | 128K | Balanced quality at ~500 t/s |
| meta-llama/llama-4-scout-17b-16e-instruct | $0.11 | $0.34 | 10M | Huge-context vision, ultra-cheap |
| openai/gpt-oss-20b | $0.075 | $0.30 | 128K | Fastest throughput (~1000 t/s) |
| llama-3.1-8b-instant | $0.05 | $0.08 | 128K | Cheapest, sub-100ms responses |
Ollama (local)
Any model you've pulled via ollama pull <model> works. Set Ollama Model to the model name and Ollama Base URL to your instance address.
Keeping the registry up to date
Provider APIs change quickly. Use the built-in sync script to check for stale or new model IDs:
npm run build
npm run sync:modelsThe script hits each provider's live /models endpoint and reports:
- Stale — IDs in the registry that no longer exist
- New — IDs available on the provider not yet in the registry
What must still be updated manually in modelRegistry.ts:
- Pricing (check each provider's pricing page)
- Task affinity scores
- Latency tier and context window size
Recommended cadence: run sync:models monthly or after a major model release.
Adding a custom model
Edit only one file: nodes/AiRouter/router/modelRegistry.ts. Append a new entry to MODEL_REGISTRY:
{
id: 'your-model-api-id', // exact string sent in API requests
provider: 'openai', // must match an existing ProviderType
displayName: 'My Model',
pricing: {
inputPer1M: 1.00,
outputPer1M: 4.00,
blendedPer1K: 0.0019, // (1.00×0.7 + 4.00×0.3) / 1000
},
capabilities: {
supportsVision: false,
supportsEmbeddings: false,
supportsStreaming: true,
supportsReasoningMode: false,
isLocal: false,
contextWindow: 128_000,
},
latencyTier: 1, // 1=fast 2=moderate 3=slow/reasoning
taskAffinity: {
coding: 0.88,
chat: 0.85,
// Omit tasks where the model has no particular strength (defaults to 0.5)
},
},Then rebuild: npm run build
For a new provider (new API format), see CONTRIBUTING.md.
Example workflows
Basic chatbot with smart routing
- Webhook → receives
{ "message": "..." } - AI Router
- Prompt:
{{ $json.message }} - Mode:
auto - Enable Fallback: on
- Prompt:
- Respond to Webhook →
{{ $json.response }}
The router detects whether the message is a coding question, analysis request, or casual chat and picks accordingly.
Quality-first content pipeline
- Schedule Trigger → fires daily
- HTTP Request → fetches data to process
- AI Router
- Prompt:
Analyze the following data and write a professional summary: {{ $json.data }} - Mode:
quality - Allowed Providers: Anthropic, OpenAI, Google
- Include Model Info: on
- Prompt:
- Google Sheets → saves
response,modelUsed, token counts
Mode quality with flagship providers ensures you always get the best model for the task. Token counts let you track spend.
Budget-capped high-volume classification
- Spreadsheet Trigger → rows to classify
- AI Router
- Prompt:
Classify this support ticket as "billing", "technical", or "general": {{ $json.ticket }} - Task Hint:
classification - Mode:
cost - Max Cost Per 1K Tokens:
0.001 - Max Items Per Execution:
100
- Prompt:
- Spreadsheet → write back
{{ $json.response }}
Hard-coding classification as the task hint skips detection overhead and ensures the cost-efficient classification models are preferred. The budget cap keeps costs bounded.
Full output (all options enabled)
{
"response": "Here is the TypeScript function you requested:\n\n```typescript\nfunction debounce...",
"modelUsed": "devstral-2-25-12",
"providerUsed": "mistral",
"attemptsTaken": 1,
"inputTokens": 25,
"outputTokens": 459,
"estimatedCostUSD": 0.0000073,
"detectedTask": "coding",
"detectedTaskConfidence": 0.91,
"scoreBreakdown": [
{ "model": "devstral-2-25-12", "provider": "mistral", "score": 0.9289, "breakdown": { "taskFit": 1.000, "cost": 0.985, "latency": 0.500, "contextSize": 0.772 } },
{ "model": "moonshotai/kimi-k2-instruct", "provider": "groq", "score": 0.8800, "breakdown": { "taskFit": 0.880, "cost": 0.855, "latency": 1.000, "contextSize": 0.857 } },
{ "model": "o3", "provider": "openai", "score": 0.8632, "breakdown": { "taskFit": 0.970, "cost": 0.655, "latency": 0.000, "contextSize": 0.757 } }
]
}Dry-run output
When Dry Run is enabled, no API call is made and the output is:
{
"dryRun": true,
"selectedModel": "devstral-2-25-12",
"selectedProvider": "mistral",
"selectedScore": 0.9289,
"detectedTask": "coding",
"detectedTaskConfidence": 0.91,
"scoreBreakdown": [ ... ]
}Changelog
v0.1.6
- Add:
System Promptparameter — optional system-level instruction passed to all providers - Add:
Temperatureparameter (0–2, default 0.7) — ignored automatically for reasoning models - Add:
Dry Runtoggle — returns routing decision without spending any tokens; includes selected model, score, detected task, and score breakdown - Add:
Include Detected Taskoutput option — exposesdetectedTaskanddetectedTaskConfidencein the output - Add:
Include Score Breakdownoutput option — exposes top-3 ranked candidates with final scores and per-criterion sub-scores (taskFit, cost, latency, contextSize) - Add:
Include Estimated Costoutput option — computesestimatedCostUSDfrom token counts × registry pricing
v0.1.5
- Fix: Quality mode now reliably selects flagship models — context score uses log normalization (prevents a single 10M-context model from collapsing all 1M-context scores to 0.1), and quality-mode weights raised
taskFitto 0.70 - Add:
Max Items Per Executionparameter (default10) — hard cap on items processed per run to prevent cost drain from accidental loops or large batches
v0.1.4
- Fix: Anthropic requests no longer hang indefinitely — timeout now correctly catches
AbortErrorin Node.js - Fix:
max_tokensalways included in Anthropic requests (required by the API) - Fix: Anthropic responses from reasoning models parsed correctly — text block found by type, not position
v0.1.2
- Initial public release
Contributing
See CONTRIBUTING.md for:
- How to add a new model (one object in an array)
- How to add a new provider adapter
- Commit conventions
- How to test locally
