otto-agent v0.1.0
Otto
An AI agent with behavioral routing. One key, every model.
npx otto-agent

What it does
Otto is a personal AI agent that automatically routes your prompts to the best model for the task:
- Code tasks → Claude Sonnet (highest code quality per dollar)
- Reasoning → Claude Sonnet or Opus (behavioral consistency: 0.89)
- Creative → GPT-4o (most creative output)
- Simple questions → Claude Haiku (fast, cheap)
- Sensitive topics → Claude Opus (highest manipulation resistance)
Routing decisions are based on ConstellationBench — an open behavioral benchmark that measures how models actually behave under pressure, not just how they score on multiple choice tests.
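The task-to-model mapping above can be sketched as a simple lookup table. This is an illustrative sketch, not otto-agent's actual code: the `ROUTES` name, the `routeFor` helper, and the OpenRouter-style model slugs are all assumptions (run /models for the real list).

```typescript
// Illustrative routing table mirroring the list above. Slugs are
// examples of OpenRouter-style model IDs, not necessarily the ones
// otto-agent actually ships with.
const ROUTES: Record<string, string> = {
  code: "anthropic/claude-3.5-sonnet",
  reasoning: "anthropic/claude-3-opus",
  creative: "openai/gpt-4o",
  simple: "anthropic/claude-3-haiku",
  sensitive: "anthropic/claude-3-opus",
  general: "anthropic/claude-3.5-sonnet",
};

// Look up the model for a task type, falling back to the general route.
function routeFor(taskType: string): string {
  return ROUTES[taskType] ?? ROUTES.general;
}
```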
Setup
One requirement: an OpenRouter API key. One key gives you access to every model.
npx otto-agent

First run walks you through setup:
- Your name
- Agent name (default: Otto)
- OpenRouter API key
- Personality vibe: Chill / Direct / Hype / Coach
Config saved to ~/.otto/config.yaml. Soul file at ~/.otto/soul.md.
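The resulting config file might look something like this. The key names below are guesses based on the setup prompts above, not the package's actual schema:

```yaml
# ~/.otto/config.yaml — illustrative only; actual keys may differ
name: Ada
agent: Otto
openrouter_api_key: sk-or-v1-xxxx
vibe: chill
daily_token_budget: 200000  # guessed key for the daily spending limit
```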
Commands
| Command | What it does |
|---------|-------------|
| /help | Show all commands |
| /models | List available models with consistency scores |
| /route <text> | Show which model would handle a prompt |
| /budget | Show today's token spending vs daily limit |
| /clear | Clear conversation history |
| /soul | Show the agent's soul file |
| /vibe <type> | Change personality (chill/direct/hype/coach) |
| /exit | Quit |
Features
- Behavioral routing — automatically picks the best model based on task type
- Token budget — daily spending limit; auto-concise mode kicks in once spending passes 70% of the limit
- Streaming — responses stream in real-time
- Soul file — customize the agent's personality in markdown
- BYOK — your key, your models, your data. Nothing goes through us.
- 9 models — Claude (Sonnet, Opus, Haiku), GPT-4o, Gemini, DeepSeek, Llama, Mistral, Kimi K2.6
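One way the budget rule from the list above could work: once today's spend crosses 70% of the daily limit, switch the system prompt to concise replies. The type and function names here are assumptions, not otto-agent's actual API.

```typescript
// Illustrative sketch of the 70% auto-concise rule; names are assumptions.
interface Budget {
  dailyLimit: number; // tokens allowed per day
  spentToday: number; // tokens used so far today
}

// True once today's spend exceeds the threshold fraction of the limit.
function overConciseThreshold(b: Budget, threshold = 0.7): boolean {
  return b.spentToday / b.dailyLimit > threshold;
}

// Swap in a terse system prompt when the budget is running low.
function systemPrompt(b: Budget): string {
  return overConciseThreshold(b)
    ? "Answer as concisely as possible."
    : "Answer normally.";
}
```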
How routing works
Every prompt is classified by task type (code, reasoning, creative, simple, sensitive, general). Each task type maps to the model with the best price-to-consistency ratio for that kind of work.
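A minimal sketch of that classification step, with keyword heuristics standing in for whatever classifier otto-agent actually uses. The function name and patterns are illustrative assumptions:

```typescript
type TaskType = "code" | "reasoning" | "creative" | "simple" | "sensitive" | "general";

// Toy keyword-based classifier; the real package may use an LLM or
// richer heuristics. The patterns below are illustrative only.
function classify(prompt: string): TaskType {
  const p = prompt.toLowerCase();
  if (/\b(bug|code|function|compile|refactor)\b/.test(p)) return "code";
  if (/\b(prove|why|analyze|reason)\b/.test(p)) return "reasoning";
  if (/\b(story|poem|slogan|brainstorm)\b/.test(p)) return "creative";
  if (/\b(medical|legal|mental health)\b/.test(p)) return "sensitive";
  // Short questions go to the cheap, fast route.
  if (p.split(/\s+/).length <= 6 && p.endsWith("?")) return "simple";
  return "general";
}
```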
Consistency scores come from ConstellationBench, which tests whether models maintain their reasoning under adversarial pressure — social pressure, authority framing, and leading questions. A model that scores 0.89 means it holds its position 89% of the time when challenged. A model at 0.42 folds nearly half the time.
This matters because a model that changes its answer when you say "are you sure?" is not reliable — regardless of how well it scores on MMLU.
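A hypothetical illustration of a "best price-to-consistency ratio" pick: among the candidate models for a task, prefer the one with the highest consistency per dollar. The numbers below are made up for the example, not ConstellationBench scores:

```typescript
interface ModelInfo {
  id: string;
  consistency: number;  // hold rate under pressure, 0..1
  pricePerMTok: number; // USD per million output tokens
}

// Pick the model with the highest consistency-per-dollar ratio.
function bestValue(models: ModelInfo[]): ModelInfo {
  return models.reduce((best, m) =>
    m.consistency / m.pricePerMTok > best.consistency / best.pricePerMTok ? m : best
  );
}
```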
Privacy
- Your API key is stored locally in ~/.otto/config.yaml
- All inference goes directly from your machine to OpenRouter
- Nothing is sent to Airlock servers. Ever.
- No telemetry. No analytics. No tracking.
License
MIT — Airlock Technologies LLC
