otto-agent v0.1.0
Otto
An AI agent with behavioral routing. One key, every model.
npx otto-agent

What it does
Otto is a personal AI agent that automatically routes your prompts to the best model for the task:
- Code tasks → Claude Sonnet (highest code quality per dollar)
- Reasoning → Claude Sonnet or Opus (behavioral consistency: 0.89)
- Creative → GPT-4o (most creative output)
- Simple questions → Claude Haiku (fast, cheap)
- Sensitive topics → Claude Opus (highest manipulation resistance)
Routing decisions are based on ConstellationBench — an open behavioral benchmark that measures how models actually behave under pressure, not just how they score on multiple choice tests.
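The task-to-model mapping above can be sketched as a simple lookup table. This is an illustrative sketch, not otto-agent's actual code: the `ROUTES` name, the `routeFor` helper, and the OpenRouter-style model slugs are all assumptions (run /models for the real list).

```typescript
// Illustrative routing table mirroring the list above. Slugs are
// examples of OpenRouter-style model IDs, not necessarily the ones
// otto-agent actually ships with.
const ROUTES: Record<string, string> = {
  code: "anthropic/claude-3.5-sonnet",
  reasoning: "anthropic/claude-3-opus",
  creative: "openai/gpt-4o",
  simple: "anthropic/claude-3-haiku",
  sensitive: "anthropic/claude-3-opus",
  general: "anthropic/claude-3.5-sonnet",
};

// Look up the model for a task type, falling back to the general route.
function routeFor(taskType: string): string {
  return ROUTES[taskType] ?? ROUTES.general;
}
```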
Setup
One requirement: an OpenRouter API key. One key gives you access to every model.
npx otto-agent

First run walks you through setup:
- Your name
- Agent name (default: Otto)
- OpenRouter API key
- Personality vibe: Chill / Direct / Hype / Coach
Config saved to ~/.otto/config.yaml. Soul file at ~/.otto/soul.md.
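The resulting config file might look something like this. The key names below are guesses based on the setup prompts above, not the package's actual schema:

```yaml
# ~/.otto/config.yaml — illustrative only; actual keys may differ
name: Ada
agent: Otto
openrouter_api_key: sk-or-v1-xxxx
vibe: chill
daily_token_budget: 200000  # guessed key for the daily spending limit
```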
Commands
| Command | What it does |
|---------|-------------|
| /help | Show all commands |
| /models | List available models with consistency scores |
| /route <text> | Show which model would handle a prompt |
| /budget | Show today's token spending vs daily limit |
| /clear | Clear conversation history |
| /soul | Show the agent's soul file |
| /vibe <type> | Change personality (chill/direct/hype/coach) |
| /exit | Quit |
Features
- Behavioral routing — automatically picks the best model based on task type
- Token budget — daily spending limit; auto-concise mode kicks in once spending passes 70% of the limit
- Streaming — responses stream in real-time
- Soul file — customize the agent's personality in markdown
- BYOK — your key, your models, your data. Nothing goes through us.
- 9 models — Claude (Sonnet, Opus, Haiku), GPT-4o, Gemini, DeepSeek, Llama, Mistral, Kimi K2.6
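One way the budget rule from the list above could work: once today's spend crosses 70% of the daily limit, switch the system prompt to concise replies. The type and function names here are assumptions, not otto-agent's actual API.

```typescript
// Illustrative sketch of the 70% auto-concise rule; names are assumptions.
interface Budget {
  dailyLimit: number; // tokens allowed per day
  spentToday: number; // tokens used so far today
}

// True once today's spend exceeds the threshold fraction of the limit.
function overConciseThreshold(b: Budget, threshold = 0.7): boolean {
  return b.spentToday / b.dailyLimit > threshold;
}

// Swap in a terse system prompt when the budget is running low.
function systemPrompt(b: Budget): string {
  return overConciseThreshold(b)
    ? "Answer as concisely as possible."
    : "Answer normally.";
}
```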
How routing works
Every prompt is classified by task type (code, reasoning, creative, simple, sensitive, general). Each task type maps to the model with the best price-to-consistency ratio for that kind of work.
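A minimal sketch of that classification step, with keyword heuristics standing in for whatever classifier otto-agent actually uses. The function name and patterns are illustrative assumptions:

```typescript
type TaskType = "code" | "reasoning" | "creative" | "simple" | "sensitive" | "general";

// Toy keyword-based classifier; the real package may use an LLM or
// richer heuristics. The patterns below are illustrative only.
function classify(prompt: string): TaskType {
  const p = prompt.toLowerCase();
  if (/\b(bug|code|function|compile|refactor)\b/.test(p)) return "code";
  if (/\b(prove|why|analyze|reason)\b/.test(p)) return "reasoning";
  if (/\b(story|poem|slogan|brainstorm)\b/.test(p)) return "creative";
  if (/\b(medical|legal|mental health)\b/.test(p)) return "sensitive";
  // Short questions go to the cheap, fast route.
  if (p.split(/\s+/).length <= 6 && p.endsWith("?")) return "simple";
  return "general";
}
```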
Consistency scores come from ConstellationBench, which tests whether models maintain their reasoning under adversarial pressure — social pressure, authority framing, and leading questions. A model that scores 0.89 means it holds its position 89% of the time when challenged. A model at 0.42 folds nearly half the time.
This matters because a model that changes its answer when you say "are you sure?" is not reliable — regardless of how well it scores on MMLU.
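A hypothetical illustration of a "best price-to-consistency ratio" pick: among the candidate models for a task, prefer the one with the highest consistency per dollar. The numbers below are made up for the example, not ConstellationBench scores:

```typescript
interface ModelInfo {
  id: string;
  consistency: number;  // hold rate under pressure, 0..1
  pricePerMTok: number; // USD per million output tokens
}

// Pick the model with the highest consistency-per-dollar ratio.
function bestValue(models: ModelInfo[]): ModelInfo {
  return models.reduce((best, m) =>
    m.consistency / m.pricePerMTok > best.consistency / best.pricePerMTok ? m : best
  );
}
```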
Privacy
- Your API key is stored locally in ~/.otto/config.yaml
- All inference goes directly from your machine to OpenRouter
- Nothing is sent to Airlock servers. Ever.
- No telemetry. No analytics. No tracking.
License
MIT — Airlock Technologies LLC
