@therr/tiny-code
v0.4.0
A small, extensible CLI coding agent with interchangeable Anthropic, Gemini, OpenAI, DeepSeek, Qwen, and local (Ollama) models.
tiny-code
A small, extensible CLI coding agent built around one constraint: keep token usage low. As coding-agent costs climb, tiny-code automates the savings so you don't have to. Interactive terminal REPL, interchangeable Anthropic, Gemini, OpenAI, DeepSeek, Qwen Coder, and local (Ollama) models, and just the core features you actually use: read/write/edit files, run shell commands, search code, and a custom commands/skills system. No business logic baked in.
Run cheap, open-weight models locally and escalate heavy work to a frontier model only when needed — see Local models & cost-aware routing.
Status: early (v0.x). Published as @therr/tiny-code; the binary is tiny-code. APIs and config may still change between minor versions.
Install
npm install -g @therr/tiny-code
Or run from source:
npm install
npm run build
node dist/cli.js
Setup
Provide at least one API key. If several are set, the default is the first available in this order: Anthropic, Gemini, OpenAI, DeepSeek, Qwen.
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=...
export OPENAI_API_KEY=sk-...
export DEEPSEEK_API_KEY=sk-...
export QWEN_API_KEY=sk-... # Alibaba DashScope key (DASHSCOPE_API_KEY also works)
OpenAI, DeepSeek, and Qwen are hosted, OpenAI-compatible models. Override their
endpoints with TINY_CODE_OPENAI_URL (e.g. for Azure OpenAI), TINY_CODE_DEEPSEEK_URL,
or TINY_CODE_QWEN_URL (or openaiBaseUrl / deepseekBaseUrl / qwenBaseUrl in config)
— e.g. to point Qwen at the international DashScope host.
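The key-precedence rule above (first available key wins, in the order Anthropic, Gemini, OpenAI, DeepSeek, Qwen) can be sketched roughly as follows. This is an illustration only: `pickDefaultProvider` and `KEY_VARS` are hypothetical names, and treating a blank value as "not set" is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch: pick the default provider from the environment.
// The order mirrors the Setup section; all names here are illustrative.
type Provider = "anthropic" | "gemini" | "openai" | "deepseek" | "qwen";

// Each provider lists the env vars that can supply its key, in lookup order.
const KEY_VARS: ReadonlyArray<readonly [Provider, readonly string[]]> = [
  ["anthropic", ["ANTHROPIC_API_KEY"]],
  ["gemini", ["GEMINI_API_KEY"]],
  ["openai", ["OPENAI_API_KEY"]],
  ["deepseek", ["DEEPSEEK_API_KEY"]],
  ["qwen", ["QWEN_API_KEY", "DASHSCOPE_API_KEY"]],
];

function pickDefaultProvider(
  env: Record<string, string | undefined>,
): Provider | undefined {
  for (const [provider, vars] of KEY_VARS) {
    // Assumption: a key counts as "set" only if it is non-empty after trimming.
    if (vars.some((name) => (env[name] ?? "").trim() !== "")) {
      return provider;
    }
  }
  return undefined; // no hosted key found
}
```

In a Node process this would be called as `pickDefaultProvider(process.env)`.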
Usage
tiny-code # start the REPL (uses an available key)
tiny-code --provider gemini # force a provider
tiny-code --model claude-opus-4-8
tiny-code --provider openai --model gpt-4.1 # OpenAI (also o3, o4-mini, …)
tiny-code --provider deepseek --model deepseek-v4-pro # DeepSeek's coding model
tiny-code --provider qwen --model qwen3-coder-plus # Qwen Coder
tiny-code --provider ollama --model gemma3:12b # run a local model (no API cost)
In the REPL: type a request, watch it work. Mutating actions (writes, edits, shell commands) prompt for approval unless pre-approved in config.
/help - list commands
/costs - session token usage, estimated $ cost, and cost-saving tips
/clear - clear the conversation history and start fresh
/models - show known models, pricing, and the active one (see below)
/priority [performance|balanced|cost] - show or switch the cost/performance priority mid-session; re-picks the auto-selected model unless one is pinned (see below)
/improve - reflect on the session and propose an improvement PR (see below)
/<name> [args] - run a custom command (see below)
/exit - quit
Local models & cost-aware routing
tiny-code talks to a local Ollama server over its
OpenAI-compatible API, so any model you've pulled is available — including
Google Gemma 3 (gemma3:4b, gemma3:12b, gemma3:27b) and
qwen2.5-coder (the default, which tool-calls reliably).
ollama serve
ollama pull qwen2.5-coder:7b
tiny-code --provider ollama --model qwen2.5-coder:7b
Mind the compute cost. Local models are free of API charges but use your machine's RAM/VRAM. On startup with an Ollama model, tiny-code prints how much memory the model needs versus what's free, and warns if it likely won't fit or if the model is too small (≤3B) to tool-call reliably. Rough guide (≈Q4):
| Model | ~RAM needed | Good for |
| ------------ | ----------- | --------------------------------- |
| gemma3:1b | ~1 GB | trivial text (poor at tool calls) |
| gemma3:4b | ~3 GB | lightweight edits, search |
| gemma3:12b | ~7 GB | most coding tasks |
| gemma3:27b | ~16 GB | stronger reasoning |
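The startup check described above (memory needed versus free, plus the small-model warning) might look something like this. It is a sketch under assumptions: `LocalModelInfo` and `startupWarnings` are hypothetical names, and the free-memory figure is passed in rather than measured (in Node it could come from `os.freemem()`).

```typescript
// Hypothetical startup check; the interface and function names are assumptions.
interface LocalModelInfo {
  name: string;
  paramsBillions: number; // e.g. 12 for gemma3:12b
  approxRamGb: number; // rough Q4 footprint, as in the table above
}

function startupWarnings(model: LocalModelInfo, freeRamGb: number): string[] {
  const warnings: string[] = [];
  // Warn when the model's estimated footprint exceeds what is currently free.
  if (model.approxRamGb > freeRamGb) {
    warnings.push(
      `${model.name} needs ~${model.approxRamGb} GB but only ${freeRamGb.toFixed(1)} GB is free`,
    );
  }
  // The 3B cutoff is the one the README gives for unreliable tool calls.
  if (model.paramsBillions <= 3) {
    warnings.push(`${model.name} may be too small (<=3B) to tool-call reliably`);
  }
  return warnings;
}
```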
Local-first routing. Set a routing of local-first with an escalateTo
target: every turn starts on the cheap/local model, and tiny-code escalates to
the frontier model when a turn looks heavy (refactors, debugging, multi-file
work) or when the local model gets stuck and calls the built-in escalate tool.
You get local speed and zero cost for the bulk of the work, and frontier power
only for the hard parts. Run /costs any time for usage, spend, and tips.
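To make the routing decision concrete, here is one way the per-turn choice could be expressed. The keyword heuristic is purely an assumption for illustration; the README only says that heavy-looking turns and calls to the escalate tool trigger escalation, not how "heavy" is detected. `routeTurn`, `TurnSignals`, and `HEAVY_HINTS` are hypothetical names.

```typescript
// Hypothetical per-turn router. The regex hints are an assumed heuristic,
// not tiny-code's actual detection logic.
type Target = "local" | "frontier";

const HEAVY_HINTS: RegExp[] = [/\brefactor/i, /\bdebug/i, /\bmulti-?file\b/i];

interface TurnSignals {
  prompt: string;
  escalateToolCalled: boolean; // the local model invoked the built-in escalate tool
}

function routeTurn(signals: TurnSignals): Target {
  // An explicit escalate call always wins.
  if (signals.escalateToolCalled) return "frontier";
  // Otherwise escalate only when the prompt looks heavy.
  return HEAVY_HINTS.some((re) => re.test(signals.prompt)) ? "frontier" : "local";
}
```

A matching configuration follows.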
{
"provider": "ollama",
"model": "qwen2.5-coder:7b",
"routing": "local-first",
"escalateTo": { "provider": "anthropic", "model": "claude-opus-4-8" }
}
Project context
On start, the agent walks up from the working directory looking for AGENTS.md
(or CLAUDE.md) and includes it in the system prompt — put your project
conventions there.
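The upward search can be sketched as below. This is a minimal illustration, not tiny-code's code: `findContextFile` is a hypothetical name, and checking AGENTS.md before CLAUDE.md in each directory is an assumption. The `exists` parameter is injectable only to keep the sketch easy to test.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// File names come from the README; the order between them is an assumption.
const CONTEXT_FILES = ["AGENTS.md", "CLAUDE.md"];

function findContextFile(
  startDir: string,
  exists: (p: string) => boolean = fs.existsSync,
): string | undefined {
  let dir = path.resolve(startDir);
  for (;;) {
    for (const name of CONTEXT_FILES) {
      const candidate = path.join(dir, name);
      if (exists(candidate)) return candidate;
    }
    const parent = path.dirname(dir);
    if (parent === dir) return undefined; // reached the filesystem root
    dir = parent;
  }
}
```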
Custom commands (skills)
Drop markdown files with YAML frontmatter into ./.agent/commands/ (per project)
or ~/.config/tiny-code/commands/ (global):
---
name: review
description: Review the staged diff for bugs
argument-hint: "[--strict]"
---
Review the staged git diff. Run `git diff --cached`, then report any bugs,
risky changes, or missing tests. $ARGUMENTS
Invoke with /review --strict. $ARGUMENTS is replaced with whatever you type
after the command (appended if the placeholder is absent).
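The substitution rule is small enough to sketch directly. `expandCommand` is a hypothetical helper; replacing every occurrence of the placeholder and appending on a new line are assumptions, since the README does not specify either detail.

```typescript
// Hypothetical helper: expand a command body with the user's arguments.
const PLACEHOLDER = "$ARGUMENTS";

function expandCommand(body: string, args: string): string {
  if (body.includes(PLACEHOLDER)) {
    // split/join replaces every occurrence without having to regex-escape "$".
    return body.split(PLACEHOLDER).join(args);
  }
  // No placeholder: append the arguments (if any) on a new line.
  return args.trim() === "" ? body : `${body}\n${args}`;
}
```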
Configuration
Optional tiny-code.config.json in the project root (or
~/.config/tiny-code/config.json). Precedence: defaults < config file < env <
CLI flags.
{
"provider": "anthropic",
"model": "claude-opus-4-8",
"ollamaBaseUrl": "http://localhost:11434/v1",
"priority": "balanced",
"maxTokens": 16000,
"thinking": true,
"effort": "high",
"maxIterations": 50,
"routing": "off",
"escalateTo": { "provider": "anthropic", "model": "claude-opus-4-8" },
"allow": {
"tools": [],
"bash": ["npm test", "git status", "git diff"],
"write": ["src/**"]
}
}
allow pre-approves mutating actions so they skip the confirmation prompt:
bash matches command prefixes, write matches path globs for write/edit.
routing: "local-first" plus escalateTo enables cost-aware routing (see
above); it defaults to local-first
automatically whenever escalateTo is present. ollamaBaseUrl points at your
Ollama server's OpenAI-compatible endpoint; deepseekBaseUrl / qwenBaseUrl
override the DeepSeek and Qwen (DashScope) endpoints.
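As an illustration of the allow rules (bash by command prefix, write by path glob), a matcher could be written along these lines. All names are hypothetical and the glob dialect is an assumption: here `**` spans directories and `*` stays within one path segment.

```typescript
// Hypothetical matcher for the "allow" block; tiny-code's real matching
// rules may differ.
interface AllowConfig {
  bash: string[];
  write: string[];
}

// Convert a simple glob to a RegExp: "**" spans directories, "*" does not.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // protect "**" before handling single "*"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${pattern}$`);
}

function isBashAllowed(command: string, allow: AllowConfig): boolean {
  const cmd = command.trim();
  return allow.bash.some((prefix) => cmd.startsWith(prefix));
}

function isWriteAllowed(filePath: string, allow: AllowConfig): boolean {
  return allow.write.some((glob) => globToRegExp(glob).test(filePath));
}
```

Note that a plain prefix check like this one would also accept `npm test && something-else`; the README does not say how tiny-code treats chained commands, so keep pre-approved prefixes narrow.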
Approximate cloud pricing used for the /costs estimate lives in the model
catalog (src/models/catalog.ts) — edit it to match current vendor rates.
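The precedence order stated at the top of this section (defaults < config file < env < CLI flags) amounts to a layered merge. The sketch below uses a trimmed-down, hypothetical `Config` shape; the `"anthropic"` default provider is a placeholder, since the real default depends on which keys are available.

```typescript
// Hypothetical config shape: only a few of the README's keys, for illustration.
interface Config {
  provider: string;
  model?: string;
  priority: "performance" | "balanced" | "cost";
  effort: "low" | "medium" | "high";
}

// Placeholder defaults; priority and effort defaults are the ones the README states.
const DEFAULTS: Config = { provider: "anthropic", priority: "balanced", effort: "high" };

// Drop undefined values so an unset layer never clobbers a lower one.
function defined<T extends object>(layer: T): Partial<T> {
  const out: Partial<T> = {};
  for (const key of Object.keys(layer) as Array<keyof T>) {
    if (layer[key] !== undefined) out[key] = layer[key];
  }
  return out;
}

function resolveConfig(
  file: Partial<Config>,
  env: Partial<Config>,
  cli: Partial<Config>,
): Config {
  // Later spreads win: defaults < config file < env < CLI flags.
  return { ...DEFAULTS, ...defined(file), ...defined(env), ...defined(cli) };
}
```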
Token efficiency
Minimizing token usage is a first-class goal — coding-agent bills grow fast, and you shouldn't need a complex setup to control them. tiny-code automates the savings where it can:
Usage visible by default. Every assistant turn prints ↑ in ↓ out tokens with no configuration. On exit you get a session total.
Bounded tool output. grep caps at 200 matches, glob at 500 files, and read_file supports offset/limit to pull only the lines you need, preventing runaway context growth automatically.
Minimal system prompt. The built-in persona is kept short. Tool schemas are generated from Zod (no duplicate prose). Project context is opt-in.
Concise agent instructions. The agent is explicitly told to avoid restating the task or narrating completed steps.
Effort control. For Anthropic models, effort tunes the adaptive thinking budget. Drop it from the default "high" to "medium" or "low" for simpler, cheaper tasks:
{ "effort": "medium" }
Or set it per-session with TINY_CODE_EFFORT=medium.
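The bounded-output idea can be illustrated with two small helpers. The caps are the ones quoted above; `capResults` and `readWindow` are hypothetical names, and the 0-based `offset` is an assumption about how read_file counts lines.

```typescript
// Caps quoted in the README; everything else here is illustrative.
const CAPS = { grep: 200, glob: 500 } as const;

// Truncate a result list and report how many entries were dropped.
function capResults<T>(
  tool: keyof typeof CAPS,
  items: T[],
): { items: T[]; omitted: number } {
  const max = CAPS[tool];
  return { items: items.slice(0, max), omitted: Math.max(0, items.length - max) };
}

// Return up to `limit` lines starting at 0-based `offset` (indexing is an assumption).
function readWindow(text: string, offset = 0, limit = Infinity): string {
  return text.split("\n").slice(offset, offset + limit).join("\n");
}
```

Reporting the omitted count lets the model know the output was truncated, so it can narrow the query instead of assuming the list is complete.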
Coming: automatic conversation compaction once histories grow long
(see TODO.md), which will keep input-token counts from compounding across
many turns without any user action.
Model awareness & cost control
tiny-code ships a small, curated catalog of coding models
(src/models/catalog.ts) with each model's pricing, context window, and a
relative coding-aptitude score. It uses this to turn raw token counts into real
money and to pick a model that fits your cost/performance preference.
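Turning token counts into money is a straightforward multiplication against per-million-token rates. A minimal sketch, assuming a hypothetical `Pricing` record (the real catalog's field names may differ):

```typescript
// Hypothetical pricing record: USD per million tokens.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

function estimateCostUsd(usage: Usage, pricing: Pricing): number {
  return (
    (usage.inputTokens / 1e6) * pricing.inputPerMTok +
    (usage.outputTokens / 1e6) * pricing.outputPerMTok
  );
}
```

For example, 200,000 input tokens and 50,000 output tokens at placeholder rates of $3 and $15 per million come to about $0.60 + $0.75 = $1.35.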
Dollar cost, not just tokens. Per-turn usage and the session total show an estimated USD cost next to the token counts, priced from the active model's rate, so the bill is visible as you work, not a surprise later.
/models lists the catalog (cheapest first) with pricing and scores, marks the active model, and shows the session's running cost.
Priority-driven selection. When you don't pin a model, tiny-code picks one for you based on priority:
| priority    | Picks                                                          |
| ----------- | -------------------------------------------------------------- |
| balanced    | The best capability-per-dollar among capable models (default). |
| performance | The most capable model, ignoring price.                        |
| cost        | The cheapest still-capable model.                              |
balanced is the default: it ranks capable models by codingScore / blendedCostPerMTok (a model's coding aptitude per blended dollar, weighting input 80% / output 20%) behind a quality floor, so you get strong-but-sensibly-priced models without opting in.
{ "priority": "performance" }
Or per-session with TINY_CODE_PRIORITY=cost, or on the fly with the /priority command (e.g. /priority performance to jump to the most capable model when a task gets hard, then /priority balanced to drop back). Pinning model (config, env, or --model) always overrides the recommendation.
The catalog is curated and offline (tiny-code has no live model-discovery yet —
see TODO.md), so its prices carry an "as of" date; keep it current as vendors
ship new models and change pricing.
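The three priorities can be sketched as different ranking functions over the same filtered catalog. Everything below is illustrative: the `CatalogModel` fields, the `QUALITY_FLOOR` value, and the use of blended cost for the cost priority are assumptions; only the 80% / 20% blend and the codingScore / blendedCostPerMTok ratio come from the text above.

```typescript
// Hypothetical catalog entry; the real catalog's fields may differ.
interface CatalogModel {
  id: string;
  inputPerMTok: number; // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
  codingScore: number; // relative aptitude, higher is better
}

type Priority = "performance" | "balanced" | "cost";

const QUALITY_FLOOR = 60; // assumed cutoff for "capable"

// Blend input and output price 80% / 20%, as described above.
function blendedCostPerMTok(m: CatalogModel): number {
  return 0.8 * m.inputPerMTok + 0.2 * m.outputPerMTok;
}

function pickModel(
  catalog: CatalogModel[],
  priority: Priority,
): CatalogModel | undefined {
  const capable = catalog.filter((m) => m.codingScore >= QUALITY_FLOOR);
  const rank = (m: CatalogModel): number => {
    if (priority === "performance") return m.codingScore;
    if (priority === "cost") return -blendedCostPerMTok(m); // cheaper ranks higher
    return m.codingScore / blendedCostPerMTok(m); // balanced
  };
  // Highest rank wins; an empty capable list yields undefined.
  return capable.reduce<CatalogModel | undefined>(
    (best, m) => (best === undefined || rank(m) > rank(best) ? m : best),
    undefined,
  );
}
```

A zero-cost entry would rank as infinitely good under balanced in this sketch, so a real implementation would need to decide how local models fit into the ratio.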
Self-improvement
tiny-code can learn from how it's used. When a session ends (or when you run
/improve), it reflects on the conversation transcript looking for recurring
friction — tool errors, repeated retries, denied permissions, missing
capabilities. If it finds a concrete improvement, it asks for your permission to
open a pull request.
That PR contains only a single markdown file under improvements/
describing the proposed change, targeting main for a maintainer to review and
implement separately. It never contains code changes — this is enforced
structurally (the PR creator only ever stages one regex-validated markdown path),
so a prompt-injected session cannot smuggle code into a PR.
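The structural guarantee described above can be illustrated with a strict path check. The pattern and function name here are hypothetical; the README does not show the regex tiny-code actually uses.

```typescript
// Hypothetical validator: the real pattern is not shown in the README.
// Anchored at both ends, no slashes or dots allowed inside the file name.
const IMPROVEMENT_PATH = /^improvements\/[a-z0-9][a-z0-9-]*\.md$/;

function pathsToStage(proposed: string[]): string[] {
  const [only] = proposed;
  // Exactly one file, and it must match the pattern; otherwise refuse outright.
  if (proposed.length !== 1 || only === undefined || !IMPROVEMENT_PATH.test(only)) {
    throw new Error("refusing to stage: expected a single improvements/*.md file");
  }
  return [only];
}
```

Because the check runs on the list of paths to stage rather than on anything the model says, a manipulated transcript can at most change the contents of that one markdown file.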
PRs are opened via the gh CLI, which must be
installed and authenticated (gh auth login); the working tree must be clean.
{
"improve": {
"enabled": true,
"baseBranch": "main",
"onSessionEnd": true
}
}
The feature is on by default. Set improve.enabled to false (or export
TINY_CODE_IMPROVE=0) to disable it entirely; set onSessionEnd to false to
keep /improve but skip the automatic reflection at exit.
Development
npm install
npm run typecheck
npm run lint
npm test
npm run build
See TODO.md for the deferred-features backlog (sub-agents, web search, MCP,
rich TUI, one-shot mode, conversation persistence).
Contributing
Contributions are accepted from authorized contributors only, and are governed by the License — by submitting a contribution you agree to those terms (including the assignment of rights in clause 4). This is not an open-source project: please do not fork, mirror, or redistribute the codebase (see the License).
Workflow for an authorized change:
Open an issue first to discuss the change and get sign-off from a maintainer.
Branch from main, keep changes focused, and follow the existing code style.
Before opening a PR, make sure the full check suite passes:
npm run typecheck
npm run lint
npm test
npm run build
Add or update tests for any behavior change, and add a changeset (npx changeset) when the change is user-facing.
Open a PR targeting main for maintainer review.
Self-improvement workflow
tiny-code can also propose its own improvements — a low-friction way to feed
ideas back to maintainers without writing code. When a session ends (or when you
run /improve), the agent reflects on the transcript for recurring friction
(tool errors, repeated retries, denied permissions, missing capabilities) and,
if it finds something concrete, asks permission to open a PR.
That PR contains only a single markdown file under improvements/
describing the proposed change — never code. This is enforced structurally (the
PR creator only ever stages one regex-validated markdown path), so a
prompt-injected session cannot smuggle code into a PR. A maintainer then reviews
the proposal and implements it separately. See
Self-improvement above for configuration.
License
Proprietary — All Rights Reserved. See LICENSE.
You are granted a limited, revocable license to install and use tiny-code
(e.g. via npm install) for your own internal or personal use. You may not
fork, copy, modify, redistribute, resell, or offer it as a hosted service. It is
free during the beta period; future versions may be offered under subscription
or other paid terms.
