@matheuskrumenauer/tanya
v0.17.12
Published
A live, tool-using AI CLI for DeepSeek and OpenAI-compatible providers.
Maintainers
Readme
Tanya
A Claude-Code-style coding agent that actually works with DeepSeek.
Existing tools (Cursor, Claude Code, and Chinese-native CLIs) produce malformed tool calls, dropped schemas, and silent failures on DeepSeek. Tanya is built specifically to handle DeepSeek's quirks - permissive tool-call parsing, retry-with-correction, schema flattening, reasoning-model support - without compromising the deterministic verifier that catches hallucinations cheap models would otherwise sneak past you.
Works with: DeepSeek (primary), Qwen, Grok, Groq, Ollama, and any OpenAI-compatible endpoint.
Why this exists
I have a PhD in AI and I use DeepSeek every day. Every coding-agent CLI I tried either broke tool calls, silently dropped schema details, or made verification feel like an afterthought. I built Tanya so I could actually work with DeepSeek and still have a verifier watching what the model changed.
Install
npm i -g @matheuskrumenauer/tanya
export DEEPSEEK_API_KEY=sk-...
tanyaLocal development:
npm install
npm run link:local
tanyaFrom GitHub, once published:
npm install -g github:matheusjkweber/tanyaFrom npm, once published:
npm install -g @matheuskrumenauer/tanyaThe unscoped tanya name is taken on npm, so the package publishes under the @matheuskrumenauer scope.
Docker/container installs that cannot infer platform metadata may need npm platform flags for Tanya's image tooling dependency:
npm install -g --os=linux --cpu=arm64 --libc=glibc @matheuskrumenauer/tanyaUse --cpu=amd64 on x64 containers. Tracking issue:
https://github.com/matheusjkweber/tanya/issues/9.
Quick start
tanya ask "explain this repo"
tanya run --verify "npm test" "fix the failing test"
tanya providers test --provider deepseekWhat makes it work with DeepSeek
- Permissive tool-call parsing recovers missing IDs, stringified arguments, missing wrappers, and other almost-OpenAI-compatible responses before a run falls over.
- Retry-with-correction turns malformed tool calls into explicit repair prompts instead of silent no-ops.
- Schema flattening keeps narrow providers from rejecting tool definitions with
$reforoneOfshapes. - Reasoning-model support separates
deepseek-reasonerthinking from final answers, archives it, and tracks reasoning tokens in cost reports. - The verifier checks changed files, expected artifacts, validation output, and blockers after the model acts, so cheap-model drift has to pass deterministic review.
- Defaults to
deepseek-v4-proand tracks DeepSeek's API roadmap; legacy aliases still work but warn before their scheduled deprecation.
Contributing
Start with CONTRIBUTING.md for local setup, tool and skill-pack conventions, tests, and PR expectations.
Beginner-friendly tasks are tagged
good first issue.
For roadmap context, read docs/claude-code-gap-analysis.md.
Configuration
Create .env from .env.example:
DEEPSEEK_API_KEY=...
DEEPSEEK_BASE_URL=https://api.deepseek.com
TANYA_MODEL=deepseek-v4-proUse the reasoner profile for harder coding/planning tasks:
TANYA_PROFILE=reasonerOr pass it per command:
tanya run --profile reasoner "Plan this refactor"
tanya chat --profile reasonerCustom OpenAI-compatible provider:
TANYA_PROVIDER=custom
TANYA_API_KEY=...
TANYA_BASE_URL=https://provider.example.com
TANYA_MODEL=provider-model-nameOptional Obsidian logging:
TANYA_OBSIDIAN_VAULT=/path/to/Obsidian/VaultWhen set, Tanya appends a summary of completed tasks to the vault daily note. tanya run also searches the vault for task-relevant notes and materializes safe excerpts into .tania/context/obsidian so they can be read as normal workspace context.
DeepSeek documents its API as OpenAI-compatible for chat completions: https://api-docs.deepseek.com/
- Tracks the DeepSeek API roadmap: warns when legacy model names approach
deprecation, with a documented migration path in
docs/providers.md.
Backward compatibility
The old tania command remains as a binary alias for tanya, so existing
automation that runs tania run --json keeps working.
Configuration reads TANYA_* variables first and falls back to legacy
TANIA_* names when the new variable is absent. When only a legacy variable is
used, Tanya prints a one-line deprecation warning. If both names are set,
TANYA_* wins.
The workspace state directory remains .tania/ for historical compatibility.
Existing run logs, context files, artifact materialization, and memory files are
not moved or renamed.
Permissions
Tanya has an opt-in pre-execution permission layer for native tools and
project-local slash commands. The default mode in v0.x is bypass so existing
automation keeps full access until a workspace opts in.
Modes:
bypassskips gating and logs decisions for audit.defaultapplies configured rules; unmatched calls are allowed.askapplies configured rules; unmatched calls prompt the host.plandenies all tool execution so the model must respond with text only.
Rules live in ~/.tanya/permissions.json for user scope and
.tania/permissions.json for project scope. Project rules merge over user
rules. A minimal deny rule:
{
"version": 1,
"mode": "default",
"alwaysDeny": ["run_shell:.*rm -rf.*"]
}Generate a starter config from recent runs:
tanya permissions migrate --cwd . > .tania/permissions.suggested.jsonSpend rules can gate projected token or USD budgets before a tool runs. For
example, /cost --enforce --max-usd 0.50 writes a session-scoped rule.
See docs/permissions.md for the full schema, precedence, audit log, and worked examples.
Commands
tanya # live chat
tanya chat --profile reasoner # live chat with the reasoner profile
tanya ask "Explain this" # one-shot answer
tanya run "Fix the test" # agent task with tools
tanya run --profile reasoner "Fix this bug" # run with TANYA_PROFILE=reasoner
tanya run --verify "npm run typecheck" --verify "npm run build" "Fix the test"
tanya run --no-auto-brief "Fix the test" # skip deterministic project/artifact brief
tanya run --no-obsidian-context "Fix the test" # skip Obsidian context retrieval
tanya run --retries 2 "Fix this task" # retry blocked runs with context carry-forward
tanya run --plan --retries 2 "Implement the feature" # reasoner plan plus retries
tanya run --no-post-check "Long native build task" # skip independent typecheck/test re-checks
tanya run --json "Fix lint" # JSONL events for machine consumers
tanya run --context-file ./context.json --prompt-file ./prompt.md
tanya benchmark profiles # list runnable regression benchmark profiles
tanya benchmark run --all # execute the benchmark suite locally
tanya benchmark validate # validate recent benchmark signatures
tanya runs # show recent run logs with cost/status
tanya video presets # list available video presets
tanya video one-terminal-simctl # generate the exact transparent terminal asset
tanya providers test # provider smoke test
tanya mcp serve # expose Tanya verifier/run/skills over MCP stdio
tanya doctor # local environment checkSlash commands
Interactive chat accepts built-in slash commands without sending them to the model:
/clear # reset only the active conversation history
/skills # list matched skill packs and token cost
/verify # print the deterministic verifier report for the cwd
/cost # show persisted token usage and estimated cost
/memory --limit 5 # list recent golden-task memory
/mcp # list connected MCP servers and toolsProject-local commands live in .tania/commands/*.{js,ts,sh} and appear in
/help with a project: prefix, for example /project:say-hi. Shell commands
run directly; JavaScript and TypeScript commands export a default
CommandDefinition.
Project-local commands are arbitrary code execution and are gated by the same permission engine as native tools.
Sub-agent tool
The task tool delegates a bounded child run while keeping the parent in
control:
{
"prompt": "Map the auth module and report blockers.",
"workspace": "src/auth",
"max_turns": 12,
"token_budget": { "max_tokens": 12000 },
"treat_failure_as": "warning"
}Children inherit the parent's skill packs, permission rules, workspace, and
budget. They may narrow those constraints but cannot loosen them. Depth is
capped at 2 by default (TANYA_SUBTASK_MAX_DEPTH), and active children share a
default parallel cap of 3 (TANYA_SUBTASK_MAX_PARALLEL).
Every child runs its own verifier. Failed child verdicts become parent blockers
by default; treat_failure_as can demote a specific child to warning or
ignore when the caller wants advisory work only. Child events stream into the
parent log with a subRunId, and parent cancellation propagates into active
children.
See docs/sub-agents.md for permission inheritance, budget-ledger semantics, cancellation, verifier composition, and memory rollup.
MCP integration
Tanya can consume external Model Context Protocol servers and expose Tanya's own verifier and memory primitives to MCP-speaking clients.
Client configuration is allowlist-only. User-global servers are read from
~/.tanya/mcp.json with a fallback read of ~/.tania/mcp.json; project servers
live in .tania/mcp.json and override same-named user servers. Connected tools
are registered as normal Tanya tools named mcp:<server>:<tool>, so permission
rules, audit logging, truncation, and verifier visibility apply exactly as they
do for native tools.
{
"version": 1,
"servers": [
{
"name": "filesystem",
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
}
]
}Use /mcp in the REPL to inspect connected servers. Use tanya mcp serve to
start Tanya's MCP server over stdio; it exposes tanya.verify,
tanya.golden_task_search, tanya.run, and tanya.skills_list.
MCP servers are untrusted code. Tanya refuses undeclared servers, gates every
MCP tool call through the permission engine, captures stdio server stderr under
.tania/mcp/logs/, restarts crashed servers up to three times, and rejects
schema-invalid tool responses before they reach model history.
See docs/mcp.md for the full schema, transports, server tools, and security model.
Multi-model routing
Tanya can route each agent step to a different provider/model. Planning and simple tool-call turns can use cheap chat models, while synthesis, verification, and reasoning turns can use stronger models only when needed.
Default route profile:
| Step | Route | Fallback |
| --- | --- | --- |
| planning | deepseek/deepseek-chat | qwen/qwen3-coder-plus |
| tool_call | deepseek/deepseek-chat | groq/llama-3.3-70b-versatile |
| synthesis | deepseek/deepseek-reasoner | openai/gpt-4.1-mini |
| verification | deepseek/deepseek-reasoner | openai/gpt-4.1-mini |
| reasoning | deepseek/deepseek-reasoner | qwen/qwen3-coder-plus |
Project routes live in .tania/routes.json; user-global routes live in
~/.tanya/routes.json with a legacy read fallback from ~/.tania/routes.json.
Use /route in the REPL to inspect the effective table, /route show
<stepType> to inspect one step, /route set <stepType> <provider>/<model> for
a session-only patch, and /route reset to clear session patches.
Escalations are visible: if a cheap route exhausts the malformed tool-call
repair budget, Tanya emits escalation_event and uses the route fallback once,
up to TANYA_ESCALATION_CAP per session.
Per-turn reasoning budgets fall back to TANYA_REASONING_CAP_SHORT (default
2000) and TANYA_REASONING_CAP_LONG (default 8000) when a route pins no
reasoningCap of its own.
See docs/routing.md for schema, examples, context-window guards, per-tool model overrides, sub-agent model pins, and reasoning budgets.
Live status
Interactive tanya chat sessions show a compact status footer derived from the
same events already sent to the human sink:
[deepseek:deepseek-chat | tool_call | $0.04 | 2 tools | 1 child]
[awaiting permission: run_shell]
[escalated deepseek:deepseek-chat->openai:gpt-4.1-mini: parse_failure]The footer is TTY-only. Piped output and JSONL output stay byte-stable and
receive no ANSI cursor control bytes. Disable it with
TANYA_LIVE_STATUS=0 or the legacy TANIA_LIVE_STATUS=0 alias.
See docs/live-status.md for the surfaced fields, streaming strategy, and TTY fallback behavior.
Reasoning models
Reasoning routes such as deepseek-reasoner, qwen3-thinking-*, and
grok-3-reasoning are handled as a separate stream. Tanya archives reasoning to
.tania/runs/<runId>/reasoning.jsonl, emits reasoning_chunk events, and keeps
assistant history reasoning-free so replay and verifier inputs stay stable.
Reasoning tokens appear separately in /cost and /budget. Route rules can set
reasoningCap.maxTokens; built-in defaults are 2k for planning-like turns and
8k for synthesis/verification/reasoning turns. If the cap is exceeded, Tanya
emits reasoning_truncated and asks the model to finish.
Use /memory --reasoning <runId> to inspect archived reasoning. Use
TANYA_HIDE_REASONING=1 to hide reasoning from the human UI while preserving
JSONL events. Verifier reasoning annotations are off by default; enable
them with --verbose-verifier or TANYA_VERIFIER_INCLUDE_REASONING=1.
See docs/reasoning.md for provider notes, billing math, budget defaults, and UX modes.
--verify adds required verification commands to the run context. Tanya must run and report each exact command before finishing the coding task.
tanya benchmark run --all currently exercises 23 executable low-to-medium regression fixtures: targeted edits, new files, dependency/lockfile updates, framework-style migrations, failing-test repair, frontend smoke checks, artifact/context reuse, streaming long-tool execution, compaction-boundary recovery, run-history logging, dirty worktrees, and report repair.
By default, tanya run also performs an independent post-check after the agent finishes. If the workspace has a typecheck script, Tanya reruns that exact script with the local package manager (npm, pnpm, yarn, or bun). If not, it falls back to npx tsc --noEmit --pretty false when a tsconfig is present. If the workspace has a test script, Tanya reruns that as well unless the run already reported a passing test verification.
By default, tanya run builds a generic task brief from local instructions, contracts, artifact indexes, project shape, and package scripts. Coding-shaped tasks get verification/report expectations automatically. If reusable artifact candidates are found, Tanya must read a relevant artifact or create a reusable one before changing code.
Per-project persistent instructions can be stored in .tania/INSTRUCTIONS.md. Tanya injects this file into the system prompt for runs started inside that workspace. Create a starter file with:
tanya init
tanya init --cwd /path/to/projectTool Visibility
Human mode shows tools as they run:
> search
input: {"query":"describe("}
ok: Found 3 match lines.JSON mode emits machine-readable events:
{"type":"tool_call","id":"call_1","tool":"search","input":{"query":"describe("}}
{"type":"tool_result","id":"call_1","tool":"search","ok":true,"summary":"Found 3 match lines."}Streaming tool execution
Long-running run_shell calls stream throttled stdout/stderr chunks to the active event sink while the model only receives the final tool_result.
tanya run "Run npm test" # emits tool_progress while the command runs; Ctrl-C cancels the active shell and returns partial_outputLong sessions
Tanya handles context pressure as a cascade instead of truncating abruptly:
- Microcompact folds empty/no-op tool-call pairs in place.
- Snip removes low-signal history such as duplicate file reads and empty read-only tool results.
- Auto-compact reacts to provider
413/ context-window errors by summarizing older turns into a[compaction summary: ...]system message and retrying once normally, then once more aggressively. - Archive writes compacted messages to
.tania/runs/<runId>/archive.jsonlbefore they leave live history, so verifier scans and future memory tools can still inspect them.
Runs are capped at three total auto-compactions. If the provider still rejects the context, Tanya raises CompactionExhaustedError and asks the user to narrow the task, clear the session, or split the work.
See docs/long-sessions.md for details.
Token economy
Tanya trims model-visible tokens while keeping state reversible and auditable.
- Lite prompts can be enabled with
TANYA_LITE_PROMPT=1for cheap-provider exploration turns. The legacyTANIA_LITE_PROMPTalias is still accepted. - System prompts are automatically capped to the active provider context window. Tune the default 25% cap with
TANYA_PROMPT_BUDGET_RATIO. - Large shell/tool outputs are shortened for the model with a visible
<truncated ...>marker. Useexpand_resultwith the marker'stool_call_idto fetch the full output or a byte range. - Repeated unchanged
read_filecalls return a reference marker instead of resending the same content. Passforce: truewhen the agent genuinely needs the full file again. /budgetreports token usage, cost estimates, expensive turns, and one deterministic optimization suggestion./budget --enforce --max-usd <amount>persists a session spend rule through the permission engine.
See docs/token-economy.md for the full model, cache locations, and tool-definition knobs.
Benchmarks
Tanya includes an eval harness for verifier-stress suites, SWE-bench-Lite
adapters, integration-provided suites, and the eco-30 token-economy bench.
tanya eval --suite tanya-native --dry-run
tanya eval --suite tanya-native --out .tania/eval/results/tanya-native.json
tanya eval report .tania/eval/results/tanya-native.json
tanya eval compare docs/benchmarks/tanya-native-latest.json .tania/eval/results/tanya-native.json --format markdownPublic snapshots live in docs/benchmarks. The eval result schema and determinism contract are documented in docs/eval-format.md.
eco-30 is the token-economy suite. Its reports include total cost, cost per
pass, tokens per pass, reasoning share, and cost-regression checks. The
verifier-self-test suite is the verifier moat regression net: known-correct
and known-incorrect artifacts where the expected outcome is the verifier's
classification, not the model's output.
Edit blocks
edit_block applies bounded search/replace edits without falling back to a
full-file rewrite:
{
"path": "src/example.ts",
"search": "const state = \"pending\";",
"replace": "const state = \"complete\";",
"expectedCount": 1,
"matchPolicy": "exact"
}Exact mode is the default and fails closed when the block is missing, ambiguous,
or appears a different number of times than expected. Fuzzy mode is opt-in via
matchPolicy: "fuzzy" and requires an explicit M3 permission allow rule. Fuzzy
recovery only accepts whitespace-normalized or nearby-context candidates with
confidence >= 0.95; otherwise Tanya returns a structured error and asks the
model to re-read the file.
Successful edit blocks emit before/after hashes and a unified diff. Fuzzy successes also add candidate metadata to the audit log. The final verifier still reads the changed workspace independently; edit-block success is not a verifier pass.
See docs/edit-blocks.md for the full tool reference, permission model, confidence threshold, and failure modes.
Structural repo-map
Lite prompts can include a generated structural map from
.tania/index/repo-map.json. The map lists workspace-relative files, language,
parser provenance, top-level symbols, imports, and exports so cheap providers
can target likely files before spending turns on blind reads.
Tanya indexes TypeScript/JavaScript, Python, Go, Swift, and Kotlin with a lightweight ripgrep-style parser and falls back to path-only entries when file content cannot be read. Generated, binary, ignored, and oversized files are skipped. The repo-map is advisory context only: agents must still read files before editing, and the verifier remains the final authority.
Use TANYA_LITE_PROMPT=1 to inject a ranked repo-map excerpt. Tune the default
1000-token section budget with TANYA_REPO_MAP_PROMPT_BUDGET; the legacy
TANIA_* alias is also accepted. If the prompt budget is tight, the repo-map
drops before skill packs because it is generated and recoverable.
Use inspect_repo_map when the model needs more structural detail by file,
symbol, or language without burning prompt tokens on the whole map.
See docs/repo-map.md for schema, parser status, ranking, budget interaction, and cache invalidation.
Context files are generic JSON envelopes for caller-supplied task metadata, artifacts, instructions, and verification commands.
Current Tools
list_filesread_filesearchinspect_repo_mapinspect_project_contextfind_reusable_artifactsbuild_task_briefsearch_obsidian_noteswrite_fileapply_patchsearch_replaceedit_blockcopy_filecopy_dirapply_artifactcreate_ios_splashcreate_android_splashgenerate_app_iconscreate_android_foundationcommit_platform_changesresize_imagerender_svg_to_pngcreate_apple_app_icon_setcreate_android_launcher_icon_setvalidate_apple_app_icon_setvalidate_android_launcher_icon_setvalidate_api_contract_routesvalidate_android_project_configvalidate_apple_project_filesvalidate_fastlane_configvalidate_prisma_schemascan_secretsgenerate_video_assetrun_commandrun_shell
All file paths are constrained to the selected workspace.
Video Assets
Tanya can generate short compositable video assets locally with headless Chrome and ffmpeg:
tanya video one-terminal-simctl --output-dir assets/video --basename simctl-failThe one-terminal-simctl preset recreates the native 980x1012, 30fps, 3s transparent Terminal asset with failing xcrun simctl commands. terminal-simctl is kept as an alias.
To make variants, override terminal copy with repeated --line flags:
tanya video one-terminal-simctl \
--output-dir assets/video \
--basename install-failure \
--line '$ xcrun simctl install booted DemoApp.app' \
--line 'error: unable to find a booted simulator' \
--line '$ xcrun simctl io booted screenshot out.png' \
--line 'xcrun: error: selected device is not available'Outputs default to WebM VP9 alpha, ProRes 4444 MOV alpha, and a transparent poster PNG. Chrome/Chromium and ffmpeg must be installed; set TANYA_CHROME_PATH or TANYA_FFMPEG_PATH if Tanya cannot find them.
