kimiflare
Kimiflare — a terminal coding agent powered by Kimi-K2.6 on Cloudflare Workers AI. Image understanding, plan mode, MCP server integration, 262k context.
💸 Heads up — this runs on your Cloudflare account. We recommend setting a budget cap on Workers AI and checking your Cloudflare billing regularly while using KimiFlare.
🚀 Stay up to date. Newer versions are significantly more token-efficient and cheaper to run. Run /update inside KimiFlare or npm update -g kimiflare to get the latest release.
Stay in the loop
Kimiflare is shipping quickly.
Get:
- release notes,
- technical write-ups,
- early experimental features,
- architecture notes on building coding agents on Cloudflare.
→ Get updates: https://kimiflare.com
Shipping fast
Recently shipped:
- Cloudflare Code Mode support
- Local agent memory
- Major token cost reductions (70–90% lower)
- Better session compaction
Coming next:
- OpenCode parity improvements
- Cost attribution dashboard
- Cloudflare Artifacts experiments
Full changelog and notes at https://kimiflare.com
Why kimiflare
- 262k context window — Read entire modules, large configs, and full stack traces without the model losing track.
- Image understanding — Drop image paths into your prompt (PNG, JPG, WebP, GIF, BMP). The model sees them inline — great for UI reviews, diagrams, screenshots, and mockups.
- Direct by default — No proxy, no OpenAI SDK. Your traffic goes straight to Workers AI from your account, with optional AI Gateway routing for user-owned logging, caching, and analytics.
- Plan mode — Ask the agent to research and produce a plan without touching your filesystem. Review it, then exit plan mode to execute.
Quick start
npm install -g kimiflare
kimiflare

On first run, an interactive onboarding wizard asks for your Cloudflare Account ID and API Token. That's it — you're ready.
Or run without installing:
npx kimiflare

Requires Node.js ≥ 20.
For release notes and rapid feature drops: https://kimiflare.com
Features
| Feature | What it does |
|---------|-------------|
| Plan / Edit / Auto modes | plan blocks all mutating tools for safe research. edit (default) prompts per mutating call. auto approves everything for trusted tasks. |
| Live task panel | For multi-step work, the agent publishes a task list with progress icons (■ active, ☐ pending, ✓ done), elapsed time, and token deltas. |
| 14 terminal themes | dark, light, high-contrast, dracula, nord, one-dark, monokai, solarized-dark/light, tokyo-night, gruvbox-dark/light, catppuccin-mocha, rose-pine. Interactive picker with live preview (Ctrl+T). |
| Paste collapse | Large pastes (≥200 chars or ≥2 newlines) collapse to [pasted N lines #id]. Full content still goes to the model — scrollback stays clean. |
| Type-ahead queue | Type your next prompt while the model is still working. Queued prompts show as ⏳ … and fire in order. Ctrl-C aborts current + clears queue. |
| Auto-compaction | At ~80% context usage, kimiflare nudges you to run /compact. It summarizes older turns into a dense summary, keeping the last 4 turns intact. |
| Streaming reasoning | Toggle the model's chain-of-thought with /reasoning or Ctrl-R. See how it thinks in real time. |
| Image understanding | Drop image paths (PNG, JPG, WebP, GIF, BMP up to 5 MB) into any prompt. The model sees them inline — perfect for UI reviews, diagrams, and screenshots. |
| Live cost tracking | Status bar shows real-time cost based on Cloudflare pricing: $0.95/M input, $0.16/M cached, $4.00/M output. |
| Optional AI Gateway | Route Workers AI traffic through your own Cloudflare AI Gateway for request logs, cache status, and analytics while keeping your API token local. |
| Session persistence | Every turn is auto-saved. /resume lists past sessions (with message counts) in a paginated picker. |
| Smart permissions | Bash session-allow is keyed by the first token (e.g., allow all git commands). Write/edit show a unified diff before you approve. |
| Project context (/init) | Scans your repo and writes a concise KIMI.md — build commands, layout, conventions. Auto-loaded on every launch. |
| MCP server integration | Plug in external tools via the Model Context Protocol — local stdio servers or remote SSE endpoints. GitHub, Sentry, docs search, databases, etc. |
| Co-author auto-append | Detects git commit commands and auto-injects Co-authored-by: kimiflare <[email protected]>. |
| Local structured memory | SQLite + embeddings cross-session memory. Extracts facts, instructions, and preferences at compaction time; recalls them via hybrid search (FTS5 + vector + exact) in future sessions. Team-shareable via .kimiflare/memory.db. |
| Resilient transport | Retries Cloudflare capacity errors (code 3040) and 5xx with exponential backoff up to 5 attempts. |
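The live cost figure can be reproduced from the rates quoted in the table above. A minimal sketch, assuming cached tokens are a subset of input tokens; estimateCostUSD is a hypothetical helper for illustration, not kimiflare's actual API:

```javascript
// Per-turn cost estimate using the Workers AI rates quoted above
// ($0.95/M input, $0.16/M cached, $4.00/M output). Illustrative only.
const RATES = {
  input: 0.95 / 1e6,  // $ per uncached input token
  cached: 0.16 / 1e6, // $ per cached input token
  output: 4.0 / 1e6,  // $ per output token
};

function estimateCostUSD({ inputTokens, cachedTokens = 0, outputTokens }) {
  // Assumption: cachedTokens counts the cached portion of inputTokens.
  const uncached = inputTokens - cachedTokens;
  return uncached * RATES.input + cachedTokens * RATES.cached + outputTokens * RATES.output;
}

// 100k input (40k cached) + 2k output:
// 60000*0.95/1e6 + 40000*0.16/1e6 + 2000*4/1e6 = 0.057 + 0.0064 + 0.008 = $0.0714
console.log(estimateCostUSD({ inputTokens: 100_000, cachedTokens: 40_000, outputTokens: 2_000 }).toFixed(4));
```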
Configure
Get credentials from Cloudflare:
- https://dash.cloudflare.com → your account → copy Account ID.
- https://dash.cloudflare.com/profile/api-tokens → Create Token → Custom token with Account › Workers AI › Read on your account → Create → copy.
Then either export them in each shell:
export CLOUDFLARE_ACCOUNT_ID=...
export CLOUDFLARE_API_TOKEN=...
# Optional: route through a Cloudflare AI Gateway you own
export KIMIFLARE_AI_GATEWAY_ID=...
# Optional: enable local structured memory
export KIMIFLARE_MEMORY_ENABLED=1
export KIMIFLARE_MEMORY_DB_PATH=.kimiflare/memory.db
export KIMIFLARE_MEMORY_MAX_AGE_DAYS=90
export KIMIFLARE_MEMORY_MAX_ENTRIES=1000

or save them once (chmod 600 applied automatically):
mkdir -p ~/.config/kimiflare
cat > ~/.config/kimiflare/config.json <<'EOF'
{
"accountId": "YOUR_ACCOUNT_ID",
"apiToken": "YOUR_API_TOKEN",
"model": "@cf/moonshotai/kimi-k2.6",
"aiGatewayId": "YOUR_GATEWAY_NAME"
}
EOF
chmod 600 ~/.config/kimiflare/config.json

Optional AI Gateway
kimiflare talks directly to Workers AI unless aiGatewayId is configured. When set, chat completions are sent to Cloudflare's native Workers AI Gateway endpoint:
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/{model_id}

Create a gateway in the Cloudflare dashboard under AI > AI Gateway, then set aiGatewayId in ~/.config/kimiflare/config.json or export KIMIFLARE_AI_GATEWAY_ID. Your Workers AI API token stays on your machine and is sent only to Cloudflare.
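The routing decision can be sketched from the two endpoint patterns shown in this README. buildEndpoint is a hypothetical helper, not kimiflare's internal code:

```javascript
// Choose the request URL: direct Workers AI, or the AI Gateway route when a
// gateway ID is configured. URL patterns follow the endpoints documented above.
function buildEndpoint({ accountId, modelId, gatewayId }) {
  if (gatewayId) {
    // AI Gateway route (native Workers AI provider path)
    return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/workers-ai/${modelId}`;
  }
  // Direct Workers AI route
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${modelId}`;
}

console.log(buildEndpoint({ accountId: "abc123", modelId: "@cf/moonshotai/kimi-k2.6", gatewayId: "my-gw" }));
```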
Optional per-request controls:
{
"aiGatewayCacheTtl": 3600,
"aiGatewaySkipCache": false,
"aiGatewayCollectLogPayload": false,
"aiGatewayMetadata": { "tool": "kimiflare" }
}

cf-aig-cache-status from AI Gateway is shown separately from Workers AI prompt-token caching (cached_tokens). If you enable gateway logs, kimiflare records metadata such as log id, cache hit/miss, tokens, duration, and status when Cloudflare returns it; prompt and response bodies are not stored by kimiflare.
MCP servers (Model Context Protocol)
kimiflare supports external tools via MCP. Add servers to your ~/.config/kimiflare/config.json:
{
"accountId": "YOUR_ACCOUNT_ID",
"apiToken": "YOUR_API_TOKEN",
"mcpServers": {
"github": {
"type": "local",
"command": ["npx", "-y", "@modelcontextprotocol/server-github"],
"env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_xxx" }
},
"fetch": {
"type": "local",
"command": ["uvx", "mcp-server-fetch"]
},
"my-remote": {
"type": "remote",
"url": "https://example.com/mcp",
"headers": { "Authorization": "Bearer token123" }
}
}
}

- type: "local" (stdio subprocess) or "remote" (SSE/HTTP endpoint)
- command: array with executable and args (local only)
- url: endpoint URL (remote only)
- env: environment variables for local servers
- headers: HTTP headers for remote servers
- enabled: set to false to skip a server
MCP tools appear prefixed as mcp_<server>_<tool> alongside built-in tools.
Commands:
- /mcp list — show connected servers and tool counts
- /mcp reload — disconnect and reconnect all configured servers
Local structured memory
kimiflare can remember facts, instructions, and preferences across sessions using a local SQLite database with vector search.
How it works:
- At compaction time, the agent extracts structured memories from the conversation
- Memories are stored with embeddings (@cf/baai/bge-base-en-v1.5) in a local SQLite database
- On future sessions, relevant memories are recalled via hybrid search (FTS5 full-text + vector similarity + exact file-path matching)
- Supports team-shared memory: .kimiflare/memory.db in your repo root (add to .gitignore)
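The recall step above can be sketched as a weighted merge of the three match signals. The weights, field names, and hybridRank function are invented for illustration; kimiflare's actual ranking logic may differ:

```javascript
// Hedged sketch of hybrid recall: combine full-text, vector-similarity, and
// exact-path signals into one score. All weights here are assumptions.
function hybridRank(candidates) {
  // candidates: [{ id, fts, vec, exact }] with fts/vec normalized to 0..1
  const W = { fts: 0.4, vec: 0.4, exact: 0.2 };
  return candidates
    .map((c) => ({
      ...c,
      score: W.fts * (c.fts ?? 0) + W.vec * (c.vec ?? 0) + W.exact * (c.exact ? 1 : 0),
    }))
    .sort((a, b) => b.score - a.score); // best match first
}

const ranked = hybridRank([
  { id: "m1", fts: 0.9, vec: 0.2, exact: false }, // strong keyword match
  { id: "m2", fts: 0.1, vec: 0.8, exact: true },  // semantic match + same file path
]);
console.log(ranked[0].id); // m2 wins: 0.04 + 0.32 + 0.2 = 0.56 vs m1's 0.44
```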
Enable:
export KIMIFLARE_MEMORY_ENABLED=1

Or in ~/.config/kimiflare/config.json:
{
"memoryEnabled": true,
"memoryDbPath": ".kimiflare/memory.db",
"memoryMaxAgeDays": 90,
"memoryMaxEntries": 1000,
"memoryEmbeddingModel": "@cf/baai/bge-base-en-v1.5"
}

Commands:
- /memory — show memory stats (total count, DB size, by category)
- /memory search <query> — manual hybrid search over stored memories
- /memory clear — wipe all memories for the current repo
Storage & cleanup:
- Default retention: 90 days, 1000 memories per repo
- Automatic deduplication of near-identical memories
- Cleanup runs on startup and after every compaction
- Typical size: ~4–5 KB per memory; ~15 MB/month under heavy use
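The retention policy above (age cap, then count cap) can be sketched in a few lines. pruneMemories and the in-memory record shape are illustrative; the real cleanup runs against the SQLite database:

```javascript
// Sketch of retention: drop entries older than maxAgeDays, then keep at most
// maxEntries of the newest. Defaults mirror the documented 90 days / 1000 entries.
function pruneMemories(memories, { maxAgeDays = 90, maxEntries = 1000 } = {}, now = Date.now()) {
  const cutoff = now - maxAgeDays * 86_400_000;  // ms per day
  return memories
    .filter((m) => m.createdAt >= cutoff)        // age cap
    .sort((a, b) => b.createdAt - a.createdAt)   // newest first
    .slice(0, maxEntries);                       // count cap
}
```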
Usage
Interactive TUI
kimiflare # launch TUI
kimiflare --model @cf/moonshotai/kimi-k2.6 # override model

Print mode (one-shot, non-interactive)
kimiflare -p "summarize PLAN.md" # stream answer to stdout
kimiflare -p "..." --dangerously-allow-all # auto-approve mutating tools (for scripts)
kimiflare -p "..." --reasoning # include chain-of-thought in stderr

Image understanding
Reference image files directly in your prompt — the model sees them inline:
kimiflare
› fix the layout bug in this screenshot docs/bug.png
› convert this mockup design.png to Tailwind HTML
› explain this architecture diagram.png

Supported formats: PNG, JPG, JPEG, WebP, GIF, BMP (up to 5 MB each, 10 per message).
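The limits above imply a simple attachment check. A minimal sketch; isUsableImage is a hypothetical helper, not kimiflare's API:

```javascript
// Check an image path against the documented limits: supported extension,
// ≤ 5 MB each, ≤ 10 images per message. Illustrative only.
const IMAGE_EXTS = new Set(["png", "jpg", "jpeg", "webp", "gif", "bmp"]);
const MAX_BYTES = 5 * 1024 * 1024;
const MAX_PER_MESSAGE = 10;

function isUsableImage(path, sizeBytes, alreadyAttached) {
  const ext = path.split(".").pop()?.toLowerCase() ?? "";
  return IMAGE_EXTS.has(ext) && sizeBytes <= MAX_BYTES && alreadyAttached < MAX_PER_MESSAGE;
}

console.log(isUsableImage("docs/bug.png", 120_000, 0)); // true
console.log(isUsableImage("scan.tiff", 120_000, 0));    // false: unsupported format
```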
CLI flags
| Flag | Short | Description |
|------|-------|-------------|
| --print <prompt> | -p | One-shot mode: send prompt, stream reply, exit |
| --model <id> | -m | Model ID (default: @cf/moonshotai/kimi-k2.6) |
| --dangerously-allow-all | — | Auto-approve every permission prompt (print mode only) |
| --reasoning | — | Stream chain-of-thought to stderr (print mode only) |
| --version | -V | Show version |
| --help | -h | Show help |
Slash commands
| Command | Effect |
|---------|--------|
| /mode edit\|plan\|auto | Switch mode. edit prompts for permission (default), plan is read-only research, auto auto-approves every tool call. |
| /plan /auto /edit | Shortcuts for the three modes. |
| /thinking low\|medium\|high | Reasoning effort. low = fastest, shallow; medium = balanced (default); high = deepest, slowest. Saved to config. |
| /theme | Interactive theme picker with live preview (Ctrl+T). Saved to config. |
| /theme NAME | Set theme by name directly. |
| /resume | Pick a past conversation to restore. |
| /compact | Summarize older turns to free context. Suggested automatically at ~80% full. Extracts memories if memory is enabled. |
| /init | Scan the repo and write a KIMI.md so future agents have project context. |
| /memory | Show memory stats (total count, DB size, by category). |
| /memory search <query> | Search stored memories manually. |
| /memory clear | Wipe all memories for the current repo. |
| /mcp list | List connected MCP servers and their tools. |
| /mcp reload | Disconnect and reconnect all configured MCP servers. |
| /reasoning | Toggle chain-of-thought display. |
| /clear | Reset the current conversation. |
| /cost | Show token usage for the current turn. |
| /model | Show current model. |
| /update | Check for updates manually. |
| /logout | Clear saved credentials. |
| /help | List all commands. |
| /exit | Quit. |
Keyboard shortcuts
Global
| Shortcut | Action |
|----------|--------|
| Ctrl+C / Esc | Interrupt current turn when busy; exit when idle |
| Ctrl+R | Toggle reasoning display |
| Ctrl+O | Toggle verbose tool output |
| Ctrl+T | Open theme picker |
| Shift+Tab | Cycle mode (edit → plan → auto) |
| ↑ / ↓ | Walk prompt history |
Editing (macOS / Linux)
| Shortcut | Action |
|----------|--------|
| ⌥← / ⌥→ | Jump word left/right |
| ⌘← / ⌘→ | Jump to start / end of line |
| ⌥⌫ | Delete word backward |
| ⌘⌫ | Delete to start of line |
| ⌥⌦ | Delete word forward |
| Ctrl+A / Ctrl+E | Start / end of line |
| Ctrl+W / Ctrl+U / Ctrl+K | Delete word backward / to start / to end of line |
Modes
- edit — default. The agent calls tools freely for read-only work; mutating tools (write, edit, bash) pause for your approval.
- plan — read-only. Mutating tools are hard-blocked. Ask "plan a refactor" and the agent will investigate and produce a plan without touching the filesystem. Exit plan mode to execute.
- auto — autonomous. Every tool call is auto-approved. Use for trusted, well-scoped tasks.
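The three modes boil down to one gating rule per mutating tool call. A minimal sketch under the semantics described above; decide() and its return values are illustrative, not kimiflare internals:

```javascript
// Mode-based tool gating: read-only tools always run; mutating tools are
// blocked in plan, auto-approved in auto, and prompted in edit (the default).
const MUTATING = new Set(["write", "edit", "bash"]);

function decide(mode, tool) {
  if (!MUTATING.has(tool)) return "allow"; // read-only tools run in every mode
  if (mode === "plan") return "block";     // plan: hard-block mutations
  if (mode === "auto") return "allow";     // auto: approve everything
  return "prompt";                         // edit: ask the user per call
}

console.log(decide("plan", "write")); // block
console.log(decide("edit", "bash"));  // prompt
console.log(decide("auto", "read"));  // allow
```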
Thinking level (quality vs speed)
Kimi-K2.6 always reasons, but you can cap the effort:
- low — fastest. Best for chat, small edits, running commands.
- medium — balanced (default). Solid reasoning on real edits without the latency of deep thinking on trivial prompts.
- high — deepest. Best for multi-file refactors, subtle bugs, architectural decisions.
Set with /thinking medium (persists), or per-launch via KIMI_REASONING_EFFORT=high.
Tools
All tool calls show inline; mutating ones require per-call approval the first time, with an option to allow for the rest of the session.
| Tool | Permission | What it does |
|------|------------|--------------|
| read | auto | Read a text file (≤ 2MB) with optional line range. |
| write | prompt | Create or overwrite a file. Shows a unified diff before you approve. |
| edit | prompt | Replace an exact substring. Fails unless old_string is unique (or replace_all=true). |
| bash | prompt | Run a shell command via bash -lc. Session-allow is keyed by the first token of the command. |
| glob | auto | Match files by pattern (**/*.ts), sorted by mtime. |
| grep | auto | Regex search. Uses rg if installed; falls back to a JS walk. |
| web_fetch | auto | Fetch a URL, convert HTML → markdown (≤ 100KB). |
| tasks_set | auto | Publish a live task list for multi-step work. |
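The edit tool's uniqueness rule from the table above can be sketched in a few lines. applyEdit is a hypothetical illustration of the described behavior, not kimiflare's implementation:

```javascript
// Replace old_string only when it occurs exactly once, unless replace_all is
// set, mirroring the documented edit-tool contract.
function applyEdit(content, oldString, newString, replaceAll = false) {
  const count = content.split(oldString).length - 1; // occurrence count
  if (count === 0) throw new Error("old_string not found");
  if (count > 1 && !replaceAll) throw new Error("old_string is not unique; set replace_all=true");
  return content.split(oldString).join(newString);
}

console.log(applyEdit("let x = 1;", "1", "2")); // "let x = 2;"
```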
How it works
┌───────────────────────────────────────────────────────────┐
│ kimiflare (Node.js TUI) │
user ─▶ │ │
│ user msg ─▶ agent loop ─▶ runKimi() ──[POST SSE]──▶ │
│ ▲ │
│ │ │
│ tool result ◀──tool executor──◀ tool_calls │
│ (permission modal for write / edit / bash) │
└───────────────────────────────────────────────────────────┘
│
▼
api.cloudflare.com/client/v4
/accounts/{ID}/ai/run/
@cf/moonshotai/kimi-k2.6

Direct fetch to Workers AI by default, or Cloudflare's native AI Gateway Workers AI endpoint when aiGatewayId is configured. The payload remains OpenAI-compatible messages + tools, with an SSE stream containing reasoning + content + tool-call deltas accumulated by index.
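Accumulating tool-call deltas by index works roughly as follows. The delta shape mirrors the OpenAI-compatible streaming format; accumulateToolCalls is an illustrative sketch, not kimiflare's actual accumulator:

```javascript
// Merge streamed tool-call fragments into complete calls, keyed by each
// delta's index, concatenating function-name and argument fragments.
function accumulateToolCalls(deltas) {
  const calls = []; // index -> { id, name, arguments }
  for (const d of deltas) {
    const slot = (calls[d.index] ??= { id: "", name: "", arguments: "" });
    if (d.id) slot.id = d.id;
    if (d.function?.name) slot.name += d.function.name;
    if (d.function?.arguments) slot.arguments += d.function.arguments;
  }
  return calls;
}

const calls = accumulateToolCalls([
  { index: 0, id: "call_1", function: { name: "read" } },
  { index: 0, function: { arguments: '{"path":' } },
  { index: 0, function: { arguments: '"a.ts"}' } },
]);
console.log(calls[0].arguments); // '{"path":"a.ts"}'
```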
Development
git clone https://github.com/sinameraji/kimiflare
cd kimiflare
npm install
npm run build
npm link # or: ln -s "$PWD/bin/kimiflare.mjs" ~/.local/bin/kimiflare

Scripts:
- npm run build — bundle with tsup (dist/ + bin/kimiflare.mjs)
- npm run dev — run via tsx (tsx src/index.tsx)
- npm run typecheck — tsc --noEmit
- npm start — run compiled bin
Contributing
Contributions are welcome!
- Fork the repository
- Create a branch: git checkout -b feat/your-feature
- Make your changes
- Run npm run typecheck and npm run build
- Commit: git commit -m "feat: description"
- Push: git push origin feat/your-feature
- Open a Pull Request
Testing MCP locally
You don't need a real MCP server to test the integration. Here's a minimal test server you can save as test-mcp-server.js:
// test-mcp-server.js — a minimal MCP server for testing
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { ListToolsRequestSchema, CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  { name: "test-server", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

// setRequestHandler takes the request schema objects, not string method names.
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "greet",
      description: "Greet someone by name",
      inputSchema: {
        type: "object",
        properties: { name: { type: "string" } },
        required: ["name"],
      },
    },
    {
      name: "add",
      description: "Add two numbers",
      inputSchema: {
        type: "object",
        properties: { a: { type: "number" }, b: { type: "number" } },
        required: ["a", "b"],
      },
    },
  ],
}));

server.setRequestHandler(CallToolRequestSchema, async (req) => {
  if (req.params.name === "greet") {
    return { content: [{ type: "text", text: `Hello, ${req.params.arguments.name}!` }] };
  }
  if (req.params.name === "add") {
    const sum = req.params.arguments.a + req.params.arguments.b;
    return { content: [{ type: "text", text: String(sum) }] };
  }
  throw new Error("Unknown tool");
});

const transport = new StdioServerTransport();
await server.connect(transport);

Then add it to your config:
{
"mcpServers": {
"test": {
"type": "local",
"command": ["node", "/path/to/test-mcp-server.js"]
}
}
}

Launch kimiflare and try:
- /mcp list — should show test (local) — 2 tools
- use mcp_test_greet with name "kimiflare" — should return Hello, kimiflare!
- use mcp_test_add with a 3 and b 5 — should return 8
For a real-world test, try the official GitHub MCP server:
{
"mcpServers": {
"github": {
"type": "local",
"command": ["npx", "-y", "@modelcontextprotocol/server-github"],
"env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_xxx" }
}
}
}

Then ask: search for issues labeled bug in sinameraji/kimiflare
Credits
- Cloudflare Agent Memory — This feature was inspired by Cloudflare's Agent Memory announcement. While Cloudflare's managed service requires a platform binding, kimiflare implements a local self-hosted equivalent using SQLite + Workers AI embeddings so you can use it today with your own account.
License
MIT © Sina Meraji
