agentkairo
v0.2.2
Published
Local AI coding agent CLI — an autonomous terminal agent running Qwen2.5-Coder 14B fully offline on your NVIDIA GPU (CUDA). OpenAI-compatible, no API keys.
Maintainers
Readme
agentkairo
⚠️ First install downloads an 8.99 GB AI model (Qwen2.5-Coder-14B) during
npm install. Requires Linux x64 + an NVIDIA GPU (CUDA 12, ~12 GB VRAM). No GPU? Use kairolite (CPU, ~2 GB).
Kairo is a local coding-agent CLI copied out of the AI Workshop Tauri terminal agent and packaged for npm.
It talks to a local OpenAI-compatible chat server. The default setup target is
Qwen/Qwen2.5-Coder-14B-Instruct-GGUF:Q4_K_M, an Apache-2.0 Qwen Coder 14B
GGUF model served through the bundled tinyq4 server.
The npm package includes the tinyq4 inference engine but not the 8.99 GB
model file. The model is downloaded once at install time (during
npm install), so there is no manual setup step.
Requirements (read this first)
The bundled tinyq4 engine is Linux x86-64 + NVIDIA/CUDA only — there is no
CPU fallback and no Windows/macOS build yet. You need:
- Linux on x86-64 (Windows/macOS are not supported;
npm installwill refuse to install on them). - An NVIDIA GPU with ~12 GB VRAM (the 14B Q4 model uploads ~9.8 GB).
- The CUDA 12 runtime and a recent driver (the engine links
libcudart.so.12). - glibc ≥ 2.34 (Ubuntu 22.04+/Debian 12+; older LTS releases won't load it).
Run agentkairo doctor to check your machine. On an unsupported machine the
9 GB model download is skipped and Kairo tells you why instead of crashing.
If you already run your own OpenAI-compatible server (any OS), you can skip the
engine entirely and point Kairo at it: agentkairo --url http://host:port.
Install
npm install -g agentkaironpm install downloads the ~8.99 GB Qwen Coder model once into the package's
models/ directory. If that download is interrupted (e.g. a network drop), the
install still succeeds — the model is fetched automatically the first time you
run agentkairo, or you can fetch it on demand with agentkairo model.
Run
cd /path/to/project
agentkairoThat's it. Running agentkairo automatically starts the bundled tinyq4
model server, waits for it to load, and drops you into the agent. The server is
owned by that session and is shut down when you quit (freeing its VRAM).
While it works the model streams with a live "reading prompt" indicator, and
each reply ends with a dim stats line (142 tok · 1.8s · 79 tok/s). Press
Ctrl-C to cancel a running reply and return to the prompt; press it again at
an empty prompt to quit.
No external dependencies like llama.cpp are required. The only requirement is
Python 3 on your PATH.
One-shot / scripting
Run a single prompt and exit — handy for scripts and pipes. The model's answer goes to stdout; all status chrome goes to stderr, so redirects stay clean:
agentkairo -p "summarize what this project does"
echo "explain src/main.py" | agentkairo -p -
agentkairo -p "add a docstring to utils.py" --yolo > result.txtWithout --yolo, write/bash tools are auto-declined in one-shot mode, so -p
is read-only by default.
Commands
agentkairo # Start the agent (auto-starts the model server)
agentkairo -p "<text>" # Run one prompt non-interactively, then exit
agentkairo model # Download the model on demand, or show its status
agentkairo serve # Run only the model server in the foreground
agentkairo doctor # Show setup status and local server detection
agentkairo --url <url> # Use an already-running OpenAI-compatible server
agentkairo --yolo # Run write/bash tools without confirmation(kairo is kept as an alias for agentkairo.)
Inside Kairo:
/yolo— toggle write/bash confirmations./tools— list available tools./model— show model status / re-download if needed./save [file]— save the conversation to a markdown transcript./cwd— show the workspace directory./stats— toggle the per-reply token/speed line./clear— reset the chat./help— list commands./exit— quit.
⚠️ Disclaimer
Kairo is an autonomous coding agent: it can run shell commands, and read,
write, and overwrite files in your workspace. It is experimental software,
provided "AS IS" with no warranty. It may execute commands you did not intend,
make confident-but-wrong assumptions, or proceed after ambiguous input —
especially with --yolo, which disables all confirmations.
You are responsible for everything it does on your system. By using it you accept the full terms in DISCLAIMER.md. You pressed enter; the machine listened.
License
Apache-2.0. The bundled model is licensed separately — see MODEL_LICENSES.md.
