kairolite
v0.1.3
Published
Lightweight local AI coding agent CLI — an autonomous terminal agent running Qwen2.5-Coder 3B fully offline on CPU (Linux + Windows, no GPU). OpenAI-compatible, no API keys.
Maintainers
Readme
kairolite
⚠️ First install downloads a ~2 GB AI model (Qwen2.5-Coder-3B) into the package. Runs on CPU — Linux or Windows x64, no GPU required.
A lightweight, CPU-only local coding-agent CLI — the cross-platform little sibling of agentkairo (the GPU/CUDA build). Same agent, same tools, no GPU required.
It runs Qwen/Qwen2.5-Coder-3B-Instruct-GGUF:Q4_K_M (Apache-2.0) through the
bundled tinyq4 engine. The ~2 GB model is downloaded once at install time.
Requirements
- Linux or Windows on x86-64 (macOS isn't supported yet).
- ~3 GB of free RAM for the 3B model. No GPU needed.
- Python 3 on your
PATH.
The engine runs on any 64-bit CPU. If your CPU has AVX2 + FMA (most CPUs since ~2013) it automatically uses a faster code path; otherwise it falls back to a portable scalar path — it never crashes on older hardware.
Got an NVIDIA GPU? Run
kairolite gpu— if it finds a capable card it'll point you to the GPU build (agentkairo), which is ~20× faster.
Install
npm install -g kairolitenpm install downloads the ~2 GB model once. If the download is interrupted,
the install still succeeds — it's fetched on first run, or on demand with
kairolite model.
Run
cd /path/to/project
kairoliteRunning kairolite auto-starts the local engine, waits for it to load, and
drops you into the agent. The server is shut down when you quit, freeing memory.
A note on speed: kairolite runs on CPU, so the first response waits while the
prompt is read (you'll see a live "reading prompt" progress indicator — it's
working, not frozen). Generation speed depends on your CPU; more cores + AVX2 =
faster. For heavy use on an NVIDIA machine, prefer the GPU build (/gpu).
One-shot / scripting
The model's answer goes to stdout; all status chrome goes to stderr, so redirects stay clean:
kairolite -p "summarize what this project does"
echo "explain src/main.py" | kairolite -p -
kairolite -p "add a docstring to utils.py" --yolo > result.txtWithout --yolo, write/bash tools are auto-declined in one-shot mode, so -p
is read-only by default.
Commands
kairolite # Start the agent (auto-starts the local engine)
kairolite -p "<text>" # Run one prompt non-interactively, then exit
kairolite model # Download the model on demand, or show its status
kairolite gpu # Scan for an NVIDIA GPU and link the faster GPU build
kairolite stop # Stop the local engine and free memory
kairolite serve # Run only the engine in the foreground
kairolite doctor # Show setup status and local server detection
kairolite --url <url> # Use an already-running OpenAI-compatible server
kairolite --yolo # Run write/bash tools without confirmationInside Kairo:
/yolo— toggle write/bash confirmations./tools— list available tools./model— show model status / re-download if needed./gpu— scan for an NVIDIA GPU and link the faster GPU build./save [file]— save the conversation to a markdown transcript./cwd— show the workspace directory./stats— toggle the per-reply token/speed line./clear— reset the chat./help— list commands./exit— quit.
⚠️ Disclaimer
Kairo is an autonomous coding agent: it can run shell commands, and read,
write, and overwrite files in your workspace. It is experimental software,
provided "AS IS" with no warranty. It may execute commands you did not intend,
make confident-but-wrong assumptions, or proceed after ambiguous input —
especially with --yolo, which disables all confirmations.
You are responsible for everything it does on your system. By using it you accept the full terms in DISCLAIMER.md. You pressed enter; the machine listened.
License
Apache-2.0. The bundled model is licensed separately — see MODEL_LICENSES.md.
