little-coder

v1.8.1

A pi-based coding agent optimized for small local language models. Reproduces the whitepaper's scaffold-model-fit adaptations as pi extensions.

little-coder — a coding agent for the laptop in front of you

little-coder

A coding agent tuned for small local models, built on top of pi.

The research story behind all this — why scaffold–model fit matters, how a 9.7 B Qwen beat frontier entries on Aider Polyglot, and what the load-bearing mechanisms actually do — is written up on Substack: Honey, I Shrunk the Coding Agent. Start there if you want the "why"; stay here for the "how".

How it relates to pi

pi is the minimal substrate — agent loop, multi-provider API, TUI, session tree, compaction, extension model. Four built-in tools (read / write / edit / bash) and a ~1000-token system prompt.

little-coder is pi + 20 extensions + 30 skill markdown files + a Python benchmark harness. It doesn't fork pi or shadow its CLI: pi is a plain dependency in package.json, and everything little-coder-specific lives under .pi/extensions/, skills/, and benchmarks/. The launcher runs pi with --no-extensions and wires in exactly the bundled set, so you add your own extension by dropping a directory into .pi/extensions/ (or passing little-coder -e /path/to/ext/index.ts at launch) and remove one of ours by deleting its directory. Note that this also means a package installed globally with pi install won't load inside little-coder, because pi install registers into pi's settings, which --no-extensions skips.

If you've never used pi, it's useful to skim pi.dev first — the rest of this doc assumes pi's model of --agent-import-path, --mode rpc, and .pi/extensions/ auto-discovery.

Install

One-line install (Node.js 22.19+ required):

curl -fsSL https://raw.githubusercontent.com/itayinbarr/little-coder/main/install.sh | bash

Or with npm directly:

npm install -g little-coder

Or with bun:

bun add -g little-coder

That's the whole install. No clone, no npm install in a workspace, no PATH fiddling. little-coder is now on your PATH and works from any directory.

Note for bun add -g users. The launcher (bin/little-coder.mjs) is a Node.js script with #!/usr/bin/env node at the top, so Node ≥ 22.19 still has to be on your PATH for the binary to start — bun is fine for installing/updating the package, but the runtime is Node. If you want a fully node-less setup, replace the shebang in $(bun pm bin -g)/little-coder with #!/usr/bin/env bun.

Run

cd ~/your-project
little-coder --model llamacpp/qwen3.6-35b-a3b

This is the canonical setup little-coder is tuned for: a local llama.cpp server hosting Qwen3.6-35B-A3B. See Local model setup (optional) below for how to serve it.

Cloud models work the same way:

little-coder --model anthropic/claude-haiku-4-5
little-coder --model openai/gpt-4o-mini "What does this codebase do?"
little-coder --model ollama/qwen3.5             # local Ollama
little-coder --model lmstudio/local-model       # local LM Studio (whatever model you have loaded)
little-coder --list-models                      # see everything pi knows about

The agent uses the directory you launched it from as its working directory — Read / Write / Edit / Bash operate on your project, not on little-coder's install path.

For local providers (llama.cpp, Ollama, LM Studio), pi expects some value in the API-key environment variable even though local servers ignore it:

export LLAMACPP_API_KEY=noop
export OLLAMA_API_KEY=noop
export LMSTUDIO_API_KEY=noop

LLAMACPP_BASE_URL, OLLAMA_BASE_URL, and LMSTUDIO_BASE_URL override the defaults (http://127.0.0.1:8888/v1, http://127.0.0.1:11434/v1, http://127.0.0.1:1234/v1).
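
The lookup described above can be pictured with a short Python sketch. This is not pi's or little-coder's actual code: the function name and the error raised for a missing key are assumptions made for illustration, and only the env var names and default URLs are taken from this README.

```python
import os

# Defaults as listed in this README.
DEFAULT_BASE_URLS = {
    "llamacpp": "http://127.0.0.1:8888/v1",
    "ollama": "http://127.0.0.1:11434/v1",
    "lmstudio": "http://127.0.0.1:1234/v1",
}


def resolve_local_provider(provider, env=None):
    """Return the base URL and API key a client would use (hypothetical helper)."""
    env = os.environ if env is None else env
    prefix = provider.upper()
    # An explicit *_BASE_URL wins over the built-in default.
    base_url = env.get(f"{prefix}_BASE_URL") or DEFAULT_BASE_URLS[provider]
    # Local servers ignore the key, but some value has to be present.
    api_key = env.get(f"{prefix}_API_KEY")
    if not api_key:
        # Assumption: how pi itself reacts to a missing key is not modeled here.
        raise ValueError(f"set {prefix}_API_KEY to any value, e.g. 'noop'")
    return {"base_url": base_url, "api_key": api_key}


print(resolve_local_provider("llamacpp", {"LLAMACPP_API_KEY": "noop"}))
```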

For cloud providers, set the standard environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, etc.) and pi will discover them.

Local model setup (optional)

Skip this section if you're using a cloud model.

Option A — llama.cpp (fastest for local; supports Qwen3.6-35B-A3B MoE):

# One-time: build llama.cpp with CUDA (sm_XXX = your GPU arch; Blackwell = 120)
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=120 -DLLAMA_CURL=ON
cmake --build build --config Release -j

# Fetch the model GGUF and the matching vision projector.
# The mmproj (~900 MB) is what lets the model see attached screenshots.
pip install -U "huggingface_hub[cli]"
hf download unsloth/Qwen3.6-35B-A3B-GGUF Qwen3.6-35B-A3B-UD-Q4_K_M.gguf --local-dir ~/models
hf download unsloth/Qwen3.6-35B-A3B-GGUF mmproj-F16.gguf            --local-dir ~/models

# Serve it (MoE trick: experts in RAM, attention on GPU → 22 GB model on 8 GB VRAM)
build/bin/llama-server -m ~/models/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf \
   --mmproj ~/models/mmproj-F16.gguf \
   --host 127.0.0.1 --port 8888 --jinja \
   -c 16384 -ngl 99 --n-cpu-moe 999 --flash-attn on

If you only need text and want to skip the projector download, drop the second hf download line and the --mmproj flag — little-coder still works text-only, but the TUI's image attachment will be rejected by the server with a 4xx.

Option B — Ollama (simpler, but slower on MoE):

curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3.5        # 9.7B — the paper's model
# or: ollama pull qwen3.6-35b-a3b

Option C — LM Studio (GUI; OpenAI-compatible server on port 1234):

  1. Install LM Studio and download a model (e.g. Qwen3.6 35B A3B GGUF).
  2. Open the Developer / Local Server tab, load the model, and click Start Server (default http://127.0.0.1:1234).
  3. Run little-coder:
    export LMSTUDIO_API_KEY=noop
    little-coder --model lmstudio/local-model
    The shipped lmstudio/local-model id routes to whatever model LM Studio currently has loaded — no extra config needed for the single-model case. If you serve on a non-default port, set LMSTUDIO_BASE_URL=http://127.0.0.1:<port>/v1. To target a specific model when you have several loaded, add an entry to ~/.config/little-coder/models.json (see Configuring models below).

Serving from another machine on your LAN. Each provider's *_BASE_URL env var accepts any host, not just 127.0.0.1, so you can run inference on a beefier box and connect from a laptop or another device on the same WiFi.

On the server (the box with the GPU):

  • llama.cpp: start llama-server with --host 0.0.0.0 (or your specific LAN interface) instead of 127.0.0.1. Everything else from Option A unchanged.
  • LM Studio: in the Server tab, enable Serve on local network so it binds 0.0.0.0:1234 instead of 127.0.0.1:1234.
  • Ollama: OLLAMA_HOST=0.0.0.0:11434 ollama serve (or set OLLAMA_HOST=0.0.0.0 in the user systemd unit).
  • If ufw / firewalld is active, allow your LAN subnet to the relevant port (e.g. sudo ufw allow from 192.168.0.0/16 to any port 8888 proto tcp).
  • Find the LAN IP with hostname -I (Linux) or ipconfig getifaddr en0 (macOS).

On the client (the machine running little-coder):

# Pick the env vars matching whichever provider is running on the server
export LLAMACPP_API_KEY=noop
export LLAMACPP_BASE_URL=http://<server-lan-ip>:8888/v1

# Sanity check reachability before launching the agent
curl -s http://<server-lan-ip>:8888/v1/models | head

little-coder --model llamacpp/qwen3.6-35b-a3b
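
If you'd rather script the reachability check, here is a minimal Python equivalent of the curl line above. It assumes the server answers GET <base-url>/models with an OpenAI-style {"data": [{"id": ...}]} payload; the helper names are made up for this example and are not part of little-coder.

```python
import json
import urllib.request


def models_url(base_url):
    """Join a *_BASE_URL value (ending in /v1) with the models endpoint."""
    return base_url.rstrip("/") + "/models"


def list_served_models(base_url, timeout=3.0):
    """Return the model ids the server reports; raises OSError if unreachable."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumption: OpenAI-compatible response shape.
    return [entry.get("id") for entry in payload.get("data", [])]


# Example call (needs a running server, so it is left commented out):
# print(list_served_models("http://192.168.1.20:8888/v1"))
```

The short timeout is deliberate: a firewall that silently drops packets makes the request hang rather than fail, so a timeout of a few seconds turns that case into a quick error.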

The streaming chat-completions adapter works over a local network the same way it does over loopback — no client code change, no proxy needed. The per-model profile in .pi/settings.json (context/thinking-budget/temperature) still applies because it's keyed by <provider>/<model-id>, which the client picks regardless of where the server lives.

All small-model-specific extensions auto-disable for large/cloud models so they don't interfere.


Configuring models

The shipped model list lives in models.json at the package root. The llama-cpp-provider extension reads it at startup and registers each provider via pi's registerProvider(). Editing this file in your global install does take effect — but it's overwritten on npm install -g little-coder@latest, so for anything you want to keep, use a user override file instead.

User override resolution (first match wins):

  1. $LITTLE_CODER_MODELS_FILE — explicit path, useful for ad-hoc tests.
  2. $XDG_CONFIG_HOME/little-coder/models.json
  3. ~/.config/little-coder/models.json

Merge semantics: each top-level provider key in your override file fully replaces the same key in the shipped models.json. Providers only in your file are added; providers only in the shipped file are kept. (We don't deep-merge per-model fields — you redeclare the whole provider entry, which avoids "your override silently inherited new fields from a future package release" surprises.)
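
A rough Python sketch of the resolution order and the merge rule, for readers who think better in code. It is an illustration, not the llama-cpp-provider extension itself: the function names are hypothetical, and it assumes "first match" means the first candidate path that exists as a file.

```python
import os
from pathlib import Path


def find_override_file(env=None, home=None):
    """Return the first existing override path, or None (first match wins)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    candidates = []
    if env.get("LITTLE_CODER_MODELS_FILE"):
        candidates.append(Path(env["LITTLE_CODER_MODELS_FILE"]))
    if env.get("XDG_CONFIG_HOME"):
        candidates.append(Path(env["XDG_CONFIG_HOME"]) / "little-coder" / "models.json")
    candidates.append(home / ".config" / "little-coder" / "models.json")
    for path in candidates:
        if path.is_file():
            return path
    return None


def merge_providers(shipped, override):
    """Shallow merge: an override provider replaces the shipped one wholesale."""
    merged = dict(shipped.get("providers", {}))
    # No per-model deep merge: the whole provider entry is swapped.
    merged.update(override.get("providers", {}))
    return {"providers": merged}
```

Because the merge is shallow, an override that declares only baseUrl for llamacpp would drop that provider's shipped models list. That is why the example that follows redeclares the full entry.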

Example — switch the llama.cpp port and bump qwen3.6-35b-a3b to a 150K context, leave ollama untouched:

{
  "providers": {
    "llamacpp": {
      "api": "openai-completions",
      "baseUrl": "http://127.0.0.1:1234/v1",
      "apiKey": "LLAMACPP_API_KEY",
      "models": [
        {
          "id": "qwen3.6-35b-a3b",
          "name": "Qwen3.6-35B-A3B (local llama.cpp, 150K)",
          "reasoning": true,
          "input": ["text"],
          "contextWindow": 150000,
          "maxTokens": 4096,
          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}

Then verify with little-coder --list-models — you should see your overridden entry.

LLAMACPP_BASE_URL, OLLAMA_BASE_URL, and LMSTUDIO_BASE_URL env vars still beat both files for those three providers.

.pi/settings.json is a separate concern: it controls per-model profiles (context_limit, thinking_budget, temperature, benchmark_overrides) referenced by the <provider>/<id> key. Profiles don't register or describe models — they only tune how little-coder runs against models that are already registered.
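
To make the separation concrete, here is a small sketch of a profile lookup keyed by <provider>/<id>. The settings layout (a top-level "profiles" map with benchmark_overrides nested inside each profile) and every numeric value are assumptions for illustration; check the shipped .pi/settings.json for the real shape.

```python
def profile_for(settings, provider, model_id, benchmark=None):
    """Return the tuning profile for a registered model (hypothetical layout)."""
    key = f"{provider}/{model_id}"
    profile = dict(settings.get("profiles", {}).get(key, {}))
    # Pull the overrides out so they are not returned as a tuning field.
    overrides = profile.pop("benchmark_overrides", {})
    if benchmark:
        profile.update(overrides.get(benchmark, {}))
    return profile


# Placeholder values, not the shipped defaults.
settings = {
    "profiles": {
        "llamacpp/qwen3.6-35b-a3b": {
            "context_limit": 16384,
            "temperature": 0.2,
            "benchmark_overrides": {"gaia": {"temperature": 0.0}},
        }
    }
}

print(profile_for(settings, "llamacpp", "qwen3.6-35b-a3b"))
print(profile_for(settings, "llamacpp", "qwen3.6-35b-a3b", benchmark="gaia"))
```

Note that the lookup never touches a base URL, which matches the point made earlier: the same profile applies whether the server is on loopback or elsewhere on the LAN.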


Permissions

little-coder gates Bash tool calls against a built-in safe-prefix whitelist (ls, cat, head, tail, git log/status/diff, find, grep, cp, mv, mkdir, touch, etc.) before pi's own confirmation flow ever sees them. rm and sudo are intentionally not on the list — add them via LITTLE_CODER_BASH_ALLOW per deployment if you really need them.

Two env vars control the gate:

| Env var | Values | Effect |
|---|---|---|
| LITTLE_CODER_PERMISSION_MODE | auto (default) / accept-all / manual | auto: block any bash command not on the whitelist. accept-all: skip the gate entirely, every bash call passes (the benchmark runner sets this). manual: same as auto but with a different rejection message. |
| LITTLE_CODER_BASH_ALLOW | comma-separated prefixes | Extra allow-prefixes merged with the built-in list. Trailing whitespace is meaningful: "make " allows make test but not makefoo; "make" allows both. |

Examples:

# Add 'make' (with word-boundary) and 'docker compose ps' on top of the defaults
export LITTLE_CODER_BASH_ALLOW="make ,docker compose ps"

# Skip the gate entirely (use this only inside controlled environments)
export LITTLE_CODER_PERMISSION_MODE=accept-all
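
The prefix semantics are easy to get wrong, so here is a minimal sketch of the matching rule as described above. It is not the permission-gate extension's code: the built-in list is abbreviated, the function names are hypothetical, and it does not model how the real gate treats chained commands or rejection messages.

```python
import os

# A few prefixes from the list above; the real built-in list is longer.
BUILTIN_PREFIXES = ["ls", "cat", "head", "tail", "git log", "git status", "git diff", "grep"]


def allowed_prefixes(env=None):
    env = os.environ if env is None else env
    extra = env.get("LITTLE_CODER_BASH_ALLOW", "")
    # Split on commas only and do not strip: trailing whitespace is meaningful.
    return BUILTIN_PREFIXES + [p for p in extra.split(",") if p]


def is_allowed(command, env=None):
    env = os.environ if env is None else env
    mode = env.get("LITTLE_CODER_PERMISSION_MODE", "auto")
    if mode == "accept-all":
        return True  # gate skipped entirely
    # auto and manual both block anything without an allowed prefix.
    return any(command.startswith(p) for p in allowed_prefixes(env))


env = {"LITTLE_CODER_BASH_ALLOW": "make ,docker compose ps"}
print(is_allowed("make test", env))   # True
print(is_allowed("makefoo", env))     # False
print(is_allowed("rm -rf build", env))  # False
```

One consequence of plain prefix matching: with "make " (trailing space) a bare make with no arguments does not match either, so add both forms if you need them.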

Write/Edit confirmations are pi's responsibility; little-coder doesn't intercept those.


Paper / benchmark results

| Release | Model | Benchmark | Result |
|---|---|---|---|
| v0.0.2 (commit 1d62bde), the paper | Qwen3.5-9B via Ollama | Aider Polyglot (225 exercises) | 45.56 % mean of two runs; matched-model vanilla Aider baseline 19.11 %. Paper: Honey, I Shrunk the Coding Agent on Substack. |
| v0.0.5, pre-pi Python | Qwen3.6-35B-A3B via llama.cpp | Aider Polyglot | 78.67 %. Full narrative. |
| v0.1.4, on pi | Qwen3.6-35B-A3B via llama.cpp | Terminal-Bench-Core v0.1.1 (80 tasks) | 40.0 % in 6 h 50 min. Write-up. |
| v0.1.13, on pi, TB 2.0 leaderboard | Qwen3.6-35B-A3B via llama.cpp | Terminal-Bench 2.0 (89 tasks × 5 trials = 445) | 24.6 % ± 3.2; accepted to the Terminal-Bench 2.0 leaderboard (rank 120). |
| v0.1.24, on pi, TB 2.0 leaderboard, smaller model | Qwen3.5-9B (Q4_K_M) via llama.cpp (5.3 GB on GPU, 2× faster per-token than the 35B-A3B) | Terminal-Bench 2.0 (89 tasks × 5 trials = 445) | 9.2 % ± 2.4; accepted to the Terminal-Bench 2.0 leaderboard (rank 142). |
| v0.1.27, on pi, GAIA validation | Qwen3.6-35B-A3B via llama.cpp | GAIA validation set (165 tasks) | 40.00 % (66 / 165). L1 60.4 % / L2 37.2 % / L3 7.7 %. Test-split run pending. |

All runs used a consumer laptop: i9-14900HX, 32 GB RAM, 8 GB VRAM on RTX 5070 Laptop (Blackwell). No cloud inference at any point.
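
A few derived figures make the table easier to compare. The snippet below only recomputes numbers already stated above (no new measurements); the variable names are ours.

```python
# Aider Polyglot, Qwen3.5-9B: scaffolded run vs. matched-model vanilla Aider.
scaffolded = 45.56
baseline = 19.11
print(round(scaffolded - baseline, 2))  # absolute gain in points: 26.45
print(round(scaffolded / baseline, 2))  # relative gain: 2.38

# GAIA validation: solved tasks out of the full set.
gaia_accuracy = 100 * 66 / 165
print(round(gaia_accuracy, 2))  # 40.0

# Terminal-Bench 2.0: trials per submission.
tb2_trials = 89 * 5
print(tb2_trials)  # 445
```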


Roadmap

Phase 1 — wide benchmark baseline: complete. The paper established that scaffold–model fit moves a 9.7 B model from 19 % to 45 % on Aider Polyglot, and the goal of Phase 1 was to find out how wide that impact radius is. We now have a four-benchmark baseline on a single laptop-class GPU:

  1. Aider Polyglot — 45.56 % (paper, Qwen3.5-9B) and 78.67 % (v0.0.5, Qwen3.6-35B-A3B).
  2. Terminal-Bench-Core v0.1.1 — 40.0 % (v0.1.4).
  3. Terminal-Bench 2.0 — accepted to the official leaderboard: Qwen3.6-35B-A3B at 24.6 % ± 3.2 (rank 120) and Qwen3.5-9B at 9.2 % ± 2.4 (rank 142). The v0.1.24 prompt-repetition fix (re-add tool descriptions + concision guideline, validated by a 4 / 4 pilot on the previously-regressing prove-plus-comm task) was the prompt for both submissions.
  4. GAIA — validation set at v0.1.27: 40.00 % (66 / 165) on Qwen3.6-35B-A3B. Per-level L1 60.4 % / L2 37.2 % / L3 7.7 %.

That spans short coding exercises (Polyglot), interactive shell-bound tasks (Terminal-Bench), and tool-using research (GAIA), all on the same scaffold. The data needed to choose what to fix next is now in hand.

Phase 2, operating real knowledge bases as day-to-day work: the current focus. The motivating question shifts from "how wide is the impact radius?" to "can a small local model reliably operate and traverse a large, messy knowledge base?" little-coder's day-to-day target is now real work over many markdown files at once: reading, cross-referencing, and updating sprawling note/log collections in the most token-efficient way a small local model can manage. Features are being implemented and tested across several real pipelines in parallel:

  • Domains — medical, athletic, and educational knowledge bases, each with its own structure, vocabulary, and citation needs.
  • Scale — 10+ years of logs, tens of thousands of entries of varied kinds, stressing retrieval, compaction, and the context-budgeting extensions on histories far longer than any single benchmark task.
  • Messy real-world inputs — validation against conflicting OCR extractions of the same source, and multilingual content where the same fact recurs across languages.

This is where the scaffolding work now compounds: knowledge injection/selection, evidence handling, compaction fidelity, and the harness-intervention behaviors. Expect smaller, more frequent releases driven by what little-coder actually struggles with on this work rather than by a benchmark suite.

Benchmarks (deferred). The four-benchmark baseline above stands as the scaffold-fit reference point. Fresh runs — ProgramBench, SWE-bench Verified (multi-file real-world patches), a GAIA test split — come back into scope once the knowledge-base work has produced enough scaffolding signal to make a new measurement worth running.


Troubleshooting

little-coder: command not found — npm's global bin directory isn't on your PATH. Run npm config get prefix to see where it installed; add <prefix>/bin to your PATH. Or reinstall with sudo if your prefix needs root.

ECONNREFUSED 127.0.0.1:8888 — llama.cpp isn't running. Start llama-server first, or switch --model to an Ollama/cloud ID.

LAN client times out (no RST, just hangs) — the inference box's firewall is dropping the SYN. The usual cause is ufw with a default-deny policy that allow-lists only SSH / a few dev ports. From the server: sudo ufw status verbose to confirm; sudo ufw allow from <your-lan-subnet>/24 to any port 8888 proto tcp to fix (scoped to the LAN so you're not exposing the box). Docker-published ports bypass ufw via PREROUTING NAT, which is why a Docker container can be reachable while a plain llama-server on the same host isn't.

Image attachment is accepted but the request returns 4xx — your llama-server is running without a vision projector. Re-launch it with --mmproj ~/models/mmproj-F16.gguf (or another mmproj variant from the same GGUF repo). The --list-models images column reflects what the client will attempt to send, not what the server can answer; the projector is what gives the model eyes.

No API key env var warning — pi expects some key even for local providers. Export LLAMACPP_API_KEY=noop (or OLLAMA_API_KEY=noop) before launching.

No pi "Update Available" banner — that's intentional. little-coder defaults PI_SKIP_VERSION_CHECK=1 so the bundled pi runtime doesn't nag about updating itself; little-coder pins pi to a known-good version per release. If you actually want the banner back, export PI_SKIP_VERSION_CHECK=0 before launching.

Extension load failures on startup — run little-coder --list-models --verbose; extension errors surface there. If the install looks corrupt: npm uninstall -g little-coder && npm install -g little-coder.

Node version too old — little-coder needs Node ≥ 22.19.0 (matching the minimum of the bundled @earendil-works/pi-coding-agent v0.75+). Check with node --version. Easiest fix: nvm install 22 && nvm use 22.
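
If you want to check the version requirement from a script (for example in a setup step), a sketch like the following works on the string that node --version prints. The helper names are hypothetical; only the 22.19.0 minimum comes from this README.

```python
import re

MINIMUM = (22, 19, 0)


def parse_node_version(text):
    """Turn 'v22.19.0' (the format node --version prints) into (22, 19, 0)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MINIMUM):
    # Tuple comparison is numeric per component, so 22.9.0 < 22.19.0 as intended.
    return parse_node_version(text) >= minimum


print(meets_minimum("v22.19.0"))  # True
print(meets_minimum("v22.9.0"))   # False
```

Comparing parsed integers rather than raw strings avoids the classic trap where "22.9.0" sorts after "22.19.0" lexicographically.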


Developing little-coder locally

If you want to hack on the extensions or skills:

git clone https://github.com/itayinbarr/little-coder.git
cd little-coder
npm install
npm link            # makes the local checkout available as `little-coder`
little-coder --model llamacpp/qwen3.6-35b-a3b

To unlink: npm unlink -g little-coder.

The benchmarks harness (benchmarks/) is dev-only and not shipped with the npm package. Run it from a clone with python3 benchmarks/aider_polyglot.py … etc.


Architecture

little-coder/
├── .pi/
│   ├── settings.json               # per-model profiles + benchmark_overrides (terminal_bench, gaia)
│   └── extensions/                 # 23 TypeScript extensions, auto-discovered by pi
│       ├── branding/               # little-coder startup header + terminal title (replaces pi's built-in)
│       ├── llama-cpp-provider/     # data-driven provider registration from models.json — ships llamacpp, ollama, lmstudio (+ user override file)
│       ├── write-guard/            # Write refuses on existing files; rewrites root-bare /foo.md paths to cwd
│       ├── read-guard/             # trims a Read that would overflow the context window to its first 30 lines + a search-instead directive
│       ├── extra-tools/            # glob, webfetch, websearch (pi ships grep/find)
│       ├── skill-inject/           # per-turn tool-skill selection (error > recency > intent)
│       ├── knowledge-inject/       # algorithm cheat-sheet scoring (word=1.0, bigram=2.0, threshold=2.0)
│       ├── output-parser/          # repair malformed ```tool, <tool_call>, bare JSON
│       ├── quality-monitor/        # empty / hallucinated / loop detection + correction follow-up
│       ├── thinking-budget/        # cap thinking tokens per turn, retry with thinking off
│       ├── permission-gate/        # bash whitelist (ls, cat, git log/status/diff, etc.)
│       ├── checkpoint/             # snapshot files before Write/Edit
│       ├── tool-gating/            # enforces _allowed_tools at exec + schema levels
│       ├── turn-cap/               # max_turns abort (Polyglot unbounded, TB 40, GAIA 30)
│       ├── benchmark-profiles/     # reads settings.json → systemPromptOptions + sets temperature
│       ├── shell-session/          # ShellSession[Cwd|Reset] — tmux-proxy + subprocess backends
│       ├── browser/                # Playwright BrowserNavigate/Click/Type/Scroll/Extract/Back/History
│       ├── evidence/               # EvidenceAdd/Get/List — per-session store, 1 KB snippet cap
│       └── evidence-compact/       # preserves evidence across pi's auto-compaction
├── skills/                         # 30 markdown files the extensions inject on demand
│   ├── tools/*.md                  #   14 tool-usage cards
│   ├── knowledge/*.md              #   13 algorithm cheat sheets
│   └── protocols/*.md              #    3 research/cite/decomposition workflows
├── benchmarks/
│   ├── rpc_client.py               # PiRpc — spawns `pi --mode rpc`, demuxes events + UI requests
│   ├── aider_polyglot.py           # Polyglot driver with per-language transforms
│   ├── tb_adapter/                 # Terminal-Bench 1.0 BaseAgent (tmux-proxy)
│   ├── harbor_adapter/             # Terminal-Bench 2.0 BaseAgent (async env.exec proxy)
│   ├── tb_pilot.sh / harbor_pilot.sh
│   ├── tb_status.sh / harbor_status.sh
│   └── test_rpc_client.py
├── AGENTS.md                       # project system prompt (pi discovers it automatically)
├── models.json                     # canonical provider registration (loaded by llama-cpp-provider; user override at $XDG_CONFIG_HOME/little-coder/models.json)
└── docs/
    ├── benchmark-*.md              # per-benchmark narratives
    └── architecture.md             # v0.0.5-era Python architecture (historical)
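
As one example of what these extensions do, the knowledge-inject weights noted in the tree (word=1.0, bigram=2.0, threshold=2.0) can be read as a simple keyword score. The sketch below is our interpretation in Python, not the extension's TypeScript: where keywords come from, the tokenization, the inclusive threshold, and the sheet names are all assumptions.

```python
import re

WORD_WEIGHT = 1.0
BIGRAM_WEIGHT = 2.0
THRESHOLD = 2.0


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def score(prompt, keywords):
    """Sum 1.0 per matching single-word keyword and 2.0 per matching two-word keyword."""
    words = tokens(prompt)
    word_set = set(words)
    bigram_set = set(zip(words, words[1:]))
    total = 0.0
    for keyword in keywords:
        parts = tuple(tokens(keyword))
        if len(parts) == 1 and parts[0] in word_set:
            total += WORD_WEIGHT
        elif len(parts) == 2 and parts in bigram_set:
            total += BIGRAM_WEIGHT
    return total


def select_sheets(prompt, sheets):
    """Return sheet names scoring at or above the threshold, best first."""
    scored = [(score(prompt, kws), name) for name, kws in sheets.items()]
    return [name for value, name in sorted(scored, reverse=True) if value >= THRESHOLD]


# Hypothetical cheat sheets and keywords.
sheets = {"binary-search": ["binary search", "sorted"], "graphs": ["graph", "bfs"]}
print(select_sheets("Implement binary search over a sorted list", sheets))
```

Under this reading, a single matching word is never enough to inject a cheat sheet, while one matching bigram is. That keeps incidental word overlap from spending context on an irrelevant sheet.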

Key invariant. pi is a minimal base by design. Every little-coder mechanism ships as a pi extension that hooks pi's lifecycle events (before_agent_start, context, before_provider_request, tool_call, tool_result, turn_end, session_compact). Extensions are independent: the launcher discovers every .pi/extensions/*/index.ts and loads it explicitly with --extension, and pi runs with --no-extensions, so the bundled set is exactly what loads — no more, no less. If you don't want one, delete its directory; if you want to add another, drop it next to the existing ones (or pass -e <path> at launch).
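
The launcher behavior described above amounts to building an explicit argument list. The real launcher is the Node script bin/little-coder.mjs; the Python below is only a sketch of the idea, and the function name, the sorted load order, and the bare "pi" executable name are assumptions. The --no-extensions and --extension flags are the ones named in this section.

```python
from pathlib import Path


def build_pi_argv(root, extra_extensions=(), passthrough=()):
    """Build a pi command line that loads exactly the bundled extensions."""
    ext_root = Path(root) / ".pi" / "extensions"
    argv = ["pi", "--no-extensions"]  # turn off pi's own auto-discovery
    # Every directory with an index.ts is loaded explicitly.
    for index in sorted(ext_root.glob("*/index.ts")):
        argv += ["--extension", str(index)]
    # Extensions passed with -e at launch are appended after the bundled ones.
    for path in extra_extensions:
        argv += ["--extension", str(path)]
    argv += list(passthrough)
    return argv
```

Deleting an extension's directory removes it from the glob, which is why removal needs no configuration change.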


Reproducing the paper (v0.0.2)

git clone https://github.com/itayinbarr/little-coder.git
cd little-coder
git checkout v0.0.2
# Follow that version's README for its Python setup (pip install -e .)

The paper ran ollama/qwen3.5 through the Python little-coder at commit 1d62bde (tag v0.0.2). The 45.56 % mean figure is the average of two full 225-exercise runs on that exact codebase. For the 78.67 % headline, check out tag v0.0.5 — both are pre-pi Python and follow the pre-pi setup.


Citation

@misc{inbar2026littlecoder,
  title        = {little-coder: A Coding Agent Optimized for Small Local Language Models},
  subtitle     = {Architectural Adaptation Lets a 9.7B Model Outperform Frontier Models on Aider Polyglot},
  author       = {Inbar, Itay},
  year         = {2026},
  month        = apr,
  howpublished = {\url{https://open.substack.com/pub/itayinbarr/p/honey-i-shrunk-the-coding-agent}},
  note         = {White paper}
}

Attribution

little-coder v0.0.x was a derivative work of CheetahClaws / ClawSpring by SafeRL-Lab, Apache 2.0. That upstream provided the Python agent substrate, tool system, multi-provider support, and REPL.

little-coder v0.1.0+ replaces that substrate with pi by Mario Zechner — Apache 2.0 / MIT. The npm package was renamed from @mariozechner/pi-coding-agent to @earendil-works/pi-coding-agent in upstream's 0.74 release; little-coder v1.4.2+ ships with the new package. pi provides the agent loop, provider abstraction, TUI, and extension model. little-coder rebuilds its small-model adaptations on top of pi as extensions.

All little-coder-specific mechanisms — Write-vs-Edit invariant, skill / knowledge injection, thinking-budget cap, output-parser, quality-monitor, per-model profiles, per-benchmark overrides, ShellSession / Browser / Evidence tool families, evidence-aware compaction — are preserved across versions.


License

Apache 2.0 — see LICENSE for details. NOTICE tracks upstream attribution.